Case Studies

“The picture is white” – on visual versus semantic analysis of fine art paintings

When applying machine learning to fine art paintings, one obvious approach is to analyse the visual content of the paintings. We discuss two major problems which caused us to take a semantic route instead: (i) state-of-the-art image analysis has been trained on photos and does not work well with paintings; (ii) visual information obtained from paintings is not sufficient for building a curatorial narrative.

Let us start by using the DenseCap online tool to automatically compute captions for a photo of two dogs playing.

The DenseCap model correctly identifies the dogs and many of their properties (e.g. “the dog is brown”, “the dog has brown eyes”, “the front legs of a dog”, “the ear|head|paw of a dog”) as well as aspects of the background (“a piece of grass”, “a leaf on the ground”). Some captions are wrong, both for the dogs (“the dogs tail is white”, but there is no tail in the picture) and for the background (“the fence is white”). But all in all the computer vision system does a good job at what it has been trained to do: localize and describe salient regions in images in natural language [Johnson et al 2016].

Let us now apply this system to a fairly realistic dog painting from the collection of our partner museum Belvedere, Vienna.

Again many characteristics of the dog are correctly identified (“the dog is looking at the camera”, “the eye|ear of the dog”), as is “a bowl of food”. The background already poses more problems: some confusions are still comprehensible (“a white napkin”, “the curtain is white”), others less so (“the flowers are on the wall”).

Testing the system on a more abstract painting, Belvedere’s collection highlight “The Kiss” by Gustav Klimt, yields even stranger results.

While some captions are correct (“the mans hand”, “the dress is yellow”, “the flowers are yellow|green”), others are somewhat off (“the hat is on the mans head”) or just completely wrong (“the picture is white”, “the wall is made of bricks”, “a black door”, “a window on the building”). The essential aspect of the painting, a man and a woman embracing, is not comprehended at all. Of course this is understandable: the DenseCap system has been trained on 94,000 photo images (plus 4,100,000 localized captions) but not on fine art paintings, which explains why it cannot generalize to more abstract forms of art.

On the other hand, even if an image analysis system could perfectly detect that “Caesar am Rubicon” shows a dog looking at a sausage in a bowl on a table, it would still not grasp the meaning of the painting: Caesar is both the name of the dog and the historical figure who crossed the Rubicon. That crossing was a declaration of war on the Roman Senate and ultimately led to Caesar’s ascent to Roman dictator. Hence “crossing the Rubicon” is now a metaphor for passing a point of no return.

The same holds for Gustav Klimt’s “The Kiss”. Even if the image analysis system were not fooled by Klimt’s use of mosaic-like two-dimensional Art Nouveau organic forms and were able to detect two lovers embracing in a kiss, it would still not grasp the significance of the decadence conveyed by the opulent exalted golden robes or the possible allusion to the tale of Orpheus and Eurydice.

The DAD project is about exploring the influence of Artificial Intelligence on the art of curating. From a curatorial perspective, grasping the semantic meaning of works of art is essential for building curatorial narratives that are not based on a purely aesthetic procedure. See our previous blog posts [1][2] on such a semantically driven approach to the collection of the Belvedere, where we chose to analyse text about the paintings rather than the paintings themselves.

Dusty visits the Glyptothek

Do you remember Dusty, the vacuum cleaner robot that explored a model version of the Glyptothek during this spring’s COVID-19-related lockdown? This summer Dusty was able to experience the real Glyptothek, using its somewhat limited artificial intelligence, which basically amounts to trying to avoid obstacles on its way through the maze of shelves full of plaster casts.

The Glyptothek of the Academy of Fine Arts, Vienna, is a collection of plaster casts dating back to the late 17th century. Its main task was to serve as study material for Academy students, and it contains copies of a canon of world-renowned sculptures, ranging from plaster casts of Egyptian originals to copies of Greek and Roman, medieval, Renaissance and historicist statues. This collection of copies of works of art can be seen as an early analog blueprint of digital collections: the Glyptothek made the essence of European sculpture available to local audiences, who could enjoy international pieces of art without leaving their home town, very much like today’s internet population can access digital images of the world’s artistic heritage at the click of their handheld device.

Speaking of digital images, the above image of Dusty in the Glyptothek actually is a digital copy of an analog photograph, which in itself is an analog copy of a plaster cast which is a copy of a statue which is a copy of a real (or imagined) person …

Automatic colorization of classic statues?

Automatic colorization of black and white photographs has recently been enabled by advances in machine learning (see [Zhang et al 2016] for the methods we used for the following results). Basically deep neural networks are shown millions of black and white photographs and their color versions to learn a mapping from black and white to color. After successful training, it is possible to colorize black and white photos which the machine learning algorithm has never seen before. Online services like ALGORITHMIA enable everyone to test and use this technique by simply uploading their images.
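The training setup described above needs no manual labelling, because every color photo yields its own black and white input. The following is a minimal sketch of that idea only. It is not the method of [Zhang et al 2016], which works in the Lab color space and predicts the color channels from the lightness channel. The function names and the tiny sample image are hypothetical, and the luminance weights are the common Rec. 601 approximation, assumed here for illustration.

```python
def to_grayscale(pixel):
    """Approximate the luminance of an (R, G, B) pixel with Rec. 601 weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def make_training_pair(color_image):
    """Return (grayscale input, color target) for one image.

    color_image is a list of rows, each row a list of (R, G, B) tuples.
    A colorization network would be trained to map the first element
    of the pair back to the second.
    """
    gray = [[to_grayscale(p) for p in row] for row in color_image]
    return gray, color_image


# Hypothetical 2x2 image: red, white, black, blue
sample_image = [[(255, 0, 0), (255, 255, 255)],
                [(0, 0, 0), (0, 0, 255)]]
gray_input, color_target = make_training_pair(sample_image)
print(gray_input)
```

Note that very different colors can collapse to similar grey values, which is one reason why a colorization system has to guess colors from context, such as recognising skin or fabric.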

Successful colorization via ALGORITHMIA of our family’s dog Ozzy

One focus for DUST AND DATA is the Glyptothek of the Academy of Fine Arts, Vienna, which is a collection of plaster casts dating back to the late 17th century. Its main task was to serve as study material for Academy students, but it also became publicly accessible as a museum. The collection contains copies of a canon of world-renowned sculptures, ranging from plaster casts of Egyptian originals to copies of Greek and Roman, medieval, Renaissance and historicist statues. Thanks to the work of German archaeologist Vinzenz Brinkmann, it is now an established fact that classic statues, e.g. from Greek antiquity, were originally painted in bright colors.

This color restoration shows what a statue of a Trojan archer from the Temple of Aphaia, Aegina would have originally looked like (CC BY-SA 2.5, from Wikipedia)

As for DUST AND DATA, all we are given are black and white photos of plaster casts from the Glyptothek. These are digital photo copies of analog plaster copies of statues. Is this sufficient to obtain any kind of meaningful result when trying to automatically colorize classic statues?

Plaster cast of dying warrior from the Temple of Aphaia, Aegina (Photo: Gemäldegalerie der Akademie der bildenden Künste Wien)
Automatic colorization via ALGORITHMIA

The automatic colorization of our dying warrior did not quite succeed, but it has many interesting features nevertheless. For example, the algorithm did correctly “understand” that the statue is held by a real person, hence the skin tones of the person’s arm and the dark color of the trousers. The statue itself is rendered in a light brown color, probably imitating the color of statues the machine learning system has seen during training. But what about the arm at the bottom left of the picture? It has an almost lifelike skin tone. And even more astonishing is the red, bloodlike colorization of the amputated arm stump!

The Glyptothek also has a copy of Michelangelo’s David, or at least its head. The colorization above does provide a skinlike pink tone for the face and even blond hair.

Applying the colorization to the full David statue gives a lifelike pinkish skin tone, at least much more so than for the dying warrior above. The fact that Michelangelo’s David is such a realistic sculpture probably made it possible to trick the algorithm into treating the photo of David as the photo of a real naked person.

So can we use machine learning to automatically colorize photos of plaster casts from the Glyptothek? To have any chance of success, this would require training an algorithm with thousands of color restorations of antique statues (see photo of the Trojan archer above). But applying state-of-the-art colorization algorithms already provides interesting results by exposing some of the biases and failures of their machine learning machinery.

On a blind spot in distant reading

The accessibility of vast amounts of text in digital form has enabled the humanities to add ‘distant reading’ of thousands of books by algorithms as a new research tool to their repertoire of methods. This ‘distant reading’ of course has to be complemented with traditional ‘close reading’ of individual books. See e.g. [Jänicke et al 2015] for a survey and discussion of challenges.

We are currently working on applying machine learning as a distant reading tool to the journal Österreichische Zeitschrift für Volkskunde (OEZV) of the Austrian Museum of Folk Life and Folk Art. The OEZV has been published almost continuously since 1895 and we are able to use the result of an ‘Optical Character Recognition’ (OCR) scan of the entire publication history for our analysis. One of our interests is how the discourse and corresponding topics have changed over the years of its publication.

We applied topic modelling via Latent Dirichlet Allocation (LDA) for this analysis. Topic modelling tries to model latent topics in large amounts of text. Its basic entities are documents, which are modelled as probability distributions over word occurrences. The assumption is that documents contain text about different topics, which manifests itself in the usage of different words. These latent topics are in turn modelled as probability distributions over word occurrences, since different topics will use different words related to their different content. LDA then infers, for each document, a probability distribution over topics, trying to optimize the separation of documents over topics. The assumption here is that different documents contain information about rather different (and few) topics, but of course overlaps are allowed.
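To make the two kinds of distributions concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling on a toy corpus, using only the Python standard library. It is an illustration under assumptions: the function name, the hyperparameters `alpha` and `beta` and the four toy documents are hypothetical, and Gibbs sampling is just one common way to fit LDA, not necessarily the inference method behind the OEZV results.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab):
    doc_topic[d] is the topic distribution of document d,
    topic_word[k] is the word distribution of topic k over vocab.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    n_dk = [[0] * n_topics for _ in docs]            # tokens of doc d in topic k
    n_kw = [[0] * n_words for _ in range(n_topics)]  # tokens of word w in topic k
    n_k = [0] * n_topics                             # tokens in topic k
    z = []                                           # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                    / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[t][v] + beta) / (n_k[t] + n_words * beta)
         for v in range(n_words)]
        for t in range(n_topics)
    ]
    return doc_topic, topic_word, vocab


# Hypothetical toy corpus: two "religious" and two "dancing" documents.
docs = [
    ["apostel", "passionsspiele", "gründonnerstag", "apostel"],
    ["christusfigur", "apostel", "passionsspiele"],
    ["tänzer", "bandltanz", "getanzt", "tänzer"],
    ["bandltanz", "getanzt", "tänzer"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2)
for row in doc_topic:
    print([round(p, 2) for p in row])
```

Each printed row is one document as a probability vector over the two topics, which is the same kind of representation that is later compared across years.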

Usually a document is one article in a journal or newspaper, but for OEZV we have no access to article boundaries, therefore our basic entities are individual pages (more than 34000 for all years of OEZV together). Since we are interested in topic evolution over time, we aggregated all pages per year into overall documents. We then modelled all OEZV volumes as distributions over 30 topics and explored the result with the LDA visualization package pyLDAvis.
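The aggregation step can be sketched as follows. This is a hypothetical illustration, not the project's actual preprocessing: the `(year, page_text)` input format, the function names and the simple letter-based tokenizer are all assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters (including umlauts)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def aggregate_pages_by_year(pages):
    """Merge all pages of a year into one bag of words.

    pages is an iterable of (year, page_text) tuples.
    Returns a dict mapping each year to a Counter of word counts.
    """
    per_year = defaultdict(Counter)
    for year, text in pages:
        per_year[year].update(tokenize(text))
    return dict(per_year)


# Hypothetical sample pages
sample_pages = [
    (1895, "Der Tänzer tanzt."),
    (1895, "Der Bandltanz, S. 12"),
    (1896, "Apostel und Passionsspiele"),
]
year_docs = aggregate_pages_by_year(sample_pages)
print(year_docs[1895].most_common(3))
```

One yearly document per volume keeps the time axis explicit, at the price of mixing all articles of a year into a single bag of words.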

In this visualization, every topic is represented as a sphere in the left part of the figure. Clicking on one of the spheres (no. 17 in our picture), the right part of the figure shows a distribution of words which are prevalent for this topic. Some of the words have a religious meaning like ‘apostel’, ‘christusfigur’, ‘passionsspiele’ or ‘gründonnerstag’. Others are more about dancing like ‘tänzer’, ‘bandltanz’ or ‘getanzt’, which might lead to the conclusion that this topic is about religion and certain folkloristic rituals around it. It has to be said that such topics are often hard to interpret, and 30 topics may be too coarse a resolution for the entire OEZV collection.

Every year of OEZV is now a distribution over 30 topics, i.e. a vector of probabilities of size 30. We can use this representation to compute how similar annual volumes of OEZV are in their distribution across topics. In the figure above both axes are the years of publication, from 1895 in the bottom left corner to 2018 bottom right and top left. Each coloured cell in this figure represents the similarity of a pair of years (inverted distance between probability vectors), with dark blue being very similar, bright yellow not similar and green in-between. Therefore the main diagonal is dark blue, since OEZV from one year is of course very similar to itself. The most interesting patterns are the larger dark blue rectangles along the main diagonal, indicating a number of consecutive years which are all similar to each other. This is most evident for the years 1940 to 1944, which are highlighted with a red ring. This time span coincides with Austria being part of Nazi Germany’s Third Reich, so one might expect the discourse in OEZV to be very different from all other years of its publication.
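A similarity matrix of this kind can be sketched in a few lines. The text only says "inverted distance", so the concrete choices here are assumptions: Euclidean distance, rescaled so that the most distant pair of years gets similarity 0 and identical vectors get 1. The three-topic vectors are made-up toy values, not OEZV results.

```python
import math


def similarity_matrix(year_topics):
    """Pairwise similarity of yearly topic distributions.

    year_topics maps a year to its probability vector over topics.
    Returns (years, sim) where sim[i][j] is 1 minus the Euclidean
    distance between years[i] and years[j], scaled by the largest distance.
    """
    years = sorted(year_topics)
    dist = [[math.dist(year_topics[a], year_topics[b]) for b in years]
            for a in years]
    max_dist = max(max(row) for row in dist) or 1.0  # avoid division by zero
    sim = [[1.0 - d / max_dist for d in row] for row in dist]
    return years, sim


# Hypothetical toy vectors over 3 topics
toy_topics = {
    1939: [0.7, 0.2, 0.1],
    1940: [0.1, 0.1, 0.8],
    1941: [0.1, 0.2, 0.7],
}
years, sim = similarity_matrix(toy_topics)
for year, row in zip(years, sim):
    print(year, [round(s, 2) for s in row])
```

In this toy example 1940 and 1941 form a small block of high similarity next to the diagonal, the same kind of pattern as the dark blue rectangles in the figure.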

However, looking directly at the results from the OCR scan for the years 1940 to 1944, we realize that all of it is some kind of pseudo-German nonsense language, e.g.: “Vrautbater fteßte eine Dteipe bon Dtätfelfragen, auf bie ber Vrautfüprer alg Veboßmäiptigter beg Vräutigamg bie paffenbe Slnimori finbett mufgte”. To understand what happened here we look directly at the PDF files of the OEZV, showing one page from the year 1944 below. As you can see, a special typeface called ‘Frakturschrift’ is being used, which was typical for Nazi Germany.

Compare this to a page from year 1895, where a more common typeface has been used, as in all years except 1940 to 1944.

Apparently the OCR scan failed miserably on the ‘Frakturschrift’ typeface, resulting in the years 1940 to 1944 using their “own” language, some kind of gibberish German. This of course has a very harmful impact on our machine learning approach, since the years 1940 to 1944 use completely different (nonsensical) words than all other years. As a result these years end up having very different distributions across words and topics. Hence the high similarity between the years 1940 to 1944 turns out not to be a very significant result after all, but an artefact of the processing pipeline, with the OCR mistake propagating through the whole system.
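An artefact like this could be caught before topic modelling with a simple vocabulary check. The sketch below is a hypothetical sanity test, not part of the pipeline described here: for every year it computes the share of tokens that occur in no other year, on the assumption that OCR gibberish produces mostly one-off "words". The function name and the toy token lists are illustrative.

```python
def unseen_token_rate(year_tokens):
    """Share of each year's tokens that appear in no other year.

    year_tokens maps a year to its list of tokens.
    A rate close to 1.0 hints at a broken OCR scan for that year.
    """
    rates = {}
    for year, tokens in year_tokens.items():
        # Vocabulary of all other years
        others = set()
        for other_year, other_tokens in year_tokens.items():
            if other_year != year:
                others.update(other_tokens)
        unseen = sum(1 for token in tokens if token not in others)
        rates[year] = unseen / len(tokens) if tokens else 0.0
    return rates


# Hypothetical toy data; the 1944 tokens are taken from the OCR sample above
toy_tokens = {
    1895: ["der", "tänzer", "und", "der", "apostel"],
    1896: ["der", "bandltanz", "und", "die", "tänzer"],
    1944: ["vrautbater", "fteßte", "eine", "dteipe", "bon"],
}
rates = unseen_token_rate(toy_tokens)
for year in sorted(rates):
    print(year, rates[year])
```

On real data, several consecutive broken years may share some recurring misreadings, so a dictionary-based check against a German word list would be a more robust variant of the same idea.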

Nevertheless we find this result interesting, because the time when Austria was part of the Third Reich has always been a sort of blind spot for Austria’s society, taking decades to accept its own disreputable role in the horrific events during this historic time span, slowly emerging from “Hitler’s first victim” to perpetrator and culprit. It is therefore quite ironic that this blind spot reappears via distant reading of Austria’s main scientific journal on Austrian folk life and folk art …

Dusty’s Dada: “Startcleaning”

Living room experiments behind the viral firewall

COVID-19 is forcing “Dust and Data”, too, to keep its distance and to change some of its working methods. For their research into new AI-based exhibition displays, Sanja and Irina have moved their passion for model building into the living room and set up a gallery course there for an intelligent housemate: their vacuum cleaner robot “Dusty”. True to its servile credo “Start cleaning”, it immediately began to survey the dust-covered art space autonomously. Whether its sculptural training set gives rise not only to a dust-bunny-collecting experience but also to an “aesthetic experience”, from whose movement patterns in the museum space we could learn, is the subject of further observation.

Dusty’s Dada: the Making-Of

A drone shot of the “museum”
Dusty in its full glory
Dusty fast forward

On its bright days it is a magnificent greenhouse

A conversation with Alexander Martos and Niko Wahl about their projects at the Volkskundemuseum Wien and about collecting and curating the future.

Instagram (short version):

On its bright days it is a magnificent greenhouse. A conversation with Alexander Martos and Niko Wahl about their projects at the Volkskundemuseum Wien and about collecting and curating the future. (Full length on Facebook.)

Your first collaboration was in the context of Museum auf der Flucht. With a fellowship programme you brought highly qualified asylum seekers to the museum and laid the foundation for an intensive engagement with the topics of flight, migration and arrival in the museum’s research, collection, exhibition and education activities. How did that come about?

Niko Wahl: I had an exhibition proposal on the topic. At the same time there was already an idea between Matthias Beitl and Alexander Martos, as well as a project proposal for Museum auf der Flucht. But I had imagined it quite differently. When talking about this house and how it is used, that is perhaps a guiding principle. Here, your own world of ideas usually meets something completely different.

In what way is that so?

Niko Wahl: The openness of the house makes it possible for things to grow and for people to develop. At the same time the museum gives up control, in a positive sense, in order to create a space of possibility. On its dark days it is laissez-faire, but on its bright days it is a magnificent greenhouse. And there are more bright days than dark ones.

How does that show, for example?

Niko Wahl: With Museum auf der Flucht, a lot was not fully defined at the beginning. The fellows had room to breathe, could develop and grow into this house. They made contacts, and a mutual learning process emerged. The house went along this path step by step and, whenever something needed to keep growing, created the necessary space.

Continue reading on Facebook or online in our association news 😘 #alexandermartos #nikowahl #museumaufderflucht #kuratieren #curating #sammlung #volkskundemuseumwien

A post shared by Volkskundemuseum | since 1895 (@volkskundemuseumwien)

Facebook (full version):

“On its bright days it is a magnificent greenhouse”. Whoever collects makes themselves vulnerable. In full length: The…

Posted by Volkskundemuseum Wien on Saturday, 29 February 2020