Case Studies

“The picture is white” – on visual versus semantic analysis of fine art paintings

When applying machine learning to fine art paintings, one obvious approach is to analyse the visual content of the paintings. We discuss two major problems which caused us to take a semantic route instead: (i) state-of-the-art image analysis has been trained on photos and does not work well with paintings; (ii) visual information obtained from paintings is not sufficient for building a curatorial narrative.

Let us start by using the DenseCap online tool to automatically compute captions for a photo of two dogs playing.

The DenseCap model correctly identifies the dogs and many of their properties (e.g. “the dog is brown”, “the dog has brown eyes”, “the front legs of a dog”, “the ear|head|paw of a dog”) as well as aspects of the background (“a piece of grass”, “a leaf on the ground”). There are some wrong captions for the dogs (“the dogs tail is white”, but there is no tail in the picture) and also for the background (“the fence is white”). But all in all the computer vision system does a good job at what it has been trained to do: localize and describe salient regions in images in natural language [Johnson et al 2016].

Let us now apply this system to a fairly realistic dog painting from the collection of our partner museum Belvedere, Vienna.

Again many characteristics of the dog are correctly identified (“the dog is looking at the camera”, “the eye|ear of the dog”) and also “a bowl of food”. The background already poses more problems, with some confusions still comprehensible (“a white napkin”, “the curtain is white”) but others less so (“the flowers are on the wall”).

Testing the system on a more abstract painting, Belvedere’s collection highlight “The Kiss” by Gustav Klimt, yields even stranger results.

While some captions are correct (“the mans hand”, “the dress is yellow”, “the flowers are yellow|green”), others are somewhat off (“the hat is on the mans head”) or just completely wrong (“the picture is white”, “the wall is made of bricks”, “a black door”, “a window on the building”). The essential aspect of the painting, a man and a woman embracing, is not comprehended at all. Of course this is understandable: the DenseCap system has been trained on 94,000 photo images (plus 4,100,000 localized captions) but not on fine art paintings, which explains why it cannot generalize to more abstract forms of art.

On the other hand, even if an image analysis system could perfectly detect that “Caesar am Rubicon” shows a dog looking at a sausage in a bowl on a table, it would still not grasp the meaning of the painting: Caesar is both the name of the dog and the historical figure who crossed the Rubicon, an act that amounted to a declaration of war on the Roman Senate and ultimately led to Caesar’s ascent to Roman dictator. Hence “crossing the Rubicon” is now a metaphor that means to pass a point of no return.

The same holds for Gustav Klimt’s “The Kiss”. Even if the image analysis system were not fooled by Klimt’s use of mosaic-like two-dimensional Art Nouveau organic forms and were able to detect two lovers embracing in a kiss, it would still not grasp the significance of the decadence conveyed by the opulent exalted golden robes or the possible allusion to the tale of Orpheus and Eurydice.

The DAD project is about exploring the influence of Artificial Intelligence on the art of curating. From a curatorial perspective, grasping the semantic meaning of works of art is essential to build curatorial narratives that are not just based on a purely aesthetic procedure. See our previous blogposts [1][2] on such a semantically driven approach towards the collection of the Belvedere, where we chose to analyse text about the paintings rather than the paintings themselves.


2nd Liminal Space at the ]a[ Research Day

On 12.11.2020 DAD’s Niko Wahl presented our intermediate results on the third research day of the Academy of Fine Arts, Vienna. Due to COVID-19, this event was held as an online Zoom meeting. The goal of the ]a[ research days is to give an overview of all ongoing research projects at the academy, including discussions with all participating colleagues.

Niko Wahl gave a short introduction of our project and an overview of DAD’s collaborations with three different museums, where we work with an archive of an ethnological journal, a fine arts gallery and the statues in the Academy’s Glyptothek.

Since our work with the Austrian Museum of Folk Life and Folk Art and with the Belvedere, Vienna, has already been documented in previous blogposts [1][2], let’s turn to the presentation of plaster casts at the Academy’s Glyptothek, which we explored with Dusty, an off-the-shelf household robot.

Many people associate Artificial Intelligence (AI) with the development of ever more powerful and dextrous robots, along with horror scenarios of these machines taking over the planet. In reality, robots are a small part of AI, which is dominated instead by machine learning software powering your Internet search engine, the natural language interface to your mobile phone, online music, movie and product recommendations and many other everyday technologies.

On the other hand, many people already own robots with limited forms of AI, for instance vacuum cleaning robots. What if we confront such a household robot with a – supposedly obsolete – museum collection of historic plaster copies of famous statues, whose very physis seems to be made of dust?

The robot takes its own route through the museum space. Following its built-in algorithms it perpetually finds new ways through the collection. It seemingly decides for itself in what order to visit the museum objects, all the time metaphorically internalizing the objects of art while inhaling their dust.

Other visitors are free to follow the robot on its path through the museum space, engaging with its exhibition narrative. They might benefit from surprising relationships between objects of art established by the often creative course of the robot. Smart latest-generation vacuum cleaning robots are able to share their sensory experiences with others of their kind. These shared experiences usually are measurements of objects and how to avoid them when traversing a room. But what if this cloud communication, usually not accessible to us, deals with objects of art instead of everyday items? Will meeting David or the Pietà change the robots’ discourse? What if the robot meets a portrait of itself?


DAD at the ]a[ Research Day #3

Photo by Wikipedia, CC BY-SA 3.0

On 12.11.2020 we will give an overview of DUST AND DATA at the third research day of the Academy of Fine Arts, Vienna. This event will be online; attendance is free, but you should register before the 4th of November. Our talk will include news about Dusty!

Case Studies

Dusty visits the Glyptothek

Do you remember Dusty, the vacuum cleaner robot that explored a model version of the Glyptothek during this spring’s COVID-19-related lockdown? This summer Dusty was able to experience the real Glyptothek, using its somewhat limited artificial intelligence, basically trying to avoid obstacles on its way through the maze of shelves full of plaster casts.

The Glyptothek of the Academy of Fine Arts, Vienna, is a collection of plaster casts dating back to the late 17th century. Its main task was to serve as study material for Academy students, containing copies of a canon of world-renowned sculptures, ranging from plaster casts of Egyptian originals to copies of Greek and Roman, medieval, renaissance and historicist statues. This collection of copies of works of art can be seen as an early analog blueprint of digital collections: the Glyptothek made the essence of European sculpture available to local audiences, who could enjoy international pieces of art without leaving their home town, very much like today’s internet population can access digital images of the world’s artistic heritage at the click of their handheld device.

Speaking of digital images, the above image of Dusty in the Glyptothek actually is a digital copy of an analog photograph, which in itself is an analog copy of a plaster cast which is a copy of a statue which is a copy of a real (or imagined) person …


4th Critical Space at the Belvedere Research Center

On the 22nd of September 2020 the DAD team met with Christian Huemer and Johanna Aufreiter from the Belvedere Research Center to discuss our results concerning Belvedere’s online collection. One focus of the meeting was our engagement with the room on “Viennese Portraiture in the Biedermeier Period” in Belvedere’s permanent exhibition.

Applying our algorithm to find pathways of semantic meaning [Flexer 2020] between works of art, we are able to suggest additional works for the liminal spaces between individual positions in the curatorial narrative, opening up new sub-narratives for the room. Based on a word embedding [Mikolov et al 2013] of the keywords associated with the paintings, our algorithm suggests works of art which follow a pathway between the respective semantic meanings. Moreover, we are able to further constrain our liminal curation by requiring all art works to fit an additional overall topic chosen by a human curator, again translated to the language of Belvedere’s keyword system via word embedding. As an example see a “Gender” constraint applied to the Biedermeier room.
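To make the idea of a topic constraint more tangible, here is a minimal sketch of how such a filter could work. It is not the implementation from [Flexer 2020]: the two-dimensional keyword vectors, the artwork names, the topic vector standing in for “Gender” and the threshold of 0.8 are all invented for illustration, and cosine similarity is an assumed choice of measure.

```python
import math

# Hypothetical 2-d keyword vectors; real word embeddings have hundreds of dimensions.
KEYWORD_VECTORS = {
    "Woman": [0.9, 0.2],
    "Man": [0.8, 0.3],
    "Portrait": [0.5, 0.5],
    "Mountain": [0.1, 0.9],
}

# Hypothetical artworks, each described by a list of keywords.
ARTWORK_KEYWORDS = {
    "portrait_of_a_lady": ["Woman", "Portrait"],
    "family_portrait": ["Man", "Woman", "Portrait"],
    "alpine_view": ["Mountain"],
}

# Invented vector standing in for the embedding of the curator's topic.
TOPIC_VECTOR = [1.0, 0.1]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def artwork_vector(name):
    """Represent an artwork as the mean of its keyword vectors."""
    vectors = [KEYWORD_VECTORS[k] for k in ARTWORK_KEYWORDS[name]]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fits_topic(name, topic_vector, threshold):
    """Keep an artwork only if it is similar enough to the topic."""
    return cosine(artwork_vector(name), topic_vector) >= threshold


allowed = sorted(n for n in ARTWORK_KEYWORDS if fits_topic(n, TOPIC_VECTOR, 0.8))
print(allowed)  # ['family_portrait', 'portrait_of_a_lady']
```

Only artworks that pass this filter would then be offered as candidates for a pathway, so the human curator keeps control of the overall theme while the machine proposes the works in between.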

A conceivable outcome is a revision of the Biedermeier room achieved via a joint curation of human and machine. This, as well as other approaches towards the Belvedere collection, will be the center of further exchange between DAD and the Belvedere.

All depicted paintings in this blog post by Belvedere, Vienna, Austria (CC BY-SA 4.0).


“Der Kurator ist eine Maschine”

Austrian newspaper Kurier published an article about using machine learning to curate museum collections, mentioning our DUST AND DATA project (in German, behind paywall).

Activities Spaces

1st Liminal Space at the Machine Learning for Media Discovery Workshop

DAD’s Arthur Flexer presented our work on discovering semantic pathways through Belvedere’s fine arts collection at the “Machine Learning for Media Discovery Workshop” (18th of July 2020) of the “International Conference on Machine Learning”. The conference was supposed to happen in Vienna, Austria, but due to COVID-19 went fully virtual. You can see Arthur present his poster in a dedicated Zoom room below.

You can read about the results in our previous blog post, read the respective scientific paper and look at the poster.

While a virtual workshop is not able to replace the experience and liveliness of a physical scientific meeting, it still allowed us to get an increasing degree of public exposure for our work in progress, which is the purpose of our Liminal Spaces.

Citation: Flexer A.: Discovering X Degrees of Keyword Separation in a Fine Arts Collection, in Proceedings of the 37th International Conference on Machine Learning, Machine Learning for Media Discovery Workshop, Vienna, Austria, PMLR 108, 2020.

Activities General

Mid-term Conclusive Space in Drosendorf

The DUST AND DATA team evaluated their progress and current status in a one-week workshop at Drosendorf (Lower Austria). We also planned the second year of the project, including concrete next steps for our three Case Studies: the Glyptothek of the Academy of Fine Arts Vienna, the Volkskundemuseum Wien and the Belvedere.


3rd Critical Space on discovering semantic pathways through a fine arts collection

DAD’s Arthur Flexer gave a semi-virtual lecture on “Discovering X Degrees of Keyword Separation in a Fine Arts Collection” at the Austrian Research Institute for Artificial Intelligence (OFAI, 24.6.2020). The presented work is inspired by the project ‘X Degrees of Separation’ by ‘Google Arts and Culture’, which explores the “hidden paths through culture” by analyzing visual features of artworks to find pathways between any two artifacts through a chain of artworks. In his work, Arthur Flexer is more interested in finding pathways of the semantic meaning of works of art rather than just their visual features. Therefore he used word embedding [Mikolov et al 2013], which encodes semantic similarities between words by modelling each word’s context of neighboring words in a large training text corpus. This is used to embed keywords of Belvedere’s online fine arts collection and obtain pathways through the resulting semantic space.
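One plausible way to obtain such a pathway, sketched here under stated assumptions and not necessarily the method of the paper, is to represent each artwork by the mean of its keyword vectors, interpolate between the start and end artwork, and pick the unused artwork closest to each interpolated point. The three-dimensional vectors and artwork names below are invented toy data, and cosine similarity is an assumed measure.

```python
import math

# Toy 3-d "word embedding"; all vectors are invented for illustration.
EMBEDDING = {
    "Resurrection": [0.9, 0.1, 0.0],
    "Christ": [0.8, 0.2, 0.1],
    "Skeleton": [0.3, 0.8, 0.1],
    "Death": [0.2, 0.9, 0.2],
    "Martyrdom": [0.1, 0.9, 0.4],
    "Landscape": [0.0, 0.1, 0.9],
}

# Hypothetical artworks with their keywords.
ARTWORKS = {
    "sculpture": ["Resurrection", "Christ"],
    "relief": ["Christ"],
    "skeleton_painting": ["Death", "Skeleton"],
    "martyr_painting": ["Death", "Martyrdom"],
    "landscape_painting": ["Landscape"],
}


def mean_vector(keywords):
    """Average the embedding vectors of a list of keywords."""
    vectors = [EMBEDDING[k] for k in keywords]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pathway(start, end, steps):
    """Return start, `steps` intermediate artworks, and end."""
    a = mean_vector(ARTWORKS[start])
    b = mean_vector(ARTWORKS[end])
    path = [start]
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        # Point on the straight line between start and end in embedding space.
        target = [(1 - t) * x + t * y for x, y in zip(a, b)]
        candidates = [n for n in ARTWORKS if n not in path and n != end]
        best = max(candidates, key=lambda n: cosine(mean_vector(ARTWORKS[n]), target))
        path.append(best)
    path.append(end)
    return path


print(pathway("sculpture", "martyr_painting", 2))
# ['sculpture', 'relief', 'skeleton_painting', 'martyr_painting']
```

With the toy data, the pathway moves from ‘Resurrection’ and ‘Christ’ via ‘Christ’ and then ‘Death’ and ‘Skeleton’ towards ‘Death’ and ‘Martyrdom’, while the unrelated landscape is never chosen.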

Keywords from left to right: [‘Resurrection’, ‘Christ’], [‘Christ’], [‘Death’, ‘Skeleton’], [‘Vulture’], [‘Angel’, ‘Air’, ‘Martyrdom’, ‘Suffering’, ‘Failure’, ‘Death’, ‘Andreas’, ‘Multiple Layer Room’]. All images by Belvedere, Vienna, Austria (CC BY-SA 4.0).

The example result above starts with a sculpture with keywords ‘Resurrection’ and ‘Christ’ and ends with a painting whose keywords revolve around the topics of ‘Death’ and ‘Martyrdom’. The second artwork in the pathway is a relief showing ‘Christ’, while the third is a painting tagged with ‘Death’ and ‘Skeleton’, hence already semantically closer to the topics of ‘Martyrdom’, ‘Suffering’ and ‘Death’ of the end artwork. In fourth position is an etching with the only keyword ‘Vulture’, which is semantically close to ‘Angel’, ‘Air’ and ‘Death’ of the ending artwork.

In the ensuing discussion of results it was found remarkable how machine learning via word embedding replicates existing biases and prejudices in society. In the above query with the word “Homosexuality” the most similar word out of 22 million terms in the word embedding model is “Paedophilia”, one of the worst prejudices against homosexual people. The word embedding model has been trained on the Wikipedia and Common Crawl corpora [Mikolov et al 2018], which helps explain the replication of very common and persistent prejudice in our society.
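Technically, such a query amounts to ranking every word in the vocabulary by its similarity to the query word. The following is a minimal sketch, assuming cosine similarity as the measure and using invented three-dimensional vectors for a handful of neutral words; a real model ranks millions of terms in the same way, and whatever associations the training texts contain end up at the top of the list.

```python
import math

# Invented toy vectors; a real embedding model holds millions of terms.
VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.2, 0.05],
    "cat": [0.7, 0.3, 0.1],
    "table": [0.1, 0.9, 0.2],
    "sky": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(query, vectors, top_n):
    """Rank all other words by similarity to the query word."""
    others = [w for w in vectors if w != query]
    ranked = sorted(others, key=lambda w: cosine(vectors[query], vectors[w]), reverse=True)
    return ranked[:top_n]


print(most_similar("dog", VECTORS, 2))  # ['puppy', 'cat']
```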

OFAI’s Brigitte Krenn found it interesting how the highly regulated and almost scientific language in Belvedere’s keywords (stemming from the Iconclass project) is contrasted with everyday language through the use of word embedding. As can be seen above, the most similar keywords to “Homosexuality” are “Rape”, “Religion”, “Violence” and “Islam” (all translated from German). This is of course a direct result of the biases inherent to the word embedding model. DAD’s Alexander Martos called this phenomenon “re-socialising of arts via natural language processing” or rather “re-a-socialising” since it uncovers asocial societal tendencies and (re-?) introduces them to the world of fine arts.

Case Studies

Automatic colorization of classic statues?

Automatic colorization of black and white photographs has recently been enabled by advances in machine learning (see [Zhang et al 2016] for the methods we used for the following results). Basically deep neural networks are shown millions of black and white photographs and their color versions to learn a mapping from black and white to color. After successful training, it is possible to colorize black and white photos which the machine learning algorithm has never seen before. Online services like ALGORITHMIA enable everyone to test and use this technique by simply uploading their images.
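The training data for such a system can be produced from color photos alone, since the black and white version of each photo can be computed from it. The sketch below illustrates only this pairing step on a tiny invented image, using the common luma weights 0.299, 0.587 and 0.114 for the conversion. This is a simplification: [Zhang et al 2016] work in the Lab color space and predict the two color channels from the lightness channel, and the function names here are hypothetical.

```python
def to_gray(pixel):
    """Convert one (R, G, B) pixel to a single gray value (0-255)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def make_training_pair(color_image):
    """Return (input, target): the gray image and the original color image."""
    gray_image = [[to_gray(p) for p in row] for row in color_image]
    return gray_image, color_image


# A tiny invented 2x2 "photo": red, green, blue and white pixels.
color = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

gray, target = make_training_pair(color)
print(gray)  # [[76, 150], [29, 255]]
```

Because three color values are collapsed into one gray value, many different colors share the same gray tone. The network therefore cannot compute the original color; it has to guess a plausible one from what it has seen during training.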

Successful colorization via ALGORITHMIA of our family’s dog Ozzy

One focus for DUST AND DATA is the Glyptothek of the Academy of Fine Arts, Vienna, which is a collection of plaster casts dating back to the late 17th century. Its main task was to serve as study material for Academy students, but it also became publicly accessible as a museum. The collection contains copies of a canon of world-renowned sculptures, ranging from plaster casts of Egyptian originals to copies of Greek and Roman, medieval, renaissance and historicist statues. Thanks to German archaeologist Vinzenz Brinkmann, it is now an established fact that classic statues from e.g. Greek antiquity were originally painted in bright colors.

This color restoration shows what a statue of a Trojan archer from the Temple of Aphaia, Aegina would have originally looked like (CC BY-SA 2.5, from Wikipedia)

As for DUST AND DATA, all we are given are black and white photos of plaster casts from the Glyptothek. These are digital photo copies of analog plaster copies of statues. Is this sufficient to obtain any kind of meaningful result when trying to automatically colorize classic statues?

Plaster cast of dying warrior from the Temple of Aphaia, Aegina (Photo: Gemäldegalerie der Akademie der bildenden Künste Wien)
Automatic colorization via ALGORITHMIA

The automatic colorization of our dying warrior did not quite succeed, but it has many interesting features nevertheless. E.g. the algorithm did correctly “understand” that the statue is held by a real person, hence the colorization to skin tones of the person’s arm and dark color of the trousers. As for the statue, it is rendered in a light brown color, probably imitating the color of statues it has seen during training of the machine learning system. But what about the arm bottom left in the picture? It has almost a lifelike skin tone. And even more astonishing, the red bloodlike colorization of the amputated arm stump!

The Glyptothek also has a copy of Michelangelo’s David, or at least its head. The colorization above does provide a skinlike pink tone for the face and even blond hair.

Applying the colorization to the full David statue gives a lifelike pinkish skin tone, at least much more so than for the dying warrior above. The fact that Michelangelo’s David is such a realistic sculpture probably made it possible to trick the algorithm into treating the photo of David as the photo of a real naked person.

So can we use machine learning to automatically colorize photos of plaster casts from the Glyptothek? This would require training an algorithm with thousands of color restorations of antique statues (see photo of the Trojan archer above) to have any chance of success. But applying state-of-the-art colorization algorithms already provides interesting results by exposing some of the biases and failures of their machine learning machinery.