Austrian newspaper “Der Standard” published an article about DAD (in German), focusing on our problems with algorithmic bias and our critical stance towards digital curation in general.
DAD’s Arthur Flexer presented our work on analysing the semantic meaning of works of art at the international online conference “The Art Museum in the Digital Age” of the Belvedere Research Center. This conference is concerned with the digital transformation of art museums, which seems even more relevant lately because of COVID-19-related lockdowns and closures.
Arthur presented our (somewhat radical) approach of analysing text about artworks rather than taking the usual route of analysing images of the artworks. We chose this semantics-driven approach because a lot of information about an artwork cannot be found in the artwork itself. Think, for example, of subjecting the “Mona Lisa” to an automatic visual analysis. Computational results will tell you that it is a picture of a young woman in front of a landscape who (if your algorithm is really good) is sort of smiling. This information of course totally misses the significance of the painting for (Western) art history, its immense relevance and the many connotations it has. All of this is rather a societal construct and the result of centuries of discourse and reception history (for more on this see our previous blogpost). Our semantics-driven approach towards the collection of the Belvedere enables us to discover X degrees of keyword separation between works of art.
This is achieved by using the technique of word embedding [Mikolov et al 2013], which encodes semantic similarities between words by modelling the contexts in which they appear alongside neighboring words in a large training text corpus. We used this technique to embed the keywords of Belvedere’s online fine arts collection and to obtain pathways through the resulting semantic space.
The above result starts with a painting having the keywords ‘Clouds’, ‘Mountain’, ‘Meadow’, from which we move to one tagged ‘Mountain’, ‘Lake’, ‘Alps’ and ‘Austria’, next to a painting tagged ‘Fog’, then to one with ‘Rocky coast’ and finally to one with ‘Clouds’, ‘Rocky coast’, ‘Sea’. Our pathway therefore smoothly transitions from a mountain setting to a lake in the mountains to the sea.
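To make the idea of a pathway through keyword space more tangible, here is a minimal sketch. It is not our actual implementation: the two-dimensional keyword vectors are invented by hand (real word embeddings are trained on a large corpus and have hundreds of dimensions), the painting names are placeholders, and the greedy "always step to the most similar unvisited painting" rule is just one simple, assumed way of building a pathway.

```python
import math

# Hypothetical 2-D keyword vectors, invented for illustration only.
# A real system would use vectors trained on a large text corpus.
KEYWORD_VECTORS = {
    "clouds": (0.5, 0.9),
    "mountain": (0.0, 1.0),
    "meadow": (0.1, 0.8),
    "lake": (0.6, 0.5),
    "alps": (0.05, 0.95),
    "austria": (0.3, 0.7),
    "fog": (0.5, 0.5),
    "rocky coast": (0.7, 0.5),
    "sea": (1.0, 0.0),
}

# Placeholder paintings, deliberately listed in scrambled order.
ARTWORKS = {
    "painting_e": ["clouds", "rocky coast", "sea"],
    "painting_c": ["fog"],
    "painting_a": ["clouds", "mountain", "meadow"],
    "painting_d": ["rocky coast"],
    "painting_b": ["mountain", "lake", "alps", "austria"],
}


def embed(keywords):
    """Represent an artwork as the average of its keyword vectors."""
    vectors = [KEYWORD_VECTORS[k] for k in keywords]
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def greedy_pathway(artworks, start):
    """From `start`, repeatedly step to the most similar unvisited artwork."""
    vectors = {name: embed(kws) for name, kws in artworks.items()}
    path = [start]
    remaining = set(vectors) - {start}
    while remaining:
        current = vectors[path[-1]]
        nxt = max(remaining, key=lambda name: cosine(current, vectors[name]))
        path.append(nxt)
        remaining.remove(nxt)
    return path


print(greedy_pathway(ARTWORKS, "painting_a"))
```

With these toy vectors the walk moves from the mountain painting via the lake and the fog to the coast and the sea, mirroring the kind of smooth transition described above.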
We also presented one very concrete solution for a room in Belvedere’s permanent exhibition. It is a room about “Viennese portraiture in the Biedermeier period”, assembling the “greatest portrait painters” from this period. In the above picture you can see four blue frames which indicate empty slots that we would like to fill using our algorithm, with the respective neighboring artworks as input.
The keywords for these neighboring artworks, however, are purely descriptive, e.g. ‘headgear’, ‘necklace’, ‘bonnet’, ‘eye contact’, and probably do not do full justice to the semantic content of the artworks. We believe that one underlying topic of the Biedermeier room is ‘gender’, with all but one painting depicting females. We therefore add an additional algorithmic constraint by requiring every suggested artwork both to be part of a pathway and to have a ‘gender’-related keyword. Since ‘gender’ is not a keyword in the Belvedere taxonomy, we use word embedding to obtain Belvedere keywords with high similarity to the topic of ‘gender’. This translation step yields keywords like ‘femaleness’, ‘religion’, ‘islam’, ‘equality’, ‘motherhood’ or ‘headscarf’. It is obvious that these keywords point to a stereotypical discourse of gender, quickly derailing towards topics of religion and a compulsion to wear headscarves, or towards women being predominantly seen in their role as mothers.
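The translation step from a topic word to taxonomy keywords can be sketched as a nearest-neighbor search in the embedding space. The three-dimensional vectors below are made up purely for illustration, so the resulting ranking is not one of our results, and `nearest_keywords` is a hypothetical helper name.

```python
import math

# Hypothetical 3-D vectors, invented for illustration only.
# The topic word is not part of the taxonomy itself.
TOPIC_VECTOR = (1.0, 0.2, 0.0)  # stands in for the vector of 'gender'

TAXONOMY_VECTORS = {
    "femaleness": (0.9, 0.3, 0.0),
    "motherhood": (0.8, 0.5, 0.1),
    "headscarf": (0.7, 0.6, 0.2),
    "necklace": (0.3, 0.9, 0.1),
    "mountain": (0.0, 0.1, 1.0),
    "sea": (0.1, 0.0, 0.9),
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_keywords(topic_vector, taxonomy, k):
    """Return the k taxonomy keywords most similar to the topic vector."""
    ranked = sorted(
        taxonomy,
        key=lambda kw: cosine(topic_vector, taxonomy[kw]),
        reverse=True,
    )
    return ranked[:k]


print(nearest_keywords(TOPIC_VECTOR, TAXONOMY_VECTORS, k=3))
```

Whatever ends up near the topic word depends entirely on the text the embedding was trained on, which is exactly how stereotypes in everyday language find their way into the keyword list.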
This is also why we termed the use of word embedding in this context world embedding: it confronts the very rigid taxonomy of the Belvedere keywords (based on Iconclass, a classification system for cultural content) with everyday language as represented in the textual training data of the word embedding. It thereby recontextualizes or even “resocializes” taxonomic art histories via natural language processing since it uncovers biases and prejudice in our use of language and (re-?) introduces them to the world of fine arts.
The above picture shows three paintings from the Biedermeier room plus four additional paintings (with red frames) which our algorithm suggests. The second painting from the left is suggested because its keyword ‘femaleness’ is a gender keyword and its keyword ‘necklace’ makes it similar to the keywords of the first painting (‘earrings’, ‘pearl necklace’) and of the one in the middle (‘brooch’, ‘bracelet’). The fifth painting from the left is suggested because ‘headscarf’ is a gender keyword and ‘eye contact’ and ‘earring’ make it similar to both the painting in the middle (‘brooch’, ‘bracelet’, ‘eye contact’) and the painting on the far right (‘eye contact’, ‘bonnet’).
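The combination of the two requirements (a gender keyword plus similarity to both neighboring paintings) can be sketched as a simple filter. Again, this is only an illustration under assumptions: the vectors, the candidate names, the `suggest` helper and the similarity threshold of 0.9 are all invented, and similarity between two paintings is taken here as the best match between any pair of their keywords.

```python
import math

# Hypothetical 3-D keyword vectors, invented for illustration only.
VECTORS = {
    "earrings": (0.9, 0.1, 0.0),
    "pearl necklace": (1.0, 0.1, 0.0),
    "necklace": (1.0, 0.0, 0.0),
    "brooch": (0.8, 0.2, 0.0),
    "bracelet": (0.9, 0.2, 0.0),
    "femaleness": (0.1, 1.0, 0.0),
    "tree": (0.0, 0.0, 1.0),
    "meadow": (0.0, 0.1, 1.0),
}

# Taxonomy keywords assumed to have been found similar to the topic 'gender'.
GENDER_KEYWORDS = {"femaleness", "headscarf"}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(keywords_a, keywords_b):
    """Highest similarity between any keyword of one artwork and any of the other."""
    return max(
        cosine(VECTORS[a], VECTORS[b]) for a in keywords_a for b in keywords_b
    )


def suggest(candidates, left, right, threshold=0.9):
    """Keep candidates with a gender keyword that are similar to both neighbors."""
    chosen = []
    for name, keywords in candidates.items():
        if not GENDER_KEYWORDS & set(keywords):
            continue  # fails the 'gender' constraint
        if (best_match(keywords, left) >= threshold
                and best_match(keywords, right) >= threshold):
            chosen.append(name)
    return sorted(chosen)


candidates = {
    "candidate_a": ["femaleness", "necklace"],  # gender keyword and jewellery
    "candidate_b": ["necklace", "tree"],        # jewellery, but no gender keyword
    "candidate_c": ["femaleness", "meadow"],    # gender keyword, but unrelated
}
left_neighbor = ["earrings", "pearl necklace"]
right_neighbor = ["brooch", "bracelet"]

print(suggest(candidates, left_neighbor, right_neighbor))
```

In this toy setting only the first candidate survives both constraints, for the same kind of reason as the second painting from the left in the picture.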
In the ensuing discussion with the conference’s audience, Arthur Flexer argued that our semantic approach is more helpful for building a curatorial narrative than a purely aesthetic procedure. It allows us to answer the question of curatorial gaps between artworks shown in an exhibition: what works of art exist in the holdings of the museum that fit the curatorial narrative but did not succeed in becoming part of the exhibition?
He also tried to make clear that by using a machine learning tool such as word embedding, curating becomes a joint endeavor of man and machine, where curatorial decisions have to be formulated as input and constraints to the algorithm. But even a simple curatorial Google search is already an interaction of man and machine, with algorithms (opaque to the curators) nevertheless shaping their curatorial enterprise to a certain extent by showing only specific selections of information. It was also discussed that such a man/machine approach is able to uncover algorithmic biases in the methods used, e.g. stereotypical representations of societal discourse in word embedding.
Looking towards future extensions of our work, we could of course analyse longer (art-historical) texts about artworks with the same methodology, thereby gaining much richer semantic context than by relying on simple keywords only. Another possible extension is to embed semantic and visual information simultaneously, which could yield curatorial solutions that respect semantic and visual constraints at the same time [Frome et al 2013].
Automatic colorization of black and white photographs has recently been enabled by advances in machine learning (see [Zhang et al 2016] for the methods we used for the following results). Basically deep neural networks are shown millions of black and white photographs and their color versions to learn a mapping from black and white to color. After successful training, it is possible to colorize black and white photos which the machine learning algorithm has never seen before. Online services like ALGORITHMIA enable everyone to test and use this technique by simply uploading their images.
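The basic principle of learning a mapping from black and white to color can be shown with a deliberately tiny stand-in for a deep network. The sketch below is not the method of [Zhang et al 2016]: it merely memorizes the average color seen for each gray value during training and reuses the color of the nearest known gray value later on. The training colors are invented, and the gray conversion uses the common luma weights 0.299, 0.587 and 0.114.

```python
from collections import defaultdict


def to_gray(rgb):
    """Convert an RGB pixel to a gray value (0..255) using common luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def train(color_pixels):
    """Learn, for every gray value seen in training, the average original color."""
    sums = defaultdict(lambda: [0, 0, 0, 0])  # r, g, b totals and a pixel count
    for rgb in color_pixels:
        entry = sums[to_gray(rgb)]
        for i in range(3):
            entry[i] += rgb[i]
        entry[3] += 1
    return {
        gray: tuple(round(entry[i] / entry[3]) for i in range(3))
        for gray, entry in sums.items()
    }


def colorize(gray, model):
    """Colorize a gray value with the color of the nearest gray value seen in training."""
    nearest = min(model, key=lambda known: abs(known - gray))
    return model[nearest]


# Invented training data: this toy model has only ever seen two skin tones.
training_pixels = [(224, 172, 105), (198, 134, 66)]
model = train(training_pixels)

# A bright gray pixel, as it might come from a photo of a white plaster cast.
print(colorize(200, model))
```

Because the toy model has only ever seen skin tones, it paints the bright plaster pixel in a skin tone as well. A real colorization network is vastly more sophisticated, but it too can only reproduce colors that are plausible given its training data.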
One focus for DUST AND DATA is the Glyptothek of the Academy of Fine Arts, Vienna, which is a collection of plaster casts dating back to the late 17th century. Its main task was to serve as study material for Academy students, but it also became publicly accessible as a museum. The collection contains copies of a canon of world-renowned sculptures, ranging from plaster casts of Egyptian originals to copies of Greek and Roman, medieval, Renaissance and historicist statues. Thanks to German archaeologist Vinzenz Brinkmann, it is now an established fact that classic statues from e.g. Greek antiquity were originally painted in bright colors.
As for DUST AND DATA, all we are given are black and white photos of plaster casts from the Glyptothek. These are digital photo copies of analog plaster copies of statues. Is this sufficient to obtain any kind of meaningful result when trying to automatically colorize classic statues?
The automatic colorization of our dying warrior did not quite succeed, but it has many interesting features nevertheless. For example, the algorithm did correctly “understand” that the statue is held by a real person, hence the skin tones of the person’s arm and the dark color of the trousers. As for the statue, it is rendered in a light brown color, probably imitating the color of statues the machine learning system has seen during training. But what about the arm at the bottom left of the picture? It has an almost lifelike skin tone. And even more astonishing is the red, bloodlike colorization of the amputated arm stump!
The Glyptothek also has a copy of Michelangelo’s David, or at least its head. The colorization above does provide a skinlike pink tone for the face and even blond hair.
Applying the colorization to the full David statue gives a lifelike pinkish skin tone, at least much more so than for the dying warrior above. The fact that Michelangelo’s David is such a realistic sculpture probably made it possible to trick the algorithm into treating the photo of David as the photo of a real naked person.
So can we use machine learning to automatically colorize photos of plaster casts from the Glyptothek? To have any chance of success, this would require training an algorithm with thousands of color restorations of antique statues (see photo of the Trojan archer above). But applying state-of-the-art colorization algorithms already provides interesting results by exposing some of the biases and failures of their machine learning machinery.