Case Studies

On a blind spot in distant reading

The availability of vast amounts of text in digital form has enabled the humanities to add ‘distant reading’ of thousands of books by algorithms to their repertoire of research methods. This ‘distant reading’ of course has to be complemented with traditional ‘close reading’ of individual books. See e.g. [Jänicke et al 2015] for a survey and discussion of challenges.

We are currently working on applying machine learning as a distant reading tool to the journal Österreichische Zeitschrift für Volkskunde (OEZV) of the Austrian Museum of Folk Life and Folk Art. The OEZV has been published almost continuously since 1895 and we are able to use the result of an ‘Optical Character Recognition’ (OCR) scan of the entire publication history for our analysis. One of our interests is how the discourse and corresponding topics have changed over the years of its publication.

We applied topic modelling via Latent Dirichlet Allocation (LDA) for this analysis. Topic modelling tries to uncover latent topics in large amounts of text. Its basic entities are documents, which are modelled as probability distributions over word occurrences. The assumption is that documents contain text about different topics, which manifests itself in the usage of different words. These latent topics are in turn modelled as probability distributions over word occurrences, since different topics will use different words related to their content. LDA then finds a probability distribution over topics for every document, trying to optimize the separation of documents over topics. The assumption here is that different documents contain information about rather different (and few) topics, but of course overlaps are allowed.
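The two kinds of distributions described above can be illustrated with a toy example. This is only a sketch of the modelling assumption, not of the LDA fitting procedure itself; the topic names, the words chosen and all probabilities are invented for illustration.

```python
# Toy illustration of the LDA view of a document (all numbers are invented).
# Each topic is a probability distribution over words.
topics = {
    "religion": {"apostel": 0.5, "gründonnerstag": 0.4, "tänzer": 0.1},
    "dance": {"apostel": 0.1, "gründonnerstag": 0.1, "tänzer": 0.8},
}

# Each document is a probability distribution over topics.
doc_topics = {"religion": 0.75, "dance": 0.25}


def word_distribution(doc_topics, topics):
    """Mix the topic-word distributions, weighted by the document's topic shares."""
    mixed = {}
    for topic, share in doc_topics.items():
        for word, p in topics[topic].items():
            mixed[word] = mixed.get(word, 0.0) + share * p
    return mixed


# Expected word distribution of the document under the model.
doc_words = word_distribution(doc_topics, topics)
```

Fitting LDA works in the opposite direction: only the word counts of the documents are observed, and both the topic-word and the document-topic distributions are estimated from them.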

Usually a document is one article in a journal or newspaper, but for OEZV we have no access to article boundaries, so our basic entities are individual pages (more than 34000 for all years of OEZV together). Since we are interested in topic evolution over time, we aggregated all pages of a year into one overall document per year. We modelled all OEZV volumes as distributions over 30 topics and used the LDA visualization package pyLDAvis to inspect the result.
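The aggregation of pages into yearly documents could look roughly like the following sketch. The input format (one pair of year and OCR text per page), the whitespace tokenization and the function name `aggregate_by_year` are assumptions made for illustration; the actual preprocessing is not described here.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (year, OCR text) pair per scanned page.
pages = [
    (1895, "volkskunde brauch tanz"),
    (1895, "brauch sage"),
    (1896, "tanz tanz lied"),
]


def aggregate_by_year(pages):
    """Merge the word counts of all pages of a year into one bag of words."""
    yearly = defaultdict(Counter)
    for year, text in pages:
        # Naive tokenization: lower-case and split on whitespace.
        yearly[year].update(text.lower().split())
    return dict(yearly)


yearly_counts = aggregate_by_year(pages)
```

Each entry of `yearly_counts` is then one "document" in the sense of the topic model.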

In this visualization, every topic is represented as a sphere in the left part of the figure. After clicking on one of the spheres (no. 17 in our picture), the right part of the figure shows the words which are most prevalent for this topic. Some of the words have a religious meaning, like ‘apostel’, ‘christusfigur’, ‘passionsspiele’ or ‘gründonnerstag’. Others are more about dancing, like ‘tänzer’, ‘bandltanz’ or ‘getanzt’, which might lead to the conclusion that this topic is about religion and certain folkloristic rituals around it. It has to be said that such topics are often hard to interpret, and 30 topics may be too coarse a resolution for the entire OEZV collection.

Every year of OEZV is now a distribution over 30 topics, i.e. a vector of 30 probabilities. We can use this representation to compute how similar annual volumes of OEZV are in their distribution across topics. In the figure above both axes are the years of publication, from 1895 in the bottom left corner to 2018 bottom right and top left. Each coloured cell in this figure represents the similarity of the two corresponding years (inverted distance between probability vectors), with dark blue being very similar, bright yellow not similar and green in-between. Therefore the main diagonal is dark blue, since OEZV from one year is of course very similar to itself. The most interesting patterns are the larger dark blue rectangles along the main diagonal, indicating a number of consecutive years which are all similar to each other. This is most evident for the years 1940 to 1944, which are highlighted with a red ring. This time span coincides with Austria being part of Nazi Germany’s Third Reich, so one might expect the discourse in OEZV to be very different from all other years of its publication.
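A year-by-year similarity matrix of this kind can be sketched as follows. The distance measure and the way it is inverted are not specified above, so Euclidean distance and the mapping 1 / (1 + distance) are assumptions; the topic vectors use 3 instead of 30 topics and are invented.

```python
import math

# Hypothetical topic distributions per year (3 topics instead of 30, invented numbers).
year_topics = {
    1939: [0.6, 0.3, 0.1],
    1940: [0.1, 0.1, 0.8],
    1941: [0.1, 0.2, 0.7],
}


def similarity(p, q):
    """Inverted Euclidean distance: 1.0 for identical vectors, smaller otherwise."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return 1.0 / (1.0 + distance)


years = sorted(year_topics)
# matrix[i][j] holds the similarity between years[i] and years[j].
matrix = [[similarity(year_topics[a], year_topics[b]) for b in years] for a in years]
```

In this toy data, 1940 and 1941 are more similar to each other than either is to 1939, which is the kind of block along the diagonal described above.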

However, looking directly at the results from the OCR scan for the years 1940 to 1944, we realize that all of it is some kind of pseudo-German nonsense language, e.g.: “Vrautbater fteßte eine Dteipe bon Dtätfelfragen, auf bie ber Vrautfüprer alg Veboßmäiptigter beg Vräutigamg bie paffenbe Slnimori finbett mufgte”. To understand what happened here we look directly at the PDF files of the OEZV, showing one page for the year 1944 below. As you can see, a special typeface called ‘Frakturschrift’ is being used, which was typical for Nazi Germany.

Compare this to a page from year 1895, where a more common typeface has been used, as in all years except 1940 to 1944.

Apparently the OCR scan failed miserably on the ‘Frakturschrift’ typeface, with the result that the years 1940 to 1944 use their “own” language, a kind of gibberish German. This of course has a very harmful impact on our machine learning approach, since the years 1940 to 1944 use completely different (nonsensical) words than all other years. As a result these years end up with distributions across words and topics that are very different from all other years, while resembling each other. Hence the high similarity between the years 1940 to 1944 turns out not to be a meaningful result after all, but an artefact of the processing pipeline, with the OCR mistake propagating through the whole system.
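One simple way to make such an artefact visible before topic modelling is to measure how much of a page's vocabulary also occurs elsewhere in the corpus. The following sketch is a hypothetical check and not part of the pipeline described here; the reference vocabulary is invented, the clean example sentence is made up, and the garbled tokens are taken from the OCR sample quoted above.

```python
def vocabulary_overlap(tokens, reference_vocabulary):
    """Fraction of distinct tokens that also occur in the reference vocabulary."""
    vocab = set(tokens)
    if not vocab:
        return 0.0
    return len(vocab & reference_vocabulary) / len(vocab)


# Invented reference vocabulary standing in for the words seen in all other years.
reference = {"eine", "reihe", "von", "auf", "die", "der", "des"}

# A made-up clean sentence and the start of the garbled OCR sample, lower-cased.
clean_page = "eine reihe von fragen auf die der".split()
garbled_page = "fteßte eine dteipe bon dtätfelfragen auf bie ber".split()

clean_score = vocabulary_overlap(clean_page, reference)
garbled_score = vocabulary_overlap(garbled_page, reference)
```

Years whose pages score far below the rest would be candidates for a closer look at the scans before any interpretation of the topic model.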

Nevertheless we find this result interesting, because the time when Austria was part of the Third Reich has always been a sort of blind spot for Austria’s society, taking decades to accept its own disreputable role in the horrific events during this historic time span, slowly emerging from “Hitler’s first victim” to perpetrator and culprit. It is therefore quite ironic that this blind spot reappears via distant reading of Austria’s main scientific journal on Austrian folk life and folk art …


Dusty’s Dada: “Startcleaning”

Living-room experiments behind the viral firewall

COVID-19 is forcing “Dust and Data”, too, to keep its distance and to change some of its working methods. For their research on new, AI-based exhibition displays, Sanja and Irina have moved their passion for model building into the living room and set up a gallery course there for an intelligent flatmate: their robot vacuum cleaner “Dusty”. True to its servile credo “Start cleaning”, it immediately began to survey the dusty art space autonomously. Whether its sculptural training set gives rise not only to a dust-gathering experience but also to an “aesthetic experience”, from whose movement patterns in the museum space we could learn, is the subject of further observation.

Dusty’s Dada: the Making-Of

A drone shot of the “museum”
Dusty in its full glory
Dusty fast forward

“GO CURATOR” mock-up presented during 2nd Critical Space

During our 2nd Critical Space with guest expert Bob Sturm (KTH Royal Institute of Technology, Stockholm, Sweden) we presented and discussed a first mock-up of our “GO CURATOR” idea.

Here you can see DAD team members setting up the physical model.

GO CURATOR analyses text describing paintings in museum collections. Topic modelling is used to represent the semantic content in these texts, thereby targeting the semantic meaning of the paintings themselves.

The result is a probability distribution across topics for every painting. E.g. in the painting below, the topics “food”, “act”, “human” and “object” are each present to an equal extent of 25%.

Curators or museum visitors can change the exhibition interactively by adjusting which topics should be present to what extent. GO CURATOR then automatically adjusts the choice of paintings and their exact hanging in the museum room.
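How a choice of paintings could follow from a requested topic mix can be sketched as a simple ranking. This is a hypothetical illustration, not the GO CURATOR implementation: the painting names, their topic distributions, the function `rank_paintings` and the use of the L1 distance are all assumptions.

```python
# Hypothetical topic distributions for three paintings (invented names and numbers).
paintings = {
    "still life": {"food": 0.7, "act": 0.0, "human": 0.0, "object": 0.3},
    "banquet": {"food": 0.25, "act": 0.25, "human": 0.25, "object": 0.25},
    "portrait": {"food": 0.0, "act": 0.1, "human": 0.8, "object": 0.1},
}


def rank_paintings(paintings, target):
    """Sort painting names by L1 distance between their topic mix and the target."""

    def distance(mix):
        all_topics = set(mix) | set(target)
        return sum(abs(mix.get(t, 0.0) - target.get(t, 0.0)) for t in all_topics)

    return sorted(paintings, key=lambda name: distance(paintings[name]))


# Topic mix requested by a curator or visitor: all four topics to an equal extent.
target = {"food": 0.25, "act": 0.25, "human": 0.25, "object": 0.25}
ranking = rank_paintings(paintings, target)
```

The paintings at the front of `ranking` match the requested mix best; how many of them are shown and where they hang would be a separate step.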


On its bright days it is a magnificent greenhouse

A conversation with Alexander Martos and Niko Wahl about their projects at the Volkskundemuseum Wien and about collecting and curating the future.

Instagram (short version):


On its bright days it is a magnificent greenhouse

A conversation with Alexander Martos and Niko Wahl about their projects at the Volkskundemuseum Wien and about collecting and curating the future. (Full length on Facebook.)

Your first collaboration was within Museum auf der Flucht. With a fellowship programme you brought highly qualified asylum seekers to the museum and laid the foundation for an intensive engagement with the themes of flight, migration and arrival in the museum’s research, collecting, exhibition and outreach activities. How did this come about?

Niko Wahl: I had an exhibition proposal on the topic. At the same time there was already an idea between Matthias Beitl and Alexander Martos, as well as a project application for Museum auf der Flucht. But I had imagined it quite differently. When you talk about this house and how it is used, that is perhaps a guiding principle. Your own world of ideas usually meets something completely different here.

In what way?

Niko Wahl: The openness of the house makes it possible for things to grow and for people to develop. At the same time the museum gives up control, in a positive sense, in order to create a space of possibility. On its dark days it is laissez-faire, but on its bright days it is a magnificent greenhouse. And there are more bright days than dark ones.

How does that show, for example?

Niko Wahl: At Museum auf der Flucht, much was not fully defined at the beginning. The fellows had room to breathe, could develop and grow into this house. They made contacts, and mutual learning emerged. The house went along this path step by step and, whenever something needed to keep growing, created the necessary space.

Read on at Facebook or online in our association newsletter 😘 #alexandermartos #nikowahl #museumaufderflucht #kuratieren #curating #sammlung #volkskundemuseumwien

A post shared by Volkskundemuseum | since 1895 (@volkskundemuseumwien)

Facebook (full version):

“On its bright days it is a magnificent greenhouse” Those who collect make themselves vulnerable. In full length: Das…

Posted by Volkskundemuseum Wien on Saturday, 29 February 2020

2nd Critical Space with Bob Sturm on frontiers of artificial creativity

During the research visit of Bob Sturm we will discuss the frontiers of artificial creativity and its criticism in the context of DUST AND DATA. Bob Sturm will also give a public lecture about his work on using machine learning to compose Irish folk music. His talk will also feature live accordion playing.

“Folk the Algorithms” – Bob Sturm, KTH Royal Institute of Technology, Stockholm, Sweden

In this talk/musical performance, I will recount how a bit of Saturday morning humor turned into an ERC Consolidator Grant four years later. It’s a story of an engineer with an artistic bent meeting a machine learning algorithm through a blog. One part of the story involves the naive misappropriation of music data without consideration of its provenance and significance. Another part involves the serious contemplation of such transgressions, and then endeavors taken to redress them. A variety of interesting perspectives and questions have arisen out of this story, which will be subject to study in the project, Music at the Frontiers of Artificial Creativity and Criticism (MUSAiC, ERC-2019-COG No. 864189).

Time: Wednesday, 26th of February 2020, 6:30 p.m. sharp

Location: Oesterreichisches Forschungsinstitut fuer Artificial Intelligence, OFAI
Freyung 6, Stiege 6, Tuer 7, 1010 Wien


DAD at the Austrian Museumsbund’s “Digital Fair” in the Austrian Museum of Folk Art and Folk Culture (Apr 22-24)

DAD will present preliminary results and critical aspects of its case study project based at the Austrian Museum of Folk Art and Folk Culture (Volkskundemuseum Wien) at the “Digitaler Jahrmarkt”, organised by the Museumsbund, which will take place between Apr 22 and 24. On Thursday at 12 p.m. we will talk about “Big Data vs. Thick Data? Vom Nutzen und Nachteil der künstlichen Intelligenz für das Kuratieren an kulturwissenschaftlichen Museen” (“On the use and disadvantage of artificial intelligence for curating at museums of cultural studies”).

Further information and download of the program:


“Sound Art and Curating – Machine Learning and Limits of Control”

DAD’s AI expert Arthur Flexer (OFAI, Intelligent Music Processing and Machine Learning Group) in discussion with Thomas Grill (Sound Artist & Researcher, Machine Learning | ELAK) at the conference “AIL x SOUNDFRAME – Navigieren im Postdigitalen. Wie Kunst und Wissenschaft unsere Zukunft gestalten” (“Navigating the Postdigital. How Art and Science Shape Our Future”; 15-17 January 2020, Angewandte Innovation Lab)

Where do you want to go? Enter starting point. Enter destination. Get directions.
A simple equation. But what if you have no answer to these simple questions because the parameters are undefined? Where am I actually standing? Where do I want to go?

The postdigital refuses linear thinking. It is wild, networked, associative, full of dead ends. At the same time it allows categories and priorities to be explored anew and thereby creates space. Together with artists and scientists we discuss what insights the interdisciplinary exchange between art and science yields when it comes to finding possible parameters for shaping our society.


15 January 2020, 6 p.m.
AIL, Franz-Josefs-Kai 3, 1010 Wien

Further information:

Review of the conference by Eva Fischer (SOUND:FRAME)

International Conference “The Art Museum in the Digital Age”

The “DAD” team visited the high-level international conference “The Art Museum in the Digital Age”. Here Eva Fischer from sound:frame is presenting current projects on AI and collections, also mentioning and introducing DAD.


1st Critical Space @OFAI

DAD’s first “Critical Space” took place at OFAI on 16th of December 2019. The full-day event included the complete DAD team and Fritz von Sunderhaar.


Kick off meeting Drosendorf

The DUST AND DATA team started the project with an intensive one-week workshop at Drosendorf (Lower Austria).