The Potential for AI in Audiovisual Archives

Artificial Intelligence has enormous potential to expand the reach of audiovisual archives. This blog post offers examples of how AI is being used for search and discovery, creative re-use, education, audience engagement, and Collections as Data.

10 May 2021

Randi Cecchine

Student, Preservation and Presentation of the Moving Image

Photo: The Grateful Dead in the BBC Archive

Automatic description opens up the archive for search and discovery

In an interview, Jake Berger from the BBC Archive, which safeguards and preserves over 15 million assets from the British Broadcasting Corporation, explained the organization’s interest in using machine learning:

The BBC Archive’s editorial mission is to open up the archive to as many people as possible in as many ways as possible and try to optimize how people can find what they’re looking for, or how they can be introduced to things they didn’t know [they were] looking for but would enjoy.

That’s from our perspective where the greatest opportunity for machine learning comes in, so the main thing that our team is interested in is automated enrichment of metadata in archive content.

Berger further explained that the BBC Archive is working to create transcripts of the millions of digitized radio and video files in its collection. Most programs broadcast since the 1980s have human-generated subtitle files that can be imported into the archive’s database. For the rest, the BBC is using a customized version of Kaldi, an open-source automatic speech recognition (ASR) toolkit, trained on BBC archive material. Berger shared an example of a TV show that included what he believes to be the very first televised appearance of the band The Grateful Dead. Before the ASR technology was run, no one could have known that this material existed in the archive, as it had never been noted by a human cataloger.

This small example is a simple illustration of how transformational using AI technology across a large audiovisual collection can be. With the ability to annotate audiovisual materials at scale, AI technologies help audiovisual archives create more detailed descriptive information about their materials than is feasible with human annotation, thus opening up new potential for access and re-use. Staff from other archives expressed hope that these technologies will not only make collections more easily searchable for audiences and for producers re-using footage, but that they will also increase accessibility for people with disabilities and across languages.
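To make the Grateful Dead example more concrete, here is a minimal sketch of how timestamped ASR transcripts could be made searchable. It is not the BBC's system: the program IDs, transcript text, and function names are all hypothetical, and a real archive would use a full search engine rather than an in-memory index.

```python
import re
from collections import defaultdict

# Hypothetical ASR output: (program_id, start_seconds, transcribed_text).
SEGMENTS = [
    ("prog-001", 12.0, "and now a band from San Francisco the Grateful Dead"),
    ("prog-001", 95.5, "thank you and good night"),
    ("prog-002", 3.2, "the weather tomorrow will be foggy"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments):
    """Map each word to the set of segment positions that contain it."""
    index = defaultdict(set)
    for position, (_, _, text) in enumerate(segments):
        for token in tokenize(text):
            index[token].add(position)
    return index


def search(index, segments, query):
    """Return (program_id, start_seconds) for segments containing every query word."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(token, set()) for token in tokens))
    return [(segments[i][0], segments[i][1]) for i in sorted(hits)]


INDEX = build_index(SEGMENTS)
print(search(INDEX, SEGMENTS, "Grateful Dead"))
```

The sketch matches segments that contain all of the query words, not exact phrases. Because each hit carries a timestamp, a researcher could jump straight to the moment in the program where the words were spoken.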

In addition to ASR for transcription, archives are exploring a variety of technologies to analyze and describe audio, image, and language, plus the layering of tools to analyze those descriptions. The range of tools will be described in further detail in the next blog post.

Image: Citizen DJ

Creative Re-Use

Some archives are partnering with artists to explore the creative potential of using AI tools. One example is the collaboration between the (United States) Library of Congress and Brian Foo on Citizen DJ, a platform for users to create Hip Hop music from the Library’s audio and video collections that have been annotated by AI. Developed as part of the Library of Congress’s Innovator In Residence Program, Citizen DJ is a unique approach to opening the archive to internet users to explore and recontextualize music, film, and dialect collections. It also educates users about the complexities of re-use through a comprehensive copyright and ethics guide, raising important questions about attribution, compensation, and historical and cultural contexts.

Photo: Jan Bot installation at the Eye Collection Centre

Another novel approach to creative re-use is Jan Bot, a fully automated website and installation that uses daily trending news topics to create short films from clips in the collection of the Eye Filmmuseum in Amsterdam. The Jan Bot installation sits in the hallway of the Eye Collection Centre, as a reminder to staff and visitors of the ongoing engagement between AI and the Collection.

Education and Learning Curriculum

PBS Learning brings programming from the (United States) Public Broadcasting Service to educators and students, and connects it to curriculum used in classrooms and at home. A cross-departmental group is working on a pilot project that blends together different types of machine learning – the group refers to it as “smart stacking.” They are working with the machine-learning company GrayMeta to analyze the beloved children’s TV show Sesame Street. GrayMeta works with clients to train algorithms with their own content as training data, so PBS Learning is using facial recognition algorithms not only on the people in the show, but also on the “faces” of the animated and puppet monster characters. In addition, the team is working to build models for automatic speech detection with children’s voices, something that standard tools haven’t done very well.

With the metadata generated from this analysis, PBS Learning is working with the Pool Party semantic web company to assign clips to taxonomies that are correlated with school curricula. These clips can eventually be made available to teachers for use with concepts such as geometry or emotional development, and can also be delivered directly to students in individualized user environments.

In an interview, Athina Livanos-Propst, Editorial Services Manager & Digital Librarian at PBS Learning Media, talked about the project and the potential this “smart stacking” has for shaping audience experience in the future. For example, in working with GrayMeta to create focused categories for analysis, called “Insight groups,” algorithms have been trained to search for unique characteristics such as “fruit” or “feelings” or “weather,” and this information can be matched to audience preferences. She shared a hypothetical scenario in which this type of process could be used to identify that a viewer not only enjoyed British murder mysteries, but that they also preferred ones that included a lot of fog. Eventually, this type of information could be used to provide viewers with individually tailored experiences.
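The hypothetical scenario above can be sketched in a few lines of code. This is only an illustration of the general idea of matching machine-generated tags to viewer preferences, not a description of how PBS Learning or GrayMeta implement it. The clip IDs, tags, confidence values, and preference weights are all invented for the example.

```python
# Hypothetical clip metadata: tag -> confidence reported by the analysis tools.
CLIPS = {
    "mystery-ep1": {"murder mystery": 0.9, "fog": 0.8},
    "mystery-ep2": {"murder mystery": 0.95},
    "garden-show": {"fruit": 0.7, "weather": 0.6},
}

# Hypothetical viewer profile: tag -> how strongly the viewer prefers it.
VIEWER = {"murder mystery": 1.0, "fog": 0.5}


def score(tags, preferences):
    """Sum tag confidences, weighted by the viewer's preference for each tag."""
    return sum(conf * preferences.get(tag, 0.0) for tag, conf in tags.items())


def recommend(clips, preferences, top_n=2):
    """Return up to top_n clip IDs with a positive score, best match first."""
    scored = [(score(tags, preferences), clip_id) for clip_id, tags in clips.items()]
    ranked = sorted((s for s in scored if s[0] > 0), reverse=True)
    return [clip_id for _, clip_id in ranked[:top_n]]


print(recommend(CLIPS, VIEWER))
```

In this toy data the foggy mystery ranks above the fog-free one, and the gardening show is left out because it shares no tags with the viewer's profile.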

Photo: ReTV’s 4u2 Messenger delivers AI-driven personalized content

Audience Engagement

The business models of digital media and entertainment companies like Spotify and Netflix are driven by the intersecting analysis of customer and product data – but what role does this kind of analysis play in the world of the archive? These types of questions are being asked as part of the ReTV project, a collaboration (supported by the European Union’s Horizon 2020 research and innovation programme) that “aims to provide broadcasters and content distributors with technologies and insights to leverage the converging digital media landscape.” Along with creating technology to help broadcasters and archives use prediction and analytics to re-purpose content, ReTV helps to disseminate information within the archive and cultural heritage community, for example through the DataTV Webinar.

In an interview with staff from meemoo, The Flemish Institute for Archives, Debbie Esmans, Manager Policy & Strategy, explored the question of what to do with user information so that it doesn’t encourage the “funnel” effect that most digital providers produce. She spoke about how public broadcasters have an official mandate to help audiences broaden their tastes (smaakverruiming), and wondered how AI could help in that process. This is an area rich for investigation, and points to the potential for humans and AI to work together in more collaborative ways, as proposed in the publication “A Research Agenda for Hybrid Intelligence: Augmenting Human Intellect With Collaborative, Adaptive, Responsible, and Explainable Artificial Intelligence.” In it, the notion of hybrid intelligence (HI) is defined as “the combination of human and machine intelligence, augmenting human intellect and capabilities instead of replacing them and achieving goals that were unreachable by either humans or machines.” Public service-driven organizations such as broadcasters and archives are in a unique position to develop projects in which AI serves to amplify human intelligence.

Collections as Data

As AI opens up new opportunities for archives to reconceive their role, scholars are also reconsidering their relationship to archival materials. Scholars in the humanities have been working with technologies such as Optical Character Recognition (OCR) for several years, giving new life to the data found in digitized text documents. Computer vision technology has recently created new opportunities to study still and moving images, and the field of digital humanities is rapidly taking up this new opportunity and theorizing about how it will expand the kinds of questions that researchers can ask of data. In 'The visual digital turn: Using neural networks to study historical images', Melvin Wevers and Thomas Smits explain:

Computer vision techniques offer the possibility to track and analyze specific kinds of visual representations over long periods in the archive. In addition, the use of these techniques and the results they yield can lead to new research questions and areas of inquiry (p. 11).

These new types of questions are created when researchers become familiar with new tools, and with the data in particular collections. The Santa Barbara Statement on Collections as Data explains the concept of Collections as Data:

Collections as data development aims to encourage computational use of digitized and born digital collections. By conceiving of, packaging, and making collections available as data, cultural heritage institutions work to expand the set of possible opportunities for engaging with collections.

Some audiovisual archives are making their collections available directly to scholars, such as The American Archive of Public Broadcasting’s Dataset Research Access service, while others are partnering with academics to build a research infrastructure, like the CLARIAH Media Suite in the Netherlands. The following examples illustrate how archives and researchers are working together to find the value of the data in collections.

The GDELT project

Perhaps the largest endeavor in this space, the GDELT project, uses a suite of Google’s AI technologies to analyze a massive trove of daily online news productions assembled by the Internet Archive. By analyzing online newspapers and broadcast news from around the world in 152 languages, GDELT is able to assist researchers in asking novel questions about media and culture, representation of news across cultures, and global trends. It publishes all of the data it creates as annotations to be used freely by researchers, relying on the concept of Non-Consumptive Use. One example of the potential of this type of global news tracking happened when Bluedog Global’s machine-learning tools noticed local news outlets in Wuhan, China talking about a disease outbreak in December of 2019. By following this thread, researchers at Bluedog Global were able to make the very first global announcement about the COVID-19 outbreak on December 31, 2019, a week before the US Centers for Disease Control.

BoB FOR AI

The BoB FOR AI project from Learning on Screen, the British audiovisual resource for educators, is “… exploring the potential in our archive of over 2.4 million television and radio broadcasts as a dataset to train algorithms – the BoB archive – available for research purposes, particularly in the field of Artificial Intelligence.” BoB FOR AI is collaborating directly with researchers to explore what kinds of questions with a direct impact on society can be asked of the data. In one pilot project with data scientists at Nesta, they are studying racial and gender diversity in British television. In an interview, data scientist Raphael Leung explained to me that while recent initiatives such as the BFI Diversity Standards are actively increasing diversity in film and television, they have also created an evidence gap that computer vision could fill. Motivated by a belief that “inclusion fuels creativity,” the project uses computer vision to measure the actual inclusivity of the media landscape, and looks to imagine what other sorts of gaps computer vision can fill.

Analyzing Visual Culture through Distant Viewing

At the University of Richmond in the US, scholars Taylor Arnold and Lauren Tilton have created the Distant Viewing Lab, which

…uses and develops computational techniques to analyze visual culture on a large scale. It develops tools, methods, and datasets that can be re-used by other researchers. The lab engages closely with critical cultural and data studies, aiming to make explicit the interpretive act of algorithmic logic.

Arnold and Tilton’s paper Distant Viewing: Analyzing Large Visual Corpora argues for an interdisciplinary perspective on computational analysis of large amounts of visual material – which they call “Distant Viewing” – that takes into account deeper levels of meaning such as cultural and semiotic interpretation. An example of this method is the essay Visual Style in Two Network Era Sitcoms by Taylor Arnold, Lauren Tilton, and Annie Berke, in which they compare narrative structures and character centrality in the 1960s American sitcoms Bewitched and I Dream of Jeannie.

CLARIAH Media Suite

In the Netherlands, multiple archives contribute to the CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) Media Suite, where data and tools for analysis are bundled together. The Media Suite is an online resource accessible to people affiliated with Dutch academic institutions, and its team is actively exploring different ways this data can be used. The team has recently begun sharing some of its own work, Visualising The Archive, on the collection of the Netherlands Institute for Sound & Vision. It is also working with researchers and students to create Media Suite Data Stories such as 'Factual divergences in Dutch television news: comparisons between AIDS/HIV, SARS and COVID-19 news reportings'. In an interview, data engineer Mari Wigham expressed her enthusiasm for working with people on creating data visualizations and data stories, and for their potential to open up the archive. She also added a note of caution: the data is best used to find interesting trends for further investigation, not as hard proof of social or historical phenomena. She explained that because metadata comes from a variety of sources, both machine- and human-generated, small changes in how metadata has been documented over the years can lead people to misinterpret findings. “To draw accurate conclusions from archival data, it is essential both to understand how it was produced, and how it relates to the real-life question being asked.”
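Wigham's caution can be illustrated with a small sketch. The numbers below are invented purely for illustration and do not come from the Media Suite. They show how a raw count of catalog records mentioning a topic can rise simply because more material was described in later years, while the topic's share of all described material actually falls.

```python
# Hypothetical yearly counts: (records mentioning a topic, all records with a description).
COUNTS = {
    1985: (20, 1000),
    1995: (60, 6000),
    2005: (90, 6000),
}


def relative_frequency(counts):
    """Return the share of described records that mention the topic, per year."""
    return {
        year: hits / total
        for year, (hits, total) in counts.items()
        if total > 0  # skip years with no described records
    }


SHARES = relative_frequency(COUNTS)
for year in sorted(SHARES):
    hits, total = COUNTS[year]
    print(year, "raw:", hits, "share:", round(SHARES[year], 3))
```

Between 1985 and 1995 the raw count in this toy data triples, yet the share halves. Normalizing in this way does not remove every pitfall, since it says nothing about how cataloging practice itself changed, but it is one simple check before reading a trend as a real-world phenomenon.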

Conclusion

This blog post has looked at ways audiovisual archives are using AI to expand their roles in a range of fields. The next blog post will look at the kinds of analysis that AI can perform on audiovisual collections.

Context of the Research

The research for this blog series was conducted by Randi Cecchine in the context of a Master’s study in the Preservation and Presentation of the Moving Image program at the University of Amsterdam, and an internship placement at the Netherlands Institute for Sound & Vision. Randi has a background in documentary filmmaking, education, and media literacy. Sound & Vision is a leading archive in implementing AI in its workflows, and is also involved with numerous collaborative projects that touch on these topics, including the CLARIAH Media Suite; NL AI Coalition and its Culture and Media Workgroup; Cultural AI Lab; AI4media; and ReTV/Data TV, amongst others. In many of these partnerships, Sound & Vision plays a strong role in communications and information dissemination. Sound & Vision’s commitment to open culture and the sharing of knowledge makes it a highly fertile environment in which to conduct research.

Research

Research was conducted through online video interviews, attendance at professional conferences and webinars, and literature review. Fifteen online video interviews were conducted with professionals working in archives or related projects. Interviewees represented institutions in the United States, the United Kingdom, and Europe. The interviews were conducted between September and December 2020, during a period when people became quite accustomed to interacting through online video conferencing platforms due to the COVID-19 pandemic. The interviewees include:

Alessandra Luciano, Centre national de l’audiovisuel (Luxembourg)
Athina Livanos-Propst, PBS Educational
Matt Eaton, GrayMeta
Raphael Leung, Nesta
Virginia Bazan, Radiotelevisión Española
James David Duran, Vanderbilt University
Mari Wigham, Netherlands Institute for Sound & Vision
Shawn Averkamp, AVP/AMP
Stephen McConnachie, BFI
Lauri Saarikoski, YLE
Casey David Kaufman & Karen Cariani, GBH
Marco Rendina, Istituto Luce Cinecittà
Jake Berger, BBC Archive
Kalev Leetaru, GDELT project
Jean Carrive, Institut national de l'audiovisuel (Ina)
Debbie Esmans, Matthias Priem, Miel Vander Sande, Rony Vissers, meemoo - Vlaams instituut voor het archief

Additional data was gathered through attendance at 50 online professional conference sessions and webinars including:

ACM Multimedia 2019 Nice, France
ACM Multimedia 2020
Association of Moving Image Archivists (AMIA) 2020
Creative Commons Global Summit 2020
DataTV 2020 webinar
Digital Asset Symposium (AMIA) 2020
Digital Storage Futures 2020
International Association of Sound and Audiovisual Archives (IASA) 2019, Hilversum, Netherlands
2020 IASA - FIAT/IFTA Joint Conference

The next blog post will look at the possibilities of AI analysis in audiovisual archives.
