Bringing AI into the Audiovisual Archive

This blog post attempts to clarify some of the key considerations that audiovisual archives are encountering when embarking on AI projects. It explores aspects related to the institutional context, including how different types of archives may make different decisions, which departments within organizations are involved, and who does the technical work of designing and maintaining systems. It then looks at the development and integration of technologies into systems, including decisions about using proprietary or open-source technology, examples of testing and evaluation, and questions about how the data created by machines is stored. It then examines three use cases: Vanderbilt University’s TV News Archive, AMPPD (Audiovisual Metadata Platform Pilot Development), and the Netherlands Institute for Sound & Vision.

Blog

14 June 2021

Randi Cecchine

Student Preservation and Presentation of the Moving Image

Photo: The Netherlands Institute for Sound & Vision

The institutional context

Many audiovisual archives are not working with AI

While there are many examples of archives using AI, the research conducted for this series also revealed that many audiovisual archives are at the very beginning of the process of formulating an approach to AI. In interviews with professionals who have not begun using AI, the primary barriers cited were a lack of sufficient funding and the difficulty of formulating a clear plan for putting the technology into operation. But professionals who have not yet implemented AI also expressed a clear and enthusiastic understanding of how automated metadata will help their mission, as well as a remarkably mature understanding of the topic. It is important to note that since these technologies are so new, there is no standard starting point, and no established protocols, guides, or standards regarding choice of tools or implementation in existing systems. While the (US) Library of Congress/LC Labs document Machine Learning + Libraries: A Report on the State of the Field offers a comprehensive overview of this topic in the realm of libraries, a similar resource does not yet exist that is tailored specifically for the needs of audiovisual archival practitioners.

Photo: Digital collections stored in an LTO tape robot at the Netherlands Institute for Sound & Vision

Types of institutions

Audiovisual archives span a wide variety of types and sizes, with different funding mechanisms, institutional affiliations, organizational structures, cultural values, and missions. They may be independent, or part of larger institutions such as broadcasters, universities, or national archives. Archives may have very different motivations for choosing particular technologies. For example, a small archive that generates most of its income from footage licensing may have a strong economic incentive to create extremely granular visual description to help in finding each shot in its collection, whereas an archive in a broadcast institution may have more incentive to transcribe its footage for in-house journalists. Some archives have a strong commitment to long-term, trustworthy storage, and may be more conservative about implementing experimental systems, while others are not the long-term custodians of their collections, and may have a mission more related to education or access that gives them more freedom to explore. The unique qualities and values of each institution guide its approach to implementation of AI in its systems, and there is no single approach suitable for every institution.

Which departments in an organization are involved in developing an approach to the use of AI technologies?

The distinctions above are important to keep in mind when looking at how different archives approach the development and use of AI on an organizational level. The following examples illustrate how AI is being used across different institutional settings.

At the BBC, AI projects are spread across multiple parts of the organization and its partners. Many large AI projects have emerged from the Research and Development and Newslabs departments, including StoryKit: An Object-Based Media Toolkit, and The News Slicer, which segments and tags programs to help journalists find content for re-use. The BBC archive editorial team itself has focused primarily on training Kaldi tools to transcribe their massive archive of pre-1980s content; most programs made since the 1980s have human-generated caption files that can be attached to the files. Under the ERA (Educational Recording Agency) License, the archive also makes content available to partners such as Learning on Screen, a streaming resource for education. At Learning on Screen, BoB FOR AI is aimed at creating a dataset for scholarly research and AI projects. While all of these projects are archival in nature, they do not arise from the archive itself, and raise questions about the changing role of the archive in a digital environment where multiple parties can work with archival collections.

At an organization much smaller than the BBC but similar in mission, Lauri Saarikoski from YLE, the Finnish Broadcasting Company, explained in an interview that AI technologies are currently used in:

“… plenty of departments and different areas of independent work – the news production may have some set of tools, for some line of work, archives may have something else. It is hard to get a comprehensive overall picture.”

Lauri explained that the archive is running proof-of-concept AI projects, and trying to understand how to integrate the metadata that might be generated into different stages of production. He explained how partnering on the MeMAD project (supported by the European Union’s Horizon 2020 research and innovation programme) has helped staff at YLE try out different technologies, discuss them in their departments, and explore their potential.

Sally Hubbard, Director of Media Management at PBS (Public Broadcasting Service, United States), presented at the Association of Moving Image Archivists’ Digital Asset Symposium in 2018 and 2020 about “Smart Stacking” – the bringing together of information science and data science to support metadata management. At DAS 2020 she explained that a cross-departmental internal working group known as “Switzerland” (a term often used in the United States to connote a neutral territory) is working on a pilot project to explore the potential of combining semantic and machine-learning technologies. The pilot is analyzing the children’s show Sesame Street and blends the computer vision and automatic speech recognition tools of the company GrayMeta with the semantic system of the company Pool Party.

In reflecting on the question of which departments formulate the approach to AI, Jean Carrive from INA, The French National Audiovisual Institute, mentioned in an interview that:

The experiments come either from research or from the IT department, and I think the good idea would be to integrate scientists inside [the] archive department and work from inside the archives. Because the real problems are professional problems – the idea is that the IT people have to be aware of the real problems, the real things to do; if they don’t introduce themselves in the archives they won’t be aware of what to do.

These examples illustrate that AI projects are being driven by people from multiple disciplines and departments within organizations, and that their unique perspectives drive different approaches.

Who does the technical work of designing and operating systems?

In order to use technologies to analyze audiovisual materials, skilled personnel and technical infrastructure are needed. Knowledgeable annotators create labels for training data sets necessary to train algorithms. Computer and data scientists write programs in programming languages such as Python to implement algorithms, and engineers develop systems to connect the algorithms into workflows and connect the workflows with the stored digital audiovisual assets. Such pipelines take powerful computational resources and become increasingly expensive when running these processes at scale. Systems also have to be monitored for performance levels. This infrastructure may be designed and operated by external vendors offering commercial solutions, by the archival institution itself – possibly with academic partners – or through a combination of approaches.

Commercial solutions

Various commercial solutions are emerging that blend the services of multiple companies. In a recent sales webinar called Finding Gold In Your Media Archives: Leveraging AI for Content Monetization, representatives from Dalet, Amazon Web Services (AWS), and Quantiphi explained how their products work together to analyze moving-image collections. Archives may already use Dalet as a media asset management system, and can choose to add on AI services with their existing infrastructure. AWS’s partnerships ensure that customers are using their cloud services and tools. Staff members I spoke with at various archives told me they were hesitant to use Amazon’s tools because they were unable to access clear terms of service that would ensure that the algorithms trained on their data would remain their intellectual property. AWS’s terms of service now state that companies using custom solutions can “opt out of having your content stored or used for service improvements.” Archives are becoming aware that it is important to protect their rights to trained algorithms, and commercially-driven companies are finding ways to respond.

Non-commercial solutions

Archives are also developing systems with the help of academic partners. While some archivists complain that academics are too driven by the imperative to publish, and are therefore focused on solutions that are not well tailored to the real needs of the archive, there are other examples where these multidisciplinary collaborations work very well, opening up new potential for discovery.

One such example is the Computational Linguistics Application for Multimedia Services (CLAMS) project, a partnership between the Brandeis [University] Lab of Language and Computation (LLC) and the American Archive of Public Broadcasting (AAPB) (itself a collaboration between the Library of Congress and American media group GBH, formerly public broadcaster WGBH). According to the website:

The CLAMS project aims at providing archivists and media researchers with an open platform to access and explore archival audiovisual material to extract insightful metadata as well as computer scientists and developers of content analysis tools with an interoperable platform to integrate their tools for custom workflows and pipelines.

The CLAMS project, funded by the Andrew W. Mellon Foundation, is developing tools that are uniquely geared towards the needs of a public television archive, and developing systems to bring these tools together with other open-source tools. The team works with image and sound, but their main strength is in understanding how to create interoperability between tools. One example, as noted in the previous blog post, is an OCR tool that reads onscreen slates, identifies information such as “director,” and sends that information to the correct database field in the PBCore metadata cataloging standard. More technical details about the interoperability framework can be found in the article Interchange Formats for Visualization: LIF and MMIF.
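The slate example can be pictured with a small Python sketch. This is not CLAMS code: the function name, the sample slate text, and the label-to-field table are all invented for illustration, and the target field names are only meant to suggest PBCore-style elements, so they should be checked against the PBCore schema before any real use.

```python
import re

# Hypothetical mapping from slate labels to catalog fields. The target
# names are illustrative and should be verified against the PBCore schema.
LABEL_TO_FIELD = {
    "director": ("pbcoreCreator", "Director"),
    "producer": ("pbcoreCreator", "Producer"),
    "title": ("pbcoreTitle", None),
}

# Matches lines shaped like "LABEL: value" or "LABEL - value".
SLATE_LINE = re.compile(r"^\s*([A-Za-z ]+?)\s*[:\-]\s*(.+?)\s*$")


def parse_slate(ocr_text):
    """Turn OCR'd slate text into a list of field/role/value records."""
    records = []
    for line in ocr_text.splitlines():
        match = SLATE_LINE.match(line)
        if not match:
            continue  # lines without a label separator are ignored
        label, value = match.group(1).lower(), match.group(2)
        if label in LABEL_TO_FIELD:
            field, role = LABEL_TO_FIELD[label]
            records.append({"field": field, "role": role, "value": value})
    return records


# Invented slate text, as an OCR tool might return it.
sample_slate = "TITLE: Evening Report\nDIRECTOR: Jane Doe\nTAKE 3"
for record in parse_slate(sample_slate):
    print(record)
```

In a real workflow the hard part is upstream (reliable OCR and slate detection); the mapping step itself can stay this simple as long as the label table is maintained by catalogers who know the collection.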

In cases where technologies and systems are developed with non-commercial partners, we can see that these multidisciplinary collaborations infuse each project with the unique strengths of their partners, such as the CLAMS project’s expertise in Natural Language Processing, or the SEMIA project’s focus on syntactic features such as color or shape. Archives are already creating and sharing open-source technologies, such as the INA Speech Segmenter, and are now moving into developing openly available systems and processing environments. This is a very important area for further study and attention.

Development and Implementation

The following section explores some of the decisions and activities that are part of the process of developing and implementing AI technologies. It includes a look at the choice of proprietary vs. open-source technologies, testing and evaluation, and questions about how the data is stored within systems and connected to standards frameworks.

Proprietary vs. open-source technologies

Along with deciding what kind of partner to work with comes the decision about what types of tools to use. As discussed in the previous blog post, algorithms are created by commercial providers who charge for their use, or by parties who make their tools or toolkits available for free use through open-source licenses. Archives may have a preference to work only with open-source tools, may be open to commercial tools, or may blend the two. Archives make decisions based on factors such as cost, accuracy, and convictions – and sometimes because their institution already has a contract with a commercial provider.

Testing and evaluation

Archives are taking a variety of approaches to testing and evaluation, which are important factors in choosing technologies, training them, and proving their value to the organization.

Shawn Averkamp, a Senior Consultant with the data management solutions company AVP, who is working on the AMPPD (Audiovisual Metadata Platform Pilot Development) project, explained in an interview the importance and challenge of archives doing their own testing:

I don’t think you should take someone else’s accuracy score. We do a combination of qualitative and quantitative testing – it’s important for people to do their own accuracy testing before they try to run tools on a large amount of material.

That’s something I would love to see – more tools in the system for people to do that testing. I see that as something that’s really needed. It’s a lot of work – mark up ground truth, and, I know Python, I had to write a lot of the scripts for doing this analysis, but not everyone has a person on staff who can write scripts. If there were more tools out there that would take some of the coding requirement out of the equation, the testing would be more accessible to more people just like the platform itself.
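The kind of do-it-yourself accuracy check described here can be sketched in a few lines of Python. The example below computes word error rate (WER), a common measure for speech-to-text output, by comparing a hand-corrected ground-truth transcript with a machine transcript. It is a minimal illustration, not the AMP team's actual evaluation script; the function name and the sample sentences are invented for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Classic edit-distance table, computed over words, one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion: word missing from hypothesis
                current[j - 1] + 1,      # insertion: extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# Invented sample: a human-corrected line and a machine transcript of it.
ground_truth = "the evening news begins at six"
machine_output = "the evening news begin at six"
print(f"WER: {word_error_rate(ground_truth, machine_output):.2f}")
```

A score like this only becomes meaningful when it is computed on a sample of the archive's own material, which is the point of the quote: the same tool can score very differently on studio news, field recordings, or regional accents.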

Some archives have taken a creative approach to testing, such as the Spanish broadcaster RTVE’s Iberspeech Challenge, where participants were involved in testing a selection of technologies for Spanish language Automatic Speech Recognition. The challenge is explained in Virginia Bazán-Gil’s article FIAT/IFTA #Archival Reads Artificial Intelligence: an object of desire.

Aside from testing the technologies, archives may find that there’s important information to be learned by testing user experience. For example, the team from MeMAD (Methods for Managing Audiovisual Data) investigated the user experience with the language tools in the MeMAD prototype in the report MeMAD Project: End User Feedback on AI in the Media Production Workflow.

In Evaluating unsupervised thesaurus-based labeling of audiovisual content in an archive production environment, staff from the Netherlands Institute for Sound & Vision explain how they investigated multiple processes related to the use of automatic term suggestions. The authors explain the importance of testing and evaluation in the choice of technology solutions:

A key requirement with respect to this type of innovation is that the archive remains in control of the quality of the automatically generated labels. Not only because of principles of archival reliability and integrity, but also from a service-level point of view.

Testing and evaluation projects allow archives not only to evaluate solutions and processes, but also to articulate their own values and priorities. These projects can also serve as important factors for internal communication and knowledge sharing, helping staff learn about how AI is developing and opening space for dialog and collaboration.

How is the new data created by AI technologies stored and archived?

When AI tools are used to analyze audiovisual materials, a lot of new data is produced. Archives that work with commercial providers may have their metadata delivered to them in an organized way, but they may not have access to all of the data that was generated along the way. Archives that manage their systems in-house develop their own approaches to storing this information. Archivists are also beginning to understand that many of the pieces of the AI workflow, including algorithms and data sets, may turn out to be of interest to researchers in the future.

The European Broadcasting Union’s Metadata and Artificial Intelligence umbrella group includes sub-groups on topics such as AI and Automatic Metadata Extraction, Media Cloud Microservices Architecture, Metadata models, and AI Benchmarking, which work with broadcasters to create models, share best practices, and collaborate on specific topics of interest such as fake news detection. Their website explains the importance of the data coming from AI systems:

The volume of data is growing but also its value. It is crucial to be able to efficiently manage, extract, index and retrieve information through the value chain, using well established conceptual and process models.

Archives of all types can learn from the developments made by broadcasters in this realm, as they establish the relationship to the data coming out of AI tools as part of their archival practice.

Developing fully operational systems: three cases

The following examples show promising advances towards fully operational systems, and illustrate how different types of archives are finding solutions that work for their unique needs.

Photo: Off Air Room Circa 1976, courtesy Vanderbilt University Libraries

The Vanderbilt University Television News Archive

The Vanderbilt University Television News Archive holds a vast collection of major US television news broadcasts, and was started in 1968. The team described their approach in the Association of Moving Image Archivists (AMIA) 2021 conference panel entitled “Cloud Computing and Storage Workflows for Digital Media.” The Archive has the in-house technical skills to develop a workflow that is targeted to their content and the needs of researchers, and that blends automatic and human metadata creation. They are using the automatic speech recognition service Trint to create transcripts, and they hire student workers to correct the transcripts in Trint’s interactive platform. During this process students also note the start and stop time of segments, and write relevant on-screen information such as names into the transcript. Each 30-minute program takes one hour of student worker time to correct. When the transcript is complete, it is exported to Amazon’s Named Entity Recognition service on the cloud, where information such as person, location, or topic is pulled out. That information is then brought back into the workflow with scripts written in Python that turn the information into individual titles for each news segment. Additional information such as reporter or location is added to database fields, and duplicate records are kept in PBCore for future use when more of the information from the Named Entity Recognition system might be stored in the database.
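The last step of this workflow, turning recognized entities into a segment title, might look something like the following Python sketch. The archive's actual scripts are not described in detail, so everything here is an assumption: the function name, the score threshold, the title format, and the sample data are invented, and the dictionaries only loosely mimic the text/type/score shape that entity-recognition services commonly return.

```python
from collections import Counter


def build_segment_title(entities, max_terms=3, min_score=0.8):
    """Build a short title from the most frequent confident entities."""
    counts = Counter(
        (ent["Text"], ent["Type"])
        for ent in entities
        if ent["Score"] >= min_score  # drop low-confidence detections
    )
    # most_common keeps first-seen order among equally frequent entities.
    top = [text for (text, _type), _n in counts.most_common(max_terms)]
    return " / ".join(top) if top else "Untitled segment"


# Invented entities for one news segment (hypothetical field names).
sample_entities = [
    {"Text": "Nashville", "Type": "LOCATION", "Score": 0.97},
    {"Text": "Jane Doe", "Type": "PERSON", "Score": 0.95},
    {"Text": "Nashville", "Type": "LOCATION", "Score": 0.93},
    {"Text": "Monday", "Type": "DATE", "Score": 0.60},
]
print(build_segment_title(sample_entities))
```

Because the transcripts have already been corrected by student workers, a simple frequency rule like this has cleaner input than it would on raw machine output, which is one reason blending human and automatic steps can pay off.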

Image: AMP system architecture

The News Archive’s position inside a university library gives it a strong technical and intellectual infrastructure. The archive has collaborated on projects with researchers, such as a study of sentiment analysis in news stories about climate change, and is working with the computer science department to develop technologies to segment news stories.

AMPPD (Audiovisual Metadata Platform Pilot Development)

AMPPD (Audiovisual Metadata Platform Pilot Development) is an initiative creating a platform to apply AI technologies to audiovisual content. Their website describes the partners and the project:

The Indiana University Libraries, in collaboration with the University of Texas at Austin, New York Public Library, and digital consultant AVP, were awarded a grant from the Andrew W. Mellon Foundation in 2018 to support initial development, implementation, and pilot testing of an Audiovisual Metadata Platform (AMP) that will enable more efficient generation of metadata to support discovery and use of digitized and born-digital audio and moving image collections.

AMP’s documentation is, in itself, a very important contribution to the field, as it describes in detail all parts of the system, including the Data Model and list of technologies, which they refer to as Metadata Generation Mechanisms. It includes a detailed explanation of the Evaluation Criteria that they use to choose the MGMs. One unique feature of the system is that for each type of MGM, such as Speech to Text, they recommend one open-source option (such as Kaldi) as well as one proprietary option (such as AWS Transcribe). The documentation explains the features of most types of MGM, including if training data is unknown/black box or if users can train it, and what that training would entail.

The Evaluation Criteria are well explained and include categories such as Accuracy, Output Formats, Processing Time, and Computing Resources. A unique and important category is Social Impact, which is described as follows:

The potential unintended consequences of an unmediated MGM’s output. How could the MGM express hidden biases? What are the possible unintended negative impacts that could come from the output of this MGM? What measures can be taken to mitigate them? See FAT/ML’s Principles for Accountable Algorithms for more information: http://www.fatml.org/resources/principles-for-accountable-algorithms

A note about bias in AI:

Awareness is growing around questions of bias and ethics in AI, and archives are exploring the various ways that technologies may introduce bias into the archive, or may amplify bias existing in the collections. There is a growing body of literature looking at racial and gender bias in AI, and the (US) Library of Congress/LC Labs document Machine Learning + Libraries: A Report on the State of the Field offers a thoughtful section on Managing Bias, encouraging institutions to approach these difficult questions in the following manner:

Rather than trying to consider every ML project in terms of fairness, a library ML project might instead serve as a diagnostic to the problems inherent in existing digital collections, a rebuttal to ML work falsely claiming objectivity, or a synecdoche that helps patrons better understand the historical stakes of library collections and archives.

Understanding the socially-constructed nature of AI technologies – and their ability to either codify bias, or recognize and interrupt it – is essential for the people building AI systems. This awareness can help push technology development towards more equitable and reflective methods, which is particularly important since audiovisual archives are in a unique position to spearhead projects that help educate the public about this socially-constructed nature.

The Netherlands Institute for Sound & Vision and the CLARIAH Media Suite

The Netherlands Institute for Sound & Vision (NISV) serves the public through various functions, as a museum, knowledge center, educational partner, and archive of Dutch media history.

The archive is committed to being a Trustworthy Digital Repository (TDR) with a Data Seal of Approval and operates as a central production archive for Dutch broadcasters. This collection is made available through a digital asset management system called DAAN, utilized extensively by media professionals for re-use of materials.

NISV is also a partner in CLARIAH, a research environment funded by NWO, the Netherlands Organization for Scientific Research. The Media Suite, developed in CLARIAH, brings distributed audiovisual collections to humanities researchers. Collections include the archives of The Netherlands Institute for Sound & Vision, along with collections from other institutions such as the Eye Filmmuseum, KB National Library of the Netherlands, and the Meertens Institute. This suite is available to all students, educators, and researchers in the Netherlands, at no cost.

DAAN and CLARIAH have different approaches to using AI technologies. To develop the tools for named-entity extraction, speaker labeling, and facial recognition used in DAAN, NISV collaborated with academics and technology companies. In this catalog, where broadcasters search for and access content for re-use, the NISV recognizes that producers need a stable, reliable system, so it prioritizes technologies that will produce low error rates and yield consistent and reliable results, over experimental technologies that might not be as exact.

In the CLARIAH Media Suite, more focus is placed on experimentation and innovation in the creation of audiovisual processing tools to analyze features such as color or pose; exploring user interaction with the development of “data stories;” and building a processing environment. In the article Automatic Annotations and Enrichments for Audiovisual Archives, members of the CLARIAH team describe this new environment:

As part of the CLARIAH infrastructure we have developed a processing environment that is optimised for deploying AVP tools efficiently on (high performance) computer clusters in a transparent and reproducible way. This environment, called DANE (Distributed Annotation ‘n’ Enrichment), provides the framework needed for processing AV data and is designed to allow researchers maximum flexibility when deploying AVP algorithms. In addition, it keeps track of and provides researchers with clear insight into the operations that are performed on the data, ensuring that research can perform data and tool criticism at all stages of the research process.

Moreover, DANE is designed to be able to function in environments where there are fewer computer resources, or environments where access to the data is rate limited, as is typical for AV archives.

In maintaining these two separate systems using different AI solutions, Sound & Vision is able to address the needs of both the broadcasters – who are primarily interested in granular search and retrieval of footage for re-use – and the needs of researchers, who may be more interested in exploring new approaches to collections as data. It also allows Sound & Vision to learn from these different approaches as it plans for the future.

Conclusion

This blog post aimed to give an overview of the variety of approaches to using AI tools in audiovisual archives. It looked at how AI initiatives fit into institutional structures; examined questions about development and implementation; and gave examples of three functioning systems. It also explored some of the topics that audiovisual archives are considering at various stages of AI implementation, and offered examples to serve as reference and inspiration for archives exploring their own plans for AI.

Through the process of researching and writing this series of blogs, I witnessed a field in rapid motion. As technology is being developed that requires new ways of thinking and working, the field is responding with more knowledge-sharing and collaboration. Through conferences and webinars, audiovisual archival professional organizations are playing an important role in highlighting developments and encouraging exchange, while funders are recognizing the necessity of supporting multidisciplinary collaborations. Audiovisual archives are in a unique position to contribute to creative and meaningful advances in AI, and this is an exciting moment for a field in transition to a new and promising collaborative future.

Context of the Research

The research for this blog series was conducted by Randi Cecchine in the context of a Master’s study in the Preservation and Presentation of the Moving Image program at the University of Amsterdam, and an internship placement at the Netherlands Institute for Sound & Vision. Randi has a background in documentary filmmaking, education, and media literacy. Sound & Vision is a leading archive in implementing AI in its workflows, and is also involved with numerous collaborative projects that touch on these topics, including the CLARIAH Media Suite; NL AI Coalition and its Culture and Media Workgroup; Cultural AI Lab; AI4media; and ReTV/Data TV, amongst others. In many of these partnerships, Sound & Vision plays a strong role in communications and information dissemination. Sound & Vision’s commitment to open culture and the sharing of knowledge makes it a highly fertile environment from which to conduct research. The next blog post will explore how audiovisual archives are implementing AI technologies.

Research

Research was conducted through online video interviews, attendance at professional conferences and webinars, and literature review. Fifteen online video interviews were conducted with professionals working in archives or related projects. Interviewees represented institutions in the United States, the United Kingdom, and Europe. The interviews were conducted between September and December 2020, during a period when people became quite accustomed to interacting through online video conferencing platforms due to the COVID-19 pandemic. The interviewees include:

Alessandra Luciano, Centre national de l’audiovisuel (Luxembourg)
Athina Livanos-Propst, PBS Educational
Matt Eaton, GrayMeta
Raphael Leung, Nesta
Virginia Bazan, Radiotelevisión Española
James David Duran, Vanderbilt University
Mari Wigham, Netherlands Institute for Sound & Vision
Shawn Averkamp, AVP/AMP
Stephen McConnachie, BFI
Lauri Saarikoski, YLE
Casey David Kaufman & Karen Cariani, GBH
Marco Rendina, Istituto Luce Cinecittà
Jake Berger, BBC Archive
Kalev Leetaru, GDELT project
Jean Carrive, Institut national de l'audiovisuel (Ina)
Debbie Esmans, Matthias Priem, Miel Vander Sande, Rony Vissers, meemoo - Vlaams instituut voor het archief

Additional data was gathered through attendance at 50 online professional conference sessions and webinars including:

ACM Multimedia 2019 Nice, France
ACM Multimedia 2020
Association of Moving Image Archivists (AMIA) 2020
Creative Commons Global Summit 2020
DataTV 2020 webinar
Digital Asset Symposium (AMIA) 2020
Digital Storage Futures 2020
International Association of Sound and Audiovisual Archives (IASA) 2019, Hilversum, Netherlands
2020 IASA - FIAT/IFTA Joint Conference