Speech recognition helps research into moving image collections

The British Library recently concluded a challenging project on the novel applications that speech recognition technologies have to offer. By teaching computer systems to parse the words spoken in audio and video recordings, retrievability (how easily you can find a piece of archive material) increases dramatically. Speech recognition thus improves what you know about a recording - before you've even begun describing its content.
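To make that concrete: once an ASR system has turned speech into time-coded words, even a minimal inverted index makes recordings searchable. The sketch below is purely illustrative - the recording IDs, words and timings are invented - but it shows the mechanism:

```python
from collections import defaultdict

# word -> list of (recording_id, offset_seconds) places it was spoken
index = defaultdict(list)

def add_transcript(recording_id, timed_words):
    """timed_words: iterable of (word, offset_seconds) pairs from an ASR system."""
    for word, offset in timed_words:
        index[word.lower()].append((recording_id, offset))

def search(term):
    """Return every (recording, offset) where the term was recognised."""
    return index.get(term.lower(), [])

# Invented example data, standing in for real transcripts
add_transcript("tape_042", [("world", 3.2), ("service", 3.6), ("archive", 4.1)])
add_transcript("tape_107", [("radio", 1.0), ("archive", 1.9)])

print(search("archive"))  # [('tape_042', 4.1), ('tape_107', 1.9)]
```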

The aim of the British Library's Opening up Speech Archives project, which was funded by the Arts & Humanities Research Council, was not to identify the best systems out there, but instead to look at "how speech-to-text will affect the research experience, and trying to learn from researchers how they can work with audiovisual media opened up through such tools - and how their needs can be fed back to developers".

Algorithms make mistakes that the human brain wouldn't: automated recognition can generate a high volume of unrelated results that makes searching for relevant information more complicated. The R&D department at Sound & Vision has studied and experimented with this topic in a number of projects, and was invited to the closing conference of the Opening up Speech Archives project on February 8th to discuss the lessons we've learned and the experience we've built up over the years.
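One common mitigation - a general technique sketched here, not a description of any particular project's pipeline - is to drop low-confidence word hypotheses before indexing, trading some recall for fewer spurious hits. The threshold and the sample output below are assumptions for illustration:

```python
CONFIDENCE_THRESHOLD = 0.80  # hypothetical cut-off; in practice tuned per collection

def filter_hypotheses(words):
    """words: iterable of dicts like {'word': ..., 'confidence': 0.0-1.0}."""
    return [w for w in words if w["confidence"] >= CONFIDENCE_THRESHOLD]

# Invented ASR output: one solid hypothesis, one likely misrecognition
asr_output = [
    {"word": "parliament", "confidence": 0.95},
    {"word": "pelican", "confidence": 0.41},
]
print(filter_hypotheses(asr_output))  # keeps only 'parliament'
```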

The project was led by the British Library's curator of moving image collections, Luke McKernan, who hosted the day. He spoke on the needs of academic researchers using speech-to-text systems. You can read the blog post on which his talk was based, or download a copy of it; the talk took in the bigger picture, arguing that speech-to-text technologies will bring about a huge change in how we discover things. Other R&D partners presenting during the day were our colleagues from BBC R&D: Theo Jones, Chris Lowis and Pete Warren. They presented on the BBC World Service Prototype, which uses a mixture of catalogue data, machine indexing of speech files (using the open-source CMU Sphinx toolkit) and crowdsourcing to categorise the digitised World Service radio archive. If you are keen to test it out, you can sign up for the prototype at http://worldservice.prototyping.bbc.co.uk.
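For those curious what machine indexing with CMU Sphinx looks like in practice, the sketch below transcribes a single file. It is a minimal illustration, not the BBC's actual pipeline: it assumes the pocketsphinx Python bindings (recent releases bundle a default US-English model; older versions need an explicit configuration with model paths), a 16 kHz 16-bit mono WAV file, and an invented file name:

```python
from pocketsphinx import Decoder

# Recent pocketsphinx releases load the bundled US-English model by default
decoder = Decoder()
decoder.start_utt()
with open("broadcast.wav", "rb") as f:  # invented file name
    f.read(44)  # skip the RIFF/WAV header, leaving raw 16-bit PCM samples
    while True:
        chunk = f.read(4096)
        if not chunk:
            break
        decoder.process_raw(chunk, False, False)
decoder.end_utt()

hypothesis = decoder.hyp()
if hypothesis is not None:
    print(hypothesis.hypstr)  # the recognised words, ready for indexing
```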

Luís Carrasqueiro heads the British Universities Film & Video Council, an organisation we're lucky to work with in the framework of the EUscreen and EUscreenXL projects. Luís presented on Widening the use of audiovisual materials in education. He mentioned the BUFVC's forthcoming citation guidelines for AV media, which could play a major part in helping to make sound and video integral to the research process. As Luke McKernan summarises:

He gave us the key message for the day, which was "Imperfection is OK - live with it". Wise words - too many of those considering speech-to-text systems dream of something that will create the word-perfect transcription. They can't, and it doesn't matter. It's more than enough to be able to search across the words that they can find, and to extract from these the keywords, location terms, names, dates and more, which can then be linked together with other digital objects.
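Extracting those names, locations and dates from an imperfect transcript is itself well-trodden ground. As a rough sketch - spaCy is just one of several entity-recognition toolkits, the transcript text is invented, and the en_core_web_sm model has to be downloaded separately (python -m spacy download en_core_web_sm) - such a step might look like this:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Invented transcript line; real ASR output is often lowercase and
# unpunctuated, which usually calls for truecasing before this step.
transcript = ("The World Service broadcast from Bush House, London, "
              "on 14 February 1985 featured Margaret Thatcher.")

doc = nlp(transcript)
for ent in doc.ents:
    # PERSON, GPE (places), DATE and similar labels: exactly the
    # link-ready terms described above
    print(ent.text, ent.label_)
```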

For Sound & Vision, Johan Oomen, Erwin Verbruggen and Roeland Ordelman spoke about the work being funded in the Netherlands to open up audiovisual archives. You can find our presentation on SlideShare or below. Overall, it was an interesting day with varied takes on the research issue at hand, and we're looking forward to learning more about the topic and creating much-needed use cases to apply the technology.
