On the 30th of October 2015 a special session on Video Hyperlinking took place during the SLAM Workshop on Speech, Language and Audio in Multimedia, connected to the ACM Multimedia conference in Brisbane, Australia. It was a good opportunity to discuss in more detail the results of the MediaEval benchmark evaluation on ‘Searching and Anchoring in Video Archives (SAVA)’. It was also an opportunity to look forward to the 2015 TRECVid benchmark evaluation workshop held on 16-18 November in Gaithersburg, USA. At TRECVid 2015 the results of the evaluation and the plans for next year were discussed. Sound and Vision's Roeland Ordelman reports on both events.
Video Hyperlinking at SLAM
At SLAM, there was a session with four presentations on Video Hyperlinking. Benoît Huet (Eurecom) introduced the session with an overview of the topic. The rationale behind video hyperlinking is that it can help to improve access to archived video, a topic that was central to the recently finished EU projects AXES and LinkedTV. Benoît provided examples of how video hyperlinking could work in practice, mentioning inside-in, inside-out, and outside-in linking scenarios, and presented the evaluation framework that is being used at the MediaEval and TRECVid benchmark evaluations.
Benoît Huet (Eurecom)
Petra Galuščákova (Charles University) presented the Video Hyperlinking system that she used for the 2014 MediaEval benchmark evaluation. She focussed specifically on the audio and speech retrieval part of the system, which uses speech recognition transcripts of the anchor videos to search for relevant link targets. She discussed approaches to deal with the restricted vocabulary of the speech recognition system (data and query expansion, combination of speech transcripts from different systems) and with noise (errors) in the speech transcripts. She also presented approaches that try to improve hyperlinking results by taking music and acoustics into account: acoustic fingerprinting (deploying the Doreso API) and acoustic similarity.
Petra Galuščákova (Charles University)
Guillaume Gravier’s (Irisa) presentation was on the use of topic models in video hyperlinking. A ‘traditional’ search paradigm for linking anchor videos to target videos runs the risk of producing target videos that are very similar to the anchor video, such as ‘near duplicates’. Guillaume argued that this is suboptimal, as video hyperlinking should focus on stimulating exploration of videos by different users who may have different intents. As topic modelling allows the linking of anchor/target pairs that have only a few words in common, it would be a good strategy to stimulate diversity in links (from a data perspective) and serendipity during exploration (from a user perspective). Experimental results on the MediaEval benchmark data sets, obtained with hierarchical topic modelling, showed that this could indeed be an interesting direction.
Guillaume Gravier (Irisa)
Maria Eskevich (Radboud University Nijmegen), one of the early organisers of the Video Hyperlinking benchmark evaluations, was unfortunately not able to come to Brisbane, so her presentation was given by Benoît Huet. Maria ‘presented’ the Video Hyperlinking system used in the 2014 MediaEval benchmark, which also incorporates visual analysis. It uses scene segmentation based on the visual and temporal coherence of the video segments, together with visual analysis of the video (151 visual concepts).
One important conclusion was that incorporating visual features in the video hyperlinking framework is not straightforward. In the 2014 evaluation, using speech transcripts worked best. To improve on this, one possible next step could be to use the anchor semantics (e.g., named-entity recognition) to propose the visual concepts that are most important given a specific anchor.
After a number of years of video hyperlinking benchmark evaluations at MediaEval, there is now a video hyperlinking evaluation at TRECVid as well. Sound and Vision is involved in the task by advising on the use scenario for the video hyperlinking concept: improving access opportunities to large video repositories.
Benchmarking the concept of video hyperlinking started back in 2009 with the Linking Task in VideoCLEF, which involved linking video to Wikipedia material on the same subject in a different language. In 2012, we started a ‘brave new task’ in MediaEval where we explored approaches to benchmark the concept of linking videos to other videos, using internet video from blip.tv. In 2013-2014, ‘search and hyperlinking’ ran as a regular MediaEval task, this time with a collection of about 2500 hours of broadcast video from the BBC instead of internet video.
The Video Hyperlinking task
Thanks to MediaEval we could improve our understanding of the concept of Video Hyperlinking and fine-tune its evaluation, which is relatively complex. The task in the evaluation is to provide relevant target video segments – segments that users want to link to – on the basis of manually generated example anchor video segments – segments that users want to link from. To ensure that the anchors are representative and reflect anchors in real-life scenarios, we asked ‘end-users’ to select anchors manually. These anchors are then provided to the participants of the evaluation, who return a list of relevant target video segments for each anchor. The relevance of each target video is then assessed using a crowdsourcing approach (Amazon Mechanical Turk). Note that our definition of relevance here is that the content in the target video should be about what is represented in the anchor video, and not merely visually similar to it.
The video hyperlinking task at TRECVid had 10 participants this year, submitting 40 runs in total. To measure the performance of systems we used an adapted version of Mean Average Precision (MAP). MAP indicates the quality of a system by looking at where relevant documents appear in the top-N list the system retrieves for each query: precision is computed at the rank of each relevant document, averaged per query, and then averaged over all queries. A returned target video segment is regarded as relevant when it overlaps with a ground truth segment as defined by human assessors. For the VH evaluation, an adapted measure (MAiSP) is also used that takes the amount of overlap in seconds into account. This avoids overestimating performance when many short segments are returned that all overlap with one larger relevant segment. The best performing systems reach a MAiSP score of just above 0.25, which is not very high but, given the difficulty of the task, a reasonable starting point for further exploration and improvement.
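To make the overlap-based notion of relevance and the averaging more concrete, here is a minimal Python sketch. It is not the official TRECVid or MediaEval scoring code, and it does not reproduce MAiSP. The `(video_id, start, end)` segment representation, the function names, the normalisation by the number of ground truth segments, and the `once` switch are all assumptions made for illustration. The sketch shows plain MAP with binary, overlap-based relevance, and how counting several short segments against the same ground truth segment inflates the score.

```python
from typing import Dict, List, Tuple

# Assumed representation: (video_id, start_sec, end_sec)
Segment = Tuple[str, float, float]


def overlaps(a: Segment, b: Segment) -> bool:
    """True when both segments come from the same video and share some time."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def average_precision(ranked: List[Segment], relevant: List[Segment],
                      top_n: int = 10, once: bool = False) -> float:
    """Average precision over the top-N returned segments for one anchor.

    With once=False every returned segment that overlaps any ground truth
    segment counts as a hit (the naive variant). With once=True each ground
    truth segment can be matched only one time (an illustrative fix, not MAiSP).
    """
    if not relevant:
        return 0.0
    matched = set()
    hits = 0
    precision_sum = 0.0
    for rank, seg in enumerate(ranked[:top_n], start=1):
        hit = next((i for i, gt in enumerate(relevant)
                    if overlaps(seg, gt) and not (once and i in matched)), None)
        if hit is None:
            continue
        matched.add(hit)
        hits += 1
        precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[Segment]],
                           truth: Dict[str, List[Segment]],
                           top_n: int = 10, once: bool = False) -> float:
    """Mean of the per-anchor average precision over all anchors in truth."""
    if not truth:
        return 0.0
    scores = [average_precision(runs.get(anchor, []), gts, top_n, once)
              for anchor, gts in truth.items()]
    return sum(scores) / len(scores)


# Made-up example data: two relevant segments, four returned segments.
truth = {"anchor_1": [("v1", 0.0, 60.0), ("v2", 30.0, 90.0)]}
runs = {"anchor_1": [("v1", 0.0, 10.0), ("v1", 10.0, 20.0),
                     ("v3", 0.0, 10.0), ("v2", 40.0, 50.0)]}

naive = mean_average_precision(runs, truth)              # 1.375
strict = mean_average_precision(runs, truth, once=True)  # 0.75
```

In the naive variant the two short segments from `v1` both count as hits on the same relevant segment, which pushes the score above what the two ground truth segments can justify. Matching each ground truth segment only once is one simple way to counter that; a measure based on the amount of overlap in seconds addresses the same problem differently.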
During the workshop we also discussed a number of topics that need to be addressed in order to reach a better understanding of both the task and its underlying theoretical framework and technical challenges. One topic that needs to be defined better is the notion of relevance, especially with respect to similarity and ‘aboutness’. The current definition – a link target should be about what is represented in an anchor – is not sufficiently clear and gives rise to questions about the exact goal of the task. Also, as emphasised specifically by the participating IRISA team, the evaluation should be able to take the diversity of relevant targets into account.
This topic is connected with the discussion on the use of additional measures, for example more precision-oriented ones, and the introduction of subtasks in which information on the expected target video is included with the anchor video. Finally, we discussed how the video hyperlinking task can be defined in terms of more generic video retrieval problems, as this can help to make the task more interesting for peripheral research fields in the video retrieval domain. For example, video hyperlinking could be regarded as a video retrieval task that takes multimodal documents – a video segment, a text document, a mixture of video and text, etc. – as input. These documents need to be processed to create a query formulation that can be used in a search system to retrieve related video documents.
The discussion on these topics will continue, especially in the upcoming weeks when the task organisers will be analysing the results of this year’s evaluation in more detail. Of course we are very interested in your feedback, comments and suggestions in order to come up with an improved evaluation set-up for next year.
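The retrieval framing of video hyperlinking can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the anchor and the candidate targets are represented only by transcript text, the stop-word list and the helpers `tokenize`, `formulate_query` and `search` are hypothetical, and simple term counting stands in for a real search system. The sketch only illustrates the two steps involved: turning an input document into a query formulation, and using that query to retrieve related segments.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Tiny illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for", "with"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def formulate_query(anchor_text: str, max_terms: int = 5) -> List[str]:
    """Use the most frequent terms of the anchor text as the query."""
    counts = Counter(tokenize(anchor_text))
    return [term for term, _ in counts.most_common(max_terms)]


def search(query: List[str], index: Dict[str, str], top_n: int = 3) -> List[Tuple[str, int]]:
    """Score each segment by how often the query terms occur in its transcript."""
    scored = []
    for seg_id, transcript in index.items():
        tokens = Counter(tokenize(transcript))
        score = sum(tokens[t] for t in query)
        if score > 0:
            scored.append((seg_id, score))
    # Highest score first; ties broken by segment id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Made-up transcripts for three candidate target segments.
index = {
    "s1": "Bees pollinate the garden flowers",
    "s2": "Football results and league tables",
    "s3": "Garden design with bees in mind",
}
query = formulate_query("A garden for bees: bees love flowers in the garden")
results = search(query, index)  # [("s1", 3), ("s3", 2)]
```

A real system would replace the text-only input with features from several modalities and the term counting with a proper retrieval model, but the split into query formulation and search would stay the same.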
- The introduction slides of the Video Hyperlinking task
- Stay up to date on Video Hyperlinking at videohyperlinking.com
This blog is made up of two blogs previously posted on videohyperlinking.com. Photos of SLAM by the author.