The SLAM Workshop on Speech, Language and Audio in Multimedia, held in conjunction with the ACM Multimedia conference in Brisbane, Australia this year, featured a special session on Video Hyperlinking on Friday the 30th of October 2015. It was a good opportunity to discuss in more detail the results of the MediaEval benchmark evaluation on ‘Searching and Anchoring in Video Archives’ (SAVA) and to look forward to the upcoming TRECVid 2015 benchmark evaluation workshop, as Video Hyperlinking became one of the TRECVid tasks this year. More on Video Hyperlinking and TRECVid after that workshop (16-18 November).
At SLAM, there was a session with four presentations on Video Hyperlinking. Benoît Huet (Eurecom) introduced the session with an overview of the topic. The rationale behind video hyperlinking is that it can help to improve access to archived video, a topic that was central to the recently finished EU projects AXES and LinkedTV. Benoît provided examples of how video hyperlinking could work in practice, mentioning inside-in, inside-out, and outside-in linking scenarios, and presented the evaluation framework that is being used at the MediaEval and TRECVid benchmark evaluations. His presentation can be viewed here.
Petra Galuščáková (Charles University) presented the Video Hyperlinking system that she used for the 2014 MediaEval benchmark evaluation, focussing specifically on the audio and speech retrieval part of the system, which uses speech recognition transcripts of the anchor videos to search for relevant link targets. She discussed approaches to deal with the restricted vocabulary of the speech recognition system (data and query expansion, combination of speech transcripts from different systems) and with noise (errors) in the speech transcripts. She also presented approaches that try to improve hyperlinking results by taking music and acoustics into account: acoustic fingerprinting (deploying the Doreso API) and acoustic similarity. Petra’s presentation can be viewed here.
Guillaume Gravier’s (Irisa) presentation was on the use of topic models in video hyperlinking. A ‘traditional’ search paradigm for linking anchor videos to target videos runs the risk of producing target videos that are very similar to the anchor video, such as ‘near duplicates’. Guillaume argued that this is suboptimal, as video hyperlinking should focus on stimulating exploration of videos by different users who may have different intents. As topic modelling allows the linking of anchor/target pairs that have only a few words in common, it would be a good strategy to stimulate diversity in links (from a data perspective) and serendipity during exploration (from a user perspective). Experimental results on the MediaEval benchmark data sets, using hierarchical topic modelling, showed that this could indeed be an interesting direction. Guillaume’s presentation can be found here: Slides Guillaume (pdf)
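The intuition behind the topic-modelling argument can be sketched with a toy example. This is not Guillaume’s system: the word-to-topic lexicon, segments, and words below are invented for illustration (a real system would learn topics with LDA or hierarchical topic models from transcripts), but it shows how two segments with zero shared words can still be close in topic space.

```python
# Toy sketch: lexical overlap vs. topic-space similarity.
# All words, topics, and segments are hypothetical.
from collections import Counter
from math import sqrt

# Hypothetical word -> topic lexicon (learned in a real system).
TOPIC_OF = {
    "election": "politics", "vote": "politics", "parliament": "politics",
    "minister": "politics", "coach": "sport", "goal": "sport",
    "match": "sport", "striker": "sport",
}

def word_overlap(a, b):
    """Jaccard overlap of the raw vocabularies (lexical search view)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def topic_vector(words):
    """Count topic occurrences for the words covered by the lexicon."""
    return Counter(TOPIC_OF[w] for w in words if w in TOPIC_OF)

def topic_similarity(a, b):
    """Cosine similarity between topic vectors (topic-model view)."""
    va, vb = topic_vector(a), topic_vector(b)
    dot = sum(va[t] * vb[t] for t in va)
    na = sqrt(sum(v * v for v in va.values()))
    nb = sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

anchor = ["election", "vote", "tonight"]
target = ["parliament", "minister", "debate"]  # same topic, no shared words

print(word_overlap(anchor, target))      # 0.0 -> a lexical match misses it
print(topic_similarity(anchor, target))  # 1.0 -> topic space links the pair
```

The anchor and target share no vocabulary, so a purely lexical retrieval score is zero, while their topic vectors coincide; this is the property that makes topic models attractive for proposing diverse, non-duplicate link targets.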
Maria Eskevich (Radboud University Nijmegen), one of the early organisers of the Video Hyperlinking benchmark evaluations, was unfortunately not able to come to Brisbane, so her presentation was given by Benoît Huet. Maria’s slides presented the Video Hyperlinking system used in the 2014 MediaEval benchmark, which also incorporates visual analysis. It uses scene segmentation based on visual and temporal coherence of video segments, and visual analysis of the video (151 visual concepts). One important conclusion was that incorporating visual features in the video hyperlinking framework is not straightforward: for the 2014 evaluation, using speech transcripts worked best. To improve on this, one possible next step could be to use the anchor semantics (e.g., named-entity recognition) to propose the visual concepts that are most important for a specific anchor. Maria’s presentation is here.