Using hierarchical topic models to find anchor-target pairs could improve diversity in video hyperlinking, and the evaluation of video hyperlinking should focus more on assessing serendipity in the links. These are two important findings of the work of Anca-Roxanna Simon, who successfully defended her PhD thesis, “Semantic Structuring of Video Collections from Speech: Segmentation and Hyperlinking”, on Wednesday 2nd of December at the University of Rennes, France.
‘Diversity’ is a concept that is frequently mentioned in discussions about recommendation and linking. The point of departure in this discussion is that video hyperlinking systems should provide the end-user with the means to browse and explore large video repositories, stimulating users to stumble upon interesting video content while moving serendipitously from one video to another. Finding a lot of link target segments that are similar to the anchor video segment may not be what we are aiming at. Instead of similarity, a system should aim at diversity, providing link targets that are about the anchor but offer different perspectives on it.
In her thesis, Anca advocates the use of hierarchical topic models to better address diversity in video hyperlinking. With topic modelling, videos are structured into topics, at both a general and a more specific level, using speech recognition transcripts of the video. In the video hyperlinking framework, candidate target videos are first retrieved using a standard retrieval approach. In a second step, topic segmentation is performed on these candidates, and the topic segments that best match the anchor video segment are returned.
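The two-step framework described above can be illustrated with a small sketch. It is not the method from the thesis: instead of hierarchical topic models it uses a much simpler stand-in, splitting a transcript wherever two adjacent sentences share almost no vocabulary, and it scores everything with plain bag-of-words cosine similarity. All function names, the stopword list, the threshold and the toy transcripts are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the thesis).
STOPWORDS = {"the", "a", "in", "on", "from", "and", "with", "is", "of"}

def bag_of_words(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower())
                   if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve_candidates(anchor, videos, top_k=2):
    """Step 1: rank whole transcripts against the anchor, keep the top_k."""
    anchor_bow = bag_of_words(anchor)
    ranked = sorted(
        videos,
        key=lambda vid: cosine(anchor_bow, bag_of_words(" ".join(videos[vid]))),
        reverse=True,
    )
    return ranked[:top_k]

def segment_by_cohesion(sentences, threshold=0.1):
    """Step 2 (simplified stand-in for topic segmentation): start a new
    segment wherever adjacent sentences are lexically dissimilar."""
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return [" ".join(seg) for seg in segments]

def hyperlink(anchor, videos, top_k=2, top_n=3):
    """Return up to top_n (score, video_id, segment) tuples, best first."""
    anchor_bow = bag_of_words(anchor)
    targets = []
    for vid in retrieve_candidates(anchor, videos, top_k):
        for seg in segment_by_cohesion(videos[vid]):
            targets.append((cosine(anchor_bow, bag_of_words(seg)), vid, seg))
    targets.sort(key=lambda t: t[0], reverse=True)
    return targets[:top_n]

# Toy transcripts: one list of sentences per (hypothetical) video id.
videos = {
    "v1": ["the volcano erupted on the island",
           "lava from the volcano reached the island coast",
           "the football match ended in a draw",
           "football fans left the match early"],
    "v2": ["the recipe needs flour and sugar",
           "mix flour with sugar and butter"],
    "v3": ["volcano tourism is growing",
           "tourists visit the volcano crater"],
}
anchor = "volcano eruption lava"

links = hyperlink(anchor, videos)
for score, vid, segment in links:
    print(f"{score:.2f}  {vid}  {segment}")
```

Note that ranking purely by similarity to the anchor, as this sketch does, is exactly the behaviour the diversity argument questions; a system along the lines advocated in the thesis would need a topic-aware notion of a good match rather than raw lexical overlap.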
The topic modelling approach was evaluated in the course of the MediaEval video hyperlinking benchmark evaluations. The evaluations showed that the approach is at the very least a valid one, performing at a level comparable to other systems. But Anca notes that the way the evaluations are set up neglects the full potential of the approach, as it does not favour diversity in the results. On the basis of an in-depth result analysis and a user survey, she points to three disadvantages: the small number of Mechanical Turk (MT) assessors per anchor-target pair, the binary judgement of target relevance by the MT workers, and the fact that assessors are provided with a description of the expected target that was written by somebody else during the anchor creation process (note that the creation of anchors, a manual process, was done in a separate session as part of the data selection process).
Indeed, the current video hyperlinking evaluation set-up has shortcomings with respect to diversity, and some of the issues mentioned by Anca have been discussed during the SLAM and TRECVid workshops. The task organisers are very grateful for Anca’s analysis and suggestions and are currently looking into possibilities to address these shortcomings, given, for example, the funding available for crowdsourcing and the restrictions of the framework as a whole.
Anca’s thesis ‘Semantic Structuring of Video Collections from Speech: Segmentation and Hyperlinking’ can be downloaded here.