After running a video hyperlinking benchmark evaluation at MediaEval for a number of years, we are excited that an evaluation on video hyperlinking is now running at TRECVid as well. On the 17th of November 2015, we discussed the results of the evaluation and the plans for next year at the TRECVid workshop in Gaithersburg, US.
Benchmarking the concept of video hyperlinking started back in 2009 with the Linking Task in VideoCLEF, which involved linking video to Wikipedia material on the same subject in a different language. In 2012, we started a ‘brave new task’ in MediaEval, where we explored approaches to benchmarking the concept of linking videos to other videos, using internet video from blip.tv. In 2013-2014, ‘search and hyperlinking’ ran as a regular MediaEval task, this time with a collection of about 2500 hours of broadcast video from the BBC instead of internet video.
Thanks to MediaEval we could improve our understanding of the concept of video hyperlinking and fine-tune its evaluation, which is relatively complex. The task in the evaluation is to provide relevant target video segments (segments that users want to link to) on the basis of manually generated example anchor video segments (segments that users want to link from). To ensure that the anchors are representative and reflect anchors in real-life scenarios, we asked ‘end-users’ to select anchors manually. These anchors are then provided to the participants of the evaluation, who return for each anchor a ranked list of relevant target video segments. The relevance of each target video segment is then assessed using a crowdsourcing approach (Amazon Mechanical Turk). Note that our definition of relevance here is that the content in the target video should be about what is represented in the anchor video, not merely visually similar to it.
The video hyperlinking task at TRECVid had 10 participants this year, submitting 40 runs in total. To measure the performance of systems we used an adapted version of Mean Average Precision (MAP). MAP indicates the quality of a system by computing, for each query, the average of the precision values at the ranks where relevant documents appear in the list the system returns, and then taking the mean over all queries. A returned target video segment is regarded as relevant when it overlaps with a ground-truth segment as defined by human assessors. For the VH evaluation a further adapted measure (MAiSP) is also used, which takes the amount of overlap in seconds into account to avoid overestimating performance when many short segments are returned that all overlap with one larger relevant segment. The best performing systems reach a MAiSP score of just above 0.25, which is not very high but, given the difficulty of the task, a reasonable starting point for further exploration and improvement.
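To make the scoring idea concrete, here is a minimal Python sketch of plain MAP with overlap-based relevance. It is not the adapted MAP or the MAiSP measure used in the evaluation: the segment representation, the function names and the `count_once` switch are all assumptions made for illustration. The switch is only a crude stand-in for the overestimation problem described above, letting each ground-truth segment be credited at most once.

```python
from typing import Dict, List, Tuple

# Hypothetical segment representation: (video_id, start_sec, end_sec).
Segment = Tuple[str, float, float]


def overlaps(a: Segment, b: Segment) -> bool:
    """True when both segments belong to the same video and share some time."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def average_precision(ranked: List[Segment],
                      relevant: List[Segment],
                      count_once: bool = False) -> float:
    """Average of the precision values at each rank holding a relevant result.

    A result is relevant when it overlaps a ground-truth segment. With
    count_once=True every ground-truth segment can be credited only once,
    an illustrative simplification and not the MAiSP definition.
    """
    if not relevant:
        return 0.0
    credited = set()       # indices of ground-truth segments already credited
    hits = 0
    precision_sum = 0.0
    for rank, result in enumerate(ranked, start=1):
        matches = [i for i, gt in enumerate(relevant) if overlaps(result, gt)]
        if count_once:
            matches = [i for i in matches if i not in credited]
        if matches:
            credited.add(matches[0])
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[Segment]],
                           truth: Dict[str, List[Segment]],
                           count_once: bool = False) -> float:
    """Mean of the per-anchor average precision over all anchors in truth."""
    if not truth:
        return 0.0
    scores = [average_precision(runs.get(anchor, []), relevant, count_once)
              for anchor, relevant in truth.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # One long relevant segment, three short returned segments inside it.
    truth = {"anchor_1": [("v1", 0.0, 120.0)]}
    runs = {"anchor_1": [("v1", 0.0, 10.0), ("v1", 20.0, 30.0), ("v1", 40.0, 50.0)]}
    print(mean_average_precision(runs, truth))                   # naive overlap
    print(mean_average_precision(runs, truth, count_once=True))  # credited once
```

In the example at the bottom, the naive variant credits all three short segments against the same relevant segment and produces a score above 1, whereas the credit-once variant does not. A measure that weighs the overlap in seconds tackles the same problem in a finer-grained way.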
During the workshop we also discussed a number of topics that need to be addressed in order to reach a better understanding of the task, its underlying theoretical framework and its technical challenges. One topic that needs to be defined better is the notion of relevance, especially with respect to similarity and ‘aboutness’. The current definition –a link target should be about what is represented in an anchor– is not sufficiently clear and gives rise to questions about the exact goal of the task. Also, as emphasised specifically by the participating IRISA team, the evaluation should be able to take the diversity of relevant targets into account. This topic is connected with the discussion on the use of additional measures, for example more precision-oriented ones, and the introduction of subtasks in which information on the expected target video is included with the anchor video. Finally, we discussed how the video hyperlinking task can be defined in terms of more generic video retrieval problems, as this can help to make the task more interesting for neighbouring research fields in the video retrieval domain. For example, video hyperlinking could be regarded as a video retrieval task that takes multimodal documents –a video segment, a text document, a mixture of video and text, etc.– as input, which need to be processed into a query formulation that a search system can use to retrieve related video documents.
The discussion on these topics will continue, especially in the upcoming weeks, when the task organizers will be analysing the results of this year’s evaluation in more detail. Of course we are very interested in your feedback, comments and suggestions, so that we can come up with an improved evaluation set-up for next year. In the meantime, stay tuned at videohyperlinking.com!
The introduction slides of the Video Hyperlinking task can be found here: TV15_VH_final