Today, 5 January 2017, we are discussing various aspects of the Video Hyperlinking benchmark, its evaluation task definition, and the assessment of its results at the poster session of the 23rd International Conference on MultiMedia Modeling in Reykjavík, Iceland.
As researchers in the area of Video Hyperlinking, we are interested in advancing the technology and in expanding general understanding of the video-to-video search process, both in terms of how the task is set up and in how potential end-users assess the technology's output. We turn to crowdsourcing because it enables us to gather information from a large and diverse group that is representative of these end-users. It also lets us test our hypothesis that multimodal information streams, used in combination, are prominent in anchor definition, and that this prominence converges across different users.
In 2016, we defined a 3-stage crowdsourcing evaluation framework:
- Stage 1 “Anchor Verification”:
- Goal: to verify the verbal-visual nature of the anchors defined by end-users. Specifically, to check whether the perceptions of other end-users align with those of the anchor creators.
- Stage 2 “Target Vetting”:
- Goal: to compare potential targets to the textual description of the anchors originally generated by the end-users.
- Stage 3 “Video-to-Video Relevance Analysis”:
- Goal: to collect descriptions of the relation between an anchor and a target that was retrieved as relevant.
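The three stages can be read as a pipeline of successive filters over anchor-target pairs: Stage 1 filters anchors, Stage 2 filters candidate targets, and Stage 3 collects free-text relevance descriptions for the pairs that survive. The sketch below is purely illustrative; all function names, field names, and agreement thresholds are our own assumptions for exposition, not part of the benchmark's actual implementation.

```python
# Illustrative sketch of the 3-stage crowdsourcing framework as a filter
# pipeline. Each "judgment" dict stands in for one crowdworker's HIT answer;
# the 0.5 agreement threshold is an assumed example value, not the real one.

def anchor_verified(judgments, min_agreement=0.5):
    """Stage 1 (Anchor Verification): keep an anchor only if enough
    crowdworkers confirm its verbal-visual (multimodal) nature."""
    agree = sum(1 for j in judgments if j["multimodal"])
    return agree / len(judgments) >= min_agreement

def target_vetted(judgments, min_agreement=0.5):
    """Stage 2 (Target Vetting): keep a candidate target only if enough
    crowdworkers judge it to match the anchor's textual description."""
    agree = sum(1 for j in judgments if j["matches_description"])
    return agree / len(judgments) >= min_agreement

def collect_relevance_descriptions(judgments):
    """Stage 3 (Video-to-Video Relevance Analysis): gather crowdworkers'
    free-text descriptions of the anchor-target relation."""
    return [j["description"] for j in judgments]
```

For example, an anchor judged multimodal by two out of three workers passes Stage 1 under a 0.5 threshold, and only its vetted targets would proceed to Stage 3.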
The details of all MTurk Human Intelligence Tasks (HITs) can be found in our GitHub repository.
To summarise our insights:
- Video-to-video linking allows users to explore collections without needing to explicitly query a video retrieval system.
- Multimodal anchors are useful for efficient video-to-video linking evaluation within the benchmark context:
- We assume that video creators want to draw extra attention to a segment when they use the audio and visual channels together.
- Focusing on multimodal anchors allows the descriptions given by crowdworkers to converge, because each individual viewer's perception of the videomaker's intent is more stable in this setup.
- Crowdworkers confirmed the multimodal nature of the anchors defined within our framework.
The framework and the results of the Stage 1 crowdsourcing are described in our paper “Multimodal Video-to-Video Linking: Turning to the Crowd for Insight and Evaluation”.