The DVU development and testing datasets and the 2022 queries are now available (please submit the data agreement to access the testing dataset)
The Deep Video Understanding (DVU) dataset is split into development data of 14 movies from the 2020-2021 editions of this challenge, released under Creative Commons licenses, and a new set of 10 movies licensed from the KinoLorberEdu platform. 4 of the 10 new movies will be added to the 14 development movies, while the remaining 6 will serve as the 2022 testing data. The challenge will continue to use the same ontology as the 2021 iteration, which drew on the MovieGraphs vocabulary (Vicol, Paul, et al. "MovieGraphs: Towards Understanding Human-Centric Situations from Videos." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018) and covers character attributes, interactions, scene locations, and situations.
If you make use of the DVU dataset or otherwise participate in the challenge, please cite the following paper using this BibTeX:
@inproceedings{curtis2020hlvu,
title={HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do},
author={Curtis, Keith and Awad, George and Rajput, Shahzad and Soboroff, Ian},
booktitle={Proceedings of the 2020 International Conference on Multimedia Retrieval},
pages={355--361},
year={2020}
}
The full Deep Video Understanding training set of 14 movies, with a total duration of about 17 hours, is available from this link. The 4 extra movies will be added in a few months, once their annotations are available. The training set has been annotated by human assessors, and final ground truth has been provided to participating researchers for training and development of their systems, both at the overall movie level (ontology of relations, entities, actions and events; Knowledge Graph; and names and images of all main characters) and at the individual scene level (ontology of locations, people/entities, interactions and their order between people, sentiments, and a text summary). Movie genres and durations are listed below:
Honey - Romance - 86 mins.
Let's bring back Sophie - Drama - 50 mins.
Nuclear Family - Drama - 28 mins.
Shooters - Drama - 41 mins.
Spiritual Contact The Movie - Fantasy - 66 mins.
Super Hero - Fantasy - 18 mins.
The Adventures of Huckleberry Finn - Adventure - 106 mins.
The Big Something - Comedy - 101 mins.
Time Expired - Comedy / Drama - 92 mins.
Valkaama - Adventure - 93 mins.
Bagman - Drama / Thriller - 107 mins.
Manos - Horror - 73 mins.
Road to Bali - Comedy / Musical - 90 mins.
The Illusionist - Adventure / Drama - 109 mins.
To request the testing dataset, each team (participant) needs to first sign and submit back to the organizers the data agreement form located HERE. After receiving the signed form, the organizers will send access information for the 2022 testing dataset (original movies, segmented scenes, and master scene reference table files). The same scene ontology (vocabulary classes) will be used to annotate the scenes in the testing dataset. The movie-level relationship ontology can be found here.
Automatically generated transcripts by the University of Zurich are available from HERE. Please cite the team's 2020 system paper: https://dl.acm.org/doi/10.1145/3394171.3416292
Speech and person/face bounding box annotations for a subset of the HLVU dataset are available from the TokyoTech team from HERE. Please cite the team's 2020 system paper: https://dl.acm.org/doi/abs/10.1145/3395035.3425639
Scene annotations and resources (for the 2020 DVU edition) by Nanjing University are available from HERE, together with a README file. Please cite the team's 2020 system paper: https://dl.acm.org/doi/10.1145/3394171.3416303
Resources by Nanjing University (updating the 2020 data resources and adding new resources for the 2021 testing dataset) are available from HERE. Please cite the team's paper if you make use of their tools and/or outputs: https://dl.acm.org/doi/10.1145/3474085.3479214
Relations between characters: How is character X related to character Y? This query type asks participants for all routes through the Knowledge Graph (KG) from one person to another. Its main objective is to test the quality of the established KG: if a system has built a representative KG reflecting the real storyline of the movie, it should be able to return all valid paths between characters (i.e., how they are related to each other), including the shortest.
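As an illustration only, the sketch below shows one way a system might enumerate such paths from a predicted KG, using the networkx library. The nodes and relation labels are copied from the Simpsons sample query and response further down this page; they are not part of the dataset.

import networkx as nx

# Toy predicted KG; edge data carries the relation label.
# Contents copied from the question 1 sample response below.
kg = nx.DiGraph()
kg.add_edge("Superintendent Chalmers", "Springfield Elementary", relation="Superintendent At")
kg.add_edge("Springfield Elementary", "Bart", relation="Studied At By")
kg.add_edge("Bart", "Homer", relation="Child_of")
kg.add_edge("Homer", "Lenny", relation="Friend_of")

def all_relation_paths(graph, source, target):
    # Yield each simple path as a list of (subject, relation, object) triples.
    for nodes in nx.all_simple_paths(graph, source, target):
        yield [(a, graph.edges[a, b]["relation"], b) for a, b in zip(nodes, nodes[1:])]

for path in all_relation_paths(kg, "Superintendent Chalmers", "Lenny"):
    print(path)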
Fill in the graph space: Fill in the blanks in the Knowledge Graph (KG). Given the listed relationships, events, or actions for certain nodes, where some nodes are replaced by variables X, Y, etc., solve for X, Y, etc. Example from The Simpsons: X Married To Marge; X Friend Of Lenny; Y Volunteers At Church; Y Neighbor Of X. The solution for X and Y in that case would be X = Homer, Y = Ned Flanders.
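For illustration, this worked example can be solved against a predicted KG stored as (subject, relation, object) triples, as in the minimal sketch below; all names and relations come from the Simpsons example above.

# Toy predicted KG as (subject, relation, object) triples.
kg = {
    ("Homer", "Married To", "Marge"),
    ("Homer", "Friend Of", "Lenny"),
    ("Ned Flanders", "Volunteers At", "Church"),
    ("Ned Flanders", "Neighbor Of", "Homer"),
}
people = {s for s, _, _ in kg}

def satisfies(candidate, constraints):
    # True if the candidate matches every (relation, object) constraint.
    return all((candidate, rel, obj) in kg for rel, obj in constraints)

# Solve for X: X Married To Marge, X Friend Of Lenny.
x = [p for p in people if satisfies(p, [("Married To", "Marge"), ("Friend Of", "Lenny")])]
# Solve for Y given X: Y Volunteers At Church, Y Neighbor Of X.
y = [p for p in people if x and satisfies(p, [("Volunteers At", "Church"), ("Neighbor Of", x[0])])]
print(x, y)  # -> ['Homer'] ['Ned Flanders']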
Question Answering: This query type poses questions over the resulting KG, including actions and events, for the movies in the described dataset. For example, we may ask 'How many children does Person A have?', in which case participating researchers should count the 'Parent Of' relationships Person A has in the Knowledge Graph. These are multiple-choice questions.
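Under the same illustrative triple representation, the 'How many children' example reduces to counting matching triples:

def count_children(kg, person):
    # kg: set of (subject, relation, object) triples from a predicted graph.
    return sum(1 for s, r, o in kg if s == person and r == "Parent Of")

print(count_children({("Homer", "Parent Of", "Bart"), ("Homer", "Parent Of", "Lisa")}, "Homer"))  # -> 2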
For the relations between characters query, systems are asked to submit all valid paths from the source node to the target node, with the goal of maximizing recall and precision. Each submitted path will be judged on whether it is valid (i.e., the submitted order of nodes and edges forms a path from the source person to the target person of the query), and recall, precision, and F1 measures will be reported.
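The sketch below illustrates this scoring under the assumption that each path is submitted as a chain of (subject, relation, object) triples and is judged against the ground-truth graph; the official judging procedure is authoritative.

def is_valid_path(path, gt_triples, source, target):
    # A path is valid if it starts at the source, ends at the target,
    # chains node to node, and every triple exists in the ground truth.
    if not path or path[0][0] != source or path[-1][2] != target:
        return False
    chains = all(a[2] == b[0] for a, b in zip(path, path[1:]))
    return chains and all(t in gt_triples for t in path)

def precision_recall_f1(n_valid_submitted, n_submitted, n_gt_paths):
    p = n_valid_submitted / n_submitted if n_submitted else 0.0
    r = n_valid_submitted / n_gt_paths if n_gt_paths else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1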
For the fill in the graph space query, results will be treated as a ranked list of result items per unknown variable; a Reciprocal Rank score will be calculated per unknown variable and a Mean Reciprocal Rank (MRR) per query.
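For concreteness, a minimal sketch of the Reciprocal Rank and MRR computation, assuming one ranked list per unknown variable (as in the question 2 sample responses below):

def reciprocal_rank(ranked, correct):
    # 1/rank of the correct item, or 0.0 if it was not returned.
    return 1.0 / (ranked.index(correct) + 1) if correct in ranked else 0.0

def mean_reciprocal_rank(per_variable):
    # per_variable: one (ranked_list, correct_answer) pair per unknown variable.
    return sum(reciprocal_rank(r, c) for r, c in per_variable) / len(per_variable)

# The question 2 sample response below ranks Homer first, so its RR is 1.0.
print(mean_reciprocal_rank([(["Homer", "Apu", "Flanders", "Reverend Lovejoy"], "Homer")]))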
For the question answering query, scores will be calculated as the number of correct answers divided by the total number of questions.
Find the Unique Scene: Given a full, inclusive list of interactions unique to a specific scene in the movie, teams should identify which scene it is.
Fill in the graph space: Find the person in a specific scene with given attributes and interactions with others. Participating teams will be given a scene number, a list of person attributes, and a list of interactions to and from other people. Teams should find the only person in that scene with those attributes and interactions.
Find next or previous interaction: Given a specific scene and a specific interaction between person X and person Y, participants will be asked to return either the previous or the next interaction between person X and person Y, in either direction. This can be the next or previous interaction within the same scene, or over the entire movie. These are multiple-choice questions selected from a list of possible interactions, only one of which is correct.
Find the 1-to-1 relationship between scenes and natural language descriptions: Given a set of scenes and a set of natural language descriptions of movie scenes, match the correct description to each scene.
Classify scene sentiment: Given a specific movie scene and a set of possible sentiments, select the correct sentiment label for the scene.
For the first two query types (find the unique scene and fill in the graph space), results will be treated as a ranked list of result items per unknown variable; a Reciprocal Rank score will be calculated per unknown variable and a Mean Reciprocal Rank (MRR) per query.
For the remaining multiple-choice query types (find next or previous interaction, scene/description matching, and sentiment classification), scores will be calculated as the number of correct answers divided by the total number of questions.
A new query to aid AI explainability has been added to this year's challenge. Some relationship questions in the movie-level challenge and some interaction questions in the scene-level challenge carry an additional explainability context. For movie-level queries, this simply asks for the relevant scene number to be supplied as evidence. For scene-level queries, it asks for the start and end times of the relevant interaction within the scene, in seconds elapsed since the beginning of the scene. In terms of scoring and evaluation, this explainability element will be judged manually and entirely separately from the official challenge scoring metrics, as it is only used to inspect the quality of the evidence provided by systems. Please see the example provided below, and refer to the queries README FILE for more details.
Teams can submit to the movie-level queries only, the scene-level queries only, or both.
For movie-level queries, teams can choose to submit results to either or both groups of:
Group 1 - Question 1 (relations between characters)
Group 2 - Questions 2 & 3 (fill in the graph space and question answering)
For scene-level queries, teams can choose to submit results to either or both groups of:
Group 1 - Questions 1, 2, & 3 (all related to interactions between characters)
Group 2 - Questions 4 & 5 (matching of scenes and their descriptions in text, and sentiment classification)
Please see the sample XML response files for a movie-level run and a scene-level run. Please make sure your runs validate against the DTD files for both movie and scene query results. Two DTD files, for movie-level and scene-level results, are available: Movie-level DTD, Scene-level DTD.
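Runs can be checked against a DTD locally before submission; the sketch below uses the lxml library, with placeholder file names standing in for the published DTD and your own run file.

from lxml import etree

# Placeholder file names: substitute the movie-level or scene-level DTD
# and your run file.
dtd = etree.DTD(open("movie_level.dtd", "rb"))
run = etree.parse("my_movie_run.xml")

if dtd.validate(run):
    print("Run validates against the DTD.")
else:
    for error in dtd.error_log.filter_from_errors():
        print(error)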
Sample Query:
<DeepVideoUnderstandingTopicQuery question="1" id="1">
<item source="Superintendent Chalmers" target="Lenny"/>
<item description="List all possible paths between Superintendent Chalmers and Lenny"/>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="1" id="1" path="1">
<item source="Superintendent Chalmers" relation="Superintendent At" target="Springfield Elementary"/>
<item source="Springfield Elementary" relation="Studied At By" target="Bart"/>
<item source="Bart" relation="Child_of" target="Homer"/>
<item source="Homer" relation="Friend_of" target="Lenny"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="2" id="1">
<item subject="Person:Unknown_1" predicate="Relation:Spouse Of" object="Person:Marge"/>
<item subject="Person:Unknown_1" predicate="Relation:Parent Of" object="Person:Bart"/>
<item subject="Person:Unknown_1" predicate="Relation:Parent Of" object="Person:Lisa"/>
<item subject="Person:Unknown_1" predicate="Relation:Friend Of" object="Person:Lenny"/>
<item subject="Person:Unknown_1" predicate="Relation:Friend Of" object="Person:Barnie"/>
<item subject="Person:Unknown_1" predicate="Relation:Socialises At" object="Entity:Moe's Tavern"/>
<item subject="Person:Unknown_1" predicate="Relation:Works At" object="Entity:Nuclear Power Plant"/>
<item subject="Person:Unknown_1" predicate="Relation:Attends" object="Entity:[BLANK]"/>
<item description="Which Person has the following Relations: Spouse Of Person:Marge, Parent Of Person:Bart, Parent Of Person:Lisa, Friend Of Person:Lenny, Friend Of Person:Barnie, Socialises At Entity:Moe's Tavern, Works At Entity:Nuclear Power Plant, Attends Entity:[BLANK]?"/>
</DeepVideoUnderstandingTopicQuery>
<DeepVideoUnderstandingTopicQuery question="2" id="2">
<item subject="Person:Unknown_2" predicate="Relation:Superintendent At" object="Entity:Springfield Elementary"/>
<item subject="Person:Unknown_2" predicate="Relation:Supervisor Of" object="Person:Principal Skinner"/>
<item subject="Person:Unknown_2" predicate="Relation:Attends" object="Entity:Church"/>
<item description="Which Person has the following Relations: Superintendent At Entity:Springfield Elementary, Supervisor Of Person:Principal Skinner, Attends Entity:Church?"/>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="2" id="1">
<item order="1" subject="Homer" confidence="64"/>
<item order="2" subject="Apu" confidence="18"/>
<item order="3" subject="Flanders" confidence="12"/>
<item order="4" subject="Reverend Lovejoy" confidence="6"/>
</DeepVideoUnderstandingTopicResult>
<DeepVideoUnderstandingTopicResult question="2" id="2">
<item order="1" subject="Superintendent Chalmers" confidence="92"/>
<item order="2" subject="Agnes Skinner" confidence="8"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="3" id="1">
<item subject="Person:Ms. Krabappel" predicate="Relation:Unknown_1" object="Entity:Springfield Elementary"/>
<item description="What is the relation / connection from Ms. Krabappel to Springfield Elementary?"/>
<Answers>
<item type="Entity" answer="Attends" scene=""/>
<item type="Entity" answer="Teacher At" scene=""/>
<item type="Entity" answer="Owns" scene=""/>
<item type="Entity" answer="Studies At" scene=""/>
<item type="Entity" answer="Principal At" scene=""/>
<item type="Entity" answer="Friend Of" scene=""/>
</Answers>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="3" id="1">
<item type="Relation" answer="Teacher_At" scene="13"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="1" id="1">
<item subject="Scene:Unknown_1" predicate="Interaction:orders" />
<item subject="Scene:Unknown_1" predicate="Interaction:talks to"/>
<item subject="Scene:Unknown_1" predicate="Interaction:explains to" />
<item subject="Scene:Unknown_1" predicate="Interaction:threatens" />
<item description="Which Unique Scene contains the following Interactions: orders, talks to, explains to, threatens"/>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="1" id="1">
<item order="1" scene="24" confidence="72"/>
<item order="2" scene="13" confidence="22"/>
<item order="3" scene="2" confidence="4"/>
<item order="4" scene="38" confidence="2"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="2" id="1">
<item subject="Person:Unknown_1" scene="10" predicate="Interaction:talks to" object="Target_Person:Homer"/>
<item subject="Person:Unknown_1" scene="10" predicate="Interaction:asks" object="Target_Person:Apu"/>
<item subject="Person:Unknown_1" scene="10" predicate="Interaction:asks" object="Target_Person:Homer"/>
<item subject="Person:Unknown_1" scene="10" predicate="Interaction:embraces" object="Target_Person:Homer"/>
<item subject="Person:Unknown_1" scene="10" predicate="Interaction:explains to" object="Source_Person:Homer"/>
<item description="Which Person in scene 10 has the following Interactions: talks to Target_Person:Homer, asks Target_Person:Apu, asks Target_Person:Homer, embraces Target_Person:Homer, Source_Person:Homer explains to"/>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="2" id="1">
<item order="1" subject="Person:Lisa" confidence="63"/>
<item order="2" subject="Person:Maggie" confidence="19"/>
<item order="3" subject="Person:Bart" confidence="10"/>
<item order="4" subject="Person:Marge" confidence="8"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="3" id="1">
<item subject="Person:Homer" scene="7" predicate="Interaction:talks to" object="Person:Marge"/>
<item description="In Scene 7, Homer talks to Marge. What is the immediate next / following interaction between Marge and Homer, in scene 19?"/>
<Answers>
<item type="Interaction" scene="19" answer="explains to" start_time="" end_time=""/>
<item type="Interaction" scene="19" answer="yells at" start_time="" end_time=""/>
<item type="Interaction" scene="19" answer="talks to" start_time="" end_time=""/>
<item type="Interaction" scene="19" answer="kisses" start_time="" end_time=""/>
<item type="Interaction" scene="19" answer="greets" start_time="" end_time=""/>
<item type="Interaction" scene="19" answer="stops" start_time="" end_time=""/>
</Answers>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="3" id="1">
<item type="Interaction" answer="talks to" start_time="23" end_time="29"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="4" id="1">
<item subject="Person:Nelson" scene="36" predicate="Interaction:hits" object="Person:Bart"/>
<item description="In Scene 36, Nelson hits Bart. What is the immediate prior / previous interaction between Nelson and Bart, in scene 36?"/>
<Answers>
<item type="Interaction" scene="36" answer="yells at" start_time="" end_time=""/>
<item type="Interaction" scene="36" answer="touches" start_time="" end_time=""/>
<item type="Interaction" scene="36" answer="kisses" start_time="" end_time=""/>
<item type="Interaction" scene="36" answer="watches" start_time="" end_time=""/>
<item type="Interaction" scene="36" answer="talks to" start_time="" end_time=""/>
<item type="Interaction" scene="36" answer="fights with" start_time="" end_time=""/>
</Answers>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="4" id="1">
<item type="Interaction" answer="yells at" start_time="5" end_time="7"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="5" id="1">
<item subject="Scene:Unknown" predicate="Description"/>
<item description="Homer shouts at the TV. Marge shouts at Homer telling him to stop shouting."/>
<Answers>
<item type="Integer:Scene" answer="2"/>
<item type="Integer:Scene" answer="6"/>
<item type="Integer:Scene" answer="9"/>
<item type="Integer:Scene" answer="16"/>
<item type="Integer:Scene" answer="21"/>
<item type="Integer:Scene" answer="28"/>
<item type="Integer:Scene" answer="32"/>
<item type="Integer:Scene" answer="36"/>
<item type="Integer:Scene" answer="40"/>
<item type="Integer:Scene" answer="44"/>
</Answers>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="5" id="1">
<item type="Integer:Scene" answer="9"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="6" id="1">
<item subject="Sentiment:Unknown" scene="18"/>
<item description="In Scene 18, What is the correct sentiment label?"/>
<Answers>
<item type="Sentiment" answer="fight"/>
<item type="Sentiment" answer="sexual harassment"/>
<item type="Sentiment" answer="travel"/>
<item type="Sentiment" answer="talking / conversation"/>
<item type="Sentiment" answer="greeting"/>
<item type="Sentiment" answer="dressing"/>
</Answers>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="6" id="1">
<item type="Sentiment" answer="fight"/>
</DeepVideoUnderstandingTopicResult>