Supported Challenge Dataset

The organizers have upgraded the 2020 HLVU dataset, previously composed of ten Creative Commons licensed movies used in the first edition of the challenge. In 2021, an additional set of Creative Commons movies will be added. Full ground truth as annotated by human annotators (at the whole-movie level as well as at the scene level) will be released for the 10 HLVU movies, which systems can use as a training set. The DVU Challenge will also complement the 2020 relationship ontology by merging in and extending the vocabulary used by MovieGraphs (Vicol, Paul, et al. "MovieGraphs: Towards Understanding Human-Centric Situations from Videos." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018) to include character interactions, scene locations, and sentiments.

If you make use of the DVU dataset or otherwise participate in the challenge, please cite this paper using the following BibTeX:

@inproceedings{curtis2020hlvu,
  title={HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do},
  author={Curtis, Keith and Awad, George and Rajput, Shahzad and Soboroff, Ian},
  booktitle={Proceedings of the 2020 International Conference on Multimedia Retrieval},
  pages={355--361},
  year={2020}
}

Movie/TV Dataset

The full Deep Video Understanding training set is available from www-nlpir.nist.gov/projects/trecvid/dvu/training/ . This training set has been annotated by human assessors, and final ground truth has been provided to participating researchers for training and development of their systems, both at the overall movie level (ontology of relations, entities, actions and events; Knowledge Graph; and names and images of all main characters) and at the individual scene level (ontology of locations, scene textual summaries, interactions between characters, character emotional states, and scene sentiments). Full details of these movies are provided below:


Training dataset:

  1. Honey - Romance - 86 mins.

  2. Let's bring back Sophie - Drama - 50 mins.

  3. Nuclear Family - Drama - 28 mins.

  4. Shooters - Drama - 41 mins.

  5. Spiritual Contact The Movie - Fantasy - 66 mins.

  6. Super Hero - Fantasy - 18 mins.

  7. The Adventures of Huckleberry Finn - Adventure - 106 mins.

  8. The Big Something - Comedy - 101 mins.

  9. Time Expired - Comedy / Drama - 92 mins.

  10. Valkaama - Adventure - 93 mins.

Testing dataset:

  1. Bagman - Drama / Thriller - 107 mins.

  2. Manos - Horror - 73 mins.

  3. Road to Bali - Comedy / Musical - 90 mins.

  4. The Illusionist - Adventure / Drama - 109 mins.

Resources by participating teams

Automatically generated transcripts by the University of Zurich are available from HERE. Please cite the team's 2020 system paper: https://dl.acm.org/doi/10.1145/3394171.3416292

Speech and person/face bounding-box annotations for a subset of the HLVU dataset are available from the TokyoTech team HERE. Please cite the team's 2020 system paper: https://dl.acm.org/doi/abs/10.1145/3395035.3425639


Scene annotations and resources by Nanjing University are available from HERE together with a README file. Please cite the team's 2020 system paper: https://dl.acm.org/doi/10.1145/3394171.3416303


Movie Level

Query Types


  1. Fill in the graph space: Fill in spaces in the Knowledge Graph (KG). Given the listed relationships, events, or actions for certain nodes, where some nodes are replaced by variables X, Y, etc., solve for X, Y, etc. Example from The Simpsons: X Married To Marge. X Friend Of Lenny. Y Volunteers At Church. Y Neighbor Of X. The solution for X and Y in that case would be: X = Homer, Y = Ned Flanders.

  2. Question Answering: This query type consists of questions about the resulting KG of each movie in the dataset, including its actions and events. For example, we may ask 'How many children does Person A have?', in which case participating researchers should count the 'Parent Of' relationships Person A has in the Knowledge Graph. These are multiple-choice questions.

  3. Relations between characters: How is character X related to character Y? This query type asks participants for all routes through the KG from one person to another (see the path-enumeration sketch after this list). The main objective of this query type is to test the quality of the established KG: if the system managed to build a representative KG reflecting the real storyline of the movie, then it should be able to return all valid paths between characters, including the shortest (i.e., how they are related to each other).
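As an illustration of the path-based query type, the sketch below builds a toy knowledge graph and enumerates all simple paths between two characters with networkx. This is a minimal sketch under assumed data: the toy graph contents and the "relation" edge attribute are illustrative, not the official KG format.

# A minimal path-enumeration sketch, assuming the KG is held as a
# networkx graph with a "relation" attribute on each edge (an
# illustrative layout, not the official KG format).
import networkx as nx

kg = nx.Graph()
kg.add_edge("Homer", "Marge", relation="Married To")
kg.add_edge("Homer", "Lenny", relation="Friend Of")
kg.add_edge("Bart", "Homer", relation="Child Of")
kg.add_edge("Bart", "Springfield Elementary", relation="Studies At")
kg.add_edge("Superintendent Chalmers", "Springfield Elementary", relation="Superintendent At")

# Query type 3: list every simple path between two characters.
for path in nx.all_simple_paths(kg, source="Superintendent Chalmers", target="Lenny"):
    hops = [f'{a} --[{kg[a][b]["relation"]}]--> {b}' for a, b in zip(path, path[1:])]
    print(" ; ".join(hops))

The same structure supports query type 1: an unknown node can be solved for by scanning candidate nodes whose incident edges match all of the listed relationships.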


Movie Level

Metrics


  1. Results will be treated as a ranked list of result items for each unknown variable; a Reciprocal Rank score will be calculated per unknown variable, and the Mean Reciprocal Rank (MRR) per query (a scoring sketch follows this list).

  2. Scores for this query type will be calculated as the number of correct answers divided by the total number of questions.

  3. In this query type, systems are asked to submit all valid paths from a source node to a target node, with the goal of maximizing recall and precision. Each submitted path will be evaluated for validity (i.e., whether the submitted order of nodes and edges forms a path from the source person to the target person of the query), and finally recall, precision, and F1 measures will be reported.
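A minimal scoring sketch for these metrics, assuming the gold answers and system output are already loaded into plain Python structures (the data layout is an assumption, not the official evaluation code):

# Scoring sketch; the data layout is illustrative, not the official code.

def mean_reciprocal_rank(rankings, gold):
    # rankings: {variable: ranked candidate list}; gold: {variable: answer}
    scores = []
    for var, ranked in rankings.items():
        if gold[var] in ranked:
            scores.append(1.0 / (ranked.index(gold[var]) + 1))
        else:
            scores.append(0.0)  # correct answer never returned
    return sum(scores) / len(scores)

def path_precision_recall_f1(submitted, valid):
    # submitted / valid: sets of paths, each path a tuple of node names
    hits = len(submitted & valid)
    p = hits / len(submitted) if submitted else 0.0
    r = hits / len(valid) if valid else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Gold X = "Homer"; the system ranked Homer second, so RR = 0.5.
print(mean_reciprocal_rank({"X": ["Apu", "Homer", "Moe"]}, {"X": "Homer"}))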


Scene Level

Query Types


  1. Find the Unique Scene: Given a full, inclusive list of interactions unique to a specific scene in the movie, teams should find which scene this is.

  2. Fill in the graph space: Find the person in a specific scene with the following attributes and interactions with others. Participating teams will be given a scene number, a list of person attributes, and a list of interactions to and from other people. Teams should find the only person in that scene with those attributes and interactions.

  3. Find next or previous interaction: Given a specific scene and a specific interaction between person X and person Y, participants will be asked to return either the previous or the next interaction, in either direction, between person X and person Y. This can be the next or previous interaction within the same scene, or over the entire movie. These will be multiple-choice questions selected from a list of possible interactions, only one of which will be correct.

  4. Find the 1-to-1 relationship between scenes and natural language descriptions: Given a set of scenes and a set of natural language descriptions of movie scenes, match the correct natural language description to each scene (a matching sketch follows this list).

  5. Classify scene sentiment for a given scene: Given a specific movie scene and a set of possible sentiments, classify the correct sentiment label for each given scene.
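One way to approach query type 4 is to score every (scene, description) pair and solve the resulting assignment problem so that each scene receives exactly one description. The sketch below does this with scipy's Hungarian-algorithm solver; the similarity function here is a placeholder assumption, to be replaced by a real scene-text scorer.

# 1-to-1 scene/description matching as an assignment problem.
# `similarity` is a placeholder; plug in a real scene-text scorer.
import numpy as np
from scipy.optimize import linear_sum_assignment

def similarity(scene_id, description):
    return float(hash((scene_id, description)) % 100)  # stand-in score

scenes = [2, 6, 9, 16]
descriptions = [
    "Homer shouts at the TV.",
    "Bart skips school.",
    "Marge cooks dinner.",
    "Lisa plays the saxophone.",
]

# Negate similarities because linear_sum_assignment minimizes cost.
cost = np.array([[-similarity(s, d) for d in descriptions] for s in scenes])
rows, cols = linear_sum_assignment(cost)
for r, c in zip(rows, cols):
    print(f"scene {scenes[r]} -> {descriptions[c]!r}")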


Scene Level

Metrics


  1. Results will be treated as a ranked list of result items for each unknown variable; a Reciprocal Rank score will be calculated per unknown variable, and the Mean Reciprocal Rank (MRR) per query.

  2. Results will be treated as a ranked list of result items for each unknown variable; a Reciprocal Rank score will be calculated per unknown variable, and the Mean Reciprocal Rank (MRR) per query.

  3. Scores for this query type will be calculated as the number of correct answers divided by the total number of questions.

  4. Scores for this query type will be calculated as the number of correct answers divided by the total number of questions.

  5. Scores for this query type will be calculated as the number of correct answers divided by the total number of questions.

Please see the sample XML response files for a movie-level run and a scene-level run. Please make sure your runs validate against the DTD files for both movie and scene query results. Two DTD files, for movie-level and scene-level results, are available: Movie-level DTD, Scene-level DTD.
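Before submitting, a run can be checked against the relevant DTD programmatically, for example with lxml; in this sketch "movie_level.dtd" and "my_run.xml" are placeholder file names for the downloaded DTD and your run file:

# DTD validation sketch; file names are placeholders.
from lxml import etree

with open("movie_level.dtd", "rb") as f:
    dtd = etree.DTD(f)
run = etree.parse("my_run.xml")
if dtd.validate(run):
    print("run validates against the movie-level DTD")
else:
    print(dtd.error_log.filter_from_errors())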

A README file describing the query setup is available from HERE.

Sample Queries and Responses (Movie-level)

Relational paths between characters:

  • Sample Query:

<DeepVideoUnderstandingTopicQuery question="1" id="1">
  <item source="Superintendent Chalmers" target="Lenny"/>
  <item description="List all possible paths between Superintendent Chalmers and Lenny"/>
</DeepVideoUnderstandingTopicQuery>

  • Sample Response:

<DeepVideoUnderstandingTopicResult question="1" id="1" path="1">
  <item source="Superintendent Chalmers" relation="Superintendent At" target="Springfield Elementary"/>
  <item source="Springfield Elementary" relation="Studied At By" target="Bart"/>
  <item source="Bart" relation="Child_of" target="Homer"/>
  <item source="Homer" relation="Friend_of" target="Lenny"/>
</DeepVideoUnderstandingTopicResult>

Fill in the graph space:

  • Sample Query:

<DeepVideoUnderstandingTopicQuery question="2" id="1">
  <item subject="Person:Unknown_1" predicate="Relation:Spouse Of" object="Person:Marge"/>
  <item subject="Person:Unknown_1" predicate="Relation:Parent Of" object="Person:Bart"/>
  <item subject="Person:Unknown_1" predicate="Relation:Parent Of" object="Person:Lisa"/>
  <item subject="Person:Unknown_1" predicate="Relation:Friend Of" object="Person:Lenny"/>
  <item subject="Person:Unknown_1" predicate="Relation:Friend Of" object="Person:Barnie"/>
  <item subject="Person:Unknown_1" predicate="Relation:Socialises At" object="Entity:Moe's Tavern"/>
  <item subject="Person:Unknown_1" predicate="Relation:Works At" object="Entity:Nuclear Power Plant"/>
  <item subject="Person:Unknown_1" predicate="Relation:Attends" object="Entity:[BLANK]"/>
  <item description="Which Person has the following Relations: Spouse Of Person:Marge, Parent Of Person:Bart, Parent Of Person:Lisa, Friend Of Person:Lenny, Friend Of Person:Barnie, Socialises At Entity:Moe's Tavern, Works At Entity:Nuclear Power Plant, Attends Entity:[BLANK]?"/>
</DeepVideoUnderstandingTopicQuery>

<DeepVideoUnderstandingTopicQuery question="2" id="2">
  <item subject="Person:Unknown_2" predicate="Relation:Superintendent At" object="Entity:Springfield Elementary"/>
  <item subject="Person:Unknown_2" predicate="Relation:Supervisor Of" object="Person:Principal Skinner"/>
  <item subject="Person:Unknown_2" predicate="Relation:Attends" object="Entity:Church"/>
  <item description="Which Person has the following Relations: Superintendent At Entity:Springfield Elementary, Supervisor Of Person:Principal Skinner, Attends Entity:Church?"/>
</DeepVideoUnderstandingTopicQuery>

  • Sample Response:

<DeepVideoUnderstandingTopicResult question="2" id="1">
  <item order="1" subject="Homer" confidence="64"/>
  <item order="2" subject="Apu" confidence="18"/>
  <item order="3" subject="Flanders" confidence="12"/>
  <item order="4" subject="Reverend Lovejoy" confidence="6"/>
</DeepVideoUnderstandingTopicResult>

<DeepVideoUnderstandingTopicResult question="2" id="2">
  <item order="1" subject="Superintendent Chalmers" confidence="92"/>
  <item order="2" subject="Agnes Skinner" confidence="8"/>
</DeepVideoUnderstandingTopicResult>

Question Answering:

  • Sample Query:

<DeepVideoUnderstandingTopicQuery question="3" id="1">
  <item subject="Person:Ms. Krabappel" predicate="Relation:Unknown_1" object="Entity:Springfield Elementary"/>
  <item description="What is the relation / connection from Ms. Krabappel to Springfield Elementary?"/>
  <Answers>
    <item type="Entity" answer="Attends"/>
    <item type="Entity" answer="Teacher At"/>
    <item type="Entity" answer="Owns"/>
    <item type="Entity" answer="Studies At"/>
    <item type="Entity" answer="Principal At"/>
    <item type="Entity" answer="Friend Of"/>
  </Answers>
</DeepVideoUnderstandingTopicQuery>

  • Sample Response:

<DeepVideoUnderstandingTopicResult question="3" id="1">
  <item type="Relation" answer="Teacher_At"/>
</DeepVideoUnderstandingTopicResult>
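A result file in the format above can be emitted with the Python standard library; the element and attribute names follow the samples on this page, while the ranked candidates below are illustrative data:

# Emits a ranked-list result in the sample format; candidates are
# illustrative data, not real system output.
import xml.etree.ElementTree as ET

result = ET.Element("DeepVideoUnderstandingTopicResult", question="2", id="1")
ranked = [("Homer", 64), ("Apu", 18), ("Flanders", 12)]
for order, (name, conf) in enumerate(ranked, start=1):
    ET.SubElement(result, "item", order=str(order), subject=name, confidence=str(conf))
print(ET.tostring(result, encoding="unicode"))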

Sample Queries and Responses (Scene-level)

Find the Unique Scene:

  • Sample Query:

<DeepVideoUnderstandingTopicQuery question="1" id="1">
  <item subject="Scene:Unknown_1" predicate="Interaction:orders"/>
  <item subject="Scene:Unknown_1" predicate="Interaction:talks to"/>
  <item subject="Scene:Unknown_1" predicate="Interaction:explains to"/>
  <item subject="Scene:Unknown_1" predicate="Interaction:threatens"/>
  <item description="Which Unique Scene contains the following Interactions: orders, talks to, explains to, threatens"/>
</DeepVideoUnderstandingTopicQuery>

  • Sample Response:

<DeepVideoUnderstandingTopicResult question="1" id="1">
  <item order="1" scene="24" confidence="72"/>
  <item order="2" scene="13" confidence="22"/>
  <item order="3" scene="2" confidence="4"/>
  <item order="4" scene="38" confidence="2"/>
</DeepVideoUnderstandingTopicResult>

Fill in the Graph Space:

  • Sample Query:

<DeepVideoUnderstandingTopicQuery question="2" id="1">
  <item subject="Person:Unknown_1" scene="10" predicate="Interaction:talks to" object="Target_Person:Homer"/>
  <item subject="Person:Unknown_1" scene="10" predicate="Interaction:asks" object="Target_Person:Apu"/>
  <item subject="Person:Unknown_1" scene="10" predicate="Interaction:asks" object="Target_Person:Homer"/>
  <item subject="Person:Unknown_1" scene="10" predicate="Interaction:embraces" object="Target_Person:Homer"/>
  <item subject="Person:Unknown_1" scene="10" predicate="Interaction:explains to" object="Source_Person:Homer"/>
  <item description="Which Person in scene 10 has the following Interactions: talks to Target_Person:Homer, asks Target_Person:Apu, asks Target_Person:Homer, embraces Target_Person:Homer, Source_Person:Homer explains to"/>
</DeepVideoUnderstandingTopicQuery>

  • Sample Response:

<DeepVideoUnderstandingTopicResult question="2" id="1">
  <item order="1" subject="Person:Lisa" confidence="63"/>
  <item order="2" subject="Person:Maggie" confidence="19"/>
  <item order="3" subject="Person:Bart" confidence="10"/>
  <item order="4" subject="Person:Marge" confidence="8"/>
</DeepVideoUnderstandingTopicResult>

What is the Next Interaction:

  • Sample Query:

<DeepVideoUnderstandingTopicQuery question="3" id="1">
  <item subject="Person:Homer" scene="7" predicate="Interaction:talks to" object="Person:Marge"/>
  <item description="In Scene 7, Homer talks to Marge. What is the immediate next / following interaction between Marge and Homer, in scene 19?"/>
  <Answers>
    <item type="Interaction" scene="19" answer="explains to"/>
    <item type="Interaction" scene="19" answer="yells at"/>
    <item type="Interaction" scene="19" answer="talks to"/>
    <item type="Interaction" scene="19" answer="kisses"/>
    <item type="Interaction" scene="19" answer="greets"/>
    <item type="Interaction" scene="19" answer="stops"/>
  </Answers>
</DeepVideoUnderstandingTopicQuery>

  • Sample Response:

<DeepVideoUnderstandingTopicResult question="3" id="1">
  <item type="Interaction" answer="talks to"/>
</DeepVideoUnderstandingTopicResult>

What is the Previous Interaction:

  • Sample Query:

<DeepVideoUnderstandingTopicQuery question="4" id="1">
  <item subject="Person:Nelson" scene="36" predicate="Interaction:hits" object="Person:Bart"/>
  <item description="In Scene 36, Nelson hits Bart. What is the immediate prior / previous interaction between Nelson and Bart, in scene 36?"/>
  <Answers>
    <item type="Interaction" scene="36" answer="yells at"/>
    <item type="Interaction" scene="36" answer="touches"/>
    <item type="Interaction" scene="36" answer="kisses"/>
    <item type="Interaction" scene="36" answer="watches"/>
    <item type="Interaction" scene="36" answer="talks to"/>
    <item type="Interaction" scene="36" answer="fights with"/>
  </Answers>
</DeepVideoUnderstandingTopicQuery>

  • Sample Response:

<DeepVideoUnderstandingTopicResult question="4" id="1">
  <item type="Interaction" answer="yells at"/>
</DeepVideoUnderstandingTopicResult>

Match scene with natural language description:

  • Sample Query:

<DeepVideoUnderstandingTopicQuery question="5" id="1">
  <item subject="Scene:Unknown" predicate="Description"/>
  <item description="Homer shouts at the TV. Marge shouts at Homer telling him to stop shouting."/>
  <Answers>
    <item type="Integer:Scene" answer="2"/>
    <item type="Integer:Scene" answer="6"/>
    <item type="Integer:Scene" answer="9"/>
    <item type="Integer:Scene" answer="16"/>
    <item type="Integer:Scene" answer="21"/>
    <item type="Integer:Scene" answer="28"/>
    <item type="Integer:Scene" answer="32"/>
    <item type="Integer:Scene" answer="36"/>
    <item type="Integer:Scene" answer="40"/>
    <item type="Integer:Scene" answer="44"/>
  </Answers>
</DeepVideoUnderstandingTopicQuery>

  • Sample Response:

<DeepVideoUnderstandingTopicResult question="5" id="1">
  <item type="Integer:Scene" answer="9"/>
</DeepVideoUnderstandingTopicResult>

Classify Sentiment Label for a Given Scene:

  • Sample Query:

<DeepVideoUnderstandingTopicQuery question="6" id="1">
  <item subject="Sentiment:Unknown" scene="18"/>
  <item description="In Scene 18, what is the correct sentiment label?"/>
  <Answers>
    <item type="Sentiment" answer="fight"/>
    <item type="Sentiment" answer="sexual harassment"/>
    <item type="Sentiment" answer="travel"/>
    <item type="Sentiment" answer="talking / conversation"/>
    <item type="Sentiment" answer="greeting"/>
    <item type="Sentiment" answer="dressing"/>
  </Answers>
</DeepVideoUnderstandingTopicQuery>

  • Sample Response:

<DeepVideoUnderstandingTopicResult question="6" id="1">
  <item type="Sentiment" answer="fight"/>
</DeepVideoUnderstandingTopicResult>
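Queries in the formats shown on this page can likewise be parsed with the Python standard library; the sketch below pulls the query items and the multiple-choice answer options out of a query file ("query.xml" is a placeholder file name):

# Parses a multiple-choice query in the sample format; "query.xml"
# is a placeholder file name.
import xml.etree.ElementTree as ET

root = ET.parse("query.xml").getroot()
print("question:", root.get("question"), "id:", root.get("id"))
for item in root.findall("item"):
    print("item:", item.attrib)
for ans in root.findall("Answers/item"):
    print("answer option:", ans.get("answer"))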