The DVU development and testing datasets and the 2022 queries are now available (please submit the data agreement to access the testing dataset)
The Deep Video Understanding (DVU) dataset is split into development data of 14 movies from the 2020-2021 editions of this challenge, released under Creative Commons licenses, and a new set of 10 movies licensed from the KinoLorberEdu platform. 4 of the 10 new movies will be added to the 14 development movies, while the remaining 6 will serve as the 2022 testing data. The challenge will continue to use the same ontology as the 2021 iteration, which drew on the MovieGraphs vocabulary (Vicol, Paul, et al. "MovieGraphs: Towards Understanding Human-Centric Situations from Videos." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018) and covers character attributes, interactions, scene locations, and situations.
If you make use of the DVU dataset or otherwise participate in the challenge, please cite the following paper using this BibTeX:
@inproceedings{curtis2020hlvu,
title={HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do},
author={Curtis, Keith and Awad, George and Rajput, Shahzad and Soboroff, Ian},
booktitle={Proceedings of the 2020 International Conference on Multimedia Retrieval},
pages={355--361},
year={2020}
}
The full Deep Video Understanding training set of 14 movies, with a total duration of about 17 hours, is available from this link. The 4 extra movies will be added in a few months, once their annotations are available. The training set has been annotated by human assessors, and final ground truth has been provided to participating researchers for training and development of their systems, both at the overall movie level (ontology of relations, entities, actions and events; Knowledge Graph; and names and images of all main characters) and at the individual scene level (ontology of locations, people/entities, interactions and their order between people, sentiments, and a text summary). Movie genres and durations are listed below:
Honey - Romance - 86 mins.
Let's bring back Sophie - Drama - 50 mins.
Nuclear Family - Drama - 28 mins.
Shooters - Drama - 41 mins.
Spiritual Contact The Movie - Fantasy - 66 mins.
Super Hero - Fantasy - 18 mins.
The Adventures of Huckleberry Finn - Adventure - 106 mins.
The Big Something - Comedy - 101 mins.
Time Expired - Comedy / Drama - 92 mins.
Valkaama - Adventure - 93 mins.
Bagman - Drama / Thriller - 107 mins.
Manos - Horror - 73 mins.
Road to Bali - Comedy / Musical - 90 mins.
The Illusionist - Adventure / Drama - 109 mins.
To request the testing dataset, each team (participant) needs to first sign and submit back to the organizers the data agreement form located HERE. After receiving the signed form, the organizers will send access information for the 2022 testing dataset (original movies, segmented scenes, and master scene reference table files). The same scene ontology (vocabulary classes) will be used to annotate the scenes in the testing dataset. The movie-level relationship ontology can be found here.
Automatically generated transcripts by the University of Zurich are available from HERE. Please cite the team's 2020 system paper: https://dl.acm.org/doi/10.1145/3394171.3416292
Speech and person/face bounding box annotations for a subset of the HLVU dataset are available from the TokyoTech team from HERE. Please cite the team's 2020 system paper: https://dl.acm.org/doi/abs/10.1145/3395035.3425639
Scene annotations and resources (for the 2020 DVU edition) by Nanjing University are available from HERE, together with a README file. Please cite the team's 2020 system paper: https://dl.acm.org/doi/10.1145/3394171.3416303
Resources by Nanjing University (updating the 2020 data resources and adding new resources for the 2021 testing dataset) are available from HERE. Please cite the team's paper if you make use of their tools and/or outputs: https://dl.acm.org/doi/10.1145/3474085.3479214
Relations between characters: How is character X related to character Y? This query type asks participants for all routes through the Knowledge Graph (KG) from one person to another. Its main objective is to test the quality of the established KG: if a system has built a representative KG reflecting the real storyline of the movie, it should be able to return all valid paths between characters (i.e., how they are related to each other), including the shortest.
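As an illustration only, the sketch below shows one way a system might enumerate such paths from a predicted KG, using the networkx library. The nodes and relation labels are copied from the Simpsons sample query and response further down this page; they are not part of the dataset.

import networkx as nx

# Toy predicted KG; edge data carries the relation label.
# Contents copied from the question 1 sample response below.
kg = nx.DiGraph()
kg.add_edge("Superintendent Chalmers", "Springfield Elementary", relation="Superintendent At")
kg.add_edge("Springfield Elementary", "Bart", relation="Studied At By")
kg.add_edge("Bart", "Homer", relation="Child_of")
kg.add_edge("Homer", "Lenny", relation="Friend_of")

def all_relation_paths(graph, source, target):
    # Yield each simple path as a list of (subject, relation, object) triples.
    for nodes in nx.all_simple_paths(graph, source, target):
        yield [(a, graph.edges[a, b]["relation"], b) for a, b in zip(nodes, nodes[1:])]

for path in all_relation_paths(kg, "Superintendent Chalmers", "Lenny"):
    print(path)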
Fill in the graph space: Fill in the blanks in the Knowledge Graph (KG). Given the listed relationships, events, or actions for certain nodes, where some nodes are replaced by variables X, Y, etc., solve for X, Y, etc. Example from The Simpsons: X Married To Marge; X Friend Of Lenny; Y Volunteers At Church; Y Neighbor Of X. The solution for X and Y in that case would be X = Homer, Y = Ned Flanders.
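For illustration, this worked example can be solved against a predicted KG stored as (subject, relation, object) triples, as in the minimal sketch below; all names and relations come from the Simpsons example above.

# Toy predicted KG as (subject, relation, object) triples.
kg = {
    ("Homer", "Married To", "Marge"),
    ("Homer", "Friend Of", "Lenny"),
    ("Ned Flanders", "Volunteers At", "Church"),
    ("Ned Flanders", "Neighbor Of", "Homer"),
}
people = {s for s, _, _ in kg}

def satisfies(candidate, constraints):
    # True if the candidate matches every (relation, object) constraint.
    return all((candidate, rel, obj) in kg for rel, obj in constraints)

# Solve for X: X Married To Marge, X Friend Of Lenny.
x = [p for p in people if satisfies(p, [("Married To", "Marge"), ("Friend Of", "Lenny")])]
# Solve for Y given X: Y Volunteers At Church, Y Neighbor Of X.
y = [p for p in people if x and satisfies(p, [("Volunteers At", "Church"), ("Neighbor Of", x[0])])]
print(x, y)  # -> ['Homer'] ['Ned Flanders']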
Question Answering: This query type poses questions over the resulting KG, including actions and events, for the movies in the described dataset. For example, we may ask 'How many children does Person A have?', in which case participating researchers should count the 'Parent Of' relationships Person A has in the Knowledge Graph. These are multiple-choice questions.
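Under the same illustrative triple representation, the 'How many children' example reduces to counting matching triples:

def count_children(kg, person):
    # kg: set of (subject, relation, object) triples from a predicted graph.
    return sum(1 for s, r, o in kg if s == person and r == "Parent Of")

print(count_children({("Homer", "Parent Of", "Bart"), ("Homer", "Parent Of", "Lisa")}, "Homer"))  # -> 2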
For the relations between characters query, systems are asked to submit all valid paths from the source node to the target node, with the goal of maximizing recall and precision. Each submitted path will be judged on whether it is valid (i.e., the submitted order of nodes and edges forms a path from the source person to the target person of the query), and recall, precision, and F1 measures will be reported.
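The sketch below illustrates this scoring under the assumption that each path is submitted as a chain of (subject, relation, object) triples and is judged against the ground-truth graph; the official judging procedure is authoritative.

def is_valid_path(path, gt_triples, source, target):
    # A path is valid if it starts at the source, ends at the target,
    # chains node to node, and every triple exists in the ground truth.
    if not path or path[0][0] != source or path[-1][2] != target:
        return False
    chains = all(a[2] == b[0] for a, b in zip(path, path[1:]))
    return chains and all(t in gt_triples for t in path)

def precision_recall_f1(n_valid_submitted, n_submitted, n_gt_paths):
    p = n_valid_submitted / n_submitted if n_submitted else 0.0
    r = n_valid_submitted / n_gt_paths if n_gt_paths else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1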
For the fill in the graph space query, results will be treated as a ranked list of result items per unknown variable; a Reciprocal Rank score will be calculated per unknown variable and a Mean Reciprocal Rank (MRR) per query.
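For concreteness, a minimal sketch of the Reciprocal Rank and MRR computation, assuming one ranked list per unknown variable (as in the question 2 sample responses below):

def reciprocal_rank(ranked, correct):
    # 1/rank of the correct item, or 0.0 if it was not returned.
    return 1.0 / (ranked.index(correct) + 1) if correct in ranked else 0.0

def mean_reciprocal_rank(per_variable):
    # per_variable: one (ranked_list, correct_answer) pair per unknown variable.
    return sum(reciprocal_rank(r, c) for r, c in per_variable) / len(per_variable)

# The question 2 sample response below ranks Homer first, so its RR is 1.0.
print(mean_reciprocal_rank([(["Homer", "Apu", "Flanders", "Reverend Lovejoy"], "Homer")]))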
For the question answering query, scores will be calculated as the number of correct answers divided by the total number of questions.
Find the Unique Scene: Given a full, inclusive list of interactions unique to a specific scene in the movie, teams should identify which scene it is.
Fill in the graph space: Find the person in a specific scene with given attributes and interactions with others. Participating teams will be given a scene number, a list of person attributes, and a list of interactions to and from other people. Teams should find the only person in that scene with those attributes and interactions.
Find next or previous interaction: Given a specific scene and a specific interaction between person X and person Y, participants will be asked to return either the previous or the next interaction between person X and person Y, in either direction. This can be the next or previous interaction within the same scene, or over the entire movie. These are multiple-choice questions selected from a list of possible interactions, only one of which is correct.
Find the 1-to-1 relationship between scenes and natural language descriptions: Given a set of scenes and a set of natural language descriptions of movie scenes, match the correct description to each scene.
Classify scene sentiment: Given a specific movie scene and a set of possible sentiments, select the correct sentiment label for the scene.
For the first two query types (find the unique scene and fill in the graph space), results will be treated as a ranked list of result items per unknown variable; a Reciprocal Rank score will be calculated per unknown variable and a Mean Reciprocal Rank (MRR) per query.
For the remaining multiple-choice query types (find next or previous interaction, scene/description matching, and sentiment classification), scores will be calculated as the number of correct answers divided by the total number of questions.
A new query to aid AI explainability has been added to this year's challenge. Some relationship questions in the movie-level challenge and some interaction questions in the scene-level challenge carry an additional explainability context. For movie-level queries, this simply asks for the relevant scene number to be supplied as evidence. For scene-level queries, it asks for the start and end times of the relevant interaction within the scene, in seconds elapsed since the beginning of the scene. In terms of scoring and evaluation, this explainability element will be judged manually and entirely separately from the official challenge scoring metrics, as it is only used to inspect the quality of the evidence provided by systems. Please see the example provided below, and refer to the queries README FILE for more details.
Teams can submit to the movie-level queries only, the scene-level queries only, or both.
For movie-level queries, teams can choose to submit results to either or both groups of:
Group 1 - Question 1 (relations between characters)
Group 2 - Questions 2 & 3 (fill in the graph space and question answering)
For scene-level queries, teams can choose to submit results to either or both groups of:
Group 1 - Questions 1, 2, & 3 (all related to interactions between characters)
Group 2 - Questions 4 & 5 (matching of scenes and their descriptions in text, and sentiment classification)
Please see the sample XML response files for a movie-level run and a scene-level run. Please make sure your runs validate against the DTD files for both movie and scene query results. Two DTD files, for movie-level and scene-level results, are available: Movie-level DTD, Scene-level DTD.
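Runs can be checked against a DTD locally before submission; the sketch below uses the lxml library, with placeholder file names standing in for the published DTD and your own run file.

from lxml import etree

# Placeholder file names: substitute the movie-level or scene-level DTD
# and your run file.
dtd = etree.DTD(open("movie_level.dtd", "rb"))
run = etree.parse("my_movie_run.xml")

if dtd.validate(run):
    print("Run validates against the DTD.")
else:
    for error in dtd.error_log.filter_from_errors():
        print(error)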
Sample Query:
<DeepVideoUnderstandingTopicQuery question="1" id="1">
<item source="Superintendent Chalmers" target="Lenny"/>
<item description="List all possible paths between Superintendent Chalmers and Lenny"/>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="1" id="1" path="1">
<item source="Superintendent Chalmers" relation="Superintendent At" target="Springfield Elementary"/>
<item source="Springfield Elementary" relation="Studied At By" target="Bart"/>
<item source="Bart" relation="Child_of" target="Homer"/>
<item source="Homer" relation="Friend_of" target="Lenny"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="2" id="1">
<item subject="Person:Unknown_1" predicate="Relation:Spouse Of" object="Person:Marge"/>
<item subject="Person:Unknown_1" predicate="Relation:Parent Of" object="Person:Bart"/>
<item subject="Person:Unknown_1" predicate="Relation:Parent Of" object="Person:Lisa"/>
<item subject="Person:Unknown_1" predicate="Relation:Friend Of" object="Person:Lenny"/>
<item subject="Person:Unknown_1" predicate="Relation:Friend Of" object="Person:Barnie"/>
<item subject="Person:Unknown_1" predicate="Relation:Socialises At" object="Entity:Moe's Tavern"/>
<item subject="Person:Unknown_1" predicate="Relation:Works At" object="Entity:Nuclear Power Plant"/>
<item subject="Person:Unknown_1" predicate="Relation:Attends" object="Entity:[BLANK]"/>
<item description="Which Person has the following Relations: Spouse Of Person:Marge, Parent Of Person:Bart, Parent Of Person:Lisa, Friend Of Person:Lenny, Friend Of Person:Barnie, Socialises At Entity:Moe's Tavern, Works At Entity:Nuclear Power Plant, Attends Entity:[BLANK]?"/>
</DeepVideoUnderstandingTopicQuery>
<DeepVideoUnderstandingTopicQuery question="2" id="2">
<item subject="Person:Unknown_2" predicate="Relation:Superintendent At" object="Entity:Springfield Elementary"/>
<item subject="Person:Unknown_2" predicate="Relation:Supervisor Of" object="Person:Principal Skinner"/>
<item subject="Person:Unknown_2" predicate="Relation:Attends" object="Entity:Church"/>
<item description="Which Person has the following Relations: Superintendent At Entity:Springfield Elementary, Supervisor Of Person:Principal Skinner, Attends Entity:Church?"/>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="2" id="1">
<item order="1" subject="Homer" confidence="64"/>
<item order="2" subject="Apu" confidence="18"/>
<item order="3" subject="Flanders" confidence="12"/>
<item order="4" subject="Reverend Lovejoy" confidence="6"/>
</DeepVideoUnderstandingTopicResult>
<DeepVideoUnderstandingTopicResult question="2" id="2">
<item order="1" subject="Superintendent Chalmers" confidence="92"/>
<item order="2" subject="Agnes Skinner" confidence="8"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="3" id="1">
<item subject="Person:Ms. Krabappel" predicate="Relation:Unknown_1" object="Entity:Springfield Elementary"/>
<item description="What is the relation / connection from Ms. Krabappel to Springfield Elementary?"/>
<Answers>
<item type="Entity" answer="Attends" scene=""/>
<item type="Entity" answer="Teacher At" scene=""/>
<item type="Entity" answer="Owns" scene=""/>
<item type="Entity" answer="Studies At" scene=""/>
<item type="Entity" answer="Principal At" scene=""/>
<item type="Entity" answer="Friend Of" scene=""/>
</Answers>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="3" id="1">
<item type="Relation" answer="Teacher_At" scene="13"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="1" id="1">
<item subject="Scene:Unknown_1" predicate="Interaction:orders" />
<item subject="Scene:Unknown_1" predicate="Interaction:talks to"/>
<item subject="Scene:Unknown_1" predicate="Interaction:explains to" />
<item subject="Scene:Unknown_1" predicate="Interaction:threatens" />
<item description="Which Unique Scene contains the following Interactions: orders, talks to, explains to, threatens"/>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="1" id="1">
<item order="1" scene="24" confidence="72"/>
<item order="2" scene="13" confidence="22"/>
<item order="3" scene="2" confidence="4"/>
<item order="4" scene="38" confidence="2"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="2" id="1">
<item subject="Person:Unknown_1" scene="10" predicate="Interaction:talks to" object="Target_Person:Homer"/>
<item subject="Person:Unknown_1" scene="10" predicate="Interaction:asks" object="Target_Person:Apu"/>
<item subject="Person:Unknown_1" scene="10" predicate="Interaction:asks" object="Target_Person:Homer"/>
<item subject="Person:Unknown_1" scene="10" predicate="Interaction:embraces" object="Target_Person:Homer"/>
<item subject="Person:Unknown_1" scene="10" predicate="Interaction:explains to" object="Source_Person:Homer"/>
<item description="Which Person in scene 10 has the following Interactions: talks to Target_Person:Homer, asks Target_Person:Apu, asks Target_Person:Homer, embraces Target_Person:Homer, Source_Person:Homer explains to"/>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="2" id="1">
<item order="1" subject="Person:Lisa" confidence="63"/>
<item order="2" subject="Person:Maggie" confidence="19"/>
<item order="3" subject="Person:Bart" confidence="10"/>
<item order="4" subject="Person:Marge" confidence="8"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="3" id="1">
<item subject="Person:Homer" scene="7" predicate="Interaction:talks to" object="Person:Marge"/>
<item description="In Scene 7, Homer talks to Marge. What is the immediate next / following interaction between Marge and Homer, in scene 19?"/>
<Answers>
<item type="Interaction" scene="19" answer="explains to" start_time="" end_time=""/>
<item type="Interaction" scene="19" answer="yells at" start_time="" end_time=""/>
<item type="Interaction" scene="19" answer="talks to" start_time="" end_time=""/>
<item type="Interaction" scene="19" answer="kisses" start_time="" end_time=""/>
<item type="Interaction" scene="19" answer="greets" start_time="" end_time=""/>
<item type="Interaction" scene="19" answer="stops" start_time="" end_time=""/>
</Answers>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="3" id="1">
<item type="Interaction" answer="talks to" start_time="23" end_time="29"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="4" id="1">
<item subject="Person:Nelson" scene="36" predicate="Interaction:hits" object="Person:Bart"/>
<item description="In Scene 36, Nelson hits Bart. What is the immediate prior / previous interaction between Nelson and Bart, in scene 36?"/>
<Answers>
<item type="Interaction" scene="36" answer="yells at" start_time="" end_time=""/>
<item type="Interaction" scene="36" answer="touches" start_time="" end_time=""/>
<item type="Interaction" scene="36" answer="kisses" start_time="" end_time=""/>
<item type="Interaction" scene="36" answer="watches" start_time="" end_time=""/>
<item type="Interaction" scene="36" answer="talks to" start_time="" end_time=""/>
<item type="Interaction" scene="36" answer="fights with" start_time="" end_time=""/>
</Answers>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="4" id="1">
<item type="Interaction" answer="yells at" start_time="5" end_time="7"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="5" id="1">
<item subject="Scene:Unknown" predicate="Description"/>
<item description="Homer shouts at the TV. Marge shouts at Homer telling him to stop shouting."/>
<Answers>
<item type="Integer:Scene" answer="2"/>
<item type="Integer:Scene" answer="6"/>
<item type="Integer:Scene" answer="9"/>
<item type="Integer:Scene" answer="16"/>
<item type="Integer:Scene" answer="21"/>
<item type="Integer:Scene" answer="28"/>
<item type="Integer:Scene" answer="32"/>
<item type="Integer:Scene" answer="36"/>
<item type="Integer:Scene" answer="40"/>
<item type="Integer:Scene" answer="44"/>
</Answers>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="5" id="1">
<item type="Integer:Scene" answer="9"/>
</DeepVideoUnderstandingTopicResult>
Sample Query:
<DeepVideoUnderstandingTopicQuery question="6" id="1">
<item subject="Sentiment:Unknown" scene="18"/>
<item description="In Scene 18, What is the correct sentiment label?"/>
<Answers>
<item type="Sentiment" answer="fight"/>
<item type="Sentiment" answer="sexual harassment"/>
<item type="Sentiment" answer="travel"/>
<item type="Sentiment" answer="talking / conversation"/>
<item type="Sentiment" answer="greeting"/>
<item type="Sentiment" answer="dressing"/>
</Answers>
</DeepVideoUnderstandingTopicQuery>
Sample Response:
<DeepVideoUnderstandingTopicResult question="6" id="1">
<item type="Sentiment" answer="fight"/>
</DeepVideoUnderstandingTopicResult>