Dataset & Queries

The Deep Video Understanding (DVU) development dataset consists of 19 movies from the 2020-2022 editions of this challenge: 14 movies with Creative Commons licenses, used in the 2020-2021 editions, and 5 movies licensed from the KinoLorberEdu platform, which formed the test set of the 2022 edition. The test set will comprise 5 movies licensed from the KinoLorberEdu platform. One of them is reused from the 2022 edition, while the other 4 have not been used in this challenge before. The challenge continues to use the ontology from the 2021 edition, which draws on the MovieGraphs vocabulary (Vicol, Paul, et al. "Moviegraphs: Towards understanding human-centric situations from videos." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018) to cover character attributes, interactions, scene locations, and situations.

If you use the DVU dataset or otherwise participate in the challenge, please cite the following paper (BibTeX):

@inproceedings{curtis2020hlvu,
  title={HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do},
  author={Curtis, Keith and Awad, George and Rajput, Shahzad and Soboroff, Ian},
  booktitle={Proceedings of the 2020 International Conference on Multimedia Retrieval},
  pages={355--361},
  year={2020}
}

Movie Dataset

The full Deep Video Understanding training set of 19 movies, with a total duration of about 25 hours, is available from this link. To access the full KinoLorber-licensed movies, each team (participant) must first sign the data agreement form located HERE and return it to the organizers. After receiving the signed form, the organizers will send access information for the KinoLorber-licensed movies. The training set has been annotated by human assessors, and the final ground truth is provided to participating researchers for training and developing their systems, both at the movie level (ontology of relations, entities, actions and events; knowledge graph; and names and images of all main characters) and at the scene level (ontology of locations, people/entities, interactions between people and their order, sentiments, and a text summary). More information about the movies' genres and durations is provided:

Testing Dataset

To request the testing dataset, each team (participant) must first sign the data agreement form located HERE and return it to the organizers. After receiving the signed form, the organizers will send access information for the 2023 testing dataset (original movies, segmented scenes, and master scene reference table files). The ontology/vocabulary of classes used in the movie-level (relationships) and scene-level (interactions, sentiments, etc.) annotations and in the testing dataset can be found here (also available as a PDF here).

Resources by Past Participating Teams

Automatically generated transcripts by the University of Zurich are available from HERE. Please cite the team's 2020 system paper: https://dl.acm.org/doi/10.1145/3394171.3416292

Speech and person/face bounding box annotations for a subset of the HLVU dataset, produced by the TokyoTech team, are available from HERE. Please cite the team's 2020 system paper: https://dl.acm.org/doi/abs/10.1145/3395035.3425639


Scene annotations and resources (for the 2020 DVU edition) by Nanjing University are available from HERE, together with a README file. Please cite the team's 2020 system paper: https://dl.acm.org/doi/10.1145/3394171.3416303


Resources by Nanjing University (updating the 2020 data resources and adding new resources for the 2021 testing dataset) are available from HERE. Please cite the team's paper if you make use of their tools and/or outputs: https://dl.acm.org/doi/10.1145/3474085.3479214


Subtitles and tracking/recognition results for the 2022 testing dataset are available from HERE. Please cite the team's paper if you use their outputs in your research: https://dl.acm.org/doi/abs/10.1145/3503161.3551600


Movie-Level Query Types

Movie-Level Metrics

Scene-Level Query Types

Scene-Level Metrics


Testing queries: these will be made available in the coming months, approximately 1.5 months before the submission deadline.

New in the 2023 challenge is an optional robustness sub-task. To measure the robustness of the participating multi-modal systems, we will provide a second version of the testing dataset in which various types of perturbations and corruptions observed in real-world multi-modal data have been introduced. This will allow teams and organizers to measure how much system performance is affected by the introduced noise.
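This page does not define the official robustness measure. As a minimal illustration, assuming a run is scored once on the clean testing set and once on the perturbed copy with the same higher-is-better metric, the effect of the noise can be summarized as an absolute and a relative drop. The function name `performance_drop` and the example scores are hypothetical.

```python
def performance_drop(clean_score: float, perturbed_score: float) -> tuple[float, float]:
    """Return (absolute drop, relative drop) between clean and perturbed scores.

    ASSUMPTION: both scores come from the same higher-is-better metric, computed on
    the clean and the perturbed testing set. This is an illustration, not the
    official DVU robustness measure.
    """
    if clean_score <= 0:
        raise ValueError("clean_score must be positive to compute a relative drop")
    absolute = clean_score - perturbed_score
    relative = absolute / clean_score
    return absolute, relative


# Hypothetical scores, for illustration only.
abs_drop, rel_drop = performance_drop(0.40, 0.30)
print(f"absolute drop: {abs_drop:.2f}, relative drop: {rel_drop:.0%}")
```

A relative drop makes runs with different clean scores easier to compare than the absolute difference alone.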

Submission formats and specifications are given below (please check regularly for the latest updates).

Run Submission Types and Conditions

Please see the sample XML response files for a movie-level run and a scene-level run. Please make sure your runs validate against the DTD files for movie-level and scene-level query results. The two DTD files are available here: Movie-level DTD, Scene-level DTD.

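Please make sure your movie-level and scene-level XML files include the matching line below as their first line (or directly after the XML declaration, if you include one) so that they link to the correct DTD file: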

Movie-level:

<!DOCTYPE DeepVideoUnderstandingResults SYSTEM "https://www-nlpir.nist.gov/projects/tv2023/dtds/DeepVideoUnderstandingResults.dtd">

Scene-level:

<!DOCTYPE DeepVideoUnderstandingSceneResults SYSTEM "https://www-nlpir.nist.gov/projects/tv2023/dtds/DeepVideoUnderstandingSceneResults.dtd">
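As a minimal sketch (Python, standard library only), the snippet below assembles a movie-level or scene-level run from `DeepVideoUnderstandingTopicResult` elements shaped like the sample responses on this page and prepends the matching DOCTYPE line. It assumes the root element carries the name declared in the DOCTYPE, since XML requires the two to match. Any attributes the DTDs require on the root element, such as team or run identifiers, are not shown on this page and are omitted here, so check the DTDs before submitting. `build_run` is a hypothetical helper name. The standard library only checks well-formedness; validating against the DTD itself needs a validating parser (for example, lxml's `etree.DTD`).

```python
import xml.etree.ElementTree as ET

# Root element names and DTD URLs, as given in the DOCTYPE lines above.
DOCTYPES = {
    "movie": ("DeepVideoUnderstandingResults",
              "https://www-nlpir.nist.gov/projects/tv2023/dtds/DeepVideoUnderstandingResults.dtd"),
    "scene": ("DeepVideoUnderstandingSceneResults",
              "https://www-nlpir.nist.gov/projects/tv2023/dtds/DeepVideoUnderstandingSceneResults.dtd"),
}


def build_run(level, results):
    """Serialize results as a run file string.

    level:   "movie" or "scene".
    results: list of (question, query_id, items), where items is a list of
             attribute dicts for the <item> children (e.g. order/scene/confidence).
    """
    root_name, dtd_url = DOCTYPES[level]
    # ASSUMPTION: root attributes required by the DTD (if any) are not added here.
    root = ET.Element(root_name)
    for question, query_id, items in results:
        result = ET.SubElement(root, "DeepVideoUnderstandingTopicResult",
                               question=str(question), id=str(query_id))
        for attrs in items:
            ET.SubElement(result, "item", {k: str(v) for k, v in attrs.items()})
    body = ET.tostring(root, encoding="unicode")  # no XML declaration is emitted
    # The DOCTYPE must precede the root element.
    return f'<!DOCTYPE {root_name} SYSTEM "{dtd_url}">\n{body}\n'


run = build_run("scene", [(1, 1, [{"order": 1, "scene": 24, "confidence": 72},
                                  {"order": 2, "scene": 13, "confidence": 22}])])
print(run)
```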

Testing queries are now available (XML query files for the movie-level and scene-level tasks, plus image snapshots of the main entities). Please download them from HERE; a README file is provided. If you have not done so yet and plan to join the challenge, please make sure to submit the data agreement form so that you can download the testing dataset.

Sample Queries and Responses (Movie-level)

Fill in the Graph Space:

<DeepVideoUnderstandingTopicQuery question="1" id="1">
  <item subject="Person:Unknown_1" predicate="Relation:Spouse Of" object="Person:Marge"/>
  <item subject="Person:Unknown_1" predicate="Relation:Parent Of" object="Person:Bart"/>
  <item subject="Person:Unknown_1" predicate="Relation:Parent Of" object="Person:Lisa"/>
  <item subject="Person:Unknown_1" predicate="Relation:Friend Of" object="Person:Lenny"/>
  <item subject="Person:Unknown_1" predicate="Relation:Friend Of" object="Person:Barnie"/>
  <item subject="Person:Unknown_1" predicate="Relation:Socialises At" object="Entity:Moe's Tavern"/>
  <item subject="Person:Unknown_1" predicate="Relation:Works At" object="Entity:Nuclear Power Plant"/>
  <item subject="Person:Unknown_1" predicate="Relation:Attends" object="Entity:[BLANK]"/>
  <item description="Which Person has the following Relations: Spouse Of Person:Marge, Parent Of Person:Bart, Parent Of Person:Lisa, Friend Of Person:Lenny, Friend Of Person:Barnie, Socialises At Entity:Moe's Tavern, Works At Entity:Nuclear Power Plant, Attends Entity:[BLANK]?"/>
</DeepVideoUnderstandingTopicQuery>

<DeepVideoUnderstandingTopicQuery question="1" id="2">
  <item subject="Person:Unknown_2" predicate="Relation:Superintendent At" object="Entity:Springfield Elementary"/>
  <item subject="Person:Unknown_2" predicate="Relation:Supervisor Of" object="Person:Principal Skinner"/>
  <item subject="Person:Unknown_2" predicate="Relation:Attends" object="Entity:Church"/>
  <item description="Which Person has the following Relations: Superintendent At Entity:Springfield Elementary, Supervisor Of Person:Principal Skinner, Attends Entity:Church?"/>
</DeepVideoUnderstandingTopicQuery>
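A minimal sketch for reading fill-in-the-graph queries, assuming the query files follow the shape of the samples above: it collects each `DeepVideoUnderstandingTopicQuery`, splits the `Type:Value` strings in `subject`, `predicate`, and `object`, and keeps the natural-language `description` separately. The helper names and the wrapping `<Queries>` root in the example are hypothetical; the actual query files (see their README) may use a different root element.

```python
import xml.etree.ElementTree as ET


def split_typed(value):
    """Split a 'Type:Value' string, e.g. 'Relation:Spouse Of' -> ('Relation', 'Spouse Of')."""
    kind, _, name = value.partition(":")
    return kind, name


def parse_graph_queries(xml_text):
    """Map (question, id) to the query's typed triples and its description text."""
    queries = {}
    for query in ET.fromstring(xml_text).iter("DeepVideoUnderstandingTopicQuery"):
        triples, description = [], None
        for item in query.findall("item"):
            if item.get("description") is not None:
                description = item.get("description")
            else:
                # Missing attributes become ('', '') instead of raising.
                triples.append(tuple(split_typed(item.get(attr, ""))
                                     for attr in ("subject", "predicate", "object")))
        queries[(query.get("question"), query.get("id"))] = {
            "triples": triples, "description": description}
    return queries


# A trimmed excerpt of the first sample query, wrapped in a hypothetical root.
sample = """<Queries>
  <DeepVideoUnderstandingTopicQuery question="1" id="1">
    <item subject="Person:Unknown_1" predicate="Relation:Spouse Of" object="Person:Marge"/>
    <item subject="Person:Unknown_1" predicate="Relation:Attends" object="Entity:[BLANK]"/>
    <item description="Which Person has the following Relations: Spouse Of Person:Marge, Attends Entity:[BLANK]?"/>
  </DeepVideoUnderstandingTopicQuery>
</Queries>"""
print(parse_graph_queries(sample)[("1", "1")]["triples"])
```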

<DeepVideoUnderstandingTopicResult question="1" id="1">
  <item order="1" subject="Homer" confidence="64"/>
  <item order="2" subject="Apu" confidence="18"/>
  <item order="3" subject="Flanders" confidence="12"/>
  <item order="4" subject="Reverend Lovejoy" confidence="6"/>
</DeepVideoUnderstandingTopicResult>

<DeepVideoUnderstandingTopicResult question="1" id="2">
  <item order="1" subject="Superintendent Chalmers" confidence="92"/>
  <item order="2" subject="Agnes Skinner" confidence="8"/>
</DeepVideoUnderstandingTopicResult>
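In both sample responses the candidates are listed in rank order and their `confidence` values are integers that sum to 100 (64 + 18 + 12 + 6 and 92 + 8). The page does not state that this is required, so treat it as an assumption. If a system produces raw scores, one way to obtain such values is largest-remainder rounding, sketched below; `to_ranked_items` is a hypothetical helper and the input scores are made up.

```python
import math


def to_ranked_items(scores, total=100):
    """Rank candidates by score and convert the scores to integer confidences.

    scores: dict of candidate name -> non-negative raw score.
    Confidences are proportional to the scores and, via largest-remainder
    rounding, sum exactly to `total` (an assumption based on the samples).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    norm = sum(s for _, s in ranked)
    if norm <= 0:
        raise ValueError("at least one score must be positive")
    exact = [total * s / norm for _, s in ranked]
    floors = [math.floor(x) for x in exact]
    # Give the leftover units to the candidates with the largest fractional parts.
    leftover = total - sum(floors)
    by_remainder = sorted(range(len(exact)), key=lambda i: exact[i] - floors[i], reverse=True)
    for i in by_remainder[:leftover]:
        floors[i] += 1
    return [{"order": rank, "subject": name, "confidence": conf}
            for rank, ((name, _), conf) in enumerate(zip(ranked, floors), start=1)]


for item in to_ranked_items({"Homer": 32, "Apu": 9, "Flanders": 6, "Reverend Lovejoy": 3}):
    print(item)
```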

Question Answering:

<DeepVideoUnderstandingTopicQuery question="2" id="1">
  <item description="What is the relation / connection from Ms. Krabappel to Springfield Elementary?"/>
  <Answers>
    <item answer="Attends"/>
    <item answer="Teacher At"/>
    <item answer="Owns"/>
    <item answer="Studies At"/>
    <item answer="Principal At"/>
    <item answer="Friend Of"/>
  </Answers>
</DeepVideoUnderstandingTopicQuery>

<DeepVideoUnderstandingTopicResult question="2" id="1">
  <item answer="Teacher At"/>
</DeepVideoUnderstandingTopicResult>
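For multiple-choice query types (the question-answering query above, and the scene-level next/previous interaction, scene-matching, and sentiment queries on this page), the answer is picked from the `<Answers>` list. A minimal sketch, assuming the system outputs a score per candidate answer: it reads the options from the query, picks the best-scoring one, and copies the option string verbatim (plus its `type`, which the scene-level sample responses echo) so the response matches a listed choice exactly. `answer_choice_query` and the scores are hypothetical.

```python
import xml.etree.ElementTree as ET


def answer_choice_query(query_xml, scores):
    """Answer a multiple-choice query with its best-scoring listed option.

    scores: dict of answer string -> score (hypothetical model output);
            options absent from `scores` count as 0.
    """
    query = ET.fromstring(query_xml)
    options = query.findall("Answers/item")
    if not options:
        raise ValueError("query has no <Answers> list")
    best = max(options, key=lambda opt: scores.get(opt.get("answer"), 0.0))
    result = ET.Element("DeepVideoUnderstandingTopicResult",
                        question=query.get("question"), id=query.get("id"))
    attrs = {}
    if best.get("type") is not None:
        attrs["type"] = best.get("type")  # scene-level responses echo the option type
    attrs["answer"] = best.get("answer")  # copied verbatim so it matches an option
    ET.SubElement(result, "item", attrs)
    return ET.tostring(result, encoding="unicode")


query = """<DeepVideoUnderstandingTopicQuery question="2" id="1">
  <item description="What is the relation / connection from Ms. Krabappel to Springfield Elementary?"/>
  <Answers>
    <item answer="Attends"/>
    <item answer="Teacher At"/>
    <item answer="Principal At"/>
  </Answers>
</DeepVideoUnderstandingTopicQuery>"""
print(answer_choice_query(query, {"Teacher At": 0.8, "Attends": 0.1}))
```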

Sample Queries and Responses (Scene-level)

Find the Unique Scene:

<DeepVideoUnderstandingTopicQuery question="1" id="1">
    <item subject="Scene:Unknown_1" predicate="Interaction:orders" />
    <item subject="Scene:Unknown_1" predicate="Interaction:talks to"/>
    <item subject="Scene:Unknown_1" predicate="Interaction:explains to" />
    <item subject="Scene:Unknown_1" predicate="Interaction:threatens" />
    <item description="Which Unique Scene contains the following Interactions: orders, talks to, explains to, threatens"/>
</DeepVideoUnderstandingTopicQuery>

<DeepVideoUnderstandingTopicResult question="1" id="1">
    <item order="1" scene="24" confidence="72"/>
    <item order="2" scene="13" confidence="22"/>
    <item order="3" scene="2" confidence="4"/>
    <item order="4" scene="38" confidence="2"/>
</DeepVideoUnderstandingTopicResult>
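One simple baseline for this query type, offered only as an illustration and not as the organizers' method, ranks scenes by the fraction of the queried interactions that a system detected in each scene. The sketch assumes a hypothetical mapping from scene number to the set of interaction labels predicted for that scene; `rank_scenes` and the detections are made up.

```python
def rank_scenes(query_interactions, scene_interactions, top_k=4):
    """Rank scenes by the fraction of the queried interactions detected in each.

    query_interactions: interaction labels taken from the query items.
    scene_interactions: dict of scene number -> set of predicted interaction labels
                        (hypothetical system output).
    Confidences are proportional to the match scores and rounded to integers,
    so they may not sum exactly to 100.
    """
    wanted = set(query_interactions)
    if not wanted:
        raise ValueError("query has no interactions")
    scored = [(len(wanted & labels) / len(wanted), scene)
              for scene, labels in scene_interactions.items()]
    # Drop scenes with no overlap; sort by score (descending), then scene number.
    scored = sorted((pair for pair in scored if pair[0] > 0),
                    key=lambda pair: (-pair[0], pair[1]))[:top_k]
    total = sum(score for score, _ in scored) or 1.0
    return [{"order": rank, "scene": scene, "confidence": round(100 * score / total)}
            for rank, (score, scene) in enumerate(scored, start=1)]


query = ["orders", "talks to", "explains to", "threatens"]
detections = {  # hypothetical per-scene predictions
    2: {"talks to"},
    13: {"orders", "talks to", "explains to"},
    24: {"orders", "talks to", "explains to", "threatens"},
    38: {"hugs"},
}
for item in rank_scenes(query, detections):
    print(item)
```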

Fill in the Graph Space:

<DeepVideoUnderstandingTopicQuery question="2" id="1">
    <item subject="Person:Unknown_1" scene="10" predicate="Interaction:talks to" object="Target_Person:Homer"/>
    <item subject="Person:Unknown_1" scene="10" predicate="Interaction:asks" object="Target_Person:Apu"/>
    <item subject="Person:Unknown_1" scene="10" predicate="Interaction:asks" object="Target_Person:Homer"/>
    <item subject="Person:Unknown_1" scene="10" predicate="Interaction:embraces" object="Target_Person:Homer"/>
    <item subject="Person:Unknown_1" scene="10" predicate="Interaction:explains to" object="Source_Person:Homer"/>
    <item description="Which Person in scene 10 has the following Interactions: talks to Target_Person:Homer, asks Target_Person:Apu, asks Target_Person:Homer, embraces Target_Person:Homer, Source_Person:Homer explains to"/>
</DeepVideoUnderstandingTopicQuery>

<DeepVideoUnderstandingTopicResult question="2" id="1">
    <item order="1" subject="Person:Lisa" confidence="63"/>
    <item order="2" subject="Person:Maggie" confidence="19"/>
    <item order="3" subject="Person:Bart" confidence="10"/>
    <item order="4" subject="Person:Marge" confidence="8"/>
</DeepVideoUnderstandingTopicResult>
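In this query type the object prefix encodes direction. Judging from the description, `Target_Person:Homer` means the unknown person performs the interaction on Homer, while `Source_Person:Homer` means Homer performs it on the unknown person ("Source_Person:Homer explains to"). A minimal sketch that turns the query items into directed (source, interaction, target) edges, which a system could then compare with its own predicted interactions for each candidate person in the scene; `to_directed_edges` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def to_directed_edges(query_xml):
    """Turn scene-level fill-in-the-graph items into (source, interaction, target) edges."""
    edges = []
    for item in ET.fromstring(query_xml).findall("item"):
        if item.get("predicate") is None:
            continue  # the description item carries no edge
        unknown = item.get("subject").partition(":")[2]        # e.g. "Unknown_1"
        interaction = item.get("predicate").partition(":")[2]  # e.g. "talks to"
        role, _, person = item.get("object").partition(":")
        if role == "Target_Person":    # the unknown person acts on `person`
            edges.append((unknown, interaction, person))
        elif role == "Source_Person":  # `person` acts on the unknown person
            edges.append((person, interaction, unknown))
        else:
            raise ValueError(f"unexpected object prefix: {role!r}")
    return edges


sample = """<DeepVideoUnderstandingTopicQuery question="2" id="1">
  <item subject="Person:Unknown_1" scene="10" predicate="Interaction:talks to" object="Target_Person:Homer"/>
  <item subject="Person:Unknown_1" scene="10" predicate="Interaction:explains to" object="Source_Person:Homer"/>
  <item description="Which Person in scene 10 has the following Interactions: talks to Target_Person:Homer, Source_Person:Homer explains to"/>
</DeepVideoUnderstandingTopicQuery>"""
print(to_directed_edges(sample))
```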

What is the Next Interaction:

<DeepVideoUnderstandingTopicQuery question="3" id="1">
    <item subject="Person:Homer" scene="7" predicate="Interaction:talks to" object="Person:Marge"/>
    <item description="In Scene 7, Homer talks to Marge. What is the immediate next / following interaction between Marge and Homer, in scene 19?"/>
    <Answers>
        <item type="Interaction" scene="19" answer="explains to"/>
        <item type="Interaction" scene="19" answer="yells at"/>
        <item type="Interaction" scene="19" answer="talks to"/>
        <item type="Interaction" scene="19" answer="kisses"/>
        <item type="Interaction" scene="19" answer="greets"/>
        <item type="Interaction" scene="19" answer="stops"/>
    </Answers>
</DeepVideoUnderstandingTopicQuery>

<DeepVideoUnderstandingTopicResult question="3" id="1">
    <item type="Interaction" answer="talks to"/>
</DeepVideoUnderstandingTopicResult>

What is the Previous Interaction:

<DeepVideoUnderstandingTopicQuery question="4" id="1">
    <item subject="Person:Nelson" scene="36" predicate="Interaction:hits" object="Person:Bart"/>
    <item description="In Scene 36, Nelson hits Bart. What is the immediate prior / previous interaction between Nelson and Bart, in scene 36?"/>
    <Answers>
        <item type="Interaction" scene="36" answer="yells at"/>
        <item type="Interaction" scene="36" answer="touches"/>
        <item type="Interaction" scene="36" answer="kisses"/>
        <item type="Interaction" scene="36" answer="watches"/>
        <item type="Interaction" scene="36" answer="talks to"/>
        <item type="Interaction" scene="36" answer="fights with"/>
    </Answers>
</DeepVideoUnderstandingTopicQuery>

<DeepVideoUnderstandingTopicResult question="4" id="1">
    <item type="Interaction" answer="yells at"/>
</DeepVideoUnderstandingTopicResult>
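Both temporal query types can be answered from an ordered list of interactions, for example one predicted by a system across the movie's scenes. A minimal sketch, assuming a hypothetical list of (scene, source, interaction, target) tuples in temporal order: for "next" it returns the first interaction between the same two people in the requested scene that comes after the anchor interaction, and for "previous" the last one before it. Either person may be the source. `adjacent_interaction` is a hypothetical helper, and the timeline is invented so that it reproduces the two sample responses above.

```python
def adjacent_interaction(timeline, anchor, target_scene, direction):
    """Return the interaction immediately after/before `anchor` for the same pair.

    timeline:     list of (scene, source, interaction, target) in temporal order.
    anchor:       a tuple that appears in `timeline` (list.index raises ValueError otherwise).
    target_scene: scene the answer must come from (as stated in the query).
    direction:    "next" or "previous".
    """
    pair = {anchor[1], anchor[3]}
    pos = timeline.index(anchor)
    if direction == "next":
        candidates = timeline[pos + 1:]
    elif direction == "previous":
        candidates = reversed(timeline[:pos])
    else:
        raise ValueError("direction must be 'next' or 'previous'")
    for scene, source, interaction, target in candidates:
        if scene == target_scene and {source, target} == pair:
            return interaction
    return None  # no interaction between the pair in that scene


timeline = [  # hypothetical, in temporal order
    (7, "Homer", "talks to", "Marge"),
    (12, "Bart", "talks to", "Lisa"),
    (19, "Marge", "talks to", "Homer"),
    (19, "Homer", "kisses", "Marge"),
    (36, "Nelson", "yells at", "Bart"),
    (36, "Nelson", "hits", "Bart"),
]
print(adjacent_interaction(timeline, (7, "Homer", "talks to", "Marge"), 19, "next"))
print(adjacent_interaction(timeline, (36, "Nelson", "hits", "Bart"), 36, "previous"))
```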

Match Scene with Natural Language Description:

<DeepVideoUnderstandingTopicQuery question="5" id="1">
    <item subject="Scene:Unknown" predicate="Description"/>
    <item description="Homer shouts at the TV. Marge shouts at Homer telling him to stop shouting."/>
    <Answers>
        <item type="Integer:Scene" answer="2"/>
        <item type="Integer:Scene" answer="6"/>
        <item type="Integer:Scene" answer="9"/>
        <item type="Integer:Scene" answer="16"/>
        <item type="Integer:Scene" answer="21"/>
        <item type="Integer:Scene" answer="28"/>
        <item type="Integer:Scene" answer="32"/>
        <item type="Integer:Scene" answer="36"/>
        <item type="Integer:Scene" answer="40"/>
        <item type="Integer:Scene" answer="44"/>
    </Answers>
</DeepVideoUnderstandingTopicQuery>

<DeepVideoUnderstandingTopicResult question="5" id="1">
    <item type="Integer:Scene" answer="9"/>
</DeepVideoUnderstandingTopicResult>
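A simple lexical baseline, used here only as a stand-in for whatever text or vision-language model a team actually uses: the training annotations include a text summary for each scene, so a system that produces a comparable summary per test scene could score each candidate scene by word overlap (Jaccard similarity) with the query description and answer with the best one. The helper names and the scene summaries below are hypothetical.

```python
import re


def tokens(text):
    """Lower-cased set of words (apostrophes kept inside words)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def best_scene(description, candidate_scenes, scene_summaries):
    """Return the candidate scene whose summary overlaps most with the description."""
    query = tokens(description)
    return max(candidate_scenes,
               key=lambda s: jaccard(query, tokens(scene_summaries.get(s, ""))))


summaries = {  # hypothetical system-generated scene summaries
    2: "Bart skateboards to school.",
    9: "Homer yells at the TV and Marge tells him to stop shouting.",
    16: "Lisa plays the saxophone.",
}
desc = "Homer shouts at the TV. Marge shouts at Homer telling him to stop shouting."
print(best_scene(desc, [2, 6, 9, 16, 21], summaries))
```

Candidates without a summary simply score 0 here; a real system would need a summary or embedding for every listed scene.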

Classify Sentiment Label for a Given Scene:

<DeepVideoUnderstandingTopicQuery question="6" id="1">
    <item subject="Sentiment:Unknown" scene="18"/>
    <item description="In Scene 18, what is the correct sentiment label?"/>
    <Answers>
        <item type="Sentiment" answer="fight"/>
        <item type="Sentiment" answer="sexual harassment"/>
        <item type="Sentiment" answer="travel"/>
        <item type="Sentiment" answer="talking / conversation"/>
        <item type="Sentiment" answer="greeting"/>
        <item type="Sentiment" answer="dressing"/>
    </Answers>
</DeepVideoUnderstandingTopicQuery>

<DeepVideoUnderstandingTopicResult question="6" id="1">
    <item type="Sentiment" answer="fight"/>
</DeepVideoUnderstandingTopicResult>