Call for Papers

Deep video understanding is a difficult task that requires systems to analyse and understand the relationships between different entities in a video, to use known information to reason about other, more hidden information, and to populate a knowledge graph (KG) with all acquired information. To work on this task, a system should take into consideration all available modalities (speech, image/video, and in some cases text). The aim of this workshop is to push the limits of multimodal extraction, fusion, and analysis techniques to address the problem of analysing long-duration videos holistically and extracting useful knowledge that can be used to answer different types of queries. The target knowledge includes both visual and non-visual elements. As video and multimedia data become increasingly prevalent across domains, the research approaches and techniques targeted by this workshop will only grow in relevance in the coming years.

This workshop will support two tracks of research contributions:

  1. Track 1: Interested authors are invited to apply their approaches and methods to a novel High-Level Video Understanding (HLVU) dataset being made available by the workshop organizers. The dataset comprises 10 movies released under Creative Commons licenses. It is annotated by human assessors, and ground truth (an ontology of relations, entities, actions and events; names and images of all main characters; and Knowledge Graphs for 50% of the movies) will be provided to participating researchers. The organizers will also support evaluation and scoring of two main query types distributed with the dataset:

    • Multiple choice question answering on the Knowledge Graphs of selected movies.

    • Possible path analysis between persons/entities of interest in a Knowledge Graph extracted from selected movies (a minimal illustrative sketch of this query type appears after the track descriptions below).


  2. Track 2: Contributions related (but not limited) to the following topics, applied to the provided HLVU dataset or to any external datasets, are invited:

    • Multimodal feature extraction for movies and extended video

    • Multimodal fusion of computer vision, text/language processing and audio for extended video / movie analysis

    • Machine Learning methods for movie-based multimodal interaction

    • Sentiment analysis and multimodal dialogue modeling for movies

    • Knowledge Graph generation, analysis, and extraction for movies and extended videos
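To make the second Track 1 query type more concrete, the following is a minimal illustrative sketch of path analysis between entities of interest in a knowledge graph. It is not the official evaluation procedure or run format: the toy graph, entity names, and relation labels are hypothetical, and the authoritative query and XML formats are those in the organizers' materials linked below.

    # Minimal, hypothetical sketch of "possible path analysis" between two
    # entities of interest in a movie knowledge graph. The graph, entity
    # names, and relation labels are invented for illustration only; they
    # are NOT taken from the HLVU dataset or its official format.
    import networkx as nx

    # Build a toy knowledge graph: nodes are characters/entities,
    # edges carry a 'relation' attribute.
    kg = nx.MultiDiGraph()
    kg.add_edge("Alice", "Bob", relation="married_to")
    kg.add_edge("Bob", "Harbor Cafe", relation="works_at")
    kg.add_edge("Alice", "Carol", relation="sister_of")
    kg.add_edge("Carol", "Harbor Cafe", relation="owns")

    def possible_paths(graph, source, target, max_hops=4):
        """Enumerate relation-labelled paths between two entities of interest."""
        for path in nx.all_simple_paths(graph, source, target, cutoff=max_hops):
            hops = []
            for u, v in zip(path, path[1:]):
                # Take one relation label per hop along the node path.
                relation = next(iter(graph[u][v].values()))["relation"]
                hops.append(f"{u} -[{relation}]-> {v}")
            yield hops

    for hops in possible_paths(kg, "Alice", "Harbor Cafe"):
        print(" ; ".join(hops))

In this sketch, each returned path is simply a chain of relation-labelled hops between the two query entities; a participating system would first extract such a graph automatically from the movie's audio-visual (and possibly textual) content before answering queries of this kind.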

Submission

We invite submissions of long papers (up to 8 pages excluding references), short papers (up to 4 pages excluding references), and extended abstracts (up to 1 page excluding references), formatted according to the ACM template available here (https://www.acm.org/binaries/content/assets/publications/consolidated-tex-template/acmart-master.zip) or directly from Overleaf (www.overleaf.com/gallery/tagged/acm-official#.WOuOk2e1taQ). Reviewing will be single blind, i.e. submissions do not need to be anonymized. Workshop papers will be indexed by the ACM Digital Library in adjunct proceedings.

Papers submitted to ICMI 2020 must not have been published previously. A paper is considered to have been published previously if it has appeared in a peer-reviewed journal, magazine, book, or meeting proceedings that is reliably and permanently available afterward, in print or electronic form, to non-attendees, regardless of the language of that publication. A paper substantially similar in content to one submitted to ICMI 2020 should not be simultaneously under consideration for another conference or workshop.

ICMI 2020 does not consider a paper posted on arXiv.org to be a dual submission.

Please email Track 1 run submission XML files directly to Keith Curtis (keith.curtis@nist.gov), CCing George Awad (george.awad@nist.gov), and indicate the ICMI workshop in the subject line. XML format details are available in the README.txt at https://drive.google.com/drive/u/0/folders/1eCfMy6y9YWHB9qgQQ6GHkEYjLtXDJI3a

Paper submissions will be handled electronically via EasyChair: easychair.org/conferences/?conf=dvu2020

Important Dates

* = Track 1 only