News:
Sep 5, 2025: 🚀 We launched the Video-TT Challenge!
Overview of Video-TT Dataset:
The Video Thinking Test (Video-TT) is a benchmark dataset consisting of 1,000 short-form videos collected from YouTube Shorts, curated to evaluate the ability of video language models to understand real-world, dynamic scenes. Each video is paired with five questions in total: one open-ended comprehension question and four adversarial questions designed to probe narrative complexity, temporal reasoning, and robustness against natural ambiguities. By distinguishing between errors caused by limited frame sampling and those reflecting deeper comprehension gaps, Video-TT provides a controlled yet challenging testbed for video understanding and reasoning research, while highlighting the current performance gap between humans and state-of-the-art models. Refer to the video below for more details on the Video-TT dataset.
Dataset download: Our dataset can be downloaded from Hugging Face.
Test data usage: In this challenge, we only use the "test" split, which contains 1k multiple-choice questions with ground truth answers.
Evaluation: The ground truth answers are provided in the dataset. Accuracy on these questions is the only metric for final evaluation.
Baseline: Representative works have been evaluated in the paper; you may refer to it for comparisons.
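As a rough illustration of the metric, the following Python sketch computes accuracy over multiple-choice answers. It is not the official scoring script. The function name `accuracy`, the mapping of question ID to option letter, the case-insensitive comparison, and the choice to count a missing prediction as wrong are all assumptions made for the example.

```python
def accuracy(predictions: dict[str, str], ground_truth: dict[str, str]) -> float:
    """Return the fraction of ground-truth questions answered correctly.

    Assumptions (not confirmed by the organizers): answers are compared
    case-insensitively, and a question without a prediction is incorrect.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    correct = sum(
        1
        for qid, answer in ground_truth.items()
        if predictions.get(qid, "").strip().upper() == answer.strip().upper()
    )
    return correct / len(ground_truth)


# Toy data: apart from "0001-7", these question IDs are made up.
gt = {"0001-7": "D", "0002-3": "A", "0003-1": "B", "0004-2": "C"}
pred = {"0001-7": "D", "0002-3": "B", "0003-1": "b"}
print(accuracy(pred, gt))  # 2 of 4 correct -> 0.5
```

Running a check like this locally on the provided ground truth can catch formatting problems before an upload is used.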
First Prize (1 team): $1,500 USD in cash or prizes of equal value
Second Prize (1 team): $1,000 USD in cash or prizes of equal value
Third Prize (1 team): $500 USD in cash or prizes of equal value
All teams should submit the evaluation logs in the following format:
{
  "0001-7": "D",
  "<qid of the video>": "<Model Response>",
  ...
}
If you have a question about the submission format, please create a topic in the competition forum.
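The format above can be produced with a few lines of Python. This is a minimal sketch, not an official tool: the helper name `write_results` is hypothetical, and the set of valid answers (four options, A to D) is an assumption inferred from the sample entry.

```python
import json

# Assumption: each question has four options labelled A-D.
VALID_CHOICES = {"A", "B", "C", "D"}


def write_results(responses: dict[str, str], path: str = "results.json") -> None:
    """Validate model responses and write them as one flat JSON object."""
    for qid, answer in responses.items():
        if not isinstance(qid, str) or not isinstance(answer, str):
            raise TypeError(f"qid and answer must be strings: {qid!r} -> {answer!r}")
        if answer not in VALID_CHOICES:
            raise ValueError(f"unexpected answer {answer!r} for question {qid!r}")
    with open(path, "w", encoding="utf-8") as f:
        # json.dump emits valid JSON (no trailing commas, quoted keys and values).
        json.dump(responses, f, indent=2)


if __name__ == "__main__":
    # Toy input; replace with your model's actual responses.
    write_results({"0001-7": "D"})
```

Writing the file through `json.dump` rather than by hand avoids trailing commas and unquoted values, which would make the file invalid JSON.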
To submit your evaluation logs, open the submission portal on CodaBench and upload them as a single zip file.
The JSON file must be named "results.json"; any other name may cause the evaluation server to fail to process your results.
The zip file should contain only "results.json". For Mac users, we recommend zipping the JSON file directly, without creating a new directory.
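One way to meet the packaging requirement on any operating system is to build the archive programmatically. The sketch below uses Python's standard `zipfile` module; the function name `make_submission` and the output name `submission.zip` are assumptions chosen for illustration.

```python
import zipfile
from pathlib import Path


def make_submission(json_path: str = "results.json",
                    zip_path: str = "submission.zip") -> None:
    """Create a zip archive that contains only results.json at its root."""
    src = Path(json_path)
    if src.name != "results.json":
        raise ValueError('the file must be named "results.json"')
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname drops any parent directories, so the archive has no folder.
        zf.write(src, arcname="results.json")


if __name__ == "__main__":
    make_submission()
```

Because the archive is written entry by entry, it holds exactly one file and no enclosing directory, regardless of where "results.json" lives on disk.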
All teams should also submit a technical report after the challenge has finished; the detailed timeline is given below. Please use the CVPR style (double column), with a length of 3-6 pages inclusive of any references. Please explain clearly the data sources, the training strategies, and the model architecture you used, so that your results are comparable to those of other teams.
Please include your GitHub link in the report. The top two winners are required to release their codebases and final models so that others can reproduce their results in the future. Please contact us if you have any questions.
For report submission, please send an email to videottntu@gmail.com.
Format of email subject: "YourName-Submission-VideoTT-Challenge".
Attach your technical report and other relevant materials to the email.
Include your CodaBench account (registered email) and username for our challenge in the email, along with meta information such as team members and institution.
Sep 05, 2025 (12:00 PM UTC): Evaluation server opens, with leaderboard available.
Oct 09, 2025 (12:00 PM UTC): Evaluation server closes.
Oct 12, 2025 (12:00 PM UTC): Report submission due.
CodaBench competition website: Video-TT Challenge
Challenge forum: Video-TT Challenge Forum
Nanyang Technological University