Track 3: Affordance-Centric Question-driven Task Completion Challenge

For more details, please refer to our Challenge White Paper (TODO: update). For any questions about CodaLab, please post in its forum.

AssistQ Dataset

Compared to the AssistQ competition in LOVEU@CVPR'22, this year we add a new testing set, denoted [AssistQ test@23]. The goal is to achieve higher recall on this set, training on the [AssistQ train@22] + [AssistQ test@22] sets.

(1) video.mp4 / video.mov: instructional video;

(2) script.txt: the video script with timestamps. For example,

0:00:00-0:00:04 How to start, stop, start and stop airfryer? Turn the temperature knob anticlockwise to 120 degrees. 

0:00:04-0:00:07 Turn the time knob clockwise to 10 minutes. 

...

The meaning of the annotation (from left to right): start time-end time, text script. Times follow the H:MM:SS format (hours:minutes:seconds).
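The timestamped lines above can be parsed with a small helper; this is a sketch (the function name `parse_script_line` is ours, not part of the dataset tooling):

```python
import re

def parse_script_line(line: str):
    """Parse one script.txt line of the form
    '0:00:04-0:00:07 Turn the time knob clockwise to 10 minutes.'
    into (start_seconds, end_seconds, text)."""
    m = re.match(r"(\d+:\d{2}:\d{2})-(\d+:\d{2}:\d{2})\s+(.*)", line.strip())
    if m is None:
        raise ValueError(f"unrecognized script line: {line!r}")

    def to_seconds(ts: str) -> int:
        h, mnt, s = (int(x) for x in ts.split(":"))
        return h * 3600 + mnt * 60 + s

    return to_seconds(m.group(1)), to_seconds(m.group(2)), m.group(3)

start, end, text = parse_script_line(
    "0:00:04-0:00:07 Turn the time knob clockwise to 10 minutes."
)
```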

(3) buttons.csv: button bounding-box annotation. For example,

button1,362,86,185,72,airfryer-user.jpg,960,1280

button2,378,330,185,170,airfryer-user.jpg,960,1280

...

The meaning of the annotation (from left to right): button name, top-left x, top-left y, width, height, image filename, image width, image height. 

(4) images/ folder: contains the image files referenced in buttons.csv.
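The buttons.csv rows above can be read into box records as follows; this is a sketch, and the field names and the corner-format box are our own conventions, not part of the dataset:

```python
import csv
import io

# Column order as documented: name, top-left x, top-left y, width, height,
# image filename, image width, image height.
FIELDS = ["name", "x", "y", "w", "h", "image", "img_w", "img_h"]

def load_buttons(csv_text: str):
    """Parse buttons.csv text into dicts with (x1, y1, x2, y2) pixel boxes."""
    buttons = []
    for row in csv.reader(io.StringIO(csv_text)):
        rec = dict(zip(FIELDS, row))
        x, y, w, h = (int(rec[k]) for k in ("x", "y", "w", "h"))
        buttons.append({
            "name": rec["name"],
            "box": (x, y, x + w, y + h),  # top-left and bottom-right corners
            "image": rec["image"],
            "image_size": (int(rec["img_w"]), int(rec["img_h"])),
        })
    return buttons

rows = ("button1,362,86,185,72,airfryer-user.jpg,960,1280\n"
        "button2,378,330,185,170,airfryer-user.jpg,960,1280")
buttons = load_buttons(rows)
```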

The question-answer annotations form a dictionary keyed by data-sample ID. For example,

{'aircon_utr3b': [{...}, {...}, {...}, {...}, {...}, {...}], 'airfryer_gye82': [{...}, {...}, {...}, {...}, {...}], 'airfryer_pe2j7': [{...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}], 'airfryer_w9rzm': [{...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, ...], 'bicycle_g8h94': [{...}, {...}, {...}], ...}

    For each data sample, there are multiple question-answer pairs:

"question": "How to bake a cake at 120 degrees for 15 minutes?"

"answers": [

["Turn <button1> clockwise", "Turn <button1> anticlockwise", "Turn <button2> clockwise", "Turn <button2> anticlockwise to 0 minutes", "Turn <button1> to 200 degrees", "Turn <button1> to 120 degrees", "Turn <button1> to 180 degrees", "Turn <button2> clockwise to 3 minutes", "Turn <button2> clockwise to 10 minutes", "Turn <button2> clockwise to 15 minutes"], 

["Turn <button1> clockwise", "Turn <button1> anticlockwise", "Turn <button2> clockwise", "Turn <button2> anticlockwise to 0 minutes", "Turn <button1> to 200 degrees", "Turn <button1> to 120 degrees", "Turn <button1> to 180 degrees", "Turn <button2> clockwise to 3 minutes", "Turn <button2> clockwise to 10 minutes", "Turn <button2> clockwise to 15 minutes"]

], #  candidate answers of each step (2 steps in this case)

"correct": [6, 10],  # correct answer index of each step (starting from 1). We would not release this in the testing set

"images": ["airfryer-user.jpg", "airfryer-user.jpg"] # user view image of each step, mentioned in buttons.csv

},

   {...},

   ...

]
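Because "correct" is 1-indexed, code that recovers the gold answer text per step must subtract 1. A minimal sketch on a trimmed toy entry (the answer lists and indices below are shortened for illustration, not real dataset values):

```python
# Toy QA entry with the same shape as the annotations above,
# but with shortened answer lists.
qa = {
    "question": "How to bake a cake at 120 degrees for 15 minutes?",
    "answers": [
        ["Turn <button1> clockwise", "Turn <button1> to 120 degrees"],
        ["Turn <button2> clockwise to 10 minutes",
         "Turn <button2> clockwise to 15 minutes"],
    ],
    "correct": [2, 2],  # 1-indexed correct answer per step (toy values)
}

# Convert 1-indexed "correct" entries to 0-indexed list positions.
correct_texts = [
    step_answers[idx - 1]
    for step_answers, idx in zip(qa["answers"], qa["correct"])
]
```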

Baseline

Evaluation Protocol
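The official protocol is defined in the Challenge White Paper. As an illustration only, a step-level recall@k (the fraction of steps whose 1-indexed correct answer ranks among the top-k predicted scores) could look like this; the function and its exact definition are our assumption, not the official scorer:

```python
def mean_recall_at_k(scores, correct, k=1):
    """Illustrative recall@k over steps.

    scores:  one list of candidate-answer scores per step.
    correct: 1-indexed gold answer per step.
    """
    hits, total = 0, 0
    for step_scores, gold in zip(scores, correct):
        # Rank candidate indices (1-indexed) by descending score.
        ranked = sorted(range(1, len(step_scores) + 1),
                        key=lambda i: step_scores[i - 1], reverse=True)
        hits += gold in ranked[:k]
        total += 1
    return hits / total
```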

Submission

Package your predictions as submit_test.json and compress it for upload:

zip -r submit_test.zip submit_test.json

The JSON file should follow this format:

"blender_92uto": # data index

  "question": "How to bake a cake at 120 degrees for 15 minutes?", # do not change the question

"scores": # scores of candidate answers

[

[0.1, 0.2, 0.3, ...], 

[0.3, 0.2, 0.1, ...]

]

},

{...},

...

],

"...": {...},

...

}
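A submission file in the layout above can be produced and zipped with the standard library; this is a sketch with placeholder uniform scores, and the data index "blender_92uto" and the candidate count of 10 are illustrative:

```python
import json
import zipfile

# Placeholder submission: one data index mapping to a list of QA
# predictions, each with unchanged question text and per-step score lists.
submission = {
    "blender_92uto": [
        {
            "question": "How to bake a cake at 120 degrees for 15 minutes?",
            "scores": [[0.1] * 10, [0.1] * 10],  # one score list per step
        }
    ]
}

with open("submit_test.json", "w") as f:
    json.dump(submission, f)

# Compress for upload, matching: zip -r submit_test.zip submit_test.json
with zipfile.ZipFile("submit_test.zip", "w") as zf:
    zf.write("submit_test.json")
```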

Registration & Report Submission Portal

To register or to submit your report, please send an email to loveu.cvpr@gmail.com.

For more details, please refer to our Challenge White Paper (TODO: update).

Timeline:


Communication & QA