Track 3: Affordance-Centric Question-driven Task Completion Challenge

For more details, please refer to our Challenge White Paper (TODO: update). For any questions about CodaLab, please post in its forum.

AssistQ Dataset

Compared to the AssistQ competition in LOVEU@CVPR'22, this year we add a new testing set, denoted [AssistQ test@23]. The goal is to achieve higher recall on this set, training on the [AssistQ train@22] + [AssistQ test@22] sets.

(1) video.mp4 / video.mov: instructional video;

(2) script.txt: the video script with timestamps. For example,

0:00:00-0:00:04 How to start, stop, start and stop airfryer? Turn the temperature knob anticlockwise to 120 degrees. 

0:00:04-0:00:07 Turn the time knob clockwise to 10 minutes. 

...

The meaning of the annotation (from left to right): start time-end time, text script. Times follow the H:MM:SS format (hours:minutes:seconds).
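The timestamped lines above can be parsed with a small helper; this is a sketch (the function name `parse_script_line` is ours, not part of the dataset tooling):

```python
import re

def parse_script_line(line: str):
    """Parse one script.txt line of the form
    '0:00:04-0:00:07 Turn the time knob clockwise to 10 minutes.'
    into (start_seconds, end_seconds, text)."""
    m = re.match(r"(\d+:\d{2}:\d{2})-(\d+:\d{2}:\d{2})\s+(.*)", line.strip())
    if m is None:
        raise ValueError(f"unrecognized script line: {line!r}")

    def to_seconds(ts: str) -> int:
        h, mnt, s = (int(x) for x in ts.split(":"))
        return h * 3600 + mnt * 60 + s

    return to_seconds(m.group(1)), to_seconds(m.group(2)), m.group(3)

start, end, text = parse_script_line(
    "0:00:04-0:00:07 Turn the time knob clockwise to 10 minutes."
)
```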

(3) buttons.csv: button bounding-box annotation. For example,

button1,362,86,185,72,airfryer-user.jpg,960,1280

button2,378,330,185,170,airfryer-user.jpg,960,1280

...

The meaning of the annotation (from left to right): button name, top-left x, top-left y, width, height, image filename, image width, image height. 

(4) images/ folder: contains the image files referenced in buttons.csv.
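The buttons.csv rows above can be read into box records as follows; this is a sketch, and the field names and the corner-format box are our own conventions, not part of the dataset:

```python
import csv
import io

# Column order as documented: name, top-left x, top-left y, width, height,
# image filename, image width, image height.
FIELDS = ["name", "x", "y", "w", "h", "image", "img_w", "img_h"]

def load_buttons(csv_text: str):
    """Parse buttons.csv text into dicts with (x1, y1, x2, y2) pixel boxes."""
    buttons = []
    for row in csv.reader(io.StringIO(csv_text)):
        rec = dict(zip(FIELDS, row))
        x, y, w, h = (int(rec[k]) for k in ("x", "y", "w", "h"))
        buttons.append({
            "name": rec["name"],
            "box": (x, y, x + w, y + h),  # top-left and bottom-right corners
            "image": rec["image"],
            "image_size": (int(rec["img_w"]), int(rec["img_h"])),
        })
    return buttons

rows = ("button1,362,86,185,72,airfryer-user.jpg,960,1280\n"
        "button2,378,330,185,170,airfryer-user.jpg,960,1280")
buttons = load_buttons(rows)
```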

The question-answer annotations form a dictionary keyed by data-sample ID. For example,

{'aircon_utr3b': [{...}, {...}, {...}, {...}, {...}, {...}], 'airfryer_gye82': [{...}, {...}, {...}, {...}, {...}], 'airfryer_pe2j7': [{...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}], 'airfryer_w9rzm': [{...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, ...], 'bicycle_g8h94': [{...}, {...}, {...}], ...}

    For each data sample, there are multiple question-answer pairs:

"question": "How to bake a cake at 120 degrees for 15 minutes?"

"answers": [

["Turn <button1> clockwise", "Turn <button1> anticlockwise", "Turn <button2> clockwise", "Turn <button2> anticlockwise to 0 minutes", "Turn <button1> to 200 degrees", "Turn <button1> to 120 degrees", "Turn <button1> to 180 degrees", "Turn <button2> clockwise to 3 minutes", "Turn <button2> clockwise to 10 minutes", "Turn <button2> clockwise to 15 minutes"], 

["Turn <button1> clockwise", "Turn <button1> anticlockwise", "Turn <button2> clockwise", "Turn <button2> anticlockwise to 0 minutes", "Turn <button1> to 200 degrees", "Turn <button1> to 120 degrees", "Turn <button1> to 180 degrees", "Turn <button2> clockwise to 3 minutes", "Turn <button2> clockwise to 10 minutes", "Turn <button2> clockwise to 15 minutes"]

], #  candidate answers of each step (2 steps in this case)

"correct": [6, 10],  # correct answer index of each step (starting from 1). We would not release this in the testing set

"images": ["airfryer-user.jpg", "airfryer-user.jpg"] # user view image of each step, mentioned in buttons.csv

},

   {...},

   ...

]
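Because "correct" is 1-indexed, code that recovers the gold answer text per step must subtract 1. A minimal sketch on a trimmed toy entry (the answer lists and indices below are shortened for illustration, not real dataset values):

```python
# Toy QA entry with the same shape as the annotations above,
# but with shortened answer lists.
qa = {
    "question": "How to bake a cake at 120 degrees for 15 minutes?",
    "answers": [
        ["Turn <button1> clockwise", "Turn <button1> to 120 degrees"],
        ["Turn <button2> clockwise to 10 minutes",
         "Turn <button2> clockwise to 15 minutes"],
    ],
    "correct": [2, 2],  # 1-indexed correct answer per step (toy values)
}

# Convert 1-indexed "correct" entries to 0-indexed list positions.
correct_texts = [
    step_answers[idx - 1]
    for step_answers, idx in zip(qa["answers"], qa["correct"])
]
```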

Baseline

Evaluation Protocol
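The official protocol is defined in the Challenge White Paper. As an illustration only, a step-level recall@k (the fraction of steps whose 1-indexed correct answer ranks among the top-k predicted scores) could look like this; the function and its exact definition are our assumption, not the official scorer:

```python
def mean_recall_at_k(scores, correct, k=1):
    """Illustrative recall@k over steps.

    scores:  one list of candidate-answer scores per step.
    correct: 1-indexed gold answer per step.
    """
    hits, total = 0, 0
    for step_scores, gold in zip(scores, correct):
        # Rank candidate indices (1-indexed) by descending score.
        ranked = sorted(range(1, len(step_scores) + 1),
                        key=lambda i: step_scores[i - 1], reverse=True)
        hits += gold in ranked[:k]
        total += 1
    return hits / total
```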

Submission

Package your predictions as submit_test.json and compress it for upload:

zip -r submit_test.zip submit_test.json

The JSON file should follow this format:

"blender_92uto": # data index

  "question": "How to bake a cake at 120 degrees for 15 minutes?", # do not change the question

"scores": # scores of candidate answers

[

[0.1, 0.2, 0.3, ...], 

[0.3, 0.2, 0.1, ...]

]

},

{...},

...

],

"...": {...},

...

}
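A submission file in the layout above can be produced and zipped with the standard library; this is a sketch with placeholder uniform scores, and the data index "blender_92uto" and the candidate count of 10 are illustrative:

```python
import json
import zipfile

# Placeholder submission: one data index mapping to a list of QA
# predictions, each with unchanged question text and per-step score lists.
submission = {
    "blender_92uto": [
        {
            "question": "How to bake a cake at 120 degrees for 15 minutes?",
            "scores": [[0.1] * 10, [0.1] * 10],  # one score list per step
        }
    ]
}

with open("submit_test.json", "w") as f:
    json.dump(submission, f)

# Compress for upload, matching: zip -r submit_test.zip submit_test.json
with zipfile.ZipFile("submit_test.zip", "w") as zf:
    zf.write("submit_test.json")
```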

Registration & Report Submission Portal

To register or to submit your report, please send an email to loveu.cvpr@gmail.com.

For more details, please refer to our Challenge White Paper (TODO: update).

Timeline:


Communication & QA