Track 2: Generic Event Boundary Captioning Challenge

  • This track aims to encourage participants to advance the state-of-the-art systems for Boundary Captioning.

  • The competition is based on our Kinetics-GEBC test set only.

  • Top 3 winners will be mentioned at the workshop and formally recognized.

For more details, please refer to our Challenge White Paper. For any questions about CodaLab, please post in its forum.


Using the Kinetics-GEBC Dataset

  • In total, our Kinetics-GEBC dataset includes 176,681 boundaries in 12,434 videos selected from all categories of Kinetics-400. Each annotation consists of several boundaries inside a video, and each video has 1 to 8 annotations from different annotators whose boundary locations differ. In evaluation, for each video we take only the annotation with the highest consistency as the ground truth, i.e., the annotation whose boundary locations are closest to those in all other annotations (see the sketch after this list).

  • Participants may choose either version of the dataset to train their models:

a) the full training set, including all annotations for each video;

b) the filtered training and validation sets, including only the highest-consistency annotation for each video.

Participants then generate captions for the test-set timestamps. Note that the ground truths used in our testing process include only the highest-consistency annotations.

  • Compared with the most relevant video captioning datasets, our Kinetics-GEBC is the first to target the captioning of generic event boundaries.
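
As a rough illustration of the "highest consistency" rule described above, the sketch below picks, for each video, the annotation whose boundary timestamps are closest on average to those of all other annotations. The data layout (a list of annotations, each a list of boundary timestamps) and the distance measure are assumptions for illustration only; the official filtering is defined in the white paper.

# Minimal sketch: pick the most "consistent" annotation per video.
# Assumption: each annotation is a list of boundary timestamps (seconds);
# annotations of the same video may contain different numbers of boundaries,
# so each boundary is compared to its nearest counterpart in the other annotation.
from typing import List

def annotation_distance(a: List[float], b: List[float]) -> float:
    """Average distance from each boundary in `a` to its nearest boundary in `b`."""
    if not a or not b:
        return float("inf")
    return sum(min(abs(t - s) for s in b) for t in a) / len(a)

def most_consistent(annotations: List[List[float]]) -> int:
    """Index of the annotation closest, on average, to all other annotations."""
    best_idx, best_score = 0, float("inf")
    for i, anno in enumerate(annotations):
        others = [other for j, other in enumerate(annotations) if j != i]
        score = sum(annotation_distance(anno, other) for other in others) / max(len(others), 1)
        if score < best_score:
            best_idx, best_score = i, score
    return best_idx

# Example: three annotators; the second agrees best with the others.
video_annotations = [[1.0, 4.8, 9.0], [1.1, 5.0, 9.1], [2.5, 6.0]]
print(most_consistent(video_annotations))  # -> 1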


Evaluation Protocol

  • In evaluation, we employ CIDEr, SPICE and ROUGE-L as evaluation metrics, which are widely used in image and video captioning benchmarks.

  • We split each predicted caption into Subject, Status Before and Status After, compute the similarity of each item against the ground truth, and then average the three scores for each metric (a rough sketch follows this list).

  • The final ranking is determined by jointly considering the scores on all three metrics.

  • We evaluate performance on the Kinetics-GEBC test set in this competition; the video list can be found here.
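
The per-field averaging described above can be reproduced roughly as follows, assuming the pycocoevalcap package (which provides CIDEr and ROUGE-L scorers; its SPICE scorer additionally requires Java and is omitted here). The field names and dictionary layout mirror the submission format described later; the tokenization and aggregation used by the official evaluation server may differ.

# Rough sketch of the per-field evaluation, assuming pycocoevalcap is installed
# (pip install pycocoevalcap). This only illustrates the Subject / Status_Before /
# Status_After split and the final averaging, not the official evaluation code.
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.rouge.rouge import Rouge

FIELDS = ["Subject", "Status_Before", "Status_After"]

def evaluate(predictions: dict, ground_truth: dict) -> dict:
    """predictions / ground_truth: {boundary_id: {field: caption_string}}."""
    scorers = {"CIDEr": Cider(), "ROUGE_L": Rouge()}  # SPICE omitted (needs Java)
    results = {}
    for name, scorer in scorers.items():
        field_scores = []
        for field in FIELDS:
            res = {bid: [pred[field]] for bid, pred in predictions.items()}
            gts = {bid: [gt[field]] for bid, gt in ground_truth.items()}
            score, _ = scorer.compute_score(gts, res)
            field_scores.append(score)
        results[name] = sum(field_scores) / len(FIELDS)  # average over the three fields
    return results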

Baseline

In our baseline, we modify several current SOTA models by adjusting their input and embedding modules. Further details are given in our paper. The baseline code is available here.


Table: Performance on Kinetics-GEBC for various modified SOTA methods.

We also provide the extracted features used in our baseline.

R-CNN: For each video, we sample all frames and extract 1024-dim region features from each frame with a pre-trained Faster R-CNN backbone. The region features consist of two parts (a small assembly sketch follows this list):

a) region features from all possible object regions;

b) a full-frame feature extracted from each frame by skipping the RPN module in Faster R-CNN.
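
The released feature files have their own layout, so the following is only a hypothetical sketch of how the two parts above might be stacked into a single per-frame array; the array shapes and the stacking order are assumptions for illustration.

# Hypothetical sketch: stack the full-frame feature and the per-region features
# into one (num_regions + 1, 1024) array per frame. The actual layout of the
# released feature files may differ.
import numpy as np

def assemble_frame_features(region_feats: np.ndarray, frame_feat: np.ndarray) -> np.ndarray:
    """region_feats: (num_regions, 1024); frame_feat: (1024,)."""
    return np.concatenate([frame_feat[None, :], region_feats], axis=0)

# Example with random placeholders
regions = np.random.randn(5, 1024).astype(np.float32)
full_frame = np.random.randn(1024).astype(np.float32)
print(assemble_frame_features(regions, full_frame).shape)  # (6, 1024)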

TSN: For each video, we segment it into several chunks at its boundaries (using the annotation with the highest consistency). For example, if a video has three boundaries (0, 1, 2) in its annotation, it is segmented into four chunks (0, 1, 2, 3). For each chunk, we then use a pre-trained TSN backbone to extract a 2048-dim TSN feature.
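
The boundary-based chunking used for the TSN features can be sketched as below. The frame-rate handling is an assumption, and extract_tsn_feature is a hypothetical placeholder for a pre-trained TSN backbone, not the released extraction code.

# Sketch of splitting a video into chunks at its boundary timestamps.
# N boundary timestamps (seconds) -> N + 1 frame ranges (chunks).
from typing import List

def split_into_chunks(num_frames: int, fps: float, boundaries: List[float]) -> List[range]:
    cut_frames = [0] + [int(t * fps) for t in boundaries] + [num_frames]
    return [range(start, end) for start, end in zip(cut_frames[:-1], cut_frames[1:])]

# Example: a 10 s video at 30 fps with three boundaries -> four chunks.
chunks = split_into_chunks(num_frames=300, fps=30.0, boundaries=[2.0, 5.5, 8.0])
print([(c.start, c.stop) for c in chunks])  # [(0, 60), (60, 165), (165, 240), (240, 300)]
# features = [extract_tsn_feature(video_frames[c]) for c in chunks]  # each 2048-dim (placeholder)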

Submission Format

To submit your results to the leaderboard, you must construct a submission zip file containing two files, submit_val.pkl and submit_test.pkl, for the validation and test data respectively. Use the following command to generate the submission file.

zip -r test_submit.zip submit_val.pkl submit_test.pkl

The pickle file is a dictionary whose keys are boundary ids and whose values are sub-dictionaries containing the Subject, Status_Before and Status_After items of each boundary. The boundary id has the form "vid" + "_" + "boundary_index", and "Status" and "Before/After" are joined by an underscore "_". For example,

{'3yBL-2nND3E_0': {
    'Subject': 'man in black t shirt and pant',
    'Status_Before': 'walking on the running track holding javelin in hand from left',
    'Status_After': 'run on the running track holding javelin in hand from left'
}}
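
A minimal sketch of writing the submission files in this format is shown below; the caption strings are placeholders, and only the dictionary layout and file names follow the description above.

# Minimal sketch: write predictions into the expected pickle files and zip them.
import pickle
import subprocess

def save_submission(predictions: dict, path: str) -> None:
    """predictions: {boundary_id: {'Subject': ..., 'Status_Before': ..., 'Status_After': ...}}."""
    with open(path, "wb") as f:
        pickle.dump(predictions, f)

val_preds = {"3yBL-2nND3E_0": {
    "Subject": "man in black t shirt and pant",
    "Status_Before": "walking on the running track holding javelin in hand from left",
    "Status_After": "run on the running track holding javelin in hand from left",
}}
test_preds = {}  # fill with one entry per test boundary id

save_submission(val_preds, "submit_val.pkl")
save_submission(test_preds, "submit_test.pkl")
subprocess.run(["zip", "-r", "test_submit.zip", "submit_val.pkl", "submit_test.pkl"], check=True)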

If you have a question about the submission format or are still having problems with your submission, please create a topic in the competition forum (rather than contacting the organizers directly by e-mail) and we will answer it as soon as possible.

Registration & Report Submission Portal

Please send an email to loveu.cvpr22@gmail.com.

  • Format of Email subject: “YourName-Submission-LOVEU22-Track2”.

  • Please include metadata like your team members, institution, etc.

  • Attach your technical report and other relevant materials in the email.

For more details, please refer to our Challenge White Paper.

Timeline

  • April 2022 (11:59 PM Pacific Time): evaluation server opens for the val set.

  • May 01, 2022 (11:59 PM Pacific Time): evaluation server opens for the test set, with the leaderboard available.

  • Jun 01, 2022 (11:59 PM Pacific Time): evaluation server closes.

  • Jun 08, 2022 (11:59 PM Pacific Time): report submission due.


Communication & QA