Track 2: Generic Event Boundary Captioning Challenge

For more details, please refer to our Challenge White Paper (TODO: update). For any questions about CodaLab, please post in its forum.

Using the Kinetics-GEBC Dataset

a) Full train set, including all annotations for each video.

b) Filtered train and validation sets, including only the annotations with the highest consistency for each video.

After that, participants can generate captions for the test set timestamps. Note that the ground truths used in our testing process include only the annotations with the highest consistency.

Evaluation Protocol


In our baseline, we modified several current SOTA models by adjusting the input and embedding module. Further details are shown in our paper. The baseline code is available here.

Table: Performance on Kinetics-GEBC for various modified SOTA methods.

We also provide the extracted features used in our baseline.

R-CNN: For each video, we sampled all frames and then extracted 1024-dim region features from each frame with a pre-trained Faster R-CNN backbone. The region features consist of two parts:

a) Region features from all possible object regions.

b) A full-frame feature extracted from each frame by skipping the RPN module within R-CNN.

TSN: For each video, we segment it into several chunks at its boundaries (taken from the annotation with the highest consistency). For example, if a video has three boundaries (0, 1, 2) in its annotation, it is segmented into four chunks (0, 1, 2, 3). Then, for each chunk, we use a pre-trained TSN backbone to extract a 2048-dim TSN feature.
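The chunking rule above (n boundaries give n + 1 chunks) can be sketched as follows. This is only an illustration, not the baseline code: the function name split_into_chunks, the use of timestamps in seconds, and the duration argument are assumptions made for the example, and the TSN feature extraction itself is not shown.

```python
def split_into_chunks(boundaries, duration):
    """Split the interval [0, duration] at the given boundary timestamps.

    Hypothetical helper: n boundaries inside the video yield n + 1 chunks.
    """
    # Sort the boundaries and keep only those strictly inside the video.
    cuts = sorted(t for t in boundaries if 0.0 < t < duration)
    edges = [0.0] + cuts + [duration]
    # Pair consecutive edges into (start, end) intervals.
    return list(zip(edges[:-1], edges[1:]))


# Three boundaries (indices 0, 1, 2) produce four chunks (indices 0, 1, 2, 3).
chunks = split_into_chunks([2.5, 4.0, 7.25], duration=10.0)
for index, (start, end) in enumerate(chunks):
    print(index, start, end)
```

Each (start, end) pair would then be passed to the feature extractor to obtain one 2048-dim feature per chunk.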

Submission Format

To submit your results to the leaderboard, you must construct a submission zip file containing two files: submit_val.pkl and submit_test.pkl, for validation data and test data, respectively. Use a command such as the following to generate the submission file (the archive name submission.zip is only an example):

zip -r submission.zip submit_val.pkl submit_test.pkl

Each pickle file contains a dictionary whose keys are boundary ids and whose values are sub-dictionaries containing the Subject, Status_Before and Status_After items of each boundary. The boundary id has the form "vid" + "_" + "boundary_index", and "Status" and "Before"/"After" are joined by an underscore "_". For example,

{'3yBL-2nND3E_0': {
    'Subject': 'man in black t shirt and pant',
    'Status_Before': 'walking on the running track holding javelin in hand from left',
    'Status_After': 'run on the running track holding javelin in hand from left'
}}
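A minimal sketch of how such a submission could be assembled in Python is given below. The helper names boundary_id, make_entry and write_submission, and the default archive name submission.zip, are hypothetical and chosen only for illustration; the key names and the two file names follow the format described above.

```python
import pickle
import zipfile


def boundary_id(vid, boundary_index):
    """Build a key of the form "vid" + "_" + "boundary_index"."""
    return f"{vid}_{boundary_index}"


def make_entry(subject, status_before, status_after):
    """Build the sub-dictionary holding the three captions of one boundary."""
    return {
        "Subject": subject,
        "Status_Before": status_before,
        "Status_After": status_after,
    }


def write_submission(val_preds, test_preds, zip_path="submission.zip"):
    """Pickle both prediction dictionaries and store them in one zip archive."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("submit_val.pkl", pickle.dumps(val_preds))
        archive.writestr("submit_test.pkl", pickle.dumps(test_preds))


# Example predictions for one validation boundary.
val_preds = {
    boundary_id("3yBL-2nND3E", 0): make_entry(
        "man in black t shirt and pant",
        "walking on the running track holding javelin in hand from left",
        "run on the running track holding javelin in hand from left",
    )
}
```

Calling write_submission(val_preds, test_preds) with a test dictionary built the same way writes both pickle files directly into the archive, so no separate zip step is needed.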


If you have a question about the submission format or are still having problems with your submission, please create a topic in the competition forum (rather than contacting the organizers directly by e-mail), and we will answer it as soon as possible.

Registration & Report Submission Portal

Please send an email to 

For more details, please refer to our Challenge White Paper (TODO: update).


Communication & QA