• Cognitive science has shown that humans consistently segment videos into meaningful temporal chunks. This segmentation happens naturally, without pre-defined categories and without viewers being explicitly asked to do so.

  • Here, we study the task of Generic Event Boundary Detection (GEBD), which aims to detect generic, taxonomy-free event boundaries that segment a whole video into chunks. A minimal sketch of the expected input/output format is shown below. Details can be found in our paper: https://arxiv.org/abs/2101.10511
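
  • To make the boundary-and-chunk format concrete, here is a minimal sketch. It assumes each annotation is a list of boundary timestamps in seconds; the function names, the greedy matching, and the 5% relative-distance tolerance are illustrative assumptions, not the official benchmark code (see the paper and the competition pages for the actual protocol).

```python
from typing import List, Tuple


def boundaries_to_segments(boundaries: List[float], duration: float) -> List[Tuple[float, float]]:
    """Turn boundary timestamps (seconds) into the chunks they induce over [0, duration]."""
    cuts = [0.0] + sorted(boundaries) + [duration]
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]


def f1_at_threshold(pred: List[float], gt: List[float], duration: float, rel_dis: float = 0.05) -> float:
    """Illustrative F1: a predicted boundary matches an unmatched ground-truth boundary
    if they are within rel_dis * duration of each other (greedy one-to-one matching)."""
    if not pred or not gt:
        return 0.0
    tol = rel_dis * duration
    matched, tp = set(), 0
    for p in pred:
        for j, g in enumerate(gt):
            if j not in matched and abs(p - g) <= tol:
                matched.add(j)
                tp += 1
                break
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gt)
    return 2 * precision * recall / (precision + recall)


# Example: a 10-second clip annotated with boundaries at 2.4s and 6.1s.
print(boundaries_to_segments([2.4, 6.1], 10.0))            # [(0.0, 2.4), (2.4, 6.1), (6.1, 10.0)]
print(f1_at_threshold([2.5, 6.0, 8.0], [2.4, 6.1], 10.0))  # 0.8
```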

  • Some example event boundaries are shown in the figure on the right.

  • Generic event boundaries are:

    • immediately useful for applications such as video editing and summarization;

    • a stepping stone towards long-form video modeling through reasoning over the temporal structure of the segmented units.


  • We describe our dataset and annotation process in more detail below; details of competition Track 1 and Track 2 are presented on their respective webpages. Further details and visualization examples can be found in our white paper.

  • Notably, our Kinetics-GEBD has the largest number of boundaries (e.g., 32x that of ActivityNet, 8x that of EPIC-Kitchens-100); its boundaries are in-the-wild, open-vocabulary, cover generic event changes, and respect the diversity of human perception.