First International Workshop on

Affective Understanding
in Video

at CVPR 2021


Thanks to all the participants! The challenge is now closed, but you may still submit entries to benchmark your results against the leaderboard.

Congrats to our top 3 teams on the leaderboard!

  1. Van Thong Huynh, Soo-Hyung Kim, Guee-Sang Lee, Hyung-Jeong Yang from Chonnam National University

  2. Kezhou Lin (Zhejiang University), Xiaohan Wang (Zhejiang University), Zhedong Zheng (University of Technology Sydney), Linchao Zhu (University of Technology Sydney), Yi Yang (Zhejiang University)

  3. Lin Wang, Baoming Yan, Xiao Liu, Chao Ban, Bo Gao from Alibaba Group

Challenge Timeline:

    • Challenge open: March 1st 2021

    • Challenge close: April 24th 2021

    • Winners Announced: May 1st 2021

Top participants invited to speak at our workshop (June 19th)!

Evoked Expressions from Videos (EEV) Challenge

Given a video, how well can models predict viewer facial reactions at each timestamp as viewers watch it? Predicting evoked facial expressions from video is challenging, as it requires modeling signals from different modalities (visual and audio), potentially over long timescales. Our challenge uses the EEV dataset, a novel dataset collected using reaction videos, to study these facial expression variations as viewers watch the video. The 15 facial expressions annotated in the dataset are: amusement, anger, awe, concentration, confusion, contempt, contentment, disappointment, doubt, elation, interest, pain, sadness, surprise, and triumph. Each expression score ranges from 0 to 1 in each frame, corresponding to the confidence that the expression is present. The EEV dataset was collected using publicly available videos, and the dataset is available at:

Register for the challenge:

EEV Dataset

Dataset: [paper] [dataset csv]

Videos can evoke a range of affective responses in viewers. The ability to predict evoked affect from a video, before viewers watch the video, can help in content creation and video recommendation. We introduce the Evoked Expressions from Videos (EEV) dataset, a large-scale dataset for studying viewer responses to videos. Each video is annotated at 6 Hz with 15 continuous evoked expression labels, corresponding to the facial expression of viewers who reacted to the video. We use an expression recognition model within our data collection framework to achieve scalability. In total, there are 8 million annotations of viewer facial reactions to 5,153 videos (370 hours). We use YouTube to obtain a diverse set of video content. We hope that the size and diversity of the EEV dataset will encourage further explorations in video understanding and affective computing.
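As a rough sanity check on the figures above, the sketch below multiplies 370 hours of video by the 6 Hz annotation rate and confirms that the result is close to the stated 8 million annotations. It assumes that "annotations" counts annotated timestamps rather than individual expression scores. The `is_valid_frame` helper and the dictionary layout for one frame are hypothetical illustrations and do not reflect the actual format of the dataset csv.

```python
# Expression names as listed in the challenge description.
EXPRESSIONS = [
    "amusement", "anger", "awe", "concentration", "confusion",
    "contempt", "contentment", "disappointment", "doubt", "elation",
    "interest", "pain", "sadness", "surprise", "triumph",
]

ANNOTATION_RATE_HZ = 6   # stated annotation frequency
TOTAL_HOURS = 370        # stated total video duration


def expected_annotated_frames(hours, rate_hz=ANNOTATION_RATE_HZ):
    """Number of annotated timestamps for `hours` of video at `rate_hz`."""
    return int(hours * 3600 * rate_hz)


def is_valid_frame(scores):
    """Hypothetical check: one score per expression, each within [0, 1]."""
    return (
        set(scores) == set(EXPRESSIONS)
        and all(0.0 <= value <= 1.0 for value in scores.values())
    )


total = expected_annotated_frames(TOTAL_HOURS)
print(total)  # 7992000, i.e. roughly 8 million

# A made-up frame of predictions, only to illustrate the score range.
example_frame = {name: 0.5 for name in EXPRESSIONS}
print(is_valid_frame(example_frame))  # True
```

Under that assumption, each annotated timestamp carries 15 scores, one per expression, so a prediction for a single video would be a sequence of such frames spaced 1/6 of a second apart.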

If you find our dataset useful, please consider citing: