First International Workshop at CVPR 2021
Thanks to all the participants! The challenge is now closed, but you may still submit entries to benchmark your results against the leaderboard.
Congrats to our top 3 teams on the leaderboard!
Van Thong Huynh, Soo-Hyung Kim, Guee-Sang Lee, Hyung-Jeong Yang from Chonnam National University
Kezhou Lin (Zhejiang University), Xiaohan Wang (Zhejiang University), Zhedong Zheng (University of Technology Sydney), Linchao Zhu (University of Technology Sydney), Yi Yang (Zhejiang University)
Lin Wang, Baoming Yan, Xiao Liu, Chao Ban, Bo Gao from Alibaba Group
Top participants invited to speak at our workshop (June 19th)!
Evoked Expressions from Videos (EEV) Challenge
Given a video, how well can models predict viewer facial reactions at each timestamp while watching it? Predicting evoked facial expressions from video is challenging: it requires modeling signals from different modalities (visual and audio), potentially over long timescales. Our challenge uses the EEV dataset, a novel dataset collected from reaction videos, to study how viewers' facial expressions vary as they watch a video. The 15 facial expressions annotated in the dataset are: amusement, anger, awe, concentration, confusion, contempt, contentment, disappointment, doubt, elation, interest, pain, sadness, surprise, and triumph. Each expression ranges from 0 to 1 in each frame, corresponding to the confidence that the expression is present. The EEV dataset is collected from publicly available videos and is available at: https://github.com/google-research-datasets/eev.
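As a rough sketch, per-frame annotations of this form can be parsed into a dictionary of expression confidences. Note this is illustrative only: the column names `video_id` and `timestamp_ms` are assumptions, and the headers of the released CSVs should be checked against the repository above.

```python
import csv
import io

# The 15 evoked-expression labels annotated in the EEV dataset.
EXPRESSIONS = [
    "amusement", "anger", "awe", "concentration", "confusion",
    "contempt", "contentment", "disappointment", "doubt", "elation",
    "interest", "pain", "sadness", "surprise", "triumph",
]

def parse_annotations(csv_text):
    """Parse annotation rows into (video_id, timestamp_ms, {expression: confidence}).

    Column names are illustrative; verify them against the released files.
    """
    frames = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores = {e: float(row[e]) for e in EXPRESSIONS}
        # Each score is a confidence in [0, 1] that the expression is present.
        assert all(0.0 <= s <= 1.0 for s in scores.values())
        frames.append((row["video_id"], int(row["timestamp_ms"]), scores))
    return frames

# Synthetic example: a single annotation frame for a hypothetical video ID.
header = "video_id,timestamp_ms," + ",".join(EXPRESSIONS)
values = "abc123,0," + ",".join("0.0" for _ in EXPRESSIONS)
frames = parse_annotations(header + "\n" + values)
```

A model for the challenge would then emit one 15-dimensional confidence vector per annotated timestamp, matching this layout.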
Register for the challenge: https://www.aicrowd.com/challenges/evoked-expressions-from-videos-challenge-cvpr-2021
Videos can evoke a range of affective responses in viewers. The ability to predict evoked affect from a video, before viewers watch the video, can help in content creation and video recommendation. We introduce the Evoked Expressions from Videos (EEV) dataset, a large-scale dataset for studying viewer responses to videos. Each video is annotated at 6 Hz with 15 continuous evoked expression labels, corresponding to the facial expression of viewers who reacted to the video. We use an expression recognition model within our data collection framework to achieve scalability. In total, there are 8 million annotations of viewer facial reactions to 5,153 videos (370 hours). We use YouTube to obtain a diverse set of video content. We hope that the size and diversity of the EEV dataset will encourage further explorations in video understanding and affective computing.
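The quoted statistics are internally consistent: 370 hours of video annotated at 6 Hz works out to roughly the stated 8 million annotations, as a quick check shows.

```python
# Sanity check on the dataset statistics quoted above.
hours = 370
annotations_per_second = 6  # labels at 6 Hz

total_seconds = hours * 3600
total_annotations = total_seconds * annotations_per_second
print(total_annotations)  # 7992000, i.e. ~8 million
```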
If you find our dataset useful, please consider citing: