First International Workshop on

Affective Understanding
in Video

at CVPR 2021 (June 19th)


  • AUVi is taking place on June 19th virtually at CVPR! [Workshop program]

    • Watch pre-recorded talks: YouTube playlist of all pre-recorded talks at AUVi. Note that you don't have to watch the talks ahead of AUVi; you can still come to the talks and ask questions at our workshop! These links provide more options for attendees who may be in different timezones or may have a scheduling conflict during the workshop.

    • Submit questions for the discussion panels: Link to Google Form.

  • Evoked Expressions in Videos Challenge: opened March 1st (now completed!) [Challenge details]

  • Paper submission deadline:


How do I attend AUVi?

Our workshop is part of the Conference on Computer Vision and Pattern Recognition (CVPR). To attend our workshop, please register for CVPR to gain access to the virtual platform (including the Zoom and Gatherly links) from CVPR. Registration also lets you attend other workshops, tutorials, and poster presentations at CVPR. We additionally plan to stream the workshop proceedings to YouTube - stay tuned for the streaming link!

How do I attend the poster session / How does Gatherly work?

Our poster session will be on Gatherly, at the link provided to CVPR attendees on Cadmium. Gatherly is a video chat platform where participants move around a 2D space, and only the video/audio of those close to you is shown. You can move around by simply clicking near the people you would like to chat with! This helpful video will show you how to move around in Gatherly. You can also share your screen on Gatherly. Additional information is in the Gatherly Video Guides.

How do I attend the live sessions (speaker panels, challenge panels)?

The live panel discussions will be on Zoom, at the link provided to CVPR attendees on Cadmium. Stay tuned for ways to watch and submit questions for live sessions ahead of our workshop!

I'm having trouble with Zoom/YouTube/Gatherly, what do I do?

The Zoom link and Gatherly link will be available only to CVPR attendees via the Cadmium platform provided by CVPR. The YouTube stream link is publicly available. If you are having trouble with Gatherly, please use this Gatherly troubleshooting guide. If you are having trouble with Zoom, please use this Zoom troubleshooting guide. If you are not able to use any of the virtual tools, please contact us and we will respond to you as soon as we can.

Videos allow us to capture and model the temporal nature of expressed affect, which is crucial in achieving human-level video understanding. Affective signals in videos are expressed over time across different modalities through music, scenery, camera angles, and movement, as well as with character tone, facial expressions, and body language. With the widespread availability of video recording technology and increasing storage capacity, we now have an expanding amount of public academic data to better study affect. Additionally, there is a growing number of temporal modeling methods that have not yet been well-explored for affective understanding in video.

Our workshop seeks to further explore this area and support researchers to compare models quantitatively at a large scale to improve the confidence, quality, and generalizability of the models. We encourage advances in datasets, models, and statistical techniques that leverage videos to improve our understanding of expressed affect and applications of these models to fields such as social assistive robotics, video retrieval and creation, and assistive driving that have direct and clear benefits to humans.


Topics include, but are not limited to:

  • Methods to recognize affective expressions evoked in viewers by videos, such as from music/audio, scenes, character interactions, and related topics.

  • Methods to recognize affective expressions of people shown in videos, including facial expression, body expressions, voice expression, and related topics.

  • Ethics, bias, and fairness in modeling and datasets for the problem of affective understanding in videos.

  • Multimodal techniques for understanding affective expressions, including non-visual signals such as audio or speech.

  • Explainability and interpretability in the context of affective video understanding.

  • Temporal context and scene context in affective video understanding.

  • Cross-cultural analysis of affect and subjective annotations.

  • Open public academic datasets to understand affective expressions in video.

  • Applications of affective understanding of videos to industry.
