The Audio/Visual Emotion Challenge and Workshop (AVEC 2019), “State-of-Mind, Depression, and Cross-cultural Affect”, is a satellite event of ACM MM 2019 (Nice, France, 21-25 October 2019) and the ninth competition aimed at the comparison of multimedia processing and machine learning methods for automatic audio, visual, and audio-visual health and emotion sensing, with all participants competing under strictly the same conditions.

The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the audio, visual, and audio-visual affect recognition communities to compare the relative merits of their approaches to automatic health and emotion analysis under well-defined conditions. Another motivation is the need to advance health and emotion recognition systems so that they can deal with fully naturalistic behaviour in large volumes of un-segmented, non-prototypical, and non-preselected data, as this is exactly the type of data that both multimedia and human-machine/human-robot communication interfaces have to face in the real world.

We are calling for teams to participate in three Sub-Challenges:

State-of-Mind Sub-Challenge (SoMS)

The AVEC 2019 SoMS is a new task focusing on the continuous adaptation of human state-of-mind (SOM), which is pivotal for mental functioning and behaviour regulation. Human SOM constantly shifts due to internal and external stimuli, and habitual use of either adaptive or maladaptive SOM influences mental health. One key aspect of the human experience is our emotions, as they reflect our SOM. In the SoMS, self-reported mood (on a 10-point Likert scale), before and after the narration of personal stories (two positive and two negative), has to be predicted automatically from audio-visual recordings (USoM corpus). Performance is evaluated with the Concordance Correlation Coefficient (CCC).

Detecting Depression with AI Sub-Challenge (DDS)

The AVEC 2019 DDS is a major extension of the AVEC 2016 DSC, in which the level of depression severity (PHQ-8 questionnaire) was assessed from audio-visual recordings of US Army veterans interacting with a virtual agent that conducted a clinical interview and was driven by a human as a Wizard-of-Oz (DAIC-WOZ corpus). The DAIC corpus contains new recordings of US Army veterans in which the virtual agent is, this time, fully driven by artificial intelligence, i.e., without any human intervention. These new recordings serve as the test partition for the DDS and will help to understand how the absence of a human driving the virtual agent impacts automatic depression severity assessment. Performance is evaluated with the Concordance Correlation Coefficient (CCC).

Cross-cultural Emotion Sub-Challenge (CES)

The AVEC 2019 CES is a major extension of the AVEC 2018 CES, where dimensions of emotion were inferred from audio-visual recordings collected “in-the-wild”, i.e., with standard webcams at home or in the workplace, in a cross-cultural setting: German culture => Hungarian culture (SEWA corpus). The dataset now includes recordings of new participants from the Chinese culture, which are used as the test set of the AVEC 2019 CES to investigate how emotion knowledge of Western European cultures (German, Hungarian) can be transferred to the Chinese culture. Performance is evaluated with the Concordance Correlation Coefficient (CCC) averaged over the emotional dimensions.
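All three Sub-Challenges use the same evaluation metric. As a minimal sketch (not the organisers' official scoring script), the CCC between a vector of predictions and the gold-standard labels can be computed as follows:

```python
import numpy as np

def ccc(x, y):
    """Concordance Correlation Coefficient (Lin, 1989) between two 1-D arrays.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (biased) variance and covariance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances (ddof=0)
    sxy = ((x - mx) * (y - my)).mean()   # population covariance
    return 2.0 * sxy / (vx + vy + (mx - my) ** 2)
```

Unlike Pearson's correlation, the CCC penalises shifts in mean and scale between predictions and labels, so a predictor that is perfectly correlated but biased still scores below 1. For the CES, the CCC would be computed per emotional dimension and then averaged.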


We encourage contributions aiming at the highest performance w.r.t. the baselines provided by the organisers, as well as contributions aiming at new and interesting insights w.r.t. the topics of these challenges, especially:

Multimodal Affect Sensing

        • Audio-based Health/Emotion Recognition
        • Video-based Health/Emotion Recognition
        • Physiological-based Health/Emotion Recognition
        • Multimodal Representation Learning
        • Transfer Learning
        • Semi-supervised and Unsupervised Learning
        • Multi-view learning of Multiple Dimensions
        • Personalised Health/Emotion Recognition
        • Context in Health/Emotion Recognition
        • Multiple Rater Ambiguity and Asynchrony
        • Multimedia Coding and Retrieval
        • Mobile and Online Applications