The Audio/Visual Emotion Challenge and Workshop (AVEC 2019), “State-of-Mind, Detecting Depression with AI, and Cross-cultural Affect”, was a satellite event of ACM MM 2019 (Nice, France, 21 October 2019) and the ninth competition aimed at the comparison of multimedia processing and machine learning methods for automatic audio, visual, and audio-visual health and emotion sensing, with all participants competing under strictly the same conditions.

The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the audio, visual, and audio-visual affect recognition communities to compare the relative merits of their approaches to automatic health and emotion analysis under well-defined conditions. Another motivation is the need to advance health and emotion recognition systems so that they can deal with fully naturalistic behaviour in large volumes of un-segmented, non-prototypical, and non-preselected data, as this is exactly the type of data that both multimedia and human-machine/human-robot communication interfaces have to face in the real world.

We called for teams to participate in three Sub-Challenges:

State-of-Mind Sub-Challenge (SoMS)

The AVEC 2019 SoMS was a new task focusing on the continuous adaptation of human state-of-mind (SOM), which is pivotal for mental functioning and behaviour regulation. Human SOM constantly shifts due to internal and external stimuli, and habitual use of either adaptive or maladaptive SOM influences mental health. One key aspect of the human experience is our emotions, as they reflect our SOM. In the SoMS, self-reported mood (on a 10-point Likert scale), collected after the narration of personal stories (two positive and two negative), had to be predicted automatically from audio-visual recordings (USoM corpus). Performance was evaluated with the Concordance Correlation Coefficient (CCC).

Detecting Depression with AI Sub-Challenge (DDS)

The AVEC 2019 DDS was a major extension of the AVEC 2016 DSC, where the level of depression severity (PHQ-8 questionnaire) was assessed from audio-visual recordings of US Army veterans interacting with a virtual agent that conducted a clinical interview and was driven by a human in a Wizard-of-Oz setting (DAIC-WOZ corpus). The DAIC corpus contains new recordings in which the virtual agent is, this time, fully driven by artificial intelligence, i.e., without any human intervention. These new recordings were used as the test partition for the DDS and helped to understand how the absence of a human operator behind the virtual agent impacts automatic depression severity assessment. Performance was evaluated with the Concordance Correlation Coefficient (CCC).

Cross-cultural Emotion Sub-Challenge (CES)

The AVEC 2019 CES was a major extension of the AVEC 2018 CES, where dimensions of emotion were inferred from audio-visual recordings collected “in-the-wild”, i.e., with standard webcams at home or in the workplace, in a cross-cultural setting: German culture => Hungarian culture (SEWA corpus). This dataset was extended with data collected from new participants of Chinese culture, which was used as the test set for the AVEC 2019 CES to investigate how emotion knowledge of Western European cultures (German, Hungarian) can be transferred to the Chinese culture. Performance was evaluated with the Concordance Correlation Coefficient (CCC) on the Chinese culture and averaged over the emotional dimensions.
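All three Sub-Challenges scored predictions with the Concordance Correlation Coefficient. For reference, here is a minimal NumPy sketch of the CCC in its commonly used form (Lin, 1989), based on population statistics; the organisers' official scoring scripts may differ in details such as per-sequence versus global computation:

```python
import numpy as np

def ccc(x, y):
    """Concordance Correlation Coefficient between predictions x and gold standard y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances (ddof=0)
    cov = ((x - mx) * (y - my)).mean()   # population covariance
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

A CCC of 1 indicates perfect agreement; unlike Pearson's correlation, the CCC also penalises differences in mean and scale between predictions and gold standard. For the CES, such a score would be computed per emotional dimension and then averaged.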


We encouraged contributions aiming at the highest performance w.r.t. the baselines provided by the organisers, as well as contributions offering new and interesting insights w.r.t. the topic of these challenges, especially:

Multimodal Affect Sensing

        • Audio-based Health/Emotion Recognition
        • Video-based Health/Emotion Recognition
        • Physiological-based Health/Emotion Recognition
        • Multimodal Representation Learning
        • Transfer Learning
        • Semi-supervised and Unsupervised Learning
        • Multi-view learning of Multiple Dimensions
        • Personalised Health/Emotion Recognition
        • Context in Health/Emotion Recognition
        • Multiple Rater Ambiguity and Asynchrony
        • Multimedia Coding and Retrieval
        • Mobile and Online Applications


The sponsor of the AVEC 2019 Challenge, audEERING GmbH, and its partner Jabra supported the Challenge by offering a pair of headphones to the winner of each Sub-challenge. Prizes were handed to the speaker of each winning team during the Workshop.

1. Detecting Depression with AI Sub-challenge:

Jabra Elite 85h featuring audEERING's acoustic scene detection technology

=> Awarded to Prerana Mukherjee, IIIT Sri City, India

2. Cross-cultural Emotion Sub-challenge: Jabra Elite Active 65t

=> Awarded to Shizhe Chen, Renmin University of China

3. State-of-mind Sub-challenge: Jabra Elite 65t

=> Awarded to Yan Li, Peng Cheng Laboratory, China