Scope and topics

AVEC 2018 is an ACM MM Challenge Workshop themed around two topics: bipolar disorder, featured for the first time in a challenge, and emotion recognition. Bipolar disorder (BD) is a serious, typically life-long mental health disorder in which patients experience manic or depressive episodes. The purpose of the Audio/Visual Emotion Challenge and Workshop (AVEC) series is to bring together multiple communities from different disciplines, in particular the audio-visual multimedia communities and those in the psychological and social sciences who study expressive behaviour and emotion. One goal of AVEC is to compare the relative merits of different modalities (e.g., audio and video) under well-defined and strictly comparable conditions, and to establish to what extent fusion of the approaches is possible and beneficial. A second motivation is the need to advance health and emotion recognition systems so that they can deal with fully naturalistic behaviour in large volumes of un-segmented, non-prototypical and non-preselected data, as this is exactly the type of data that the new generation of affect-oriented multimedia and human-machine/human-robot communication interfaces has to face in the real world.

AVEC 2018 introduces major novelties this year with three separate sub-challenges:

    • Bipolar Disorder Sub-challenge (BDS) — participants have to classify patients suffering from bipolar disorder into remission, hypomania and mania, as defined by the Young Mania Rating Scale (YMRS), from audio-visual recordings of structured interviews (BD corpus); performance is measured by the unweighted average recall (UAR) over the three classes (see the metric sketch after this list).
    • Cross-cultural Emotion Sub-challenge (CES) — participants have to predict the level of three emotional dimensions (arousal, valence, and likability) time-continuously in a cross-cultural setup (German => Hungarian) from audio-visual recordings of dyadic interactions (SEWA corpus); performance is the concordance correlation coefficient (CCC) averaged over the dimensions.
    • Gold-standard Emotion Sub-challenge (GES) — participants have to generate a reliable gold standard (i.e., a single time series preserving the variance of the emotion labels) from individual ratings of emotional dimensions (arousal, valence), which is then evaluated with a baseline multimodal (audio, video, physiology) emotion recognition system on recordings of dyadic interactions (RECOLA corpus); performance is the concordance correlation coefficient (CCC) averaged over the dimensions, and a condition on the unexplained variance between the generated gold standard and the (original) individual annotations ensures that sufficient variability is preserved in the labels.
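
For concreteness, both evaluation metrics can be computed in a few lines. The following is a minimal NumPy sketch, not official challenge code (function names and array handling are ours): unweighted average recall for the BDS, and the concordance correlation coefficient for the CES and GES.

    import numpy as np

    def unweighted_average_recall(y_true, y_pred):
        # UAR (BDS metric): recall of each class, averaged with equal class
        # weights, so minority classes count as much as majority ones.
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
        return float(np.mean(recalls))

    def concordance_correlation_coefficient(x, y):
        # CCC (CES/GES metric), which penalises decorrelation as well as
        # bias and scale errors:
        # CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        covariance = np.mean((x - x.mean()) * (y - y.mean()))
        return 2.0 * covariance / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

Unlike Pearson's correlation, the CCC drops when predictions are shifted or scaled relative to the gold standard, which is why it is used for time-continuous emotion prediction.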

The AVEC 2018 BDS is the first task of its kind in the scope of mental health analysis. Whereas the topic of depression analysis was featured in previous editions of AVEC, we introduce this year, for the first time in a challenge, the analysis of bipolar disorder, which ranks among the top ten mental disorders for adults according to the World Health Organisation. The dataset used for the AVEC 2018 BDS includes audiovisual recordings of structured interviews performed by 47 Turkish-speaking subjects aged 18-53 (BD corpus). All subjects suffered from bipolar disorder and were recruited from a mental health service hospital, where they were diagnosed by clinicians following the DSM-5 inclusion criteria.

The AVEC 2018 CES is a major extension of the Emotion Sub-challenge previously run in AVEC 2017, based on the SEWA dataset (German culture). This dataset includes online video chats – recorded “in-the-wild”, i.e., with standard webcams and at home or in the workplace – between 32 pairs of participants aged 18-60+ who discuss advertisements they have just watched. The dataset has been extended and now includes data collected from 32 new pairs of participants of Hungarian culture, in the same age range as the German participants. For the AVEC 2018 CES, this extended version of the SEWA dataset is used in the first ever cross-cultural emotion recognition competition task, in order to quantify how emotion knowledge from one culture can be transferred to another: participants receive data from one culture (German) to train and optimise an audio-visual emotion recognition system, while data from the second culture (Hungarian) is used solely for evaluation purposes (labels unknown); the protocol is sketched below.
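
In practice, the cross-cultural protocol reduces to training on the labelled culture and submitting predictions for the unlabelled one. A minimal sketch with scikit-learn follows; the feature matrices, the output file name and the ridge regressor are placeholders of ours, as the official baseline features and models may differ.

    import numpy as np
    from sklearn.linear_model import Ridge

    # Placeholder arrays standing in for frame-level audio-visual features;
    # in the challenge these would be loaded from the official feature packages.
    rng = np.random.default_rng(0)
    X_german = rng.standard_normal((1000, 88))    # training culture (labels provided)
    y_german = rng.standard_normal(1000)          # e.g., time-continuous arousal labels
    X_hungarian = rng.standard_normal((800, 88))  # evaluation culture (labels withheld)

    # Train and optimise exclusively on the German data ...
    model = Ridge(alpha=1.0).fit(X_german, y_german)

    # ... then predict on the Hungarian data and submit the predictions for scoring.
    predictions = model.predict(X_hungarian)
    np.savetxt("arousal_predictions_hungarian.csv", predictions, delimiter=",")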

The AVEC 2018 GES is a new task focusing on the generation of the dimensional emotion labels that are used both for training and testing emotion recognition systems. The task consists of creating a single time series of emotion labels, usually referred to as the “gold standard”, from a pool of time-continuous annotations of dimensional emotions provided by several annotators. Many difficulties arise in the generation of this gold standard: inconsistencies appear in the reported annotation values – even for a single evaluator – and a delay exists between the emotional event expressed in the data and the corresponding annotation value reported (time-continuously) by the annotator. Moreover, some annotators might be more reliable than others, depending on their familiarity with the expressed behaviours; a simple approach to both issues is sketched below. In this Sub-challenge, participants have to generate a reliable gold standard from individual ratings of dimensional emotions, which will then be used to train and evaluate a baseline multimodal emotion recognition system on the RECOLA dataset; this dataset includes audio-visual and physiological recordings of dyadic interactions from 27 French-speaking subjects aged 18-25. By reliable, we mean that the produced gold standard should not only maximise the performance of the baseline emotion recognition system, but also preserve a sufficient amount of the variance found in the original individual annotations in order to be validated.
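
One simple way to address the delay and reliability issues above is to shift each annotation by the lag that best aligns it with the others, and then to weight annotators by their agreement with the pooled signal, in the spirit of the evaluator weighted estimator. The sketch below is ours, not the official baseline: the cross-correlation alignment, the weighting scheme and the maximum lag are all assumptions.

    import numpy as np

    def align_annotation(annotation, reference, max_lag=200):
        # Compensate annotator reaction delay: shift the annotation backwards
        # by the lag (in frames) that maximises its correlation with a
        # reference trace, padding the end to keep the original length.
        corrs = [np.corrcoef(annotation[lag:], reference[:len(reference) - lag])[0, 1]
                 for lag in range(max_lag + 1)]
        best = int(np.argmax(corrs))
        return np.pad(annotation[best:], (0, best), mode="edge")

    def gold_standard(annotations):
        # Evaluator-weighted average: align each annotation to the plain mean,
        # weight it by its correlation with the mean of the aligned pool, and
        # average; unreliable annotators thus contribute less.
        annotations = np.asarray(annotations, dtype=float)
        reference = annotations.mean(axis=0)
        aligned = np.stack([align_annotation(a, reference) for a in annotations])
        weights = np.array([np.corrcoef(a, aligned.mean(axis=0))[0, 1] for a in aligned])
        weights = np.clip(weights, 0.0, None)
        return np.average(aligned, axis=0, weights=weights / weights.sum())

Whatever method participants choose, the resulting gold standard must retain enough of the variance of the individual annotations, as required by the evaluation condition of this Sub-challenge.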

We encourage both contributions aiming at the highest performance with respect to the baselines provided by the organisers, and contributions aiming at new and interesting insights into these challenges. Besides participation in the challenge, we also encourage submissions of original contributions on the following topics (not limited to):

Multimodal Affect Sensing

        • Audio-based Health/Emotion Recognition
        • Video-based Health/Emotion Recognition
        • Physiology-based Health/Emotion Recognition
        • Multimodal Representation Learning
        • Semi-supervised and Unsupervised Learning
        • Multi-view Learning of Multiple Dimensions
        • Personalised Health/Emotion Recognition
        • Context in Health/Emotion Recognition
        • Multiple Rater Ambiguity and Asynchrony

Applications

        • Multimedia Coding and Retrieval
        • Mobile and Online Applications

Read more about the Challenge guidelines in the dedicated section.