Challenge guidelines

Team registration is now closed. However, the dataset can still be accessed using the following online form: data_request.

See below for more information on the data provided for each Sub-challenge, including data partitioning and baseline features, as well as information on the baseline systems, the baseline paper, and the submission process for test results.

The organisers provided the following data:

Bipolar Disorder Sub-challenge (BDS)

    • audio (.wav), video recordings (.mp4) of each subject
    • labels (level of mania and score on the Young Mania Rating Scale) and metadata (age, gender)
    • baseline features

Cross-cultural Emotion Sub-challenge (CES)

    • audio (.wav), video recordings (.avi) of each subject
    • labels (gold-standard for arousal, valence, and likability) and metadata (age, gender)
    • baseline features

Gold-standard Emotion Sub-challenge (GES)

    • audio (.wav), video (.mp4) and physiological (.csv) recordings of each subject
    • labels (individual ratings and gold-standard for arousal and valence) and metadata (age, gender)
    • baseline features

Baseline features

A common framework based on open-source toolboxes was used to extract baseline features from the multimodal recordings. This framework spans three levels of representation, namely functionals of low-level descriptors (LLDs), bag-of-words, and unsupervised deep representations, across three modalities: audio, video, and physiology.

    • audio: functionals of low-level descriptors (openSMILE), bag-of-audio-words (openXBOW), unsupervised deep representation (DeepSpectrum); low-level descriptors (12 MFCCs plus log energy, eGeMAPS) are additionally provided using openSMILE
    • video: functionals of low-level descriptors (OpenFace), bag-of-visual-words (openXBOW); 2D pixel and 3D coordinates of 68 facial landmarks, 3D position and orientation of the head, gaze direction for both eyes, a histogram of oriented gradients over the aligned 112x112 face region, and the intensity of 14 facial action units are additionally provided using OpenFace
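To illustrate the bag-of-words representation mentioned above: the actual openXBOW tool learns its codebook from data and offers many options, but the core quantise-and-count idea can be sketched with a toy example (hypothetical helper name, tiny hand-picked codebook):

```python
import math
from collections import Counter

def bag_of_words(lld_frames, codebook):
    """Assign each LLD frame to its nearest codeword (Euclidean distance)
    and count the assignments into a histogram over the codebook."""
    def nearest(frame):
        return min(range(len(codebook)),
                   key=lambda k: math.dist(frame, codebook[k]))
    counts = Counter(nearest(f) for f in lld_frames)
    return [counts.get(k, 0) for k in range(len(codebook))]

# Toy example: 2-D "LLD" frames quantised against a 3-word codebook
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]
bow = bag_of_words(frames, codebook)   # one histogram bin per codeword
```

The resulting histogram is a fixed-length vector regardless of the number of frames, which is what makes bag-of-words features comparable across recordings of different lengths.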

Data partitioning

Three well-balanced partitions (in terms of age, gender, and labels) of the dataset were created for each Sub-challenge: training, development, and test. About 50% of the instances are in the training partition and 25% each in the development and test partitions for BDS and CES, while each partition holds about 33% of the instances for GES. Participants must stick to the definition of the training, development, and test sets as given.
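For intuition, a label-stratified 50/25/25 split can be sketched as follows (the organisers' actual partitioning also balances age and gender; function name and toy data are illustrative only):

```python
import random
from collections import defaultdict

def stratified_split(instances, labels, seed=42):
    """Split instances roughly 50/25/25 into train/dev/test,
    stratified by label so each partition keeps the label balance."""
    by_label = defaultdict(list)
    for inst, lab in zip(instances, labels):
        by_label[lab].append(inst)
    rng = random.Random(seed)
    train, dev, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train, n_dev = len(group) // 2, len(group) // 4
        train += group[:n_train]
        dev += group[n_train:n_train + n_dev]
        test += group[n_train + n_dev:]
    return train, dev, test

# Toy example: 40 instances, two balanced classes
insts = [f"inst_{i:03d}" for i in range(40)]
labs = [i % 2 for i in range(40)]
tr, dv, te = stratified_split(insts, labs)
```

Shuffling within each label group before slicing keeps the partitions disjoint while preserving the per-class proportions.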

While recordings are provided for all partitions, metadata and labels are withheld for the test partition and must be inferred automatically. No manual intervention of any kind on the test partitions is permitted: test results must be solely the output of a fully automatic process, without any human intervention.

Baseline scripts

Scripts for extracting audiovisual features and reproducing baseline performance of each Sub-Challenge can be accessed on the GitHub repository of the AVEC 2018 Challenge.

Test results submission

The number of submissions of test results is limited to five trials per team and per Sub-challenge. While the challenge itself is closed, evaluation on the test sets remains open. Test results must be submitted by email to with the following exact subject line: AVEC 2018 Test Sub-Challenge.

Please refer to the following for the formatting of test submissions:

  • Bipolar Disorder Sub-Challenge
    • Predictions on the test partition must be provided as a single csv file containing two columns: the instance name and its predicted label, e.g., 'test_001',1 for predicting class 1 (remission) on the instance 'test_001'. Please name the csv file with the name of the Sub-challenge, followed by the short name of your team and the submission number, all separated by underscores, e.g., BDS_EMOTEAM_1.csv for the first submission of team EMOTEAM.
  • Cross-cultural Emotion Sub-Challenge
    • Predictions on the test partitions (DE, i.e., German culture, and HU, i.e., Hungarian culture) must be formatted exactly as the provided labels, i.e., one csv file per recording containing five columns: instance name, time code in seconds, predicted arousal, predicted valence, and predicted likability, e.g., 'Test_01';17.600000;0.625409;0.745017;-0.389730. Timings must start from 0, have an interval of 100 ms, and reach the last frame of the audio recording floored to the first decimal. For submission, provide a single zip file containing a folder named with the name of the Sub-challenge, followed by the short name of your team and the submission number, all separated by underscores, e.g., CES_EMOTEAM_1; this folder must contain all 16 test files for DE (optional but highly recommended) and all 66 test files for HU (mandatory for the ranking).
  • Gold-standard Emotion Sub-Challenge
    • Submission consists of an executable system that generates a single time series for an emotional dimension, e.g., arousal or valence, with a hop size of 400 ms, referred to as a "Gold-standard". The system takes as input the individual ratings (hop size: 40 ms) of an emotional dimension for a recording and, optionally, corresponding features, either those provided with the baseline system or features extracted from the recording itself, and generates a single time series of emotion labels. This time series must be saved in the same format and structure as the exemplary gold-standard folder (one folder named arousal and another named valence, each with its respective gold-standard), i.e., one arff file per recording containing three attributes: Instance_name (string), frameTime (numeric), and GoldStandard (numeric). Timings must start from 0 and reach the last frame of the audio recording rounded to a multiple of 400 ms.
    • The system can be provided as a package written in Matlab (compatible with R2015b), Python, Perl, Shell, or C/C++ (other formats are possible as long as they are easy to use), and must be easy to configure; the structure and names of the files provided in the repository and used by the system must be preserved, so a main path to the folder containing the dataset should be provided. An encrypted version of the executable can be supplied if necessary. Once the system is successfully configured, the generated Gold-standard is checked for preservation of the unexplained variance of the original annotations. If this check passes, the baseline emotion recognition system is run with the generated Gold-standard to evaluate performance on the development and test sets.
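As an illustration of the CES timing requirements above, the following sketch builds the 100 ms grid from 0 up to the recording duration floored to the first decimal and writes one prediction file (hypothetical helper name; whether the instance name should be quoted, and the exact csv dialect, must be checked against the provided label files):

```python
import csv
import io
import math

def write_ces_predictions(instance_name, duration_s,
                          arousal, valence, likability, fileobj):
    """Write one CES test file with five ';'-separated columns:
    instance name, time code, arousal, valence, likability.
    Timestamps start at 0, step by 100 ms, and end at the recording
    duration floored to the first decimal (one prediction per step)."""
    last = math.floor(duration_s * 10) / 10.0    # floor to first decimal
    n_steps = int(round(last * 10)) + 1          # inclusive of t = 0 and t = last
    writer = csv.writer(fileobj, delimiter=';', lineterminator='\n')
    for i in range(n_steps):
        t = i / 10.0
        writer.writerow([instance_name, f"{t:.6f}", f"{arousal[i]:.6f}",
                         f"{valence[i]:.6f}", f"{likability[i]:.6f}"])

# Toy example: a 17.65 s recording -> grid ends at 17.6 s (177 rows)
buf = io.StringIO()
n = int(round(math.floor(17.65 * 10) / 10.0 * 10)) + 1
write_ces_predictions("Test_01", 17.65, [0.1] * n, [0.2] * n, [-0.3] * n, buf)
lines = buf.getvalue().strip().split("\n")
```

Flooring the duration before building the grid guarantees that the last timestamp never overshoots the end of the audio, which is the constraint the guidelines state for the CES test files.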

Baseline paper

The paper introducing the AVEC 2018 Challenge can be viewed here.