Challenge guidelines


To register a team for AVEC 2018, please complete the following online form: challenge registration.

By registering for AVEC 2018, you agree that your team name, your team members' affiliations, and the performance obtained in the Sub-challenge(s) can be listed on the challenge website and in future publications; submitted predictions might also be further exploited for meta-learning.

Once your registration has been validated, you will have access to the data and can start your experiments directly with the training and development partitions. All Sub-challenges allow contributors to use their own features with their own machine learning algorithms; the use of external resources such as pre-trained models is allowed. Once you have found your best method, you should prepare your paper for the Workshop. At the same time, you can compute your results per instance of the test set and send them to the organisers; a maximum of five submissions of test-set results per Sub-challenge is allowed, and the organisers, who do not take part in the competition, use the same number of trials for running evaluations on the test partitions. Once we receive your submission, we will let you know your performance result. Please note that at least one upload on the test set is mandatory, and each participation has to be accompanied by a paper presenting the results (see the Submission policy for details).

See below for more information on the data provided for each Sub-challenge, including data partitioning and baseline features. Information regarding the baseline systems, the baseline paper, and the submission process for test results is provided further below.

The organisers provide the following data:

Bipolar Disorder Sub-challenge (BDS)

    • audio (.wav), video recordings (.mp4) of each subject
    • labels (level of mania and value of the Young Mania Rating Scale) and metadata (age, gender)
    • baseline features

Cross-cultural Emotion Sub-challenge (CES)

    • audio (.wav), video recordings (.avi) of each subject
    • labels (gold-standard for arousal, valence, and likability) and metadata (age, gender)
    • baseline features

Gold-standard Emotion Sub-challenge (GES)

    • audio (.wav), video (.mp4) and physiological (.csv) recordings of each subject
    • labels (individual ratings and gold-standard for arousal and valence) and metadata (age, gender)
    • baseline features

Baseline features

A common framework based on open-source toolboxes has been used to extract baseline features from the multimodal recordings. This framework spans three levels of representation (functionals of low-level descriptors (LLDs), bag-of-words, and unsupervised deep representation) across three modalities: audio, video, and physiological.

    • audio: functionals of low-level descriptors (openSMILE), bag-of-audio-words (openXBOW), unsupervised deep representation (DeepSpectrum); low-level descriptors (12 MFCCs plus log energy, eGeMAPS) are additionally provided using openSMILE
    • video: functionals of low-level descriptors (openFACE), bag-of-visual-words (openXBOW); 2D and 3D pixel coordinates of 68 facial landmarks, 3D position and orientation of the head, gaze direction for both eyes, histogram of oriented gradients on the aligned 112x112 area of the face and intensity of 14 facial action units are additionally provided using openFACE

Data partitioning

Three well-balanced (in terms of age, gender, and labels) partitions of the datasets were created for each Sub-challenge: training, development, and test. For BDS and CES, about 50% of instances are in the training partition and about 25% in each of the development and test partitions; for GES, about 33% of instances are in each partition. Participants have to stick to the definition of the training, development, and test sets as given.

Whereas recordings are provided for all partitions, metadata and labels are not available for the test partition and must be inferred automatically. No manual intervention of any kind on the test partitions is permitted. Test results must be solely the result of a fully automatic process without any human intervention.

Baseline scripts

Scripts for extracting audiovisual features and reproducing the baseline performance of each Sub-challenge can be accessed on the GitHub repository of the AVEC 2018 Challenge.

Test results submission

The number of submissions of test results is limited to five trials per team and Sub-challenge. Participants who registered with several teams do not get extra submissions and need to share the five trials between those teams. Test results must be submitted by email to fabien.ringeval@imag.fr with exactly the following subject line:

AVEC 2018 Test Submission Teamname

where Teamname is the short name of the team you registered. Evaluation and feedback on test performance might take up to 48 business hours. Please refer to the following for the formatting of the test submissions:

  • Bipolar Disorder Sub-challenge
    • Predictions on the test partition must be provided as a single csv file containing two columns: the instance name and its predicted label; e.g., 'test_001',1 for predicting class 1 (remission) on the instance 'test_001'. Please name the csv file with the name of the Sub-challenge, followed by the short name of your team and the corresponding submission number, all separated by underscores, e.g., BDS_EMOTEAM_1.csv for the first submission of the team EMOTEAM.
  • Cross-cultural Emotion Sub-challenge
    • Predictions on the test partitions (DE, i.e., German culture, and HU, i.e., Hungarian culture) must be formatted exactly as in the provided labels, i.e., one csv file per recording containing five columns: instance name, time code in seconds, predicted arousal, predicted valence, and predicted likability, e.g., 'Test_01';17.600000;0.625409;0.745017;-0.389730. Timings must start from 0, have an interval of 100 ms, and reach the last frame of the audio recording floored to the first decimal. For submission, provide a single zip file containing all 16 test files for DE (optional but highly recommended) and all 66 test files for HU (mandatory for rankings), placed in a folder named with the name of the Sub-challenge, followed by the short name of your team and the corresponding submission number, all separated by underscores, e.g., CES_EMOTEAM_1.zip contains a folder CES_EMOTEAM_1 with all test files.
  • Gold-standard Emotion Sub-challenge
    • The submission consists of an executable system that generates a single time series of an emotional dimension, e.g., arousal or valence, with a hop size of 400 ms, which is referred to as a "Gold-standard". The system takes as input the individual ratings (hop size: 40 ms) of an emotional dimension (arousal or valence) of a recording and, optionally, corresponding features, either those provided in the baseline system or features extracted from the corresponding recording. It generates a single time series of emotion labels that is saved in the same format and structure (one folder named arousal and another named valence, with their respective gold-standard) as given in the exemplary gold-standard folder, i.e., one arff file per recording, containing three attributes: Instance_name (string), frameTime (numeric), and GoldStandard (numeric). Timings must start from 0 and reach the last frame of the audio recording rounded to a multiple of 400 ms.
    • The system can be provided as a package written in Matlab (compatible with R2015b), Python, Perl, Shell, or C/C++ (other formats are possible as long as they are easily usable), and must be easy to configure. The structure and names of the files provided in the repository and used by the system must be preserved; the system should therefore take a main path to the folder containing the dataset. An encrypted version of the executable can be provided if necessary. Once the system is successfully configured, the generated Gold-standard is checked with respect to the preservation of the original unexplained variance of the original annotations. If this check passes, the baseline emotion recognition system is used with the generated Gold-standard to evaluate the performance on the development and test sets of the dataset.
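The file naming and row formats described above for the BDS and CES submissions can be sketched in Python as follows. This is only an illustration under stated assumptions, not an official tool: the function names are hypothetical, the single quotes around instance names and the six-decimal formatting simply mirror the examples printed above, and the recording duration is assumed to be known as an integer number of milliseconds so that the 100 ms grid can be computed without floating-point rounding issues. The label files distributed with the data remain the reference for the exact format.

```python
def bds_filename(team: str, submission: int) -> str:
    """File name for a BDS submission, e.g. BDS_EMOTEAM_1.csv."""
    return f"BDS_{team}_{submission}.csv"


def bds_rows(predictions: dict) -> list:
    """One row per instance: quoted instance name, comma, predicted label.

    Assumption: quoting follows the example 'test_001',1 given in the text.
    """
    return [f"'{name}',{label}" for name, label in predictions.items()]


def ces_time_codes(duration_ms: int, step_ms: int = 100) -> list:
    """Time codes in seconds: start at 0, step 100 ms, stop at the
    duration floored to the first decimal (integer division does the flooring).
    """
    n_steps = duration_ms // step_ms
    return [k * step_ms / 1000 for k in range(n_steps + 1)]


def ces_rows(instance, duration_ms, arousal, valence, likability) -> list:
    """Five semicolon-separated columns per frame, as in the example
    'Test_01';17.600000;0.625409;0.745017;-0.389730.
    """
    times = ces_time_codes(duration_ms)
    if not (len(times) == len(arousal) == len(valence) == len(likability)):
        raise ValueError("one prediction per 100 ms frame is required")
    return [
        f"'{instance}';{t:.6f};{a:.6f};{v:.6f};{l:.6f}"
        for t, a, v, l in zip(times, arousal, valence, likability)
    ]


if __name__ == "__main__":
    print(bds_filename("EMOTEAM", 1))
    print(bds_rows({"test_001": 1})[0])
    # A hypothetical 17.65 s recording yields frames 0.0, 0.1, ..., 17.6.
    n = len(ces_time_codes(17650))
    rows = ces_rows("Test_01", 17650, [0.625409] * n, [0.745017] * n, [-0.389730] * n)
    print(rows[-1])
```

Checking the number of predictions against the expected time grid before writing the files helps catch recordings whose last frame was rounded up instead of floored.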