Dataset

A new dataset has been created for each mental disorder covered by this shared task (eating disorder, depression, non-defined disorder). Each dataset is a collection of labelled messages sent to groups on the popular Telegram platform.

Telegram is a cross-platform, encrypted, cloud-based messaging service. It is free to use and lets users exchange messages with individuals or groups. We extracted messages from public groups whose main language is Spanish and whose topics are directly related to mental disorders. The conversations of hundreds of users in these groups were extracted and anonymized. They were then labelled manually through the Prolific service, which connected annotators with our hosted labelling tool. Each user’s message history was labelled by ten annotators. The probability of a disorder is the number of annotators who found evidence that the user suffers from the targeted disorder (and so marked that user as positive) divided by the total number of annotators, which is ten. This measure supports regression-based evaluation, so a prediction system can be assessed not only against the majority vote but also on how close its output is to the confidence of a group of human judges.
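The labelling scheme described above can be illustrated with a minimal Python sketch. It is not the organizers' code: the function names, the tie-breaking rule for the majority vote, and the choice of root-mean-square error as the regression measure are all assumptions made for illustration. Only the count of ten annotators and the votes-divided-by-annotators definition come from the text.

```python
from math import sqrt

NUM_ANNOTATORS = 10  # stated in the dataset description


def disorder_probability(positive_votes, total=NUM_ANNOTATORS):
    """Share of annotators who marked the user as positive."""
    if not 0 <= positive_votes <= total:
        raise ValueError("positive_votes must be between 0 and total")
    return positive_votes / total


def majority_label(probability, threshold=0.5):
    """Binary label from the probability.

    Assumption: a 5-5 tie counts as positive; the source does not
    say how ties are resolved.
    """
    return int(probability >= threshold)


def rmse(gold, predicted):
    """Root-mean-square error between gold and predicted probabilities.

    Assumption: RMSE is used here only as an example of a regression
    measure; the official metrics are not given in this section.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(g - p) ** 2 for g, p in zip(gold, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical votes for three users (number of positive annotators each).
votes = [7, 2, 5]
gold = [disorder_probability(v) for v in votes]
labels = [majority_label(p) for p in gold]
```

A system that outputs a probability per user can then be compared with `gold` through `rmse`, while its thresholded output can be compared with `labels`.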

The corpus is divided into three subsets, each related to a different disorder. Each subcorpus covers between 150 and 335 users, with an average of 50 messages per user:

Eating disorders corpus: 335 users in total, with 10 for trial, 175 for training, and 150 for testing.
Depression corpus: 335 users in total, with 10 for trial, 175 for training, and 150 for testing.
Unknown disorder corpus: 150 users, all for testing.

The data for each task will be provided to participants according to the established schedule.

For the test run, participants must use our server.

Content Warning: The data contains sensitive information about users suffering from mental disorders. The dataset provided will be used exclusively for the completion of the task, and will not be distributed under any circumstances.
