A new dataset has been created for this shared task. It consists of labelled messages sent to public groups on Telegram and to live-stream chats on Twitch.
Telegram is a cross-platform, encrypted, cloud-based messaging service. It is free to use and allows users to send and receive messages individually or in groups. We extracted a collection of messages from public groups whose main language is Spanish. These groups focus on different topics directly related to gambling. The conversations of hundreds of users in these groups were extracted and anonymized.
The dataset also includes Twitch data: messages from users discussing gambling-related topics in live-stream chats.
For each subcorpus, several hundred users are considered, with an average of N messages per user:
Corpus task 1 and task 2: 517 users in total (7 users for trial, 350 for training and 160 for testing)
The data associated with each task will be provided to the participants following the established dates.
For the test run, participants will have to use our server.
Content Warning: The data contains sensitive information about users suffering from mental disorders. The dataset provided must be used exclusively for completing the task and must not be distributed under any circumstances.