SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis
Introduction
Intimacy is a fundamental social aspect of language. This SemEval shared task focuses on predicting the intimacy of tweets in 10 languages. The task is co-organized by the University of Michigan and Snap Inc.
The training data is now available on our CodaLab competition site (participate->files->public data). See our task paper for more details about the dataset and baseline performance.
[ALERT] You might see offensive or sexual content in the dataset.
Task description
The goal of this task is to predict the intimacy of tweets in 10 languages. To train your model, you are given a set of tweets in six languages (English, Spanish, Italian, Portuguese, French, and Chinese) annotated with intimacy scores ranging from 1 to 5.
You are encouraged (but not required) to also use the question intimacy dataset (Pei and Jurgens, 2020), which contains 2247 English questions from Reddit as well as another 150 questions from Books, Movies, and Twitter. Please note that the intimacy scores in this dataset range from -1 to 1, so you might need to map them to the 1 to 5 range used in the current task, or consider data augmentation or other methods for combining the two datasets. Please check out the paper for more details about the question intimacy dataset.
Model performance will be evaluated on the test set in the six training languages as well as on an external test set covering four languages not in the training data (Hindi, Arabic, Dutch, and Korean).
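One simple way to bring the question intimacy scores onto the tweet scale is a linear rescaling. The sketch below is only an illustration: the task does not prescribe a mapping, the function name rescale_intimacy and the sample scores are hypothetical, and a linear map assumes the two annotation scales are comparable, which may not hold in practice.

```python
def rescale_intimacy(score, src_min=-1.0, src_max=1.0, dst_min=1.0, dst_max=5.0):
    """Linearly map a score from [src_min, src_max] to [dst_min, dst_max].

    Assumption: a linear map is adequate. The task description does not
    specify how the two score ranges should be aligned.
    """
    if not src_min <= score <= src_max:
        raise ValueError(f"score {score} is outside [{src_min}, {src_max}]")
    # Position of the score within the source range, as a fraction in [0, 1].
    fraction = (score - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


# Hypothetical question-intimacy scores in the -1 to 1 range.
question_scores = [-1.0, -0.5, 0.0, 0.5, 1.0]
rescaled = [rescale_intimacy(s) for s in question_scores]
print(rescaled)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Because the task metric is a correlation, the exact target range matters less than keeping the relative ordering and spacing of the scores consistent with the tweet annotations.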
We will use Pearson's r as the evaluation metric.
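For local validation, Pearson's r can be computed directly from gold and predicted scores. The following is a minimal sketch using only the Python standard library; it is not the official scoring script, and the function name pearson_r and the example scores are hypothetical.

```python
import math


def pearson_r(gold, predicted):
    """Return Pearson's correlation coefficient between two score lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    n = len(gold)
    if n < 2:
        raise ValueError("at least two score pairs are required")
    mean_gold = sum(gold) / n
    mean_pred = sum(predicted) / n
    # Sums of products of deviations from the means.
    covariance = sum((g - mean_gold) * (p - mean_pred) for g, p in zip(gold, predicted))
    var_gold = sum((g - mean_gold) ** 2 for g in gold)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_gold == 0 or var_pred == 0:
        raise ValueError("Pearson's r is undefined when one input is constant")
    return covariance / math.sqrt(var_gold * var_pred)


# Hypothetical gold labels and model predictions on the 1 to 5 scale.
gold_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_scores = [1.5, 2.5, 3.5, 4.5, 5.5]
print(pearson_r(gold_scores, predicted_scores))  # approximately 1.0
```

Note that Pearson's r is unchanged by adding a constant to the predictions or multiplying them by a positive factor, so a model is rewarded for tracking relative intimacy rather than for matching the absolute 1 to 5 values.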
Important Dates
Training data ready: 26 September 2022
Evaluation start: 10 January 2023
Evaluation end: 31 January 2023
System paper submission due: February 2023
Task paper submission due: February 2023
Notification to authors: March 2023
Camera ready due: April 2023
SemEval workshop: Summer 2023 (co-located with a major NLP conference)
Dataset
The training data is now available on our CodaLab competition site (participate->files->public data). Please note that you might see offensive or sexual content in the dataset.
Organizers
Jiaxin Pei, University of Michigan
Francesco Barbieri, Snap Inc.
Vítor Silva, Snap Inc.
Maarten Bos, Snap Inc.
Yozen Liu, Snap Inc.
Leonardo Neves, Snap Inc.
David Jurgens, University of Michigan
References
Jiaxin Pei and David Jurgens. 2020. Quantifying Intimacy in Language. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5307–5326, Online. Association for Computational Linguistics.
Jiaxin Pei, Vítor Silva, Maarten Bos, Yozen Liu, Leonardo Neves, David Jurgens, and Francesco Barbieri. 2022. SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis. arXiv:2210.01108.