SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis

Introduction

Intimacy is a fundamental social aspect of language. This SemEval shared task focuses on predicting the intimacy of tweets in 10 languages. The task is co-organized by the University of Michigan and Snap Inc.

The training data is now available on our CodaLab competition site (participate->files->public data). See our task paper for more details about the dataset and baseline performance.

[ALERT] You might see offensive or sexual content in the dataset.

Task description

    • The goal of this task is to predict the intimacy of tweets in 10 languages. To train your model, you are given a set of tweets in six languages (English, Spanish, Italian, Portuguese, French, and Chinese) annotated with intimacy scores ranging from 1 to 5.

    • You are encouraged (but not required) to also use the question intimacy dataset (Pei and Jurgens, 2020), which contains 2247 English questions from Reddit as well as another 150 questions from Books, Movies, and Twitter. Please note that the intimacy scores in this dataset range from -1 to 1, so you might need to consider data augmentation or another method that maps these scores to the 1-5 range used in the current task. Please check out the paper for more details about the question intimacy dataset.

    • Model performance will be evaluated on the test set in the 6 training languages as well as on an external test set with 4 languages not in the training data (Hindi, Arabic, Dutch, and Korean).

    • We will use Pearson's r as the evaluation metric.
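The score mapping and the evaluation metric mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear mapping from [-1, 1] to [1, 5] is one possible choice, not a method prescribed by the organizers, and the function names `rescale_to_task_range` and `pearson_r` are hypothetical. The official scoring script may differ in its details.

```python
import math


def rescale_to_task_range(score, src_min=-1.0, src_max=1.0,
                          dst_min=1.0, dst_max=5.0):
    """Linearly map a score from [src_min, src_max] to [dst_min, dst_max].

    Assumption: a plain linear mapping is good enough for combining the
    question intimacy scores (-1 to 1) with the tweet scores (1 to 5).
    """
    fraction = (score - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


def pearson_r(predictions, labels):
    """Compute Pearson's r between two equal-length sequences of numbers."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    n = len(predictions)
    if n < 2:
        raise ValueError("at least two pairs are required")

    mean_p = sum(predictions) / n
    mean_l = sum(labels) / n

    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((p - mean_p) * (l - mean_l)
              for p, l in zip(predictions, labels))
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    var_l = sum((l - mean_l) ** 2 for l in labels)

    if var_p == 0 or var_l == 0:
        raise ValueError("Pearson's r is undefined for constant input")
    return cov / math.sqrt(var_p * var_l)


if __name__ == "__main__":
    # Hypothetical question-intimacy scores in [-1, 1].
    question_scores = [-1.0, 0.0, 1.0]
    print([rescale_to_task_range(s) for s in question_scores])

    # Hypothetical model predictions and gold labels on the 1-5 scale.
    predicted = [1.2, 2.8, 3.1, 4.6]
    gold = [1.0, 3.0, 3.5, 5.0]
    print(pearson_r(predicted, gold))
```

Note that Pearson's r is unchanged by any positive linear rescaling of the predictions, so the mapping matters for keeping training labels on a common scale rather than for the metric itself.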


Important Dates

  • Training data ready: 26 September 2022

  • Evaluation start: 10 January 2023

  • Evaluation end: 31 January 2023

  • System paper submission due: February 2023

  • Task paper submission due: February 2023

  • Notification to authors: March 2023

  • Camera ready due: April 2023

  • SemEval workshop: Summer 2023 (co-located with a major NLP conference)

Dataset

The training data is now available on our CodaLab competition site (participate->files->public data). Please note that you might see offensive or sexual content in our dataset.

Organizers

Contacts

Please join our Google group for direct conversations.

Twitter: intimacy_sem23

Email: tweet.intimacy.semeval2023@gmail.com

References