Sentiment Analysis and Homophobia detection of YouTube comments in Code-Mixed Dravidian Languages

FIRE 2022

Sentiment analysis is the task of identifying subjective opinions or emotional responses about a given topic. It has been an active area of research over the past two decades in both academia and industry. There is an increasing demand for sentiment detection on social media texts, which for Dravidian languages are largely code-mixed. Code-mixing is a prevalent phenomenon in multilingual communities, and code-mixed texts are sometimes written in non-native scripts. Systems trained on monolingual data fail on code-mixed data because of the complexity of code-switching at different linguistic levels in the text. Shared Task A presents a new gold standard corpus for sentiment detection of code-mixed text in Dravidian languages (Tamil-English, Malayalam-English, and Kannada-English). Shared Task B presents Homophobia and Transphobia Detection, the task of identifying homophobic, transphobic, and non-anti-LGBT+ content in the given corpus. Homophobic and transphobic language is toxic language directed at LGBTQ+ individuals and is described as hate speech.

The goal of this task is to identify the sentiment polarity of a code-mixed dataset of comments / posts in Tamil-English, Malayalam-English, and Kannada-English collected from social media. A comment / post may contain more than one sentence, but the average number of sentences per comment / post in the corpora is 1. Each comment / post is annotated with sentiment polarity at the comment / post level. The dataset also has class imbalance problems, reflecting real-world scenarios. Our proposal aims to encourage research that will reveal how sentiment is expressed in code-mixed scenarios on social media.
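Because the class distribution is skewed, it can help to inspect label counts before training. The following is a minimal sketch, not part of the task's official tooling: it assumes the annotations are available as a plain list of label strings, and both the label spellings and the `inverse_frequency_weights` helper are hypothetical. It computes the commonly used inverse-frequency weight n_samples / (n_classes * count) for each class, which gives rare classes a larger weight.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: n_samples / (n_classes * count)} for a list of labels."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical toy labels; the real corpus distribution will differ.
toy_labels = ["positive"] * 6 + ["negative"] * 2 + ["neutral"] + ["mixed"]
weights = inverse_frequency_weights(toy_labels)
```

Weights of this kind can be passed to a classifier's loss function or used for resampling; which option works best on this corpus is an open question for participants.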

Participants will be provided with development, training, and test datasets.

Task A: This is a message-level polarity classification task. Given a YouTube comment, systems have to classify it as positive, negative, neutral, or mixed emotions. Participants will be provided with development, training, and test datasets of code-mixed text in Dravidian languages (Tamil-English, Malayalam-English, and Kannada-English).
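To make the input and output of Task A concrete, here is a minimal sketch of a majority-class baseline: it maps every comment to the most frequent training label. This is an illustration only, not a reference system. The label spellings in `LABELS`, the function names, and the toy comments are all assumptions, and the released data may use a different format.

```python
from collections import Counter

# Assumed label spellings; check the released data for the exact strings.
LABELS = ("positive", "negative", "neutral", "mixed")


def fit_majority_baseline(train_labels):
    """Return the most frequent label in the training set."""
    if not train_labels:
        raise ValueError("train_labels must not be empty")
    return Counter(train_labels).most_common(1)[0][0]


def predict(comments, majority_label):
    """Assign the same label to every comment (one label per message)."""
    return [majority_label for _ in comments]


# Hypothetical toy data standing in for the training and test sets.
train_labels = ["positive", "positive", "negative", "positive", "mixed"]
test_comments = ["first toy comment", "second toy comment"]

majority = fit_majority_baseline(train_labels)
predictions = predict(test_comments, majority)
```

On an imbalanced corpus such a baseline can look reasonable on overall accuracy while never predicting the rarer classes, so it is mainly useful as a sanity check for a real system.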

Task B: In this shared task, participants will be provided with comments extracted from social media platforms and are expected to develop and submit systems that predict whether each comment is homophobic/transphobic in nature. The seed data for this task is the Homophobia/Transphobia Detection dataset [1], a collection of comments from YouTube. This dataset consists of manually annotated comments indicating whether the text is homophobic/transphobic or not. Participants will be provided with development, training, and test datasets in English, Malayalam, and Tamil.

References:

[1] Chakravarthi, B.R., Priyadharshini, R., Ponnusamy, R., Kumaresan, P.K., Sampath, K., Thenmozhi, D., Thangasamy, S., Nallathambi, R. and McCrae, J.P., 2021. Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments. arXiv preprint arXiv:2109.00227.
