Dravidian-CodeMix-HASOC2020

IDRBT - Hyderabad | 16th-20th December

HASOC- Offensive Language Identification-Track2-DravidianCodeMix

Call for participation

People post offensive content on social media to demean individuals, religions, or organizations. Identifying such posts is a necessity. A substantial amount of work has been done for languages such as English. However, offensive language identification for Indian languages is still an unexplored area, and one of the key reasons is code-mixing.

The goal of this task is to identify offensive language in a code-mixed dataset of comments/posts in Dravidian languages (Tamil-English and Malayalam-English) collected from social media. Each comment/post is annotated with an offensive language label at the comment/post level. The dataset has been collected from YouTube comments and tweets.

This shared task is a sub-track of Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) at FIRE-2020. The competition is hosted at https://competitions.codalab.org/competitions/25295

Participants will be provided with training, development, and test datasets.

Task 1:

This is a message-level label classification task. Given a YouTube comment in code-mixed Tamil and Malayalam (a mixture of native and Roman script), systems have to classify it as offensive or not-offensive.

Task 2:

This is a message-level label classification task. Given a tweet or YouTube comment in Tanglish or Manglish (Tamil and Malayalam written in Roman script), systems have to classify it as offensive or not-offensive.
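The two tasks differ mainly in the script of the input: Task 1 comments mix native and Roman script, while Task 2 text is written in Roman script only. The following is a minimal Python sketch of how a participant might check which case a message falls into, using the Unicode block ranges for Tamil (U+0B80 to U+0BFF) and Malayalam (U+0D00 to U+0D7F). The function names and the sample strings are illustrative assumptions and are not part of the shared task or its dataset.

```python
# Unicode block ranges as defined by the Unicode Standard.
TAMIL = (0x0B80, 0x0BFF)
MALAYALAM = (0x0D00, 0x0D7F)


def script_profile(text: str) -> dict:
    """Count code points per script; digits, spaces, and punctuation are ignored."""
    counts = {"tamil": 0, "malayalam": 0, "latin": 0}
    for ch in text:
        cp = ord(ch)
        if TAMIL[0] <= cp <= TAMIL[1]:
            counts["tamil"] += 1
        elif MALAYALAM[0] <= cp <= MALAYALAM[1]:
            counts["malayalam"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    return counts


def is_mixed_script(text: str) -> bool:
    """True if the text contains both native-script and Roman-script characters."""
    p = script_profile(text)
    return (p["tamil"] + p["malayalam"]) > 0 and p["latin"] > 0


def is_roman_only(text: str) -> bool:
    """True if the text contains Roman letters and no Tamil or Malayalam characters."""
    p = script_profile(text)
    return p["latin"] > 0 and p["tamil"] == 0 and p["malayalam"] == 0


# Hypothetical sample messages (not taken from the dataset).
mixed = "\u0b87\u0ba8\u0bcd\u0ba4 movie super"  # Tamil word followed by English words
roman = "intha padam super"                     # Tamil written in Roman script
print(is_mixed_script(mixed), is_roman_only(roman))
```

A check like this only describes the script composition of a message. It says nothing about whether the message is offensive, which is the label systems have to predict.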

As far as we know, this is the first shared task on offensive language identification in Dravidian code-mixed text.
