Dravidian-CodeMix-HASOC2020
IDRBT - Hyderabad, 16th-20th December
HASOC- Offensive Language Identification-Track2-DravidianCodeMix
People use offensive content in their social media posts to degrade an individual, a religion, or an organization. Identifying such posts is a necessity. A substantial amount of work has been done for languages such as English. However, offensive language identification in the Indian language setting is still an unexplored area. One of the key reasons is code-mixing.
The goal of this task is to identify offensive language in a code-mixed dataset of comments/posts in Dravidian languages (Tamil-English and Malayalam-English) collected from social media. Each comment/post is annotated with an offensive language label at the comment/post level. The dataset has been collected from YouTube comments and tweets.
This shared task is a sub-track of Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) at FIRE-2020. The competition link is https://competitions.codalab.org/competitions/25295
Participants will be provided with development, training, and test datasets.
Task 1:
This is a message-level label classification task. Given a YouTube comment in code-mixed (a mixture of native and Roman script) Tamil or Malayalam, systems have to classify it as offensive or not-offensive.
Task 2:
This is a message-level label classification task. Given a tweet or YouTube comment in Tanglish or Manglish (Tamil or Malayalam written in Roman script), systems have to classify it as offensive or not-offensive.
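The two tasks differ mainly in the script of the input: Task 1 comments may mix native and Roman script, while Task 2 text is written in Roman script only. As a rough illustration of that distinction, the sketch below checks which scripts a comment uses, based on the Unicode Tamil (U+0B80 to U+0BFF) and Malayalam (U+0D00 to U+0D7F) blocks. The function names and the sample comments are hypothetical and are not part of the shared task data or tooling.

```python
# Minimal sketch: detect which scripts appear in a comment.
# Function names and sample strings are illustrative assumptions,
# not part of the shared task's data or official tooling.

TAMIL_BLOCK = range(0x0B80, 0x0C00)      # Unicode Tamil block
MALAYALAM_BLOCK = range(0x0D00, 0x0D80)  # Unicode Malayalam block


def scripts_used(text):
    """Return the set of scripts ("Tamil", "Malayalam", "Roman") found in text."""
    found = set()
    for ch in text:
        code_point = ord(ch)
        if code_point in TAMIL_BLOCK:
            found.add("Tamil")
        elif code_point in MALAYALAM_BLOCK:
            found.add("Malayalam")
        elif ch.isascii() and ch.isalpha():
            found.add("Roman")
    return found


def is_roman_only(text):
    """True when the only letters in the text are ASCII (Roman script) letters."""
    return scripts_used(text) == {"Roman"}


if __name__ == "__main__":
    # Hypothetical, non-offensive sample comments.
    print(scripts_used("padam super"))   # Roman script only (Task 2 style)
    print(scripts_used("படம் super"))    # native and Roman script mixed (Task 1 style)
```

A check like this only separates the input styles; it says nothing about whether a comment is offensive, which remains the classification problem posed by both tasks.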
As far as we know, this is the first shared task on Offensive language in Dravidian Code-Mixed text.