Datasets
The data is in the following format
Comment    Label
Intha padam vantha piragu yellarum Thala ya kondaduvanga    positive
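As a rough illustration of how a line in this format could be read, the sketch below splits each line into a comment and its label. It assumes that the label is always the last whitespace-separated token on the line, which holds for the example above but is not confirmed for every label in the data. The function name `parse_labelled_comments` is hypothetical and not part of any released tooling.

```python
from typing import Iterable, List, Tuple


def parse_labelled_comments(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split each non-empty line into a (comment, label) pair.

    Assumption: the label is the last whitespace-separated token on the line.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.rsplit(None, 1)  # split once, from the right, on whitespace
        if len(parts) != 2:
            continue  # a single token has no separate label; skip it
        comment, label = parts
        rows.append((comment, label))
    return rows


sample = ["Intha padam vantha piragu yellarum Thala ya kondaduvanga positive"]
for comment, label in parse_labelled_comments(sample):
    print(f"{label}: {comment}")
```

If the file begins with a header row, that row would be parsed like any other line, so it should be dropped before or after parsing.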
Tamil-English: 15,744 comments (Train: 11,335; Validation: 1,260; Test: 3,149)
Malayalam-English: 6,739 comments (Train: 4,851; Validation: 541; Test: 1,348)
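The split sizes listed above can be turned into proportions with a few lines of arithmetic. The sketch below is only an illustration: the names `SPLITS` and `split_proportions` are hypothetical, and the denominator is the sum of the listed splits rather than the stated total. For Tamil-English the two agree (15,744). For Malayalam-English the listed splits add up to 6,740, one more than the stated 6,739.

```python
# Split sizes copied from the counts listed above.
SPLITS = {
    "Tamil-English": {"train": 11335, "validation": 1260, "test": 3149},
    "Malayalam-English": {"train": 4851, "validation": 541, "test": 1348},
}


def split_proportions(sizes):
    """Return each split's share of the sum of the listed split sizes."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


for corpus, sizes in SPLITS.items():
    shares = split_proportions(sizes)
    summary = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{corpus}: {sum(sizes.values())} comments across listed splits ({summary})")
```

Both corpora come out at roughly a 72/8/20 train/validation/test division.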
We present Tamil-English and Malayalam-English datasets of YouTube video comments. The datasets contain all three types of code-mixed sentences: inter-sentential switching, intra-sentential switching, and tag switching. Most comments are written in Roman script, with either Tamil/Malayalam grammar and an English lexicon or English grammar and a Tamil/Malayalam lexicon. Some comments are written in Tamil or Malayalam script with English expressions in between.
Malayalam trial data: https://drive.google.com/file/d/1a7oq6rUMsjIMbBzwsN2jQfCZYcfmhc6_/view?usp=sharing
More details about the datasets are given in the papers "A Sentiment Analysis Dataset for Code-Mixed Malayalam-English" and "Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text".