We present a Tamil-English and Malayalam-English code-mixed dataset of YouTube video comments. The dataset contains all three types of code-mixed sentences: inter-sentential switching, intra-sentential switching, and tag switching. Most comments were written in native script and Roman script, with either Tamil / Malayalam grammar and an English lexicon or English grammar and a Tamil / Malayalam lexicon. Some comments were written in Tamil / Malayalam script with English expressions in between.
For more information and to register, please check the CodaLab link: