Dataset
The CoLI-Kenglish dataset consists of English and Kannada words written in Roman script, grouped into six major categories: “Kannada”, “English”, “Mixed-language”, “Name”, “Location” and “Other”. Participants in the Kanglish shared task submit methods that identify each word and assign it to one of these categories.
Table 1 describes the labels in the CoLI-Kenglish dataset.
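Because the task is word-level classification into a closed set of six labels, one simple way to hold annotated text is as a list of (word, label) pairs and to check each label against that set. The sketch below assumes this pair representation; the actual file format of CoLI-Kenglish is not specified here, and the function name `invalid_tokens` and the toy sentence are hypothetical illustrations, not items from the dataset.

```python
from typing import List, Tuple

# The six word-level categories named in the task description.
LABELS = ("Kannada", "English", "Mixed-language", "Name", "Location", "Other")

# Hypothetical representation: one (word, label) pair per token.
Token = Tuple[str, str]


def invalid_tokens(tokens: List[Token]) -> List[Token]:
    """Return every (word, label) pair whose label is not one of the six categories."""
    return [(word, label) for word, label in tokens if label not in LABELS]


# Hypothetical toy sentence in Roman script, NOT taken from CoLI-Kenglish.
example = [
    ("naanu", "Kannada"),
    ("Bengaluru", "Location"),
    ("officege", "Mixed-language"),
    ("hogtini", "Kannada"),
]

if __name__ == "__main__":
    print(invalid_tokens(example))
```

An empty result means every token carries one of the six permitted labels.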
Dataset Statistics
Table 2 presents the distribution of labels in the training set. Statistics for the test set will be released later.
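A label distribution of this kind is a count of tokens per label together with each label's share of all tokens. The sketch below computes both, assuming the training data has already been loaded as (word, label) pairs; the function name `label_distribution` and the toy tokens are hypothetical and do not reproduce the figures in Table 2.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def label_distribution(tokens: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, float]]:
    """Map each label to (token count, percentage of all tokens)."""
    counts = Counter(label for _, label in tokens)
    total = sum(counts.values())
    # most_common() lists labels from most to least frequent;
    # an empty input yields an empty dict, so no division by zero occurs.
    return {label: (n, 100.0 * n / total) for label, n in counts.most_common()}


# Hypothetical toy tokens, NOT taken from CoLI-Kenglish.
toy = [
    ("naanu", "Kannada"),
    ("hogtini", "Kannada"),
    ("office", "English"),
    ("Bengaluru", "Location"),
]

if __name__ == "__main__":
    for label, (n, pct) in label_distribution(toy).items():
        print(f"{label:<15}{n:>5}{pct:>7.1f}%")
```

Reporting percentages alongside raw counts makes class imbalance between the six categories easy to see.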