Task 1: Homophobia and Transphobia Detection in Social Media Comments
Homophobia and Transphobia Detection involves identifying homophobic, transphobic, and non-anti-LGBTQ+ content in YouTube comments. Both homophobia and transphobia are forms of toxic language directed at LGBTQ+ individuals and are often categorized as hate speech. Comments in the dataset may consist of multiple sentences, but the average comment is about one sentence long. The corpus is annotated at the comment or post level. The initial data for this task is sourced from the Homophobia/Transphobia Detection dataset [3], which comprises manually annotated social media comments labeled for whether the text contains homophobia or transphobia.
Participants are provided with sentences extracted from comments on social media platforms. Systems must predict whether a given comment contains any form of homophobia or transphobia.
Subtask 1: H&T Multilingual Classification Task
Objective: Classify comments into three categories: Homophobia, Transphobia, and None of the Above.
Languages: This task will be conducted in multiple languages, specifically English, Tamil, Telugu, Kannada, Gujarati, Hindi, Malayalam, Tulu, Tamil-English, and Marathi.
Special Focus on Tulu: Annotated corpora and similar resources are scarce for under-resourced languages such as Tulu, which makes this task a particular challenge. We have introduced a code-mixed Tulu dataset specifically designed for detecting homophobic and transphobic content. The dataset aims to promote research in few-shot learning for low-resource language processing.
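As a minimal sketch of how predictions for this three-way classification might be scored, the Python snippet below computes per-class and macro-averaged F1. The label strings follow the categories named above, but the exact strings in the released data, the choice of macro F1, and the helper name macro_f1 are all assumptions for illustration; the description here does not state the official metric.

```python
from collections import Counter

# Label names taken from the task description; the exact strings used in
# the released data files may differ (assumption).
LABELS = ["Homophobia", "Transphobia", "None of the Above"]


def macro_f1(gold, pred, labels=LABELS):
    """Return (macro F1, per-class F1 dict) for two equal-length label lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but the true label was different
            fn[g] += 1  # the true label g was missed
    per_class = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no gold and no predicted items scores 0 here.
        per_class[label] = 2 * tp[label] / denom if denom else 0.0
    return sum(per_class.values()) / len(labels), per_class


# Toy labels only; these are not real dataset entries.
gold = ["Homophobia", "Transphobia", "None of the Above", "None of the Above"]
pred = ["Homophobia", "None of the Above", "None of the Above", "None of the Above"]
score, per_class = macro_f1(gold, pred)
print(round(score, 3), per_class)
```

Macro averaging weights the three classes equally, which matters when one class (typically the non-hateful one) dominates the data.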
Subtask 2: H&T Span Detection
Objective: Identify specific spans within comments that contain instances of homophobia and transphobia.
Languages: English, Tamil, and Marathi.
Details: Participants will be provided with comments and are required to classify these comments at the span level. This task requires a deeper level of text understanding and precision, as participants must discern and highlight the textual evidence for homophobia or transphobia within the comments.
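One way to picture span-level output is as character offsets into the comment. The sketch below assumes that representation (half-open (start, end) pairs) and scores a prediction by its character-level overlap with the gold spans. Both the representation and the span_f1 helper are illustrative assumptions, not the official submission format or scorer.

```python
def spans_to_chars(spans):
    """Expand half-open (start, end) spans into a set of character indices."""
    chars = set()
    for start, end in spans:
        if start < 0 or end < start:
            raise ValueError(f"invalid span: {(start, end)}")
        chars.update(range(start, end))
    return chars


def span_f1(gold_spans, pred_spans):
    """Character-level F1 between gold and predicted spans of one comment."""
    gold = spans_to_chars(gold_spans)
    pred = spans_to_chars(pred_spans)
    if not gold and not pred:
        return 1.0  # nothing to find and nothing predicted
    overlap = len(gold & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Placeholder text standing in for a real comment.
comment = "placeholder comment with an OFFENDING PHRASE inside"
start = comment.index("OFFENDING PHRASE")
gold = [(start, start + len("OFFENDING PHRASE"))]
pred = [(start, start + len("OFFENDING"))]  # model found only the first word
print(comment[pred[0][0]:pred[0][1]], round(span_f1(gold, pred), 3))
```

Scoring on characters rather than exact span matches gives partial credit when a system highlights only part of the offending text.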
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/21864
Organizers:
Bharathi Raja Chakravarthi, School of Computer Science, University of Galway, Ireland
Ruba Priyadharshini, Gandhigram Rural Institute - Deemed to be University
Prasanna Kumar Kumaresan, Data Science Institute, University of Galway, Ireland
Paul Buitelaar, Insight SFI Research Centre for Data Analytics, Data Science Institute, University of Galway, Ireland
Malliga Subramanian, Kongu Engineering College, Tamil Nadu, India
Student Volunteer:
Kishore Kumar Ponnusamy, Digital University of Kerala, India
Task 2: Misogyny Meme Detection
The Shared Task on Misogyny Meme Detection is part of LT-EDI@LDK 2025 and focuses on a multimodal machine learning challenge. This task invites researchers, practitioners, and enthusiasts from the Natural Language Processing (NLP) and computer vision communities to develop robust systems for detecting misogynistic content in memes. Given the increasing prevalence of misogynistic speech in online spaces, this task aims to foster innovative solutions to identify and mitigate harmful content, particularly in Chinese social media discourse.
Participants will develop models that can analyze both textual and visual elements of memes sourced from social media platforms. The objective is to classify memes into two categories: Misogynistic or Non-misogynistic. This task presents a multilingual and multimodal challenge, as participants will be required to process textual content in Chinese while also considering visual cues within memes. By tackling this challenge, the shared task aims to advance research in hate speech detection, multimodal classification, and ethical AI development. To download the data and participate, go to CodaLab and click the “Participate” tab.
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/21880
Organizers:
Bharathi Raja Chakravarthi, School of Computer Science, University of Galway, Ireland
Rahul Ponnusamy, Data Science Institute, University of Galway, Ireland
Ping Du, University of Galway, Ireland
Xiaojian Zhuang, University of Galway, Ireland
Saranya Rajiakodi, Department of Computer Science, Central University of Tamil Nadu, India
Paul Buitelaar, Data Science Institute, University of Galway, Ireland
Premjith B, Amrita School of Artificial Intelligence, Coimbatore, Amrita Vishwa Vidyapeetham, India
Bhuvaneswari Sivagnanam, Department of Computer Science, Central University of Tamil Nadu, India
Anshid Kizhakkeparambil, WMO Imam Gazzali Arts and Science College, Kerala, India
Lavanya S.K., Anna University, Tamil Nadu, India
Task 3: Caste and Migration Hate Speech Detection
Hate speech targeting people on the basis of caste or migration status is a growing concern on online platforms, particularly in linguistically diverse regions. As part of LT-EDI@LDK 2025, this shared task on Caste and Migration Hate Speech Detection aims to advance research in text classification, social media analysis, and ethical AI development. By focusing on Tamil-language content, this task provides a unique opportunity to tackle caste- or migration-related hate speech detection in low-resource languages, contributing to a safer and more inclusive digital space.
Participants are required to develop automatic classification models that can analyze text from social media platforms and determine whether it contains caste-based or migration-related hate speech. The classification labels for this task are: Caste/Migration-related Hate Speech or Non-Caste/Migration-related Hate Speech. To download the data and participate, go to CodaLab and click the “Participate” tab.
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/21884
Organizers:
Saranya Rajiakodi, Central University of Tamil Nadu, India
Bharathi Raja Chakravarthi, School of Computer Science, University of Galway, Ireland
Rahul Ponnusamy, Data Science Institute, University of Galway, Ireland
Shunmuga Priya MC, Data Science Institute, University of Galway, Ireland
Sathiyaraj Thangasamy, Department of Tamil, Sri Krishna Adithya College of Arts and Science, Tamil Nadu, India
Bhuvaneswari Sivagnanam, Central University of Tamil Nadu, India
Balasubramanian Palani, Indian Institute of Information Technology Kottayam, Kerala, India
Kogilavani Shanmugavadivel, Kongu Engineering College, Tamil Nadu, India
Abirami Murugappan, Department of Information Science and Technology, Anna University
Student Volunteer:
Charmathi Rajkumar, The American College, Madurai, Tamil Nadu, India
Task 4: Detecting Racial Hoaxes in Code-Mixed Hindi-English Social Media Data
As a type of misinformation, hoaxes seek to propagate incorrect information in order to gain popularity on social media. Racial hoaxes are an especially harmful kind of hoax because they falsely link individuals or groups to crimes or incidents. Detecting them involves the nuanced challenge of identifying false accusations, fabrications, and stereotypes that falsely implicate other social, ethnic, or out-groups in negative actions.
In this shared task, we present HoaxMixPlus, a racial hoax-annotated corpus of 5,105 YouTube comments in code-mixed Hindi-English, addressing the challenge of misinformation in low-resource languages. Racial hoaxes falsely associate individuals or groups with crimes or incidents, negatively impacting social or ethnic communities. The dataset aims to fill the gap in annotated code-mixed data, providing a benchmark for identifying racial hoaxes in code-mixed Hindi-English social media contexts. To download the data and participate, go to CodaLab and click the “Participate” tab.
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/21885
Organizers:
Bharathi Raja Chakravarthi, School of Computer Science, University of Galway, Ireland
Prasanna Kumar Kumaresan, School of Computer Science, University of Galway, Ireland
Shanu Dhawale, School of Computer Science, University of Galway, Ireland
Saranya Rajiakodi, Central University of Tamil Nadu, India
Sajeetha Thavareesan, Department of Computing, Eastern University, Sri Lanka
Subalalitha Chinnaudayar Navaneethakrishnan, Department of Computer Science & Engineering, SRM Institute of Science and Technology, Tamil Nadu
Thenmozhi Durairaj, SSN College of Engineering, Tamil Nadu, India
Task 5: Speech Recognition for Vulnerable Individuals in Tamil
This shared task addresses a challenging area in Automatic Speech Recognition: recognizing the Tamil speech of vulnerable elderly and transgender people. Elderly people visit essential locations such as banks, hospitals, and administrative offices to meet their everyday needs, yet many do not know how to use the equipment provided there to assist them. Similarly, transgender people are often deprived of primary education because of prejudice in society, so speech is the only medium that can help them meet their needs. The spontaneous speech data is gathered from elderly and transgender people who are unable to use these facilities to their advantage. A speech corpus containing 5.5 hours of transcribed speech will be released as the training set, and 2 hours of speech data will be released for testing.
Participants will be provided with a training dataset and a test dataset. To download the data and participate, go to CodaLab and click the “Participate” tab.
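The description above does not say how submitted transcripts will be scored. Word error rate (WER) is a common measure for speech recognition output, so the sketch below shows how it could be computed from a reference and a system transcript. The function name word_error_rate, whitespace tokenization, and the placeholder tokens are assumptions for illustration only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens standing in for transcribed words, not real corpus data.
reference = "w1 w2 w3 w4"
hypothesis = "w1 w3 w4 w5"
print(word_error_rate(reference, hypothesis))
```

Splitting on whitespace is a simplification; a real evaluation would also need to fix conventions for punctuation and text normalization.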
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/21879
Organizers:
Bharathi B, SSN College of Engineering, Tamil Nadu
Bharathi Raja Chakravarthi, School of Computer Science, University of Galway, Ireland
Sripriya N, SSN College of Engineering, Tamil Nadu
Rajeswari Natarajan, SASTRA University, Tamil Nadu, India
Student Volunteers:
Suhasini S, SSN College of Engineering, India
Swetha Valli, Thiagarajar College of Engineering, India
Task 6: Bias and Propaganda Annotation in English and Tamil
The Bias and Propaganda Annotation task aims to advance the understanding and annotation of biased and propagandistic content in English and Tamil texts, with a focus on political discourse, such as the US Gender Policy and India’s Three Language Policy. Participants are required to develop annotation systems that annotate text based on the presence of bias or propaganda. Participants are encouraged to utilize any existing or new technologies as part of their annotation process. The shared task aims to serve as a collaborative platform where participants propose diverse methods for annotating and analysing the dataset. The primary goal is to create a shared corpus for multilayered annotation, developing annotation guidelines that reflect the diverse and often conflicting discourses surrounding this sensitive topic. Additionally, it seeks to cultivate the next generation of Natural Language Processing (NLP) researchers by equipping them with hands-on experience in working with raw data sources.
Task 1: Bias and Propaganda Annotation in English
Subtask 1: The goal is to annotate content related to Trump's US gender policy against transgender individuals. The task is to annotate English texts that discuss or analyse this policy according to the bias and propaganda guidelines. There are 6 bias labels and 4 propaganda labels in total.
Subtask 2: Annotate YouTube comments in English related to the Ukraine-Russia war. The task involves categorizing the comments for bias and propaganda, following the established annotation guidelines for English texts. There are 8 bias labels and 4 propaganda labels in total.
Task 2: Bias and Propaganda Annotation in Tamil
The goal of Task 2 is to annotate content related to the Three Language Policy/National Education Policy. The task is to annotate Tamil texts that discuss or analyse this policy according to the bias and propaganda guidelines. There are 7 bias labels and 4 propaganda labels in total.
To download the data and participate, go to CodaLab and click the “Participate” tab.
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/22054
Organizers:
Shunmuga Priya Muthusamy Chinnan, Insight SFI Research Centre for Data Analytics, Data Science Institute, University of Galway, Ireland
Bharathi Raja Chakravarthi, School of Computer Science, University of Galway, Ireland
Meghann L. Drury-Grogan, Department of Enterprise and Technology, Atlantic Technological University, Ireland
Senthil Kumar B, Department of Information Technology, Velammal Institute of Technology, Chennai, India
Saranya Rajiakodi, Department of Computer Science, Central University of Tamil Nadu, India
Angel Deborah S, Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Chennai, India
Student Volunteer:
Jason Joachim Carvalho, School of Computer Science, University of Galway, Ireland