Students: Caelan Templeton, Darwin Estrella Vicente
Research mentor: Ruyuan Wan
The detection of toxic language online, which encompasses hate speech, abuse, and offensive content, presents a significant challenge in Natural Language Processing (NLP). Detecting this content is important because exposure to offensive content can harm mental health. The task is complicated by the subjective nature of toxicity: what counts as toxic language depends on annotators' beliefs and identities. Our research addresses these problems by highlighting the biases in current toxic language detection systems and proposing a new model that considers both majority and minority annotator opinions. The primary purpose of this study is to develop a more nuanced understanding of how demographic diversity and annotator disagreement influence toxic language detection. We employed analytical methods including demographic bar-chart analysis, heatmaps, correlation analysis, K-Means clustering, and Principal Component Analysis (PCA). Our correlation and clustering analyses reveal that demographic factors significantly influence toxicity perception. For instance, older annotators showed higher agreement rates in some situations, which may reflect generational differences. Compared to traditional methods that treat annotator disagreement as noise, our approach emphasizes capturing the diversity of opinions to improve detection accuracy. Intersectional analysis showed that combined demographic factors provide a deeper understanding of labeling patterns, and case studies on controversial topics highlighted the differing perspectives among demographic groups. Our research underscores the necessity of incorporating demographic information and modeling annotator disagreement to develop more inclusive and representative toxic language detection systems.
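To make the clustering step concrete, the following is a minimal sketch (not the study's actual code) of the kind of pipeline the abstract describes: annotators are represented by their per-item toxicity labels, reduced with PCA, clustered with K-Means, and then cross-tabulated against a demographic attribute. The dataset here is synthetic, and all column names ("annotator_id", "item_id", "toxic", "age_group") are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Toy annotation data: each row is one annotator's label for one item.
# Two synthetic age groups are given different base rates of labeling
# content as toxic, standing in for real demographic labeling differences.
rng = np.random.default_rng(0)
n_annotators, n_items = 40, 25
records = []
for a in range(n_annotators):
    age_group = "18-34" if a < 20 else "35+"
    bias = 0.3 if age_group == "18-34" else 0.6
    for i in range(n_items):
        records.append({"annotator_id": a, "item_id": i,
                        "age_group": age_group,
                        "toxic": int(rng.random() < bias)})
df = pd.DataFrame(records)

# Pivot to an annotator-by-item label matrix (missing labels -> item mean).
labels = df.pivot(index="annotator_id", columns="item_id", values="toxic")
labels = labels.fillna(labels.mean())

# Reduce dimensionality with PCA, then cluster annotators with K-Means.
coords = PCA(n_components=2, random_state=0).fit_transform(labels.values)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)

# Cross-tabulate cluster membership against the demographic attribute to see
# whether label-behavior clusters line up with demographic groups.
demo = df.drop_duplicates("annotator_id").set_index("annotator_id")["age_group"]
print(pd.crosstab(clusters, demo.loc[labels.index].to_numpy(),
                  rownames=["cluster"], colnames=["age_group"]))
```

On synthetic data like this, the cross-tabulation shows clusters aligning with the demographic split; on real annotation data, the same inspection would reveal how strongly labeling behavior tracks annotator demographics.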
Ruyuan Wan's research interests lie at the intersection of human-computer interaction (HCI) and natural language processing (NLP), including: (1) understanding social dynamics in language, e.g., social bias, propaganda, etc.; (2) designing user engagement in online communities for social good; (3) developing explainable, human-centered language technologies.