A Dataset for Tackling Gender Bias in Text
We are a team of research interns taking part in the humanitarian AI internship at Mila (Quebec Institute of Learning Algorithms). Our goal is to detect and correct gender bias in text with machine learning (a branch of artificial intelligence). We started this project during the AI for Social Good Lab and are grateful to the people who created that program.
Most work on gender bias within the machine learning community focuses on debiasing existing models, such as those for word embeddings, coreference resolution and captioning. We propose instead to build a dataset on which a model can be trained to detect gender bias in text. For our first prototype, this website let participants label sentences via crowdsourcing. Each participant labeled 10 sentences, and after 7 days of deployment we had reached a total of 365 participants. The two major takeaways from this first experiment were that binary labeling is hard for labelers and that gender bias is misunderstood. We therefore suggest obtaining a definition of gender bias from experts to better guide the labelers. Using a linguistic approach, we would also like to augment existing text data and scrape the web to acquire a larger set of sentences.
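As a rough illustration of how crowdsourced binary labels might be combined, the sketch below takes a majority vote per sentence and reports how strongly the labelers agreed. It is not a description of our actual pipeline: the function name `aggregate_labels`, the input format, and the assumption that each sentence receives labels from several participants are all hypothetical. Low agreement on a sentence is one possible sign that a binary label is hard to assign.

```python
from collections import Counter, defaultdict


def aggregate_labels(annotations):
    """Combine binary crowd labels by majority vote.

    `annotations` is an iterable of (sentence_id, label) pairs, where
    label is 1 ("biased") or 0 ("not biased"). This input format is an
    assumption made for the sketch.
    """
    votes = defaultdict(list)
    for sentence_id, label in annotations:
        votes[sentence_id].append(label)

    summary = {}
    for sentence_id, labels in votes.items():
        counts = Counter(labels)
        # most_common(1) returns the label with the most votes; on a tie
        # it returns the label that was seen first for that sentence.
        majority_label, majority_count = counts.most_common(1)[0]
        summary[sentence_id] = {
            "label": majority_label,
            # Fraction of labelers who chose the majority label.
            "agreement": majority_count / len(labels),
            "n_votes": len(labels),
        }
    return summary


if __name__ == "__main__":
    example = [("s1", 1), ("s1", 1), ("s1", 0), ("s2", 0), ("s2", 0)]
    for sentence_id, info in aggregate_labels(example).items():
        print(sentence_id, info)
```

Sentences whose agreement stays close to 0.5 could be set aside for expert review rather than used directly as training data.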
Our second crowdsourcing prototype launched at the end of November 2018.
A model trained on a robust gender bias dataset could directly address the negative preconceived ideas people have about gender, and could guide human judgment by helping people recognize their own gender biases. This could in turn prompt reflection on gender-sensitive topics and strengthen the movement toward fairness and equity for all genders.