Investigating and Mitigating Biases in Crowdsourced Data
23 October 2021 - 3 PM to 8 PM EDT - Virtual
Workshop Proceedings are now available. [proceedings] [pdf]
A workshop to explore how specific crowdsourcing workflows, worker attributes, and work practices contribute to biases in data. We also plan to discuss research directions to mitigate labelling biases, particularly in a crowdsourced context, and the implications of such methods for the workers.
Submission Deadline for Position Papers:
17 September 2021 (extended from 10 September 2021)
Notification of Acceptance for Position Papers:
23 September 2021 (extended from 17 September 2021)
Registration Deadline (Workshop and CrowdBias Challenge): 15 October 2021
Please register for the workshop through the ACM CSCW registration system.
Workshop registration is $20 (USD).
Accepted Position Papers
Senjuti Dutta, Rhema Linder, Doug Lowe, Matthew Rosenbalm, Anastasia Kuzminykh and Alex Williams. The Productivity Paradox: Understanding Tooling Biases in Crowdwork. [PDF]
Hossein A. Rahmani and Jie Yang. Demographic Biases of Crowd Workers in Key Opinion Leaders Finding. [PDF]
Tim Draws, Alisa Rieger, Oana Inel, Ujwal Gadiraju and Nava Tintarev. Introducing the Cognitive-Biases-in-Crowdsourcing Checklist. [PDF]
Rahul Pandey and Hemant Purohit. CrowdRL: A Reinforcement Learning Framework for Human Error Mitigation in Crowdsourcing-based Stream Processing. [PDF]
Gianluca Demartini, Kevin Roitero and Stefano Mizzaro. Bias Management in Crowdsourced Data: Moving Beyond Bias Removal. [PDF]
Chien-Ju Ho and Ming Yin. Designing and Optimizing Cognitive Debiasing Strategies for Crowdsourcing Annotation. [PDF]
Caifan Du. Bias Mitigation through ML pipeline in Crowd-based Knowledge Creation. [PDF]
Call for Participation
It is common practice for machine learning systems to rely on crowdsourced label data for training and evaluation. It is also well-known that biases present in the label data can induce biases in the trained models. Biases may be introduced by the mechanisms used for deciding what data should/could be labelled or by the mechanisms employed to obtain the labels. Various approaches have been proposed to detect and correct biases once the label dataset has been constructed. However, proactively reducing biases during the data labelling phase and ensuring data fairness could be more economical compared to post-processing bias mitigation approaches.

In this workshop, we aim to foster discussion on ongoing research around biases in crowdsourced data and to identify future research directions to detect, quantify and mitigate biases before, during and after the labelling process such that both task requesters and crowd workers can benefit. We will explore how specific crowdsourcing workflows, worker attributes, and work practices contribute to biases in the labelled data; how to quantify and mitigate biases as part of the labelling process; and how such mitigation approaches may impact workers and the crowdsourcing ecosystem.

The outcome of the workshop will include a collaborative publication of a research agenda to improve or develop novel methods relating to crowdsourcing tools, processes and work practices to address biases in crowdsourced data. We also plan to run a Crowd Bias Challenge prior to the workshop, where participants will be asked to collect labels for a given dataset while minimising potential biases.
We invite participants to take part in the workshop challenge and/or submit a position paper.
Submit a Position Paper
You can submit a 2-3 page position paper on previous or ongoing research on biases in crowd data. More details on the Call for Papers page.
Participate in the Crowd Bias Challenge
We plan to introduce a workshop challenge where participants will gather a crowdsourced dataset for a given problem. More details on the Crowd Bias Challenge page.
Through this workshop, we aim to foster discussion on ongoing work around biases in crowd data, provide a central platform to revisit the current research, and identify future research directions that are beneficial to both task requesters and crowd workers.
Understanding how annotator attributes contribute to biases
Research on crowd work has often focused on task accuracy, whereas other factors such as biases in data have received limited attention. We are interested in reviewing existing approaches and discussing ongoing work that helps us better understand the annotator attributes that contribute to biases.
Quantifying bias in annotated data
An important step towards bias mitigation is detecting such biases and measuring the extent of biases in data. We seek to discuss different methods, metrics and challenges in quantifying biases, particularly in crowdsourced data. Further, we are interested in ways of comparing biases across different samples and investigating if specific biases are task-specific or task-independent.
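As a minimal sketch of one such metric, the snippet below compares the label distributions produced by two worker groups on the same items using total variation distance; a score near 0 suggests the groups label similarly, while larger values indicate a group-dependent skew. The group labels and data here are hypothetical, and this is only one of many possible bias measures.

```python
from collections import Counter

def label_distribution(labels):
    """Normalise a list of labels into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two label distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

# Hypothetical labels from two worker groups annotating the same items.
group_a = ["pos", "pos", "neg", "pos", "neg", "pos"]
group_b = ["neg", "neg", "pos", "neg", "neg", "pos"]

bias_score = total_variation(label_distribution(group_a),
                             label_distribution(group_b))
# bias_score is 1/3 here: group A leans "pos" where group B leans "neg".
```

Comparing such scores across tasks is one way to probe whether a bias is task-specific or task-independent.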
Novel approaches to mitigate crowd bias
We plan to explore novel methods that aim to reduce biases in crowd annotation in particular. Current approaches include worker pre-selection, improved task presentation, and dynamic task assignment. We seek to discuss the shortcomings and limitations of existing and ongoing approaches and ideate future directions.
Impact on crowd workers
We want to explore how bias identification and mitigation strategies can impact the actual workers, positively or negatively. For example, workers in certain groups may face increased competition and reduced task availability. Collecting worker attributes and profiling workers could also raise ethical concerns.