As machine learning models continue to evolve, more and more companies are outsourcing labor to prevent models from displaying sensitive or disturbing content. The market for data curation has grown so large that some companies now offer it as their sole product or service. Many arguments have been made against "human in the loop" (HITL) methods, which critics view as unethical not only because of the explicit content employees are subjected to, but also because of the growing exploitation of cheap labor in countries with weaker regulation.
This raises ethical questions about 1) the necessity of HITL in machine learning tools and 2) the research and labor guidelines that should govern its use. Our team provides relevant background on HITL, details its harms and benefits based on workers' experiences, and discusses findings from a survey we conducted, with the aim of prompting further conversation and action on these ethical concerns.
Keywords: content moderation, data labeling, data validation, human subjects research, business process outsourcing, labor practices, exploitation, exposure to graphic content
Research performed by Team AIRB for the AI Ethics Project in Tyler Coleman's Spring 2023 Creativity with AI class at UT Austin.
Team members: Eriane Austria, Jake Gollub, Kevin Vuong, Connor Blankenship