Survey Results on Data Labeling

To get a broader sense of how people view AI and its use of human-in-the-loop training, we sent a survey to friends and peers of class members. The survey comprised questions from the entire class; here we focus on the questions that pertained to our topic.

Survey results

According to the survey data, the average person knows only a small amount about using humans to train ML models. This suggests a large gap in general understanding of the topic, which could unfortunately work in favor of the large companies that rely on these processes. By keeping workers under NDA and declining to disclose the full extent of what workers are exposed to, companies are able to shield the potential exploitation of their employees from the public eye. Without greater awareness of the rights of workers doing content moderation, there may be no improvement in the resources, benefits, and overall treatment of employees whose primary job is data labeling.

This question had interesting results, and the team and I debated several times whether it was right, wrong, or even feasible for anyone to go without being exposed to this content. Around 70% of the participants believed that it was worth shielding the masses from disturbing content, even if it meant that a small group had to withstand that content first as moderators. The next question helped provide some insight into participants' answers to this one. Although the sample size was smaller, I believe this is a major question that needs to be answered, not only for content moderation but for human-in-the-loop ML at large. Depending on what the general populace believes, there could be multiple ways of approaching the issue of content moderation for ML models.

If you were hired to sort through data for an ML model, to what extent would you like to be debriefed on the content you would be exposed to? What compensation/benefits do you believe are acceptable for someone working in this position?

Below are some of the responses to this question:

Fairly extensively
I would like to know all possible potential things i would be exposed to. Compensation definitely must include a good pay as well as a therapist or someone to always be in contact and work with to ensure it isn’t massive damage to this person.
I would like to be thoroughly debriefed, I know descriptions can't convey how awful content can be but at least it would give some preparation. As for compensation/benefits, I'm not an expert on this kind of thing but I do think aside from basic financial compensation that these workers should have plenty of time off and their insurance should cover very in depth therapy and psychiatry etc.
A large extent, don't know
Extremely detailed debriefing, and full-time compensation, vacation and healthcare benefits- especially including therapy/psychiatric insurance coverage
I expect full disclosure on content I might see as it may be potentially harmful to my wellbeing. Content moderaters/trainers should get paid extremely well especially if they are in danger of working with NSFW content on a daily basis. Their health care benefits should be fantastic, especially mental health.
a ton of briefing on content, generous work hours with lots of flexibility, extensive benefits
If the paycheck is high enough I won't ask any questions
I would really want consent to be enforced before I was exposed to any content. Maybe not enough to make sure that I know what I'm consenting to. I'm not sure what compensation normally looks like for this kind of thing, but I think it should match (or exceed, if it's a full-time job) whatever is usually given to participants of a psychological study.
$20 an hour
I would like to be given examples on what I would see. I would not want them to sugarcoat or downplay what I may see. They should also be transparent about the side effects (mental effects) of the job. They need to be clear about how other workers in the field have handled the job, and how it affected their lives. They should direct me toward resources that can help me should I be seriously affected by my work. Someone working in this position would need a lot of resources, specifically for mental health. Constantly viewing and going through graphic content will have an effect on one's psyche. They will definitely need help, and will need the option to talk through what they saw. They should also be allowed to react to what they see, and not have a set number/quota forced on them. There should be plenty of breaks and off-days given to the worker. Additionally, there must also been work/activities that do not involve filtering through disturbing content. If going through that content is all they are allowed to do, they will be overwhelmed.
Multiple levels of filters to sort out people willing and capable of processing the content, including dedicated counselors and therapists + an ethics board. Compensation equal to a technical job (50-60k)
Full mental health support during the employment and a period after employment ends.
As long as I don’t have to do anything inappropriate, or any action that causes harm on anyone or that makes me feel uncomfortable.
Some descriptions of the data would be nice before sorting through it.

Although many of the participants disclosed that their knowledge of the topic was sparse, most of the responses to this question seemed well thought out and appropriate, both for the type of work at hand and for related work that could harm a worker's mental state. As Caroline Sinders' wage calculator shows, determining fair pay for data labeling is tricky, and the work is often undervalued by companies, such as Sama, which has paid workers less than $2 an hour. Workers were exposed to data whose contents were so poorly understood that even OpenAI did not know the extent of what was being reviewed. Many employees reported mental scarring from the work, stating that some passages "..w(ere) torture" and that "By the time it gets to Friday, you are disturbed from thinking through that picture". The number of data labeling jobs will only increase over time, and as both the news articles and the survey results show, there is still only a vague picture of how workers should be treated and compensated for their work. Overall, the survey results showed that the majority of participants agreed that compensation should be fairly high, with extensive benefits focused especially on employees' mental well-being.

Sources
