Artificial intelligence (AI) will be the defining technology of the twenty-first century. No longer will our creations be limited by our own intelligence; with programs smarter than those who built them, we will be able to solve innumerable challenges. I am a firm believer in innovating our way out of problems, not because I believe that technology is innately good but because I believe humans are innately fallible creatures. Our history is defined by systemic racism and misogyny, by wars fought over religions and romantic partners, by environmental destruction and political incompetence. Now, there are people who fear that AI will repeat the worst humanity has committed: genocide, fueled by a corrupted sense of self-preservation or a conviction that humans are irrelevant. However, we should be far more fearful of the bias inherent in AI, for it is already here and exposing our own prejudice.
In 2016, ProPublica reported that COMPAS (short for Correctional Offender Management Profiling for Alternative Sanctions), a risk-assessment algorithm used by courts across the United States, consistently and incorrectly rated black defendants as more likely to re-offend than their white counterparts. Conversely, white defendants were incorrectly rated as lower-risk re-offenders. Using these flawed scores, judges gave black defendants longer sentences, stealing time from their lives purely because of the color of their skin.
In 2018, an MIT researcher discovered that facial recognition programs from IBM, Microsoft, and other tech giants were incredibly accurate, provided the face in question belonged to a white man. For women and for people of color, the software’s accuracy dropped precipitously, and women of color fared worst of all.
There are other instances, from software falsely labeling neighborhoods with higher percentages of black residents as requiring larger police presences to a hiring algorithm employed by Amazon that discriminated against female applicants. But machines are supposed to remove the human element—they aren’t supposed to be racist or have preferences, right?
Well, the reason AI research, development, and implementation have taken off in recent years is that AI requires data, and lots of it. Only recently, with the rise of the internet and smart devices, have computers been able to take in millions upon millions of data points. AI programs then use machine learning: they take in a mountain of data, some of it labeled and some not, learn patterns from the labeled examples, and use those patterns to identify the unlabeled data.
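To make that concrete, here is a minimal sketch of the labeled-to-unlabeled workflow described above, using scikit-learn; no real system is this simple, and the feature values here are made up and purely illustrative.

```python
# A toy sketch of the labeled-to-unlabeled workflow described above,
# using scikit-learn; the feature values are made up and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Labeled examples: each row is a data point, each label names its category.
labeled_features = np.array([[5.0, 1.2], [4.8, 0.9], [1.1, 3.4], [0.9, 3.8]])
labels = np.array([0, 0, 1, 1])

# The program "learns" patterns from the labeled data...
model = LogisticRegression()
model.fit(labeled_features, labels)

# ...and then uses those patterns to identify data it has never seen labels for.
unlabeled_features = np.array([[5.2, 1.0], [1.0, 3.5]])
print(model.predict(unlabeled_features))  # expected output: [0 1]
```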
This is how TikTok and YouTube are so good at recommending new content, keeping you hooked until three hours have passed and you still haven’t written that Existentialism paper. It is also why even the engineers who build TikTok and YouTube cannot fully explain how their recommendation systems work. They cannot point to a particular pattern the program has learned and explain its importance or role; they only know that it works.
Now, why is AI racist? Well, because we as a society are. Black defendants are given longer sentences because judges, human judges, have their own biases and have historically given black defendants longer sentences. Because black Americans face an environment that lacks opportunity, that places an expectation of “criminality” onto their blackness, and that is more likely to convict black defendants, repeat offenders are more likely to be black. Thus, the AI’s perverted logic goes, black defendants are more likely to become repeat offenders. More men apply for and are hired for jobs in tech, so the algorithm should select resumes with more masculine writing styles, more masculine experiences, and more masculine names. And those same (overwhelmingly white and Asian) male developers are the ones using their own faces to test facial recognition software, so their software will naturally be better at identifying faces like theirs.
“Okay, so we live in a prejudiced society. Is there anything we can do to make AI less biased?” one might ask.
Well, in polling operations (no, Carter, not politics!), poorer, less-educated, and non-white voters are consistently undercounted, for a number of reasons such as being at work during the hours that pollsters call. In order to accurately account for their preferences, pollsters give the responses they did receive added “weight,” meaning those data points influence the final estimate more than non-weighted responses do. However, every pollster will tell you that this weighting adds volatility to the result. It would be like showing every prospective student that one neat, well-decorated room in all of Hotchkiss, leading them to think that all rooms are like that.
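Here is a toy illustration of that weighting idea, with entirely made-up numbers: the last two respondents stand in for an undercounted group, so their answers are given more weight and therefore move the estimate more, which is also where the extra volatility comes from.

```python
# A toy illustration of survey weighting; all numbers are made up.
# Each response gets a weight; respondents from undercounted groups get
# weights above 1, so their few answers count for more in the estimate.
responses = [1, 0, 1, 1, 0, 1]               # 1 = supports candidate A
weights   = [1.0, 1.0, 1.0, 1.0, 2.5, 2.5]   # last two are from an undercounted group

unweighted = sum(responses) / len(responses)
weighted = sum(r * w for r, w in zip(responses, weights)) / sum(weights)

print(round(unweighted, 2))  # 0.67
print(round(weighted, 2))    # 0.61: the up-weighted answers pull the estimate,
                             # and one up-weighted respondent flipping would swing it even more
```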
Another way to try to reduce AI biases is to deliberately gather more data on underrepresented groups. In the case of the facial recognition software, that means having an equal proportion of women to men and of people of color to white people. However, when it comes to the sentencing algorithm, feeding the program more data on black defendants who were not repeat offenders might not work. This is because those who did not re-offend likely have characteristics that drew less prejudice, such as a whiter-sounding name or a nicer zip code. The same goes for the hiring algorithm and the women who have been hired in the past.
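For what it is worth, rebalancing a dataset is mechanically simple; the sketch below oversamples an underrepresented group until the counts match, using pandas with hypothetical column names and values. The catch, as argued above, is that equalizing the counts does nothing about the bias already baked into each individual record.

```python
# A toy sketch of rebalancing a training set by oversampling an
# underrepresented group; the column names and values are hypothetical.
import pandas as pd

data = pd.DataFrame({
    "group": ["A"] * 8 + ["B"] * 2,           # group B is underrepresented
    "label": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],  # e.g. 1 = correctly recognized
})

majority = data[data["group"] == "A"]
minority = data[data["group"] == "B"]

# Resample group B (with replacement) until both groups are the same size.
minority_oversampled = minority.sample(len(majority), replace=True, random_state=0)
balanced = pd.concat([majority, minority_oversampled])

print(balanced["group"].value_counts())  # A: 8, B: 8
```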
There is only so much we can do to reduce the biases of AI, because any data we feed it was shaped by the environment in which it was recorded. Either someone figures out a way to de-bias the data points given to machine learning algorithms, or society adequately reckons with its legacy of injustice, and unfortunately, my money is on the former happening sooner.