Identity Bias in ML Algorithms

What makes a learning algorithm biased?

Machine learning is a subset of artificial intelligence in which algorithms, directed by complex learning strategies, teach computers to make decisions with a supposed lack of bias. Neural networks, machine learning tools that classify information in a manner loosely modeled on the human brain, are a popular choice because they can classify images and other data using a data-driven probability system with speed and accuracy. As these algorithms are already present in many aspects of our lives (e.g., facial recognition in tagged Facebook photos, shopping ads on Amazon, show recommendations on Netflix), it is important to consider their troubling implications.
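
To ground the terminology, here is a minimal, purely illustrative sketch of what a "data-driven probability system" looks like in code. The weights and feature values are invented for the example; a real neural network would learn them from training data, which is exactly where bias can enter.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into a probability distribution."""
    shifted = np.exp(scores - np.max(scores))
    return shifted / shifted.sum()

# Hypothetical learned weights (2 classes x 3 input features) and one input.
weights = np.array([[0.8, -0.2, 0.1],
                    [-0.3, 0.9, 0.4]])
features = np.array([1.0, 0.5, -0.2])

# The "decision" is simply the most probable class under the learned weights:
# a guess shaped by whatever data produced those weights, not a neutral fact.
probabilities = softmax(weights @ features)
print(probabilities, "-> predicted class:", int(np.argmax(probabilities)))
```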

Appearance of Bias in Machine Learning

  • In 2015, gender bias in Google search was uncovered when image results for "CEO" returned only pictures of white men. Afterwards, a research study at Carnegie Mellon University found that Google displayed ads for high-paying executive jobs to users identified as male job seekers nearly 2,000 times, but only around 300 times to the female group.

  • In 2015, Flickr's image recognition tool tagged Black people as "animals" or "apes"; HP's webcam software could not reliably recognize darker skin tones; and Nikon's camera technology inaccurately identified Asian faces as constantly blinking.

  • In 2015, the Sentencing Reform and Corrections Act sought to implement mandatory risk and needs assessment systems in all federal prisons. Such systems are used to evaluate recidivism and assign scores indicating a defendant's likelihood of committing future crimes. In nine states, these scores are provided to judges for consideration during sentencing.

Machine learning algorithms generate most of the scores based on employment history, education levels, prior crimes, and other factors; notably, they do not include race and ethnicity in their calculations, meaning that they should theoretically produce unbiased results. However, an investigation found that COMPAS, a popular risk-assessment product "used by judicial systems throughout the United States, is twice as likely to mistakenly identify white defendants as a low risk for committing future crimes and twice as likely to erroneously tag Black defendants [with the same profile] as a high risk" (Yapo & Weiss).
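
The "race-blind yet biased" point can be made concrete with a hedged, synthetic sketch. This is not COMPAS's actual model or data; every name and number below is invented. The idea is that a feature shaped by unequal enforcement, such as recorded prior arrests, can act as a proxy for race even when race itself is excluded from the inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, size=n)           # two demographic groups, 0 and 1

# Identical underlying behavior in both groups...
underlying_offending = rng.poisson(1.0, size=n)
# ...but group 1 is policed more heavily, so more offenses become recorded arrests.
arrest_probability = np.where(group == 1, 0.6, 0.3)
prior_arrests = rng.binomial(underlying_offending, arrest_probability)

# A "race-blind" risk score that looks only at recorded prior arrests.
risk_score = 1 / (1 + np.exp(-(prior_arrests - 1.0)))

for g in (0, 1):
    print(f"group {g}: mean risk score = {risk_score[group == g].mean():.3f}")
# Group 1 receives systematically higher scores despite identical underlying behavior.
```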

Where does the bias come from?

For the specific example of criminal justice described above, a significant amount of research has been done into how such algorithms came to be. "According to the NAACP Criminal Justice Fact Sheet, Black people in the U.S. are incarcerated at close to six times the rate of white people. Moreover, in 2008 black and Hispanic people accounted for 58% of the prison population while only representing 25% of the total population. It is clear that deep societal and systemic biases against minorities exist" (Yapo & Weiss). Given these race-based statistics in the prison system, risk assessment algorithms have been written in a way that all but guarantees that Black defendants will receive worse scores than their white counterparts. Criminologists use predictive parity, the goal of generating equally accurate forecasts for all racial groups, as a marker of algorithmic fairness, but when groups have different arrest rates this leads to disparities in who is incorrectly classified. In reality, predictive parity actually corresponds to optimal discrimination.
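
The tension between predictive parity and equal error rates can be shown with a few lines of arithmetic. The sketch below uses made-up numbers and a standard confusion-matrix identity; it is not drawn from any specific tool's documentation. If two groups have different measured arrest rates ("base rates") but the tool is held to the same positive predictive value (PPV) and recall, their false positive rates are forced apart.

```python
def false_positive_rate(base_rate: float, ppv: float, recall: float) -> float:
    """FPR implied by a given base rate, PPV, and recall (true positive rate).
    Derivation: TP = recall * P, FP = TP * (1 - PPV) / PPV, FPR = FP / N."""
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * recall

ppv, recall = 0.6, 0.7        # identical "accuracy" targets for both groups
for name, base_rate in [("group with higher recorded arrest rate", 0.5),
                        ("group with lower recorded arrest rate", 0.3)]:
    print(f"{name}: FPR = {false_positive_rate(base_rate, ppv, recall):.2f}")
# Roughly 0.47 vs 0.20: equal PPV, yet very different false positive rates.
```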

Additionally, there is often "black-box" secrecy surrounding machine learning algorithm designs, meaning that the criteria and calculations that go into algorithm development are not released by for-profit companies. Therefore, if an algorithm was trained on a racist, sexist, or otherwise discriminatory dataset, relied on limited training data focused only on specific demographic or socioeconomic groups, or used poorly chosen modeling practices, that fact is hidden from users and critics. Furthermore, algorithms can be so complicated that even those with access to their formulations may struggle to predict their outcomes and interpret their effects. It follows that there are few avenues for people to evaluate the extent to which these algorithms reflect biases, prejudices, and discriminatory views, which creep in inevitably and unconsciously as a result of their creation by humans.
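
One partial workaround is an external audit: treating the system purely as a black box and measuring its error rates per group on an independently labeled benchmark. The sketch below is an assumption-laden outline, not an established auditing tool; `query_model` is a hypothetical stand-in for the vendor's proprietary scoring call, and the tiny benchmark is invented.

```python
from collections import defaultdict

def query_model(record):
    """Hypothetical stand-in for the vendor's opaque scoring call.
    In a real audit this would be an API request to the proprietary system."""
    return record.get("prior_arrests", 0) >= 2   # dummy rule for the sketch

def error_rate_by_group(benchmark):
    """benchmark: iterable of (record, true_label, group) triples."""
    errors, totals = defaultdict(int), defaultdict(int)
    for record, true_label, group in benchmark:
        prediction = query_model(record)
        totals[group] += 1
        errors[group] += int(prediction != true_label)
    return {g: errors[g] / totals[g] for g in totals}

# Tiny invented benchmark: records, ground-truth labels, and group membership.
benchmark = [
    ({"prior_arrests": 3}, False, "group A"),
    ({"prior_arrests": 0}, False, "group A"),
    ({"prior_arrests": 2}, False, "group B"),
    ({"prior_arrests": 1}, True,  "group B"),
]
print(error_rate_by_group(benchmark))  # e.g. {'group A': 0.5, 'group B': 1.0}
```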

How do identity dynamics play a role in shaping these biases?

When we analyze the decision-making processes of Artificial Intelligence and Machine Learning algorithms, there are clear examples of how the real-world assumptions we make about non-dominant identities (those lacking power, access, and privilege, and not considered “the norm”) and dominant identities (those holding power, access, and privilege, and considered “the norm”) are reflected in an algorithm’s situational analysis and choices.

The assumptions that algorithms are programmed to make stem from the assumptions that programmers are socialized to make. This is not to say that the programmers are necessarily at fault: all of us, as humans living in a society, are conditioned from birth to develop biases about different identities based on the set of identities we are born into, as well as personal, familial, institutional, and structural influences. The Cycle of Socialization by Bobbie Harro is a model that effectively maps out how people come to hold these biases.

As the cycle illustrates, we are immersed in this life-long process of learning “how to be.” We are born into identities over which we have no control. Then we are taught by the people raising us, as well as by other institutions, how to think and act according to those identities and the value that those in power place on them. Subsequently, we either pass on those ideas as we have learned them all our lives, or we embark on a path less taken and disrupt that lifelong cycle, challenging ourselves to think critically about the values we have been instilled with, which is much easier said than done.

A question that Harro highlights as crucial to reflect on is, “what has kept me in this cycle for so long?” (51). Harro then reveals that the answer lies at the core of the cycle: fear, ignorance, insecurity, confusion, and degree of power. Thinking about these core enablers of the cycle, it is important to shift the conversation away from identifying whose fault it is that these socializations prevail in humans and, by extension, in human-programmed algorithms. Instead, we should think about how these socializations shape the biases that we see in the decision-making processes of these algorithms.

Those who hold more dominant identities are better represented in research into AI and Machine Learning than those with non-dominant identities. As a result, researchers draw their data from less diverse samples, and the solutions that arise from that data primarily alleviate bias for those with more dominant identities.

The article “Teaching yourself about structural racism will improve your machine learning” (Robinson et al.) unpacks three motivations for educating oneself about structural racism, the second being “the common occurrence of ‘algorithmic bias.’” In this section, the authors describe specific examples of how these assumptions and racial biases have led to Machine Learning algorithms failing to work for racial minorities. The more concerning examples they cite include “facial recognition software that misidentifies gender and even species when presented with dark-skinned women of African descent” (Buolamwini and Gebru, 2016) and “proprietary formulas used in criminal sentencing that misclassify defendants as high risk for recidivism at a greater rate for Black versus White defendants” (Rudin and others, 2018).

In analyzing these examples, the writers suggest that the root of the issue with these algorithmic models is “too few observations of members of racial minority groups and unrepresentative sampling that can differentially limit generalizability” (Kreatsoulas and Subramanian, 2018). They also include a chart demonstrating how structural inequity and racism cycle into bias, which affects an algorithm’s functionality (in this case, a health-based algorithm).
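
A simple way to act on that sampling concern, sketched below with invented group labels and thresholds, is to compare each group's share of the training data against its share of the population the model will serve and flag large shortfalls before training begins.

```python
from collections import Counter

def representation_report(group_labels, population_shares, min_ratio=0.5):
    """Flag groups whose share of the training data falls well below their
    share of the population the model will be used on."""
    counts = Counter(group_labels)
    total = sum(counts.values())
    report = {}
    for group, pop_share in population_shares.items():
        sample_share = counts.get(group, 0) / total
        report[group] = {
            "sample_share": round(sample_share, 3),
            "population_share": pop_share,
            "underrepresented": sample_share < min_ratio * pop_share,
        }
    return report

# Example with made-up numbers: a dataset that badly under-samples one group.
labels = ["A"] * 900 + ["B"] * 100
print(representation_report(labels, {"A": 0.6, "B": 0.4}))
```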

“The data generation process ‘[is] an inherently subjective enterprise in which a discipline’s norms and conventions help to reinforce existing racial (and other) hierarchies’ (Ford and Airhihenbuwa, 2010). As explicated by the Public Health Critical Race Praxis, without an explicit focus on social equity, the concerns of the most privileged members of society are overrepresented in data and research (Ford and Airhihenbuwa, 2010).”

This cycle is of interest because it can be compared to the Cycle of Socialization. At the start, structural inequity and racism, which are beyond the control of individuals, set the conditions for how race is captured in data (the “race variable”), which then factors into bias in algorithmic decision-making.

A common thread that I notice in the assumptions that algorithms, and at their origin the programmers, make is that they stem from the ways in which we as individuals maintain the cycle of socialization. When programmers don’t acknowledge the complexity of identity, those complexities are not considered in the decision-making algorithms. There isn’t enough focus on recognizing that the structural and institutional barriers faced by those with non-dominant identities make “neutrality” in decision-making inequitable. As a result, the algorithms are more inclined to make decisions that uphold those biases and stereotypes.

How is this narrative being reinforced by learning algorithms?

Now that we have explored the impact of real-world identity dynamics on algorithmic bias, let’s analyze how learning algorithms, in turn, reinforce these identity dynamics and stereotypes, thus creating a cycle of this behavior and making oppression a technical communication problem.

After reviewing the film Coded Bias, examples of learning algorithms reinforcing biases and stereotypes become easy to notice in media and in standards of intellect. Whether it is the glorification of facial recognition in crime dramas or its use in games and standardized tests, a quote from the film encapsulates why we see these identity biases and dynamics in place: “our ideas about technology and society that we think are normal are actually ideas that come from a very small and homogenous group of people.”

(CNBC - July 11, 2020) In light of recent protests, facial recognition has come under scrutiny for the way it is deployed by police departments. In response, IBM, Amazon, and Microsoft have all stated that they'll either stop developing this technology or stop selling it to law enforcement until regulations are in place. At least half of Americans are reportedly in a facial recognition database, potentially accessible by local police departments as well as federal agencies like ICE and the FBI. It's not something you likely opted into, but as of now, there's no way to be sure exactly who has access to your likeness. For BIPOC individuals, facial recognition applies a broadened definition of "likeness," creating a greater likelihood of false matches, pursuits, and arrests.
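
A small synthetic sketch of why a "broadened definition of likeness" matters: if a matcher's similarity scores between different people are noisier for one group, the same match threshold flags more innocent people from that group. The score distributions and threshold below are invented and do not describe any real face-recognition system.

```python
import numpy as np

rng = np.random.default_rng(1)
threshold = 0.8
n_probes = 100_000   # non-matching lookups against a watchlist

# Hypothetical similarity scores between *different* people (should be low).
scores_group_a = rng.normal(0.55, 0.08, n_probes)
scores_group_b = rng.normal(0.55, 0.14, n_probes)   # noisier scores for group B

for name, scores in [("group A", scores_group_a), ("group B", scores_group_b)]:
    false_match_rate = (scores >= threshold).mean()
    print(f"{name}: false match rate = {false_match_rate:.4f}")
# Same threshold, same watchlist -- far more innocent people flagged in group B.
```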

(Coded Bias) By creating algorithms that work better or worse depending on how much melanin a user has, programmers exclude certain demographics from using a service based on social identity. These algorithms, created through the lens of personal perspectives and biases, then reaffirm those perspectives and biases about citizens, especially citizens of color, with consequences that can be legal, disproportionate, and potentially lethal.

What are ways to avoid such biases in learning algorithms?

To mitigate the ethical implications and consequences of artificial intelligence, it is important that machine learning engineers develop strict standards, codes of conduct, and regulations for AI products and algorithms. This can be done through issue management frameworks:

Fink’s Seven-Phase Issue Development Process Framework

Fink's Issue Development Framework can be used to trace the evolution of AI issues as they relate to the public at a societal level, specifically regarding the use of machine learning algorithms, with the aim of preventing and alleviating future harm to citizens.


      • Phase 1: Felt-Need

This stage is triggered by any number of sources that perpetuate skepticism about AI, such as emerging events, publications, and precipitating crises. Recently, this has occurred through non-fiction books, public discourse, and movies like Coded Bias.

      • Phase 2: Media Coverage

This stage includes fast-spreading stories about experiences that publicize the felt-need to a wider audience. Examples include investigative reports and opinion pieces, such as Kate Crawford’s 2016 New York Times article imploring vigilance in machine learning design, in which she wrote the following:

"Sexism, racism, and other forms of discrimination are being built into... machine learning algorithms... [w]e need to be vigilant about how we design and train these... systems, or we will see ingrained forms of bias built into the artificial intelligence of the future. Like all technologies before it, artificial intelligence will reflect the values of its creators. So inclusivity matters — from who designs it to who sits on the company boards and which ethical perspectives are included" (Yapo & Weiss).


      • Phase 3: Interest Group Development

This stage involves awareness of bias in AI and machine learning algorithms growing within larger, more powerful institutions, like businesses and the government. Here, partnerships and conferences also form to research and recommend best practices relating to ethics, fairness, inclusivity, transparency, and privacy in AI.

Note: Researchers believe that this is where present-day society stands in the framework.


      • Phase 4: Leading Political Jurisdictions

In this stage, policies begin to change at the city, county, and state levels. Local alterations to existing technological precedent and structure occur, along with federal-level lobbying.


      • Phase 5: Federal Government Attention

Through public congressional hearings and testimonies, well-funded and high-interest research studies, and more resources and public investment allocated to the issue, this stage lays the foundation for policy adoption on a larger scale.


      • Phases 6 and 7: Litigation and Regulation

Policy is in place on a federal level and enforced through strong technology regulation and well-placed litigation efforts.

Carrot and Stick Approaches to Stakeholder Management

The role of corporate social responsibility has been growing in the technology sphere, which has brought about the rise of the "Carrot," or values-based, and "Stick," or rules-based, approaches to stakeholder management. Technology companies are voluntarily implementing self-regulatory policies on the basis of widely championed values in ethics, best practices, risk management, and philanthropy. This is largely due to companies' desire to prevent the government from imposing external oversight. However, given the black-box practices discussed above, self-regulation has proven largely ineffective in for-profit entities.

Analysis of Policy Avenues

Together, these issue management approaches are meant to alert participants and stakeholders in AI fields to potential ethical dilemmas and concerns within the industry. By utilizing the input of social scientists, ethicists, lawyers, policymakers, and others in addition to engineers and corporate leaders, AI can be viewed through a much more comprehensive lens. Awareness of impending ethical risks and issues is crucial in the design of AI to ensure that the most vulnerable members of society are served, not harmed, by the technologies being developed. Researchers conjecture that, if self-regulation proves insufficient to prevent and correct negative biases in corporate AI design processes, legislation and enforced compliance with standards will likely result.

What should we teach engineers to avoid discriminatory algorithms in the future?

Overall, there are a few main ideas that can be synthesized from the information above:


      • Engineers need to support standards and regulations that are anti-bias. Allowing values-based corporate methodology to be the main driver of algorithmic equality is simply not good enough. For-profit companies have shown that they care more about their intellectual property and their ability to generate revenue with their algorithms than about subjecting their work to true evaluation and objective judgment, meaning that litigation and policy are absolutely necessary to achieve algorithmic fairness. Engineers should be taught not only to comply with such standards, but also to champion and create them.


      • Engineers need to design solid evaluation mechanisms for their technologies. As was the case with the issue of predictive parity, it is incredibly important that the goals an algorithm sets out to accomplish are not inherently biased, and the same is true for the means by which those goals are achieved. Limiting bias throughout every step of the algorithm creation process requires a comprehensive understanding of the problem space and any preconceived notions already present within it (e.g., racial disparities in the criminal justice system), awareness of the ways in which the algorithm development framework can perpetuate and exacerbate these biases, and a commitment to iterating over the work until it is both accurate and equitable (a minimal sketch of such an evaluation gate follows this list).


      • Engineers need to create a more representative group driving their products forward. Without those whom the technology serves helping to build the technology itself, it will never achieve its goal of catering to entire communities. A diverse group of algorithm developers is key to ensuring that no identities are overlooked, no plights are ignored, and no people are harmed.
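
As referenced in the second point above, here is a minimal sketch of what an evaluation gate might look like during iteration; the metric names and thresholds are assumptions for illustration, not an established standard.

```python
def passes_fairness_gate(per_group_fpr, overall_accuracy,
                         max_fpr_gap=0.05, min_accuracy=0.8):
    """Accept a candidate model only if it is accurate overall AND its false
    positive rates do not diverge too far across groups.
    per_group_fpr: dict mapping group name -> false positive rate."""
    fpr_gap = max(per_group_fpr.values()) - min(per_group_fpr.values())
    return overall_accuracy >= min_accuracy and fpr_gap <= max_fpr_gap

# Iterate (reweighting data, changing features, re-auditing) until this holds:
print(passes_fairness_gate({"group A": 0.47, "group B": 0.20}, overall_accuracy=0.85))  # False
print(passes_fairness_gate({"group A": 0.22, "group B": 0.20}, overall_accuracy=0.84))  # True
```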

References

CNBC News. (2020, July 11). The Fight Over Police Use Of Facial Recognition Technology. Retrieved December 19, 2020, from https://www.youtube.com/watch?v=oCwEYi_JjEQ

Harro, B. (2000). The cycle of socialization. In M. Adams et al. (Eds.), Readings for diversity and social justice (pp. 45-52). New York, NY: Routledge.

Kantayya, S. (Director). (2020). Coded Bias [Video file]. United States: 7th Empire Media. Retrieved December 19, 2020.

Robinson, W. R., Renson, A., & Naimi, A. I. (2020). Teaching yourself about structural racism will improve your machine learning. Biostatistics, 21(2), 339-344. https://doi.org/10.1093/biostatistics/kxz040

Yapo, A., & Weiss, J. (2018, January 3). Ethical Implications of Bias in Machine Learning (Tech.). Retrieved December 19, 2020, from Bentley University website: http://hdl.handle.net/10125/50557

Zliobaite, I. (2015, October 31). A survey on measuring indirect discrimination in machine learning (Tech.). Retrieved December 19, 2020, from Aalto University and Helsinki Institute for Information Technology (HIIT) website: https://arxiv.org/pdf/1511.00148.pdf