Data & Bias

Many misconceptions about AI arise from overestimating its capabilities and from assuming that, because it is a machine, it makes unbiased decisions. In reality, AI models are shaped by their training data, which can be biased depending on how it was collected. As a result, AI can generate biased or inaccurate information, which raises ethical concerns, particularly when users fail to recognize these biases. This can contribute to broader societal issues such as inequality, racism, and misinformation (Buolamwini, 2019; Crawford, 2021).

To promote algorithmic transparency, “it is very important to train algorithms on non-biased datasets. And it is essential for algorithms not to use sensitive information, such as race, gender, disability, and union affiliation” (Esade, 2024, 2:57).

Types of Data Bias in AI

There are many types of bias stemming from the data used to train AI. Below are some key types to consider when creating data sets for AI models.

Algorithmic Bias

AI systems may unintentionally favor or disadvantage certain groups or traits due to flaws in the algorithms or underlying methods.

Measurement Bias

Faulty data collection tools or methods can introduce errors that distort the data set and lead to inaccurate conclusions.

Sample Bias

When the collected data does not accurately represent the broader population, the AI system can produce skewed or misleading outcomes.

Stereotyping Bias

An AI system can reinforce societal stereotypes by unfairly prioritizing one gender, ethnicity, or group over another.

Labeling Bias

Incorrect or biased labels in supervised learning data sets often cause AI systems to make inaccurate predictions or classifications.

Exclusion Bias

Omitting certain data from training sets, often because it is deemed unnecessary, can leave significant gaps in AI decision-making.

(Holdsworth, 2023)

Let's try an example

AI Literacy - Understanding AI Bias

(Common Sense Education, 2023)

Reflect

Imagine you are tasked with creating an AI tool that can identify weather conditions such as sunny, rainy, or cloudy.

What types of images would you include in the training data? Your goal is to create as comprehensive a data set as possible to minimize AI bias. (Consider factors such as geographic diversity, time of day, and seasonal variations.)
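One way to check how comprehensive a weather data set is: count how many images fall into each combination of condition, region, time of day, and season, then flag combinations with few or no examples, since those gaps point to sample or exclusion bias. The sketch below is a minimal illustration under stated assumptions. The metadata fields (`condition`, `region`, `time_of_day`, `season`), the category lists, the five example images, and the helper names `coverage_gaps` and `share_by` are all hypothetical and were made up for this example.

```python
from collections import Counter
from itertools import product

# Hypothetical category lists; a real project would choose its own.
CONDITIONS = ["sunny", "rainy", "cloudy"]
REGIONS = ["tropical", "temperate"]
TIMES_OF_DAY = ["day", "night"]
SEASONS = ["summer", "winter"]

# Made-up metadata for five training images (illustrative only).
images = [
    {"condition": "sunny", "region": "temperate", "time_of_day": "day", "season": "summer"},
    {"condition": "sunny", "region": "temperate", "time_of_day": "day", "season": "summer"},
    {"condition": "rainy", "region": "temperate", "time_of_day": "day", "season": "winter"},
    {"condition": "cloudy", "region": "tropical", "time_of_day": "day", "season": "summer"},
    {"condition": "sunny", "region": "temperate", "time_of_day": "night", "season": "summer"},
]


def coverage_gaps(images, min_count=1):
    """List every category combination that has fewer than min_count images."""
    counts = Counter(
        (img["condition"], img["region"], img["time_of_day"], img["season"])
        for img in images
    )
    # Counter returns 0 for combinations that never appear.
    return [
        (combo, counts[combo])
        for combo in product(CONDITIONS, REGIONS, TIMES_OF_DAY, SEASONS)
        if counts[combo] < min_count
    ]


def share_by(images, field):
    """Fraction of images for each value of one metadata field."""
    counts = Counter(img[field] for img in images)
    return {value: n / len(images) for value, n in counts.items()}


total = len(CONDITIONS) * len(REGIONS) * len(TIMES_OF_DAY) * len(SEASONS)
print(f"{len(coverage_gaps(images))} of {total} combinations have no images")
print(share_by(images, "region"))
```

In this toy data, 20 of the 24 combinations have no images at all, and 80% of the images come from one region. Both results suggest the set needs more rainy, night-time, and tropical examples before training.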

How would you assess whether AI bias exists in your trained model? (Think about ways to test its accuracy across different locations, lighting conditions, and weather patterns.)
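One way to look for bias in a trained model is to break its accuracy down by group instead of reporting a single overall number. A model can score well overall while failing on, for example, night-time images or one region. The sketch below is a minimal illustration under stated assumptions. It presumes a held-out test set in which each record stores the true label, the model's prediction, and group tags. The records, the `lighting` and `region` tags, the helper names, and the 0.1 gap threshold are all hypothetical and were invented for this example.

```python
from collections import defaultdict

# Made-up results on a held-out test set: true label, model prediction,
# and hypothetical group tags describing each image.
results = [
    {"label": "sunny", "pred": "sunny", "lighting": "day", "region": "temperate"},
    {"label": "rainy", "pred": "rainy", "lighting": "day", "region": "temperate"},
    {"label": "cloudy", "pred": "cloudy", "lighting": "day", "region": "tropical"},
    {"label": "sunny", "pred": "sunny", "lighting": "day", "region": "temperate"},
    {"label": "rainy", "pred": "cloudy", "lighting": "night", "region": "temperate"},
    {"label": "cloudy", "pred": "rainy", "lighting": "night", "region": "tropical"},
    {"label": "cloudy", "pred": "cloudy", "lighting": "night", "region": "tropical"},
]


def accuracy(rows):
    """Fraction of rows where the prediction matches the true label."""
    return sum(r["label"] == r["pred"] for r in rows) / len(rows)


def accuracy_by_group(rows, field):
    """Accuracy computed separately for each value of one field."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[field]].append(r)
    return {value: accuracy(group) for value, group in groups.items()}


def flag_gaps(rows, field, max_gap=0.1):
    """Groups whose accuracy is more than max_gap below overall accuracy."""
    overall = accuracy(rows)
    return {
        value: acc
        for value, acc in accuracy_by_group(rows, field).items()
        if overall - acc > max_gap
    }


print(f"overall accuracy: {accuracy(results):.2f}")
# Grouping by "label" gives per-weather-condition accuracy.
for field in ("lighting", "region", "label"):
    print(field, accuracy_by_group(results, field), "flagged:", flag_gaps(results, field))
```

In this toy data, every daytime image is classified correctly but only one of three night-time images is, so lighting is flagged. The per-condition breakdown also flags rainy images. With so few test images, each group's number is noisy. A real check would need enough examples per group for the comparison to be meaningful, and the threshold is a judgment call.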

