The sample data used for training has to represent the real scenario as closely as possible; otherwise, ML performance suffers. Worse, ML can amplify existing bias. If you would like to know more about this impact, this document can help.
Machine learning bias, also sometimes called algorithm bias or AI bias, is a phenomenon that occurs when an algorithm produces results that are systemically prejudiced due to erroneous assumptions in the machine learning process.
Bias comes from models that are overly simple and fail to capture the trends present in the data set. For example, a linear model may be suitable for one group but not for another.
Similarly, ignoring an important feature during training introduces bias.
Data bias in machine learning is a type of error in which certain elements of a dataset are more heavily weighted and/or represented than others. A biased dataset does not accurately represent a model's use case, resulting in skewed outcomes, low accuracy levels, and analytical errors.
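As a concrete illustration of the definition above, the sketch below (a hypothetical device-type dataset, invented purely for illustration) compares class proportions in the "population" against a biased sample that over-represents one class:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: an even 50/50 split of two device classes.
population = rng.choice(["mobile", "desktop"], size=100_000, p=[0.5, 0.5])

# Biased sample: data happened to be collected from a desktop-heavy channel.
sample = rng.choice(["mobile", "desktop"], size=1_000, p=[0.1, 0.9])

def proportions(arr):
    """Return each class's share of the array as a dict."""
    values, counts = np.unique(arr, return_counts=True)
    return dict(zip(values, counts / counts.sum()))

pop_p = proportions(population)
smp_p = proportions(sample)
print("population:", pop_p)
print("sample:    ", smp_p)

# A large representation gap flags a biased, unrepresentative dataset:
# a model trained on this sample will over-weight desktop behaviour.
gap = abs(pop_p["desktop"] - smp_p["desktop"])
print(f"desktop representation gap: {gap:.2f}")
```

Comparing such per-class shares between the training sample and the deployment population is one of the simplest sanity checks for data bias.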
Sample bias - Sample bias occurs when a dataset does not reflect the realities of the environment in which a model will run. In a statistical sense, the actual PDF (probability density function) of the population doesn't match the sample PDF. Class imbalance is also a kind of sample bias (refer to the class-imbalance link in the references).
Exclusion bias - Deleting valuable data thought to be unimportant. For example, outliers can be useful in anomaly-detection problems; deleting them harms what the ML algorithm can learn.
Measurement bias - Measurement bias occurs when the data collected for training differs systematically from the data encountered in the real world, for example because a faulty sensor or distorting camera was used during collection.
Recall bias - Recall bias is a form of measurement bias that arises during data labeling, when similar examples are labeled inconsistently across the dataset.
Observer bias - Observer bias is the effect of seeing what you expect to see or want to see in data.
Racial bias - Racial bias occurs when data skews in favor of particular demographics.
Association bias - For example, a dataset may contain a collection of jobs in which all men are doctors and all women are nurses. This does not mean that women cannot be doctors and men cannot be nurses. However, as far as the machine learning model is concerned, female doctors and male nurses do not exist.
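The association-bias example above can be sketched as a toy model (the tiny dataset and the "most frequent job per gender" rule are both hypothetical, chosen only to make the failure visible):

```python
from collections import Counter

# Hypothetical toy dataset exhibiting association bias: every man is a
# doctor and every woman is a nurse, so gender perfectly predicts job.
records = [
    ("male", "doctor"), ("male", "doctor"), ("male", "doctor"),
    ("female", "nurse"), ("female", "nurse"), ("female", "nurse"),
]

# A naive "most frequent job per gender" rule learned from this data.
jobs_by_gender = {}
for gender, job in records:
    jobs_by_gender.setdefault(gender, Counter())[job] += 1

model = {g: c.most_common(1)[0][0] for g, c in jobs_by_gender.items()}
print(model)  # {'male': 'doctor', 'female': 'nurse'}

# As far as this model is concerned, a female doctor cannot exist:
print(model["female"])  # prints 'nurse', regardless of any other evidence
```

The spurious gender-to-job association in the data becomes a hard rule in the model, which is exactly the failure the paragraph describes.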
There should be a proper balance between model bias and variance. Here is the reason.
Using an insufficient number of features for training causes high bias, resulting in underfitting. Note that ignoring an important feature is a source of bias here (refer to the diagram linked in the references).
Similarly, using unnecessary features for training causes high variance, resulting in overfitting. Unnecessary features act as noise, and training on noise causes overfitting.
If a learning algorithm is suffering from high bias, getting more and more training examples doesn't help (refer to the learning-curve picture linked in the references).
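The underfitting/overfitting contrast above can be sketched with a toy polynomial fit on synthetic data (the sine target, noise level, and degrees 1, 4, and 15 are arbitrary illustrative choices, not from the referenced diagrams):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a nonlinear target (one sine period) plus noise.
x_train = np.linspace(0.0, 1.0, 30)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0.01, 0.99, 30)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, x_test.size)

def fit_errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    p = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_err = float(np.mean((p(x_train) - y_train) ** 2))
    test_err = float(np.mean((p(x_test) - y_test) ** 2))
    return train_err, test_err

# Degree 1: too simple -> high bias -> underfitting
#   (both training and test error stay high).
# Degree 15: too flexible -> high variance -> overfitting
#   (training error shrinks while test error does not follow).
results = {d: fit_errors(d) for d in (1, 4, 15)}
for d, (tr, te) in results.items():
    print(f"degree {d:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

The degree of the polynomial plays the role of "number of features" in the text: too few terms underfits, too many fits the noise.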
Reference
https://stackoverflow.com/questions/2480650/what-is-the-role-of-the-bias-in-neural-networks
https://lionbridge.ai/articles/7-types-of-data-bias-in-machine-learning/
https://images.app.goo.gl/VSqLiK8mAuTDGfuh6
https://www.kdnuggets.com/2019/08/types-bias-machine-learning.html
https://images.app.goo.gl/x5juE7SudcEc8GuL7
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/handling-class-imbalance-in-machine-learning
https://machinelearningmastery.com/what-is-imbalanced-classification/
https://www.borealisai.com/en/blog/tutorial1-bias-and-fairness-ai/
https://www.datacamp.com/community/blog/measuring-bias-in-ml
https://towardsdatascience.com/is-your-machine-learning-model-biased-94f9ee176b67
https://towardsdatascience.com/introducing-model-bias-and-variance-187c5c447793
https://images.app.goo.gl/8SZZszKKpKVbm4w27
https://coursera.org/share/0626a6420beef982feb69f2505424e59
https://images.app.goo.gl/bF13bQToTsNqhZxP7
https://images.app.goo.gl/tTAXjsMjyKcZxaRn9
https://images.app.goo.gl/VwCApSReYyu28ykKA