Layman's explanation
As you know, the probability density function, popularly known as the PDF, is widely used in designing machine learning models. If you are interested in understanding this backbone mathematical concept, this document will help.
The probability density function (PDF) describes how the probability of a continuous random variable is distributed: the probability that the variable falls within a given range of values is the integral of the density over that range. It is also called a probability distribution function or simply a probability function.
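As a concrete illustration of a density, here is a minimal sketch of the normal (Gaussian) PDF written directly from its textbook formula; the function name `normal_pdf` is just an illustrative choice, not from any particular library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) at point x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The density of a standard normal at its mean is 1/sqrt(2*pi) ≈ 0.3989.
print(normal_pdf(0.0))
```

Note that the density itself is not a probability; probabilities come from integrating it over an interval.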
The Bayes decision rule, which is the optimal classifier, uses probability density functions to calculate the misclassification error.
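The idea above can be sketched in a few lines: pick the class that maximizes prior times class-conditional density, and approximate the misclassification (Bayes) error as the integral of the smaller of the two weighted densities. This is a toy example assuming one-dimensional Gaussian class densities; the function names are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian class-conditional density p(x | class)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def bayes_classify(x, classes):
    """classes: list of (label, prior, mu, sigma). Choose argmax prior * p(x|class)."""
    return max(classes, key=lambda c: c[1] * normal_pdf(x, c[2], c[3]))[0]

# Two equally likely classes with unit-variance Gaussians centered at 0 and 3.
classes = [("A", 0.5, 0.0, 1.0), ("B", 0.5, 3.0, 1.0)]

# Bayes error = integral of min over classes of prior * density,
# approximated here with a simple Riemann sum over [-10, 10].
xs = [i * 0.01 - 10.0 for i in range(2001)]
bayes_error = sum(min(p * normal_pdf(x, m, s) for _, p, m, s in classes) * 0.01
                  for x in xs)
```

For these two classes the decision boundary sits halfway between the means, and the estimated error is roughly 6.7% (the area where the wrong class's weighted density is larger).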
Classifiers can be compared without training if one knows the class distribution functions. Refer here for details.
Data condensation can be done by randomly removing samples. When removal should follow the population distribution, class PDFs give this approach better precision. Stratified random sampling is one such model.
Stratified sampling takes advantage of the inhomogeneity (heterogeneity) of a population, whereas random and systematic sampling generally assume the population is homogeneous. One often knows how the population is distributed among the strata and then proportionally allocates the sample accordingly. Plain random sampling tends to generate similar results, but stratified sampling guarantees that no stratum is missed and enhances statistical precision, which is a desirable outcome.
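The proportional allocation described above can be sketched as follows. This is a minimal illustration (the helper name `stratified_sample` is my own); note that rounding the per-stratum allocation can make the final sample size differ slightly from the requested one.

```python
import random

def stratified_sample(population, strata_key, n):
    """Proportionally allocate n samples across strata, then sample randomly within each."""
    strata = {}
    for item in population:
        strata.setdefault(strata_key(item), []).append(item)
    total = len(population)
    sample = []
    for members in strata.values():
        # Each stratum gets a share of n proportional to its size.
        k = round(n * len(members) / total)
        sample.extend(random.sample(members, min(k, len(members))))
    return sample

# Population with an 80/20 split between strata "A" and "B".
population = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]
chosen = stratified_sample(population, lambda item: item[0], 10)
```

With a sample of 10 drawn from this population, 8 items come from stratum "A" and 2 from stratum "B", matching the population proportions; a purely random sample of 10 could easily miss stratum "B" entirely.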
Redundant feature removal using the Bayes decision rule also needs PDFs.
A stochastic process (a collection of random variables indexed by time) uses PDFs; note that stochastic processes are used in time series prediction. Some examples of stochastic processes used in machine learning are Poisson processes, for dealing with waiting times and queues, and random walk and Brownian motion processes, used in algorithmic trading.
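To make the random walk example concrete, here is a minimal simulation of a symmetric random walk (each step is +1 or −1 with equal probability); the function name is illustrative.

```python
import random

def random_walk(steps, seed=None):
    """Simulate a symmetric random walk starting at 0 and return the full path."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(steps):
        position += rng.choice([-1, 1])  # each step is +1 or -1, equally likely
        path.append(position)
    return path

walk = random_walk(100, seed=42)
```

Averaged over many runs, the walk's position at time t has mean 0 and variance t, which is why its scaled limit is Brownian motion.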
In real life it is practically difficult to obtain PDFs, and calculating the misclassification error is not easy in many cases. To handle this, there are techniques that estimate the probability distribution from data; the kernel density estimator is one such algorithm. The video and write-up below discuss estimation approaches in detail.
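A kernel density estimator can be sketched in a few lines: place a small kernel (here a Gaussian) on every observed sample and average them to get a smooth density estimate. This is a minimal from-scratch illustration, not a production implementation, and the bandwidth is chosen by hand.

```python
import math

def gaussian_kernel(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, bandwidth):
    """Kernel density estimate at x: average of kernels centered on each sample."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / bandwidth) for s in samples) / (n * bandwidth)

# Estimate the density from a handful of observations.
observations = [0.0, 1.0, 2.0]
density_at_one = kde(1.0, observations, bandwidth=0.5)
```

Because each kernel integrates to 1, the resulting estimate is itself a valid density (it integrates to 1 over the real line); the bandwidth controls the smoothness of the estimate.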
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/comparing-machine-learning-algorithms
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-bayes-decision-rule-in-machine-learning
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/handling-large-training-dataset-in-machine-learning
https://www.andrews.edu/~calkins/math/edrm611/edrm07.htm
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-redundant-features-in-machine-learning
https://byjus.com/maths/probability-density-function/
https://www.stat.auckland.ac.nz/~fewster/325/notes/ch1annotated.pdf
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/types-of-prediction-in-data-science
https://towardsdatascience.com/stochastic-processes-analysis-f0a116999e4
https://towardsdatascience.com/forecasting-with-stochastic-models-abf2e85c9679
https://youtu.be/dLflHoGNofY