Layman's explanation
As you know, the probability density function, popularly known as the PDF, is widely used in designing machine learning models. If you are interested in understanding this backbone mathematical concept, this document will help.
The probability density function (PDF) describes how the probability of a continuous random variable is distributed: the probability that the variable falls within a given range of values is the integral of the density over that range. It is also called a probability distribution function or simply a probability function.
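As a concrete illustration of a density, here is a minimal sketch of the normal (Gaussian) PDF written directly from its textbook formula; the function name `normal_pdf` is just an illustrative choice, not from any particular library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) at point x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The density of a standard normal at its mean is 1/sqrt(2*pi) ≈ 0.3989.
print(normal_pdf(0.0))
```

Note that the density itself is not a probability; probabilities come from integrating it over an interval.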
The Bayes decision rule, which is the optimal classifier, uses probability density functions to calculate the misclassification error.
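The idea above can be sketched in a few lines: pick the class that maximizes prior times class-conditional density, and approximate the misclassification (Bayes) error as the integral of the smaller of the two weighted densities. This is a toy example assuming one-dimensional Gaussian class densities; the function names are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian class-conditional density p(x | class)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def bayes_classify(x, classes):
    """classes: list of (label, prior, mu, sigma). Choose argmax prior * p(x|class)."""
    return max(classes, key=lambda c: c[1] * normal_pdf(x, c[2], c[3]))[0]

# Two equally likely classes with unit-variance Gaussians centered at 0 and 3.
classes = [("A", 0.5, 0.0, 1.0), ("B", 0.5, 3.0, 1.0)]

# Bayes error = integral of min over classes of prior * density,
# approximated here with a simple Riemann sum over [-10, 10].
xs = [i * 0.01 - 10.0 for i in range(2001)]
bayes_error = sum(min(p * normal_pdf(x, m, s) for _, p, m, s in classes) * 0.01
                  for x in xs)
```

For these two classes the decision boundary sits halfway between the means, and the estimated error is roughly 6.7% (the area where the wrong class's weighted density is larger).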
Classifiers can be compared without training if one knows the class distribution functions. Refer here for details.
Data condensation can be done by randomly removing samples. When removal should follow the population distribution, class PDFs give this approach better precision. Stratified random sampling is one such model.
Stratified sampling takes advantage of the inhomogeneity (heterogeneity) of a population, whereas random and systematic sampling generally assume the population is homogeneous. One often knows how the population is distributed among the strata and then proportionally allocates the sample accordingly. Plain random sampling tends to generate similar results, but stratified sampling guarantees that no stratum is missed and enhances statistical precision, which is a desirable outcome.
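The proportional allocation described above can be sketched as follows. This is a minimal illustration (the helper name `stratified_sample` is my own); note that rounding the per-stratum allocation can make the final sample size differ slightly from the requested one.

```python
import random

def stratified_sample(population, strata_key, n):
    """Proportionally allocate n samples across strata, then sample randomly within each."""
    strata = {}
    for item in population:
        strata.setdefault(strata_key(item), []).append(item)
    total = len(population)
    sample = []
    for members in strata.values():
        # Each stratum gets a share of n proportional to its size.
        k = round(n * len(members) / total)
        sample.extend(random.sample(members, min(k, len(members))))
    return sample

# Population with an 80/20 split between strata "A" and "B".
population = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]
chosen = stratified_sample(population, lambda item: item[0], 10)
```

With a sample of 10 drawn from this population, 8 items come from stratum "A" and 2 from stratum "B", matching the population proportions; a purely random sample of 10 could easily miss stratum "B" entirely.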
Redundant feature removal using the Bayes decision rule also needs PDFs.
A stochastic process (a collection of random variables indexed by time) uses PDFs; note that stochastic processes are used in time series prediction. Some examples of stochastic processes used in machine learning are Poisson processes, for dealing with waiting times and queues, and random walk and Brownian motion processes, used in algorithmic trading.
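To make the random walk example concrete, here is a minimal simulation of a symmetric random walk (each step is +1 or −1 with equal probability); the function name is illustrative.

```python
import random

def random_walk(steps, seed=None):
    """Simulate a symmetric random walk starting at 0 and return the full path."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(steps):
        position += rng.choice([-1, 1])  # each step is +1 or -1, equally likely
        path.append(position)
    return path

walk = random_walk(100, seed=42)
```

Averaged over many runs, the walk's position at time t has mean 0 and variance t, which is why its scaled limit is Brownian motion.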
In real life it is practically difficult to obtain PDFs, and calculating the misclassification error is not easy in many cases. To handle this, there are techniques that estimate the probability distribution from data; the kernel density estimator is one such algorithm. The video and write-up below discuss estimation approaches in detail.
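A kernel density estimator can be sketched in a few lines: place a small kernel (here a Gaussian) on every observed sample and average them to get a smooth density estimate. This is a minimal from-scratch illustration, not a production implementation, and the bandwidth is chosen by hand.

```python
import math

def gaussian_kernel(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, bandwidth):
    """Kernel density estimate at x: average of kernels centered on each sample."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / bandwidth) for s in samples) / (n * bandwidth)

# Estimate the density from a handful of observations.
observations = [0.0, 1.0, 2.0]
density_at_one = kde(1.0, observations, bandwidth=0.5)
```

Because each kernel integrates to 1, the resulting estimate is itself a valid density (it integrates to 1 over the real line); the bandwidth controls the smoothness of the estimate.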
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/comparing-machine-learning-algorithms
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-bayes-decision-rule-in-machine-learning
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/handling-large-training-dataset-in-machine-learning
https://www.andrews.edu/~calkins/math/edrm611/edrm07.htm
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-redundant-features-in-machine-learning
https://byjus.com/maths/probability-density-function/
https://www.stat.auckland.ac.nz/~fewster/325/notes/ch1annotated.pdf
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/types-of-prediction-in-data-science
https://towardsdatascience.com/stochastic-processes-analysis-f0a116999e4
https://towardsdatascience.com/forecasting-with-stochastic-models-abf2e85c9679
https://youtu.be/dLflHoGNofY