Mean (μ): Sum/n
Weighted Mean: Sum(xi * wi)/Sum(wi)
Median: for an odd number of samples, the middle element (an equal number of samples greater and less than it); for an even number of samples, the average of the 2 middle elements
Mode: the element(s) that occur most frequently in a data set. A set can be multimodal.
Precision: number of significant digits (123.45 & 0.012345 both have precision 5)
Scale: number of significant digits to the right of decimal (123.45 has scale 2)
Quartiles: 3 points that split an ordered data set into 4 equal groups
Usage: to summarize a group of numbers and give a picture (descriptive statistics)
Interquartile range (IQR): IQR = Q3 - Q1 (third quartile minus first quartile)
Usage: also called "midspread", "middle 50%", "H-spread"; a measurement of statistical dispersion
https://www.mathsisfun.com/data/standard-normal-distribution.html
Importance - it fits many natural phenomena.
Also known as - Gaussian distribution / bell curve
Is: a continuous probability distribution, symmetric about its mean μ, with spread controlled by σ
Probability Density: f(x) = (1 / (σ·√(2π))) · e^(-(x-μ)² / (2σ²))
https://www.hackerrank.com/challenges/s10-normal-distribution-1/tutorial
μ = 0 (mean = 0, so distribution is symmetric to Y axis)
σ = 1
Any normal distribution can be transformed into the standard normal distribution via z = (x - μ) / σ
// n! computed recursively; note that a double overflows to Infinity for n > 170
static double factorial(int n){
    if(n <= 1) return 1;
    return n * factorial(n-1);
}

// Error function via its Maclaurin series:
// erf(z) = (2/sqrt(pi)) * sum over n >= 0 of (-1)^n * z^(2n+1) / (n! * (2n+1))
public static double erf(double z)
{
    int nTerms = 100; // plenty for |z| within a few sigma; much larger n would overflow factorial(n)
    double runningSum = 0;
    for(int n = 0; n < nTerms; n++){
        runningSum += Math.pow(-1, n) * Math.pow(z, 2*n+1) / (factorial(n) * (2*n+1));
    }
    return (2 / Math.sqrt(Math.PI)) * runningSum;
}
/**
 * Cumulative distribution function of the normal distribution N(mu, sigma^2):
 * F(x) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
 *
 * @param mu    mean
 * @param sigma standard deviation (not the variance)
 * @param x     value at which to evaluate the CDF
 * @return P(X <= x) for X ~ N(mu, sigma^2)
 */
public static double cumulativeNormalDistribution(double mu, double sigma, double x)
{
    return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2))));
}
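A quick usage sketch, assuming the two methods above live in the same class (the mean and standard deviation here are made-up illustration values, not from the source):

// P(X < 19.5) for X ~ N(20, 2^2); mu = 20 and sigma = 2 are illustrative only
double p1 = cumulativeNormalDistribution(20, 2, 19.5);
// P(20 < X < 22) = F(22) - F(20)
double p2 = cumulativeNormalDistribution(20, 2, 22) - cumulativeNormalDistribution(20, 2, 20);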
https://en.wikipedia.org/wiki/Binomial_distribution "the number of successes in a sequence of n independent (guessing) experiments"
https://en.wikipedia.org/wiki/Bean_machine According to the central limit theorem, the binomial distribution approximates the normal distribution provided that the number of rows and the number of balls are both large.
Variance: the average of the squared differences from the mean: σ² = (1/N) · Σ (x_i - μ)²
Standard deviation: how spread out the numbers are; a standard way of measuring how far from "normal" (the mean) a value is
sigma = population standard deviation = square root of the Variance
N = the size of the population
x_i = each value from the population
μ = the population mean
Expectations:
Population (all that we are interested in) vs. Sample (a selection taken from Population)
Difference in the DIVISOR in calculating the variance (average of the squared difference)
Population: divisor is N
Sample: divisor is (N-1): number of samples minus 1
Think of it as a "correction" (Bessel's correction) when the data is only a sample: deviations are measured from the sample mean, which is itself fitted to the data, so dividing by N would underestimate the true variance; see the sketch below.
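A minimal sketch of the two divisors on a small made-up data set (the numbers are illustrative only):

// Population vs. sample variance for the same data; only the divisor differs.
double[] x = {2, 4, 4, 4, 5, 5, 7, 9};   // made-up data
double mean = 0;
for (double v : x) mean += v;
mean /= x.length;

double sumSq = 0;
for (double v : x) sumSq += (v - mean) * (v - mean);

double populationVariance = sumSq / x.length;        // divisor N
double sampleVariance     = sumSq / (x.length - 1);  // divisor N - 1 (Bessel's correction)
double populationStdDev   = Math.sqrt(populationVariance);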
Why square? Squaring makes every deviation positive (so positive and negative deviations don't cancel) and weights large deviations more heavily; taking the square root at the end brings the units back to those of the data.
https://en.wikipedia.org/wiki/Standard_score
Z-score is just "how many sigmas away from the mean"
z = ( x - μ ) / σ
In probability theory
Experiment: a procedure that can be repeated indefinitely & has a well-defined set of possible outcomes
Sample Space S: the set of possible outcomes
Event A: a set of outcomes of an experiment (a subset of sample space S), to which a probability is assigned
Probability
P(A) = number of favorable outcomes / total outcomes
P(A') (also written P(Aᶜ)) is the probability that A does not occur
P(A)+P(A') = 1
A and B are mutually exclusive if they have no outcomes in common:
A ∩ B = ∅ and P(A ∩ B) = 0
For disjoint events A & B, P(A∪B) = P(A)+P(B) // because they share no outcomes
If the union of A and B covers the whole sample space:
A∪B=S and P(A∪B)=1
If the occurrence of A changes the probability of occurrence of B, they are dependent. Otherwise they are independent.
If A & B are independent, the probability that both occur (the intersection ∩) is:
P (A ∩ B) = P(A) * P(B)
The probability of event B occurring given that event A has already occurred is read "the probability of B given A" and is written: P(B|A)
P(A ∩ B) = P(A) * P(B|A)
P(B|A) meaning:
P(A|B) = P(B|A)·P(A) / P(B)
       = P(B|A)·P(A) / (P(B|A)·P(A) + P(B|Aᶜ)·P(Aᶜ))
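A small numeric sketch of the expanded form; the 1% / 95% / 10% figures below are made up for illustration, not from the source:

// P(A) = prior probability of A; P(B|A), P(B|notA) = probability of observing B with / without A
double pA = 0.01, pBgivenA = 0.95, pBgivenNotA = 0.10;   // made-up numbers
double pB = pBgivenA * pA + pBgivenNotA * (1 - pA);      // law of total probability
double pAgivenB = pBgivenA * pA / pB;                    // Bayes' theorem, ~0.0876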
Derivation:
nPr = n! / (n-r)! (0!=1)
Derivation:
nCr = nPr / r! = n! / (n-r)! / r!
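The two formulas translated directly into code (a rough sketch using long arithmetic, so it overflows for large n):

// nPr = n! / (n-r)! = n * (n-1) * ... * (n-r+1)
static long permutations(int n, int r) {
    long result = 1;
    for (int i = 0; i < r; i++) result *= (n - i);
    return result;
}

// nCr = nPr / r!  (each intermediate division is exact)
static long combinations(int n, int r) {
    long result = permutations(n, r);
    for (int i = 2; i <= r; i++) result /= i;
    return result;
}

For example, permutations(5, 2) gives 20 and combinations(5, 2) gives 10.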
https://en.wikipedia.org/wiki/Random_variable
A random variable (typically written as a capital letter) X is: a variable whose value is a numerical outcome of a random experiment
Also "Bernoulli trial"
Given a random variable X for a single trial:
Consider p the probability of success and q = 1 - p the probability of failure
Probability Mass Function (PMF) of X: f(1) = p, f(0) = q
A binomial process is: n repeated, independent Bernoulli trials, each with the same probability of success p
The random variable is the number of successes x out of n trials.
The probability distribution for the binomial random variable:
b(x, n, p) = nCx · p^x · q^(n-x)
Derivation: any particular sequence of x successes and (n-x) failures in n trials has probability p^x · q^(n-x), and there are nCx such sequences.
Cumulative distribution function (CDF) of X is the function F_X(x) = P(X ≤ x):
Is non-decreasing; its value accumulates all probabilities for values of X up to and including x.
For a range: P(a < X ≤ b) = F_X(b) - F_X(a)
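A small sketch of the binomial PMF and its CDF built straight from the formula above (nCx is accumulated as a double to keep it simple):

// b(x, n, p) = nCx * p^x * (1-p)^(n-x)
static double binomialPmf(int x, int n, double p) {
    double nCx = 1;
    for (int i = 0; i < x; i++) nCx = nCx * (n - i) / (i + 1);   // running nCx
    return nCx * Math.pow(p, x) * Math.pow(1 - p, n - x);
}

// P(X <= x): sum the PMF up to and including x
static double binomialCdf(int x, int n, double p) {
    double sum = 0;
    for (int k = 0; k <= x; k++) sum += binomialPmf(k, n, p);
    return sum;
}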
A statistical experiment:
Negative Binomial is: the probability that the x-th success occurs on the n-th trial of a sequence of independent Bernoulli trials
b*(x, n, p) = (n-1)C(x-1) · p^x · (1-p)^(n-x)
Derivation: the n-th trial must be a success (probability p), and the other (x-1) successes can fall anywhere among the first (n-1) trials: (n-1)C(x-1) arrangements, each with probability p^(x-1) · (1-p)^(n-x).
https://www.desmos.com/calculator/vylzmh1ryc
Is a special case of the negative binomial distribution:
Random variable: n, the number of the trial on which the first success occurs
g(n, p) = (1-p)^(n-1) · p
Derivation: the negative binomial distribution with x = 1, i.e. the chance of (n-1) failures followed by one success.
Name: "The word “geometric” has been used for about 2500 years to describe a geometric progression. That’s a sequence where the ratio of the next term to the present term is constant. For example, in the sequence 16,24,36,54,… , each term is 3/2 of the previous." (from https://www.quora.com/Why-are-geometric-distributions-called-geometric)
Why (comparing with Binomial Distribution):
https://towardsdatascience.com/poisson-distribution-intuition-and-derivation-1059aeab90d
https://www.youtube.com/watch?v=tA0piaEZOpE
Poisson Experiment is: counting the number of events in a fixed interval, where events occur independently at a known constant average rate λ
Usage: to predict the probability of a given number of events occurring in a fixed interval of time
P(k, λ) = λ^k · e^(-λ) / k!
https://www.desmos.com/calculator/v5pf9rkc2u
Derivation:
https://towardsdatascience.com/poisson-distribution-intuition-and-derivation-1059aeab90d
https://en.wikipedia.org/wiki/Limit_(mathematics)
If: n → ∞ and p → 0 while np = λ stays fixed,
then: the binomial distribution b(x, n, p) converges to the Poisson distribution P(x, λ).
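The Poisson PMF above translated directly into code (a sketch; k! is accumulated as a double to avoid integer overflow):

// P(k, lambda) = lambda^k * e^(-lambda) / k!
static double poissonPmf(int k, double lambda) {
    double kFactorial = 1;
    for (int i = 2; i <= k; i++) kFactorial *= i;
    return Math.pow(lambda, k) * Math.exp(-lambda) / kFactorial;
}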
In fact, this also holds true even if the population is binomial, provided that min(np, n(1-p))> 5, where n is the sample size and p is the probability of success in the population. This means that we can use the normal probability model to quantify uncertainty when making inferences about a population mean based on the sample mean.
https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-1/tutorial
The central limit theorem (CLT) states that, for a large enough sample size n, the distribution of the sample mean will approach a normal distribution. This holds for a sample of independent random variables from any distribution with a finite standard deviation.
For sample size n, the sum Sn is close to a normal distribution with mean n·μ and standard deviation σ·√n
For the means of the samples: the sample mean is close to a normal distribution with mean μ and standard deviation σ/√n
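A quick simulation sketch of the statement above: sums of n uniform(0,1) draws (mean 0.5, standard deviation √(1/12)) should cluster around n·0.5 with spread √(n/12). The sample counts are arbitrary illustration values.

import java.util.Random;

public class CltDemo {
    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 100, samples = 10000;          // arbitrary sizes for illustration
        double[] sums = new double[samples];
        for (int s = 0; s < samples; s++) {
            double sum = 0;
            for (int i = 0; i < n; i++) sum += rng.nextDouble();  // uniform(0,1) draws
            sums[s] = sum;
        }
        double mean = 0;
        for (double v : sums) mean += v;
        mean /= samples;
        double var = 0;
        for (double v : sums) var += (v - mean) * (v - mean);
        var /= samples;
        // Expect mean of sums ~ n*0.5 = 50 and std dev ~ sqrt(n/12) ~ 2.89
        System.out.println("mean of sums = " + mean);
        System.out.println("std of sums  = " + Math.sqrt(var));
    }
}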
https://en.wikipedia.org/wiki/Correlation_and_dependence
Between two random variables or bivariate data.
Examples:
https://mathworld.wolfram.com/ExpectationValue.html
Is the weighted average of a random variable X.
Is intuitively the arithmetic mean of a large number of independent realizations of X. (wikipedia)
Notation & definition: ⟨X⟩ = E[X] = Σ x·P(x) for a discrete X (an integral ∫ x·P(x) dx in the continuous case)
P(x) is the probability mass function (probability density function in the continuous case)
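A tiny worked example of the definition, using a fair six-sided die (a standard illustration, not from the source):

// E[X] = sum over x of x * P(x); for a fair die each face has P = 1/6
double expected = 0;
for (int face = 1; face <= 6; face++) expected += face * (1.0 / 6);
// expected == 3.5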
https://en.wikipedia.org/wiki/Covariance
https://mathworld.wolfram.com/Covariance.html
https://www.hackerrank.com/challenges/s10-pearson-correlation-coefficient/tutorial
cov(X,Y) - a measure of the joint variability (how they vary together) of two random variables (X,Y):
cov(X,Y) = ⟨(X - μ_X)(Y - μ_Y)⟩ = ⟨XY⟩ - μ_X·μ_Y
Also known as: Pearson's r (the Pearson product-moment correlation coefficient)
Interpretation: ranges from -1 to +1; +1 = perfect positive linear correlation, 0 = no linear correlation, -1 = perfect negative linear correlation
ρ_X,Y = cov(X,Y) / (σ_X·σ_Y)
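A straightforward sketch of ρ from the definition (population form; the 1/n factors cancel, so they are omitted):

// Pearson correlation: cov(X,Y) / (sigmaX * sigmaY)
static double pearson(double[] x, double[] y) {
    int n = x.length;
    double meanX = 0, meanY = 0;
    for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
    meanX /= n;
    meanY /= n;

    double cov = 0, varX = 0, varY = 0;
    for (int i = 0; i < n; i++) {
        cov  += (x[i] - meanX) * (y[i] - meanY);
        varX += (x[i] - meanX) * (x[i] - meanX);
        varY += (y[i] - meanY) * (y[i] - meanY);
    }
    return cov / Math.sqrt(varX * varY);
}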
Spearman's Rank Correlation Coefficient is: the Pearson correlation coefficient applied to the ranks of the data rather than the raw values
(Wikipedia) "It is common to regard these rank correlation coefficients as alternatives to Pearson's coefficient, used either to reduce the amount of calculation or to make the coefficient less sensitive to non-normality in distributions. However, this view has little mathematical basis, as rank correlation coefficients measure a different type of relationship than the Pearson product-moment correlation coefficient, and are best seen as measures of a different type of association, rather than as an alternative measure of the population correlation coefficient."
For special case where there is no duplicates, formula see: https://www.hackerrank.com/challenges/s10-spearman-rank-correlation-coefficient/tutorial
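For that no-duplicates special case, the formula reduces to the d-based shortcut 1 - 6·Σd² / (n·(n² - 1)), where d_i is the difference between the two ranks of observation i. A sketch assuming the ranks have already been computed:

// Spearman (no ties): r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i = rankX_i - rankY_i
static double spearmanNoTies(int[] rankX, int[] rankY) {
    int n = rankX.length;
    double sumD2 = 0;
    for (int i = 0; i < n; i++) {
        double d = rankX[i] - rankY[i];
        sumD2 += d * d;
    }
    return 1 - 6 * sumD2 / (n * ((double) n * n - 1));
}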
https://www.hackerrank.com/challenges/s10-least-square-regression-line/tutorial
Linear Regression
Ŷ = a + bX ("Y hat": the estimate of Y)
To estimate Y based on X, assuming Y & X are linearly correlated.
The Least Squares Regression Line makes the vertical distances from the data points to the regression line as small as possible: it minimizes the sum of the squares of the errors.
There are other linear regression techniques.
Formula: see the hackerrank link; a code sketch follows.
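A sketch of the closed-form least-squares fit for a single predictor, using the standard formulas b = (n·Σxy - Σx·Σy) / (n·Σx² - (Σx)²) and a = ȳ - b·x̄:

// Fit y ≈ a + b*x by least squares; returns {a, b}
static double[] leastSquaresLine(double[] x, double[] y) {
    int n = x.length;
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    for (int i = 0; i < n; i++) {
        sumX  += x[i];
        sumY  += y[i];
        sumXY += x[i] * y[i];
        sumXX += x[i] * x[i];
    }
    double b = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    double a = sumY / n - b * sumX / n;   // a = mean(y) - b * mean(x)
    return new double[]{a, b};
}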
If Y is linearly dependent on X1...Xm, then
Ŷ = a + b1·X1 + b2·X2 + ... + bm·Xm
Set theory signs: https://www.mathsisfun.com/sets/symbols.html