Mean (μ): Sum/n
Weighted Mean: Sum(xi * wi)/Sum(wi)
Median: for an odd number of samples, the middle element (an equal number of samples greater and less than it); for an even number of samples, the average of the 2 middle elements
Mode: the element(s) that occur most frequently in a data set. A set can be multimodal.
Precision: number of significant digits (123.45 & 0.012345 both have precision 5)
Scale: number of significant digits to the right of decimal (123.45 has scale 2)
Quartiles: 3 points that split an ordered data set into 4 equal groups
Usage: to summarize a group of numbers and give a picture (descriptive statistics)
Interquartile range (IQR): IQR = Q3 - Q1 (third quartile minus first quartile)
Usage: also called "midspread", "middle 50%", "H-spread"; a measurement of statistical dispersion
https://www.mathsisfun.com/data/standard-normal-distribution.html
Importance - it fits many natural phenomena.
Also known as - Gaussian distribution / bell curve
Is: a continuous probability distribution, symmetric about its mean μ, with spread controlled by σ
Probability Density: f(x) = (1 / (σ·√(2π))) · e^(-(x-μ)² / (2σ²))
https://www.hackerrank.com/challenges/s10-normal-distribution-1/tutorial
μ = 0 (mean = 0, so distribution is symmetric to Y axis)
σ = 1
Any normal distribution can be transformed into the standard normal distribution via z = (x - μ) / σ
// n! computed recursively; note that a double overflows to Infinity for n > 170
static double factorial(int n){
    if(n <= 1) return 1;
    return n * factorial(n-1);
}

// Error function via its Maclaurin series:
// erf(z) = (2/sqrt(pi)) * sum over n >= 0 of (-1)^n * z^(2n+1) / (n! * (2n+1))
public static double erf(double z)
{
    int nTerms = 100; // plenty for |z| within a few sigma; much larger n would overflow factorial(n)
    double runningSum = 0;
    for(int n = 0; n < nTerms; n++){
        runningSum += Math.pow(-1, n) * Math.pow(z, 2*n+1) / (factorial(n) * (2*n+1));
    }
    return (2 / Math.sqrt(Math.PI)) * runningSum;
}
/**
 * Cumulative distribution function of the normal distribution N(mu, sigma^2):
 * F(x) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
 *
 * @param mu    mean
 * @param sigma standard deviation (not the variance)
 * @param x     value at which to evaluate the CDF
 * @return P(X <= x) for X ~ N(mu, sigma^2)
 */
public static double cumulativeNormalDistribution(double mu, double sigma, double x)
{
    return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2))));
}
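A quick usage sketch, assuming the two methods above live in the same class (the mean and standard deviation here are made-up illustration values, not from the source):

// P(X < 19.5) for X ~ N(20, 2^2); mu = 20 and sigma = 2 are illustrative only
double p1 = cumulativeNormalDistribution(20, 2, 19.5);
// P(20 < X < 22) = F(22) - F(20)
double p2 = cumulativeNormalDistribution(20, 2, 22) - cumulativeNormalDistribution(20, 2, 20);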
https://en.wikipedia.org/wiki/Binomial_distribution "the number of successes in a sequence of n independent (guessing) experiments"
https://en.wikipedia.org/wiki/Bean_machine According to the central limit theorem, the binomial distribution approximates the normal distribution provided that the number of rows and the number of balls are both large.
Variance: the average of the squared differences from the mean: σ² = (1/N) · Σ (x_i - μ)²
Standard deviation: how spread out the numbers are; a standard way of measuring how far from "normal" (the mean) a value is
sigma = population standard deviation = square root of the Variance
N = the size of the population
x_i = each value from the population
μ = the population mean
Expectations:
Population (all that we are interested in) vs. Sample (a selection taken from Population)
Difference in the DIVISOR in calculating the variance (average of the squared difference)
Population: divisor is N
Sample: divisor is (N-1): number of samples minus 1
Think of it as a "correction" (Bessel's correction) when the data is only a sample: deviations are measured from the sample mean, which is itself fitted to the data, so dividing by N would underestimate the true variance; see the sketch below.
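A minimal sketch of the two divisors on a small made-up data set (the numbers are illustrative only):

// Population vs. sample variance for the same data; only the divisor differs.
double[] x = {2, 4, 4, 4, 5, 5, 7, 9};   // made-up data
double mean = 0;
for (double v : x) mean += v;
mean /= x.length;

double sumSq = 0;
for (double v : x) sumSq += (v - mean) * (v - mean);

double populationVariance = sumSq / x.length;        // divisor N
double sampleVariance     = sumSq / (x.length - 1);  // divisor N - 1 (Bessel's correction)
double populationStdDev   = Math.sqrt(populationVariance);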
Why square? Squaring makes every deviation positive (so positive and negative deviations don't cancel) and weights large deviations more heavily; taking the square root at the end brings the units back to those of the data.
https://en.wikipedia.org/wiki/Standard_score
Z-score is just "how many sigmas away from the mean"
z = ( x - μ ) / σ
In probability theory
Experiment: a procedure that can be repeated indefinitely & has a well-defined set of possible outcomes
Sample Space S: the set of possible outcomes
Event A: a set of outcomes of an experiment (a subset of sample space S), to which a probability is assigned
Probability
P(A) = number of favorable outcomes / total outcomes
P(A') (also written P(Aᶜ)) is the probability that A does not occur
P(A)+P(A') = 1
A and B are mutually exclusive if they have no outcomes in common:
A ∩ B = ∅ and P(A ∩ B) = 0
For disjoint events A & B, P(A∪B) = P(A)+P(B) // because they share no outcomes
If the union of A and B covers the whole sample space:
A∪B=S and P(A∪B)=1
If the occurrence of A changes the probability of occurrence of B, they are dependent. Otherwise they are independent.
If A & B are independent, the probability that both occur (the intersection ∩) is:
P (A ∩ B) = P(A) * P(B)
The probability of event B occurring given that event A has already occurred is read "the probability of B given A" and is written: P(B|A)
P(A ∩ B) = P(A) * P(B|A)
P(B|A) meaning:
P(A|B) = P(B|A)·P(A) / P(B)
       = P(B|A)·P(A) / (P(B|A)·P(A) + P(B|Aᶜ)·P(Aᶜ))
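A small numeric sketch of the expanded form; the 1% / 95% / 10% figures below are made up for illustration, not from the source:

// P(A) = prior probability of A; P(B|A), P(B|notA) = probability of observing B with / without A
double pA = 0.01, pBgivenA = 0.95, pBgivenNotA = 0.10;   // made-up numbers
double pB = pBgivenA * pA + pBgivenNotA * (1 - pA);      // law of total probability
double pAgivenB = pBgivenA * pA / pB;                    // Bayes' theorem, ~0.0876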
Derivation:
nPr = n! / (n-r)! (0!=1)
Derivation:
nCr = nPr / r! = n! / (n-r)! / r!
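The two formulas translated directly into code (a rough sketch using long arithmetic, so it overflows for large n):

// nPr = n! / (n-r)! = n * (n-1) * ... * (n-r+1)
static long permutations(int n, int r) {
    long result = 1;
    for (int i = 0; i < r; i++) result *= (n - i);
    return result;
}

// nCr = nPr / r!  (each intermediate division is exact)
static long combinations(int n, int r) {
    long result = permutations(n, r);
    for (int i = 2; i <= r; i++) result /= i;
    return result;
}

For example, permutations(5, 2) gives 20 and combinations(5, 2) gives 10.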
https://en.wikipedia.org/wiki/Random_variable
A random variable (typically written as a capital letter) X is: a variable whose value is a numerical outcome of a random experiment
Also "Bernoulli trial"
Given a random variable X for a single trial:
Consider p the probability of success and q = 1 - p the probability of failure
Probability Mass Function (PMF) of X: f(1) = p, f(0) = q
A binomial process is: n repeated, independent Bernoulli trials, each with the same probability of success p
The random variable is the number of successes x out of n trials.
The probability distribution for the binomial random variable:
b(x, n, p) = nCx · p^x · q^(n-x)
Derivation: any particular sequence of x successes and (n-x) failures in n trials has probability p^x · q^(n-x), and there are nCx such sequences.
Cumulative distribution function (CDF) of X is the function F_X(x) = P(X ≤ x):
Is non-decreasing; its value accumulates all probabilities for values of X up to and including x.
For a range: P(a < X ≤ b) = F_X(b) - F_X(a)
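A small sketch of the binomial PMF and its CDF built straight from the formula above (nCx is accumulated as a double to keep it simple):

// b(x, n, p) = nCx * p^x * (1-p)^(n-x)
static double binomialPmf(int x, int n, double p) {
    double nCx = 1;
    for (int i = 0; i < x; i++) nCx = nCx * (n - i) / (i + 1);   // running nCx
    return nCx * Math.pow(p, x) * Math.pow(1 - p, n - x);
}

// P(X <= x): sum the PMF up to and including x
static double binomialCdf(int x, int n, double p) {
    double sum = 0;
    for (int k = 0; k <= x; k++) sum += binomialPmf(k, n, p);
    return sum;
}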
A statistical experiment:
Negative Binomial is: the probability that the x-th success occurs on the n-th trial of a sequence of independent Bernoulli trials
b*(x, n, p) = (n-1)C(x-1) · p^x · (1-p)^(n-x)
Derivation: the n-th trial must be a success (probability p), and the other (x-1) successes can fall anywhere among the first (n-1) trials: (n-1)C(x-1) arrangements, each with probability p^(x-1) · (1-p)^(n-x).
https://www.desmos.com/calculator/vylzmh1ryc
Is a special case of the negative binomial distribution:
Random variable: n, the number of the trial on which the first success occurs
g(n, p) = (1-p)^(n-1) · p
Derivation: the negative binomial distribution with x = 1, i.e. the chance of (n-1) failures followed by one success.
Name: "The word “geometric” has been used for about 2500 years to describe a geometric progression. That’s a sequence where the ratio of the next term to the present term is constant. For example, in the sequence 16,24,36,54,… , each term is 3/2 of the previous." (from https://www.quora.com/Why-are-geometric-distributions-called-geometric)
Why (comparing with Binomial Distribution):
https://towardsdatascience.com/poisson-distribution-intuition-and-derivation-1059aeab90d
https://www.youtube.com/watch?v=tA0piaEZOpE
Poisson Experiment is: counting the number of events in a fixed interval, where events occur independently at a known constant average rate λ
Usage: to predict the probability of a given number of events occurring in a fixed interval of time
P(k, λ) = λ^k · e^(-λ) / k!
https://www.desmos.com/calculator/v5pf9rkc2u
Derivation:
https://towardsdatascience.com/poisson-distribution-intuition-and-derivation-1059aeab90d
https://en.wikipedia.org/wiki/Limit_(mathematics)
If: n → ∞ and p → 0 while np = λ stays fixed,
then: the binomial distribution b(x, n, p) converges to the Poisson distribution P(x, λ).
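The Poisson PMF above translated directly into code (a sketch; k! is accumulated as a double to avoid integer overflow):

// P(k, lambda) = lambda^k * e^(-lambda) / k!
static double poissonPmf(int k, double lambda) {
    double kFactorial = 1;
    for (int i = 2; i <= k; i++) kFactorial *= i;
    return Math.pow(lambda, k) * Math.exp(-lambda) / kFactorial;
}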
In fact, this also holds true even if the population is binomial, provided that min(np, n(1-p))> 5, where n is the sample size and p is the probability of success in the population. This means that we can use the normal probability model to quantify uncertainty when making inferences about a population mean based on the sample mean.
https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-1/tutorial
The central limit theorem (CLT) states that, for a large enough sample size n, the distribution of the sample mean will approach a normal distribution. This holds for a sample of independent random variables from any distribution with a finite standard deviation.
For sample size n, the sum Sn is close to a normal distribution with mean n·μ and standard deviation σ·√n
For the means of the samples: the sample mean is close to a normal distribution with mean μ and standard deviation σ/√n
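A quick simulation sketch of the statement above: sums of n uniform(0,1) draws (mean 0.5, standard deviation √(1/12)) should cluster around n·0.5 with spread √(n/12). The sample counts are arbitrary illustration values.

import java.util.Random;

public class CltDemo {
    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 100, samples = 10000;          // arbitrary sizes for illustration
        double[] sums = new double[samples];
        for (int s = 0; s < samples; s++) {
            double sum = 0;
            for (int i = 0; i < n; i++) sum += rng.nextDouble();  // uniform(0,1) draws
            sums[s] = sum;
        }
        double mean = 0;
        for (double v : sums) mean += v;
        mean /= samples;
        double var = 0;
        for (double v : sums) var += (v - mean) * (v - mean);
        var /= samples;
        // Expect mean of sums ~ n*0.5 = 50 and std dev ~ sqrt(n/12) ~ 2.89
        System.out.println("mean of sums = " + mean);
        System.out.println("std of sums  = " + Math.sqrt(var));
    }
}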
https://en.wikipedia.org/wiki/Correlation_and_dependence
Between two random variables or bivariate data.
Examples:
https://mathworld.wolfram.com/ExpectationValue.html
Is the weighted average of a random variable X.
Is intuitively the arithmetic mean of a large number of independent realizations of X. (wikipedia)
Notation & definition: ⟨X⟩ = E[X] = Σ x·P(x) for a discrete X (an integral ∫ x·P(x) dx in the continuous case)
P(x) is the probability mass function (probability density function in the continuous case)
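A tiny worked example of the definition, using a fair six-sided die (a standard illustration, not from the source):

// E[X] = sum over x of x * P(x); for a fair die each face has P = 1/6
double expected = 0;
for (int face = 1; face <= 6; face++) expected += face * (1.0 / 6);
// expected == 3.5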
https://en.wikipedia.org/wiki/Covariance
https://mathworld.wolfram.com/Covariance.html
https://www.hackerrank.com/challenges/s10-pearson-correlation-coefficient/tutorial
cov(X,Y) - a measure of the joint variability (how they vary together) of two random variables (X,Y):
cov(X,Y) = ⟨(X - μ_X)(Y - μ_Y)⟩ = ⟨XY⟩ - μ_X·μ_Y
Also known as: Pearson's r (the Pearson product-moment correlation coefficient)
Interpretation: ranges from -1 to +1; +1 = perfect positive linear correlation, 0 = no linear correlation, -1 = perfect negative linear correlation
ρ_X,Y = cov(X,Y) / (σ_X·σ_Y)
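A straightforward sketch of ρ from the definition (population form; the 1/n factors cancel, so they are omitted):

// Pearson correlation: cov(X,Y) / (sigmaX * sigmaY)
static double pearson(double[] x, double[] y) {
    int n = x.length;
    double meanX = 0, meanY = 0;
    for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
    meanX /= n;
    meanY /= n;

    double cov = 0, varX = 0, varY = 0;
    for (int i = 0; i < n; i++) {
        cov  += (x[i] - meanX) * (y[i] - meanY);
        varX += (x[i] - meanX) * (x[i] - meanX);
        varY += (y[i] - meanY) * (y[i] - meanY);
    }
    return cov / Math.sqrt(varX * varY);
}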
Spearman's Rank Correlation Coefficient is: the Pearson correlation coefficient applied to the ranks of the data rather than the raw values
(Wikipedia) "It is common to regard these rank correlation coefficients as alternatives to Pearson's coefficient, used either to reduce the amount of calculation or to make the coefficient less sensitive to non-normality in distributions. However, this view has little mathematical basis, as rank correlation coefficients measure a different type of relationship than the Pearson product-moment correlation coefficient, and are best seen as measures of a different type of association, rather than as an alternative measure of the population correlation coefficient."
For special case where there is no duplicates, formula see: https://www.hackerrank.com/challenges/s10-spearman-rank-correlation-coefficient/tutorial
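For that no-duplicates special case, the formula reduces to the d-based shortcut 1 - 6·Σd² / (n·(n² - 1)), where d_i is the difference between the two ranks of observation i. A sketch assuming the ranks have already been computed:

// Spearman (no ties): r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i = rankX_i - rankY_i
static double spearmanNoTies(int[] rankX, int[] rankY) {
    int n = rankX.length;
    double sumD2 = 0;
    for (int i = 0; i < n; i++) {
        double d = rankX[i] - rankY[i];
        sumD2 += d * d;
    }
    return 1 - 6 * sumD2 / (n * ((double) n * n - 1));
}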
https://www.hackerrank.com/challenges/s10-least-square-regression-line/tutorial
Linear Regression
Ŷ = a + bX ("Y hat": the estimate of Y)
To estimate Y based on X, assuming Y & X are linearly correlated.
The Least Squares Regression Line makes the vertical distances from the data points to the regression line as small as possible: it minimizes the sum of the squares of the errors.
There are other linear regression techniques.
Formula: see the hackerrank link; a code sketch follows.
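A sketch of the closed-form least-squares fit for a single predictor, using the standard formulas b = (n·Σxy - Σx·Σy) / (n·Σx² - (Σx)²) and a = ȳ - b·x̄:

// Fit y ≈ a + b*x by least squares; returns {a, b}
static double[] leastSquaresLine(double[] x, double[] y) {
    int n = x.length;
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    for (int i = 0; i < n; i++) {
        sumX  += x[i];
        sumY  += y[i];
        sumXY += x[i] * y[i];
        sumXX += x[i] * x[i];
    }
    double b = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    double a = sumY / n - b * sumX / n;   // a = mean(y) - b * mean(x)
    return new double[]{a, b};
}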
If Y is linearly dependent on X1...Xm, then
Ŷ = a + b1·X1 + b2·X2 + ... + bm·Xm
Set theory signs: https://www.mathsisfun.com/sets/symbols.html