statistics needs:
a specific question
a measurable answer
describe
summarize
mean
median
mode
range
quartiles
interquartile range (very robust against outliers)
variance
standard deviation
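A minimal Python sketch of these summary measures using the standard-library statistics module (the sample values are made up):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample values

print(statistics.mean(data))      # arithmetic mean
print(statistics.median(data))    # middle value
print(statistics.mode(data))      # most frequent value
print(max(data) - min(data))      # range
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles
print(q3 - q1)                    # interquartile range, robust against outliers
print(statistics.variance(data))  # sample variance
print(statistics.stdev(data))     # sample standard deviation
```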
inductive conclusion from the sample about the whole population
sampling
with replacement
independent probability
without replacement
conditional probability
P(A|B) = P(A∩B) / P(B)
without replacement
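A small sketch of the conditional-probability formula P(A|B) = P(A∩B) / P(B), estimated by counting outcomes in an invented sample:

```python
# invented outcomes of the form (event A observed?, event B observed?)
outcomes = [(True, True), (False, True), (True, True),
            (False, False), (True, False), (False, True)]

n = len(outcomes)
p_b = sum(1 for a, b in outcomes if b) / n              # P(B)
p_a_and_b = sum(1 for a, b in outcomes if a and b) / n  # P(A ∩ B)

print(p_a_and_b / p_b)  # P(A|B)
```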
distribution
discrete
continuous
uniform
normal distribution
mean
standard deviation
bimodal etc.
without replacement
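A quick sketch drawing from a continuous uniform and a normal distribution with the standard-library random module (the parameters are arbitrary examples):

```python
import random
import statistics

uniform_sample = [random.uniform(0, 10) for _ in range(10_000)]       # continuous uniform on [0, 10]
normal_sample = [random.gauss(mu=5, sigma=2) for _ in range(10_000)]  # normal with mean 5, std dev 2

print(statistics.mean(normal_sample))   # close to the mean 5
print(statistics.stdev(normal_sample))  # close to the standard deviation 2
```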
binomial
true, false
sequence of independent events
expected value = n * p <=> p = expected value / n
expected value = mean of the probability distribution
the bigger the sample, the closer the sample mean gets to the expected value, i.e. the law of large numbers
the bigger the sample size, the closer the distribution of sample means gets to a normal distribution centred on the mean of the full population, i.e. the central limit theorem
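A sketch of a binomial experiment (n independent true/false trials with success probability p), illustrating that the sample mean approaches the expected value n * p as the sample grows (law of large numbers); the numbers are arbitrary:

```python
import random
import statistics

n, p = 20, 0.3          # 20 independent trials, success probability 0.3
expected_value = n * p  # = 6

def binomial_draw(n, p):
    # one binomial outcome: count of successes in n independent true/false trials
    return sum(random.random() < p for _ in range(n))

for sample_size in (10, 1_000, 100_000):
    sample = [binomial_draw(n, p) for _ in range(sample_size)]
    print(sample_size, statistics.mean(sample))  # approaches 6 as the sample grows
```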
Poisson distribution: probability of a number of events over a fixed period of time
lambda (λ) is the average number of events in the time period, i.e. the expected value
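A minimal sketch of the Poisson probability mass function, where lambda is the expected number of events per period (the value 3 is just an example):

```python
import math

def poisson_pmf(k, lam):
    # probability of exactly k events when lam events are expected per period
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3  # e.g. on average 3 events per hour
for k in range(7):
    print(k, round(poisson_pmf(k, lam), 4))
```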
define target population
state null hypothesis
assume nothing
independent variable
state alternative hypothesis
opposing null hypothesis
dependent variable
collect sample data
the bigger the sample, the closer the sample mean gets to the population mean (law of large numbers), and the closer the distribution of sample means gets to normal (central limit theorem)
test sample statistically
experiment
What is the effect of the treatment on the response?
treatment: independent variable
response: dependent variable
controlled experiment
treatment group vs. non-treatment control group
A/B-testing
avoid bias
randomisation
blinding
double-blinding
draw conclusion about the population from the sample
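A small sketch of randomised assignment for a controlled experiment (A/B test): shuffling subjects into treatment and control groups to avoid selection bias (subject names are hypothetical):

```python
import random

subjects = [f"subject_{i}" for i in range(20)]  # hypothetical participants
random.shuffle(subjects)                        # randomisation to avoid bias

half = len(subjects) // 2
treatment_group = subjects[:half]  # receives the treatment (independent variable)
control_group = subjects[half:]    # non-treatment control group for comparison

print(treatment_group)
print(control_group)
```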
The null hypothesis H₀ claims that the studied effect does not exist: there is no relationship between the two sets of data.
The alternative hypothesis H₁ claims that the studied effect exists: there is a relationship between the two sets of data.
correlation coefficient
What do we know about y when we know x?
values range from -1 to +1
0.99 very strong
0.75 strong
0.50 moderate
0.20 weak
0.00 none
Don't confuse correlation with causation!
confounding variables
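A minimal sketch of the correlation coefficient using statistics.correlation (Python 3.10+); the data pairs are invented:

```python
import statistics

x = [1, 2, 3, 4, 5, 6]        # e.g. hours studied (invented)
y = [52, 55, 61, 64, 70, 74]  # e.g. test score (invented)

r = statistics.correlation(x, y)  # Pearson correlation coefficient, between -1 and +1
print(round(r, 2))                # close to +1: very strong positive relationship
```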
The p-value is the probability of obtaining a result at least as extreme as the observed one, given that the null hypothesis is true.
significance level α = 0.05
p < α 🠖 result is statistically significant
type I error: the null hypothesis is actually true but is falsely rejected, i.e. a false positive
type II error: the null hypothesis is actually false but is falsely accepted as true, i.e. a false negative
changes in the independent variable correlate with changes in the dependent variable
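A sketch of the decision rule, assuming SciPy is available, with a two-sample t-test on made-up measurements:

```python
from scipy import stats

control = [12.1, 11.8, 12.4, 11.9, 12.0, 12.2, 11.7, 12.3]    # made-up control group
treatment = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0, 13.2, 12.6]  # made-up treatment group

alpha = 0.05                                  # significance level α
result = stats.ttest_ind(treatment, control)  # two-sample t-test
print(result.pvalue)

if result.pvalue < alpha:
    print("statistically significant: reject the null hypothesis")
else:
    print("not significant: fail to reject the null hypothesis")
```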
                             | predicted condition pos             | predicted condition neg              | total
actual result condition pos  | true positive TP (1-β)              | false negative FN (type II error, β) | TP + FN
actual result condition neg  | false positive FP (type I error, α) | true negative TN (1-α)               | FP + TN
sensitivity = TP / (TP + FN)
specificity = TN / (FP + TN)
true positive: predicted positive, actual positive
false positive (i.e. type I error): predicted positive, but actual negative -- like a smoke alarm going off without smoke
true negative: predicted negative, actual negative
false negative (i.e. type II error): predicted negative, but actual positive -- like a smoke alarm not going off despite smoke
sensitivity = TP / (TP + FN)
rather flag than not flag
specificity = TN / (FP + TN)
rather not flag than flag
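A small sketch computing sensitivity and specificity from confusion-matrix counts (the counts are invented):

```python
# invented confusion-matrix counts
tp, fn = 80, 20  # actual positives: correctly flagged vs. missed (type II errors)
fp, tn = 10, 90  # actual negatives: falsely flagged (type I errors) vs. correctly cleared

sensitivity = tp / (tp + fn)  # share of actual positives that were flagged
specificity = tn / (fp + tn)  # share of actual negatives that were not flagged

print(sensitivity)  # 0.8
print(specificity)  # 0.9
```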