Basic Statistics

1) Computing 95% confidence intervals

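Mean +/- 1.96 * sd / sqrt(n), where sd is the sample standard deviation and n is the number of samples (1.96 is the 97.5% quantile of the standard normal distribution)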

ci_alpha <- 0.05

qnorm(ci_alpha / 2)        # lower critical value, about -1.96

qnorm(1 - (ci_alpha/2))    # upper critical value, about 1.96

# i.e. how many standard deviations from the mean you must go to capture the central 95% of a normal distribution
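The formula above can be wrapped in a small function. This is a minimal sketch in base R: the function name `ci95_normal` and the simulated sample are hypothetical, and it uses the normal critical value, which is only appropriate for reasonably large samples (use the t distribution for small ones).

```r
# Normal-approximation confidence interval for a mean:
# mean +/- z * sd / sqrt(n), with z = qnorm(1 - alpha / 2) (about 1.96 for alpha = 0.05)
ci95_normal <- function(x, alpha = 0.05) {
  n <- length(x)
  z <- qnorm(1 - alpha / 2)
  se <- sd(x) / sqrt(n)                 # standard error of the mean
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

set.seed(1)
x <- rnorm(200, mean = 10, sd = 2)      # hypothetical sample
ci95_normal(x)
```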

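Meaning of confidence intervals

SUMMARY: if you repeat the experiment many times and compute a 95% confidence interval each time, about 95% of those intervals will contain the true value of the mean.

This does not mean that a single computed interval contains the true mean with 95% probability: the true mean is fixed, and it is the interval that varies from one experiment to the next.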
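The repeated-experiment interpretation can be checked by simulation. A minimal sketch with hypothetical settings (true mean 5, n = 30, 2000 repetitions), using the t-based interval from `t.test()`:

```r
# Coverage of a 95% CI: repeat the "experiment" many times and count how
# often the interval computed from each sample contains the true mean.
set.seed(42)
true_mean <- 5
n_reps <- 2000
n <- 30
covered <- replicate(n_reps, {
  x <- rnorm(n, mean = true_mean, sd = 3)
  ci <- t.test(x)$conf.int              # t-based 95% interval for this sample
  ci[1] <= true_mean && true_mean <= ci[2]
})
mean(covered)   # close to 0.95; each individual interval either covers 5 or it does not
```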

Another explanation of confidence intervals by one of the ISLR authors (Rob Tibshirani)

5) t statistic and confidence intervals (link) (link) (meaning of confidence intervals)
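For small samples the critical value comes from the t distribution with n - 1 degrees of freedom instead of 1.96. A sketch on a hypothetical sample, computing the t statistic and interval by hand and comparing them with `t.test()`:

```r
# t-based 95% CI by hand: mean +/- t_{n-1, 0.975} * sd / sqrt(n)
set.seed(7)
x <- rnorm(15, mean = 2, sd = 1)        # hypothetical small sample
n <- length(x)
t_crit <- qt(0.975, df = n - 1)         # larger than 1.96 when n is small
se <- sd(x) / sqrt(n)
manual_ci <- mean(x) + c(-1, 1) * t_crit * se
t_stat <- (mean(x) - 0) / se            # t statistic for H0: mu = 0
manual_ci
t.test(x)$conf.int                      # same interval
```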

6) Power calculation in R (link)

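Sample size: leave n = NULL and power.t.test solves for the number of samples per group

power.t.test

power.t.test(n = NULL, power = .95, sd = 5, alternative = "two.sided", sig.level = 0.001, delta = 0.1)  # delta = true difference in means, sd = common standard deviation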
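Power can also be estimated by simulation, which is a useful sanity check on `power.t.test`. A sketch for a smaller, hypothetical design (two groups of 20, difference 1, sd 1, alpha 0.05) so that it runs quickly; the simulated rejection rate should be close to the analytical power:

```r
# Analytical power for a two-sample t-test
pw <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)
pw$power

# Empirical power: fraction of simulated experiments that reject H0
set.seed(123)
rejections <- replicate(2000, {
  a <- rnorm(20, mean = 0, sd = 1)
  b <- rnorm(20, mean = 1, sd = 1)
  t.test(a, b, var.equal = TRUE)$p.value < 0.05
})
mean(rejections)

# When solving for n, the result is fractional; round up to get a usable size per group
ceiling(power.t.test(power = 0.8, delta = 1, sd = 1)$n)
```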

7) Type 1 and Type 2 errors and p value (link) (link) (link to tutorials on power calculations)
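The Type 1 error rate is the probability of rejecting a true null hypothesis; the Type 2 error rate is 1 - power. A sketch that simulates many experiments with no real difference (hypothetical normal data, Welch t-test) and counts how often p < 0.05:

```r
# Under a true null hypothesis p-values are approximately uniform, so
# rejecting at p < 0.05 gives a Type 1 error rate of about 5%.
set.seed(2024)
p_null <- replicate(5000, t.test(rnorm(25), rnorm(25))$p.value)  # no real difference
mean(p_null < 0.05)   # estimated Type 1 error rate
hist(p_null, breaks = 20, main = "p-values under the null")
```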

8) pvalue or p-value (link) (link)

9) q value (link) (link)

FDR (false discovery rate) VERY GOOD (link)
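A common way to control the FDR is the Benjamini-Hochberg procedure, available as `p.adjust(method = "BH")`. The sketch below writes it out by hand on hypothetical p-values; note this is BH, not Storey's q-value (the qvalue package), which additionally estimates the proportion of true null hypotheses and so typically gives slightly smaller values.

```r
# Benjamini-Hochberg adjusted p-values by hand:
# sort p, scale the i-th smallest by m / i, then take a running minimum from the largest down.
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)            # largest p first
  adj <- pmin(1, cummin(p[o] * m / (m:1)))    # the k-th largest p has rank m - k + 1
  adj[order(o)]                               # back to the original order
}

p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216)  # hypothetical
bh_adjust(p)
p.adjust(p, method = "BH")
sum(p.adjust(p, method = "BH") < 0.05)        # discoveries at a 5% FDR
```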

9) F1-score (link)
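The F1-score is the harmonic mean of precision and recall, equivalently 2TP / (2TP + FP + FN). A sketch from hypothetical confusion-matrix counts:

```r
# F1 score from confusion-matrix counts
f1_score <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

f1_score(tp = 8, fp = 2, fn = 4)   # precision 0.8, recall 2/3, F1 = 8/11
```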

10) Bias-variance tradeoff

* https://www.youtube.com/watch?v=VaN1RUDuioQ&list=PLOg0ngHtcqbPTlZzRHA2ocQZqB1D_qZ5V&index=5

* http://scott.fortmann-roe.com/docs/BiasVariance.html

* VERY GOOD picture: https://github.com/neelsoumya/basic_statistics/blob/master/bias_variance.png
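The tradeoff can be seen by simulation: refit a simple and a flexible model on many noisy samples from the same curve and look at their predictions at one point. A sketch with a hypothetical true function sin(2*pi*x), noise sd 0.3, and evaluation point x0 = 0.25:

```r
# Straight line (high bias, low variance) vs degree-9 polynomial (low bias, high variance)
set.seed(10)
f <- function(x) sin(2 * pi * x)             # hypothetical true function
x <- seq(0, 1, length.out = 30)
x0 <- 0.25                                    # evaluation point, f(x0) = 1

predict_at_x0 <- function(degree) {
  y <- f(x) + rnorm(length(x), sd = 0.3)      # a fresh noisy training set
  fit <- lm(y ~ poly(x, degree))
  predict(fit, newdata = data.frame(x = x0))
}

bias_variance <- function(degree, reps = 500) {
  preds <- replicate(reps, predict_at_x0(degree))
  c(bias2 = (mean(preds) - f(x0))^2, variance = var(preds))
}

res <- rbind(linear = bias_variance(1), degree9 = bias_variance(9))
res
```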

12) Survival analysis (using survminer) (cheatsheet)

VERY GOOD tutorial on survival models and time to event models (link)

Hazard ratio and survival (time-to-event) models (link)

BEST explanation of hazard ratio (link) (link)

Survival analysis code (my code on bitbucket)

Using strata in survival models (link)

Weights in survival models (link)

Advanced analysis using strata (link)

Time-varying models and time-varying covariates in survival models (link) (link)
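A minimal sketch with the survival package (on which survminer builds), using the `lung` data that ships with it: Kaplan-Meier curves, a Cox model whose exponentiated coefficients are hazard ratios, and a stratified model. The choice of covariates is only illustrative.

```r
# lung: status 1 = censored, 2 = dead; sex 1 = male, 2 = female
library(survival)

km <- survfit(Surv(time, status) ~ sex, data = lung)
summary(km)$table[, "median"]      # median survival time by sex

cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
exp(coef(cox))                     # hazard ratios per unit of each covariate
exp(confint(cox))                  # 95% CIs for the hazard ratios

# Stratified model: a separate baseline hazard for each sex,
# so sex itself gets no coefficient
cox_strat <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)

# With survminer loaded, survminer::ggsurvplot(km) draws the KM curves
```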

13) Meta-analysis (slides) (link) (see my software page for more meta-analysis code) (tutorial) (very good tutorial) (very good code)

(another very good tutorial)
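The core of a fixed-effect meta-analysis is an inverse-variance weighted average. A base-R sketch on hypothetical study estimates and standard errors; the metafor package's `rma()` with `method = "FE"` should give the same pooled estimate, but this code does not use it.

```r
# Fixed-effect (inverse-variance) meta-analysis by hand
yi  <- c(0.20, 0.50, 0.30, 0.10)        # hypothetical effect estimates (e.g. log odds ratios)
sei <- c(0.10, 0.20, 0.15, 0.25)        # their standard errors

wi <- 1 / sei^2                          # inverse-variance weights
pooled <- sum(wi * yi) / sum(wi)
pooled_se <- sqrt(1 / sum(wi))
pooled + c(-1, 1) * qnorm(0.975) * pooled_se   # 95% CI

# Cochran's Q and I^2 quantify between-study heterogeneity
Q  <- sum(wi * (yi - pooled)^2)
I2 <- max(0, (Q - (length(yi) - 1)) / Q)
```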

14) GWAS (link) (genes and probability course VERY GOOD link)

15) Bias of an estimator (link)
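Bias is the difference between an estimator's expected value and the true value. A classic example, checked by simulation with hypothetical settings (n = 5, true variance 4): dividing the sum of squares by n underestimates the variance by a factor (n - 1) / n, while `var()` divides by n - 1 and is unbiased.

```r
set.seed(99)
n <- 5
sigma2 <- 4
est <- replicate(20000, {
  x <- rnorm(n, sd = sqrt(sigma2))
  c(divide_by_n = sum((x - mean(x))^2) / n, divide_by_n_minus_1 = var(x))
})
rowMeans(est)            # compare with 4 and 4 * (n - 1) / n = 3.2
rowMeans(est) - sigma2   # estimated bias of each estimator
```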

16) Basic epidemiology and statistics such as rate, incidence, risk, prevalence (link)

2 by 2 table in epidemiology (link)
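The usual measures of association can be read off a 2 by 2 table. A sketch with hypothetical counts (exposed: 20 cases, 80 non-cases; unexposed: 10 cases, 90 non-cases):

```r
tab <- matrix(c(20, 80,
                10, 90),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("exposed", "unexposed"),
                              outcome = c("case", "non_case")))

risk <- tab[, "case"] / rowSums(tab)                    # risk in each group
risk_ratio <- risk[["exposed"]] / risk[["unexposed"]]
odds <- tab[, "case"] / tab[, "non_case"]
odds_ratio <- odds[["exposed"]] / odds[["unexposed"]]   # equals (a * d) / (b * c)
risk_difference <- risk[["exposed"]] - risk[["unexposed"]]
c(risk_ratio = risk_ratio, odds_ratio = odds_ratio, risk_difference = risk_difference)
```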

Screenshots of basic epidemiology (link), stored in the metafor source code repository on Bitbucket

VERY GOOD resources for epidemiology (link)

17) VERY GOOD collection of screenshots for learning epidemiology and basic statistics (link) (link)

10) Propensity score matching or matched case-control studies (link) (link)
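A bare-bones version of propensity score matching in base R, on hypothetical simulated data: estimate each unit's probability of treatment with logistic regression, then greedily pair each treated unit with the nearest unused control. Packages such as MatchIt implement this with calipers and better matching algorithms; this sketch is only meant to show the idea.

```r
set.seed(3)
n <- 400
age <- rnorm(n, 50, 10)
smoker <- rbinom(n, 1, 0.3)
treated <- rbinom(n, 1, plogis(-4 + 0.07 * age + 0.8 * smoker))  # hypothetical assignment
dat <- data.frame(age, smoker, treated)

ps_model <- glm(treated ~ age + smoker, data = dat, family = binomial)
dat$ps <- fitted(ps_model)                     # estimated propensity scores

treated_idx <- which(dat$treated == 1)
control_idx <- which(dat$treated == 0)
pairs <- data.frame(treated = integer(0), control = integer(0))
for (i in treated_idx) {
  if (length(control_idx) == 0) break          # ran out of controls
  j <- control_idx[which.min(abs(dat$ps[control_idx] - dat$ps[i]))]
  pairs <- rbind(pairs, data.frame(treated = i, control = j))
  control_idx <- setdiff(control_idx, j)       # match without replacement
}
matched <- dat[c(pairs$treated, pairs$control), ]

# Covariate balance: mean age by group before and after matching
tapply(dat$age, dat$treated, mean)
tapply(matched$age, matched$treated, mean)
```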

11) Stratified sampling (link) (R code)
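A minimal base-R sketch of proportionate stratified sampling, drawing the same fraction from every stratum; the population, the `region` column and the helper name `stratified_sample` are hypothetical.

```r
set.seed(5)
population <- data.frame(id = 1:1000,
                         region = sample(c("north", "south", "east"), 1000,
                                         replace = TRUE, prob = c(0.5, 0.3, 0.2)))

stratified_sample <- function(data, strata_col, frac) {
  groups <- split(data, data[[strata_col]])
  sampled <- lapply(groups, function(g) g[sample(nrow(g), round(frac * nrow(g))), ])
  do.call(rbind, sampled)
}

s <- stratified_sample(population, "region", frac = 0.1)
table(population$region)
table(s$region)            # about 10% of each stratum
```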

23) Poisson process (link)
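A homogeneous Poisson process with rate lambda can be simulated by adding up independent exponential inter-arrival times. A sketch with hypothetical rate 2 on [0, 10]; the number of events should then have mean and variance close to lambda * T = 20.

```r
simulate_poisson_process <- function(lambda, T_end) {
  times <- numeric(0)
  t <- rexp(1, rate = lambda)              # first arrival
  while (t <= T_end) {
    times <- c(times, t)
    t <- t + rexp(1, rate = lambda)        # next inter-arrival time
  }
  times
}

set.seed(8)
arrivals <- simulate_poisson_process(lambda = 2, T_end = 10)
length(arrivals)                           # one draw from Poisson(20)

counts <- replicate(5000, length(simulate_poisson_process(2, 10)))
c(mean = mean(counts), var = var(counts))  # both close to 20
```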

24) ROC curve and AUC (link)
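The AUC equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative (the scaled Mann-Whitney statistic). A base-R sketch on hypothetical scores; the helper names `auc` and `roc_points` are illustrative.

```r
auc <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))   # ties count 1/2
}

# ROC curve points: true and false positive rates at each threshold
roc_points <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  data.frame(threshold = thresholds,
             tpr = sapply(thresholds, function(t) mean(scores[labels == 1] >= t)),
             fpr = sapply(thresholds, function(t) mean(scores[labels == 0] >= t)))
}

scores <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2)   # hypothetical classifier scores
labels <- c(1,   1,   1,   0,   0,   0)
auc(scores, labels)                          # 8 of 9 positive-negative pairs ordered correctly
roc_points(scores, labels)
```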

26) Chi-squared statistic (link)

Chi-squared distribution (link)

Chi-squared test with an example from Hardy-Weinberg equilibrium (link)

Chi-squared contingency table (link)

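In R:

hist(stats::rchisq(n = 100, df = 3))    # chi-squared draws with 3 degrees of freedom (right-skewed)

hist(stats::rchisq(n = 100, df = 30))   # with 30 degrees of freedom the shape is closer to normal

# chi-squared test on a 2 x 3 contingency table (rows = groups, columns = categories);
# counts this small trigger an "approximation may be incorrect" warning
stats::chisq.test(x = rbind(c(1, 2, 3), c(9, 10, 11)))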

VERY GOOD explanation of the chi-squared test, hypothesis testing and the p value or p-value (link) (screenshot)
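The Pearson chi-squared statistic compares observed counts with those expected under independence (row total * column total / grand total). A sketch on a hypothetical 2 x 2 table, checked against `chisq.test()` with Yates' continuity correction switched off:

```r
observed <- matrix(c(30, 10,
                     20, 40),
                   nrow = 2, byrow = TRUE)     # hypothetical counts
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
x2 <- sum((observed - expected)^2 / expected)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)   # upper-tail probability
c(statistic = x2, df = dof, p_value = p_value)

chisq.test(observed, correct = FALSE)          # same statistic and p-value
```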

28) Great explanation of PCA (principal components analysis) (link) (matlab code) (code)
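PCA finds the eigenvectors of the covariance matrix; the component variances are its eigenvalues. A sketch on hypothetical correlated data showing that `prcomp()` and `eigen(cov(X))` agree (eigenvectors only up to sign):

```r
set.seed(11)
X <- matrix(rnorm(200 * 3), ncol = 3)
X[, 2] <- X[, 1] + 0.3 * X[, 2]          # make two columns correlated

pca <- prcomp(X, center = TRUE, scale. = FALSE)
eig <- eigen(cov(X))

pca$sdev^2                 # variances of the principal components
eig$values                 # the same numbers
summary(pca)$importance["Proportion of Variance", ]
scores <- pca$x            # data projected onto the principal axes
```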

30) Teaching materials for basic statistics and machine learning from a bootcamp (code, tutorials, notes) (link)

31) Basic statistics (ANOVA, t-test, F-test, etc) (link)

Linear models, ANOVA, mixed effects, fixed effects, random effects and other basics (link) (tutorial 1) (tutorial 2)

Beautiful VERY GOOD tutorial on how most statistical tests are related to linear models (link)

Coursera course on basic statistics (link, github)
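As an example of a test being a linear model: the equal-variance two-sample t-test is `lm(y ~ group)`, with the t-test testing the group coefficient. A sketch on hypothetical data; the t values agree up to sign because the two functions take the difference in opposite directions.

```r
set.seed(21)
d_tt <- data.frame(group = rep(c("A", "B"), each = 20),
                   y = c(rnorm(20, mean = 0), rnorm(20, mean = 0.8)))

tt <- t.test(y ~ group, data = d_tt, var.equal = TRUE)
lm_fit <- summary(lm(y ~ group, data = d_tt))

tt$statistic                                   # t for mean(A) - mean(B)
lm_fit$coefficients["groupB", "t value"]       # same magnitude, opposite sign
# One-way ANOVA is the same model with more than two groups: anova(lm(y ~ group))
```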

32) Bootstrap

Simple example to do bootstrap in python and R (on bitbucket)

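R syntax for generating confidence intervals using the bootstrap

set.seed(1)
d <- data.frame(w = rnorm(100),
                x = rnorm(100),
                y = factor(sample(LETTERS[1:2], 100, replace = TRUE)),
                z = factor(sample(LETTERS[3:4], 100, replace = TRUE)))

# fit a logistic regression (GLM) on this simulated data frame
fm2 <- glm(y ~ w + x + z, data = d, family = binomial)

# confint() on a glm gives profile-likelihood intervals, not bootstrap ones
confint(fm2)

# nonparametric bootstrap with the boot package: refit the model on resampled rows
library(boot)
coef_fun <- function(data, idx) coef(glm(y ~ w + x + z, data = data[idx, ], family = binomial))
boot_out <- boot(data = d, statistic = coef_fun, R = 999)
boot.ci(boot_out, type = "perc", index = 2)   # percentile CI for the coefficient of w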
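The mechanics of a percentile bootstrap can also be written out in base R without any package. A minimal sketch for the median of a hypothetical skewed sample: resample with replacement, recompute the statistic, and take the 2.5% and 97.5% quantiles.

```r
set.seed(31)
sample_data <- rexp(60, rate = 1)     # hypothetical skewed sample
B <- 2000
boot_medians <- replicate(B, median(sample(sample_data, replace = TRUE)))
ci <- quantile(boot_medians, probs = c(0.025, 0.975))
ci
```

The percentile interval is the simplest choice; bias-corrected (BCa) intervals are usually preferred for skewed statistics.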

33) Linear models and interaction effects (ISLR video)
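In R, `y ~ x * g` expands to `y ~ x + g + x:g`; the interaction coefficient is the difference in slope between groups. A sketch on hypothetical data where the true slope is 0.5 in group A and 2 in group B:

```r
set.seed(4)
n <- 200
x <- runif(n, 0, 10)
g <- factor(sample(c("A", "B"), n, replace = TRUE))
y <- 1 + 0.5 * x + 2 * (g == "B") + 1.5 * x * (g == "B") + rnorm(n)   # hypothetical true model
fit <- lm(y ~ x * g)
coef(fit)     # roughly 1, 0.5, 2 and 1.5; x:gB is the extra slope in group B
```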

1) Multivariate normal distribution (The Multivariate Gaussian Distribution by Chuong B. Do)
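A multivariate normal N(mu, Sigma) can be sampled from independent standard normals using the Cholesky factor of Sigma. A base-R sketch with a hypothetical mean and covariance (`MASS::mvrnorm` does the same job using an eigen-decomposition); the helper name `rmvnorm_chol` is illustrative.

```r
# If Sigma = t(R) %*% R (R = chol(Sigma)) and each row z ~ N(0, I),
# then each row of z %*% R has covariance Sigma.
rmvnorm_chol <- function(n, mu, Sigma) {
  R <- chol(Sigma)                          # upper triangular factor
  Z <- matrix(rnorm(n * length(mu)), nrow = n)
  sweep(Z %*% R, 2, mu, "+")                # each row is one draw
}

set.seed(12)
mu <- c(1, -2)
Sigma <- matrix(c(2, 0.8,
                  0.8, 1), nrow = 2)
draws <- rmvnorm_chol(10000, mu, Sigma)
colMeans(draws)                             # close to mu
cov(draws)                                  # close to Sigma
```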

2) Review of probability theory (Review of Probability Theory by Arian Maleki and Tom Do, Stanford University)

5) Excellent description of Mahalanobis distance (link, by Rick Wicklin)
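The squared Mahalanobis distance is (x - mu)' Sigma^-1 (x - mu); base R's `mahalanobis()` returns this squared value. A sketch with a hypothetical, strongly correlated covariance: two points at the same Euclidean distance from the centre get very different Mahalanobis distances.

```r
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.9,
                  0.9, 1), nrow = 2)
pts <- rbind(along = c(1, 1), across = c(1, -1))

d2_manual <- apply(pts, 1, function(p) drop(t(p - mu) %*% solve(Sigma) %*% (p - mu)))
d2_base <- mahalanobis(pts, center = mu, cov = Sigma)
sqrt(d2_base)   # "across" the correlation is far more unusual than "along" it
```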

6) Deming regression
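Deming regression allows for measurement error in both x and y. The closed-form slope below assumes a known ratio `delta` of the y error variance to the x error variance (delta = 1 is orthogonal regression); the data and the helper name `deming_fit` are hypothetical.

```r
deming_fit <- function(x, y, delta = 1) {
  sxx <- var(x); syy <- var(y); sxy <- cov(x, y)
  slope <- (syy - delta * sxx + sqrt((syy - delta * sxx)^2 + 4 * delta * sxy^2)) / (2 * sxy)
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}

set.seed(6)
truth <- runif(100, 0, 10)
x <- truth + rnorm(100, sd = 1)           # both measurements are noisy
y <- 1 + 2 * truth + rnorm(100, sd = 1)
deming_fit(x, y, delta = 1)
coef(lm(y ~ x))                            # OLS slope is attenuated toward 0
```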

8) Wilcoxon rank-sum test
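The statistic W reported by `wilcox.test(x, y)` counts the pairs (x_i, y_j) with x_i > y_j, which equals the rank sum of x minus n_x (n_x + 1) / 2. A sketch on small hypothetical samples:

```r
x <- c(1.1, 2.3, 3.5)
y <- c(0.5, 1.7, 0.9, 2.0)

w_pairs <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))   # ties count 1/2
r <- rank(c(x, y))
w_ranks <- sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2
res <- wilcox.test(x, y)
c(w_pairs = w_pairs, w_ranks = w_ranks, W = unname(res$statistic))   # all equal 10
res$p.value
```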

9) LOESS regression
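LOESS fits a weighted low-degree polynomial around each point; `span` is the fraction of points used in each local fit, so a smaller span gives a wigglier curve. A sketch with base R's `loess()` on a hypothetical noisy sine curve:

```r
set.seed(9)
x <- sort(runif(150, 0, 10))
y <- sin(x) + rnorm(150, sd = 0.3)
fit_smooth <- loess(y ~ x, span = 0.75, degree = 2)
fit_wiggly <- loess(y ~ x, span = 0.2, degree = 2)

grid <- data.frame(x = seq(min(x), max(x), length.out = 20))   # stay inside the data range
pred <- predict(fit_smooth, newdata = grid)

plot(x, y, col = "grey")
lines(x, fitted(fit_smooth), lwd = 2)
lines(x, fitted(fit_wiggly), lty = 2)
```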

10) G-test
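The G-test (likelihood-ratio test) uses G = 2 * sum(O * log(O / E)) and, like Pearson's X^2, is compared with a chi-squared distribution. A goodness-of-fit sketch with hypothetical counts and proportions; as written it assumes all observed counts are positive (a zero count would contribute 0 but gives NaN here).

```r
observed <- c(A = 30, B = 20, C = 50)
expected_p <- c(0.25, 0.25, 0.5)
expected <- sum(observed) * expected_p

G <- 2 * sum(observed * log(observed / expected))
p_G <- pchisq(G, df = length(observed) - 1, lower.tail = FALSE)
X2 <- sum((observed - expected)^2 / expected)    # Pearson statistic, for comparison
c(G = G, X2 = X2, p_G = p_G)
```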

11) Fitting models to biological data using linear and nonlinear regression (GraphPad book)

12) Mediation analysis (link)
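A simple linear mediation model X -> M -> Y can be estimated with the product-of-coefficients approach. A sketch on hypothetical simulated data (a = 0.5, b = 0.7, direct effect 0.3); a causal reading assumes no unmeasured confounding, and for confidence intervals one would bootstrap a * b or use a dedicated package such as mediation.

```r
set.seed(15)
n <- 2000
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.3 * X + 0.7 * M + rnorm(n)
a <- coef(lm(M ~ X))[["X"]]                # effect of X on M
fit_y <- lm(Y ~ X + M)
b <- coef(fit_y)[["M"]]                    # effect of M on Y, adjusting for X
direct <- coef(fit_y)[["X"]]
total <- coef(lm(Y ~ X))[["X"]]
c(indirect = a * b, direct = direct, total = total)   # total = direct + indirect for OLS
```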

13) Distributions

Poisson

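barplot(dpois(x = 0:20, lambda = 2.5), names.arg = 0:20)  # probability mass function P(X = k), k = 0..20

hist(rpois(n = 1000, lambda = 2.5))  # a histogram of random draws has the same shape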

14) Curated set of resources for quick study on my GitHub