Basic Statistics
1) Computing 95% confidence intervals
mean +/- 1.96 * sd / sqrt(n), where sd is the sample standard deviation and n is the number of samples
ci_alpha <- 0.05
qnorm(ci_alpha / 2)
qnorm(1 - (ci_alpha/2))
# how many standard deviations from the mean must you go to capture 95% of the scores
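A minimal sketch of the formula above applied to data. The sample `x` is simulated purely for illustration (an assumption, not data from these notes); the t-based interval from `t.test` is shown alongside because it is usually preferred for small samples.

```r
# Illustrative data only (assumed, not from the notes)
set.seed(1)
x <- rnorm(n = 50, mean = 10, sd = 2)

ci_alpha <- 0.05
z  <- qnorm(1 - ci_alpha / 2)        # about 1.96 for a 95% interval
se <- sd(x) / sqrt(length(x))        # standard error of the mean

# Normal-approximation interval: mean +/- z * se
ci_normal <- c(lower = mean(x) - z * se, upper = mean(x) + z * se)

# t-based interval (uses qt() with n - 1 degrees of freedom internally)
ci_t <- t.test(x, conf.level = 1 - ci_alpha)$conf.int

print(ci_normal)
print(ci_t)
```

The t interval comes out slightly wider than the normal one because the t quantile exceeds 1.96 for finite degrees of freedom.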
meaning of confidence intervals
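SUMMARY: if you repeat the experiment 100 times and compute a 95% interval each time, about 95 of those intervals will contain the true value of the mean.
This does not mean that the true mean lies in one particular computed interval with 95% probability: the true mean is fixed, and it is the interval that varies from experiment to experiment.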
another explanation of confidence intervals by ISLR people (Rob Tibshirani)
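The repeated-experiment reading of a confidence interval can be checked by simulation. This is a sketch under assumed settings (a normal population with a true mean of 5, 2000 repeated experiments of size 30); none of these numbers come from the notes.

```r
set.seed(42)
true_mean     <- 5      # assumed "true" value, known only because we simulate
n_experiments <- 2000
n             <- 30

# For each simulated experiment, record whether its 95% CI contains the true mean
covered <- replicate(n_experiments, {
  x  <- rnorm(n, mean = true_mean, sd = 2)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

coverage <- mean(covered)   # fraction of intervals that captured the true mean
print(coverage)
```

The fraction should land close to 0.95, which is the sense in which the procedure, not any single interval, has 95% coverage.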
5) t statistic and confidence intervals (link) (link) (meaning of confidence intervals)
6) Power calculation in R (link)
sample size
power.t.test
power.t.test(n = NULL, power = .95, sd = 5, alternative = "two.sided", sig.level = 0.001, delta = 0.1)
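`power.t.test` solves for whichever one of `n`, `power`, `delta`, `sd` or `sig.level` is left as `NULL`. As a rough check, the sketch below solves for the per-group sample size with the same settings as the call above and then feeds the rounded-up size back in to recover the power.

```r
# Solve for n (per group) with the settings used in the notes
res_n <- power.t.test(n = NULL, power = 0.95, sd = 5, delta = 0.1,
                      sig.level = 0.001, alternative = "two.sided")
n_per_group <- ceiling(res_n$n)   # round up to a whole number of samples

# Feed that n back in and solve for power instead
res_power <- power.t.test(n = n_per_group, power = NULL, sd = 5, delta = 0.1,
                          sig.level = 0.001, alternative = "two.sided")

print(n_per_group)
print(res_power$power)
```

Because the effect size (delta = 0.1) is tiny relative to sd = 5 and the significance level is strict, the required n per group is very large (over one hundred thousand).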
7) Type 1 and Type 2 errors and p value (link) (link) (link to tutorials on power calculations)
8) pvalue or p-value (link) (link)
FDR VERY GOOD (link)
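For a quick feel for FDR control, base R's `p.adjust` implements the Benjamini-Hochberg procedure. The p-values below are made up for illustration only.

```r
# Made-up p-values from 10 hypothetical tests (illustration only)
p_values <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216)

p_bh   <- p.adjust(p_values, method = "BH")          # controls the false discovery rate
p_bonf <- p.adjust(p_values, method = "bonferroni")  # controls the family-wise error rate

n_sig_bh   <- sum(p_bh < 0.05)
n_sig_bonf <- sum(p_bonf < 0.05)
print(data.frame(p_values, p_bh, p_bonf))
```

BH is less conservative than Bonferroni, so it typically declares at least as many tests significant.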
9) F1-score (link)
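The F1-score is the harmonic mean of precision and recall. A small sketch with made-up confusion-matrix counts (the numbers are assumptions for illustration):

```r
# Hypothetical counts from a binary classifier
tp <- 40   # true positives
fp <- 10   # false positives
fn <- 20   # false negatives

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

print(c(precision = precision, recall = recall, f1 = f1))
```

The same value follows from the equivalent form 2*TP / (2*TP + FP + FN); true negatives do not enter the score at all.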
10) Bias variance tradeoff
* https://www.youtube.com/watch?v=VaN1RUDuioQ&list=PLOg0ngHtcqbPTlZzRHA2ocQZqB1D_qZ5V&index=5
* http://scott.fortmann-roe.com/docs/BiasVariance.html
* VERY GOOD picture
* https://github.com/neelsoumya/basic_statistics/blob/master/bias_variance.png
12) Survival analysis (using survminer) (cheatsheet)
VERY GOOD tutorial on survival models and time to event models (link)
Hazard ratio and survival time to event models (link)
BEST explanation of hazard ratio (link) (link)
Survival analysis code (my code on bitbucket)
Using strata in survival models (link)
Weights in survival models (link)
Advanced analysis using strata (link)
Time-varying models and time varying covariates in survival models (link) (link)
13) Meta-analysis (slides) (link) (see my software page for more meta-analysis code) (tutorial) (very good tutorial) (very good code)
14) GWAS (link) (genes and probability course VERY GOOD link)
15) Bias of an estimator (link)
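A standard illustration of estimator bias is the variance estimator that divides by n instead of n - 1. The simulation below is a sketch with assumed settings (normal data, true variance 4, samples of size 5).

```r
set.seed(7)
n        <- 5
true_var <- 4   # sd = 2, assumed for the simulation

sims <- replicate(20000, {
  x <- rnorm(n, mean = 0, sd = sqrt(true_var))
  c(unbiased = var(x),                 # divides by n - 1
    biased   = var(x) * (n - 1) / n)   # divides by n
})

est <- rowMeans(sims)   # average of each estimator over many samples
print(est)
```

The divide-by-n estimator averages about (n - 1)/n times the true variance, so its bias here is roughly -0.8.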
16) Basic epidemiology and statistics like rate, incidence, risk, prevalence (link)
2 by 2 table in epidemiology (link)
Screenshots of basic epidemiology (link) in the metafor source code bitbucket
VERY GOOD resources for epidemiology (link)
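The basic measures from a 2 by 2 table can be computed directly. The counts below are invented for illustration (100 exposed, 100 unexposed).

```r
# Hypothetical cohort: rows = exposure, columns = outcome
tab <- matrix(c(30, 70,
                10, 90),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("exposed", "unexposed"),
                              outcome  = c("disease", "no_disease")))

risk_exposed   <- tab["exposed", "disease"]   / sum(tab["exposed", ])
risk_unexposed <- tab["unexposed", "disease"] / sum(tab["unexposed", ])

risk_ratio      <- risk_exposed / risk_unexposed
risk_difference <- risk_exposed - risk_unexposed
odds_ratio      <- (tab["exposed", "disease"] * tab["unexposed", "no_disease"]) /
                   (tab["exposed", "no_disease"] * tab["unexposed", "disease"])

print(c(RR = risk_ratio, RD = risk_difference, OR = odds_ratio))
```

The odds ratio is further from 1 than the risk ratio here because the outcome is not rare in the exposed group.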
17) VERY GOOD collection of screenshots for learning epidemiology and basic statistics (link) (link)
10) Propensity score matching or matched case control (link) (link)
11) Stratified sampling (link) (R code)
23) Poisson process (link)
24) ROC, AUC curve (link)
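The AUC equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, so it can be computed from ranks without any extra package. The scores and labels below are made up, and the sketch assumes no tied scores.

```r
# Hypothetical classifier scores and true labels (1 = positive, 0 = negative)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)

# Mann-Whitney form: rank sum of the positives, minus its minimum possible value
r   <- rank(scores)
auc <- (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auc)
```

An AUC of 0.5 corresponds to random ordering and 1 to perfect separation of the two classes.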
26) Chi squared statistic (link)
Chi-squared distribution (link)
Chi-squared test with an example from Hardy-Weinberg equilibrium (link)
Chi-squared contingency table (link)
in R
hist(stats::rchisq(n = 100, df = 3))
hist(stats::rchisq(n = 100, df = 30))
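# pass the counts as a table (here 2 x 3); two separate vectors would be cross-tabulated as factors
stats::chisq.test( x=rbind(c(1,2,3), c(9,10,11)) )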
VERY GOOD explanation of Chi squared and hypothesis test and p value or p-value (link) (screenshot)
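A sketch of a chi-squared goodness-of-fit test against Hardy-Weinberg proportions. The genotype counts are invented for illustration. One degree of freedom is used for the p-value because the allele frequency is estimated from the same data, whereas `chisq.test` by default reports the p-value with 2 degrees of freedom.

```r
# Hypothetical genotype counts
observed <- c(AA = 50, Aa = 30, aa = 20)
n <- sum(observed)

# Allele frequency of A estimated from the data
p <- (2 * observed[["AA"]] + observed[["Aa"]]) / (2 * n)
q <- 1 - p

# Genotype proportions expected under Hardy-Weinberg equilibrium
expected_p <- c(AA = p^2, Aa = 2 * p * q, aa = q^2)

gof  <- chisq.test(x = observed, p = expected_p)
stat <- unname(gof$statistic)

# p-value with df = 3 genotype classes - 1 - 1 estimated parameter = 1
p_hwe <- pchisq(stat, df = 1, lower.tail = FALSE)

print(stat)
print(p_hwe)
```

With these counts there are far fewer heterozygotes than expected, so the test rejects equilibrium.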
28) Great explanation of PCA (principal components analysis) (link) (matlab code) (code)
30) Teaching materials for basic statistics and machine learning from a bootcamp (code, tutorials, notes) (link)
31) Basic statistics (ANOVA, t-test, F-test, etc) (link)
Linear models, ANOVA, mixed effects, fixed effects, random effects and other basics (link) (tutorial 1) (tutorial 2)
Beautiful VERY GOOD tutorial on how most statistical tests are related to linear models (link)
Coursera course on basic statistics (link, github)
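One concrete case of "tests are linear models": an equal-variance two-sample t-test gives the same p-value as the slope test in `lm` with a two-level factor. The data below are simulated for illustration only.

```r
set.seed(11)
g <- factor(rep(c("a", "b"), each = 20))
y <- rnorm(40, mean = ifelse(g == "b", 1, 0))   # group b shifted by 1 (assumed)

# Classical two-sample t-test with pooled variance
tt <- t.test(y ~ g, var.equal = TRUE)

# Same comparison as a linear model; "gb" is the difference b - a
fit  <- lm(y ~ g)
p_lm <- summary(fit)$coefficients["gb", "Pr(>|t|)"]

print(c(t_test = tt$p.value, linear_model = p_lm))
```

The t statistics differ only in sign, since `t.test` reports a - b while the `lm` coefficient is b - a.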
32) Bootstrap
Simple example to do bootstrap in python and R (on bitbucket)
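R syntax for generating confidence intervals for a GLM (note on the bootstrap option in the comments)
d <- data.frame(w=rnorm(100),
                x=rnorm(100),
                # a binomial glm needs a factor (or 0/1) response; since R 4.0 data.frame() keeps strings as character
                y=factor(sample(LETTERS[1:2], 100, replace=TRUE)),
                z=sample(LETTERS[3:4], 100, replace=TRUE) )
# do GLM on this new data frame
fm2 <- glm(y ~ w + x + z, data=d, family=binomial)
# for a glm object confint() returns profile-likelihood intervals;
# method = 'boot' is an option of lme4's confint() for mixed models (lmer/glmer fits), so it is dropped here
confint(object = fm2)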
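A nonparametric (percentile) bootstrap for the GLM coefficients can also be written out by hand in base R. This is a minimal sketch, not the code from the bitbucket example: the number of resamples (500) and the seed are arbitrary assumptions.

```r
set.seed(123)
d <- data.frame(w = rnorm(100),
                x = rnorm(100),
                y = factor(sample(LETTERS[1:2], 100, replace = TRUE)),
                z = sample(LETTERS[3:4], 100, replace = TRUE))

n_boot <- 500   # number of bootstrap resamples (assumed)

# Refit the model on rows resampled with replacement and keep the coefficients
boot_coefs <- replicate(n_boot, {
  idx <- sample(nrow(d), replace = TRUE)
  coef(glm(y ~ w + x + z, data = d[idx, ], family = binomial))
})

# Percentile intervals: 2.5% and 97.5% quantiles of each coefficient
boot_ci <- t(apply(boot_coefs, 1, quantile, probs = c(0.025, 0.975)))
print(boot_ci)
```

Since the data are pure noise, the intervals for `w`, `x` and `zD` should mostly straddle zero.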
33) Linear models and interaction effects (ISLR video)
1) Multivariate normal distribution (The Multivariate Gaussian Distribution by Chuong B. Do)
2) Review of probability theory (Review of Probability Theory by Arian Maleki and Tom Do, Stanford University)
5) Excellent description of Mahalanobis distance (link, by Rick Wicklin)
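Base R provides `stats::mahalanobis`, which returns the squared distance. The sketch below uses simulated correlated data (an assumption for illustration) and takes the square root to get the distance itself.

```r
set.seed(5)
X <- cbind(a = rnorm(200), b = rnorm(200))
X[, "b"] <- 0.8 * X[, "a"] + 0.6 * X[, "b"]   # induce correlation between columns

center <- colMeans(X)
S      <- cov(X)

d2 <- mahalanobis(X, center = center, cov = S)   # squared Mahalanobis distances
d  <- sqrt(d2)

# Same quantity for the first row, written out as (x - mu)' S^-1 (x - mu)
v         <- X[1, ] - center
d2_manual <- as.numeric(t(v) %*% solve(S) %*% v)

print(head(d))
```

Unlike Euclidean distance, this accounts for the scale of each variable and the correlation between them.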
8) Wilcoxon rank-sum
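In R the Wilcoxon rank-sum (Mann-Whitney) test is `wilcox.test` with two samples. The values below are made up so that the two groups do not overlap at all.

```r
# Hypothetical measurements from two independent groups
x <- c(1.1, 2.3, 3.8, 4.0, 5.6)
y <- c(6.2, 7.1, 8.4, 9.0, 10.5)

res <- wilcox.test(x, y)   # two-sided, exact p-value (small samples, no ties)

print(res$statistic)   # W = number of (x, y) pairs with x > y
print(res$p.value)
```

With complete separation W is 0 and the exact two-sided p-value is 2 / choose(10, 5), the smallest value possible for two groups of five.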
10) G-test
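Base R has no built-in G-test, so the sketch below computes the goodness-of-fit statistic G = 2 * sum(O * log(O / E)) by hand. The observed counts and the equal expected proportions are assumptions for illustration.

```r
# Hypothetical observed counts and the proportions expected under the null
observed   <- c(30, 50, 20)
expected_p <- c(1, 1, 1) / 3
expected   <- sum(observed) * expected_p

# Likelihood-ratio (G) statistic; assumes all observed counts are > 0
G  <- 2 * sum(observed * log(observed / expected))
df <- length(observed) - 1

p_value <- pchisq(G, df = df, lower.tail = FALSE)

print(c(G = G, p_value = p_value))
```

G is compared against the same chi-squared distribution as Pearson's statistic, and the two are usually close when counts are large.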
11) Fitting models to biological data using linear and nonlinear regression (GraphPad book)
12) Mediation analysis (link)
13) Distributions
Poisson
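# plot the probability mass function itself (a histogram of the dpois values would bin the probabilities, not the counts)
barplot(dpois(x = seq(0, 20, by=1), lambda = 2.5), names.arg = seq(0, 20, by=1))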
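To connect the density function `dpois` with random draws from `rpois`, the sketch below compares the theoretical probabilities with empirical frequencies. The number of draws and the seed are arbitrary assumptions.

```r
lambda <- 2.5
k      <- 0:20

pmf <- dpois(k, lambda = lambda)   # theoretical P(X = k)

set.seed(9)
draws <- rpois(n = 10000, lambda = lambda)

# Empirical frequency of each count, on the same support as pmf
emp <- as.numeric(table(factor(draws, levels = k))) / length(draws)

print(round(cbind(k, pmf, emp)[1:8, ], 4))
print(c(mean = mean(draws), var = var(draws)))   # both should be near lambda
```

For a Poisson distribution the mean and the variance are both equal to lambda, which the simulated draws should roughly reproduce.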
14) Curated set of resources for quick study on my github