Now registering: SC INBRE 2026 Biostatistics Summer Courses — 3 one-week courses. Free. Limited space available.
Choose from any of three one-week courses or take all three!
Four hours/day – 9 am to 11 am (lecture and discussion) and 1 to 3 pm (R lab and application).
Attend virtually or in person at USC Columbia, Discovery Building Computing Lab, 4th floor, 915 Greene St., Suite 303B, Columbia 29208. Paid parking is available at the Innovista Parking Garage, 821 Park St., Columbia.
Please note that there is a one-hour break for lunch. For those attending in person, we will provide water, tea, coffee, and some pastries in a conference room beside the lab, but we will not provide lunch. Plan to bring your own, or choose from the many eateries within a 5-minute walk of the location. No food is allowed in the Computer Lab.
Course Presenter: Dr. Alexander McLain, University of South Carolina Arnold School of Public Health, Professor of Epidemiology and Biostatistics
Week 1 (June 1-5): Foundations of Data Science in R
Week 2 (June 8-12): Statistical Modeling
Week 3 (June 15-19): Bioinformatics & High-Dimensional Data
9 to 11 am: Lecture and discussion
1 to 3 pm: R Lab and application
Register by May 22
Space is limited: 25 in-person and 25 virtual seats, with priority given to SC INBRE participants. A waitlist will be formed once capacity is reached.
Week 1 (June 1-5): Foundations of Data Science in R
Target: All participants working with biological data. No prior R experience is assumed.
Monday, June 1, Day 1
MORNING
Course Overview & Orientation to Data Science
Why data science in biology? Course logistics, expectations, reproducibility principles.
AFTERNOON
Getting Started with R and RStudio
Tuesday, June 2, Day 2
MORNING
R Programming Fundamentals
Data types, vectors, matrices, lists, data frames; indexing; control flow (if/else, for loops); writing functions.
AFTERNOON
Working with Data in R
Importing CSV/Excel files; inspecting data (str, summary, head); basic data manipulation with base R.
Wednesday, June 3, Day 3
MORNING
Data Wrangling with the tidyverse
Tidy data principles; dplyr verbs (filter, select, mutate, group_by, summarize); pipes (%>%); tidyr (pivot_longer, pivot_wider).
AFTERNOON
tidyverse Lab
Hands-on cleaning and reshaping a messy biological dataset; joining tables; handling missing data.
Thursday, June 4, Day 4
MORNING
Data Visualization with ggplot2
Grammar of graphics; geom types (point, bar, box, line, histogram, density); faceting; themes; color palettes for biological data.
AFTERNOON
ggplot2 Lab
Recreating publication-quality figures from genomics/ecology datasets; customizing axes, legends, and themes.
Friday, June 5, Day 5
MORNING
Probability, Distributions, and Statistical Inference
Random variables; common distributions (Normal, Binomial, Poisson); Central Limit Theorem; p-values, confidence intervals, and their correct interpretation.
AFTERNOON
Simulation & Inference Lab
Simulating data in R; visualizing distributions; one- and two-sample t-tests; chi-square tests; Wilcoxon rank-sum test; interpreting output.
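To give a flavor of the kind of exercise this lab describes, here is a minimal base R sketch. It is not taken from the course materials: the group means, sample sizes, and seed are illustrative assumptions.

```r
# Hypothetical example: simulate two groups and compare them.
set.seed(2026)                            # arbitrary seed, for reproducibility only
control <- rnorm(30, mean = 10, sd = 2)   # assumed control-group distribution
treated <- rnorm(30, mean = 12, sd = 2)   # assumed treated-group distribution

# Welch two-sample t-test (R's default; does not assume equal variances)
tt <- t.test(treated, control)

# Wilcoxon rank-sum test as a nonparametric alternative
wt <- wilcox.test(treated, control)

print(tt$p.value)    # p-value for the difference in means
print(tt$conf.int)   # 95% confidence interval for the difference in means
print(wt$p.value)    # p-value from the rank-sum test
```

Changing the seed or the assumed means and rerunning shows how much the p-value and interval vary from sample to sample.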
Week 2 (June 8-12): Statistical Modeling
Target: Applied regression methods for quantitative, binary, time-to-event, and clustered outcomes.
Monday, June 8, Day 6
MORNING
Simple and Multiple Linear Regression
Model formulation; OLS estimation; interpretation of coefficients; assumptions; R² and model fit; introduction to confounding.
AFTERNOON
Linear Regression Lab
Fitting lm() models; diagnostic plots (residuals, Q-Q, leverage); testing assumptions; applying to a quantitative biological trait.
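As a rough illustration of the lab topics above, the following sketch fits a linear model and draws the standard diagnostic plots. It uses R's built-in iris data as a stand-in; the dataset and the choice of predictors are assumptions, not the course's actual example.

```r
# Hypothetical example: model a quantitative trait (petal length)
# with the built-in iris data; the course dataset may differ.
fit <- lm(Petal.Length ~ Sepal.Length + Species, data = iris)

# Coefficient estimates, standard errors, and R-squared
print(summary(fit))

# Diagnostic plots: residuals vs fitted (1), normal Q-Q (2),
# residuals vs leverage (5)
par(mfrow = c(1, 3))
plot(fit, which = c(1, 2, 5))
```

Curvature or a funnel shape in the residuals-vs-fitted panel would suggest that the linearity or constant-variance assumption deserves a closer look.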
Tuesday, June 9, Day 7
MORNING
Model Selection and Variable Importance
Overfitting and bias-variance tradeoff; AIC/BIC; stepwise selection (and its limitations); introduction to cross-validation.
AFTERNOON
Model Selection Lab
Comparing nested models; using step() and AIC; k-fold cross-validation with caret or rsample; interpreting results critically.
Wednesday, June 10, Day 8
MORNING
Logistic Regression and Binary Outcomes
Generalized linear models; logit link; odds ratios and their interpretation; model diagnostics; introduction to classification metrics.
AFTERNOON
Logistic Regression Lab
Fitting glm() for case-control data; computing and plotting ROC curves (pROC); evaluating sensitivity/specificity; applying to a disease outcome dataset.
Thursday, June 11, Day 9
MORNING
Survival Analysis
Censoring and time-to-event data; Kaplan-Meier estimator; log-rank test; Cox proportional hazards model; checking the PH assumption.
AFTERNOON
Survival Analysis Lab
KM curves with survminer; log-rank tests; fitting coxph(); visualizing hazard ratios; applying to a clinical or ecological dataset.
Friday, June 12, Day 10
MORNING
Mixed Models and Clustered/Repeated Data
Why standard regression fails with clustered data; random intercepts and slopes; fixed vs. random effects; model interpretation; ICC.
AFTERNOON
Mixed Models Lab
Fitting lme4::lmer() and glmer(); random effect structures; model comparison; applying to longitudinal biological data.
Week 3 (June 15-19): Bioinformatics & High-Dimensional Data
Target: Genomics workflows, dimensionality reduction, penalized regression, and reproducible reporting.
Monday, June 15, Day 11
MORNING
Genomics Data: Structure, Formats, and Public Databases
FASTQ, BAM, VCF, count matrix formats; overview of NCBI/GEO/dbGaP; accessing public datasets; data provenance and metadata.
AFTERNOON
Accessing Public Genomics Data
Using GEOquery and Biobase to retrieve expression datasets; exploring metadata; quality assessment with basic EDA.
Tuesday, June 16, Day 12
MORNING
Differential Expression Analysis
RNA-seq workflow overview (alignment → counts → DE); negative binomial models; DESeq2/edgeR framework; normalization strategies.
AFTERNOON
DESeq2 Lab
Full DESeq2 workflow: importing count data, size factor normalization, dispersion estimation, Wald/LRT tests, results tables.
Wednesday, June 17, Day 13
MORNING
Multiple Testing, FDR, and Visualization of High-Dimensional Results
Family-wise error rate vs. FDR; Bonferroni, Benjamini-Hochberg; q-values; volcano plots; MA plots; heatmaps.
AFTERNOON
Multiple Testing & Visualization Lab
Applying p.adjust(); generating volcano and MA plots with ggplot2; hierarchical clustering and heatmaps with pheatmap/ComplexHeatmap.
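For readers unfamiliar with p.adjust(), a small sketch of the Bonferroni and Benjamini-Hochberg adjustments follows. The five raw p-values are made up for illustration and do not come from any course dataset.

```r
# Hypothetical raw p-values from five tests (made up for illustration)
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.60)

# Bonferroni controls the family-wise error rate
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg controls the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")

# Side-by-side comparison of raw and adjusted values
print(data.frame(p_raw, p_bonf, p_bh))
```

The Benjamini-Hochberg adjusted values are never larger than the Bonferroni ones, which is why FDR control typically retains more findings in high-dimensional settings.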
Thursday, June 18, Day 14
MORNING
Dimensionality Reduction and Penalized Regression
PCA and its geometric interpretation; scree plots; biplots; introduction to LASSO/ridge/elastic net; coordinate descent; tuning λ.
AFTERNOON
PCA and glmnet Lab
prcomp() and factoextra for PCA; fitting penalized regression with glmnet; cross-validated λ selection; coefficient path plots; interpreting sparse solutions.
Friday, June 19, Day 15
MORNING
Reproducible Research and Course Capstone
R Markdown / Quarto for reproducible reporting; literate programming; project organization best practices; version control concepts.
AFTERNOON
Capstone Lab & Presentations
Students produce a short reproducible analysis report (R Markdown/Quarto) integrating skills from the course; brief group presentations and discussion.