Advanced Regression Methods for Population Health

Goals: This course provides an overview of modern statistical approaches to longitudinal and correlated data. It seeks to bridge the gap between statistical theory and real-world application by combining methodological lectures with case studies that use diverse types of data arising in health sciences research. Topics that commonly arise with observational data, such as missing data and weighting techniques, will also be discussed. The main emphasis is on the practical aspects of clustered-data analysis. After taking the course, students will be able to:

      • Extend their knowledge of regression analysis beyond ordinary linear models
      • Understand the features of correlated data and their implications for drawing inferences
      • Construct proper linear and generalized linear models for longitudinal and clustered data
      • Understand the assumptions needed for estimation and inference
      • Implement inference procedures in statistical packages such as SAS and R to solve real-world problems
      • Use diagnostic tools to assess model fit
      • Interpret and present the analytic results to answer substantive questions

Prerequisites: Students are expected to have basic knowledge of statistical concepts such as random variables, expectation, and variance, and to have taken a course in ordinary linear models. Prior exposure to matrix algebra and experience with statistical packages such as SAS and R will be helpful but are not required.

Time and Location: MW 2:30–3:45 pm, HSLC 1220.

Course material: Course material is adapted from Fitzmaurice, G. M., Laird, N. M., and Ware, J. H. (2012), Applied Longitudinal Analysis (New York: Wiley), and from the content of its companion website.

      • Lecture 1. Introduction [slides]
      • Lecture 2. Linear Regression, Maximum Likelihood, and ANOVA [slides; SAS code]
      • Lecture 3. Descriptive and Graphical Analysis Using SAS [slides; SAS code]
      • Lecture 4. Single-Group Analysis and General Linear Model [slides; SAS code]
      • Lecture 5. Estimation and Inference in General Linear Models [slides]
      • Lecture 6. Analysis of Response Profiles in SAS [slides; SAS code]
      • Lecture 7. More on Analysis of Response Profiles [slides; SAS code]
      • Lecture 8. Modelling the Mean: Parametric Curves [slides; SAS code]
      • Lecture 9. Modelling the Covariance [slides; SAS code]
      • Lecture 10. Linear Mixed Effects Models [slides; SAS code]
      • Lecture 11. Prediction in Mixed Effects Models [slides; SAS code]
      • Lecture 12. Design of Longitudinal Studies [slides]
      • Lecture 13. Residual Analysis and Diagnostics [slides; SAS code]
      • Lecture 14. Generalized Linear Models for Longitudinal Data [slides; SAS code]
      • Lecture 15. Marginal Models: Generalized Estimating Equations [slides; SAS code]
      • Lecture 16. Case Studies of Marginal Models [slides; SAS code]
      • Lecture 17. Generalized Linear Mixed Effects Models [slides; SAS code]
      • Lecture 18. Contrasting GLMM and GEE [slides; SAS code]
      • Lecture 19. Missing Data and Multiple Imputation [slides; SAS code]
      • Lecture 20. Inverse Probability Weighting for Missing Data [slides; SAS code]
      • Lecture 21. Smoothing Longitudinal Data: Semiparametric Regression Models [slides; SAS code]
      • Lecture 22. Multilevel Models [slides; SAS code]
      • Lecture 23. Special Topic: Repeated-Measures Designs [slides; SAS code]
      • Lecture 24. Review and Prospects [slides]


Datasets: fev1_baseline.txt (baseline data from the Six Cities Study); tlc-data.txt (data from the TLC trial)

Student feedback [Spring 2019]