This is the course webpage for the course 'Statistical Applications with R', offered in PGP Term IV, AY 2022-23.
Advanced R, by Hadley Wickham.
R for Data Science, by Hadley Wickham and Garrett Grolemund.
An Introduction to Statistical Learning, with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.
Data Visualization: A Practical Introduction, by Kieran Healy.
Welcome to SAR 2022! [html version]
Sessions 1-2: Introduction to R; handling data types and data structures in R [html version]
Session 3: Data Visualization using ggplot2 [html]
Session 5: Functions and conditional execution [html]
Session 6: Exploratory Data Analysis in R [html] [Datasets for EDA practice]
Session 7: Monte Carlo simulations [slides] [html]; SAR 2020 Midterm: question paper, solutions; NFHS dataset
Session 8: Introduction to R Markdown [.Rmd file examples]; linear regression models: simple linear regression, regression diagnostics [html] [Advertising dataset]
Sessions 9-10: Multiple linear regression [html] [Credit dataset]
Session 11: Variable selection methods: best subset, forward stepwise and backward stepwise selection [html]
Variable selection exercises: ISLR Chapter 6, Problems 1, 8, 10, 11 (subset selection methods only)
Session 12: Shrinkage methods in regression: ridge regression, lasso [html]
Shrinkage methods exercises: ISLR Chapter 6, Problems 9 (a)-(d) and (g), 11 (ridge regression and lasso parts only)
Sessions 13-14: Classification: logistic regression [html]
Logistic regression exercises: ISLR Chapter 4, Problems 10, 11, 13 (logistic regression parts only)
Session 15: Classification: LDA, QDA [html]
Classification exercises: ISLR Chapter 4, Problems 10, 11, 13 (LDA and QDA parts only)
Session 16: Classification: kNN [html]; the class imbalance problem [html] [SMOTE paper]
Sessions 17-18: Principal component analysis [html] [Market Basket Analysis dataset]
Session 19: k-means clustering [html]; clustering with PCA [html]
Session 20: Hierarchical clustering [html]; practice problems [SAR 2020 Endterm] [Solutions]
A vital part of this course is the group project component.
Students must form groups of 7-8 members and share their group details via Google Sheets by July 1, 2022.
Students auditing the course are also encouraged to do group projects, but they should form separate groups rather than join groups of students enrolled in the course.
Students are free to choose any statistical applications project to work on.
Each group must submit a project proposal outlining the problem statement, goals, methods, data sources, and any other relevant references. I shall review all proposals and share my feedback. The last date to submit project proposals is July 25, 2022.
Groups must submit their final project reports, along with the related R code, via a Google Form (to be shared in due course) no later than the last day of the current term. Projects will be evaluated on their relevance, rigour, and approach. If necessary, I shall meet groups individually for a group viva on their work.
Last date for final project submission: August 31, 2022.