(Old Version)
STAT 3005
Nonparametric Statistics (2022-23 Fall)
Class Information
Class time: W 0930-1215
Location: MMW LT2
Outline: <Download here>
Instructor
Name: Kin Wai CHAN
Email: kinwaichan@cuhk.edu.hk
Office: LSB 115
Tel: 3943 7923
Office hour:
(i) I have an open door policy. Feel free to drop by anytime and ask me questions.
(ii) Because of the pandemic, you may make an appointment with me for a Zoom meeting.
Teaching Assistants
Email: kaipanchu@link.cuhk.edu.hk
Office: LSB G32
Tel: 3943 8535
Xu (Lexi) LIU
Email: lexilxu@link.cuhk.edu.hk
Office: LSB 143
Tel: 3943 1747
Description
This course introduces a wide variety of nonparametric techniques for performing statistical inference and prediction, emphasizing both conceptual foundations and practical implementation. Basic theoretical justification is also provided. The content covers three broad themes: (i) rank-type and order-type methods for handling location, dispersion, correlation, distribution and regression problems, (ii) resampling-type procedures for testing and assessing precision, and (iii) smoothing-type techniques for estimation and prediction. Topics include the Wilcoxon signed-rank test, Mann-Whitney rank sum test, Spearman’s rho, Kendall’s tau, Kruskal-Wallis test, Kolmogorov-Smirnov test, bootstrapping, jackknife, subsampling, permutation tests, kernel methods, k-nearest neighbours, tree-based methods, classification, etc.
Note: There are no prerequisites, but knowledge of STAT 2001, 2005 and 2006 is strongly recommended.
Textbooks
A self-contained set of lecture notes is the main reference. Complementary textbooks include
(Major) Bonnini, S., Corain, L., Marozzi, M., and Salmaso, L. (2014). Nonparametric hypothesis testing: rank and permutation methods with applications in R. Wiley.
(Major) Wasserman, L. (2006). All of nonparametric statistics. Springer.
(Minor) Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.
(Minor) James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer.
Learning outcomes
Upon finishing the course, students are expected to
appreciate the beauty of nonparametric methods;
apply a wide variety of nonparametric techniques to perform inference, prediction and learning tasks;
understand the pros and cons of parametric and nonparametric methods;
master the skills in deriving basic theoretical properties of nonparametric methods;
use computer programs to perform nonparametric statistical analysis for real-life problems.
Assessment and Grading
There are three main assessment components, plus a bonus component.
a (out of 100) is the average score of approximately eight assignments with the lowest two scores dropped;
m (out of 100) is the score of the mid-term project; and
f (out of 100) is the score of the final project.
b (out of 2) is the bonus score, which will be given to students who actively participate in class.
The total score t (out of 100) is given by
t = min{100, 0.3a + 0.2max(m,f) + 0.5f + b}
If min(t, f) < 30, the final letter grade will be handled on a case-by-case basis. Otherwise, your letter grade will be in the A range if t ≥ 85, at least in the B range if t ≥ 65, and at least in the C range if t ≥ 55.
Important note: For the most up-to-date information, please always refer to the course outline announced by the course instructor on Blackboard, which shall prevail over the above information if there is any discrepancy.
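As a rough illustration of the grading rule above, the following Python sketch computes t and the corresponding grade band. The function names are hypothetical, and the reading of "lowest two scores dropped" as a plain average of the remaining assignment scores is an assumption. The course outline on Blackboard remains the authoritative source.

```python
def assignment_average(scores):
    """Average of the assignment scores after dropping the two lowest.

    Assumption: 'a' is a plain mean of the remaining scores.
    """
    if len(scores) <= 2:
        raise ValueError("need more than two assignment scores")
    kept = sorted(scores)[2:]  # drop the two lowest scores
    return sum(kept) / len(kept)


def total_score(a, m, f, b=0.0):
    """t = min{100, 0.3a + 0.2max(m, f) + 0.5f + b}."""
    return min(100.0, 0.3 * a + 0.2 * max(m, f) + 0.5 * f + b)


def grade_band(t, f):
    """Map (t, f) to the grade band stated in the text."""
    if min(t, f) < 30:
        return "case-by-case"
    if t >= 85:
        return "A range"
    if t >= 65:
        return "at least B range"
    if t >= 55:
        return "at least C range"
    return "not specified above"  # the text gives no rule below 55


# Hypothetical student: eight assignments, mid-term 60, final 70, bonus 1.
a = assignment_average([50, 60, 70, 80, 90, 100, 100, 100])
t = total_score(a, m=60, f=70, b=1)
print(round(t, 2), grade_band(t, 70))
```

Because the formula uses max(m, f), a final project score higher than the mid-term score replaces the mid-term score in the 20% component, so a weak mid-term can be recovered by a stronger final project.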
Syllabus
Part I: Philosophy and Foundation
Introduction: history, philosophy, examples.
Statistical foundation: basic testing and estimation, statistical limiting theorems.
Part II: Rank-type and order-type methods
Location and scale problems: sign test, signed-rank test, rank sum test, Ansari–Bradley test.
Correlation problem: Spearman’s ρ, Kendall’s τ, Bergsma–Dassios’s correlation.
Distribution problem: Kolmogorov–Smirnov test, Cramér–von Mises test, Anderson–Darling test.
Part III: Resampling-type procedures
Permutation tests: ideas of randomization, examples of permutation tests.
Bootstrap and subsampling: different bootstrapping methods, jackknife, subsampling.
Part IV: Smoothing-type estimation and learning techniques
Density estimation: histogram, kernel method, bandwidth selection.
Nonparametric regression: Nadaraya–Watson kernel estimator, local polynomial estimator.
Other topics: (a) classification, (b) Bayesian nonparametrics, (c) rank-type regression, (d) k-nearest neighbours, ...
Lecture Notes
All rights reserved. Do not distribute without permission from the author.
Front matter
Part I: Philosophy and Foundation
Part II: Rank-type and order-type methods
Chapter 3: Location and scale problems
Chapter 4: Correlation problem
Chapter 5: Distribution problem
Part III: Resampling-type procedures
Part IV: Smoothing-type estimation and learning techniques
Appendices
Appendix A: Basic Mathematics
Appendix B: Basic Probability
Appendix C: Basic Statistics
Appendix D: Basic Programming in R (for students who want to review; read Lectures 2 and 3 in RMSC 1101)
Appendix E: R code used throughout the course (this folder will be updated from time to time)
P.S.: Not all materials in the appendices are directly useful for this course. I will tell you which parts are useful when we need them.