STAT 3005
Nonparametric Statistics (2023-24 Fall)
Class Information
Class time: W 0930-1215
Location: YC Liang Hall 104
Outline: see Blackboard
Instructor
Name: Kin Wai CHAN
Email: kinwaichan@cuhk.edu.hk
Office: LSB 115
Tel: 3943 7923
Office hour:
(i) I have an open door policy. Feel free to drop by anytime and ask me questions.
(ii) Because of the pandemic, you may also make an appointment with me for a Zoom meeting.
Teaching Assistants
Cheuk Hin (Andy) CHENG
Email: andychengcheukhin@link.cuhk.edu.hk
Office: LSB G32
Tel: 3943 8535
Wing Tung (Toto) KEUNG
Email: wtkeung@link.cuhk.edu.hk
Office: LSB 123
Tel: 3943 8522
Description
This course introduces a wide variety of nonparametric techniques for performing statistical inference and prediction, emphasizing both conceptual foundations and practical implementation. Basic theoretical justification is also provided. The content covers three broad themes: (i) rank-type and order-type methods for handling location, dispersion, correlation, distribution and regression problems, (ii) resampling-type procedures for testing and assessing precision, and (iii) smoothing-type techniques for estimation and prediction. Topics include the Wilcoxon signed-rank test, the Mann-Whitney rank sum test, Spearman’s rho, Kendall’s tau, the Kruskal-Wallis test, the Kolmogorov-Smirnov test, bootstrapping, the jackknife, subsampling, permutation tests, kernel methods, k-nearest neighbours, tree-based methods, and classification.
Note: There is no prerequisite, but knowledge of STAT 2001, 2005 and 2006 is strongly recommended.
Textbooks
A self-contained set of lecture notes is the main reference. Complementary textbooks include
(Major) Bonnini, S., Corain, L., Marozzi, M., and Salmaso, L. (2014). Nonparametric hypothesis testing: rank and permutation methods with applications in R. Wiley.
(Major) Wasserman, L. (2006). All of nonparametric statistics. Springer.
(Minor) Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.
(Minor) James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer.
Learning outcomes
Upon finishing the course, students are expected to
appreciate the beauty of nonparametric methods;
apply a wide variety of nonparametric techniques to perform inference, prediction and learning tasks;
understand the pros and cons of parametric and nonparametric methods;
master the skills in deriving basic theoretical properties of nonparametric methods;
use computer programs to perform nonparametric statistical analysis for real-life problems.
Assessment and Grading
There are three main assessment components, plus a bonus component.
a (out of 100) is the average score of approximately eight assignments with the lowest two scores dropped;
m (out of 100) is the score of the mid-term project; and
f (out of 100) is the score of the final project.
b (out of 2) is the bonus score, which is given to students who actively participate in class.
The total score t (out of 100) is given by
t = min{100, 0.3a + 0.2max(m,f) + 0.5f + b}
If min(t, f) < 30, the final letter grade will be handled on a case-by-case basis. Otherwise, your letter grade will be in the A range if t ≥ 85, at least in the B range if t ≥ 65, and at least in the C range if t ≥ 55.
Important note: For the most up-to-date information, please always refer to the course outline announced by the course instructor on Blackboard, which shall prevail over the above information if there is any discrepancy.
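The scoring rule above can be checked with a short calculation. The following is a minimal sketch only: the course itself uses R, Python is used here purely for illustration, and the function names and example scores are hypothetical. The outline does not state what happens when t < 55, so the sketch reports that case as unspecified.

```python
def assignment_average(scores, n_dropped=2):
    """Average assignment score after dropping the lowest n_dropped scores."""
    kept = sorted(scores)[n_dropped:]
    return sum(kept) / len(kept)


def total_score(a, m, f, b=0.0):
    """t = min{100, 0.3a + 0.2max(m, f) + 0.5f + b}."""
    return min(100.0, 0.3 * a + 0.2 * max(m, f) + 0.5 * f + b)


def grade_band(t, f):
    """Map (t, f) to the bands stated in the outline."""
    if min(t, f) < 30:
        return "case-by-case"
    if t >= 85:
        return "A range"
    if t >= 65:
        return "at least B range"
    if t >= 55:
        return "at least C range"
    return "not specified in the outline"  # assumption: no rule is given for t < 55


# Hypothetical student: eight assignment scores, mid-term 70, final 80, bonus 1.
a = assignment_average([60, 70, 80, 90, 100, 85, 75, 95])
t = total_score(a, m=70, f=80, b=1)
print(f"a = {a:.2f}, t = {t:.2f}, band = {grade_band(t, 80)}")
```

Because of the max(m, f) term, a mid-term score lower than the final score is effectively replaced by the final score, so the final project then carries 70% of the weight.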
Syllabus
Part I: Philosophy and Foundation
Introduction: history, philosophy, examples.
Statistical foundation: basic testing and estimation, statistical limiting theorems.
Part II: Rank-type and order-type methods
Location and scale problems: sign test, signed-rank test, rank sum test, Ansari–Bradley test.
Correlation problem: Spearman’s ρ, Kendall’s τ, Bergsma–Dassios’s correlation.
Distribution problem: Kolmogorov–Smirnov test, Cramér–von Mises test, Anderson–Darling test.
Part III: Resampling-type procedure
Permutation tests: ideas of randomization, examples of permutation tests.
Bootstrap and subsampling: different bootstrapping methods, jackknife, subsampling.
Part IV: Smoothing-type estimation and learning techniques
Density estimation: histogram, kernel method, bandwidth selection.
Nonparametric regression: Nadaraya–Watson kernel estimator, local polynomial estimator.
Other topics: (a) classification, (b) Bayesian nonparametrics, (c) rank-type regression, (d) k-nearest neighbour, ...
Lecture Notes
All rights reserved. Do not distribute without permission from the author.
Front matter
Part I: Philosophy and Foundation
Part II: Rank-type and order-type methods
Chapter 3: Location and scale problems
Chapter 4: Correlation problem
Chapter 5: Distribution problem
Part III: Resampling-type procedures
Part IV: Smoothing-type estimation and learning techniques
Appendices
Appendix A: Basic Mathematics
Appendix B: Basic probability
Appendix C: Basic Statistics
Appendix D: Basic programming in R --- for students who want to review; read Lectures 2 and 3 in RMSC 1101
Appendix E: R code used throughout the course (this folder will be updated from time to time)
P.S.: Not all materials in the appendices are directly useful for this course. I will tell you which parts are useful when we need them.
Assignments
Assignment 1: Ideas of nonparametric methods & simulation about ranks --- Due: 29 Sep (Fri) @ 1800
Assignment 2: Theory, simulation and application of rank-type tests --- Due: 9 Oct (Mon) @ 1800
Assignment 3: Rank-type correlations and real-data application --- Due: 24 Oct (Tue) @ 1800
Assignment 4: Distribution test and image-data analysis --- Due: 17 Nov (Fri) @ 1800
Assignment 5: Permutation test and causal inference --- Due: 24 Nov (Fri) @ 1800
Assignment 6: Modified bootstrap procedures for multivariate data --- Due: 1 Dec (Fri) @ 1800
Assignment 7: KDE-based estimation and mixture distribution --- Due: 5 Dec (Tue) @ 1800
In-class notes
*** In-class notes and recordings (if any) will be uploaded within one week after the lecture ***
Download URL: in-class notes & recordings
Lecture 1 (6 Sep) --- Recordings (1: Introduction >>> 2: review and rank theory) N.B.: I am sorry that only the soundtracks are available.
Lecture 2 (13 Sep) --- Recordings (1: Example 2.5 >>> 2: Motivation of rank-type test)
Lecture 3 (20 Sep) --- Recordings (1: five rank-type tests >>> 2: theory of rank-type tests >>> 3: Example 3.5 signed rank test)
Lecture 4 (27 Sep) --- Recordings: (1: revision of rank-type tests and theory >>> 2: Examples 3.8, 3.13, 3.15)
Lecture 5 (4 Oct) --- Recordings: (1: Pearson's correlation >>> 2: Spearman's correlation >>> 3: Kendall's correlation and Bergsma–Dassios’s correlation)
Lecture 6 (11 Oct) --- Recordings: (1: Review and BD correlation >>> 2: Chatterjee correlation)
Lecture 7 (18 Oct) --- Recording: (1: Distribution problem)
Lecture 8 (25 Oct) --- Recordings: (1: Properties and computation of KS test >>> 2: two-sample distribution test and Lilliefors normality test >>> 3: Coding exercise for distribution test)
Lecture 9 (1 Nov) --- Recordings: (1: three types of permutation >>> 2: theory of permutation >>> 3: coding exercise >>> 4: Causal inference)
Lecture 10 (8 Nov) --- Recordings (1: Causal inference >>> 2: Introduction of Bootstrap >>> 3: Why Jackknife works? >>> 4: Why Jackknife doesn't work?) N.B.: I am sorry that no soundtracks are available.
Lecture 11 (16 Nov) --- Recordings (1: pBoot and spBoot >>> 2: validity of pBoot and spBoot >>> 3: coding example >>> 4: histogram)
Lecture 12 (22 Nov) --- Recordings: (1: oh-notation >>> 2: motivation of KDE >>> 3: theory of KDE) N.B.: parallel computing slides
Lecture 13 (29 Nov) --- Recordings: (1: bandwidth selection >>> 2: introduction to nonparametric regression >>> 3: local linear regression estimator)
Mid-term project
Start time: 27 October (Friday) @ 7:00 pm
End time: 29 October (Sunday) @ 7:00 pm
Duration: 48 hours
Scope: Chapters 1--4
Instructions: The detailed instructions are stated on the first page of the question paper. Some highlights are listed below:
Read the instructions carefully before doing the exam.
Complete the project by yourself without discussion and consultation.
Use any official course materials if you wish.
Submission
Sign the Honor Code, and attach it as a cover of your submitted file.
Submit to Blackboard.
You may submit your answers as many times as you wish; however, only the last submission will be graded.
MOCK mid-term project
REAL mid-term project
Real project (with suggested solutions and students' answers; note that they may not be error-free)
Final project
Start time: 8 December (Friday) @ 10:00 am
End time: 11 December (Monday) @ 10:00 am
Duration: 72 hours
Scope: Chapters 1--9
Instructions: The detailed instructions are stated on the first page of the question paper. Some highlights are listed below:
Read the instructions carefully before doing the exam.
Complete the project by yourself without discussion and consultation.
Use any official course materials if you wish.
Submission
Sign the Honor Code, and attach it as a cover of your submitted file.
Submit to Blackboard.
You may submit your answers as many times as you wish; however, only the last submission will be graded.
MOCK final project
Mock 3 (2021 Fall F) *** Most relevant ***
Mock 4 (2022 Fall F) *** Most relevant ***
REAL final project
TBA