Class time: W 0930-1215
Location: YC Liang Hall 104
Outline: 2024Fall_S3005_outline.pdf
Password: see Blackboard
Name: Kin Wai CHAN
Email: kinwaichan@cuhk.edu.hk
Office: LSB 115
Tel: 3943 7923
Office hour:
I have an open-door policy. Feel free to drop by anytime and ask me questions.
Cheuk Hin (Andy) CHENG
Email: andychengcheukhin@link.cuhk.edu.hk
Office: LSB G32
Tel: 3943 8535
Yi Ho (Henry) NGAN
Email: yihongan@link.cuhk.edu.hk
Office: LSB G30
Tel: 3943 8534
This course introduces a wide variety of nonparametric techniques for performing statistical inference and prediction, emphasizing both conceptual foundations and practical implementation. Basic theoretical justification is also provided. The content covers three broad themes: (i) rank-type and order-type methods for handling location, dispersion, correlation, distribution and regression problems, (ii) resampling-type procedures for testing and assessing precision, and (iii) smoothing-type techniques for estimation and prediction. Topics include Wilcoxon signed-rank test, Mann-Whitney rank sum test, Spearman’s rho, Kendall’s tau, Kruskal-Wallis test, Kolmogorov-Smirnov test, bootstrapping, Jackknife, subsampling, permutation tests, kernel method, k-nearest neighbour, tree-based method, classification, etc.
Note: There is no prerequisite, but knowledge of Stat 2001, 2005 and 2006 is strongly recommended.
A self-contained set of lecture notes is the main reference. Complementary textbooks include
(Major) Bonnini, S., Corain, L., Marozzi, M., and Salmaso, L. (2014). Nonparametric hypothesis testing: rank and permutation methods with applications in R. Wiley.
(Major) Wasserman, L. (2006). All of nonparametric statistics. Springer.
(Minor) Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.
(Minor) James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer.
Upon finishing the course, students are expected to
appreciate the beauty of nonparametric methods;
apply a wide variety of nonparametric techniques to perform inference, prediction and learning tasks;
understand the pros and cons of parametric and nonparametric methods;
master the skills in deriving basic theoretical properties of nonparametric methods;
use computer programs to perform nonparametric statistical analysis for real-life problems.
There are three main assessment components, plus a bonus component.
a (out of 100) is the average score of approximately eight assignments with the lowest two scores dropped;
m (out of 100) is the score of mid-term project; and
f (out of 100) is the score of final project.
b (out of 2) is the bonus score, which will be given to students who actively participate in class.
The total score t (out of 100) is given by
t = min{100, 0.3a + 0.2max(m,f) + 0.5f + b}
If min(t, f) < 30, the final letter grade will be handled on a case-by-case basis. Otherwise, your letter grade will be in the A range if t ≥ 85, at least in the B range if t ≥ 65, and at least in the C range if t ≥ 55.
* For the most up-to-date information, please always refer to the course outline announced by the course instructor on Blackboard, which shall prevail over the above information if there is any discrepancy.
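The scoring rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (assignment_average, total_score, grade_band) and the example scores are hypothetical, and the official course outline remains the authority on how grades are actually computed.

```python
def assignment_average(scores, n_dropped=2):
    """Average of the assignment scores after dropping the lowest n_dropped."""
    kept = sorted(scores)[n_dropped:]
    if not kept:
        raise ValueError("need more scores than the number dropped")
    return sum(kept) / len(kept)


def total_score(a, m, f, b=0.0):
    """t = min{100, 0.3a + 0.2 max(m, f) + 0.5 f + b}."""
    return min(100.0, 0.3 * a + 0.2 * max(m, f) + 0.5 * f + b)


def grade_band(t, f):
    """Map (t, f) to the band stated in the rule; other cases are not specified."""
    if min(t, f) < 30:
        return "case-by-case"
    if t >= 85:
        return "A range"
    if t >= 65:
        return "at least B range"
    if t >= 55:
        return "at least C range"
    return "not specified by the rule"


# Hypothetical student: eight assignment scores, mid-term 60, final 80, bonus 1.
a = assignment_average([60, 70, 80, 90, 100, 85, 75, 95])  # lowest two (60, 70) dropped
t = total_score(a, m=60, f=80, b=1)
print(round(a, 2), round(t, 2), grade_band(t, f=80))
```

Because the mid-term enters only through max(m, f), a final project score higher than the mid-term score replaces it: in the example, the 0.2 weight is applied to 80 rather than 60.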
Introduction: history, philosophy, examples.
Statistical foundation: basic testing and estimation, statistical limiting theorems.
Location and scale problems: sign test, signed-rank test, rank sum test, Ansari–Bradley test.
Correlation problem: Spearman’s ρ, Kendall’s τ, Bergsma–Dassios’s correlation, Chatterjee’s correlation.
Distribution problem: Kolmogorov–Smirnov test, Cramér–von Mises test, Anderson–Darling test.
Permutation tests: ideas of randomization, examples of permutation tests.
Bootstrap and subsampling: different bootstrapping methods, jackknife, subsampling.
Density estimation: histogram, kernel method, bandwidth selection.
Nonparametric regression: Nadaraya–Watson kernel estimator, local polynomial estimator.
Other topics: (a) classification, (b) Bayesian nonparametric, (c) rank-type regression, (d) k-nearest neighbor, ...
* Click (S3005/2024Fall/lecture) to download lecture notes (or click the individual links below).
* The finalized version of the notes will be uploaded one day before the lecture.
* All rights reserved by the authors. Re-distribution by any means is strictly prohibited.
Front matter
Part I: Philosophy and Foundation
Part II: Rank-type and order-type methods
Chapter 3: Location and scale problems
Chapter 4: Correlation problem
Chapter 5: Distribution problem
Part III: Resampling-type procedures
Part IV: Smoothing-type estimation and learning techniques
Appendices
Appendix A: Basic Mathematics
Appendix B: Basic probability
Appendix C: Basic Statistics
Appendix D: Basic programming in R --- for students who want to review; read Lectures 2 and 3 in RMSC 1101
Appendix E: R code used throughout the course can be found in the lecture folder (this folder will be updated from time to time).
P.S.: Not all materials in the appendices are directly useful for this course. I will tell you which parts are useful when we need them.
* Click (S3005/2024Fall/A) to download assignments.
Assignment 1: Concepts of nonparametric inference and properties of ranks --- Due: 27 Sep (Fri) @ 1800
Assignment 2: Real-data application of rank-type tests, theory of rank-type tests --- Due: 11 Oct (Fri) @ 1800
Assignment 3: Rank-type correlations and real-data application --- Due: 25 Oct (Fri) @ 1200
Assignment 4: Distribution test and cancer cell image data analysis --- Due: 15 Nov (Fri) @ 1800
Assignment 5: Causal inference with permutation test --- Due: 22 Nov (Fri) @ 1800
Assignment 6: Bootstrap variance & CI, and a new type of bootstrap CI --- Due: 2 Dec (Mon) @ 1800
Assignment 7: Kernel smoothing --- Due: 9 Dec (Mon) @ 1800
* Click (S3005/2024Fall/inclassNote) and (S3005/2024Fall/recording) to download in-class notes and recordings (if any).
* In-class notes and recordings (if any) will be uploaded within one week after the lecture.
Lecture 1 (4 Sep) --- Review of statistics, theory of rank and signs
Lecture 2 (11 Sep) --- Properties of rank, rank/sign-type tests
Lecture 3 (25 Sep) --- Theory of rank-type tests, rank-sum test
Lecture 4 (2 Oct) --- Methods of rank-type tests (sign test, signed-rank test, rank-sum test, trend test, Ansari–Bradley test), implementation
Lecture 5 (9 Oct) --- Strong/weak correlation, Pearson's correlation, Spearman's correlation, Kendall's correlation, Bergsma-Dassios's correlation
Lecture 6 (16 Oct) --- Bergsma-Dassios's correlation, Chatterjee's correlation, power curve
Lecture 7 (23 Oct) --- Coding exercise related to correlation, distribution test
Lecture 8 (30 Oct) --- KS, CvM, AD distribution test, 2-sample test, Lilliefors normality test
Lecture 9 (6 Nov) --- permutation test, coding exercise, causal inference
Lecture 10 (13 Nov) --- Jackknife, bootstrap variance estimator, bootstrap confidence interval
Lecture 11 (20 Nov) --- percentile CI, studentized pivotal CI, bias-corrected & accelerated CI, proof for bootstrap CI, histogram and its theory
Lecture 12 (27 Nov) --- KDE, optimal bandwidth, rule of thumb bandwidth
Lecture 13 (2 Dec) --- cross-validation & plug-in bandwidth, nonparametric regression, N-W kernel estimator, local polynomial estimator
* Click (S3005/2024Fall/quiz) to download quizzes.
Quiz 1: Properties of rank statistics and simulation
Quiz 2: A new rank-type test
Quiz 3: Correlation test
Start time: 25 October (Friday) @ 7:00 pm
End time: 27 October (Sunday) @ 7:00 pm
Duration: 48 hours
Scope: Chapters 1--4
Instructions: The detailed instructions are stated on the first page of the question paper. Some highlights are listed below:
Read the instructions carefully before doing the exam.
Complete the project by yourself without discussion and consultation.
Use any official course materials if you wish.
The project will be graded according to the criterion referencing scheme (see the last page of the mock solution).
Submission
Compile your answers in a single ".pdf" file (i.e., not MS Word, JPEG, ZIP, etc.).
Sign the Honor Code, and attach it as a cover of your submitted file.
Name the document in the format S4010_M_sid_name.pdf, e.g., S4010_M_1155001234_ChanKinWai.pdf.
Submit to Blackboard. You may submit your answers as many times as you wish; however, only the last submission will be graded.
All plots, numerical answers, simulation results, etc. must be included in the written part. Graders will not run your submitted code to check the answers.
Mock mid-term projects
MOCK mid-term project (with suggested solutions & students' answers. Note that they may not be error-free.)
REAL mid-term project
2024Fall_S3005_M (password will be sent to your CUHK email exactly at the project start time)
Start time: 13 December (Friday) @ 10:00 am
End time: 16 December (Monday) @ 10:00 am
Duration: 72 hours
Scope: Chapters 1--9
Instructions: The detailed instructions are stated on the first page of the question paper. Some highlights are listed below:
Read the instructions carefully before doing the exam.
Complete the project by yourself without discussion and consultation.
Use any official course materials if you wish.
Submission
Sign the Honor Code, and attach it as a cover of your submitted file.
Submit to Blackboard.
You may submit your answers as many times as you wish; however, only the last submission will be graded.
MOCK final project
REAL final project
TBA