Statistical Learning: Methodology and Theory

Class Schedule:

Lectures : T Th 11:45am-1pm at Old Chem 025

Office Hours : T 2-3pm, lounge beside 118 (I should be in 118A, please call me)

Syllabus

Quiz Schedule:

Q1 - September 21, 2017

Q2 - October 19, 2017

Q3 - November 28, 2017

On these days, we will have a 60-minute lecture followed by a 15-20 minute quiz

Topics for Project

Mid-sem deadline for project : October 15, 2017 (extended)

Deadline for project : November 30, 2017

Project presentations : December 5 and 7, 2017, 11:45am-1pm at Old Chem 025; each student gives a 15-minute presentation

Books:

Hastie, Tibshirani and Friedman

Devroye, Gyorfi and Lugosi

Gyorfi

Pollard (go to Books)

Asymptotic Statistics by A. W. van der Vaart (check Duke library for a copy)

Statistics for High-Dimensional Data: Methods, Theory and Applications by Peter Bühlmann and Sara van de Geer

Assignments:

1, 2, 3, 4

Notes:

Course Note 1 and 2 (Thanks to Xu Chen)

Course Note (Thanks to Jialiang Mao)

Consistency in Logistic Regression (see also 'Papers' below)

Regularized Logistic

Lyapounov's CLT

Useful Links:

Convergence Diagram

Introduction to ML

Mixture modelling

Fisher's LDA

Mahalanobis distance and Elliptic distributions

Glivenko-Cantelli

Logistic Regression

Sub-gradient (check example in Section 3.4)

Long talk by Bühlmann

LASSO by Hoff

Linear Maps and Orthogonal Projections

Infinite VC

Density Estimation

SiZeR

KDE and Convolutions

Super Kernels

Histogram in R

Parzen Window video

The Parzen window estimate of a pdf (thin black line) matches the actual pdf (thicker blue line). The histogram of the actual data points is shown in light gray in the background.

Histogram vs. Parzen window
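
The comparison described for the Parzen window video can be reproduced roughly with the sketch below. It is an illustration only, not the code behind the video: the Gaussian kernel, the bandwidth of 0.4, the standard normal data and the helper names (parzen_window_estimate, histogram_density) are all assumptions chosen for this example.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the (assumed) window function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_window_estimate(samples, x, bandwidth):
    """Parzen window estimate at x: (1 / (n h)) * sum_i K((x - X_i) / h)."""
    total = sum(gaussian_kernel((x - s) / bandwidth) for s in samples)
    return total / (len(samples) * bandwidth)


def histogram_density(samples, low, high, n_bins):
    """Histogram heights on [low, high), normalized by n * bin width."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for s in samples:
        if low <= s < high:
            # Guard against floating-point rounding at the right edge.
            index = min(int((s - low) / width), n_bins - 1)
            counts[index] += 1
    return [c / (len(samples) * width) for c in counts]


# Hypothetical data: 1000 draws from a standard normal distribution.
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Evaluate the estimate and the actual pdf on a grid over [-4, 4].
step = 0.05
grid = [-4.0 + step * i for i in range(161)]
estimate = [parzen_window_estimate(samples, x, bandwidth=0.4) for x in grid]
true_pdf = [gaussian_kernel(x) for x in grid]

# Largest pointwise gap between the estimate and the actual pdf on the grid.
max_abs_error = max(abs(e - t) for e, t in zip(estimate, true_pdf))

# Histogram of the same data, for the background comparison.
n_bins = 32
hist_heights = histogram_density(samples, -4.0, 4.0, n_bins)
hist_width = 8.0 / n_bins

print("max |estimate - pdf| on the grid:", max_abs_error)
```

A smaller bandwidth makes the estimate track the histogram more closely (more wiggly), while a larger one oversmooths it; the value 0.4 is only a convenient choice for this sample size.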

Non-parametric Regression

Note on L2 spaces

Note on convergence in probability and monotonicity

Consistency of kNN (check p.2 onwards)

Proof of Stone's (1977) result

High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality

Papers:

The use of multiple measurements in taxonomic problems by R. A. Fisher

On the generalized distance in statistics by P. C. Mahalanobis

Logistic Regression: Consistency - I

Logistic Regression: Consistency - II

Generative vs Discriminative

Foster and George

Best Subset Selection With $l_0$ penalty and Some comments

LASSO Uniqueness

RE Condition

LASSO for Logistic Regression - I and II

Survey of Consistency of KDE

Optimal Smoothing in Kernel Discriminant Analysis

On Error-rate Estimation in Nonparametric Classification

Loftsgaarden and Quesenberry (1965)

Parzen (1962)

Silverman (1984)

Cover and Hart (1967)

Stone (1977)

Consistency in NP Regression

Rate of convergence of kNN

Geometric representation of HDLSS data

NN for HDLSS data

Coffee with everyone!