CME 250: Introduction to Machine Learning   

Announcements

  • 4/19: The data collection survey for the final project has been created and can be found here. Students enrolled in this class must fill out the survey. If you are auditing, we still encourage you to fill it out to create a richer dataset for the final project. The survey must be completed by Thursday, April 21st, 11:59pm.
    • To confirm your completion of the survey, send a screenshot of the completion screen to cme250-spr1516-staff [at] lists.stanford.edu with the email subject "CME 250 PROJSURV".
  • 4/5: The first online quiz and applied exercises have been posted to the website. 
  • 3/28: First class Tuesday April 5.
  • 3/28: Welcome to CME 250!

Schedule

  • Tuesday and Thursday, 4:30 - 5:50pm (with a total of 8 sessions): April 5, 7, 12, 14, 19, 21, 26 and 28.
  • Location: Shriram Ctr BioChemE 104 (map)

Lecture Topics (tentative)

Lecture 1 (April 5)

Introduction to Machine Learning

Lecture 2 (April 7)

Clustering Algorithms and Dimensionality Reduction (PCA, ICA)

Lecture 3 (April 12)

More Unsupervised Learning Techniques and Imputation

Lecture 4 (April 14)

Linear & Logistic Regression, related methods

Lecture 5 (April 19)

Penalties, Sparsity and Regularization (lasso, ridge) and Cross-validation

Lecture 6 (April 21)

Support Vector Machines (SVM)

Lecture 7 (April 26)

Classification and Regression Trees (CART) and the Bootstrap

Lecture 8 (April 28)

Ensemble Methods (Random Forests) and Neural Networks


Course Description


A four-week short course presenting the principles behind when, why, and how to apply modern machine learning algorithms. We will discuss a framework for reasoning about when to apply various machine learning techniques, emphasizing questions of overfitting/underfitting, regularization, interpretability, supervised/unsupervised methods, and handling of missing data.


The course focuses on the principles behind the algorithms, that is, why and how to use them. Some of the underlying mathematical detail, including proofs, will not be covered.


Unsupervised machine learning algorithms presented will include k-means clustering, principal component analysis (PCA), and independent component analysis (ICA). Supervised machine learning algorithms presented will include support vector machines (SVM), neural nets, classification and regression trees (CART), boosting, bagging, and random forests. Imputation, the lasso, and cross-validation concepts will also be covered. The R programming language will be used for examples, though students need not have prior exposure to R.


Contact

To contact the course instructors please use the staff email list:
  • cme250-spr1516-staff [at] lists.stanford.edu

Instructors
  • Gabriel Maher - gdmaher [at] stanford [dot] edu
  • Alexander Ioannidis - ioannidis [at] stanford [dot] edu

Course Requirements 
  • Course is 1 unit, graded Satisfactory / No Credit.
  • To receive credit, students must 
    1. complete a subset of short exercises, and 
    2. complete an anonymous online competition to gain practical experience applying course concepts to a real data set. 

Prerequisites


The course assumes no prior background in machine learning. Previous exposure to undergraduate-level mathematics (calculus, linear algebra, statistics) and basic programming (R/MATLAB/Python) is helpful.