Lectures

Class sessions take place virtually via Zoom (links on Canvas).

All sessions will be recorded. Recordings and lecture slides are available to registered students on Canvas.

All lecture slides are available here (Brown only)

Week 1

Lecture 1: Introduction to Data Science & EEPS 1960D

Lecture Slides

Readings

  • [3 pages] Blei and Smyth (2017), Science and Data Science. PNAS. [link] (Open access)

Week 2

Lecture 2: Introduction to Machine Learning

Lecture Slides

Readings

  • [optional] ISL Chapter 2 (ISL: An Introduction to Statistical Learning with Applications in R; see Resources page)

  • [optional] Electoral Precedent (xkcd)


Lecture 3: Mathematical Foundations

Lecture Slides

  • Mathematical Foundations of Machine Learning [outline] [annotated]* (Brown only) *updated after Lecture 4

Pre-lecture activities

  1. [10 minutes] Watch this video ("Vectors, what even are they?") to review the concept of a vector. All students should watch this, even those who believe they are already familiar with the concept.

  2. Review additional linear algebra topics, as necessary. When deciding which topics to review, ask yourself whether you are familiar with both the "Physics student" view and the "Computer Science student" view of each concept and can translate between them. [Credit: linear algebra review videos were created by Grant Sanderson]

  3. Review calculus topics, as necessary.

  4. Review probability and statistics topics, as necessary.

Additional resources

Week 3

Lecture 4: Clustering Part 1

Lecture Slides

Resources


Lecture 5: Clustering Part 2

Lecture Slides

Resources

Week 4

Lecture 6: Dimensionality Reduction, Part 1

Lecture Slides [Recordings available under Media Library on Canvas]

Resources


Lecture 7: Dimensionality Reduction, Part 2 and Case Study

Lecture Slides [Recordings available under Media Library on Canvas]

Resources

Week 5

Lecture 8: Introduction to Supervised Learning

Lecture Slides

Resources

  • ISL Chapter 2: Statistical Learning

Week 6

Lecture 9: KNN and Linear Regression

Lecture Slides

Resources


Lecture 10: Cross-validation and Regularization

Lecture Slides

Resources

  • ISL Section 5.1: Cross-validation

  • Cross-validation example: KNN for Iris Dataset [opens Colab Notebook]

  • ISL Section 6.2: Shrinkage Methods

  • Chapter 2: The Lasso for Linear Models, in Statistical Learning with Sparsity: The Lasso and Generalizations [pdf] (by Hastie, Tibshirani and Wainwright)

Week 7

Lecture 11: Regularization (Part 2)

Lecture Slides

Resources


Lecture 12: Logistic Regression & Support Vector Machines (SVMs)

Lecture Slides

Resources

  • ISL Section 4.3: Logistic Regression

  • ISL Chapter 9: Support Vector Machines

  • Support Vector Machines interactive demo (by Jonas Greitemann)

Week 8

Lecture 13: SVMs and Classification Metrics

Lecture Slides

Resources


Lecture 14: Artificial Neural Networks

Lecture Slides

Resources

Week 9

Lecture 15: Decision Trees and Ensemble Methods, Part 1

Lecture Slides

Resources


Lecture 16: Decision Trees and Ensemble Methods, Part 2

Lecture Slides

Resources

Week 10

Lecture 17: Machine Learning Review, Tricks & Tips

Lecture Slides

Resources

  • Reading: Domingos (2012). A Few Useful Things to Know About Machine Learning [pdf], Communications of the ACM.


Lecture 18: Gaussian Process Regression

Lecture Slides

Resources

Week 11

Lecture 19: Kaggle Competition and GPR examples

Lecture Slides

Resources

  • Domingo et al. (2020). Using Ice Cores and Gaussian Process Emulation to Recover Changes in the Greenland Ice Sheet During the Last Interglacial. JGR Earth Surface. [link]

  • How to (almost) win Kaggle Competitions (by Yanir Seroussi)


Lecture 20: Feature Engineering & Feature Selection

Lecture Slides

Resources

Week 12

Lecture 21: Machine Learning Failures and Unintended Harms

Lecture Slides

Resources

  • Suresh & Guttag (2020). A Framework for Understanding the Unintended Consequences of Machine Learning. arXiv:1901.10002

  • Lehman et al. (2019). The Surprising Creativity of Digital Evolution. arXiv:1803.03452

  • Kaufman et al. (2012). Leakage in data mining: Formulation, detection and avoidance. ACM Transactions on Knowledge Discovery from Data [pdf].

  • Muller et al. (2019). How Data Science Workers Work with Data. Proceedings of the Conference on Human Factors in Computing Systems [pdf].

  • Jia et al. (2019). Anthropogenic biases in chemical reaction data hinder exploratory inorganic synthesis. Nature [link] [popular science article].

  • AI Incident Database

  • Calling Bullsh*t in the Age of Big Data (course developed by C. Bergstrom & J. West @ UW-Seattle) [videos] [website]

  • fast.ai short course: Practical Data Ethics

  • Documentary: Coded Bias (available on PBS and Netflix), directed by Shalini Kantayya.


Lecture 22: In-class activity

Lecture Slides

Resources

  • Bender, Gebru et al. (2021). On the Dangers of Stochastic Parrots. FAccT [pdf].

  • Loft (2020). Earth system modeling must become more energy efficient. Eos [link].

  • Green Algorithms [website] [paper]

Week 13

Lecture 23: Deep Neural Networks

Lecture Slides

Resources


Lecture 24: Big Data

Lecture Slides

Resources