CIS 419/519 Applied Machine Learning (Fall 2021)

Instructor: Dinesh Jayaraman (dineshj [at] seas.upenn.edu)

Course Description

Machine learning has been essential to the success of many recent technologies, including autonomous vehicles, search engines, genomics, automated medical diagnosis, image recognition, and social network analysis, among many others. In this course, we will cover the fundamental concepts and algorithms that enable computers to learn from experience. In the first half, we will largely focus on building up from the simplest machine learning algorithms towards modern deep neural network approaches. In the second half, we will cover approaches to apply machine learning to four key application domains: computer vision, natural language processing, decision making and robotic control, and recommender systems.

The objective of the course is to produce strong and versatile practitioners of machine learning. Lectures will focus largely on developing the mathematical understanding required for this. Through homeworks, tutorials, recitations, and projects, students will have the opportunity to apply that understanding and implement machine learning solutions for various realistic problem settings.

Comparison to CIS 520

Penn CIS offers two different introductory machine learning courses: CIS 419/519 (Applied Machine Learning) and CIS 520 (Machine Learning).

You should take CIS 419/519 (this course!) if:

You're interested in understanding how to apply existing machine learning algorithms to new problems. 419/519 introduces various families of ML algorithms appropriate for solving problems involving different types of data.

You should take CIS 520 if:

You're interested in pursuing research towards developing new ML algorithms. CIS 520 provides a mathematically rigorous introduction to the statistical foundations and theory of machine learning, which is required for such research.

CIS 519 is NOT a prerequisite for CIS 520. However, it makes little sense to take CIS 519 after having already taken CIS 520. It is possible to take CIS 419/519 first and then take CIS 520 later, although this also makes little sense.

Required Background

Introductory probability and statistics, multivariable/vector calculus, and linear algebra. We will provide primer/refresher documents for these topics, but we assume that you are already largely familiar with them. This knowledge will be important for understanding machine learning in enough depth to become a strong practitioner, which is the key learning objective of this course. Without it, the lectures will be hard to follow, and you will only be able to acquire a shallow understanding of the material.
Programming skills. We will use Python throughout the course. While we will help you pick up Python, be warned that if you are not confident in your coding skills in at least one language, the homework for this class could be quite difficult.

Student Communication and Class Links

Canvas (for materials release and grades), Piazza (for class and cohort discussion), Gradescope (for work submission)

Schedule

Grading Scheme

Homeworks (x5): 30%
Final Exam: 35%
Project (teams of 2 or 3): 20%
Weekly lecture-quizzes: 10%
Piazza and class discussion participation: 5%

If you're taking the undergraduate version of this course (CIS 419), then you will be evaluated differently on your homeworks and projects. Note that since the two versions have different requirements, you cannot complete the course as CIS 419 and petition afterwards to have it changed to CIS 519 for graduate credit.
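
As a rough illustration of how the weights listed above combine, here is a minimal sketch in Python. It assumes that each component is scored on a 0-100 scale and that the final score is a plain weighted average. The course does not state this explicitly, and it may apply curving or different evaluation rules (for example for CIS 419), so treat the function name `weighted_score` and the example scores as hypothetical.

```python
# Weights taken from the grading scheme above (percent of the final grade).
WEIGHTS = {
    "homeworks": 30,
    "final_exam": 35,
    "project": 20,
    "lecture_quizzes": 10,
    "participation": 5,
}


def weighted_score(component_scores):
    """Combine per-component scores (each on a 0-100 scale) into one 0-100 score.

    Assumption: a plain weighted average with no curving or dropped scores.
    """
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)
    return total / 100


# Hypothetical student scores, for illustration only.
example = {
    "homeworks": 90,
    "final_exam": 80,
    "project": 85,
    "lecture_quizzes": 100,
    "participation": 100,
}
print(weighted_score(example))  # prints 87.0
```

Under these assumptions, the final exam and homeworks together account for 65% of the score, so a 10-point change in either moves the total by 3 to 3.5 points.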

Project Format

Projects in this course will focus on picking an existing open source ML project, learning about it, and improving upon it in some way using the tools covered in class. The target will be to produce new contributions in code, technique, analysis, application, or datasets that are of value to others, while also being of value for your own education. The final result will be an (optionally) public document, website, or blog.

Contributions that build on an existing project might include:

code: helping implement it in a new programming language, with new libraries, or with higher efficiency, or fixing bugs in an existing codebase,
application: applying the same algorithm to a new domain,
data: collecting a new dataset, gathering more or improved data to augment an existing dataset, or improving data preprocessing,
algorithm: modifying an existing algorithm by, for example, changing the objective function,
technique: improving the neural net architecture or learning rate scheduling used in a project, or better analyzing its sensitivity to hyperparameters or its performance on various inputs.

You will be judged both on your demonstration of a solid understanding of the ML covered in class and on the value of your contributions. Implementing well-known algorithms on widely used datasets is usually a bad strategy for producing something of value.

Archive of Completed Class Projects From Fall 2021

Public Projects Information

Team

Dinesh Jayaraman (instructor)
Kelly Feng
Benedict Florance Arockiaraj
Haoyu Wu
Dan Gallagher
Martin Ricardo del Garcia
Sharanya Venkat
Maheshwarran Karthikeyan
Pratik Kunapuli
Jianxiong Cai
Alexander Dong

Resources

EMAB Tutoring

Penn's Engineering Master’s Advisory Board (EMAB) has announced a Tutoring Program for master’s students in Machine Learning. This program will serve as a resource to help students strengthen their skills in the area. Students can drop in at one of the sessions whenever they need help, with no commitment required and free of charge. Interested students are encouraged to learn more about the program at https://pennemab.weebly.com/tutoring.html

Textbook for Mathematics Background (Probability, Calculus, Linear Algebra)

Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. 2020. Mathematics for Machine Learning. Cambridge University Press.

Probability Resources

Probability Review

Linear Algebra Resources

Python Resources

A Couple of Excellent Resources for Hands-On Machine Learning Through Interactive IPython Notebooks

Other Useful Textbooks, Courses, and Lecture Notes

Machine Learning by Tom Mitchell, McGraw Hill, 1997. (On reserve in Penn library)
A Course in Machine Learning by Hal Daumé III.
Machine Learning Lecture Notes by Andrew Ng.
Machine Learning for Intelligent Systems by Kilian Weinberger.
Google Machine Learning Crash Course
Reinforcement Learning: An Introduction by Sutton and Barto, MIT Press, 1998. (Full text available online; on reserve in Penn library)

For a more advanced treatment of machine learning topics, I would recommend one of the following books (all freely available online):

Pattern Recognition and Machine Learning by Bishop, Springer, 2006.
Machine Learning: A Probabilistic Perspective by Kevin P. Murphy, MIT Press, 2012.
The Elements of Statistical Learning 2nd edition by Hastie, Tibshirani and Friedman, Springer-Verlag, 2008.
Convex Optimization by Stephen Boyd and Lieven Vandenberghe, Cambridge University Press, 2004.
Information Theory, Inference, and Learning Algorithms by David MacKay, Cambridge University Press, 2003.
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, MIT Press, 2016.

Some Useful Articles

Article by David Mimno on Data Pre-Processing

Online Machine Learning Communities

Research Conferences (nearly all freely available proceedings)

NeurIPS: Neural Information Processing Systems
ICML: International Conference on Machine Learning
ICLR: International Conference on Learning Representations
KDD: Knowledge Discovery and Data Mining
Computer Vision: CVPR, ICCV, ECCV
Natural Language Processing: ACL, EMNLP, NAACL

A list of courses (freely available to varying extents) from around the internet, on machine learning and related topics:

https://deep-learning-drizzle.github.io/index.html

Preprint Servers

Code for published papers

https://paperswithcode.com/

Software

We will be using the following software throughout the course:

Python: we'll be using Python throughout the course to implement various ML algorithms and run experiments
- Google Developer Python Tutorial (highly recommended as a way to master Python in just a few hours!)
- NumPy Tutorial (also highly recommended!)
- Python tutorial (work at least through section 5; skip sections 2, 3.1.3)
- Python quick reference
Google Colab
scikit-learn: machine learning in Python
PyTorch: deep learning library