CIS 419/519 Applied Machine Learning (Fall 2021)

Instructor: Dinesh Jayaraman (dineshj [at] seas.upenn.edu)

Course Description

Machine learning has been essential to the success of many recent technologies, including autonomous vehicles, search engines, genomics, automated medical diagnosis, image recognition, and social network analysis, among many others. In this course, we will cover the fundamental concepts and algorithms that enable computers to learn from experience. In the first half, we will largely focus on building up from the simplest machine learning algorithms towards modern deep neural network approaches. In the second half, we will cover how to apply machine learning in four key application domains: computer vision, natural language processing, decision making and robotic control, and recommender systems.

The objective of the course is to produce strong and versatile practitioners of machine learning. Lectures will focus largely on developing the mathematical understanding required for this. Through homeworks, tutorials, recitations, and projects, students will have the opportunity to apply that understanding and implement machine learning solutions for various realistic problem settings.

Comparison to CIS 520

Penn CIS offers two different introductory machine learning courses: CIS 419/519 (Applied Machine Learning) and CIS 520 (Machine Learning).

You should take CIS 419/519 (this course!) if:

  • You're interested in understanding how to apply existing machine learning algorithms to new problems. 419/519 introduces various families of ML algorithms appropriate for solving problems involving different types of data.

You should take CIS 520 if:

  • You're interested in pursuing research towards developing new ML algorithms. CIS 520 provides a mathematically rigorous introduction to the statistical foundations and theory of machine learning, which is required for such research.

CIS 519 is NOT a prerequisite for CIS 520. It makes little sense to take CIS 519 after having already taken CIS 520. Taking CIS 419/519 first and CIS 520 later is possible, but also makes little sense.

Required Background

  • Introductory probability and statistics, multi-variable/vector calculus, and linear algebra. We will provide primer/refresher documents for these topics, but we assume that you are already largely familiar with them. This knowledge will be important for understanding machine learning in enough depth to become a strong practitioner, which is the key learning objective of this course. Without it, the lectures will be hard to follow, and you will only be able to acquire a shallow understanding of the material.

  • Programming skills. We will use Python throughout the course. While we will help you pick up Python, if you are not confident in your coding skills in any language, be warned that the homeworks for this class could be quite difficult.

Student Communication and Class Links

  • Canvas (for materials release and grades), Piazza (for class and cohort discussion), Gradescope (for work submission)

Schedule

Grading Scheme


  • Homeworks (x5): 30%

  • Final Exam: 35%

  • Project (teams of 2 or 3): 20%

  • Weekly lecture-quizzes: 10%

  • Piazza and class discussion participation: 5%

If you're taking the undergraduate version of this course (CIS 419), then you will be evaluated differently on your homeworks and projects. Note that since the two versions have different requirements, you cannot complete the course as CIS 419 and petition afterwards to have it changed to CIS 519 for graduate credit.
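As a rough illustration of how the weights listed above combine, here is a minimal Python sketch. It assumes each component is scored as a percentage from 0 to 100 and that the components are combined linearly; the course does not state how the final grade is actually computed, and the function name and example scores below are hypothetical.

```python
# Weights taken from the Grading Scheme list, as fractions of the final grade.
WEIGHTS = {
    "homeworks": 0.30,
    "final_exam": 0.35,
    "project": 0.20,
    "lecture_quizzes": 0.10,
    "participation": 0.05,
}


def weighted_total(scores):
    """Combine per-component percentages (0-100) into one weighted percentage.

    Assumption: a simple linear combination; the real policy may differ.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "homeworks": 90,
    "final_exam": 80,
    "project": 85,
    "lecture_quizzes": 100,
    "participation": 100,
}

print(round(weighted_total(example_scores), 2))
```

Under these assumptions the final exam, at 35%, moves the total more than any other single component.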

Project Format

Projects in this course will focus on picking an existing open source ML project, learning about it, and improving upon it in some way using the tools covered in class. The target will be to produce new contributions in code, technique, analysis, application, or datasets that are of value to others, while also being of value for your own education. The final result will be an (optionally) public document, website, or blog.

Contributions that build on an existing project might include:

  1. code: e.g., implementing it in a new programming language, with new libraries, or with higher efficiency, or fixing bugs in an existing codebase,

  2. application: e.g., applying the same algorithm to a new domain,

  3. data: e.g., collecting a new dataset, gathering more or improved data to augment an existing dataset, or improving data preprocessing,

  4. algorithm: e.g., modifying an existing algorithm by changing the objective function,

  5. technique: e.g., improving the neural net architecture or learning rate scheduling used in a project, or better analyzing its sensitivity to hyperparameters or its performance on various inputs.

You will be judged both on your demonstration of a solid understanding of the ML covered in class and on the value of your contributions. Implementing well-known algorithms on widely used datasets is usually a bad strategy for producing something of value.

Archive of Completed Class Projects From Fall 2021

Public Projects Information

Team

  • Dinesh Jayaraman (instructor)

  • Kelly Feng

  • Benedict Florance Arockiaraj

  • Haoyu Wu

  • Dan Gallagher

  • Martin Ricardo del Garcia

  • Sharanya Venkat

  • Maheshwarran Karthikeyan

  • Pratik Kunapuli

  • Jianxiong Cai

  • Alexander Dong




Resources


EMAB Tutoring

Penn's Engineering Master's Advisory Board (EMAB) has announced a Tutoring Program for master's students in Machine Learning. This program will serve as a resource to help students strengthen their skills in the area. Students can drop in at any session whenever they need help, with no commitment required and free of charge. Interested students are encouraged to learn more about the program at https://pennemab.weebly.com/tutoring.html


Textbook for Mathematics Background (Probability, Calculus, Linear Algebra)

Probability Resources

Linear Algebra Resources

Python Resources

A Couple of Excellent Resources for Hands-On Machine Learning Through Interactive iPython Notebooks

Other Useful Textbooks, Courses, and Lecture Notes

For a more advanced treatment of machine learning topics, I would recommend one of the following books (all freely available online)

Some Useful Articles

Online Machine Learning Communities

Research Conferences (nearly all freely available proceedings)

A list of courses (freely available to varying extents) from around the internet, on machine learning and related topics:

https://deep-learning-drizzle.github.io/index.html

Preprint Servers

Code for published papers

Software

We will be using the following software throughout the course