Advanced Topics in Machine Learning
Caltech, Spring 2022

Topic: Representation Learning for Science

Representation learning transforms data into representations (also called embeddings, encodings, or features) from which it is easier to extract useful information. Recently, these methods have facilitated progress in a variety of fields, such as medicinal chemistry, ecology, protein synthesis, fluid mechanics, sports analytics, and animal behavior analysis. Here, we will cover a range of methods (autoencoders, graph embedding techniques, symbolic representations, and self-supervised learning) and help students make connections to applications in science.

The goals of this course are for students to be able to:

  • Recognize existing representation learning methods and potential application areas. (Focus of lectures)

  • Effectively apply existing representation learning methods in a pre-defined setting. (Focus of assignments)

  • Formulate challenges in scientific data analysis into appropriate computer science questions. (Focus of project proposal)

  • Explore new techniques and develop methods that work on real-world data from scientific applications. (Focus of final project)

Caltech listing: CS/CNS/EE/IDS 159 (3-0-6) TTh 2:30-4:00. [Piazza] [Gradescope] [Reading material]

Instructors and Teaching Assistants

We'd be happy to discuss any questions/comments/feedback you have throughout the course - feel free to post on Piazza or contact any one of us!

Structure

See Deliverables Schedule and Lecture Schedule for more details.

  • Week 1: Introduction

  • Week 2: Methods for Representation Learning (Assignment 1)

  • Week 3: Symbolic Representations (Assignment 2)

  • Week 4: Self-Supervised Learning and Applications (Assignment 3)

  • Weeks 5 to 10: Research Project and Guest Lectures

Grading

The final project accounts for 94% of the grade, and each of the three assignments accounts for 2%. The primary goal of the course is to help students think about and get started on research projects in representation learning; the assignments are meant to introduce different areas of representation learning before students choose a project.

Homeworks

Each student should submit a copy of the homework (assignments 1, 2, 3), but students may work collaboratively. All students should understand all parts of the completed homework. The homeworks are released on a Thursday and due the next Wednesday.

Please use the CS159 Piazza to submit homework-related questions so that instructors can respond promptly and all students have access to the response. Completed homeworks and final projects should be submitted on Gradescope. If you do not have access to Gradescope by the end of the first week of class, please email an instructor.

Late Policy

Each student has a total of 48 late hours for the term across assignments 1, 2, and 3. For example, if the first homework is submitted 47 hours late and the second homework is submitted two hours late, the student would incur no penalty on the first homework but score zero on the second, because only one late hour would remain. Late submissions are not allowed for the final project proposal or report.
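
The late-hour arithmetic above can be sketched in a few lines of Python. This is only an illustration, not an official grading tool: the function name `apply_late_policy` is hypothetical, and the sketch assumes that a submission exceeding the remaining late hours scores zero without using up any of them, which the policy does not specify.

```python
def apply_late_policy(hours_late, budget=48):
    """Walk through homeworks in order and track the shared late-hour budget.

    hours_late: hours late for each homework, in submission order.
    Returns (accepted, remaining), where accepted[i] is True if homework i
    incurs no penalty and False if it scores zero.
    """
    remaining = budget
    accepted = []
    for hours in hours_late:
        if hours <= remaining:
            # Enough late hours left: deduct them, no penalty.
            remaining -= hours
            accepted.append(True)
        else:
            # Assumption: an over-budget submission scores zero and
            # does not consume any of the remaining late hours.
            accepted.append(False)
    return accepted, remaining


# The example from the policy: 47 hours late, then 2 hours late.
accepted, remaining = apply_late_policy([47, 2])
print(accepted, remaining)
```

With the policy's own example, the first homework is accepted and the second is not, leaving one unused late hour.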

Final Project

The final project may be done in groups (recommended group size is 2-3, max is 4), and one final project document should be submitted per group. The deliverables are a project proposal due on May 1st and the final project report due on the last day of class (June 3rd).

Students will also have an opportunity to present their project in a poster session on June 2nd.

Reading Material

Additional references are available at: [Reading material]. It is not necessary to read all of the material in depth for this course. The references are compiled as a resource for students to explore additional topics in representation learning, which may be useful for the final project and beyond.

Deliverables Schedule