Advanced Topics in Machine Learning
Caltech, Spring 2022

Students are required to conduct and submit a research project as well as present an accompanying poster at the poster session on June 2nd. Students may work on projects in groups of up to 4 people.

The project can be on any topic related to representation learning for science. This is not limited to the papers covered in lecture (see additional references). The purpose of the project is to explore new techniques and develop methods that work on real-world data from scientific applications. While we encourage you to explore new research directions, projects may also focus on reproducing or comparing previous approaches, perhaps investigating various theoretical or design choices. If you are a graduate student currently working on a related topic, you may use your research for the project.

Generally, projects will fall into one of the following categories:

  • Pure Theory: study proof techniques, try to extend a proof, or apply a proof to a new setting.

  • Algorithms & Models: extend an algorithm or model, design a new one, or adapt one for a new setting.

  • Applications: identify the correct assumptions for approaching a new setting and experimentally validate these assumptions.

We have prepared a list of example research topics below. Feel free to use one of these if you are having difficulty settling on a topic.

Project Ideas

  • Start from an open-source scientific dataset (e.g., molecular properties, behavior analysis, species classification), identify existing baselines, and explore whether you can improve performance on downstream tasks using different representation learning strategies.

  • Design different learning tasks to train representations, and evaluate downstream performance using metrics such as accuracy and data efficiency.

  • Investigate how well learned representations generalize across tasks (e.g., train on one dataset and test on a different but related dataset).

  • Identify properties that you want in your representation space (e.g., invariance to certain factors, disentanglement of different factors), and evaluate how well your model achieves these properties.

  • Combine generative modeling and representation learning, and visualize both the generative model's performance and the learned representations.

  • If you are interested in projects related to the TAs' research, you can contact them for advice.

Project Proposal

    • Due May 1st.

    • The project proposal is a mechanism to get you to start thinking about your project. It also allows the TAs to give you feedback on potential challenges. It is non-binding; you can change the direction of the project after the proposal submission.

    • Upload your project proposal to Gradescope (one per team).

    • Project proposals may be at most 3 pages, excluding references.

    • We recommend the following structure for the project proposal:

      • Introduction of research area: Motivate why this area is interesting and describe key technical challenges.

      • Overview of previous work: Give an overview of different types of approaches and highlight assumptions of previous work. This should not be a laundry list of previous papers. Rather, it should describe previous approaches in broad strokes.

      • Research question(s): Describe one or more research questions to be addressed. The research question should be a high-level description of the project topic.

      • Plan of attack: Describe a preliminary plan to address the research question(s), and outline any experimental set-ups, such as datasets, models, etc. The plan of attack should extend the research question(s) to the details of the project.

Project Report

  • Due June 3rd.

  • The project report is expected to be a mini research paper, like the ones that we have read in class. You may not have had time to explore all of the relevant aspects of the project or even improve upon existing methods. However, the project report should demonstrate that you put careful thought and consideration into how you have approached the problem setting.

  • Upload your project report to Gradescope (one per team).

Formatting

  • There is no page limit. The main report will likely be around 6-10 pages, but you may use additional pages for references, appendices, etc.

  • Use the NeurIPS LaTeX format, and use the "preprint" option when compiling your report.

  • Use vector graphics (e.g., SVG, EPS, PDF) for figures wherever possible.
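As an illustration, a minimal preamble might look like the sketch below. It assumes you are using the NeurIPS 2022 style file (`neurips_2022.sty`, downloaded from the NeurIPS website and placed next to your source file); if you use a different year's style file, change the package name accordingly. The title and author names are placeholders.

```latex
\documentclass{article}

% Assumes neurips_2022.sty is in the same directory as this file.
% The "preprint" option prints author names and a "Preprint" footer
% instead of the anonymized submission version.
\usepackage[preprint]{neurips_2022}

% Provides \includegraphics; PDF figures work directly with pdflatex.
\usepackage{graphicx}

\title{Placeholder Project Title}

% \And separates authors in the NeurIPS style.
\author{Team Member One \And Team Member Two}

\begin{document}
\maketitle

% Report content goes here.

\end{document}
```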

Grading

There is no specific grading rubric for the project report. However, we will look for the following factors:

  • How much thought and consideration did you put into the problem setting? Is the problem setting sufficiently challenging or interesting to explore? Have you adequately motivated the research direction?

  • How much thought and consideration did you put into your approach? Did you correctly identify the previous approaches and their limitations? Did you attempt to address these limitations through your work? Did you note key obstacles in implementing your approach? Even if your approach was not successful, did you try enough things and reflect on why they did not work? What would you have done differently given more time and resources?

  • How much thought and consideration did you put into your investigation? Did you use reasonable performance metrics? Did you compare with appropriate baselines? Did you provide insight and visualize key aspects of your approach (using tables, figures, toy examples)? For any theoretical components of the project, have you identified the crucial assumptions and limitations?

Structure

You can structure the report however you see fit. We recommend the following structure:

    • Introduction: Motivate why this area is interesting, describe key technical challenges, and give a short summary of how you have addressed these challenges.

    • Background and previous work: Briefly review any necessary background material. Give an overview of previous approaches, making sure to highlight their assumptions or limitations. This should not be a laundry list of previous papers.

    • Research question and approach: If not already described in the previous sections, outline the research question(s) that you are addressing. Describe the approach(es) that you considered, providing justification for your choice. If necessary, you may use this section to introduce any proofs or derivations, with full details deferred to the appendices.

    • Experiments and results: If there is an empirical component to your project, describe the experimental set-up in this section. You may include additional details in the appendices if necessary. Make sure to describe the data, models, training scheme, performance metrics, etc. Describe the experimental results, referencing relevant figures and tables.

    • Discussion and conclusion: Restate the research question, approach, and main findings. Describe any interesting implications of your project, outlining possible directions for future research.