Research Design for Machine Learning
Cornell, Fall 2025
This course prepares junior PhD students to embark on a successful machine learning research career. Students will develop core skills for machine learning research: identifying open questions, navigating the changing research landscape, designing sound experiments, analyzing results critically, and communicating their work effectively. Topics include defining your research goals, understanding the realities of the publishing process, balancing novelty and feasibility, practicing responsible AI, and setting up collaborations for success. Guest speakers with expertise in various ML fields will offer students a broader perspective, complementing the instructor's own research on application-driven ML and developing AI for science.
The goal in this course is for students to be able to:
Recognize strengths, limitations, potential extensions, and ethical considerations of existing ML works
Formulate well-defined, feasible, and impactful research questions
Design experimental setups to test their own research questions
Communicate methods, results, and implications effectively in both written and presentation formats
Cornell listing: CS 6784-002; TTh 1:25-2:40.
Instructors and Teaching Assistants
Jennifer J. Sun (jjs533@cornell.edu). Office Hours: Mondays 2:30 to 3:30pm (location Gates 450)
Akanksha Sarkar (as2637@cornell.edu). Office Hours: Tuesdays 4:30 to 6pm (location Gates G11)
There will be no office hours on holidays, but please email or Slack us if you have any questions.
We'd be happy to discuss any questions/comments/feedback you have throughout the course - feel free to contact any one of us!
Structure
Week 1: Introductions
Weeks 2 to 5: Research & publication process, project proposal
Weeks 6 to 10: Dataset design, Communication skills, Research Project + Midterm presentation
Weeks 11 to 15: Common challenges, Other skills, Feedback, Research Project + Final presentation
Grading
The primary goal of the course is to prepare students to design and execute an independent research project in machine learning. The assignments are designed to exercise the different skills a research project requires (the final project is a 4-page report in the format of NeurIPS, CVPR, or ICLR). The deliverables are:
10% assignments (breakdown: three total, 3.33% each)
15% 1 to 2 page project proposal
25% midterm progress report (breakdown: 15% 2 to 3 page project progress, 8% in-class midterm presentation, 2% for 1-minute group updates)
45% final project (breakdown: 25% 4 page final project, 16% in-class final presentation, 4% for 1-minute group updates)
5% class participation
Bonus: Up to 2% of the final grade for exceptional class participation.
The page limit does not include references or supplementary materials.
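As a rough illustration of how the weights above combine, the following Python sketch computes a final grade from per-component scores. The names `WEIGHTS` and `final_grade` are hypothetical, scores are assumed to be fractions between 0 and 1, and adding the participation bonus on top of the weighted total (capped at 2 points) is one assumed reading of the bonus rule, not an official formula.

```python
# Component weights in percent, taken from the grading breakdown above.
WEIGHTS = {
    "assignments": 10,
    "proposal": 15,
    "midterm": 25,
    "final_project": 45,
    "participation": 5,
}


def final_grade(scores, bonus=0.0):
    """Weighted total in percent.

    scores: dict mapping each component to a fraction in [0, 1].
    bonus:  extra points for exceptional participation (assumed capped at 2).
    """
    # The listed weights should account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    base = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    # Assumption: the bonus is added to the weighted total, clamped to [0, 2].
    return base + min(max(bonus, 0.0), 2.0)


# Hypothetical scores, for illustration only.
example = {
    "assignments": 0.9,
    "proposal": 0.8,
    "midterm": 0.85,
    "final_project": 0.9,
    "participation": 1.0,
}
print(final_grade(example))
```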
Deliverables
Each student should submit their own copy of each assignment (1, 2, and 3). Students may work collaboratively, but must credit the students they collaborated with. All students should understand all parts of the completed homework.
The research proposal + final project may be done in groups (recommended group size is 2-3, max is 4), and one final project document should be submitted per group.
Please use Slack to submit homework-related questions so that instructors can respond promptly and all students have access to the response. Completed homeworks and final projects should be submitted on Gradescope. If you do not have access to Gradescope by the end of the first week of class, please email an instructor.
Late Policy
A student has a total of 48 late hours for the term (usable on the 3 assignments and the project proposal). For example, if the first homework is submitted 47 hours late and the second homework is submitted two hours late, the student would incur no penalty on the first homework but would score zero on the second, because only one late hour remains. Late submissions are not allowed for the midterm progress report or the final project report.
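The late-hour accounting can be sketched in a few lines of Python. This is only an illustration of the example above: the function name `apply_late_policy` is hypothetical, lateness is assumed to be measured in whole hours, and an over-budget submission is assumed to score zero without consuming any remaining hours.

```python
def apply_late_policy(hours_late_per_submission, budget_hours=48):
    """Return one boolean per submission: True if accepted without penalty."""
    remaining = budget_hours
    accepted = []
    for hours_late in hours_late_per_submission:
        if hours_late <= remaining:
            # Within budget: spend the late hours, no penalty.
            remaining -= hours_late
            accepted.append(True)
        else:
            # Over budget: scores zero. Assumption: the remaining
            # hours are left untouched for later submissions.
            accepted.append(False)
    return accepted


# The example from the policy: 47 hours late, then 2 hours late.
print(apply_late_policy([47, 2]))  # [True, False]
```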
Gen AI Policy
Generative AI tools, such as ChatGPT or other language models, are permitted for use in this course. However, it's essential to adhere to the following:
Attribution is required. Whenever you incorporate output from a generative AI tool, clearly indicate its source and usage. This will not remove credit from your work; it is helpful for us to understand how students may or may not be leveraging these tools.
Assignments & projects are designed for your learning. While AI may be helpful, ensure you fully grasp the concepts behind the work.
Future implications. The foundations you establish in this course will be crucial for success in future research projects, write-ups, presentations, and jobs.
Reading Material
Please use your Cornell netid login for access to assignments:
Academic integrity
Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work. Collaborations are allowed only if explicitly permitted. Violations of the rules (e.g. cheating, copying, non-approved collaborations) will not be tolerated. Respectful, constructive and inclusive conduct is expected of all class participants.
This website is based on Representation Learning for Science