Instead of a final exam, you will do a course project. This will give you some practice doing research. This is a significant component of the course and requires some careful thought and planning. It cannot be left until the end of the semester.
- 9/14: Deadline to send project team names to the instructor. Get together with your team and start brainstorming. The project requires serious thought and planning.
- 10/8 (or sooner): After you have chosen a topic and done some serious thinking and planning, set a time to meet with the instructor about your proposed project, well before the proposal is due. The sooner the better, in order to get instructor feedback and find a viable topic.
- 11/2: Project proposal due. Note: you must get instructor feedback on your project topic prior to this. This proposal is a commitment to exactly what you are working on. This is a helpful planning tool for your team. You should already be working on your project by now; do not expect instructor feedback after the proposal has been submitted.
- 12/14: Project write-up due by 5pm. The style is that of a research paper. Do not send code or exhaustive numeric results of every experiment run. See papers from the conferences below for the style of writing and for what the experiments section typically includes.
- 12/17: Project Presentation during evening Poster Session [instead of Final Exam]
There are no strict guidelines, but please be concise. If you're using LaTeX, aim for at most 5 pages in the NeurIPS format.
The goal of the course project is to give students the opportunity to engage in original machine learning research. Students may work on open problems that have been revealed during our study of Machine Learning, or do a project comparing several different algorithms. Students should strive for the same quality and quantity of results as in publications at the top machine learning conferences. (For examples, see the proceedings of NeurIPS (formerly NIPS), ICML, AISTATS, and ICLR, or the top Data Mining conference, KDD. For other application areas that use machine learning techniques, the top Vision conferences are ICCV and CVPR, and the NLP conferences are ACL, NAACL, and EMNLP. Note that the more closely this goal is met, the more likely the project is to lead to a publication.)
Due to high enrollment, projects must be done in teams of 3 students, not individually.
Types of projects:
- Algorithms and data structures:
  - proposing a new algorithm for an existing problem, and showing good empirical performance through experiments
  - proposing a new data structure for an existing problem, and showing good empirical performance through experiments
  - any hybrid of the above two items with items from the Theory section or the Applications section
- Applications:
  - Designing a machine learning algorithm, motivated by a specific application (or real-world data set), which exhibits good empirical performance on data for that application (or data set).
  - Adapting an existing machine learning algorithm to a new application (with good justification), and showing good empirical performance on data for that application.
  - Running thorough experiments with a variety of algorithms for the same machine learning task (examples include classification, regression, and clustering). Typically multiple data sets should be used for a full comparison, and the experiments should study algorithmic questions by varying different aspects (beyond parameter tuning, which should also be done, following principles learned in class). For example, a k-nearest-neighbor project could experiment with a variety of different techniques for building the NN data structure.
- Theory: original theoretical results for machine learning. Examples include:
  - negative results, or lower bounds for a problem
  - clearly formulating or characterizing a problem that has not been theoretically analyzed before, and providing preliminary analyses, e.g. of an existing algorithm for the problem (such as one used in practice), or of a new algorithm for the problem
  - proposing a new algorithm for an existing problem and proving performance guarantees
  - proposing a new data structure for an existing problem and proving performance guarantees
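To make the k-nearest-neighbor example from the Applications list concrete, here is a minimal sketch of what comparing two techniques for the NN data structure might look like: a brute-force linear scan versus a simple k-d tree. This is only an illustration, not course-provided code. All function names (`brute_force_nn`, `build_kdtree`, `kdtree_nn`, `run_experiment`) are hypothetical, and the data is synthetic and uniformly random, whereas a real project would use multiple real data sets and vary more aspects than this.

```python
import random
import time


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_nn(points, query):
    """Baseline: linear scan over all points."""
    return min(points, key=lambda p: sq_dist(p, query))


def build_kdtree(points, depth=0):
    """Build a k-d tree as nested tuples: (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (
        points[mid],
        axis,
        build_kdtree(points[:mid], depth + 1),
        build_kdtree(points[mid + 1:], depth + 1),
    )


def kdtree_nn(node, query, best=None):
    """Nearest-neighbor search with pruning of subtrees that cannot win."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or sq_dist(point, query) < sq_dist(best, query):
        best = point
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = kdtree_nn(near, query, best)
    # Only visit the far side if the splitting plane is closer than the best so far.
    if diff ** 2 < sq_dist(best, query):
        best = kdtree_nn(far, query, best)
    return best


def run_experiment(n_points=2000, n_queries=200, dim=3, seed=0):
    """Time both methods on synthetic data and check that they agree."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n_points)]
    queries = [tuple(rng.random() for _ in range(dim)) for _ in range(n_queries)]

    start = time.perf_counter()
    brute = [brute_force_nn(points, q) for q in queries]
    brute_time = time.perf_counter() - start

    start = time.perf_counter()
    tree = build_kdtree(points)
    build_time = time.perf_counter() - start

    start = time.perf_counter()
    kd = [kdtree_nn(tree, q) for q in queries]
    kd_time = time.perf_counter() - start

    # Compare distances rather than points, so ties do not count as disagreement.
    agree = all(sq_dist(b, q) == sq_dist(k, q) for b, k, q in zip(brute, kd, queries))
    return {
        "brute_time": brute_time,
        "kd_build_time": build_time,
        "kd_query_time": kd_time,
        "agree": agree,
    }


if __name__ == "__main__":
    print(run_experiment())
```

A fuller study along these lines would likely vary the dimension, the data set size, and the data distribution, and report build time and query time separately, since a structure that is slow to build may still pay off when there are many queries.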
RESOURCES:
Data:
There are MANY data sets, for a variety of problems, available from:
Code:
You do not need to implement algorithms yourself; you can use existing implementations if they are available. You do not have to use Python. Some trusted sources for ML implementations include:
and there are many other sources.
Academic Integrity:
As a reminder about academic integrity, plagiarism on the course project will not be tolerated. At the university level, you should already be well aware of correct practices. Here are some slides on plagiarism, by Prof. Barba.