In the following, you will find a detailed description of all the final projects for the course.
The main goal of each project is underlined in its “Description” section. Meeting this goal is the minimum requirement for a passing grade at the final evaluation (i.e. 5.5), in the absence of any other results.
Every project contains an “Ideas for research directions” section in which multiple research questions related to the project are outlined. Students aiming for higher marks will be required to address at least one of these questions. We want to emphasize that a thorough analysis of a single direction is more valuable than superficial results spread out across different ideas. We encourage you to think and act like the researchers you all are.
The directions marked as [Challenge 🏆] are considered especially complex and require more effort and technical or theoretical sophistication. Completing them will be highly valued in the grading, but they are by no means necessary for obtaining excellent marks. Since these challenges more closely replicate actual research questions, we also expect a higher degree of independence on the part of the team. We therefore recommend them only to ambitious teams that feel confident about tackling them, ideally after having covered some of the other ideas related to the project.
The training portions of the datasets in Projects 3 and 5 are very large (>500k examples). Since we understand that using all the available data for training and analysis can be infeasible without the appropriate resources, we allow you to use any amount of data that comfortably fits on your disk or in your RAM, as long as it is:
Documented in your report, alongside your hardware limitations.
At least 50k examples, so that you can draw valid insights from your experiments.
We still encourage you to use as much data as possible. Consider using, for example, Google Colab or Kaggle to get access to additional RAM and GPU resources, and the 🤗 Datasets library for smart caching and memory management.
Using less data will not be penalized in the project evaluation, provided the two requirements above are satisfied.
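As an illustration of how a reduced training set could be drawn without loading the full data into memory, here is a minimal sketch using reservoir sampling over a stream of examples. It is not part of the course tooling: the function name `reservoir_sample`, the `seed` parameter, and the `MIN_EXAMPLES` constant are hypothetical names chosen for this example, and the input is assumed to be any Python iterable (for instance, a file read line by line).

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

# Lower bound on the subset size, taken from the guidelines above.
MIN_EXAMPLES = 50_000


def reservoir_sample(examples: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw a uniform random subset of k examples from a stream.

    Only k examples are held in memory at any time, so the full
    dataset never has to fit in RAM. If the stream contains fewer
    than k examples, all of them are returned.
    """
    if k < MIN_EXAMPLES:
        raise ValueError(f"k must be at least {MIN_EXAMPLES}, got {k}")

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir: List[T] = []
    for i, example in enumerate(examples):
        if i < k:
            # Fill the reservoir with the first k examples.
            reservoir.append(example)
        else:
            # Keep the new example with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = example
    return reservoir


if __name__ == "__main__":
    # Stand-in for a large training split of 600k examples.
    subset = reservoir_sample(range(600_000), k=50_000, seed=42)
    print(len(subset))  # 50000
```

Reporting the subset size and the seed alongside your hardware limitations would make the reduced training set straightforward to reproduce.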
Here you can find the pages with project descriptions and resources: