General Premise

Below you will find a detailed description of all the final projects for the course.

In the “Description” section of each project, the primary goal of the project is underlined. Achieving this goal corresponds to the minimal requirements for a passing grade (i.e. 5.5) in the final evaluation, in the absence of any other results.

Every project contains an “Ideas for research directions” section outlining multiple research questions related to the project. Students aiming for higher marks will be required to address at least one of these questions. We want to emphasize that a thorough analysis of a single research direction is much more valuable than superficial results across several ideas. We encourage you to think and act like the researchers you all are.

Examples of potential pitfalls to avoid:


The directions marked as [Challenge 🏆] are considered especially complex and require more effort and technical/theoretical sophistication. Completing them will be highly valued in the grading, but they are not necessary for obtaining excellent marks. Since these challenges more closely resemble actual research questions, we also expect a higher degree of independence on the part of the team. As such, we suggest that only ambitious teams who feel confident about doing so explore them, ideally after having covered some of the other ideas related to the project.

Using Large Datasets

Some projects may involve training models on large datasets (>500k examples). We understand that using all the available data for training and analysis can be infeasible without the appropriate resources. Therefore, we allow you to use any amount of data you can comfortably fit in your disk or RAM space, as long as it is:

We still encourage you to use as much data as possible, and to consider using platforms such as Google Colab and Kaggle to access additional RAM and GPU resources, as well as the 🤗 Datasets library for smart caching and memory management. Moreover, we will strive to provide you with an overview of how the RUG Peregrine cluster can be used to mitigate such problems.

Using less training data will not be penalized in the project evaluation, provided these requirements are satisfied. By contrast, test data should always be used in their entirety.
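As an illustration of the rule above (shrink only the training split, keep the test split whole), the following minimal sketch draws a seeded random subset of training examples using only Python's standard library. The function names `subsample_train` and `make_splits`, the seed, and the sample sizes are hypothetical; random sampling is an assumption made for the example, not a prescribed method, and the course's own requirements for the reduced data take precedence.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subsample_train(train: Sequence[T], n: int, seed: int = 42) -> List[T]:
    """Return a reproducible random subset of n training examples.

    Sampling is without replacement; the fixed seed makes the subset
    regenerable, so its size and seed can be reported.
    """
    if n >= len(train):
        # Nothing to drop: keep the whole training split.
        return list(train)
    rng = random.Random(seed)  # local RNG, leaves the global random state untouched
    return rng.sample(list(train), n)


def make_splits(
    train: Sequence[T], test: Sequence[T], max_train: int, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shrink only the training split; the test split is returned in full."""
    return subsample_train(train, max_train, seed), list(test)


# Hypothetical example: 600k training items, 10k test items, keep 50k for training.
train = list(range(600_000))
test = list(range(10_000))
small_train, full_test = make_splits(train, test, max_train=50_000)
print(len(small_train), len(full_test))  # prints: 50000 10000
```

Using a local `random.Random` instance keeps the sample reproducible without affecting other code that relies on the global `random` module. When working with the 🤗 Datasets library, a comparable result can be obtained with the `shuffle(seed=...)` and `select(...)` methods of a `Dataset`.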

Using AI-powered Writing Assistants

The capabilities of AI writing assistants have recently evolved from simple style and spell checking (e.g. Grammarly) to generating meaningful content from scratch (e.g. OpenAI's ChatGPT, Anthropic's Claude, Meta's LLaMA). Please refer to the official RUG policy on AI in teaching to learn how these tools can and cannot be used in the context of your education.

Here you can find the pages with project descriptions and resources: