General Premise
In the following, you will find a detailed description of all the final projects for the course.
The primary goal of each project is underlined in its “Description” section. This corresponds to the minimal requirements for obtaining a passing grade (i.e. 5.5) at the final evaluation, in the absence of any other results.
Every project contains an “Ideas for research directions” section in which multiple research questions related to the project are outlined. Students aiming for higher marks will be required to tackle at least one of these questions. We want to emphasize that a thorough analysis of a single research direction is much more valuable than superficial results across several ideas. We encourage you to think and act like the researchers you all are.
Examples of potential pitfalls to avoid:
Broad evaluation without hypothesis: A shallow comparison of the performance of multiple models on a task, possibly using multiple evaluation metrics that do not agree with each other, is not very interesting from a research perspective. Research questions should generally be the main driver for all tests, and they need to be clearly stated in the report before the experiments. For example, suppose your question is whether encoder-decoder models can be used for sequence tagging, a task for which encoder-only models are typically used. In this case, comparing some encoder-decoder and encoder-only models is justified. Similarly, comparing the same model architecture with different parameter counts might be reasonable if your ultimate goal is to show that model capacity is/isn't important for performance on a task.
Comparing apples and oranges: You will be asked to make sure that the results of an experimental evaluation are comparable with each other and (as much as possible) with the ones reported in the references you compare against. This includes, among other things: ensuring that test splits are the same across all evaluations, that there is no test data leakage during model training, and that the metrics used have valid and matching configurations.
Presenting experimental results without comments: While experiments are an important part of the project, their purpose is to provide evidence for or against the initial hypotheses under investigation. For this reason, discussing how experimental results relate to the initial research questions is very important, and results should not be presented without adequate contextualization.
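As a minimal illustration of the second pitfall, a simple overlap check between splits can catch accidental test data leakage before training. This is only a sketch with hypothetical variable names; in practice you would run it on whatever field identifies an example (raw text, document ID, etc.):

```python
def check_leakage(train_examples, test_examples):
    """Return the set of examples appearing in both splits (should be empty)."""
    return set(train_examples) & set(test_examples)

# Hypothetical toy splits for illustration:
train = ["the cat sat", "dogs bark", "hello world"]
test = ["hello world", "a new sentence"]

leaked = check_leakage(train, test)
print(sorted(leaked))  # a non-empty result signals test data leakage
```

For near-duplicate leakage (e.g. the same sentence with different whitespace or casing), exact set intersection is not enough, and you would normalize the examples before comparing.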
Challenges
The directions marked as [Challenge 🏆] are considered especially complex and require more effort and technical/theoretical sophistication. Their completion will be highly valued in the grading, but it is not necessary for obtaining excellent marks. Since these challenges more closely resemble actual research questions, we also expect a higher degree of independence on the part of the team. As such, we suggest that only ambitious teams who feel confident about doing so explore them, and ideally only after having covered some of the other ideas related to the project.
Using Large Datasets
Some projects may involve the training of models on large datasets (>500k examples). We understand that using all the available data for training and analysis can be unfeasible without the appropriate resources. Therefore, we allow you to use any amount of data you can comfortably fit in your disk or RAM space, as long as it is:
documented in your report, alongside your hardware limitations.
at least 50k examples, so that valid insights can be drawn from your experiments.
We still encourage you to use as much data as possible and to consider using e.g. Google Colab and Kaggle to get access to additional RAM and GPU resources, and the 🤗 Datasets library for smart caching and memory management. Moreover, we will strive to provide you with an overview of how the RUG Peregrine cluster can be used to mitigate such problems.
Using less training data won't be penalized in the project evaluation as long as these requirements are satisfied. Test data, by contrast, should always be used in its entirety.
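If the full dataset does not fit in RAM, one way to draw a uniform subset (e.g. the 50k-example minimum) in a single streaming pass is reservoir sampling, which only keeps the sampled subset in memory. This is a generic sketch, not tied to any particular project dataset; the file name in the commented usage is hypothetical:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from an iterable of
    unknown or very large size, using O(k) memory (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace an existing item with probability k / (i + 1)
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Hypothetical usage: cap a 500k-line file at 50k examples without
# loading it all into memory.
# with open("train.jsonl") as f:
#     subset = reservoir_sample(f, 50_000)

subset = reservoir_sample(range(1_000_000), 5)
print(len(subset))  # 5
```

Fixing the seed keeps the subset reproducible across runs, which matters when documenting your data selection in the report.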
Using AI-powered Writing Assistants
AI writing assistants' capabilities have recently evolved from simple style and spell-checking (e.g. Grammarly) to generating meaningful content from scratch (e.g. OpenAI's ChatGPT, Anthropic's Claude, Meta's LLaMA). Please refer to the official RUG policy on AI in teaching to learn how these tools can and cannot be used in the context of your education.
Here you can find the pages with project descriptions and resources: