Each group project involves the following 4 items:
All deadlines will be provided on Nestor.
The labs of course weeks 6 and 7 are dedicated to supporting your project work. At least 2 representatives of each group must attend the lab and report their progress to the instructors (e.g. by preparing a couple of slides or printing a 1-page handout). You'll receive feedback and advice on the spot. Here are some ideas for points to discuss in your progress report:
Division of roles among team members
Current literature review stage: which sources have you read, and which did you find most useful for proceeding with your experiments?
Current experimental stage: which experiments have you conducted so far, and what are the preliminary results, if available?
Which research direction do you plan on taking beyond the shared mandatory part? Do you plan to work on the challenge?
Provide a rough (non-exhaustive) time plan for every member of the team, covering the weeks leading up to the submission of the report.
Complete all sections of the report template and submit the final PDF version through Nestor.
Length: between 5 and 8 pages (not counting references and appendices).
Note: You don’t need to fill the maximum number of pages to get a good grade. Substance, clarity and conciseness are more important than length.
Create a GitHub repository with all relevant data & code for the project. Add a link to the repository in the written report.
Students are encouraged to collaborate using GitHub to version their code, but it is not mandatory (i.e. you can simply upload everything at the end if you prefer).
Data:
Any newly created datasets (including new annotations of existing datasets) should also be hosted on the repository, unless they are huge and easy to regenerate with a script. Important: Project 1 (A Study in Post-Editing Stylometry) makes use of data that should not be shared publicly. If you build upon that data, make sure not to share it in a public GitHub repository or anywhere else.
For existing datasets, only provide links to the original sources.
Include output files for all reported numbers in the report (e.g. output logs from running evaluation scripts)
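One way to keep output files in sync with the reported numbers is to have every evaluation run write its metrics to disk. The sketch below is only an illustration, assuming Python and JSON output; the function name `save_metrics`, the `outputs` folder and the metric names are hypothetical and not prescribed by the course.

```python
import json
from pathlib import Path


def save_metrics(metrics: dict, run_name: str, out_dir: str = "outputs") -> Path:
    """Write the metrics of one evaluation run to <out_dir>/<run_name>.json."""
    out_path = Path(out_dir) / f"{run_name}.json"
    # Create the output folder on first use so the script runs on a fresh clone.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        # Sorted keys keep the files stable, which makes diffs in Git readable.
        json.dump(metrics, f, indent=2, sort_keys=True)
    return out_path
```

Calling something like `save_metrics({"accuracy": acc}, "baseline_dev")` at the end of an evaluation script would leave one file per reported run, which can then be committed next to the code.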
Code:
Add all code for preprocessing, training, feature extraction, predicting, evaluating, analyzing, etc.
You can use Jupyter Notebooks for analysis and visualization, but the code must be modular, well-commented and well-structured. If you have no experience with this, the Google Style Guide is a good starting point for docstrings and comments.
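As an illustration of what a small, documented function could look like, here is a minimal sketch with a docstring in the Args/Returns/Raises layout used by the Google Python Style Guide. It assumes a Python project; the function `token_frequencies` is a made-up example and not part of any project.

```python
from collections import Counter


def token_frequencies(tokens: list[str], lowercase: bool = True) -> dict[str, float]:
    """Computes the relative frequency of each token.

    Args:
        tokens: The tokens of one document, in order.
        lowercase: Whether to lowercase tokens before counting.

    Returns:
        A dict mapping each distinct token to its share of all tokens.

    Raises:
        ValueError: If `tokens` is empty.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    if lowercase:
        tokens = [t.lower() for t in tokens]
    counts = Counter(tokens)
    total = len(tokens)
    # Divide each raw count by the document length to get a relative frequency.
    return {tok: n / total for tok, n in counts.items()}
```

Functions like this can live in a plain `.py` module and be imported from a notebook, which keeps the notebook itself focused on analysis and plots.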
GitHub README:
The most important part of your repository! It should tell us exactly how to use your code and reproduce your results step-by-step.
Include at least the following sections:
How to install all dependencies
How to preprocess the data so it is in the correct format
How to train the models on the data
How to use one of your already trained models to predict unseen data
How to evaluate the output
Installing dependencies: pip install -r requirements.txt
To check if everything works as it should: create a new Python virtual environment, install the dependencies and follow along with your own README file. You can ask another student group to test (help each other)!
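The README steps listed above are easier to document when each one maps to a single command. The following is a hedged sketch of such an entry point using Python's standard `argparse` module; the script name `main.py`, the subcommand names and all flags are assumptions chosen for illustration, and the actual step logic is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser with one subcommand per README step."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Hypothetical project pipeline."
    )
    sub = parser.add_subparsers(dest="command", required=True)

    # Step: preprocess the data so it is in the correct format.
    p = sub.add_parser("preprocess", help="convert raw data to the model input format")
    p.add_argument("--input", required=True)
    p.add_argument("--output", required=True)

    # Step: train the models on the data.
    p = sub.add_parser("train", help="train a model on preprocessed data")
    p.add_argument("--data", required=True)
    p.add_argument("--model-out", required=True)

    # Step: use an already trained model to predict unseen data.
    p = sub.add_parser("predict", help="predict labels for unseen data")
    p.add_argument("--model", required=True)
    p.add_argument("--input", required=True)
    p.add_argument("--output", required=True)

    # Step: evaluate the output against gold labels.
    p = sub.add_parser("evaluate", help="score predictions against gold labels")
    p.add_argument("--predictions", required=True)
    p.add_argument("--gold", required=True)
    return parser
```

With a layout like this, the README can show one line per step, for example `python main.py train --data data/train.tsv --model-out models/baseline.bin` (paths are placeholders), and another group can follow it verbatim in a fresh virtual environment.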
Each team delivers an oral presentation at the Final Project Fair (April 14th, 2022 between 9:00 and 13:00):
Time: 10 minutes + 3 minutes for the Q&A session. We expect everyone's active participation during all Q&A sessions.
Presenter(s): choose 1 or 2 team members to deliver the presentation. All team members must be present in person and ready to answer the audience's questions during the Q&A.
Format: submit the PDF version of your presentation via Nestor by 8:00 on the morning of the presentation. We will download the files to the PC that morning, before the presentations begin. See Nestor for details on how to name the file.
Style: do not get lost in the details; 10 minutes is short! Use the time to get the main message across and to spark an interesting discussion.
[Subject to change. If needed, changes will be communicated in due time before the deadline]
Clarity (2): structure, coherence & language of the final writeup
Introduction (2): motivation & previous work, research questions
Method (3): description of models, soundness of experiments, choice of baselines
Results (3): overview, performance, discussion & analysis
Reproducibility (2): quality and comprehensiveness of GitHub code & README
Presentation (2): quality of the presentation, answers to questions from the audience