Project grading is based on a single project, completed individually or in a team of at most two students.
There will be a series of benchmarks on various types of datasets:
image classification
language modeling
reinforcement learning
image generation
speech/audio analysis
medical data analysis
video analysis
etc.
For each, there is a target score, derived from the highest score obtained by students in previous years. The project's objective is to surpass this score.
Students are encouraged to review solutions from previous years and the current literature, and to attempt any means of improving the score. Grading is based on the extent of score improvement (at least matching the previous year's score), the quality of experimentation (an ablation study on the introduced elements), and the quality of the report describing the solution. Bonus points are awarded for innovative solutions and for non-trivial discoveries/analyses that reveal new insights about the dataset, the solutions, etc.
Grading:
1 point starting base
Benchmark Performance:
1 point: for achieving a score approximately equal to last year's without introducing new ideas (either one's own or from recent literature)
2 points: for slightly improving the score, or for enhancing another metric (memory usage, training time, inference speed) over the previous year by introducing new ideas
3 points: for substantially improving the score by introducing one's own new ideas or applying advanced methods from recent literature
Ablation Study:
1 point: for basic comparison between the baseline and new solution
2 points: for more detailed comparisons on various metrics such as speed, compute, memory usage
3 points: for extended comparisons in more scenarios, types of data, data regimes, evaluations of robustness, interpretability, and other more complex studies
Solution Report:
1 point: for a simple description of related work, the new solution, and the results
2 points: for detailed related work, well-written solution description, presentation of results, and an analysis of them
3 points: for a well-written, high-quality paper, with academic-publication-level writing and structure in all aspects: related work, method description, theoretical analysis, results, and analysis of results.
If a benchmark does not yet have a previous score, the first generation is tasked with thoroughly documenting the problem and the existing literature, and with introducing an initial solution. The performance of the solution will account for only 1 point (compared to a teacher-provided baseline), and the other two categories will each weigh a third more (1 point becomes 1.33, 2 points become 2.67, 3 points become 4).
The grading will be done during the last week; you are invited to ask questions anytime you need to!
To ensure fair evaluation, all benchmarks from previous years will have two scores: standard and open. The standard score is obtained using free online training resources (Kaggle, Colab) without pre-training the model or using additional data. The open score can be achieved using any computational resources and additional data, provided they are made available to students in subsequent years.
Students are free to compete in the standard or open category, or both. The Benchmark Performance grade will be based on the most favorable score for the student.
Students may suggest new datasets or Kaggle / CLEF / AICrowd competitions to work on, which will then form a new benchmark for the following years.
The current project benchmarks are listed below:
TBA