Dataset & Evaluation
The dataset is available on the task's GitHub repository and on Zenodo. It is split into training, development, and testing sets of 50K, 25K, and 25K examples, respectively. Each set consists of two parts. The first is a CSV file (e.g., train.csv) containing pairs of user IDs (uids) and problem IDs (pids); each uid appears 50 times with 50 different pids in the training set (25 in the development and testing sets). The second is a directory (e.g., train) containing the source code files; each pid in the CSV file is linked to a source code file in the directory. Systems will be evaluated and ranked by the Accuracy metric, and an evaluation script is available on the GitHub repository. Each participant should report their system's accuracy on both the development and testing sets.
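The CSV-plus-directory layout described above can be loaded with a few lines of Python. This is a minimal sketch, not the official loader; the column names "uid" and "pid" and the convention that each pid maps to a file of the same name in the split's directory are assumptions based on the description.

```python
# Sketch of loading one split (e.g., train.csv + train/) into memory.
# Assumes CSV columns named "uid" and "pid", and that each pid names a
# source code file inside the split's directory.
import csv
import os

def load_split(csv_path, code_dir):
    """Return a list of (uid, pid, source_code) triples for one split."""
    examples = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            code_file = os.path.join(code_dir, row["pid"])
            with open(code_file, encoding="utf-8", errors="ignore") as src:
                examples.append((row["uid"], row["pid"], src.read()))
    return examples
```

With the training split this would yield 50,000 triples (1,000 uids with 50 solutions each), and 25,000 for the development and testing splits.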
Note that:
- Participants are NOT allowed to use the development set or any external dataset (labeled or unlabeled) to train their systems.
- Participants may use additional resources such as pre-trained language models, knowledge bases, etc.
- In the testing phase, each participant may make up to three submissions; the best one will be used in the final ranking.
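The Accuracy metric used for ranking is simply the fraction of test examples whose predicted uid matches the gold uid. The official evaluation script on the GitHub repository is authoritative; the sketch below only illustrates the likely computation.

```python
# Accuracy: fraction of examples where the predicted uid equals the gold uid.
# Illustrative only; defer to the official evaluation script for ranking.
def accuracy(gold_uids, predicted_uids):
    assert len(gold_uids) == len(predicted_uids), "prediction count mismatch"
    correct = sum(g == p for g, p in zip(gold_uids, predicted_uids))
    return correct / len(gold_uids)
```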
Dataset Statistics
Users Count                          1,000
Solutions Count                    100,000
Tokens Count                    22,795,141
Whitespaces Count               46,944,461
Unique Tokens                    1,171,991
AVG. Solutions/User                    100
AVG. Tokens/Solution               227.951
AVG. Whitespaces/Solution          469.445
Maximum Tokens in a Solution        10,189
Unique Problems                      6,553
Maximum Solutions/Problem               61
Minimum Solutions/Problem                1
AVG. Solutions/Problem              15.260
AVG. Solutions/Codeforces Index  2,439.024
Median Solutions/Problem                12
Unique Countries                        78
AVG. Solutions/Country           1,282.051
Minimum Tokens in a Solution             3
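The per-unit averages in the table follow directly from the raw counts; a quick arithmetic cross-check (values rounded to three decimals, as in the table):

```python
# Cross-check of the table's averages against its raw counts.
tokens, whitespaces, solutions = 22_795_141, 46_944_461, 100_000
users, problems, countries = 1_000, 6_553, 78

assert round(tokens / solutions, 3) == 227.951       # AVG. Tokens/Solution
assert round(whitespaces / solutions, 3) == 469.445  # AVG. Whitespaces/Solution
assert solutions // users == 100                     # AVG. Solutions/User
assert round(solutions / problems, 3) == 15.260      # AVG. Solutions/Problem
assert round(solutions / countries, 3) == 1282.051   # AVG. Solutions/Country
```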