Dataset & Evaluation

The dataset is available on the task's GitHub repository and on Zenodo. It is split into training, development, and testing sets with 50K, 25K, and 25K examples, respectively. Each set consists of two parts. The first is a CSV file (e.g., train.csv) containing pairs of User IDs (uids) and Problem IDs (pids); each uid appears with 50 distinct pids in the training set and with 25 in each of the development and testing sets. The second is a directory (e.g., train) containing the source code files, with each pid in the CSV file linked to one source code file in the directory. Systems will be evaluated and ranked by accuracy, and an evaluation script is available on the GitHub repository. Each participant should report their system's accuracy on both the development and testing sets.
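To make the layout concrete, here is a minimal Python sketch of how a split might be loaded and how accuracy is computed. The column names uid and pid, and the assumption that each source file is named after its pid, are illustrative guesses; the actual file layout and scoring logic are defined by the data and the official evaluation script on the GitHub repository.

```python
import csv
import os

def load_split(csv_path, code_dir):
    """Read one split into (uid, pid, source_code) triples.

    Assumes the CSV has 'uid' and 'pid' columns and that each pid maps
    to a file of the same name in code_dir -- both are guesses about
    the layout; check the repository for the actual naming scheme.
    """
    examples = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            src_path = os.path.join(code_dir, row["pid"])  # hypothetical file name
            with open(src_path, encoding="utf-8") as src:
                examples.append((row["uid"], row["pid"], src.read()))
    return examples

def accuracy(gold_uids, pred_uids):
    """Accuracy: the fraction of solutions attributed to the correct uid."""
    correct = sum(g == p for g, p in zip(gold_uids, pred_uids))
    return correct / len(gold_uids)
```

A system that outputs one predicted uid per test solution would then be scored as accuracy(gold_uids, pred_uids).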

Note that:

  • Participants are NOT allowed to use the development set or any external dataset (labeled or unlabeled) to train their systems.

  • Participants may use additional resources such as pre-trained language models and knowledge bases.

  • In the testing phase, each participant may make up to three submissions; the best one will be used in the final ranking.

Dataset Statistics

  Statistic                          Value
  ---------------------------------  ----------
  Users Count                        1,000
  Solutions Count                    100,000
  Tokens Count                       22,795,141
  Whitespaces Count                  46,944,461
  Unique Tokens                      1,171,991
  AVG. Solutions/User                100
  AVG. Tokens/Solution               227.951
  AVG. Whitespaces/Solution          469.445
  Maximum Tokens in a Solution       10,189
  Minimum Tokens in a Solution       3
  Unique Problems                    6,553
  Maximum Solutions/Problem          61
  Minimum Solutions/Problem          1
  AVG. Solutions/Problem             15.260
  Median Solutions/Problem           12
  AVG. Solutions/Codeforces Index    2,439.024
  Unique Countries                   78
  AVG. Solutions/Country             1,282.051
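The averages above follow directly from the totals (e.g., AVG. Tokens/Solution = 22,795,141 / 100,000 ≈ 227.951). The sketch below shows how such per-corpus figures can be derived from per-solution counts; it uses plain whitespace splitting as the tokenizer, which is an assumption, so it will not necessarily reproduce the table's token counts exactly.

```python
def corpus_stats(solutions):
    """Compute token/whitespace statistics for a list of source-code strings.

    Whitespace splitting stands in for the (unspecified) tokenizer used
    to produce the official statistics.
    """
    token_counts, ws_counts, vocab = [], [], set()
    for src in solutions:
        tokens = src.split()
        token_counts.append(len(tokens))
        ws_counts.append(sum(ch.isspace() for ch in src))
        vocab.update(tokens)
    n = len(solutions)
    return {
        "Solutions Count": n,
        "Tokens Count": sum(token_counts),
        "Whitespaces Count": sum(ws_counts),
        "Unique Tokens": len(vocab),
        "AVG. Tokens/Solution": sum(token_counts) / n,
        "AVG. Whitespaces/Solution": sum(ws_counts) / n,
        "Maximum Tokens in a Solution": max(token_counts),
        "Minimum Tokens in a Solution": min(token_counts),
    }
```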