Submissions must not incorporate any copyrighted, proprietary, or closed-source data, code, or content.
Participants may use only the provided datasets for training.
Participants may submit either new solutions or implementations/modifications of existing work; both are accepted. Submissions must specify whether they are original or based on prior research. In the latter case, please reference the relevant paper(s).
The submission consists of two parts:
1) A valid Pull Request on the challenge GitHub page.
2) A submission to the Hugging Face competition space that hosts the challenge.
The submission name must be identical on both platforms.
Winning models, along with their corresponding code and data, are required to be open-sourced and publicly available after the competition.
Teams may submit as many solutions as they want; there is no restriction on the number of solutions per team. The top-performing model determines the team's ranking on the leaderboard and will be chosen for the final evaluation phase.
Ensure that your solution is fully reproducible. Include any random seeds or initialization details used to ensure consistent results (e.g., torch.manual_seed() or np.random.seed()). If using a pre-trained model, include instructions for downloading it or specifying the model path.
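As a minimal illustration, seeding might look like the following sketch (it assumes NumPy; the commented lines apply only if PyTorch is used):

```python
import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    """Seed all relevant random number generators for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    # If PyTorch is used, seed it as well:
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)

# Re-seeding yields identical random draws:
set_seed(0)
first = np.random.rand(3)
set_seed(0)
second = np.random.rand(3)
assert (first == second).all()
```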
Material to be submitted on Hugging Face:
- For each of the four test datasets provided, upload the predictions obtained as a CSV file (do not change the ids of the samples!).
Then compress the submission folder into a .gz file using the Python script provided in the "Submission Info" tab on the Hugging Face competition page.
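A minimal sketch of writing a prediction file while preserving the original sample ids (the column names `id` and `prediction` are assumptions here; use whatever format the competition page specifies):

```python
import csv
import io

def write_predictions(ids, preds, out_file):
    """Write predictions to CSV, preserving the original sample ids and order.

    Column names are illustrative; match the format required by the challenge.
    """
    writer = csv.writer(out_file)
    writer.writerow(["id", "prediction"])
    for sample_id, pred in zip(ids, preds):
        writer.writerow([sample_id, pred])

# Example usage with an in-memory buffer:
buf = io.StringIO()
write_predictions([101, 102], [0.7, 0.2], buf)
```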
Rules for the GitHub submission:
All submissions must follow the file and folder structure below:
main.py:
The script must accept the following command-line arguments:
python main.py --test_path <path_to_test.json.gz> --train_path <optional_path_to_train.json.gz>
- Behavior:
- If --train_path is provided, the script must train the model using the specified train.json.gz file.
- If --train_path is not provided, the script should only generate predictions using the pre-trained model checkpoints provided.
- The output must be a CSV file named as testset_<foldername>.csv
Here, <foldername> corresponds to the dataset folder name (e.g., A, B, C, or D).
- Ensure the correct mapping between test and training datasets:
- Example: If test.json.gz is located in ./datasets/A/, the script must use the pre-trained model that was trained on ./datasets/A/train.json.gz.
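The command-line interface and the test-to-train mapping above can be sketched as follows (`parse_args` and `dataset_folder` are illustrative names, not required by the rules):

```python
import argparse
from pathlib import Path

def parse_args(argv=None):
    """Parse the two command-line arguments required by the challenge."""
    parser = argparse.ArgumentParser(description="Challenge entry point")
    parser.add_argument("--test_path", required=True,
                        help="Path to test.json.gz")
    parser.add_argument("--train_path", default=None,
                        help="Optional path to train.json.gz; if omitted, "
                             "run inference with saved checkpoints only")
    return parser.parse_args(argv)

def dataset_folder(test_path):
    """Derive the dataset folder name, e.g. ./datasets/A/test.json.gz -> 'A'."""
    return Path(test_path).parent.name

# Example: build the required output filename from the test path.
args = parse_args(["--test_path", "./datasets/A/test.json.gz"])
folder = dataset_folder(args.test_path)
output_csv = f"testset_{folder}.csv"  # required output name
```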
Folder and File Naming Conventions
- checkpoints/: Directory containing trained model checkpoints. Use filenames such as:
model_<foldername>_epoch_<number>.pth
- Example: model_A_epoch_10.pth
Save at least 5 checkpoints for each model.
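The required checkpoint naming convention can be produced with a small helper (a sketch; the commented save call assumes PyTorch):

```python
def checkpoint_name(folder: str, epoch: int) -> str:
    """Build a checkpoint path following model_<foldername>_epoch_<number>.pth."""
    return f"checkpoints/model_{folder}_epoch_{epoch}.pth"

path = checkpoint_name("A", 10)
# With PyTorch, the checkpoint would then be saved as, e.g.:
# torch.save(model.state_dict(), path)
```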
- source/: Directory for all implemented code (e.g., models, loss functions, data loaders).
- submission/: Folder containing the predicted CSV files for the four test sets:
testset_A.csv, testset_B.csv, testset_C.csv, testset_D.csv
- logs/: Log files for each training dataset. Include logs of accuracy and loss recorded every 10 epochs.
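Logging accuracy and loss every 10 epochs could be done with a helper like this (the exact line format is an assumption, not prescribed by the rules):

```python
import io

def log_metrics(epoch, accuracy, loss, log_file):
    """Append accuracy/loss to the log file every 10 epochs."""
    if epoch % 10 == 0:
        log_file.write(f"epoch={epoch} accuracy={accuracy:.4f} loss={loss:.4f}\n")

# Example usage with an in-memory log:
log = io.StringIO()
for epoch in range(1, 31):
    log_metrics(epoch, 0.9, 0.1, log)
# log now contains entries for epochs 10, 20, and 30
```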
- requirements.txt: A file listing all dependencies and the Python version. Example:
python==3.8.5
torch==1.10.0
numpy==1.21.0
- README.md: A clear and concise description of the solution, including:
- A teaser image illustrating the procedure
- Overview of the method