Projects should be carried out in groups of up to 2 people.
Submissions must not incorporate any copyrighted, proprietary, or closed-source data, code, or content.
Participants may train only on the datasets provided.
Participants may submit new solutions or implementations/modifications of existing work; both are accepted. Submissions must state whether they are original or based on prior research. In the latter case, cite the relevant paper(s).
Each group may submit as many solutions as it wants; there is no limit on the number of solutions per group. The group's top-performing model determines its ranking on the leaderboard and is the one chosen for the final evaluation phase.
Ensure that your solution is fully reproducible. Include any random seeds or initialization details used to ensure consistent results (e.g., torch.manual_seed() or np.random.seed()), and if using a pre-trained model, include the instructions for downloading or specifying the model path.
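As an illustration of the reproducibility rule above, here is a minimal sketch of a seeding helper. The function name `set_seed` and the default seed value are assumptions made for this example, not part of the rules. NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random


def set_seed(seed: int = 42) -> None:
    """Seed every random number generator available in this environment."""
    random.seed(seed)
    # Only affects hash randomization in subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed; nothing to seed.

    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed.


# Call once at the start of the script, before building the model or data loaders.
set_seed(42)
```

Seeding alone may not make GPU training bit-for-bit deterministic, so it helps to also record the seed value and library versions in the README.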
Material to be submitted on Hugging Face:
For each of the four test datasets provided, upload the predictions in a CSV file (do not change the ids of the samples!).
Then compress the submission folder into a .gz file using the Python script provided in the "Submission Info" tab of the Hugging Face competition page.
Rules for the GitHub repo:
All submissions must follow the file and folder structure below:
main.py:
The script must accept the following command-line arguments:
python main.py --test_path <path_to_test.json.gz> --train_path <optional_path_to_train.json.gz>
Behavior:
If --train_path is provided, the script must train the model using the specified train.json.gz file.
If --train_path is not provided, the script should skip training and only generate predictions, using the pre-trained model checkpoints provided.
The output must be a CSV file named testset_<foldername>.csv, where <foldername> is the dataset folder name (e.g., A, B, C, or D).
Ensure the correct mapping between test and training datasets:
Example: If test.json.gz is located in ./datasets/A/, the script must use the pre-trained model that was trained on ./datasets/A/train.json.gz.
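The command-line behavior described above could be sketched as follows. This is only an outline under stated assumptions: the helper names (`parse_args`, `dataset_folder`, `output_csv_path`, `main`) are hypothetical, writing the CSV into `submission/` is an assumption, and the actual training and prediction steps are left as comments.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the two arguments required by the rules."""
    parser = argparse.ArgumentParser(description="Train and/or predict on one dataset.")
    parser.add_argument("--test_path", required=True, help="Path to test.json.gz")
    parser.add_argument("--train_path", default=None, help="Optional path to train.json.gz")
    return parser.parse_args(argv)


def dataset_folder(test_path: str) -> str:
    """Return the name of the folder that contains the test file, e.g. 'A'."""
    return Path(test_path).resolve().parent.name


def output_csv_path(test_path: str, out_dir: str = "submission") -> Path:
    """Build the output path testset_<foldername>.csv (out_dir is an assumption)."""
    return Path(out_dir) / f"testset_{dataset_folder(test_path)}.csv"


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    folder = dataset_folder(args.test_path)
    if args.train_path is not None:
        # Hypothetical step: train on args.train_path and save checkpoints here.
        print(f"Training on {args.train_path} for dataset {folder}")
    else:
        # Hypothetical step: load the checkpoint trained on this dataset's train set.
        print(f"Inference only, using the checkpoint for dataset {folder}")
    print(f"Predictions go to {output_csv_path(args.test_path)}")


# Example call with explicit arguments; in main.py this would be main() with no arguments.
main(["--test_path", "./datasets/A/test.json.gz"])
```

Deriving the folder name from the parent directory of `--test_path` keeps the test set, its checkpoint, and the output file name tied to the same dataset letter.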
Folder and File Naming Conventions
checkpoints/: Directory containing trained model checkpoints. Use filenames such as:
model_<foldername>_epoch_<number>.pth
Example: model_A_epoch_10.pth. Make sure you save at least 5 checkpoints for each model.
source/: Directory for all implemented code (e.g., models, loss functions, data loaders).
submission/: Folder containing the predicted CSV files for the four test sets:
testset_A.csv, testset_B.csv, testset_C.csv, testset_D.csv
logs/: Log files for each training dataset. Include logs of accuracy and loss recorded every 10 epochs.
requirements.txt: A file listing all dependencies and the Python version.
Example:
python==3.8.5
torch==1.10.0
numpy==1.21.0
README.md: A clear and concise description of the solution, including:
Image teaser explaining the procedure
Overview of the method
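For the checkpoint naming and logging conventions listed above, a small helper sketch may be useful. The function names are hypothetical, and picking the checkpoint with the highest epoch number is an assumption of this example; the rules do not say which checkpoint to load.

```python
import re
from pathlib import Path
from typing import Optional


def checkpoint_path(folder: str, epoch: int, ckpt_dir: str = "checkpoints") -> Path:
    """Build a path of the form checkpoints/model_<foldername>_epoch_<number>.pth."""
    return Path(ckpt_dir) / f"model_{folder}_epoch_{epoch}.pth"


def latest_checkpoint(folder: str, ckpt_dir: str = "checkpoints") -> Optional[Path]:
    """Return the checkpoint with the highest epoch for this dataset, or None."""
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return None
    pattern = re.compile(rf"^model_{re.escape(folder)}_epoch_(\d+)\.pth$")
    best_epoch, best_path = -1, None
    for path in directory.iterdir():
        match = pattern.match(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


def should_log(epoch: int, every: int = 10) -> bool:
    """True on epochs 10, 20, 30, ... when epochs are counted from 1."""
    return epoch % every == 0


print(checkpoint_path("A", 10).name)
```

Inside a training loop, `should_log(epoch)` can gate the lines that write accuracy and loss to the file in `logs/`.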