Datathon@IndoML 2023

TLDR: Important dates and steps


Announcement:

Check this space for further updates!

Welcome to Datathon@IndoML 2023. As in previous years, the Datathon will be held in conjunction with IndoML 2023. We invite participation from students as well as early-career professionals. Selected teams will also be invited to IndoML 2023 to present their solutions to leading machine learning researchers from around the world, from both industry and academia.

Task

Intent detection is commonly treated as a classification task in conversational systems. In this year's Datathon@IndoML, we pose the 'intent recognition' task as a few-shot multi-class classification problem. The training data contains 150 classes, each with 15 utterances. A blind test set is available for testing your model; labels for the test utterances are not provided to participants. To evaluate your model's performance, please proceed to our Codalab page: go to the "Participate" tab, navigate to the "submit/view results" link, and submit to the "Final" phase page.
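As a starting point for the few-shot setting (150 intents, 15 utterances each), a very simple baseline is nearest-centroid classification over bag-of-words features. The sketch below is illustrative only and uses no external libraries; all names are our own, not part of the competition starter code.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Minimal whitespace tokenizer (a real system would do better)."""
    return text.lower().split()


class CentroidIntentClassifier:
    """Few-shot baseline: average the bag-of-words counts of each
    intent's utterances, then predict the intent whose centroid
    overlaps most with the input utterance."""

    def __init__(self):
        self.centroids = {}

    def fit(self, utterances, labels):
        sums = defaultdict(Counter)
        counts = Counter()
        for text, label in zip(utterances, labels):
            sums[label].update(tokenize(text))
            counts[label] += 1
        # Per-class mean token counts (the "centroid" of each intent).
        self.centroids = {
            label: {tok: c / counts[label] for tok, c in bag.items()}
            for label, bag in sums.items()
        }

    def predict(self, utterance):
        tokens = Counter(tokenize(utterance))

        def score(label):
            centroid = self.centroids[label]
            return sum(centroid.get(tok, 0.0) * n for tok, n in tokens.items())

        return max(self.centroids, key=score)
```

With only 15 utterances per class, even such a simple lexical-overlap baseline gives a sanity-check score before moving to fine-tuned neural models.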

Please note that you may skip Phase 1 (the development phase) and submit directly to the final phase from October 2.

Please go through this tutorial to learn how to fine-tune a pre-trained model on the training data.
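The tutorial covers fine-tuning proper; as a library-free stand-in, the sketch below trains only a linear classification head over fixed features, which mirrors the common first step of fine-tuning with a frozen pre-trained encoder. The hashing-trick "encoder" and all names here are illustrative assumptions, not the tutorial's actual code.

```python
import hashlib
from collections import defaultdict

DIM = 1 << 20  # hashed feature dimension


def featurize(text):
    """Fixed 'encoder': hash each token into a sparse vector.
    Stands in for frozen pre-trained embeddings."""
    vec = defaultdict(float)
    for tok in text.lower().split():
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    return vec


def train_head(data, labels, epochs=10):
    """Multi-class perceptron over the frozen features: analogous to
    fine-tuning only the classification head on top of an encoder."""
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for text, gold in data:
            feats = featurize(text)
            pred = max(
                weights,
                key=lambda L: sum(weights[L][i] * v for i, v in feats.items()),
            )
            if pred != gold:  # mistake-driven update
                for i, v in feats.items():
                    weights[gold][i] += v
                    weights[pred][i] -= v
    return weights


def predict(weights, text):
    feats = featurize(text)
    return max(
        weights,
        key=lambda L: sum(weights[L][i] * v for i, v in feats.items()),
    )
```

Fine-tuning the full encoder (rather than just the head) usually helps further, which is what the tutorial's pre-trained-model setup is for.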

Competition timeline

9/08/23 -- Datathon starts.

Aug-Sept -- Tutorials.

14/9/23 -- The surprise dataset will be released and the evaluation leaderboard will open.

September (2nd half) -- Ask Me Anything (AMA) session (Open QA with the organizers).

2/10/2023 -- New dataset will be released for final evaluation.

15/10/2023 -- Competition ends.

16/10/2023 -- Top teams will be announced.

22/10/2023 -- Code and Report submission for the top teams.

21/12/23 - 23/12/23 -- IndoML 2023; top teams will be invited to IndoML 2023 to present their work, and the final results will be declared.

Evaluation

To be eligible for the prizes, teams must submit their code, implementation details, and a 1-page report (format to be provided) explaining their solution.

During the competition, a surprise dataset will be released so that participating teams can evaluate themselves and fine-tune their models for the domain adaptation task. The submitted models will be tested on the new dataset for the final evaluation.

Note: The highest performance metric is not the only criterion for deciding the winners. Teams will be judged on overall performance, the innovativeness of the proposed solution, as well as new findings, if any. The final decision on the winners will be made by the judges at IndoML 2023. Prizes will be awarded in multiple categories.

Competition guidelines
