Participants should submit data in the format shown in the trial data. Each participant will be allowed to submit up to 3 runs.
Information on submissions, the evaluation script, trial and test data will be made available below.
All data will be released via the MLSP GitHub page here: https://github.com/MLSP2024/MLSP_Data/
For evaluation we will use the same protocol as the LCP 2021 and TSAR-2022 shared tasks. The data has been split into LCP and LS files that work directly with those evaluation scripts.
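As a rough guide to what that protocol involves: LCP 2021 evaluated complexity predictions with correlation- and error-based metrics (including Pearson's r and mean absolute error), and TSAR-2022 evaluated substitutions with ranking metrics such as accuracy@1. The sketch below shows these three metrics only for orientation; the official evaluation scripts linked from the repositories are authoritative, and the exact metric set used here is an assumption until the evaluation script is released.

```python
import math

def pearson(xs, ys):
    # Pearson correlation between gold and predicted complexity scores.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mae(xs, ys):
    # Mean absolute error between gold and predicted scores.
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

def acc_at_1(gold_substitutions, predictions):
    # Fraction of instances whose top-ranked candidate appears
    # among the gold substitutions (TSAR-style accuracy@1).
    hits = sum(preds[0] in gold
               for gold, preds in zip(gold_substitutions, predictions))
    return hits / len(predictions)
```

Running these on your trial-data predictions before submitting is a cheap sanity check that your output files parse and score as expected.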
Trial data is now available on the MLSP GitHub for 10 languages: Catalan, English, Filipino, French, German, Italian, Japanese, Portuguese, Sinhala and Spanish. 30 instances are available per language, for a total of 300 trial instances.
All test data is available via the GitHub link above.
The test data is released as unlabelled files, mirroring the format of the trial data. Participant systems should add the labels for LCP to the _lcp files and for LS to the _ls files following the format in the trial data.
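A minimal sketch of that labelling step, assuming the files are tab-separated with the label(s) appended as the final column(s) as in the trial data (the exact column layout is an assumption here; always check the trial files themselves):

```python
import csv
import io

def add_predictions(tsv_text, predict):
    """Append a prediction to each row of an unlabelled TSV file.

    tsv_text: contents of an unlabelled _lcp or _ls test file.
    predict:  a user-supplied function mapping a row (list of fields)
              to a label string, e.g. a formatted complexity score.
    """
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row + [predict(row)])
    return out.getvalue()
```

For example, `add_predictions(text, lambda row: "0.5000")` would append a constant complexity score to every row; a real system would replace the lambda with its model's prediction.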
Submissions should be made via a Pull Request to the following GitHub repository:
https://github.com/MLSP2024/MLSP_Participants/
Please see the GitHub Readme at the link above for detailed information on how to submit.
Papers should be submitted through the BEA workshop START system: https://softconf.com/naacl2024/BEA2024
Please select the track MLSP_SharedTask when submitting your paper.
Shared Task Report
Datasets
MultiLS Framework (link)
Spanish and Catalan Datasets