Participant Instructions

Submissions should follow the format shown in the trial data. Each participant may submit up to 3 runs.

Information on submissions, the evaluation script, and the trial and test data is provided below.

All data will be released via the MLSP GitHub page: https://github.com/MLSP2024/MLSP_Data/

Evaluation Script

For evaluation we will use the same protocol as for the LCP 2021 shared task and for the TSAR2022 shared task. The data has been split into LCP and LS files that will work appropriately with these evaluation scripts.
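
As a rough illustration of what an LCP-style score looks like, the sketch below computes Pearson correlation and mean absolute error, two of the metrics reported in the LCP 2021 shared task, between gold and predicted complexity values. The function names and sample values are hypothetical, and this is not the official evaluation script; the released scripts should be used for actual scoring.

```python
import math


def pearson(gold, pred):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_g * var_p)


def mean_absolute_error(gold, pred):
    """Average absolute difference between gold and predicted values."""
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


# Hypothetical complexity scores in [0, 1]; not taken from the task data.
gold = [0.10, 0.35, 0.60, 0.80]
pred = [0.20, 0.30, 0.55, 0.90]

print("Pearson:", pearson(gold, pred))
print("MAE:", mean_absolute_error(gold, pred))
```

A higher Pearson value and a lower MAE both indicate predictions closer to the gold labels.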


Trial Data

Trial data is now available on the MLSP GitHub for 10 languages: Catalan, English, Filipino, French, German, Italian, Japanese, Portuguese, Sinhala and Spanish. Each language has 30 instances, giving a total of 300 trial instances.

Test Data

All test data is available via the GitHub link above.

The test data is released as unlabelled files, mirroring the format of the trial data. Participant systems should add the labels for LCP to the _lcp files and for LS to the _ls files following the format in the trial data.


Submission Information


Submissions should be made via a Pull Request to the following GitHub repository:

https://github.com/MLSP2024/MLSP_Participants/

Please see the GitHub Readme at the link above for detailed information on how to submit.

Paper Submission

Papers should be submitted through the BEA workshop START system: https://softconf.com/naacl2024/BEA2024 

Please select the track MLSP_SharedTask when submitting your paper.

References

Shared Task Report


@inproceedings{shardlow2024bea,
  title={{The BEA 2024 Shared Task on the Multilingual Lexical Simplification Pipeline}},
  author={Shardlow, Matthew and Alva-Manchego, Fernando and Batista-Navarro, Riza and Bott, Stefan and Calderon Ramirez, Saul and Cardon, Rémi and François, Thomas and Hayakawa, Akio and Horbach, Andrea and Huelsing, Anna and Ide, Yusuke and Imperial, Joseph Marvin and Nohejl, Adam and North, Kai and Occhipinti, Laura and Peréz Rojas, Nelson and Raihan, Nishat and Ranasinghe, Tharindu and Solis Salazar, Martin and \v{S}tajner, Sanja and Zampieri, Marcos and Saggion, Horacio},
  booktitle={Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA)},
  year={2024}
}

Datasets


@inproceedings{shardlow2024readi,
  title={{An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework}},
  author={Shardlow, Matthew and Alva-Manchego, Fernando and Batista-Navarro, Riza and Bott, Stefan and Calderon Ramirez, Saul and Cardon, Rémi and François, Thomas and Hayakawa, Akio and Horbach, Andrea and Huelsing, Anna and Ide, Yusuke and Imperial, Joseph Marvin and Nohejl, Adam and North, Kai and Occhipinti, Laura and Peréz Rojas, Nelson and Raihan, Nishat and Ranasinghe, Tharindu and Solis Salazar, Martin and Zampieri, Marcos and Saggion, Horacio},
  booktitle={Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI)},
  year={2024}
}

MultiLS Framework


@article{north2024multils,
  title={MultiLS: A Multi-task Lexical Simplification Framework},
  author={North, Kai and Ranasinghe, Tharindu and Shardlow, Matthew and Zampieri, Marcos},
  journal={arXiv preprint arXiv:2402.14972},
  year={2024}
}

Spanish and Catalan Datasets


@misc{bott2024multilsspca,
  title={MultiLS-SP/CA: Lexical Complexity Prediction and Lexical Simplification Resources for Catalan and Spanish},
  author={Stefan Bott and Horacio Saggion and Nelson Peréz Rojas and Martin Solis Salazar and Saul Calderon Ramirez},
  year={2024},
  eprint={2404.07814},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}