GOLD
Great Old List of Datasets
As part of our community research project Instruction Tuning Multilingual, we invited members of the community to contribute to GOLD: Great Old List of Datasets. Beyond identifying an initial set of datasets for the project, the list is valuable in its own right.
We are happy to share GOLD publicly with our open science community as a comprehensive (and ever-growing) list of NLP datasets in all languages. We hope you will use it to inform your own research. Because GOLD is a collaborative document, we continue to invite you to propose edits and additions that support your own work and that of fellow community members.
Thanks to the contributors!
Aisha Alaagib
Ajinkya Mulay
Alham Fikri Aji
Alon Albalak
Jay Gala
Joseph Imperial
Kotti Sasikanth
Emmanuel Akanji
Oluwa Dunsin
Pratik Mehta
Ravi Deedwania
Rishit Dholakia
Ruchit Rawal
M Saiful Bari
Shivalika Singh
Sudipta Ghosh
Wei-Yin Ko
Yada Pruksachatkun