Great Old List of Datasets

As part of our community research project Instruction Tuning Multilingual, we invited members of the community to contribute to GOLD: Great Old List of Datasets. Beyond identifying some initial datasets to use in the project, the list is valuable in its own right.

We are happy to share GOLD publicly with our open science community as a comprehensive (and ever-growing) list of NLP datasets in all languages. We hope you will use it to inform your own research. Because GOLD is a collaborative document, we continue to invite you to propose edits and additions that support your own work and that of fellow community members.

Thanks to the contributors!

GOLD: Great Old List of Datasets