SayMoreX, Metadata Editing and Collection Management

New: Beta-release version of SayMoreX for Mac and Windows now available [download] (2019-01-11) 

Developed with funding from the NSF (USA), the ARC Centre of Excellence for the Dynamics of Language (Australia), and the ELDP (SOAS, UK). Please try it out!

Tools for Linguistic Data
The world’s linguistic and cultural diversity is encoded in the 7000 or so distinct languages spoken across the world. With many of these languages currently endangered or threatened, the creation of an enduring record of these languages is of paramount importance. The field of documentary linguistics seeks to create this record by assembling a richly annotated corpus of recordings of observable linguistic behavior, supplemented with descriptions of linguistic structures. The task of language documentation thus requires the researcher to manage a large amount of interlinked data, including raw audio and video recordings, photographs, transcription files, annotation files, lexical databases, responses to experimental stimuli, and field observations. While standards have been developed for archiving and preserving these data, linguistic data management prior to archive deposit is often ad hoc and idiosyncratic, without any widely accepted practice. In particular, no standard tool exists to manage digital files and their associated metadata. At best, current practice is inefficient, resulting in delays prior to archiving and requiring significant additional investment of researcher time. At worst, these bottlenecks lead to indefinite delays, with the result that research products may never be properly archived.

This workshop series addresses this obstacle by developing standardized tools for the management of linguistic data collections. Such tools will facilitate a more robust and reproducible science of language by providing researchers with standard methods to manage data from the point of collection to the point of archive deposit. The aim is to eliminate the collection management bottleneck and thus facilitate greater uptake of language archives. The workshop series will bring together relevant stakeholders, including field linguists who collect data, theoretical linguists who make use of archival linguistic data, experts in data curation, and software developers. In order to encourage broad participation, the three workshops will be scheduled in conjunction with major gatherings of linguistic researchers, including the Linguistic Society of America annual meeting. The outcome of these workshops will be a sustainable plan for the development of a cross-platform, open-source collection management tool. By making data more accessible and better described, this tool will facilitate increased reproducibility of linguistic research. This greater availability of primary language resources will transform not only various subfields of linguistics, but also related fields such as anthropology and social psychology, which rely on careful management of field data. Further, by taking a grassroots, community-driven approach through this series of workshops, we hope to encourage broad adoption of collection management tools by the language documentation community and thus lower the barriers to proper description and archiving of linguistic data. Moreover, by improving the dialogue between language documenters, language archivists, and developers, this project will serve as a model for the development of linguistic software.


  • Gary Holton, University of Hawai‘i at Mānoa
  • Nick Thieberger, University of Melbourne


meacom.linguistics at


This project is supported by US National Science Foundation grant BCS-1648984.