June 1-3, 2016, University of Melbourne

This conference will bring together people interested in building better tools and methods to document small languages. While primarily aimed at tool developers and at the linguists using the tools, a major outcome of our work will be increased access for speakers to better records of more languages than is currently the case. 

This conference will help to set the agenda for collaboration on standards and tool development and provide the CoEDL with direction for investment of funds. Attendance will be by invitation and will target those actively working in the area.

Similar workshops and conferences were run in the USA a decade ago, and a Digital Tools Summit in the Humanities was also run at Virginia in 2005.

We will develop and distribute discussion points before the conference, and participants will be appointed to lead discussion in working groups. Each developer will present their software and outline their development plans, noting what kinds of problems they have encountered and what users have said about the software. There will be a process for scribing results and for reporting back from the working groups.

We want to explore innovative tools and methods and identify current problems for fieldwork, recording, transcription, analysis, archiving and accessibility of language material.

Background reading

Emily M. Bender & Jeff Good. 2010. A Grand Challenge for Linguistics: Scaling Up and Integrating Models

Topics for discussion, including but not limited to:

Big Questions

• What are the most pressing technological needs in making better records of the world’s small languages? 

• What efforts are being made by current researchers to address these needs? 

• How can these efforts be coordinated to maximise the possibility of interoperability? 

• What obstacles to more efficient work practices could be overcome by a targeted effort of programming over the next few years?

• What emerging tools or methods can we look to and invest in?

• Without singling out anyone: in the past, large infrastructure projects have developed guidelines and frameworks, sometimes reaching the level of functioning systems, but have ended up without content or users.

• Why are there so few digital repositories for all the material being created by documentation projects?

• Can we identify the successful systems/tools we use and why they are successful? (e.g., OLAC, Toolbox, ELAN)

Theme 1: Archiving, Discovery, (re)Use (theme leader: Linda Barwick)

  • How can we build on the foundation provided by OLAC to maximise discoverability of existing material?

  • How can archived language resources be made more useful in terms of citation, persistent identification, ease of access, and the development of ‘landing pages’ that describe the collections?

  • How can we extend the number of archives and the reach of archives to include more records, especially at-risk legacy records?

  • Archiving software, visualisation of language collections

Theme 2: Workflows, Interoperability (theme leader: Sasha Arkhipov)

  • What is the range of workflows (from recording through to the archive) that are used in language documentation (LD) projects and how can they be improved?

  • Workflow blockages: how much is the lack of interoperability of our tools preventing the development of well-constructed corpora? (Problems: assigning metadata to items; knowing what has been transcribed, annotated, interlinearised; moving from complex multi-tier transcription to interlinearisation and losing part of the transcription, etc.)

  • Interoperability and the outputs of LD (standards for all kinds of material created by LD)

  • Standard formats for complex annotation/IGT

  • Metadata entry tools to help organise collections and prepare them for archiving

Theme 3: Data Enrichment (theme leader: Caroline Jones)

  • Recording and transcribing/annotating recordings (HTK, e.g. MAUS – forced alignment)

  • Eventual automatic transcription

  • Distributed annotation (including crowdsourcing):

      • online systems for annotating page images of notes (archival manuscripts: handwriting recognition)

      • annotating dynamic media

      • interlinearising annotations

  • What emerging tools or methods can we look to and invest in?

  • Increasing scope of recordings (e.g., Aikuma)

  • Delivery of language records for speakers (phone apps, HTML5 services from archives)

  • Dictionary creation and presentation systems (online, app-based)

Theme 4: Corpora, Scale (theme leader: Steven Bird)

  • Corpus development for small languages: what standards should we be adopting or developing for corpora of small languages that may be different to those in use for large languages?

  • What frameworks are there that small textual/media corpora can be placed into for general use (e.g., developing

  • Interfaces, models and technologies for mobile language apps (scaling up recording and delivery)