An ELAN-based language documentation corpus management tool

Kwaras ('duck' in Purépecha) integrates WAV files, ELAN annotations and document metadata into a web-based interface for language documentation corpora, allowing immediate access to linguistic annotations and audio recordings. This page contains downloads and instructions for using the tool.

Designed by Russell Horton and developed by Lucien Carroll (UCSD Linguistics).

The code for Kwaras is publicly available at

The following upcoming paper describes the main features of Kwaras:

  • Caballero, Gabriela, Lucien Carroll & Kevin Mach. (to appear). Accessing, managing and mobilizing an ELAN-based language documentation corpus: the Kwaras and Namuti tools. Language Documentation and Conservation.

Downloadable Kwaras installation files (updated: November 13, 2018):

Step 1: Installing Kwaras

Kwaras installation for Mac:

a. Download the kwaras-mac-2.2.1 directory and unzip it.

b. Move the whole folder to your working directory.

c. Double click the file ‘install-macos.COMMAND’ to install the Python library.

Kwaras installation for Windows:

a. Download the kwaras-win-2.2.1 directory and unzip it.

b. Move the whole folder to your working directory.

Step 2: Setting up data directories

After installation is complete, the next step is to export data from ELAN. The export process depends on setting up four main directories of data:

a. Transcriptions: a directory of ELAN (.eaf) files

b. Recordings: a directory of WAV format sound files, minimally containing a WAV file for each of the ELAN files in the transcription directory.

c. Web: contains a ‘css’ folder, a ‘js’ folder and ‘index_wrapper.html’ (these files are found in the ‘web’ folder inside the ‘kwaras’ folder)

d. Corpus: a directory for temporary output files

The files generated by Kwaras after export (an index.html file and clips directory) will go in the Web data directory. This whole directory can be uploaded to a web server.
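The directory layout described above could be prepared with a short script. This is only a sketch: the folder names below are hypothetical labels taken from this page, not names that Kwaras requires, since the actual folders are chosen later in the configuration window.

```python
from pathlib import Path

# Hypothetical folder names; Kwaras lets you select any folders
# in its configuration window.
DATA_DIRS = ("Transcriptions", "Recordings", "Web", "Corpus")


def make_data_dirs(root):
    """Create the four data directories under root and return their paths."""
    root = Path(root)
    paths = {}
    for name in DATA_DIRS:
        path = root / name
        # exist_ok=True leaves directories that already exist untouched.
        path.mkdir(parents=True, exist_ok=True)
        paths[name] = path
    return paths
```

Calling make_data_dirs("my-corpus") would create the four folders inside "my-corpus". The ‘css’ folder, ‘js’ folder and ‘index_wrapper.html’ still need to be copied into the Web directory by hand.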

Step 3: Preparing metadata information

Kwaras lets users optionally incorporate metadata for their annotation files. Kwaras pulls speaker codes either from the ELAN tier names or from a metadata file that users provide. If tier names follow the conventional pattern “word@SPKR”, Kwaras uses the first element as the column name and the second element as the speaker code. For files whose tier names lack the “@SPKR” suffix, Kwaras expects a metadata file in either UTF-8-encoded CSV or XLSX format, with at least the following columns:

a. “File” for the basename (e.g. “tx143”)

b. “Contributor” for speaker codes (e.g. “BFL”)

c. “Format” for the file extension (e.g. “wav”)
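As an illustration of the two routes for speaker codes, the sketch below writes a minimal UTF-8-encoded CSV with the three columns listed above and splits a “word@SPKR” tier name into its two elements. The function names are hypothetical and are not part of Kwaras; the example values are the ones given in the list.

```python
import csv

# Column names as listed above.
COLUMNS = ["File", "Contributor", "Format"]


def write_metadata(path, rows):
    """Write session metadata rows (dicts keyed by COLUMNS) to a UTF-8 CSV."""
    # newline="" is what the csv module expects when writing files.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def split_tier_name(tier_name):
    """Split 'word@SPKR' into (column name, speaker code).

    The speaker code is None when the tier name contains no '@'.
    """
    column, sep, speaker = tier_name.partition("@")
    return column, (speaker if sep else None)
```

For example, write_metadata("metadata.csv", [{"File": "tx143", "Contributor": "BFL", "Format": "wav"}]) produces a one-row file, and split_tier_name("word@BFL") returns ("word", "BFL").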

Paired EAF and WAV files should either have the same basename or the WAV file should be linked media in the EAF file.
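Before exporting, it may help to check the same-basename pairing. The sketch below is a hypothetical helper, not part of Kwaras; it only compares file names and does not read the linked media inside the EAF files, so a file paired through linked media would still be reported.

```python
from pathlib import Path


def unpaired_eafs(transcriptions_dir, recordings_dir):
    """Return basenames of .eaf files that have no same-named .wav file."""
    # Collect WAV basenames; the extension check ignores case (.wav or .WAV).
    wav_names = {
        p.stem
        for p in Path(recordings_dir).iterdir()
        if p.suffix.lower() == ".wav"
    }
    # Basenames are compared exactly, including letter case.
    return sorted(
        p.stem
        for p in Path(transcriptions_dir).iterdir()
        if p.suffix.lower() == ".eaf" and p.stem not in wav_names
    )
```

An empty list means every EAF file has a WAV file with the same basename.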

Step 4: Running the export-corpus function

On macOS, double-click the “export-corpus.COMMAND” file; on Windows, double-click “Kwaras.exe”.

The next step is to configure the export process, which requires selecting a language template*. In the configuration window, complete the following steps:

a. Working File Directory: select the Corpus directory.

b. List of Fields to Export: enter the tier names to extract from the EAF files, separated by a comma and a space, in the order in which the data should be displayed. (Important: tier names must be spelled exactly as they are in the ELAN files being exported.)

c. Directory of Input EAFs: select the ‘Transcriptions’ directory of EAF files.

d. WAV Session Metadata: select the metadata spreadsheet file (must be an XLSX or a UTF-8-encoded CSV).

e. WAV Input Directory: select the ‘Recordings’ directory with WAV files.

f. Web Files Output Directory: select the ‘Web’ directory.

g. HTML Page Title: enter the title to show in the header of the index.html file.

h. HTML div for Navigation: enter HTML code for a navigation bar (optional).

i. Press OK.

A Terminal window will open and display the process. Once completed, open the ‘Web’ folder and click on the ‘index.html’ file to display the corpus in your web browser.
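Because the field list must match the ELAN tier names exactly, a quick check before exporting can catch misspellings. The sketch below reads the TIER_ID attributes from an EAF file (EAF is an XML format) and reports any listed field that is missing. The function names are hypothetical and not part of Kwaras, and treating the part of a tier name before “@” as a match is an assumption about how “word@SPKR” tiers relate to the field list.

```python
import xml.etree.ElementTree as ET


def parse_field_list(text):
    """Split a comma-separated field list into tier names."""
    return [name.strip() for name in text.split(",") if name.strip()]


def tier_ids(eaf_path):
    """Return the set of TIER_ID values found in an ELAN .eaf file."""
    root = ET.parse(eaf_path).getroot()
    return {tier.get("TIER_ID") for tier in root.iter("TIER")}


def missing_fields(field_text, eaf_path):
    """Return the listed fields that match no tier in the EAF file."""
    ids = tier_ids(eaf_path)
    # Assumption: a field also matches the part of a tier name before '@'.
    known = ids | {tier_id.partition("@")[0] for tier_id in ids if tier_id}
    return [name for name in parse_field_list(field_text) if name not in known]
```

For instance, with a file whose tiers are named “Broad” and “Translation@BFL” (invented names), missing_fields("Broad, Translation, Note", path) would return ["Note"].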

* The available language templates were designed for particular language documentation projects, but they offer equivalent capabilities to new users.

Questions or comments? Please email Gabriela Caballero (gcaballero at ucsd dot edu).