NCHLT SPEECH CORPUS

for South Africa's eleven official languages: subsidiary resources

The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa.

The main purpose of this site is to serve as a repository for the data partition lists and pronunciation dictionaries that were used in the experiments reported in the official corpus paper: E. Barnard, M. H. Davel, C. van Heerden, F. de Wet and J. Badenhorst, “The NCHLT Speech Corpus of the South African languages,” accepted for publication in Proc. SLTU, St Petersburg, Russia, May 2014.

Speech Corpus

The NCHLT "clean" corpus can be obtained from the RMA at http://rma.nwu.ac.za/. The corpus is also available via anonymous FTP at ftp://ftp.internat.freebsd.org/pub/nchlt/Speech_corpora. The MD5 checksums of the gzip-compressed tar archives are listed below.

--------------------------------------------------
md5sum                            gzip tar archive
--------------------------------------------------
ecb6b02e0623865137ba17cdd3357886  nchlt_afr.tar.gz
4e7f825282c26ff445b90f5a5b7dd113  nchlt_eng.tar.gz
3371af81c7305f2974c6cc8834c404dc  nchlt_nbl.tar.gz
ba072561a3b37615a6c6c9c58bb3b1d7  nchlt_nso.tar.gz
c57e3bd69b69b6809bb9835dd1e6ff63  nchlt_sot.tar.gz
a7cebe7c57d8afcc1b3bc2adf6e9beb4  nchlt_ssw.tar.gz
081233ae2fb2be38390530f045ff12b2  nchlt_tsn.tar.gz
cea5fe9cb33f1784b05fe909e932bbf5  nchlt_tso.tar.gz
74de3a16541aae3aff7888acca106d8b  nchlt_ven.tar.gz
c4b83228f0ccc65f962898447dd5ef1c  nchlt_xho.tar.gz
cfffe242523e04efb94e4e05bbc12194  nchlt_zul.tar.gz
--------------------------------------------------
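
After downloading, each archive can be checked against the checksums listed above. The following is a minimal sketch in Python, not an official tool of the corpus; it assumes the archives sit in one local directory under their original file names. The function names `md5_of_file` and `verify_archives` and the directory `"."` are illustrative choices.

```python
import hashlib
from pathlib import Path

# Checksums copied from the table above (archive name -> expected MD5).
EXPECTED_MD5 = {
    "nchlt_afr.tar.gz": "ecb6b02e0623865137ba17cdd3357886",
    "nchlt_eng.tar.gz": "4e7f825282c26ff445b90f5a5b7dd113",
    "nchlt_nbl.tar.gz": "3371af81c7305f2974c6cc8834c404dc",
    "nchlt_nso.tar.gz": "ba072561a3b37615a6c6c9c58bb3b1d7",
    "nchlt_sot.tar.gz": "c57e3bd69b69b6809bb9835dd1e6ff63",
    "nchlt_ssw.tar.gz": "a7cebe7c57d8afcc1b3bc2adf6e9beb4",
    "nchlt_tsn.tar.gz": "081233ae2fb2be38390530f045ff12b2",
    "nchlt_tso.tar.gz": "cea5fe9cb33f1784b05fe909e932bbf5",
    "nchlt_ven.tar.gz": "74de3a16541aae3aff7888acca106d8b",
    "nchlt_xho.tar.gz": "c4b83228f0ccc65f962898447dd5ef1c",
    "nchlt_zul.tar.gz": "cfffe242523e04efb94e4e05bbc12194",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archives(directory, expected=EXPECTED_MD5):
    """Map each archive name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == want) if path.is_file() else None
    return results


if __name__ == "__main__":
    # "." is a placeholder for the download directory (an assumption).
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify_archives(".").items():
        print(f"{name}: {labels[ok]}")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again before unpacking.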

LICENCE

The dictionaries made available on this site are derived works of the "NCHLT-inlang Pronunciation Dictionaries" by the Meraka Institute, CSIR and the North-West University, available from the RMA and released under a Creative Commons Attribution 3.0 Unported License (CC BY 3.0). When using these dictionaries, please cite the following paper:

  • E. Barnard, M. H. Davel, C. van Heerden, F. de Wet and J. Badenhorst, "The NCHLT Speech Corpus of the South African languages", in Proc. SLTU, St Petersburg, Russia, May 2014.

When using the Afrikaans dictionary, please cite:

  • W. D. Basson and M. H. Davel, "Category-Based Phoneme-to-Grapheme Transliteration", in Proc. Interspeech, Lyon, France, August 2013, pp. 1956-1960.

NCHLT Dictionaries

For more information on the NCHLT dictionaries, see nchlt_dicts_project_report.pdf.

NCHLT Phone Set

For more information on the phone sets used in the NCHLT corpus dictionaries, see nchlt_phoneset.pdf.

ASR System Resources

To access the lists and dictionaries, click on any of the languages below: