Missing data mask estimation is the process of estimating which spectro-temporal regions in a spectrographic representation of noisy speech remain (relatively) uncorrupted. I have compiled an Octave/MATLAB package for machine-learning (SVM) based missing data mask estimation. For an overview of the methods and the relevant background, my thesis [1] is a good starting point.
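To make the notion of a mask concrete, here is a minimal Python sketch of a reference ("oracle") mask, which can be computed when clean speech and noise are available separately. It is not the code of this package, which estimates masks from noisy speech alone. The function name `oracle_mask`, the row-per-frame layout and the 0 dB threshold are all assumptions chosen for illustration.

```python
def oracle_mask(clean_power, noise_power, snr_threshold_db=0.0):
    """Label each time-frequency cell 1 (reliable) or 0 (unreliable).

    clean_power, noise_power: equally sized lists of rows holding
    non-negative power values (hypothetical layout: one row per frame).
    """
    # Convert the dB threshold to a linear power ratio.
    ratio = 10.0 ** (snr_threshold_db / 10.0)
    mask = []
    for clean_row, noise_row in zip(clean_power, noise_power):
        # A cell is reliable when speech power exceeds the scaled noise power.
        mask.append([int(s > n * ratio) for s, n in zip(clean_row, noise_row)])
    return mask


if __name__ == "__main__":
    clean = [[4.0, 1.0], [0.5, 9.0]]
    noise = [[1.0, 2.0], [0.5, 3.0]]
    print(oracle_mask(clean, noise))  # prints [[1, 0], [0, 1]]
```

A mask estimator has to predict such labels without access to the separate clean and noise signals.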
Download
You can grab the archive here (517 KB). While the core code is recognizer-independent, the routines to read and write spectrograms are based on SPRAAK. A slightly different version, aimed at the Finnish Aalto recognizer, can be found here (402 KB). It should be easy to replace the recognizer-specific routines with ones for your own recognizer.
Contents
The following tools are provided:
make_codebooks: Creates speech and silence codebooks for use with vector quantization (VQ) based mask estimation [2].
make_mlmodel: Creates models for make_mlmask, the machine-learning-based mask estimation tool [3,4]. Requires ysnhr.trk files as input, which can be created using make_ysnhr.
make_mlmask: Creates missing data masks (in the form of .ym.trk files) which can be used for MDT-based recognition. Requires a model trained by make_mlmodel.
make_ysnhr: Helper function that creates the necessary input feature files for make_mlmodel, based on noisy and clean speech files and the corresponding noise files.
In addition, the following directories exist:
dependencies: A number of C files which need to be compiled to .mex files for the tools to work. See the README in this directory.
testdatadir: A small number of example speech files to test the tools with.
docs: Contains some of the articles referenced in the READMEs.
Installation and usage
For installation and usage details, check the README files in each of the package subdirectories. The package was written and tested with Octave 3.4 and should work with most other versions. For use with MATLAB, you would need to convert some Octave-specific syntax (endfor > end, endswitch > end, endif > end, etc.) and remove the command-argument handling.
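The keyword part of the MATLAB conversion can be sketched as a small text filter. The Python snippet below is an illustration under stated assumptions, not a tool shipped with the package: the name `octave_to_matlab` is hypothetical, `endwhile` is added to the keyword list on the assumption that it falls under the "etc." above, and the substitution is naive, so it would also rewrite these words inside strings or comments. It does not touch the command-argument handling, which still has to be removed by hand.

```python
import re

# Octave-only block terminators that MATLAB spells as a plain "end".
# endfor, endif and endswitch are named in the text; endwhile is an assumption.
OCTAVE_TERMINATORS = ("endfor", "endif", "endswitch", "endwhile")

# \b keeps identifiers that merely contain a keyword (e.g. "myendif") intact.
_PATTERN = re.compile(r"\b(?:" + "|".join(OCTAVE_TERMINATORS) + r")\b")


def octave_to_matlab(source):
    """Return source with Octave-specific block terminators replaced by 'end'."""
    return _PATTERN.sub("end", source)


if __name__ == "__main__":
    example = "for i = 1:3\n  if i > 1\n    disp(i);\n  endif\nendfor\n"
    print(octave_to_matlab(example))
```

Reviewing the converted files by hand is still advisable, since a plain substitution cannot tell code apart from text in strings or comments.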
Disclaimer
This package is provided as-is, without any support, in the hope that it will be useful. Since missing data techniques are no longer an active research area for me, this code is no longer maintained.
References
[1] J. F. Gemmeke, “Noise robust ASR: Missing data techniques and beyond,” PhD thesis, Radboud Universiteit Nijmegen, The Netherlands, 2011.
[2] M. Van Segbroeck and H. Van hamme, “Vector-Quantization based mask estimation for missing data automatic speech recognition,” in Proc. INTERSPEECH, Antwerp, Belgium, August 27–31 2007, pp. 910–913.
[3] Locsei Gusztáv, “Estimation of spectral masks using sparse kernel-based methods,” summer internship report, 2010.
[4] J. F. Gemmeke, Y. Wang, M. Van Segbroeck, B. Cranen, and H. Van hamme, “Application of noise robust MDT speech recognition on the SPEECON and SpeechDat-Car databases,” in Proc. INTERSPEECH, Brighton, UK, September 6–10 2009, pp. 1227–1230.