As a scientist on the Machine Learning and Speech Processing team at CaptionCall, I developed algorithms and implemented machine learning and natural language processing techniques to improve automatic speech recognition for telephone captioning, resulting in deployment and three patents. My contributions included:
Developed novel techniques for training n-gram language models on the fly without saving transcriptions, including techniques based on interpolation and neural network text sampling.
Implemented natural language processing and deep learning methods to estimate transcription quality with a mean absolute error of 1.6% (more than ten times lower than that of state-of-the-art estimators at the time).
Collaborated on implementing real-time fusion of multiple transcripts for improved accuracy. Contributed transcript quality estimation using support vector machines to improve voting and tie-breaking decisions, raising accuracy by 5% on average.
US Patents 10,573,312 and 10,971,153 (2020)