Academic Projects

Audio Analysis for Detecting Children at Risk of Autism

Supervisors: Professor Mark Hasegawa-Johnson and Professor Karrie Karahalios, 2021-2023

Diagnosing autism is difficult because there is no medical test, such as a blood test, for the disorder; doctors instead rely on a child's developmental history and behavior. Autism can sometimes be detected at 18 months of age or younger, yet many children do not receive a final diagnosis until much later because reliable clinical services are limited. This delay means that children with autism may miss the early help they need. My research applies machine learning to rapid, automatic audio analysis of conversations between clinicians and children, to assist clinicians in detecting features relevant for early diagnosis.
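For illustration, the sketch below shows one generic version of this kind of pipeline, not the project's actual system: simple acoustic features (MFCC and pitch statistics) are summarized per recording and fed to an off-the-shelf classifier. The file names and labels here are hypothetical.

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def acoustic_features(wav_path):
    """Summarize a recording with MFCC and pitch statistics."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # (13, T)
    f0, _, _ = librosa.pyin(y, fmin=80, fmax=500, sr=sr)  # frame-level pitch track
    f0 = f0[~np.isnan(f0)]                                # keep voiced frames only
    if f0.size == 0:
        f0 = np.zeros(1)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [f0.mean(), f0.std()]])

# Hypothetical data: recordings paired with clinician-assigned labels.
X = np.stack([acoustic_features(p) for p in ["child_01.wav", "child_02.wav"]])
y = np.array([0, 1])
clf = LogisticRegression(max_iter=1000).fit(X, y)
```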

The Kids in Context (KiC) Project

Supervisors: Professor Mark Hasegawa-Johnson and Professor Nancy L. McElwain, 2019-present

Children’s development happens in the everyday contexts in which they live, such as at home, at school, and on playgrounds. In the Kids in Context (KiC) Project, we pilot test LittleBeats™, a new wearable mobile sensing device for children. I develop algorithms for analyzing children's vocalizations at home, which help parents, early childhood educators, and psychologists better understand children's experiences in everyday contexts and potentially identify children's mental health problems at an early stage.
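As a rough illustration (not the LittleBeats™ pipeline itself), day-long home audio can be tagged by sliding a window over the recording and classifying each segment. The label set and the `segment_model` callable below are hypothetical stand-ins.

```python
import librosa

LABELS = ["child_vocalization", "adult_speech", "other"]  # assumed label set

def segments(wav_path, win_s=2.0, hop_s=1.0, sr=16000):
    """Yield (start_time_sec, log-mel features) for overlapping windows."""
    y, _ = librosa.load(wav_path, sr=sr)
    win, hop = int(win_s * sr), int(hop_s * sr)
    for start in range(0, max(len(y) - win, 0) + 1, hop):
        mel = librosa.feature.melspectrogram(y=y[start:start + win], sr=sr, n_mels=40)
        yield start / sr, librosa.power_to_db(mel)

def tag_recording(wav_path, segment_model):
    """Return (time, label) pairs; segment_model maps features to a label index."""
    return [(t, LABELS[segment_model(feats)]) for t, feats in segments(wav_path)]
```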

Transfer Learning for Under-Resourced Tonal Languages

Supervisor: Professor Mark Hasegawa-Johnson, 2019-2021

Phones, the segmental units of the International Phonetic Alphabet (IPA), include isolated consonants and vowels; tones, the suprasegmental units, represent pitch and voice-quality movements that may span many phones. While various works have attempted to solve cross-lingual adaptation problems for segmental units, few have explored transfer learning techniques for suprasegmental units such as tones. In this study, we perform an extensive analysis of transfer learning techniques for tones across five tonal languages (Mandarin, Cantonese, Lao, Thai, and Vietnamese).
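One common recipe for this kind of transfer, shown below as an assumed sketch rather than the study's exact setup, is to freeze an acoustic encoder pretrained on source languages and train only a lightweight tone classifier on the target tonal language.

```python
import torch
import torch.nn as nn

class ToneHead(nn.Module):
    def __init__(self, encoder: nn.Module, enc_dim: int, n_tones: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():    # freeze the pretrained encoder
            p.requires_grad = False
        self.classifier = nn.Linear(enc_dim, n_tones)

    def forward(self, feats):                  # feats: (batch, time, feat_dim)
        h = self.encoder(feats)                # (batch, time, enc_dim)
        return self.classifier(h.mean(dim=1))  # pool over time -> tone logits

# Hypothetical usage: a stand-in encoder and 5 tone classes per syllable.
encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU())
model = ToneHead(encoder, enc_dim=256, n_tones=5)
logits = model(torch.randn(4, 100, 80))        # 4 syllables, 100 frames each
```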

Industrial Projects 

Second-pass non-autoregressive CTC-based deliberation ASR 

Research Intern, Facebook/Meta AI, AI Speech, summer 2022

Supervisors: Dr. Ke Li and Dr. Jinxi Guo

In this project, we implemented a two-pass transformer/conformer-based non-autoregressive CTC deliberation automatic speech recognition (ASR) model that improves a first-pass streaming RNN-T ASR system, achieving a 14.3% relative word error rate (WER) reduction on the Librispeech test-other dataset. We evaluated WER and real-time factor for both CTC-based (non-autoregressive) and seq2seq-based (autoregressive) two-pass deliberation ASR models on Librispeech to demonstrate the superiority of the non-autoregressive system.
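Conceptually, the second pass can be pictured as in the sketch below. This is an assumed illustration, not the production model: acoustic encoder frames cross-attend to the embedded first-pass hypothesis through a transformer decoder without a causal mask, and a CTC head emits all output tokens in parallel.

```python
import torch
import torch.nn as nn

class CTCDeliberation(nn.Module):
    def __init__(self, vocab, d_model=256, nhead=4, layers=2):
        super().__init__()
        self.hyp_embed = nn.Embedding(vocab, d_model)   # first-pass tokens
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, layers)
        self.ctc_head = nn.Linear(d_model, vocab + 1)   # +1 for the CTC blank

    def forward(self, enc_out, first_pass_tokens):
        # Acoustic frames attend to the embedded hypothesis; no causal mask,
        # so all positions are predicted in parallel (non-autoregressive).
        hyp = self.hyp_embed(first_pass_tokens)          # (B, U, d)
        h = self.decoder(tgt=enc_out, memory=hyp)        # (B, T, d)
        return self.ctc_head(h).log_softmax(dim=-1)      # CTC log-probs

# Hypothetical shapes: batch of 8, 50 encoder frames, 20 first-pass tokens.
model = CTCDeliberation(vocab=500)
logp = model(torch.randn(8, 50, 256), torch.randint(0, 500, (8, 20)))
```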

Accent-Robust ASR 

Research Intern, Facebook/Meta AI, Multimodal Video Understanding, summer 2021

Supervisor: Dr. Vimal Manohar 

In this project, we built an accent-robust ASR model using both unsupervised wav2vec embeddings and embeddings learned with supervision, and we demonstrated the benefits of wav2vec embeddings for improving domain-adversarial training and multi-task learning for accented ASR, evaluated on a large real-world dataset containing 21 English accents.
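The domain-adversarial piece can be sketched with a standard gradient-reversal layer. This is a generic recipe with assumed layer sizes and heads, not the production architecture: the accent classifier's gradient is flipped before it reaches the shared encoder, pushing the encoder toward accent-invariant representations while the ASR head trains normally.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lamb * grad, None   # flip the gradient toward the encoder

class AccentAdversarialASR(nn.Module):
    def __init__(self, feat_dim=80, d=256, vocab=500, n_accents=21):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, d, batch_first=True)
        self.asr_head = nn.Linear(d, vocab + 1)      # CTC-style token logits
        self.accent_head = nn.Linear(d, n_accents)   # adversarial branch

    def forward(self, feats, lamb=1.0):
        h, _ = self.encoder(feats)                   # (B, T, d)
        asr_logits = self.asr_head(h)
        accent_logits = self.accent_head(GradReverse.apply(h.mean(1), lamb))
        return asr_logits, accent_logits

model = AccentAdversarialASR()
asr, acc = model(torch.randn(4, 120, 80))   # hypothetical batch of features
```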