Research & Development Interests
My research focuses on machine learning and signal processing for speech analysis and recognition, paralinguistic information retrieval, and audio scene analysis. Specific topics include:
Automatic Speech Recognition
End-to-End (E2E) automatic speech recognition (ASR)
ASR for people with speech disorders
Multimodal information integration (acoustic signals and articulatory motion from electromagnetic articulography (EMA) sensors)
Silent speech recognition/silent speech interface with articulatory motion data
Paralinguistic Analysis
Speaker diarization/identification/verification
Automatic analysis and assessment of perceptual speech intelligibility and speech disorder severity
Speech-based emotion/stress state analysis and recognition
Audio Scene Analysis
Sound event detection/classification/recognition
Content-based audio indexing and retrieval
Education
Korea Advanced Institute of Science and Technology (KAIST)
Ph.D. in Electrical Engineering (Aug. 2010 - Feb. 2016)
Thesis: Automatic intelligibility assessment and recognition of dysarthric speech using phonetic variations
Advisor: Dr. Hoirin Kim
Korea Advanced Institute of Science and Technology (KAIST)
M.S. in Information and Communications Engineering (Aug. 2008 - Aug. 2010)
Thesis: Audio feature extraction methods for multimedia content analysis
Advisor: Dr. Hoirin Kim
Tech University of Korea
B.S. in Electronics Engineering (Mar. 2004 - Aug. 2008)
Advisor: Dr. Eung-Hyuk Lee
Work Experience
NVIDIA
Deep Learning Scientist (June 2022 - Present)
Speech science for NVIDIA Riva, NVIDIA's speech AI SDK
Samsung
Staff Machine Learning Researcher (September 2019 - May 2022)
Speech science for Bixby, Samsung's personal voice assistant
Main responsibilities were development and maintenance of production E2E ASR models, along with research and development on speaker diarization
Involved in multiple projects: semi-supervised ASR model training, speech separation to handle overlapped speech, grapheme-to-phoneme (G2P) conversion, and speech activity detection
Speech Scientist (April 2018 - August 2019)
Machine learning and signal processing for automatic speech recognition, speaker diarization/recognition, and topic classification
Speech Disorders & Technology Lab., The University of Texas at Dallas (now at The University of Texas at Austin)
Postdoctoral Research Associate (May 2016 - April 2018)
Advisor: Dr. Jun Wang
Machine learning and signal processing for disordered speech recognition, silent speech recognition, and brain activity signal analysis.
Statistical Speech & Sound Computing Lab., Korea Advanced Institute of Science and Technology (KAIST)
Research Assistant (Aug. 2008 - Feb. 2016)
Advisor: Dr. Hoirin Kim
My work focused on signal analysis, feature extraction, and machine learning for speech recognition, speaker recognition, and audio indexing.
Speech and Language Information Research Division, Electronics and Telecommunications Research Institute (ETRI)
Research Intern (Sep. 2009 - Feb. 2010)
Advisors: Sung Joo Lee and Dr. Yun-Keun Lee
I conducted research on statistical-model-based target signal detection using cross-similarity between multi-channel microphone signals.
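As a rough illustration of the cross-similarity idea (a minimal sketch with synthetic signals, not the project's actual statistical model): a coherent target present in multiple channels yields a high normalized cross-correlation peak across candidate lags.

```python
# Minimal sketch: normalized cross-correlation between two microphone
# channels peaks when a coherent target source is shared by both.
import numpy as np

def max_cross_similarity(ch1, ch2, max_lag=32):
    ch1 = (ch1 - ch1.mean()) / (ch1.std() + 1e-12)
    ch2 = (ch2 - ch2.mean()) / (ch2.std() + 1e-12)
    n = len(ch1)
    sims = [np.dot(ch1[max_lag:n - max_lag],
                   np.roll(ch2, lag)[max_lag:n - max_lag]) / (n - 2 * max_lag)
            for lag in range(-max_lag, max_lag + 1)]
    return max(sims)

rng = np.random.default_rng(0)
target = rng.normal(size=4000)                       # shared target source
ch1 = target + 0.5 * rng.normal(size=4000)           # channel 1 with noise
ch2 = np.roll(target, 5) + 0.5 * rng.normal(size=4000)  # delayed channel 2
print(max_cross_similarity(ch1, ch2))  # high (~0.8) when a target is shared
```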
Human Media Communication & Processing Lab., Gwangju Institute of Science and Technology (GIST)
Research Intern (Jan. 2008 - Mar. 2008)
Advisor: Dr. Hong Kook Kim
I implemented a voice transmission system in a Bluetooth environment using the G.711 codec and the G.711 packet loss concealment algorithm.
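For reference, G.711 mu-law companding follows a fixed formula; a minimal sketch is shown below (the real codec quantizes to 8-bit codewords via a segmented table, which is omitted here):

```python
# G.711 mu-law companding (mu = 255), simplified continuous form.
import numpy as np

MU = 255.0

def mulaw_encode(x):  # x in [-1, 1]
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_decode(y):
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

x = np.linspace(-1, 1, 5)
print(np.allclose(mulaw_decode(mulaw_encode(x)), x))  # True: lossless round trip
```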
Research Projects
Silent speech interface
Funded by NIH (May 2016 - April 2018)
My work focused on silent speech recognition, which converts articulatory movements into text using articulatory movement data from electromagnetic articulography (EMA) sensors rather than acoustic information. I worked on deep learning-based articulatory representation learning and articulatory modeling to improve silent speech recognition performance.
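A minimal sketch of the general setup, assuming PyTorch and a CTC objective; the shapes, sensor count, and label inventory below are illustrative stand-ins, not the project's actual models:

```python
# Sketch: BiLSTM + CTC mapping EMA sensor trajectories to phoneme strings.
import torch
import torch.nn as nn

class SilentSpeechRecognizer(nn.Module):
    def __init__(self, n_sensors=12, hidden=128, n_phones=40):
        super().__init__()
        self.encoder = nn.LSTM(n_sensors, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_phones + 1)  # +1 for CTC blank

    def forward(self, ema):              # ema: (batch, time, n_sensors)
        h, _ = self.encoder(ema)
        return self.classifier(h).log_softmax(-1)

model = SilentSpeechRecognizer()
ema = torch.randn(4, 200, 12)            # 4 utterances, 200 frames, 12 channels
log_probs = model(ema).transpose(0, 1)   # CTC expects (time, batch, classes)
targets = torch.randint(1, 41, (4, 30))  # dummy phoneme label sequences
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((4,), 200, dtype=torch.long),
                           torch.full((4,), 30, dtype=torch.long))
loss.backward()
```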
Funded by AIA Korea (Aug. 2015 - Sep. 2015)
This project helped a mother with a speech impairment sing a birthday song to her daughter via a special voice generation device. My role was to select the voice most similar to the mother's own from over 10,000 short voice samples using a template-based matching algorithm.
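A minimal sketch of template-based matching in this spirit, assuming MFCC features compared with dynamic time warping (DTW); the file names and parameters are hypothetical:

```python
# Sketch: pick the candidate voice whose MFCC sequence is closest (by DTW)
# to a reference recording.
import numpy as np
import librosa

def mfcc_seq(path, sr=16000, n_mfcc=13):
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

def dtw_distance(a, b):
    """Plain O(len(a)*len(b)) DTW over Euclidean frame distances."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[len(a), len(b)] / (len(a) + len(b))  # length-normalized

reference = mfcc_seq("mother_sample.wav")        # hypothetical reference file
candidates = ["bank_0001.wav", "bank_0002.wav"]  # stand-ins for 10,000 samples
best = min(candidates, key=lambda p: dtw_distance(reference, mfcc_seq(p)))
print("closest voice:", best)
```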
Research on speech-based emotion/stress state assessment and management techniques
Funded by KAIST in Korea (Mar. 2014 - Dec. 2014)
I conducted research on speech-based emotion classification and stressful-state detection, focusing on feature extraction based on the Teager energy operator and pitch perturbation modeling.
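For context, the discrete-time Teager energy operator is Ψ[x(n)] = x²(n) − x(n−1)x(n+1); a short NumPy sketch follows (the full stress-sensitive feature pipeline built on top of it is not shown in this CV):

```python
# Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]
import numpy as np

def teager_energy(x):
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]  # pad edges by replication
    return psi

t = np.arange(1000) / 8000.0
tone = np.sin(2 * np.pi * 440 * t)
print(teager_energy(tone)[:5])  # roughly constant for a pure tone
```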
Development of smart video/audio surveillance SoC & core component for onsite decision security system
Funded by Ministry of Trade, Industry and Energy in Korea (Nov. 2013 - Oct. 2014)
I carried out research on abnormal sound detection/classification for surveillance applications, using feature extraction methods based on the two-dimensional cepstrum and image processing techniques.
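A minimal sketch of a two-dimensional cepstrum-style feature, computed here as a 2-D DCT of a log time-frequency representation with low-order coefficients retained; the parameter choices are illustrative, not the project's exact recipe:

```python
# Sketch: 2-D cepstrum-style segment feature via a 2-D DCT of a log spectrogram.
import numpy as np
from scipy.fft import dctn

def two_d_cepstrum(log_spec, n_time=8, n_freq=12):
    c = dctn(log_spec, norm="ortho")    # log_spec: (freq_bins, frames)
    return c[:n_freq, :n_time].ravel()  # compact segment-level feature

seg = np.log(np.abs(np.random.randn(64, 32)) + 1e-6)  # stand-in log spectrogram
print(two_d_cepstrum(seg).shape)  # (96,)
```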
Development of an embedded key-word spotting speech recognition system individually customized for disabled persons with dysarthria
Funded by Ministry of Knowledge Economy in Korea (Jun. 2010 - May 2014)
My work focused on intelligibility prediction for disordered speech and the development of an individually customized speech recognition system using speaker adaptation methods. This was the research topic of my Ph.D. dissertation.
Development of a voice authentication entry system
Funded by Samsung S1 in Korea (Dec. 2012 - Oct. 2013)
I was responsible for the design and implementation of a TCP/IP-based online continuous digit speech recognition system using the Kaldi speech recognition toolkit, covering feature extraction and decoding on PC and embedded systems.
Research on special sound recognition
Funded by Small & Medium Business Corporation in Korea (Apr. 2012 - Sep. 2012)
I conducted research on robust infant crying detection in adverse noisy environments, focusing on feature extraction based on segmental two-dimensional linear frequency cepstral coefficients.
Research on speaker recognition for u-robot
Funded by Ministry of Knowledge Economy in Korea (Jun. 2010 - Jan. 2011)
I was responsible for the design and implementation of an online GMM-UBM-based speaker recognition system, including endpoint detection, feature extraction, speaker identification, and speaker verification.
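A minimal sketch of the classic GMM-UBM recipe (scikit-learn mixtures, relevance-MAP adaptation of the means, log-likelihood-ratio scoring); the features here are synthetic stand-ins, and the deployed system's details differed:

```python
# Sketch: GMM-UBM verification = UBM fit + MAP mean adaptation + LLR scoring.
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm, feats, r=16.0):
    """Relevance-MAP adaptation of UBM means (Reynolds-style)."""
    post = ubm.predict_proba(feats)                  # (frames, mixtures)
    n = post.sum(axis=0)                             # soft occupancy counts
    ex = post.T @ feats / np.maximum(n[:, None], 1e-8)
    alpha = (n / (n + r))[:, None]
    return alpha * ex + (1 - alpha) * ubm.means_

rng = np.random.default_rng(0)
bg = rng.normal(size=(2000, 20))                     # stand-in background features
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(bg)

enroll = rng.normal(0.5, 1.0, size=(300, 20))        # stand-in speaker features
spk = GaussianMixture(n_components=8, covariance_type="diag")
spk.weights_, spk.covariances_ = ubm.weights_, ubm.covariances_
spk.means_ = map_adapt_means(ubm, enroll)
spk.precisions_cholesky_ = ubm.precisions_cholesky_  # diag cov unchanged

test = rng.normal(0.5, 1.0, size=(100, 20))
llr = spk.score(test) - ubm.score(test)              # accept if llr > threshold
print("log-likelihood ratio:", llr)
```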
Research on audio feature analysis of malicious multimedia
Funded by ETRI in Korea (Jun. 2009 - Jan. 2010)
I conducted research on objectionable sound classification for filtering pornographic content using only the audio track of a video, focusing on feature extraction based on time-frequency dynamics and feature transformation based on discriminant analysis.
Teaching Experience
Guest Lecturer, Department of Bioengineering, University of Texas at Dallas
"Machine learning applications: Automatic speech recognition", Nov. 2016, in BMEN 3325 Advanced Matlab Programming for Biomedical Engineering (Fall 2016)
"Machine learning applications: Automatic speech recognition", Nov. 2017, in BMEN 3325 Advanced Computational Tools for Biomedical Engineering (Fall 2017)
Teaching Assistant, Department of Electrical Engineering, KAIST
Graduate Course: Speech and Audio Coding Theory (Spring 2012, Spring 2010), Speech Recognition System (Fall 2011), Digital Speech Processing (Fall 2012, Spring 2011)
Undergraduate Course: Signals and Systems (Fall 2010)
I assisted in the preparation and grading of homework and exams.
Professional Activities
Reviewer
IEEE Transactions on Multimedia
IEEE Transactions on Audio, Speech, and Language Processing
IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Neural Systems and Rehabilitation Engineering
IEEE Access
IEEE Signal Processing Letters
Journal of Machine Learning Research
Journal of Supercomputing
JASA Express Letters
Invited Talks
"An overview of automatic speech recognition: From HMM to E2E," Artificial Intelligence Convergence Department, Chonnam National University, Jul. 27, 2022.
"Deep neural networks for the recognition of silent speech and dysarthric speech," Bioengineering Department, University of Texas at Dallas, TX, Nov. 17, 2017.
"Deep learning and its applications to dysarthric speech recognition," Chosun University, South Korea, Mar. 21, 2017.
"Introduction to deep learning," University of Texas at Dallas, TX, Jan. 21, 2017.
"Sound processing techniques," Gachon University, South Korea, Apr. 14, 2016.
"Introduction to deep learning," Jungwon University, South Korea, Apr. 5, 2016.
"Automatic intelligibility assessment and recognition of dysarthric speech using phonetic variations," ETRI, South Korea, Dec. 15, 2015.
Honors and Awards
Best Paper Award, International Conference on Mechatronics and Intelligent Robotics, May 2018
Best Poster Award, the 7th International Conference on Speech Motor Control, Jul. 2017
The Silver Prize, The 22nd Samsung Electronics HumanTech Paper Award, Jan. 2016, $7,000
Government Scholarship, KAIST, Aug. 2010 - Feb. 2016
International Speech Communication Association (ISCA) Travel Grant, Sep. 2012, €650
The Grand Prize, Dept. of Electronics Engineering Conference, Tech University of Korea, Dec. 2007
Participation Prize, International Robot Contest, Seoul, South Korea, Oct. 2006
Scholarship, Dept. of Electronics Engineering, Tech University of Korea, 2004-2008