Research & Development Interests
My research focuses on machine learning and signal processing for speech analysis and recognition, paralinguistic information retrieval, and audio scene analysis. Specific topics include:
Automatic Speech Recognition
End-to-End (E2E) automatic speech recognition (ASR)
ASR for people with speech disorders
Multimodal information integration (acoustic signals and articulatory motion from electromagnetic articulography (EMA) sensors)
Silent speech recognition/silent speech interface with articulatory motion data
Paralinguistic Analysis
Speaker diarization/identification/verification
Automatic analysis and assessment of perceptual speech intelligibility and speech disorder severity
Speech-based emotion/stress state analysis and recognition
Audio Scene Analysis
Sound event detection/classification/recognition
Content-based audio indexing and retrieval
Education
Korea Advanced Institute of Science and Technology (KAIST)
Ph.D. in Electrical Engineering (Aug. 2010 - Feb. 2016)
Thesis: Automatic intelligibility assessment and recognition of dysarthric speech using phonetic variations
Advisor: Dr. Hoirin Kim
Korea Advanced Institute of Science and Technology (KAIST)
M.S. in Information and Communications Engineering (Aug. 2008 - Aug. 2010)
Thesis: Audio feature extraction methods for multimedia content analysis
Advisor: Dr. Hoirin Kim
Tech University of Korea
B.S. in Electronics Engineering (Mar. 2004 - Aug. 2008)
Advisor: Dr. Eung-Hyuk Lee
Work Experience
NVIDIA
Deep Learning Scientist (June 2022 - Present)
Speech science for NVIDIA Riva, NVIDIA's speech AI SDK
Samsung
Staff Machine Learning Researcher (September 2019 - May 2022)
Speech science for Bixby, Samsung's personal voice assistant
Main responsibilities were development and maintenance of production E2E ASR models, along with research and development on speaker diarization
Involved in multiple projects: semi-supervised ASR model training, speech separation to handle overlapped speech, grapheme-to-phoneme (G2P) conversion, and speech activity detection
Speech Scientist (April 2018 - August 2019)
Machine learning and signal processing for automatic speech recognition, speaker diarization/recognition, and topic classification
Speech Disorders & Technology Lab., The University of Texas at Dallas (now at The University of Texas at Austin)
Postdoctoral Research Associate (May 2016 - April 2018)
Advisor: Dr. Jun Wang
Machine learning and signal processing for disordered speech recognition, silent speech recognition, and brain activity signal analysis.
Statistical Speech & Sound Computing Lab., Korea Advanced Institute of Science and Technology (KAIST)
Research Assistant (Aug. 2008 - Feb. 2016)
Advisor: Dr. Hoirin Kim
My work focused on signal analysis, feature extraction, and machine learning for speech recognition, speaker recognition, and audio indexing.
Speech and Language Information Research Division, Electronics and Telecommunications Research Institute (ETRI)
Research Intern (Sep. 2009 - Feb. 2010)
Advisors: Sung Joo Lee and Dr. Yun-Keun Lee
I conducted research on statistical-model-based target signal detection using cross-similarity between multi-channel microphone signals.
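As a rough illustration of the cross-similarity idea (a minimal sketch with synthetic signals, not the project's actual statistical model): a coherent target present in multiple channels yields a high normalized cross-correlation peak across candidate lags.

```python
# Minimal sketch: normalized cross-correlation between two microphone
# channels peaks when a coherent target source is shared by both.
import numpy as np

def max_cross_similarity(ch1, ch2, max_lag=32):
    ch1 = (ch1 - ch1.mean()) / (ch1.std() + 1e-12)
    ch2 = (ch2 - ch2.mean()) / (ch2.std() + 1e-12)
    n = len(ch1)
    sims = [np.dot(ch1[max_lag:n - max_lag],
                   np.roll(ch2, lag)[max_lag:n - max_lag]) / (n - 2 * max_lag)
            for lag in range(-max_lag, max_lag + 1)]
    return max(sims)

rng = np.random.default_rng(0)
target = rng.normal(size=4000)                       # shared target source
ch1 = target + 0.5 * rng.normal(size=4000)           # channel 1 with noise
ch2 = np.roll(target, 5) + 0.5 * rng.normal(size=4000)  # delayed channel 2
print(max_cross_similarity(ch1, ch2))  # high (~0.8) when a target is shared
```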
Human Media Communication & Processing Lab., Gwangju Institute of Science and Technology (GIST)
Research Intern (Jan. 2008 - Mar. 2008)
Advisor: Dr. Hong Kook Kim
I implemented a voice transmission system in a Bluetooth environment using the G.711 codec and the G.711 packet loss concealment algorithm.
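For reference, G.711 mu-law companding follows a fixed formula; a minimal sketch is shown below (the real codec quantizes to 8-bit codewords via a segmented table, which is omitted here):

```python
# G.711 mu-law companding (mu = 255), simplified continuous form.
import numpy as np

MU = 255.0

def mulaw_encode(x):  # x in [-1, 1]
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_decode(y):
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

x = np.linspace(-1, 1, 5)
print(np.allclose(mulaw_decode(mulaw_encode(x)), x))  # True: lossless round trip
```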
Research Projects
Silent speech interface
Funded by NIH (May 2016 - April 2018)
My work focused on silent speech recognition, which converts articulatory movements into text using articulatory movement data from electromagnetic articulography (EMA) sensors rather than acoustic information. I worked on deep learning-based articulatory representation learning and articulatory modeling to improve silent speech recognition performance.
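A minimal sketch of the general setup, assuming PyTorch and a CTC objective; the shapes, sensor count, and label inventory below are illustrative stand-ins, not the project's actual models:

```python
# Sketch: BiLSTM + CTC mapping EMA sensor trajectories to phoneme strings.
import torch
import torch.nn as nn

class SilentSpeechRecognizer(nn.Module):
    def __init__(self, n_sensors=12, hidden=128, n_phones=40):
        super().__init__()
        self.encoder = nn.LSTM(n_sensors, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_phones + 1)  # +1 for CTC blank

    def forward(self, ema):              # ema: (batch, time, n_sensors)
        h, _ = self.encoder(ema)
        return self.classifier(h).log_softmax(-1)

model = SilentSpeechRecognizer()
ema = torch.randn(4, 200, 12)            # 4 utterances, 200 frames, 12 channels
log_probs = model(ema).transpose(0, 1)   # CTC expects (time, batch, classes)
targets = torch.randint(1, 41, (4, 30))  # dummy phoneme label sequences
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((4,), 200, dtype=torch.long),
                           torch.full((4,), 30, dtype=torch.long))
loss.backward()
```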
Funded by AIA Korea (Aug. 2015 - Sep. 2015)
This project helped a mother with a speech impairment sing a birthday song to her daughter via a special voice generation device. My role was to select the voice most similar to the mother's own from over 10,000 short voice samples using a template-based matching algorithm.
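A minimal sketch of template-based matching in this spirit, assuming MFCC features compared with dynamic time warping (DTW); the file names and parameters are hypothetical:

```python
# Sketch: pick the candidate voice whose MFCC sequence is closest (by DTW)
# to a reference recording.
import numpy as np
import librosa

def mfcc_seq(path, sr=16000, n_mfcc=13):
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

def dtw_distance(a, b):
    """Plain O(len(a)*len(b)) DTW over Euclidean frame distances."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[len(a), len(b)] / (len(a) + len(b))  # length-normalized

reference = mfcc_seq("mother_sample.wav")        # hypothetical reference file
candidates = ["bank_0001.wav", "bank_0002.wav"]  # stand-ins for 10,000 samples
best = min(candidates, key=lambda p: dtw_distance(reference, mfcc_seq(p)))
print("closest voice:", best)
```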
Research on speech-based emotion/stress state assessment and management techniques
Funded by KAIST in Korea (Mar. 2014 - Dec. 2014)
I conducted research on speech-based emotion classification and stressful-state detection, focusing on feature extraction based on the Teager energy operator and pitch perturbation modeling.
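For context, the discrete-time Teager energy operator is Ψ[x(n)] = x²(n) − x(n−1)x(n+1); a short NumPy sketch follows (the full stress-sensitive feature pipeline built on top of it is not shown in this CV):

```python
# Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]
import numpy as np

def teager_energy(x):
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]  # pad edges by replication
    return psi

t = np.arange(1000) / 8000.0
tone = np.sin(2 * np.pi * 440 * t)
print(teager_energy(tone)[:5])  # roughly constant for a pure tone
```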
Development of smart video/audio surveillance SoC & core component for onsite decision security system
Funded by Ministry of Trade, Industry and Energy in Korea (Nov. 2013 - Oct. 2014)
I carried out research on abnormal sound detection/classification for surveillance applications, using feature extraction methods based on the two-dimensional cepstrum and image processing techniques.
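A minimal sketch of a two-dimensional cepstrum-style feature, computed here as a 2-D DCT of a log time-frequency representation with low-order coefficients retained; the parameter choices are illustrative, not the project's exact recipe:

```python
# Sketch: 2-D cepstrum-style segment feature via a 2-D DCT of a log spectrogram.
import numpy as np
from scipy.fft import dctn

def two_d_cepstrum(log_spec, n_time=8, n_freq=12):
    c = dctn(log_spec, norm="ortho")    # log_spec: (freq_bins, frames)
    return c[:n_freq, :n_time].ravel()  # compact segment-level feature

seg = np.log(np.abs(np.random.randn(64, 32)) + 1e-6)  # stand-in log spectrogram
print(two_d_cepstrum(seg).shape)  # (96,)
```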
Development of an embedded key-word spotting speech recognition system individually customized for disabled persons with dysarthria
Funded by Ministry of Knowledge Economy in Korea (Jun. 2010 - May 2014)
My work focused on intelligibility prediction for disordered speech and the development of an individually customized speech recognition system using speaker adaptation methods. This was the research topic of my Ph.D. dissertation.
Development of a voice authentication entry system
Funded by Samsung S1 in Korea (Dec. 2012 - Oct. 2013)
I was responsible for the design and implementation of a TCP/IP-based online continuous digit speech recognition system using the Kaldi speech recognition toolkit, covering feature extraction and decoding on PC and embedded systems.
Research on special sound recognition
Funded by Small & Medium Business Corporation in Korea (Apr. 2012 - Sep. 2012)
I conducted research on robust infant crying detection in adverse noisy environments, focusing on feature extraction based on segmental two-dimensional linear frequency cepstral coefficients.
Research on speaker recognition for u-robot
Funded by Ministry of Knowledge Economy in Korea (Jun. 2010 - Jan. 2011)
I was responsible for the design and implementation of an online GMM-UBM-based speaker recognition system, including endpoint detection, feature extraction, speaker identification, and speaker verification.
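A minimal sketch of the classic GMM-UBM recipe (scikit-learn mixtures, relevance-MAP adaptation of the means, log-likelihood-ratio scoring); the features here are synthetic stand-ins, and the deployed system's details differed:

```python
# Sketch: GMM-UBM verification = UBM fit + MAP mean adaptation + LLR scoring.
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm, feats, r=16.0):
    """Relevance-MAP adaptation of UBM means (Reynolds-style)."""
    post = ubm.predict_proba(feats)                  # (frames, mixtures)
    n = post.sum(axis=0)                             # soft occupancy counts
    ex = post.T @ feats / np.maximum(n[:, None], 1e-8)
    alpha = (n / (n + r))[:, None]
    return alpha * ex + (1 - alpha) * ubm.means_

rng = np.random.default_rng(0)
bg = rng.normal(size=(2000, 20))                     # stand-in background features
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(bg)

enroll = rng.normal(0.5, 1.0, size=(300, 20))        # stand-in speaker features
spk = GaussianMixture(n_components=8, covariance_type="diag")
spk.weights_, spk.covariances_ = ubm.weights_, ubm.covariances_
spk.means_ = map_adapt_means(ubm, enroll)
spk.precisions_cholesky_ = ubm.precisions_cholesky_  # diag cov unchanged

test = rng.normal(0.5, 1.0, size=(100, 20))
llr = spk.score(test) - ubm.score(test)              # accept if llr > threshold
print("log-likelihood ratio:", llr)
```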
Research on audio feature analysis of malicious multimedia
Funded by ETRI in Korea (Jun. 2009 - Jan. 2010)
I conducted research on objectionable sound classification for filtering pornographic content using only the audio track of a video, focusing on feature extraction based on time-frequency dynamics and feature transformation based on discriminant analysis.
Teaching Experience
Guest Lecturer, Department of Bioengineering, University of Texas at Dallas
"Machine learning applications: Automatic speech recognition", Nov. 2016, in BMEN 3325 Advanced Matlab Programming for Biomedical Engineering (Fall 2016)
"Machine learning applications: Automatic speech recognition", Nov. 2017, in BMEN 3325 Advanced Computational Tools for Biomedical Engineering (Fall 2017)
Teaching Assistant, Department of Electrical Engineering, KAIST
Graduate Course: Speech and Audio Coding Theory (Spring 2012, Spring 2010), Speech Recognition System (Fall 2011), Digital Speech Processing (Fall 2012, Spring 2011)
Undergraduate Course: Signals and Systems (Fall 2010)
I assisted in the preparation and grading of homework and exams.
Professional Activities
Reviewer
IEEE Transactions on Multimedia
IEEE Transactions on Audio, Speech, and Language Processing
IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Neural Systems and Rehabilitation Engineering
IEEE Access
IEEE Signal Processing Letters
Journal of Machine Learning Research
Journal of Supercomputing
JASA Express Letters
Invited Talks
"An overview of automatic speech recognition: From HMM to E2E," Artificial Intelligence Convergence Department, Chonnam National University, Jul. 27, 2022.
"Deep neural networks for the recognition of silent speech and dysarthric speech," Bioengineering Department, University of Texas at Dallas, TX, Nov. 17, 2017.
"Deep learning and its applications to dysarthric speech recognition," Chosun University, South Korea, Mar. 21, 2017.
"Introduction to deep learning," University of Texas at Dallas, TX, Jan. 21, 2017.
"Sound processing techniques," Gachon University, South Korea, Apr. 14, 2016.
"Introduction to deep learning," Jungwon University, South Korea, Apr. 5, 2016.
"Automatic intelligibility assessment and recognition of dysarthric speech using phonetic variations," ETRI, South Korea, Dec. 15, 2015.
Honors and Awards
Best Paper Award, International Conference on Mechatronics and Intelligent Robotics, May 2018
Best Poster Award, the 7th International Conference on Speech Motor Control, Jul. 2017
The Silver Prize, The 22nd Samsung Electronics HumanTech Paper Award, Jan. 2016, $7,000
Government Scholarship, KAIST, Aug. 2010 - Feb. 2016
International Speech Communication Association (ISCA) Travel Grant, Sep. 2012, €650
The Grand Prize, Dept. of Electronics Engineering Conference, Tech University of Korea, Dec. 2007
Participation Prize, International Robot Contest, Seoul, South Korea, Oct. 2006
Scholarship, Dept. of Electronics Engineering, Tech University of Korea, 2004-2008