Ph.D. Student
Kim Jaechul Graduate School of AI @ KAIST
gyubok.lee [AT] kaist.ac.kr
Hello, my name is Gyubok (pronounced KYOO-bok), and I am a Ph.D. student at KAIST, advised by Edward Choi. My primary research areas are natural language processing (database question answering, text-to-SQL, LLM agents, dialog systems) and machine learning for healthcare (question answering, clinical decision support systems). Specifically, I am interested in developing AI systems that allow people to freely and reliably explore and interact with large databases, such as electronic health records (EHR), using natural language.
EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records [PDF]
Yeonsu Kwon*, Jiho Kim*, Gyubok Lee, Seongsu Bae, Daeun Kyung, Wonchul Cha, Tom Pollard, Alistair Johnson, Edward Choi
NeurIPS 2024 Datasets and Benchmarks (Spotlight)
EHR-SeqSQL: A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records [PDF]
Jaehee Ryu*, Seonhee Cho*, Gyubok Lee, Edward Choi
ACL 2024 Findings
Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes [PDF]
Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You, Seungjin Baek, Chang Hoon Han, Yoon Bin Jung, Yohan Jo, Edward Choi
ACL 2024 Findings
Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records [PDF]
Gyubok Lee, Sunjun Kweon, Seongsu Bae, Edward Choi
NAACL 2024 Clinical NLP Workshop - EHRSQL 2024 Shared Task (Oral)
Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL [PDF]
Yongjin Yang*, Sihyeon Kim*, SangMook Kim*, Gyubok Lee, Se-Young Yun, Edward Choi
ICLR 2024 Data Problems for Foundation Models (DPFM) Workshop
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images [PDF]
Seongsu Bae*, Daeun Kyung*, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric I-Chao Chang, Tackeun Kim, Edward Choi
NeurIPS 2023 Datasets and Benchmarks
ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram [PDF]
Jungwoo Oh, Gyubok Lee, Seongsu Bae, Joon-myoung Kwon, Edward Choi
NeurIPS 2023 Datasets and Benchmarks
Exploration into Translation-Equivariant Image Quantization [PDF]
Woncheol Shin, Gyubok Lee, Jiyoung Lee, Eunyi Lyou, Joonseok Lee, Edward Choi
ICASSP 2023 (Oral, Top 3% recognition)
EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records [PDF]
Gyubok Lee, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin, Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, Edward Choi
NeurIPS 2022 Datasets and Benchmarks
Machine Learning Improves the Prediction Rate of Non-Curative Resection of Endoscopic Submucosal Dissection in Patients with Early Gastric Cancer [PDF]
Hae-Ryong Yun, Da Hyun Jung, Cheal Wung Huh, Gyubok Lee, Nak-Hoon Son, Jie-Hyun Kim, Young Hoon Youn, Jun Chul Park, Sung Kwan Shin, Sang Kil Lee, Yong Chan Lee
Cancers 2022 (IF: 6.575)
Erythropoiesis Stimulating Agent Recommendation Model Using Recurrent Neural Networks for Patient with Kidney Failure with Replacement Therapy [PDF]
Hae-Ryong Yun*, Gyubok Lee*, Myeong Jun Jeon, Hyung Woo Kim, Young Su Joo, Hyoungnae Kim, Tae Ik Chang, Jung Tak Park, Seung Hyeok Han, Shin-Wook Kang, Wooju Kim, Tae-Hyun Yoo
Computers in Biology and Medicine 2021 (IF: 6.698)
Improving Lexically Constrained Neural Machine Translation with Source-Conditioned Masked Span Prediction [PDF]
Gyubok Lee*, Seongjun Yang*, Edward Choi
ACL 2021 (Oral)
Diverse and Admissible Trajectory Forecasting Through Multimodal Context Understanding [PDF]
Seong Hyeon Park, Gyubok Lee, Manoj Bhat*, Jimin Seo*, Minseok Kang, Jonathan Francis, Ashwin Jadhav, Paul Pu Liang, Louis-Philippe Morency
ECCV 2020
Unsupervised learning approach for network intrusion detection system using autoencoders [PDF]
Hyunseung Choi, Mintae Kim, Gyubok Lee, Wooju Kim
Journal of Supercomputing 2019 (IF: 2.469)
NAVER Cloud, Seongnam, South Korea (2024.10 - 2025.04)
Research Intern at Healthcare AI Team
In collaboration with Samsung Medical Center, I constructed benchmark datasets to build and evaluate conversational LLM agents for question answering over electronic health records.
Amazon, Sunnyvale, United States (2022.07 - 2023.01)
Applied Scientist Intern at Alexa AI - Natural Understanding Team
In collaboration with the Alexa Teacher Model team, which specializes in language model pre-training, I investigated pre-training methods for task-oriented dialog systems to enhance Alexa's zero-shot and few-shot generalization to new tasks.
Korea Advanced Institute of Science and Technology (KAIST) (2020.09 - Present)
Ph.D. in Artificial Intelligence (Advisor: Edward Choi)
Area: Natural Language Processing and Machine Learning for Healthcare
Yonsei University (2018.03 - 2020.08)
M.S. in Industrial Engineering - Data Science Track (Advisor: Wooju Kim)
Thesis: Improving Domain-Specific Neural Machine Translation by Leveraging In-Domain Monolingual Data
Carnegie Mellon University (2019.08 - 2020.02)
Visiting student at the Language Technologies Institute (Advisors: John Kang and Jaime Carbonell)
University of Wisconsin–Madison (2010.09 - 2016.12)
B.B.A. in Actuarial Science