Gyubok Lee

Gyubok Lee (이규복)

Ph.D. Student

Kim Jaechul Graduate School of AI @ KAIST

gyubok.lee [AT] kaist.ac.kr

Hello, my name is Gyubok (pronounced KYOO-bok), and I am a Ph.D. student at KAIST advised by Edward Choi. My research lies at the intersection of natural language processing (tool-calling agents, database question answering) and machine learning for biomedicine (clinical decision support systems, biomedical co-scientists). Specifically, I am interested in (1) developing AI systems that enable reliable natural language interaction with large-scale databases, particularly electronic health records, and (2) more recently, building foundation models and agents that can reason across the entire spectrum of biomedicine, from molecular and omics data to clinical trials and patient records.

Publications (*: Equal contribution)

From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents [PDF]

Gyubok Lee, Woosog Chay, Heeyoung Kwak, Yeong Hwa Kim, Haanju Yoo, Oksoon Jeong, Meong Hi Son, Edward Choi

ICLR 2026

FHIR-AgentBench: Benchmarking LLM Agents for Realistic Interoperable EHR Question Answering [PDF]

Gyubok Lee*, Elea Bach*, Eric Yang, Tom Pollard, Alistair Johnson, Edward Choi, Yugang Jia, Jong Ha Lee

ML4H 2025 Proceedings

SCARE: A Benchmark for SQL Correction and Question Answerability Classification for Reliable EHR Question Answering [PDF]

Gyubok Lee*, Woosog Chay*, Edward Choi

ML4H 2025 Proceedings

EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records [PDF]

Yeonsu Kwon*, Jiho Kim*, Gyubok Lee, Seongsu Bae, Daeun Kyung, Wonchul Cha, Tom Pollard, Alistair Johnson, Edward Choi

NeurIPS 2024 Datasets and Benchmarks (Spotlight)

EHR-SeqSQL: A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records [PDF]

Jaehee Ryu*, Seonhee Cho*, Gyubok Lee, Edward Choi

ACL 2024 Findings

Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes [PDF]

Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You, Seungjin Baek, Chang Hoon Han, Yoon Bin Jung, Yohan Jo, Edward Choi

ACL 2024 Findings

Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records [PDF]

Gyubok Lee, Sunjun Kweon, Seongsu Bae, Edward Choi

NAACL 2024 Clinical NLP Workshop - EHRSQL 2024 Shared Task (Oral)

Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL [PDF]

Yongjin Yang*, Sihyeon Kim*, SangMook Kim*, Gyubok Lee, Se-Young Yun, Edward Choi

ICLR 2024 Data Problems for Foundation Models (DPFM) Workshop

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images [PDF]

Seongsu Bae*, Daeun Kyung*, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric I-Chao Chang, Tackeun Kim, Edward Choi

NeurIPS 2023 Datasets and Benchmarks

ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram [PDF]

Jungwoo Oh, Gyubok Lee, Seongsu Bae, Joon-myoung Kwon, Edward Choi

NeurIPS 2023 Datasets and Benchmarks

Exploration into Translation-Equivariant Image Quantization [PDF]

Woncheol Shin, Gyubok Lee, Jiyoung Lee, Eunyi Lyou, Joonseok Lee, Edward Choi

ICASSP 2023 (Oral, Top 3% recognition)

EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records [PDF]

Gyubok Lee, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin, Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, Edward Choi

NeurIPS 2022 Datasets and Benchmarks

Machine Learning Improves the Prediction Rate of Non-Curative Resection of Endoscopic Submucosal Dissection in Patients with Early Gastric Cancer [PDF]

Hae-Ryong Yun, Da Hyun Jung, Cheal Wung Huh, Gyubok Lee, Nak-Hoon Son, Jie-Hyun Kim, Young Hoon Youn, Jun Chul Park, Sung Kwan Shin, Sang Kil Lee, Yong Chan Lee

Cancers 2022 (IF: 6.575)

Erythropoiesis Stimulating Agent Recommendation Model Using Recurrent Neural Networks for Patient with Kidney Failure with Replacement Therapy [PDF]

Hae-Ryong Yun*, Gyubok Lee*, Myeong Jun Jeon, Hyung Woo Kim, Young Su Joo, Hyoungnae Kim, Tae Ik Chang, Jung Tak Park, Seung Hyeok Han, Shin-Wook Kang, Wooju Kim, Tae-Hyun Yoo

Computers in Biology and Medicine 2021 (IF: 6.698)

Improving Lexically Constrained Neural Machine Translation with Source-Conditioned Masked Span Prediction [PDF]

Gyubok Lee*, Seongjun Yang*, Edward Choi

ACL 2021 (Oral)

Diverse and Admissible Trajectory Forecasting Through Multimodal Context Understanding [PDF]

Seong Hyeon Park, Gyubok Lee, Manoj Bhat*, Jimin Seo*, Minseok Kang, Jonathan Francis, Ashwin Jadhav, Paul Pu Liang, Louis-Philippe Morency

ECCV 2020

Unsupervised learning approach for network intrusion detection system using autoencoders [PDF]

Hyunseung Choi, Mintae Kim, Gyubok Lee, Wooju Kim

Journal of Supercomputing 2019 (IF: 2.469)

Work Experience

Trillion Labs, Seoul, South Korea (2025.11 - Present)

Research Intern, Technical Staff

Building multiscale biomedical foundation models across molecular, omics, pharmaceutical, and clinical domains.

NAVER Cloud, Seongnam, South Korea (2024.10 - 2025.04)

Research Intern, Healthcare AI Team

Constructed benchmarks for building and evaluating conversational LLM agents that answer questions over publicly accessible electronic health record databases (MIMIC-IV Demo and eICU Demo).

Amazon, Sunnyvale, United States (2022.07 - 2023.01)

Applied Scientist Intern, Alexa AI - Natural Understanding Team

Explored training methods for improving the zero-shot generalization of task-oriented dialog systems to new domains and tasks (mainly instruction tuning and utterance-dialog act contrastive learning).

Education

Korea Advanced Institute of Science and Technology (KAIST) (2020.09 - Present)

Ph.D. in Artificial Intelligence (Advisor: Edward Choi)

Area: Natural Language Processing and Machine Learning for Healthcare

Yonsei University (2018.03 - 2020.08)

M.S. in Industrial Engineering - Data Science Track (Advisor: Wooju Kim)

Thesis: Improving Domain-Specific Neural Machine Translation by Leveraging In-Domain Monolingual Data

Carnegie Mellon University (2019.08 - 2020.02)

Visiting student at the Language Technologies Institute (Advisors: John Kang and Jaime Carbonell)

University of Wisconsin–Madison (2010.09 - 2016.12)

B.B.A. in Actuarial Science
