Reliable Text-to-SQL Modeling on Electronic Health Records

NAACL 2024 - Clinical NLP Shared Task

Motivation

Electronic Health Records (EHRs) are relational databases that store patients' entire medical histories within a hospital. They record many aspects of patient care, from admission and diagnosis to treatment and discharge. While EHRs are vital sources of clinical data, exploring them beyond a predefined set of queries or requests requires skill in query languages such as SQL. One straightforward way to simplify access to EHR data is to build a question-answering system on top of a text-to-SQL model, which automatically converts natural language questions into corresponding SQL queries and executes those queries to retrieve the answers.
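The question-answering flow described above (question, then SQL, then answer) can be sketched in a few lines. This is a minimal illustration, not the task's reference system: the in-memory `patients` table and its rows are made up (the columns loosely mirror MIMIC-IV's `patients` table), and `generate_sql` is a hypothetical stand-in for a trained text-to-SQL model, implemented here as a simple lookup.

```python
import sqlite3

# Toy stand-in for one EHR table. The schema loosely mirrors MIMIC-IV's
# `patients` table; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (subject_id INTEGER, gender TEXT, anchor_age INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, "F", 52), (2, "M", 67), (3, "F", 71)],
)

def generate_sql(question: str) -> str:
    """Hypothetical text-to-SQL model: here just a fixed question-to-SQL lookup."""
    templates = {
        "How many female patients are there?":
            "SELECT COUNT(*) FROM patients WHERE gender = 'F'",
        "What is the age of patient 2?":
            "SELECT anchor_age FROM patients WHERE subject_id = 2",
    }
    return templates[question]

def answer(question: str) -> list:
    """Translate the question into SQL, then run the query to retrieve the answer."""
    sql = generate_sql(question)
    return conn.execute(sql).fetchall()

print(answer("How many female patients are there?"))  # [(2,)]
print(answer("What is the age of patient 2?"))        # [(67,)]
```

In a real system, `generate_sql` would be a learned model that must handle arbitrary phrasings and joins across many EHR tables, which is what makes the task hard.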

The goal of this shared task is to build a reliable text-to-SQL model for an EHR database, specifically the demo version of MIMIC-IV [1]. The model should answer questions selectively: it should generate accurate SQL when it is certain and abstain on the remaining questions, regardless of whether each input question is intrinsically answerable or unanswerable. The input questions cover diverse topics relevant to clinical settings (e.g., patient demographics, vital signs, and disease survival rates) [2], and also include questions that are unanswerable given the database schema (e.g., asking about today's weather) or the capabilities of SQL (e.g., drawing a plot). Solving this task would allow healthcare experts, including physicians, nurses, and researchers, to explore EHRs freely using natural language, substantially reducing the burden of retrieving and synthesizing information across the multiple tables of an EHR.
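One common way to realize "answer when certain, abstain otherwise" is to attach a confidence score to each generated query and abstain whenever it falls below a threshold. The sketch below assumes that approach; it is not prescribed by the task. `MODEL_OUTPUTS`, its confidence values, `THRESHOLD`, and `selective_answer` are all hypothetical, and the table rows are invented. Note that the second question is answerable from the schema but is still abstained on because the model is not confident, while the third is unanswerable.

```python
import sqlite3
from typing import Optional

# Toy table with invented rows (columns loosely mirror MIMIC-IV's `patients`).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (subject_id INTEGER, gender TEXT, anchor_age INTEGER)")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", [(1, "F", 52), (2, "M", 67)])

# Hypothetical model outputs: (candidate SQL, confidence in [0, 1]).
# Values are made up to illustrate the abstention logic.
MODEL_OUTPUTS = {
    "How many patients are male?":
        ("SELECT COUNT(*) FROM patients WHERE gender = 'M'", 0.93),
    "What is the average age of female patients?":  # answerable, but uncertain
        ("SELECT AVG(anchor_age) FROM patients WHERE gender = 'F'", 0.55),
    "What is the weather today?":  # unanswerable from this schema
        ("SELECT weather FROM patients", 0.21),
}

THRESHOLD = 0.8  # hypothetical; in practice tuned on held-out data

def selective_answer(question: str) -> Optional[list]:
    """Return query results when confident enough, or None to abstain."""
    sql, confidence = MODEL_OUTPUTS[question]
    if confidence < THRESHOLD:
        return None  # abstain: the model is not certain enough
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # abstain: the generated query fails to execute

for q in MODEL_OUTPUTS:
    print(q, "->", selective_answer(q))
```

Raising `THRESHOLD` trades coverage (fewer questions answered) for reliability (fewer wrong answers returned); where to set it depends on how costly an incorrect answer is compared with an abstention.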


[1] Johnson, Alistair, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, and Roger Mark. "MIMIC-IV" (version 2.2). PhysioNet (2023). https://physionet.org/content/mimiciv/2.2.

[2] Lee, Gyubok, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin, Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, and Edward Choi. "EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records." Advances in Neural Information Processing Systems 35 (2022): 15589-15601. https://github.com/glee4810/EHRSQL.

Registration, Dataset, and Evaluation


Schedule

All deadlines are 11:59 PM UTC-12:00 (Anywhere on Earth) unless stated otherwise.

Contact

Organizer

The organizers are from EdLab @ KAIST.