Penglei Sun, Yixiang Chen, Xiang Li, and Xiaowen Chu
Paper Code Datasets Supplementary Material BibTeX
In recent years, accurately and quickly deploying medical large language models (LLMs) has become a trend. Among existing approaches, retrieval-augmented generation (RAG) has garnered attention for its rapid deployment and privacy protection. However, a key challenge hinders the practical deployment of RAG for medical diagnosis: the semantic gap between colloquial patient descriptions and the professional terminology in medical knowledge bases. We address this challenge from both the data and the method perspectives. First, to close the semantic gap in existing knowledge bases, we construct DiagnosGraph, a generalist knowledge graph covering both modern medicine and Traditional Chinese Medicine. It contains 876 common diseases, organized in a graph of 7,997 nodes and 37,201 triples. To bridge colloquial patient narratives and academic medical knowledge, DiagnosGraph also introduces 1,908 medical records that formalize the patient's chief complaint and propose a medical diagnosis. Second, we introduce the Multi-Round Diagnostic RAG (MRD-RAG) framework, which uses multi-round dialogue to refine diagnostic possibilities, emulating the clinical reasoning of a physician. Experiments on four medical benchmarks, with evaluations by human physicians, demonstrate that MRD-RAG enhances the diagnostic performance of LLMs, highlighting its potential to make automated diagnosis more accurate and human-aligned.
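To make the multi-round idea concrete, below is a minimal sketch of one possible diagnostic loop: each round appends the patient's latest utterance to the dialogue, retrieves candidate diseases from a DiagnosGraph-style knowledge base, and asks the LLM either to pose a clarifying question or to commit to a diagnosis. All names here (`retrieve_candidates`, `ask_llm`, `mrd_rag_round`, the `DIAGNOSIS:` convention) are illustrative placeholders, not the authors' released API.

```python
# Hypothetical sketch of a multi-round diagnostic RAG loop.
# The retriever and LLM calls are stubs standing in for real components.
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    history: list = field(default_factory=list)     # alternating patient/doctor turns
    candidates: list = field(default_factory=list)  # candidate diseases under consideration


def retrieve_candidates(query, top_k=5):
    """Placeholder retriever: would match the colloquial dialogue context
    against knowledge-graph nodes/triples and return candidate diseases."""
    return ["disease_A", "disease_B"][:top_k]


def ask_llm(prompt):
    """Placeholder LLM call: would return either a clarifying question
    or a final answer prefixed with 'DIAGNOSIS:'."""
    return "DIAGNOSIS: disease_A"


def mrd_rag_round(state, patient_utterance):
    """One round: retrieve with the updated dialogue context, then let the
    LLM ask a follow-up question or output a diagnosis."""
    state.history.append("Patient: " + patient_utterance)
    context = " ".join(state.history)
    state.candidates = retrieve_candidates(context)
    prompt = (
        "Candidate diseases: " + ", ".join(state.candidates) + "\n"
        "Dialogue so far:\n" + "\n".join(state.history) + "\n"
        "Ask one clarifying question, or output 'DIAGNOSIS: <name>' if confident."
    )
    reply = ask_llm(prompt)
    state.history.append("Doctor: " + reply)
    return reply


# Example multi-round usage: stop as soon as the model commits to a diagnosis.
state = DialogueState()
for turn in ["I feel dizzy and tired lately.", "Yes, it is worse in the morning."]:
    reply = mrd_rag_round(state, turn)
    if reply.startswith("DIAGNOSIS:"):
        break
```

The key design point the sketch illustrates is that retrieval is re-run each round with the growing dialogue context, so candidate diseases are progressively narrowed as the patient's colloquial description is refined.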