AI & NLP in Biomedicine

The growing adoption of electronic health records (EHRs) as a key component of health systems raises a number of questions that remain only partially solved. EHRs store heterogeneous information in a wide variety of formats, including free-text documents such as clinical notes and radiology reports, which contain information on diagnoses, treatments and procedures. However, the unstructured nature of these text fields makes automatically extracting relevant concepts from them especially difficult. Transforming clinical text, written in natural language, into structured data makes it usable for tasks such as treatment planning, disease research and clinical decision-making, as well as for the management of health systems. In recent years, natural language processing (NLP) and artificial intelligence (AI) techniques have been applied to problems such as clinical coding, automatic classification of clinical documents and clinical named-entity recognition, among others. However, most existing studies in the literature have been carried out only on English texts, owing to the scarcity of corpora annotated with clinical entities, and of other linguistic resources, in languages such as Spanish.
In this project, we aim to advance the creation and de-identification of a dedicated clinical corpus that is expected to become a reference oncological text corpus in Spanish. Using this corpus, my research laboratory designs and adapts new AI algorithms for Spanish-language NLP, to be applied to downstream information-processing tasks on the unstructured text stored in oncological EHRs within Galén, a healthcare information management system.
The resulting models are being analyzed and validated by applying them to several clinically significant tasks through the analysis of real-world data (RWD) from oncology units. The NLP models are also expected to be transferred to text corpora from other medical disciplines or healthcare settings, and validated on information extraction and prediction tasks in those specific areas.