▶ More resources are available on GitHub
▶ MEDSPANER: a medical semantic Python-assisted named entity recognizer
A tool for semantic annotation of Spanish medical texts.
It was originally developed for clinical trial texts, but it can be applied to other medical text genres.
Check the companion GitHub repository.
See a demonstration video.
▶ Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) corpus
A collection of 1200 texts about clinical trial studies and clinical trial announcements.
The corpus is annotated with entities of the Unified Medical Language System.
Distributed for research and educational purposes under a Creative Commons Attribution-NonCommercial (CC BY-NC) license.
A system to help patients and non-specialist users understand medical texts.
It is currently under development and still needs to be evaluated by end users.
A comparable corpus of Spanish medical texts (technical and simplified versions) with 24 298 pairs of texts, distributed under a Creative Commons Attribution (CC-BY) license.
The corpus includes a subset of 3 800 parallel sentences (aligned technical and simplified versions), revised by pairs of experts.
▶ Medical Lexicon for Spanish (MedLexSp)
A unified Spanish lexicon of medical terms with linguistic and semantic information.
Distributed for research and educational purposes.
▶ Simple Medical Lexicon for Spanish (SimpMedLexSp)
A lexicon of medical terms and their equivalent forms in patient register (simplified terms or paraphrases).
It contains 14 465 forms to date (including conjugated verb forms and gender and number variants); a subset of 4 664 forms is normalized to the Unified Medical Language System (UMLS).
▶ PatientGenesys dialogue system
I collaborated on the development of a conversational agent that simulates a consultation with a virtual patient. The system is integrated into the PatientGenesys platform, a serious game aimed at providing continuing education to health professionals.
See the demonstration video of the full system.
See the videos of the English chatbot and the Spanish chatbot.
I was in charge of the Spanish part of a medical text corpus for the MultiMedica project (2010-2013).
I revised the Spanish corpus, prepared the lexicons, and helped develop an automatic extractor of Spanish biomedical terms.
A learner corpus of Spanish as a foreign language collected to carry out error analysis.
It gathers 40 interviews with learners of Spanish at A2 and B1 levels (Common European Framework of Reference for Languages) who speak more than nine mother tongues (including Portuguese, French, Italian, English, German, Dutch, Polish, Japanese and Chinese).
▶ LYNEAL (Letras y números en análisis lingüísticos)
LYNEAL, developed by Hiroto Ueda (University of Tokyo), is a tool for advanced text search. Its features include linguistic pattern and word search, frequency counts and Key Word in Context (KWIC) concordances.
▶ I also have some experience in processing hypertext-based teaching materials for Spanish as a Foreign Language:
Activities based on the Spanish Learner Oral Corpus, aimed at trainee Spanish teachers.
Interview on Hoy empieza todo 2 (Radio 3) (23rd October 2023)
▶️ Link to the YouTube playlist.
"Recursos para el procesamiento del lenguaje médico en español", in Jornada de Biología Computacional, Ciencia de datos e Inteligencia Artificial (CSIC, 3rd July 2023)
"Simplificación de textos médicos con procesamiento del lenguaje: el proyecto CLARA-MeD", talk at Seminario Mirian Andrés, La Rioja University (23 May 2023)
"Proyecto CLARA-MeD. Procesamiento del lenguaje médico para la simplificación automática de textos", Jornada de Grandes infraestructuras europeas de Ciencias Sociales y Humanidades en el CSIC: DARIAH y CLARÍN en el horizonte (11 May 2023)
"Advances in processing and simplification of clinical trials texts", talk at LISN (14th March 2023) and at CENTAL (16th March 2023)
"A clinical trials corpus annotated with UMLS". Talk at the IIC-UAM Chair of Computational Linguistics, 24th April 2021
"A bird's eye view of NLP resources for Spanish medical text mining". COVID-19 Hackathon, December 2020. List of Spanish BioNLP corpora.
Participation in XVII Seminario TIC-ETL: El egresado en Filología en las Industrias de la Lengua
"Adaptivity in Natural Language Interaction in a Virtual Patient Simulation System". LIMSI, CNRS, 21st November 2017.
"Introduction to fastText". LIMSI, CNRS, 2017.
"Introduction to vector representations of words and documents". LIMSI, CNRS, 22nd September 2016.
"Part-of-Speech Tagging a Spanish Learner Oral Corpus". Spanish Learner Corpus Workshop, Universidad de La Coruña, 14th July 2015.
"Description of the PatientGenesys dialogue system". Postdoctoral seminar, Groupe ILES (LIMSI, CNRS), 16th June 2015.
"Proyecto MultiMedica. Consulta de textos médicos y extractor de términos". Unidad de Terminología Médica, Real Academia Nacional de Medicina, May 2014.
"Proyecto MultiMedica". Universidad de Alcalá de Henares, Facultad de Medicina, May 2014.
"Textos de español oral. Recurso para el aprendizaje de E/LE basado en corpus de habla espontánea". Instituto Cervantes de Beijing, April 2012.
"Taller de corpus". Sophia University, Tokyo. 12th January 2010.
Last update: April 2025.