Projects

Current Projects

Early Detection of HEalth Risks by textual analysis of MEDical documents

Reference: PID2022-136522OB-C21
Duration: 2023-2026 (3 years)
Financing institution: Ministerio de Ciencia e Innovación
Call: R&D&I Projects within the State Programme for Knowledge Generation and Scientific and Technological Strengthening of the R&D&I System, 2022
Participating institutions: Universidad Nacional de Educación a Distancia (UNED), Universidad del País Vasco (UPV)
Coordinating institution: Universidad Nacional de Educación a Distancia (UNED)
Principal investigators at UNED: Juan Martínez Romo and Lourdes Araujo Serna
The EDHER-MED project is multidisciplinary, merging health and informatics with a focus on early detection. Our hypothesis is that, from the data available in various medical documents, advanced tools can be developed to identify signs of the health problems under study and alert physicians. We will also improve information representation models for the biomedical domain. In addition, we propose to enrich medical ontologies, in which Spanish is under-represented, and to work on the automatic generation of clinical argumentation to support or oppose a given hypothesis, the definition and refinement of argument units to facilitate explainability, and the automatic generation of timelines to predict future diagnoses or patient admissions.

Specifically, we intend to study the following health problems:

- Early detection of mental health problems in children and adolescents, with special attention to suicide.
- Early detection of HIV, exploring different techniques to extract key indicators of HIV status.
- Improved characterization of rare diseases and their effects on the mental health and well-being of children.
- Automatic discovery of potential risk factors associated with cardiovascular complications.

Digital OBSERvatory of MENtal Health in social networks for Healthcare Institutions based on Language Technologies

Reference: TED2021-130398B-C21
Duration: 2022-2025 (3 years)
Financing institution: Ministerio de Ciencia e Innovación
Call: Strategic Projects Oriented to the Ecological Transition and the Digital Transition, 2021
Participating institutions: Universidad Nacional de Educación a Distancia (UNED), Universidad del País Vasco (UPV)
Coordinating institution: Universidad Nacional de Educación a Distancia (UNED)
Principal investigators at UNED: Juan Martínez Romo and Lourdes Araujo Serna
Social networks such as Twitter, Instagram, Snapchat or Facebook host thematic communities related to psychological disorders, where thousands of people go to share their emotional state, to provide or seek help, or simply to chat. Alongside this world of social networks are the psychiatry and psychology professionals whose work would be essential in helping most of the people who participate in these communities. This project proposes to link these two worlds by building a mental health observatory that will allow health professionals to make a faster and better-informed digital transition. The project focuses on depression, anxiety, the potential risk of suicide (subproject GELP) and situations of loneliness and isolation (subproject EDHIA). These problems manifest differently by gender, a dimension we also want to analyze.

Métodos de la Lingüística Computacional para la legibilidad y simplificación automática en humanidades digitales 

Reference: PID2020-116001RB-C32
Duration: 2021-2024 (3 years)
Financing institution: Ministerio de Ciencia e Innovación
Call: 2020 call for R&D&I Projects within the State Programme for Knowledge Generation and Scientific and Technological Strengthening of the R&D&I System and the State Programme for R&D&I Oriented to the Challenges of Society
Participating institutions in the coordinated project CLARA-NLP: UAM (PID2020-116001RB-C31, sub-area INF), Universidad Autónoma de Madrid, Facultad de Filosofía y Letras; UNED, ETSI Informática; CSIC (PID2020-116001RA-C33, sub-area INF), Instituto de Lengua, Literatura y Antropología (ILLA)
Principal investigator at UNED: Ana García Serrano
Principal investigator of the coordinated project: Antonio Moreno Sandoval (UAM)

Inclusive Memory (Erasmus KA) 

Inclusive Museums for well-being and health through the creation of a new shared memory 


Duration: 1/11/2021 – 1/11/2024 (36 months)
Project no.: 2021-1-IT02-KA220-HED-000031991
Partners: UNIVERSITA DEGLI STUDI DI MODENA E REGGIO EMILIA (Italy, MODENA), UNED (Spain, Madrid), Zètema Progetto Cultura srl (Italy, Roma), UNIVERSIDADE ABERTA (Portugal, Lisboa), HASKOLI ISLANDS (Iceland, REYKJAVIK), INTER ALIA (Greece, ATHINA), INSTITUT CATALA DE LA SALUT (Spain, BARCELONA)
Funds: euros.
Principal investigator at UNED: Covadonga Rodrigo


Spektrum (Erasmus+ KA205)

2019-3-PL01-KA205-077866. Action 2 - strategic partnerships. https://mnk.pl/article/spectrum-project
Duration: 16/03/2020 – 15/03/2022 (24 months)
Participants: Muzeum Narodowe w Krakowie, Poland, PL (E10206701); Universita degli studi Roma Tre, IT (E10208847); UNED, ES (E10208821); Faro. Vlaams Steunpunt voor cultureel erfgoed vzw, BE (E10212964); Outside in Pathways, UK (940634900)
Principal investigator at UNED: Covadonga Rodrigo

Duration: 1/10/2020 - 30/9/2021
Financing institution: Fondo Supera COVID-19 (CRUE - CSIC - Banco Santander)
Many biomedical researchers around the world are directing their efforts towards the study of COVID-19. This effort generates a large volume of scientific publications at a pace that hinders the effective acquisition of new knowledge. Information systems are needed to assist biomedical experts in accessing, querying and analysing these publications. This is precisely the overall goal of the VIGICOVID project.

MISMIS

Duration: 1/01/2019 - 31/12/2021
Financing institution: Ministerio de Ciencia, Innovación y Universidades, 2018 call for Knowledge Generation R&D Projects of the State Programme for Knowledge Generation and Scientific and Technological Strengthening of the R&D&I System
Reference: PGC2018-096212-B-C32
Misinformation and aggressiveness in social media: bias, controversy and veracity.

Duration: 2019-2021
Financing institution: IMIENS
This project will design algorithms to identify relevant relationships between different diseases, or between diseases and other medical concepts, to support diagnosis. These relationships can be encoded as association rules (AR), which represent the medical knowledge underlying the set of electronic health records (EHR) stored in the clinical information repository. However, extracting and selecting high-probability association rules is not a simple process. This project continues the EXTRAE project (IMIENS2017), which made significant progress in selecting relevant association rules. In particular, a new semi-supervised learning algorithm was developed that can deliver high-precision results with a very small amount of training data. This algorithm gave rise to a system that has been evaluated on a small set of medical data. As a continuation of this work, in this new proposal we will refine the algorithm and generalize it to work with a large and much more specific dataset (cohort), based on a particular domain of medical knowledge and extracted from repositories specialized in secondary use (research), with real, coded data from collaborating hospitals.

Learning to Interact with Humans by Lifelong Interaction with Humans

Duration: 2018-2020
Financing institution: EU (CHIST-ERA 2016)

The LIHLITH project is a fundamental pilot research project which introduces a new lifelong learning framework for the interaction of humans and machines on specific domains.

A Lifelong Learning system learns different tasks sequentially, over time, getting better at solving future related tasks based on past experience. LIHLITH will focus on human-computer dialogue, where each dialogue experience is used by the system to learn to interact better, based on the success (or failure) of previous interactions. The key insight is that the dialogue will be designed to produce a reward, allowing the chatbot system to know whether the interaction was successful or not. The reward will be used to train the domain and dialogue management modules of the chatbot, improving performance and reducing development cost, both on a single target domain and especially when moving to new domains.

Past Projects

Duration: 2018-2019
Financing institution: IMIENS
In this project we aim to design algorithms that help identify relevant relationships between different diseases. This information is very useful for making new diagnoses, testing new treatments or drugs, or anticipating the likely course of a disease. Many diseases share one or more aspects, such as symptoms, progression or treatment, but this does not always mean that they are related. We therefore propose a system able to detect relationships between diseases that can be considered significant. Significance will be determined by the coincidence of aspects beyond chance, captured by defining an appropriate statistical model. Relationships between diseases can be established from different patterns, separately or jointly: co-occurrence, shared symptoms, similar treatments, and so on. These relationships can be encoded as association rules (AR), which can be regarded as a way of representing the medical knowledge underlying the set of electronic health records (EHR) stored in the clinical information repository. This project is funded under the IMIENS call for joint research projects between research groups at UNED and the Instituto de Salud Carlos III.

PLN.NET

Duration: 2016-2018
Financing institution: Ministerio de Economía, Industria y Competitividad
Thematic network of excellence funded by the Ministerio de Economía, Industria y Competitividad (reference TIN2016-81739-REDT) to create active communication forums among Natural Language Processing researchers, in which to reach common ground in the process of standardizing their services. The NLP&IR group is one of its members. For more information: https://gplsi.dlsi.ua.es/pln/node/32

VEMODALEN

Duration: 2016-2019
Financing institution: Ministerio de Economía y Competitividad
Call: 2015, Modality 1: R&D&I Projects, State Programme for Research, Development and Innovation Oriented to the Challenges of Society
For an average citizen of our digital era, the problem is no longer finding relevant information, but assimilating the massive amount of relevant information available at any moment in time. This is not possible without the help of a new generation of machines able to digest all relevant sources into a readable, personalized synthesis of the stream of relevant information. Such machines need to acquire two crucial, interdependent skills: (i) the ability to automatically discern when different texts convey approximately the same message; and (ii) the ability to discern the credibility of messages.

Our goal is to address the challenge of computing both textual similarity and source authority in online media, focusing on three challenging tasks in three relevant application scenarios: identification and synthesis of controversy in the medical domain, generation of reputation profiles for companies and brands, and recommendation of instructional materials in e-learning environments.

MANTRA-MED

Modelado y AutoMatización de exTracción de Relaciones y cAtegorización de informes MEDicos para la recomendación de códigos CIE-10 (TIN2016-77820-C3-2-R)

Duration: 2017-2018
Financing institution: Ministerio de Economía y Competitividad
The automatic processing of Electronic Medical Records (EMR) poses challenges for the field of Natural Language Processing (NLP), which to a great extent relate to adapting existing techniques to the medical domain. Moreover, tasks such as assigning diagnosis and procedure codes to EMRs, currently carried out manually by experts, call for exploring Text Mining and Information Retrieval techniques that can automatically infer the relevant codes from EMR descriptions.
The Alcorcon Foundation University Hospital (HUFA in Spanish), with which the sub-project will collaborate, is a public university hospital that is part of the Madrid Health Service (SERMAS in Spanish). Like all Madrid Health Service hospitals, it moved from the old CIE-9 discharge report coding scheme to the newer CIE-10 scheme on 1 January 2016. This change has resulted in a 75% decrease in the performance of the coding teams, which are made up of personnel trained for the task. Commercial applications are available that aid in assigning CIE-10 codes by using existing mappings between CIE-9 and CIE-10. Nevertheless, the greater detail and comprehensiveness of CIE-10, combined with the fact that there are combination codes present in CIE-9 with no corresponding code in CIE-10, makes this mapping impossible in a large number of cases. All hospitals would benefit from a tool able to automatically assign codes to diagnoses and procedures directly from the free text found in medical reports. This health-sector problem will be the main focus and use case of this subproject.
We propose to study, adapt and develop NLP and unsupervised learning techniques, with which this group has a great deal of experience, in order to develop a tool that recommends and assigns CIE-10 codes to discharge reports. An unsupervised approach is imperative given the currently limited availability of manually coded records with which to train supervised systems. As records written in Spanish will be readily available, we will focus on this language, although the methods can be applied to other languages, and we expect them to be validated by the work on other languages within the coordinated project.
The development of this tool involves research challenges in several fields: anonymization of reports, lexical normalization within the domain, disambiguation of domain acronyms, document representation, identification of concepts and expressions, relation extraction, structured information retrieval and unsupervised learning. Unsupervised learning techniques will be studied in order to categorize discharge reports with CIE-10 codes, assessing data modeling by means of distributed representations with deep learning algorithms and Information Retrieval techniques. Likewise, statistical models will be applied to identify the underlying relationships among reports annotated with CIE-10 codes. This knowledge base of relationships will make it possible to recommend codes for new reports. The best way to combine the different code recommendation algorithms will be analyzed by studying techniques based on machine learning and heuristics.

Museología e integración social: la difusión del patrimonio artístico y cultural del Museo del Prado a colectivos con especial accesibilidad (invidentes, sordos y reclusos)

Duration: 2016-2018
Financing institution: 2015 call for R&D Activity Programmes between research groups of the Comunidad de Madrid, organized by the Dirección General de Universidades e Investigación of the Consejería de Educación, Juventud y Deporte, Comunidad de Madrid (S2015/HUM3494)
The work is structured around three focal points: the first will identify the specific needs and interests of the different groups; the second will deal with the design and creation of applications, systems and virtual exhibitions adapted for these three groups, based on virtual thematic tours or visits of the Museo del Prado; the third will seek to invigorate an international network relating the social projection of museology to making culture accessible to specific groups, all through the development of new technological resources.

Our concern for the heritage of the Community of Madrid, especially the art collection of the Museo del Prado, leads us to consider the museum as a "cultural artifact" that goes beyond its research and conservation functions and seeks to bring the museum to viewers, whatever their diversity and condition. The aim is to let them share in the contact with artistic reality, inviting them not only to contemplate a work of art directly but to interact with the institution and its collections, so as to overcome the barrier of the sacredness of works of art and the elitist character that the nineteenth-century perception of traditional collections can imply.


EXTracción de RElaciones entre Conceptos Médicos en fuentes de información heterogéneas

Duration: 2014-2017
Financing institution: MINECO (TIN2013-46616-C2-2-R)
The overall objective of this project is to address the generation of techniques and tools to allow efficient and intelligent access to the contents of medical documents of multilingual nature such as i) general scientific documents, ii) medical records and iii) general information on the Internet. The project will demonstrate, through a series of use cases, the benefits of the application of language technology in the health sector, using advanced Natural Language Processing techniques such as information retrieval applied to large amounts of resources about medical information on the Internet. 

Voxpopuli


Duration: 2014-2016
Financing institution: Ministerio de Economía y Competitividad (TIN2013-4709-C3-1P)
Online Reputation Management has recently become a fundamental aspect of Public Relations for organizations, personalities and entities in general. The very reason why the online dimension of reputation is now essential, namely that it is the biggest, richest and most up-to-date source of information, opinions and attitudes around any entity, is also the reason why a manual analysis of information streams in media and social networks is not viable. Automatic processing of online information crucially depends on advances in many research fields (data structures and algorithms for real-time Natural Language Processing, Opinion Mining, Textual Synthesis, Novelty Detection and Recommendation, multimedia search, social network analysis, etc.) that, up to now, have paid little attention to the online reputation scenario. For instance, opinion mining has focused on product reviews, and its results are not applicable to the (much more complex) problem of evaluating how the content of information streams in social networks may affect the reputation of a company. The project aims to create a new generation of online reputation monitoring systems, able to understand, process, aggregate and synthesize, in real time, facts, opinions and attitudes around an entity, to present such information in multiple dimensions, and to interact with reputation experts so that they can accomplish their task better and faster. Our research will range from fundamental problems such as textual similarity or data structures for real-time Natural Language Processing to prototype validation with reputation experts. Besides algorithms and prototypes, we will also create and distribute test collections to evaluate all relevant technologies in the reputation management scenario.

Readers: Evaluation And DEvelopment of Reading Systems

Duration: 2013 - 2015
Financing institution: EU (CHIST-ERA 2011) + Mineco (PCIN-2013-002-C02-01)
The READERS project proposes new unsupervised computational models to automatically extract background knowledge after reading large amounts of unstructured text. This knowledge will be in the form of classes, categorized entities and predicates whose arguments are typified by probability distributions over classes. Classes themselves will be automatically organized into taxonomies related to the predicates in which they participate. 

LiMoSINe

Linguistically Motivated Semantic Aggregation Engines

Duration: 2011-2014
Financing institution: European Commission, FP7-ICT

The LiMoSINe vision is to transition access to online information from a document-centric search paradigm focused on returning disconnected atomic pieces to a truly semantic aggregation paradigm. In this new paradigm, machines will understand a user's intent, discover and organize facts, identify opinions, experiences and trends, all from inherently multilingual online sources and open knowledge repositories. LiMoSINe's aggregation engines will automatically organize search results in semantically meaningful ways. 

ELIAS

Evaluating Information Access Systems

Duration: 2011-2016
Financing institution: European Science Foundation

ELIAS will define a new measurement paradigm for the evaluation of search engines based on so-called living laboratories. This paradigm involves (i) exploitation of novel market places and forums where large numbers of users are recruited into early stage evaluation experiments to test a particular aspect of an information access system; and (ii) using operational systems as experimental platforms on which to conduct user-based experiments at scale. 

The automatic encyclopedia of people and organizations.

Duration: 2010-2012
Financing institution: MICINN (TIN2010-21128-C02)

The main goal of the project is to develop algorithms, techniques and systems able to mine and aggregate information relative to people and organizations from unstructured and structured web sources, such as social networks, blogs, news, semantic web data, and websites in general. 

Mejorando el Acceso, el Análisis y la Visibilidad de la Información y los Contenidos Multilingüe y Multimedia en Red para la Comunidad de Madrid

Duration: 2010-2013
Financing institution: Regional Government of Madrid (S2009/TIC-1542)


Improving access, analysis and visibility of multilingual and multimedia Web contents.

Buscamedia

Duration: 2009-2012
Financing institution: CDTI (CEN-20091026)

Development of a true Multimedia Semantic Search Engine. 

Financing institution: Sub-contracts by Grupo ALMA
Summary: Online Reputation Management

Quantitative Evaluation of Academic Websites Visibility

Duration: 2008-2010
Financing institution: CICYT (TIN 2007-67581-C02-01)
Automated classification of academic websites by topic and language, in order to build rankings from them. The main goal of the project is to improve the accessibility and visibility of academic information on the World Wide Web.

Evaluation Best Practice and Collaboration for Multilingual Information Access


Financing institution: European Commission
TrebleCLEF supports the development and consolidation of expertise in the multidisciplinary research area of multilingual information access (MLIA) and disseminates this knowhow to the application communities through a set of complementary activities.

Text-Mess (INES subproject)

Duration: 2007-2009
Financing institution: CICYT (TIN2006-15265-C06-02)

Multilingual/Multimedia Access To Cultural Heritage

Duration: 2006-2009
Financing institution: European Commission, 6FP (STREP 033104)
MultiMatch plans to develop a multilingual search engine specifically designed for access, organisation and personalised presentation of cultural heritage information.

Mejorando el acceso y visibilidad de la información multilingüe en red para la Comunidad de Madrid

Duration: 2006-2009
Financing institution: Comunidad de Madrid, IV PRICIT (S-0505/TID/0267)

MAVIR is a research network formed by a multidisciplinary team of scientists, engineers, linguists and documentalists to develop an integrated effort in research, training and technology transfer.

Quality Labelling of Medical Web Content using Multilingual Information Extraction.

Duration: 2006-2008
Financing institution: European Commission (EC Programme: Public Health 61383)

Quality Labelling of Medical Web Content using Multilingual Information Extraction

SWIISA

Speech Web and Images Interactive Search Assistants

Duration: 2006-2007

Financing institution: UNED


Study of the application of interactive assistants to three lines of work: cross-lingual search over images, over the Web, and over automatic transcriptions produced by speech recognizers.

R2D2 (Syembra subproject)

Recuperación de Respuestas en Documentos Digitalizados

Duration: 2003-2006

Financing institution: CICYT (TIC2003-07158-C04)



Evaluation of cross-lingual answer retrieval systems.

RIBIDI

Recuperación de Información en Bibliotecas Digitales

Duration: 2001-2004
Financing institution: CYTED VII.19

Ibero-American cooperation in research and development of technologies for information retrieval and digital libraries.

Cross-Language Evaluation Forum

Duration: 2001-2003
Financing institution: European Commission, 5FP (IST-2000-31002)

Evaluation of Cross-Language Information Retrieval Systems for European Languages

ETB

European Schools Treasury Browser

Duration: 2000-2002
Financing institution: European Commission, 5FP (IST Programme)

Access to meta-information about educational resources and new technologies in Europe.

DELOS: a Network of Excellence on Digital Libraries

Duration: 2000-2002
Financing institution: European Commission, IST Programme
The main objective of DELOS is to coordinate a joint programme of activities of the major European teams working in digital library related areas.

News Agencies Multilingual Information Categorization

Duration: 1999-2002
Financing institution: European Commission, 5FP (IST-1999-12392)

NAMIC's main objective is to develop and bring to a marketable stage advanced NLP technologies for multilingual news customization and broadcasting through distributed services.

EuroWordnet

Duration: 1996-1999
Financing institution: European Commission, 4FP (Telematics, LE 4003)

The project aimed to build a multilingual lexical database with semantic relations between words in 8 European languages (Spanish, English, Italian, Dutch, French, German, Estonian and Czech). Every monolingual wordnet is linked to the others via an InterLingual Index derived from WordNet 1.5.

Financing institution: ACO*HUM (Socrates), ELSNET, European Commission

A project under the auspices of ELSNET and ACO*HUM excellence networks to develop 6 specialization courses around Natural Language Processing and Speech Recognition and synthesis. Our task was to develop an open distance learning course on Natural Language Processing and Information Retrieval.

Duration: 2001-2003
Financing institution: CICYT (TIC2000-0335-C03-01)

Multilingual named-entity recognition, hyperlinking, phrase extraction, summarization and semantic indexing for information access on a digital news archive.

RILE

Servidor de Recursos para el Desarrollo de la Ingeniería Lingüística en Español

Duration: 1999-2000
Financing institution: M.I.N.E.R.

The goal of RILE is to develop a pilot for a server with resources, tools and information related to the development of applications in the field of Language Engineering for Spanish.

ITEM

Recuperación de Información Textual en un Entorno Multilingüe

Duration: 1996-1999
Financing institution: CICyT (TIC96-1243-C03-01)

Development and integration of Language Engineering resources and tools for Spanish, Catalan, Basque and English and demonstration of such tools in a multilingual search engine with NLP capabilities.

Duration: 1993-1995
Financing institution: European Commission (Esprit BRA 7315)

The goal was to explore the utility of constructing a multilingual lexical knowledge base from machine-readable versions of conventional dictionaries by exploring the utility of machine readable textual corpora as a source of lexical information not coded in conventional dictionaries, and by adding dictionary publishing partners to exploit the lexical database and corpus extraction software developed by the projects for conventional lexicography.