Title: "Accès à l'information biomédicale : vers une approche d'indexation et de recherche d'information conceptuelle basée sur la fusion de ressources termino-ontologiques" (2008-2012)
[Towards a conceptual approach of biomedical information retrieval based on the fusion of termino-ontological resources]
Abstract
Information Retrieval (IR) is a scientific field that aims to provide solutions for selecting, from a corpus of documents, the information relevant to a user's information need. In the context of biomedical IR, there are different sources of information: patient records, guidelines, scientific literature, etc. In addition, information needs may come from users with different profiles: medical experts, patients and their families, and other users.
Many challenges are specific to biomedical IR: document representation, the use of terminologies containing synonyms, acronyms and abbreviations, and access to information guided by the context of the information need and by user profiles.
Our work focuses on biomedical IR and addresses two of these challenges: the representation of biomedical information and the access to this rich source of information.
Concerning the representation of biomedical information, we propose document indexing techniques and approaches based on:
recognizing and extracting concepts from terminologies: concept extraction relies on an approximate lookup of candidate concepts that could be useful for indexing the document. This technique exploits two sources of evidence: (a) the content-based similarity between concepts and documents and (b) the semantic similarity between them;
disambiguating entry terms denoting concepts by exploiting the polyhierarchical structure of a medical thesaurus (MeSH, Medical Subject Headings). More specifically, the domains of each concept are used to compute the semantic similarity between ambiguous terms in documents, and the most appropriate domain is then detected and associated with each term denoting a particular concept;
exploiting different termino-ontological resources in an attempt to better cover the semantics of document contents.
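As an illustration of how two sources of evidence might be combined when ranking candidate concepts, here is a minimal sketch. It is not the method of the thesis: the linear combination, the `alpha` weight, the bag-of-words cosine used as content-based similarity, and the names `rank_concepts`, `concept_terms` and `semantic_sim` are all assumptions made for the example. The semantic similarity scores are taken as given inputs.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_concepts(doc_text, concept_terms, semantic_sim, alpha=0.5):
    """Rank candidate concepts for a document (hypothetical scoring).

    concept_terms: concept id -> list of entry terms (assumed input).
    semantic_sim:  concept id -> precomputed semantic similarity in [0, 1]
                   (assumed input; how it is computed is out of scope here).
    alpha:         assumed weight of the content-based evidence.
    """
    doc_vec = Counter(doc_text.lower().split())
    scores = {}
    for concept_id, terms in concept_terms.items():
        concept_vec = Counter(" ".join(terms).lower().split())
        content = cosine(concept_vec, doc_vec)
        semantic = semantic_sim.get(concept_id, 0.0)
        scores[concept_id] = alpha * content + (1 - alpha) * semantic
    # Highest combined score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A call such as `rank_concepts(text, concept_terms, semantic_sim)` returns `(concept id, score)` pairs, from which the top-ranked concepts could be kept as index entries.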
Concerning information access, we propose a document-query matching method based on the combination of document and query expansion techniques. This combination is guided by the context of the information need on the one hand and by the semantic context of the document on the other. Our analysis is essentially based on the study of factors related to document and query expansion that could have an impact on IR performance: the distribution of concepts in termino-ontological resources, fusion techniques for concepts extracted from multiple terminologies, concept weighting models, etc.
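To make the idea of fusing concepts extracted from multiple terminologies concrete, the sketch below uses CombMNZ, a standard score-fusion rule from the IR literature. The abstract does not say which fusion techniques were studied, so this is a generic stand-in. It also assumes that concepts from different resources have already been mapped to a shared key; the function names and the resource labels in the usage example are hypothetical.

```python
from collections import defaultdict


def min_max(scores):
    """Rescale one resource's scores to [0, 1] so they are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {concept: 1.0 for concept in scores}
    return {concept: (s - lo) / (hi - lo) for concept, s in scores.items()}


def comb_mnz(runs):
    """Fuse per-resource concept scores with CombMNZ.

    runs: resource name -> {shared concept key -> extraction score}.
    The fused score is the sum of normalized scores multiplied by the
    number of resources in which the concept was extracted.
    """
    total = defaultdict(float)
    hits = defaultdict(int)
    for scores in runs.values():
        if not scores:
            continue  # a resource that extracted nothing contributes nothing
        for concept, s in min_max(scores).items():
            total[concept] += s
            hits[concept] += 1
    fused = {concept: total[concept] * hits[concept] for concept in total}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with `runs = {"mesh": {"neoplasms": 3.0, "breast": 1.0}, "other_resource": {"neoplasms": 0.9, "gene": 0.3}}`, the concept found by both resources is ranked first, which is the behavior CombMNZ is designed to reward.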
Our contributions, in terms of indexing techniques and information access, have been experimentally evaluated on test collections devoted to biomedical IR, in particular the TREC Genomics collections.