Overview

    Multilingual Annotation of Named Entities and Terminology Resources Acquisition (Mantra)

    Mantra will provide multilingual terminologies and semantically annotated multilingual documents (e.g., patent texts) to improve access to the scientific information contained in multilingual documents. The Mantra project capitalizes on parallel document corpora, from which translational correspondences will be computed using different alignment methods. Fortunately, the biomedical domain, Mantra's application scenario, offers a rich variety of such parallel corpora.
    The project partners will exploit these multilingual document sets to harvest terms and concept representations in different languages. The goal is to augment currently available terminological resources such as the Medical Subject Headings (MeSH). The project partners will collaboratively build two types of resources:
    • automatically enhanced multilingual terminologies and
    • semantically annotated multilingual documents.
    The novelty of the latter resource lies in soliciting and orchestrating community efforts to build these annotated resources. The project partners have already used this procedure successfully for the semantic enrichment of large-scale biomedical document corpora in the CALBC project.

    The novelty of the former lies in a new combination of existing technologies from statistical machine translation, named entity tagging and terminological resources. Both types of resources will be made publicly available for translation and for searching and text mining multilingual documents.

    The Mantra project and the Mantra challenge are now part of the CLEF initiative (CLEF-ER). 



    Acknowledgements

    • European Commission: 7th Framework Programme, Small or medium-scale focused research actions (STREP)
    • THEME: Information Content Technologies Call [FP7-ICT-2011-4.1], Challenge 4: Technologies for Digital Content and Languages
    • Grant Agreement Number: 296410