Talk 1: The Knowledge Base Population Task: Challenges for Information Extraction
Ralph Grishman, New York University
The Knowledge Base Population (KBP) task, run for the past three years by the U.S. National Institute of Standards and Technology, is the latest in a series of multi-site evaluations of information extraction, following in the tradition of MUC and ACE. We examine the structure of KBP, emphasizing the basic shift from sentence-by-sentence and document-by-document evaluation to corpus-based extraction and the challenges it raises for cross-sentence and cross-document processing. We consider the problems raised by the limited amount and incompleteness of the training data, and how these have been (partly) addressed through such methods as semi-supervised learning and distant supervision. We describe some of the optional tasks that have been included -- rapid task adaptation (last year), temporal analysis (this year), cross-lingual extraction (planned for next year) -- and others that have been suggested.
Talk 2: Bringing Multilingual Information Extraction to the User
Ralf Steinberger, European Commission Joint Research Centre
The speaker will give an overview of how various text mining tools (information extraction, aggregation of multilingual information, document classification, trend analysis, and more) are combined in the Europe Media Monitor (EMM) family of applications to help users in their daily work. EMM was developed by the European Commission's Joint Research Centre (JRC). Its users include EU institutions, national organisations of EU member states, international organisations such as United Nations sub-organisations, and selected international partners (e.g. in the USA, Canada and China). The presentation will be an overview rather than a detailed technical account. EMM applications are publicly accessible at http://emm.newsbrief.eu/overview.html. For scientific details and publications, see http://langtech.jrc.ec.europa.eu/.