Information Extraction with Humans in the Loop

Date: March 1, 2019

Speaker: Anna Lisa Gentile, IBM Research - Almaden

Abstract

Information Extraction (IE) techniques enable us to distill knowledge from the vast amount of available unstructured content. Basic IE methods include automatically extracting relevant entities from text (e.g., places, dates, people), understanding the relations among them, building semantic resources (dictionaries, ontologies) to inform the extraction tasks, and connecting extraction results to standard classification resources. IE techniques cannot be decoupled from human input: at a bare minimum, some of the data must be manually annotated so that automatic methods can learn patterns for recognizing certain types of information. The human-in-the-loop paradigm applied to IE focuses on how to make better use of human annotations (the recorded observations) and on how much interaction with the human is needed for each specific extraction task. In this talk I will describe experiments applying the human-in-the-loop model to several IE tasks: (i) building dictionaries from text corpora in various languages [1]; (ii) extracting mentions of adverse drug reactions from text and matching them to the MedDRA ontology¹ [2]; and (iii) relation extraction, e.g., automatically identifying from the text which drug causes which adverse drug reaction [3].

¹https://www.meddra.org/
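Task (i), building a dictionary with a human in the loop, can be illustrated with a minimal Python sketch. It is not the method of [1]: it assumes a simple context-overlap heuristic (a candidate term earns a point each time it appears between the same left and right neighbours as an already accepted term), single-token terms, and whitespace tokenization. The `oracle` callback is a hypothetical stand-in for the human annotator, who accepts or rejects each proposed candidate; `expand_dictionary`, `contexts`, and their parameters are illustrative names only.

```python
from collections import Counter
from typing import Callable, Iterable


def contexts(tokens: list[str], i: int) -> tuple[str, str]:
    """Left and right neighbour of token i, padded at sentence boundaries."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (left, right)


def expand_dictionary(
    sentences: Iterable[str],
    seeds: set[str],
    oracle: Callable[[str], bool],  # hypothetical stand-in for the human annotator
    batch_size: int = 3,
    max_rounds: int = 5,
) -> set[str]:
    """Grow a term dictionary from seeds, asking the oracle about top candidates."""
    corpus = [s.lower().split() for s in sentences]
    accepted = {s.lower() for s in seeds}
    rejected: set[str] = set()
    for _ in range(max_rounds):
        # 1. Collect the contexts in which already-accepted terms occur.
        known_ctx = {
            contexts(toks, i)
            for toks in corpus
            for i, tok in enumerate(toks)
            if tok in accepted
        }
        # 2. Score unjudged tokens by how often they occur in those contexts.
        scores = Counter(
            tok
            for toks in corpus
            for i, tok in enumerate(toks)
            if tok not in accepted
            and tok not in rejected
            and contexts(toks, i) in known_ctx
        )
        if not scores:
            break  # nothing left to ask the human about
        # 3. Ask the human only about the top-ranked candidates.
        for cand, _ in scores.most_common(batch_size):
            (accepted if oracle(cand) else rejected).add(cand)
    return accepted


if __name__ == "__main__":
    sents = [
        "patient reported headache after dose",
        "patient reported nausea after dose",
        "patient reported dizziness after dose",
        "patient reported improvement after dose",
    ]
    # Simulated annotator: accepts only adverse-reaction terms.
    def human(term: str) -> bool:
        return term in {"nausea", "dizziness"}

    print(sorted(expand_dictionary(sents, {"headache"}, human)))
    # prints ['dizziness', 'headache', 'nausea']
```

Only the top `batch_size` candidates per round reach the human, which is one simple way to bound the amount of interaction a task needs; rejected terms are never proposed again, and accepted terms contribute their contexts to the next round.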

References

[1] Alba, A., Coden, A., Gentile, A.L., Gruhl, D., Ristoski, P., Welch, S.: Multi-lingual concept extraction with linked data and human-in-the-loop. In: Corcho, Ó., Janowicz, K., Rizzo, G., Tiddi, I., Garijo, D. (eds.) K-CAP 2017. pp. 24:1–24:8. ACM (2017). https://doi.org/10.1145/3148011.3148021
[2] Clarkson, K., Gentile, A.L., Gruhl, D., Ristoski, P., Terdiman, J., Welch, S.: User-centric ontology population. In: Gangemi, A., Navigli, R., Vidal, M., Hitzler, P., Troncy, R., Hollink, L., Tordai, A., Alam, M. (eds.) ESWC 2018. Lecture Notes in Computer Science, vol. 10843, pp. 112–127. Springer (2018). https://doi.org/10.1007/978-3-319-93417-4_8
[3] Lourentzou, I., Alba, A., Coden, A., Gentile, A.L., Gruhl, D., Welch, S.: Mining relations from unstructured content. In: PAKDD 2018. pp. 363–375 (2018). https://doi.org/10.1007/978-3-319-93037-4_29
