Human-in-the-Loop Entity Resolution for Knowledge Curation

Date: February 15, 2019

Speakers: Lucian Popa and Kun Qian, IBM Research - Almaden

Abstract

Entity resolution is a key form of reasoning that makes it possible to establish explicit connections among entities across heterogeneous datasets. Such connections can represent "same-as" links between different representations of the same real-world entity or, more generally, can represent various types of relationships among entities. Along with other ubiquitous operations such as information extraction, data transformation and fusion, entity resolution is a crucial step for building high-value, domain-specific knowledge bases from raw data. One of the key challenges that we address in this space is the development of human-in-the-loop tools that target domain experts (rather than programmers) and help them arrive at high-accuracy entity resolution algorithms that are scalable, explainable and reusable. We will describe the techniques behind our tools, discuss several applications in concrete domains, and show live demos.

Bios

Lucian Popa is a Principal Research Staff Member and Manager at IBM Research - Almaden, which he joined in 2000. His research work has been focused on data exchange, schema mapping and, more recently, entity resolution. He has contributed to several IBM products, and leads a research team focused on human-in-the-loop systems for structured knowledge creation and learning. He has co-authored numerous research publications that are highly cited, with two of them receiving Test-of-Time Awards. He is an ACM Distinguished Member.

Kun Qian is a research scientist at IBM Research - Almaden. He received his PhD from UC Santa Cruz in 2017. His research work has been focused on example-driven schema mapping discovery, entity resolution, and knowledge creation. He is generally interested in the areas of information integration, active learning, and deep learning, with particular applications towards building human-in-the-loop machine learning systems for entity understanding and resolution.