
SIGIR 2018 Tutorial


Knowledge Extraction and Inference from Text: Shallow, Deep, and Everything in Between


Systems for structured knowledge extraction and inference have made giant strides in the last decade. Early systems performed shallow linguistic tagging and coarse-grained recognition of named entities at the resolution of people, places, organizations, and times. Modern systems link billions of pages of unstructured text with knowledge graphs that have hundreds of millions of entities, which belong to tens of thousands of types and are related by tens of thousands of relations. Using deep learning, these systems build continuous representations of words, entities, types, and relations. They use these representations to continually discover new facts to add to the knowledge graph, and to support search systems that go far beyond page-level “ten blue links”. We will present a comprehensive catalog of best practices in traditional and deep knowledge extraction, inference, and search. We will trace the development of diverse families of techniques, explore their interrelationships, and point out various loose ends.

Target audience

We will target early-career academic researchers and industrial practitioners. Attendees are expected to have some basic familiarity with text indexing and corpus statistics (tokenization, typical heavy-tailed vocabularies, TF-IDF) and to be largely familiar with undergraduate statistics (probability, distributions, divergence). Some elementary machine learning (clustering, regression, and classification; the basics of logistic regression and support vector machines) will also help but is not mandatory.

Topics with references

Please download the attached PDF files.
Soumen Chakrabarti,
Jul 9, 2018, 6:41 AM