Research
I am interested in Machine Learning and Natural Language Processing, and in how they enable applications in the Semantic Web, Search and Information Retrieval, Recommendation, Personalization, and data mining more broadly. Representative tasks include entity extraction, relation extraction, entity linkage (named entity disambiguation), ontology creation, taxonomy construction and enrichment, item/product/user recommendation, and ranking items for retrieval.
I have extensive experience in Information Extraction (IE), which helps build structured knowledge bases for Search, Recommendation, QA, and any application that needs intelligent reasoning. In this work, I have developed a variety of extraction techniques for both the closed and open IE settings, ranging from rule-based and weakly supervised methods to methods based on graph neural networks and large language models. I also have some experience extracting and aligning information from web tables (relational tables embedded in HTML pages) for knowledge base enrichment.
Ceres - a closed IE method for relation extraction
OpenCeres - an open IE method for relation extraction
DOM-LM - a pre-trained language model for HTML DOM trees
TCN - a graph convolutional network for interpretation and alignment between web tables and a KB
LEAST - a self-training method for attribute/relation extraction in zero-shot settings
Besides IE research aimed at building knowledge bases, I have extensive experience mining knowledge graphs for the task of fact checking. As part of my doctoral work, I designed and developed multiple computational methods for fact checking statements that can be expressed as (subject, predicate, object) triples, e.g., (Olympia, capital_of, Washington). In doing so, I leveraged large, public, collaboratively edited knowledge bases such as DBpedia, Freebase, YAGO and Wikidata, typically published as part of Linked Open Data (LOD). I list below a few techniques that resulted from this work.
You can also find implementations of a few related techniques (Path Ranking Algorithm, PredPath) in the same GitHub repo.
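To make the triple setting concrete, here is a minimal sketch of checking a claim against a toy knowledge graph. It is only an illustration, not one of the methods from my dissertation or from the repo: the facts in `TRIPLES`, the function names, and the scoring rule (inverse of the shortest undirected path length between subject and object) are all assumptions made for this example.

```python
from collections import deque

# Toy knowledge graph as (subject, predicate, object) triples.
# These facts are illustrative and are not taken from a real KB dump.
TRIPLES = [
    ("Olympia", "located_in", "Thurston_County"),
    ("Thurston_County", "part_of", "Washington"),
    ("Seattle", "located_in", "King_County"),
    ("King_County", "part_of", "Washington"),
    ("Salem", "capital_of", "Oregon"),
]


def build_adjacency(triples):
    """Build an undirected adjacency map over entities, ignoring predicates."""
    adjacency = {}
    for subject, _predicate, obj in triples:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if source not in adjacency or target not in adjacency:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def support_score(triples, claim):
    """Hypothetical score in [0, 1]: closer subject/object pairs score higher."""
    subject, _predicate, obj = claim
    dist = shortest_path_length(build_adjacency(triples), subject, obj)
    if dist is None:
        return 0.0  # no connecting path, so the graph offers no support
    if dist == 0:
        return 1.0
    return 1.0 / dist


if __name__ == "__main__":
    print(support_score(TRIPLES, ("Olympia", "capital_of", "Washington")))
    print(support_score(TRIPLES, ("Olympia", "capital_of", "Oregon")))
```

Because this sketch ignores the predicate, it would score (Seattle, capital_of, Washington) exactly as high as the Olympia claim. Distinguishing such cases requires using the predicate and the types of paths that connect the two entities.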
Tutorials:
Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web. XL Dong, H Hajishirzi, C Lockard, P Shiralkar. KDD 2020. PDF, Website, Slides
Web-scale Knowledge Collection. C Lockard, P Shiralkar, XL Dong, H Hajishirzi. WSDM 2020. PDF, Website, Slides
Fact Checking: Theory and Practice. With Christos Faloutsos (CMU), Xin Luna Dong (Amazon), Xian Li (Amazon), and Subhabrata Mukherjee (Amazon). KDD 2018, London, UK. Website, Slides
Publications: See my CV for a select set of papers and Google Scholar and DBLP for a complete list.
Thesis:
Ph.D. in Computer Science: Computational Fact Checking by Mining Knowledge Graphs - Dissertation, Slides
Posters:
Bot-or-Not: social bot detection. Onur Varol, Clayton Davis, Prashant Shiralkar, Emilio Ferrara, Filippo Menczer, Alessandro Flammini. Awarded Best Poster at CCS 2015 - PDF
RelSifter: Scoring Triples from Type-Like Relations. Prashant Shiralkar, Mihai Avram, Giovanni Luca Ciampaglia, Filippo Menczer, Alessandro Flammini. WSDM Cup 2017 - PDF