
Schema/Ontology Matching

This project was carried out from 2000 to 2009. It studied schema/ontology matching, which is fundamental to many data management applications, including data integration, warehousing, mining, e-commerce, e-science, and Web data processing.

The project was very timely. Shortly after it started around 2000, schema/ontology matching grew rapidly into a major research direction in data management, and it has received much attention ever since. The main contributions of this project:
  • We showed how to apply machine learning to this problem.
  • We showed that multiple types of domain knowledge must be exploited to maximize matching accuracy.
  • We introduced a highly modular, extensible system architecture, which is now essentially the common architecture for matching systems.
  • We showed how to exploit domain knowledge (e.g., in the form of other schemas) in matching.
  • We were among the first to develop clean solutions to several difficult problems, such as finding complex schema matches and matching ontologies.
One of the main lessons I learned from this project is that crowdsourcing could be ideal for such matching (and this in turn motivated my subsequent work on crowdsourcing).

People and Funding
  • AnHai Doan, Robert McCann, Robin Dhamankar, Yoonkyong Lee, Mayssam Sayyadian, Wensheng Wu, Xiaoyong Chai.
  • Collaborators: Alon Halevy, Pedro Domingos, Phil Bernstein, Jayant Madhavan, Arnon Rosenthal, Len Seligman, Chris Clifton, Luis Gravano, Natasha Noy, Clement Yu.
  • We gratefully acknowledge support from grants CAREER IIS-0347903 and ITR 0428168, MITRE, and Google.

PhD Dissertation
Basic Matching Techniques
Crowdsourced Schema Matching
Matching Web Query Interfaces (on the Deep Web)
Workshops, Special Issues, Surveys, Textbook Chapters
Selected Talk Slides