Schema/Ontology Matching

This project was carried out from 2000 to 2009. It studied schema/ontology matching, which is fundamental to many data management applications, including data integration, warehousing, mining, e-commerce, e-science, and Web data processing.

The project was timely: shortly after it began in 2000, schema matching grew into a major research area in data management, and it has received much attention ever since. The main contributions of this project were:
  • We showed how to apply machine learning to this problem.
  • We showed that multiple types of domain knowledge must be exploited to maximize matching accuracy.
  • We introduced a highly modular, extensible system architecture, which is essentially the common matching architecture in use today.
  • We showed how to exploit domain knowledge (e.g., in the form of other schemas) in matching.
  • We were among the first to develop clean solutions to several difficult problems, such as finding complex schema matches and matching ontologies.
One of the main lessons I learned from this project is that crowdsourcing could be ideal for such matching; this lesson in turn motivated my subsequent work on crowdsourcing.

People and Funding
  • AnHai Doan, Robert McCann, Robin Dhamanka, Yoonkyong Lee, Mayssam Sayyadian, Wensheng Wu, Xiaoyong Chai.
  • Collaborators: Alon Halevy, Pedro Domingos, Phil Bernstein, Jayant Madhavan, Arnon Rosenthal, Len Seligman, Chris Clifton, Luis Gravano, Natasha Noy, Clement Yu.
  • We gratefully acknowledge support from grants CAREER IIS-0347903 and ITR 0428168, MITRE, and Google.

