Held in conjunction with ICDM 2014, Shenzhen, China, on 14 December 2014

Data are at the core of research in many domains outside computer science, such as healthcare, the social sciences, and business. Combining diverse data sources can yield richer and more powerful data, but doing so is also a challenging research problem. Data integration faces many challenges: the collections to be integrated may come from different sources; they may have been created by different groups; their characteristics may differ (different schemas, different data types); and the data may contain duplicates. Addressing these challenges requires substantial effort and the involvement of domain experts. In the era of Big Data, as organizations scale up the volume of their data, it is critical to develop new, scalable approaches to all of these challenges. It is also important to properly assess the quality of both the source data and the integrated data, because the quality of the source data determines which methods are needed to integrate it. Data integration is an important phase of the KDD process: it creates new, enriched records from many sources, and these records can then be queried, searched, mined, and analyzed to discover new, interesting, and useful patterns.

The goal of this workshop is to bring together computer scientists, researchers from other domains, and practitioners from business and government to present and discuss current research directions in multi-source data integration and its applications. The workshop will provide a forum for original, high-quality research papers on record linkage, data integration, mining techniques for integrated data, and applications, as well as for multidisciplinary research opportunities.

Topics of interest include (but are not limited to):

Data Integration Methodologies
  • Automating data cleaning and pre-processing
  • Data integration methods
  • Entity resolution, record linkage, data matching, and duplicate detection
  • Big Data integration
  • Integrating complex data
Evaluation, Quality and Privacy
  • Evaluation of linkage/matching/data integration methods
  • Data quality evaluation for source data and/or integrated data
  • Bias and quality of longitudinal data
  • Preserving privacy in data integration
Integrated Data and Longitudinal Data Applications
  • Mining and analysis of longitudinal data
  • Data integration applications for healthcare, social sciences, digital humanities, bioinformatics, genomics, etc. 