September 2020
Ghent, Belgium
Held in conjunction with ECML PKDD 2020
Data are at the core of research in many domains outside of computer science, such as healthcare, the social sciences, and business. Combining diverse sources of data can yield richer and more powerful data sets, but doing so is a challenging research problem. Data integration raises a multitude of challenges: the collections to be integrated may come from different sources; they may have been created by different groups; their characteristics may differ (different schemas, different data types); and the data may contain duplicates. Solving these challenges requires substantial effort, and domain experts need to be involved. In the era of Big Data, with organizations scaling up the volume of their data, it is critical to develop new and scalable approaches to deal with all these challenges. It is also important to properly assess the quality of both the source data and the integrated data. The quality of the source data, in turn, drives the methods needed for its integration. Data integration is an important phase in the KDD process: it creates new and enriched records from a multitude of sources. These new records can be queried, searched, mined, and analyzed to discover new, interesting, and useful patterns.
The goal of this workshop is to bring together computer scientists, researchers from other domains, and practitioners from businesses and governments to present and discuss current research directions in multi-source data integration and its applications. The workshop will provide a forum for original, high-quality research papers on record linkage, data integration, population informatics, mining techniques for integrated data, and applications, as well as for multidisciplinary research opportunities.
Data Integration Methodologies
Population Informatics
Evaluation, Quality and Privacy
Integrated Data and Longitudinal Data Applications