Invited IV:
Cleaning for ML

Abstract

Data cleaning is widely regarded as a critical part of machine learning (ML) applications, as data errors can corrupt models in ways that cause the application to operate incorrectly, unfairly, or dangerously. Traditional data cleaning, or Cleaning Before ML, focuses on the quality issues of a dataset in isolation from the application that uses the data, which can be inefficient and, counterintuitively, degrade the application further. In this talk, I will briefly discuss the application of general-purpose cleaning algorithms in data science pipelines and review recent progress in Cleaning For ML. I will conclude with our vision of a holistic cleaning framework and outline new challenges that arise when data cleaning meets ML applications.

Bio: Prof. Dr. Ziawasch Abedjan

Ziawasch Abedjan is a university professor and chairs the “Databases and Information Systems” group at the Leibniz University in Hanover. He serves as a Principal Investigator at the L3S Research Centre in Hanover and is a fellow of the Berlin Institute for the Foundations of Learning and Data (BIFOLD). He is also a Visiting Academic at Amazon and a Junior Fellow of the German Computer Science Society (GI). Ziawasch Abedjan earned his doctorate at the Hasso-Plattner Institute in Potsdam and spent two years at MIT as a post-doctoral associate. Before his tenure in Hanover, he served as a Junior Professor at the Technical University of Berlin and as a senior researcher at the German Research Centre for Artificial Intelligence (DFKI). His research is supported by grants from the German Research Foundation (DFG) and the Federal Ministry for Education and Research (BMBF).