Keynote 1 
  Wed, 8th
09:10 - 10:00

Commodifying Data Exploration

Abstract: Exploratory Data Analysis (EDA) is an iterative and often tedious process. Several strategies have been proposed to ease the burden on users, ranging from stepwise to full-guidance approaches. Stepwise approaches rely on computing utility functions that determine the best action to take at each step. Full-guidance approaches rely on learning end-to-end exploration policies. Today’s big question is how to commodify EDA and make it easily deployable for all. To do so, we need to know what users are looking for: are they looking for a needle in a haystack, taking a tour of the data, or feeling lucky? This talk will investigate these questions and discuss the challenges of storing learned pathways through data or regenerating them when needed.

Sihem Amer-Yahia (CNRS, Univ. Grenoble Alpes, France)
Sihem Amer-Yahia is a Silver Medal CNRS Research Director and Deputy Director of the Lab of Informatics of Grenoble. She works on exploratory data analysis and fairness in job marketplaces. Before joining CNRS, she was Principal Scientist at QCRI, Senior Scientist at Yahoo! Research, and Member of Technical Staff at AT&T Labs. Sihem is PC chair for SIGMOD 2023 and vice president of the VLDB Endowment. She currently leads the Diversity & Inclusion initiative for the database community.

Keynote 2
  Thu, 9th  
09:10 - 10:00

Why Machine Learning for Automatically Optimizing Databases Doesn't Work

Abstract: Database management systems (DBMSs) are complex software that requires sophisticated tuning to work efficiently for a given workload and operating environment. Such tuning requires considerable effort from experienced administrators, which is not scalable for large DBMS fleets. This problem has led to research on using machine learning (ML) to devise strategies to optimize DBMS configurations for any application, including automatic physical database design, knob configuration, and query tuning. Despite the many academic papers that tout the benefits of using ML to optimize databases, there have been only a few major success stories in industry in the last decade.

In this talk, I will discuss the challenges of using ML-enhanced tuning methods to optimize databases. I will address specific incorrect assumptions that researchers make about production database environments and identify why ML is not always the best solution to real-world database problems. As part of this, I will discuss state-of-the-art academic research and real-world tuning implementations.

Andy Pavlo (Carnegie Mellon University, Pittsburgh, USA)
Andy Pavlo is an Associate Professor with Indefinite Tenure in the Computer Science Department at Carnegie Mellon University. He is also the co-founder of the OtterTune automated database optimization start-up (↗