Keynotes

Immanuel Trummer (Cornell University)

Title: Applications for Large Language Models in Data Management

Abstract 

The past years have been marked by several breakthrough results in the domain of generative AI, culminating in the rise of tools like ChatGPT that can solve a variety of language-related tasks without specialized training. In this talk, I outline novel opportunities in data management enabled by these advances. I discuss several recent research projects at Cornell aimed at exploiting advanced language processing for tasks such as parsing a database manual to support automated tuning, or mining data for patterns described in natural language. Finally, I discuss our recent and ongoing research on synthesizing code for SQL processing in general-purpose programming languages, while enabling customization via natural language commands.

Bio 

Immanuel Trummer is an assistant professor at Cornell University and heads the Cornell Database Group. His papers have been selected for “Best of VLDB” and “Best of SIGMOD”, for the ACM SIGMOD Research Highlight Award, and for publication in CACM as a CACM Research Highlight. His online lecture introducing students to database topics has collected over a million views. He received the NSF CAREER Award and multiple Google Faculty Research Awards.

Manos Athanassoulis (Boston University)

Title: Building Robust Data Systems

Abstract 

ML and AI components are being introduced into data systems, replacing or augmenting their traditional counterparts, for example in query optimization, indexing, and query evaluation. A common theme across such efforts is to use specific information about the workload and/or the execution environment to quickly tailor the system to it. In this work, we take a different spin on ML-augmented data systems. Specifically, we ask the question "what if the workload or the setup information comes with some uncertainty?" In other words, we investigate the benefits we can achieve if the workload or the underlying setup is different from what we originally expected. We focus our attention on two main modules of a data system.

First, we investigate tuning a storage engine that uses log-structured merge (LSM) trees as its core data structure. LSM-based storage engines are widely used today for write-intensive applications. We show that nominal tuning (tuning that assumes accurate knowledge of the workload) leaves performance benefits on the table, and we propose a new framework for "robust tuning" that leads to much better results in the presence of uncertainty. We further discuss new research avenues opened by this work, including percentile optimization and learned cost models.

Second, we investigate one of the most common operators: joins. We observe that state-of-the-art hash join algorithms are not designed to use information about key multiplicity (how many matches each key has in the other table) and thus operate under the assumption that the best partitioning strategy is to create partitions of equal size. We show that, when key multiplicity is known, the optimal partitioning is not equi-sized, and we develop a practical algorithm that outperforms the state of the art for both in-memory and on-disk execution across various "shapes" of key multiplicity. If time permits, we will also briefly discuss the struggles of update-aware learned indexes with data sortedness.

Bio 

Manos Athanassoulis is an Assistant Professor of Computer Science at Boston University, Director and Founder of the BU Data-intensive Systems and Computing Laboratory, and co-director of the BU Massive Data Algorithms and Systems Group. He also spent a summer as a Visiting Faculty at Meta. His research is in the area of data management, focusing on building data systems that efficiently exploit modern hardware (computing units, storage, and memories), are deployed in the cloud, and can adapt to the workload both at setup time and, dynamically, at runtime. Before joining Boston University, Manos was a postdoc at Harvard University; earlier, he obtained his PhD from EPFL, Switzerland, and spent one summer at IBM Research, Watson. Manos’ work has been recognized by awards such as “Best of SIGMOD” in 2016, “Best of VLDB” in 2010 and 2017, and “Most Reproducible Paper” at SIGMOD in 2017, and has been supported by an NSF CRII award, an NSF CAREER award, and industry funds including a Facebook Faculty Research Award, multiple Red Hat Research Incubation Awards, and gifts from Cisco, Red Hat, and Meta.