Keynotes

Tim Kraska (MIT)

Title: ML for Systems and Systems for ML

Abstract

In this talk, I will present an overview of our work on ML for Systems and Systems for ML. I will start by presenting Northstar, a novel system we developed for Interactive Data Exploration at MIT and Brown University. I will explain why Northstar required us to completely rethink the entire analytics stack, from the interface to the “guts”, and highlight a few selected techniques we developed to provide a truly novel user interface. Afterwards, I will provide an overview of how machine learning is changing the way we build systems and outline different ways to build learned algorithms and data structures to achieve “instance-optimality”, with a particular focus on data management systems.

Bio

Tim Kraska is a director of applied science at Amazon AWS, an Associate Professor of Electrical Engineering and Computer Science in MIT’s Computer Science and Artificial Intelligence Laboratory, co-director of the Data Systems and AI Lab in MIT’s CSAIL (DSAIL@CSAIL), co-founder of Instancio (acquired), and co-founder of Einblick Analytics (einblick.ai). Currently, his research focuses on building systems for machine learning and using machine learning for systems. Before joining MIT, Tim was an Assistant Professor at Brown and spent time at Google Brain. Tim is a 2017 Alfred P. Sloan Research Fellow in computer science and has received several awards, including the VLDB Early Career Research Contribution Award, the VMware Systems Research Award, the university-wide Early Career Research Achievement Award at Brown University, an NSF CAREER Award, as well as several best paper and demo awards at VLDB, SIGMOD, and ICDE.

Matei Zaharia (Stanford University and Databricks)

Title: Retrieval as a Building Block for AI Systems

Abstract

In many domains, recent AI approaches are exciting because of their ability to incorporate a large amount of real-world knowledge into a model, whether for answering questions, generating language, or making decisions. However, managing knowledge solely as DNN parameters is clumsy for multiple reasons: it requires very large models that are expensive to train and execute, it reduces interpretability of outputs, and it makes it difficult to update the model’s knowledge. I’ll talk about a line of research from Stanford and other organizations that uses retrieval (search) as a key building block in AI systems instead, allowing dramatically smaller models to perform important tasks with high performance, high interpretability, and easy updatability. This line of work also has exciting connections with databases and indexing. I’ll specifically cover the ColBERT, ColBERT-QA and Baleen models for NLP from my group that have set state-of-the-art results in multiple NLP tasks.

Bio

Matei Zaharia is a Cofounder and Chief Technologist at Databricks as well as an Assistant Professor of Computer Science at Stanford. He started the Apache Spark project during his PhD at UC Berkeley in 2009, and has worked broadly on other widely used data and machine learning software, including MLflow, Delta Lake and Apache Mesos. He works on a wide variety of projects in data management and machine learning at Databricks and Stanford. Matei’s research was recognized through the 2014 ACM Doctoral Dissertation Award, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).

Ippokratis Pandis (Amazon Web Services)

Title: Reinventing Cloud Analytics with Machine Learning

Abstract

Amazon Redshift is Amazon's petabyte-scale data warehouse service. In recent years, Amazon Redshift has gone through a major rearchitecture, while remaining the most popular and highest performing cloud data warehouse service. One of the major elements of this rearchitecture has been the substantial use of machine learning in multiple areas of the service, with the culmination of the autonomics efforts being the introduction of the Amazon Redshift Serverless offering. This talk presents the basic elements of Amazon Redshift's rearchitecture and then focuses on the different ways in which machine learning is being used in Redshift to improve its operational health, efficiency, performance and cost. We close with the presentation of Amazon Redshift Serverless and its intelligent compute management layer.

Bio

Ippokratis Pandis is a Senior Principal Engineer at Amazon Web Services. He spends most of his time on AWS's Analytics services, especially Amazon Redshift. Redshift is Amazon's fully managed, petabyte-scale data warehouse service. Previously, Ippokratis held positions as a software engineer at Cloudera, where he worked on the Impala SQL-on-Hadoop query engine, and as a member of the research staff at the IBM Almaden Research Center, where he worked on IBM DB2 BLU. Ippokratis received his PhD from the Electrical and Computer Engineering department at Carnegie Mellon University. He is the recipient of a Test-of-Time award at EDBT 2019. He is the General Chair of SIGMOD 2023 and the president of HPTS.

Feifei Li (Alibaba Group)

Title: DAS: Building an Autonomy Database Service for Cloud Data Management

Abstract

Cloud databases are proactively adopting AI and ML techniques in enterprise usage scenarios. This talk summarizes our experience conducting fundamental research aimed at adoption in cloud database production environments. AI and ML techniques can be used to improve both DevOps and DB kernels. For the DevOps scenario, we launched Database Autonomy Service (DAS), which builds on observability to improve cloud database usability through capabilities such as anomaly detection, root cause identification, drill-down analysis, SQL query optimization, diagnosis via a DB knowledge base, and resource optimization and autoscaling. For the intelligent database kernel scenario, we provide built-in AI computations through declarative SQL statements to reduce data movement and simplify the development cycle of applying AI solutions. As two specific examples, we present our recent work on improving DB performance through knob tuning and on identifying hot data for tiered DB storage based on survival analysis.

Bio

Feifei Li is currently a Vice President of Alibaba Group, director of the database team of Alibaba Cloud Intelligence, and director of the database lab of DAMO Academy. He has won multiple awards from ACM, IEEE, and others. He is a recipient of the EDBT 2022 10 Years Test of Time Award, IEEE ICDCS 2020 Best Paper Award, ACM SoCC 2019 Best Paper Award Runner-up, IEEE ICDE 2014 10 Years Most Influential Paper Award, ACM SIGMOD 2016 Best Paper Award, ACM SIGMOD 2015 Best System Demonstration Award, and IEEE ICDE 2004 Best Paper Award. He has been an editor, PC co-chair, and core committee member for many prestigious journals, conferences, and technical meetings. He has led the R&D efforts of building cloud-native database systems and products at Alibaba, such as the cloud-native relational database PolarDB and the cloud-native data warehouse AnalyticDB, which helped Alibaba Cloud Database be named a Cloud DBMS leader by Gartner. He is an ACM/IEEE Fellow.