Tutorials

 Tutorial 1 
  Wed, 8th   
☷  11:00 - 12:30
☷  16:00 - 17:30    
☉ 1004 (Gr. Ratssaal Fak. Inf. 1. OG)

Immanuel Trummer:
From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management

Abstract: Large language models have recently advanced the state of the art on many natural language processing benchmarks. The newest generation of models can be applied to a variety of tasks with little to no specialized training. This technology creates various opportunities for applications in the context of data management.

The tutorial will introduce participants to basic background on language models, discuss different methods to use language models, and give an overview and short demonstration of available libraries and APIs. It will cover models that generate natural language as well as models, such as GPT-3 Codex, that complete program code or generate code from natural language instructions. Finally, the tutorial will discuss recent research in the database community that exploits language models in the context of traditional database systems or proposes novel system architectures that are based on them.

The tutorial is targeted at database researchers. No prior background on language models is required. The goal of the tutorial is to introduce database researchers to the latest generation of language models, and to their use cases in the domain of data management.

Immanuel Trummer is an assistant professor at Cornell University. He has recently applied large language models to problems such as code synthesis for SQL processing, mining structured data for patterns described in natural language, and exploiting technical documentation for automated database tuning. His papers were selected for “Best of VLDB”, “Best of SIGMOD”, for the ACM SIGMOD Research Highlight Award, and for publication in CACM as a CACM Research Highlight. His current research is funded by the NSF and by multiple Google Faculty Research Awards.

 Tutorial 2 
  Thu, 9th    
☷  11:00 - 12:30
☷  16:00 - 17:00    
☉ 1004 (Gr. Ratssaal Fak. Inf. 1. OG)

Alberto Lerner & Philippe Bonnet:
The Principles of Database and SSDs Co-Design

Abstract: The Solid-State Drive (SSD) landscape is in constant evolution. For years, this evolution was hidden behind the unchanging abstractions of block devices and POSIX I/O. However, these abstractions have become problematic. They hinder performance and no longer reduce software complexity. Such a state of affairs impacts the database community in at least two ways.

First, using SSDs through legacy interfaces that hide internal mechanisms invariably results in erratic performance. The blame often goes to SSDs’ notoriously expensive garbage collection. In truth, several other complex processes result in non-linear effects in terms of latency and bandwidth. In this tutorial, we describe these processes and how they are implemented in modern devices. This knowledge will help system designers better choose SSDs and shape database workloads to match their performance characteristics.

Second, the inadequacy of the traditional I/O abstractions opens up an entire research field focused on the co-design of SSDs and database management systems (DBMSs). Such research aims at devising mechanisms and policies coupling the storage manager of a DBMS and SSD internals: e.g., placing an SSD FTL under the control of an application, changing SSD subsystems in response to the workload, or executing logic within an SSD on a database’s behalf. In this tutorial, we describe the research opportunities and challenges across this continuum of DBMS/SSD co-design techniques, and present platforms supporting their simulation and prototyping.

We believe that these two areas (a more seamless integration of Database and Storage, and the study of SSD variations adapted to Database computations) are central to the development of the next generation of Database Systems.

Alberto Lerner is a Senior Researcher at the University of Fribourg, Switzerland, at the eXascale Infolab. In the past, he was a postdoctoral researcher at IBM Research (both at T.J. Watson and Almaden), and participated in the design and development of several commercial database systems, including at Google and MongoDB. His recent work focuses on using the computational capacity of the network and storage stacks to offload Database Systems logic.

Philippe Bonnet is a professor at the IT University of Copenhagen. For more than a decade, Philippe has worked in the area of Data Management with Flash Devices. His group has focused on Open-Channel SSDs, FTL design and computational storage.