In relational databases, stored procedures and user-defined functions (UDFs) have long been used to express application business logic through control-flow constructs and DML statements. Their use is growing with the rising popularity of real-time analytics applications, which often contain complex business logic. As a result, hybrid transactional and analytical processing systems such as SAP HANA have begun investing more effort in optimizing the execution of UDFs.
Program optimization and query optimization have each been studied extensively. However, most of this work has been carried out in isolation, and cross-optimizations that break the boundary between declarative and imperative code have not been studied enough. Unified optimization techniques that consider both program and query optimization are therefore essential for achieving optimal query performance.
The first part of this talk presents a literature overview of a successful approach to improving the performance of UDF execution. This approach transforms entire UDFs into equivalent relational algebra expressions and then applies existing relational query optimization techniques. Interestingly, imperative constructs such as branches and loops can be converted into relational algebra expressions. The approach is attractive because, after the transformation, we can simply take advantage of existing sophisticated query optimization techniques. However, one of its most challenging issues is that the set of transformation rules is currently limited, and some imperative expressions cannot be transformed.
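As a rough illustration of the transformation idea, the sketch below takes a scalar UDF containing a branch and rewrites it by hand as a SQL CASE expression, so that the whole computation becomes a single relational query. It uses Python's built-in sqlite3 module only for convenience; the table `orders`, the function `discount_level`, and the sample rows are assumptions made for this example and are not taken from the talk or from any particular system.

```python
import sqlite3


def discount_level(total):
    """Imperative UDF (hypothetical): classify an order total with a branch."""
    if total > 1000:
        return "high"
    elif total > 100:
        return "medium"
    return "low"


def setup():
    """Create an in-memory database with an illustrative orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 50.0), (2, 500.0), (3, 5000.0)],
    )
    # Register the Python function so SQL can call it once per row.
    conn.create_function("discount_level", 1, discount_level)
    return conn


def with_udf(conn):
    """The query calls the opaque UDF; the optimizer cannot see inside it."""
    return conn.execute(
        "SELECT id, discount_level(total) FROM orders ORDER BY id"
    ).fetchall()


def inlined(conn):
    """Hand-written relational equivalent: the branch becomes a CASE expression."""
    return conn.execute(
        """
        SELECT id,
               CASE WHEN total > 1000 THEN 'high'
                    WHEN total > 100  THEN 'medium'
                    ELSE 'low'
               END
        FROM orders ORDER BY id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = setup()
    print(with_udf(conn))
    print(inlined(conn))
```

Both queries return the same rows for this data. Only a branch is shown here; loops need more involved rewrites and are not covered by this sketch.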
The second part presents our past and ongoing work on another approach, which unifies program and query optimization techniques in a single framework. The framework consists of plan enumeration and cost estimation for UDFs. We first demonstrate this approach with iterative query processing in a UDF. Using the notion of query motion, by which an SQL query is moved into or out of a loop, we enumerate execution plans for the UDF. We then choose the best plan using a cost model that estimates the cost of the procedure from the query optimizer's cost estimates and the cost of the imperative constructs. We next discuss unified optimization using the concept of relational operator motion, in which we pull up or push down relational operators across statements in the UDF to obtain a globally optimized plan for the UDF.
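The plan enumeration and cost-based choice described above can be pictured with a toy model. The sketch below is not the speaker's framework: the `Plan` class, the linear cost formula, and all cost numbers are assumptions chosen only to show why a cost model is needed when deciding whether to move a query out of a loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical execution plan for a UDF with one query and one loop."""
    name: str
    queries_in_loop: int   # query executions per loop iteration
    queries_outside: int   # query executions hoisted out of the loop


def plan_cost(plan, iterations, query_cost, stmt_cost):
    """Assumed cost model: query cost plus a flat per-iteration statement cost."""
    loop_cost = iterations * (plan.queries_in_loop * query_cost + stmt_cost)
    return plan.queries_outside * query_cost + loop_cost


def enumerate_plans(query_is_loop_invariant):
    """Query motion is only legal when the query does not depend on the loop."""
    plans = [Plan("query inside loop", 1, 0)]
    if query_is_loop_invariant:
        plans.append(Plan("query hoisted out of loop", 0, 1))
    return plans


def choose_plan(iterations, query_cost, stmt_cost, query_is_loop_invariant):
    """Pick the cheapest enumerated plan under the assumed cost model."""
    plans = enumerate_plans(query_is_loop_invariant)
    return min(plans, key=lambda p: plan_cost(p, iterations, query_cost, stmt_cost))


if __name__ == "__main__":
    print(choose_plan(100, 10.0, 1.0, True).name)
    print(choose_plan(0, 10.0, 1.0, True).name)
```

With 100 expected iterations the hoisted plan is cheaper, while with zero expected iterations keeping the query inside the loop avoids running it at all. This is one reason to compare plans by estimated cost rather than always hoisting.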
Young-Koo Lee is a professor in the Department of Computer Science and Engineering at Kyung Hee University. His research interests include query processing and optimization, storage systems, data mining, and big data processing and analytics. He has published many scientific papers in international journals and conferences, with more than 8,000 citations.
We have witnessed two decades of cloud computing research. Yet programming the cloud remains a tedious task for both application and cloud infrastructure developers: application developers need to consider various cloud deployment aspects as they write code, while infrastructure developers must determine how to execute and optimize code that mingles application semantics, fault tolerance, hardware constraints, and more in a single program.
Cloud programming is not an entirely new problem, however: initial commercial efforts at a programmable cloud have recently started to take wing in the form of “serverless” Functions-as-a-Service (FaaS). FaaS offerings allow developers to write sequential code and upload it to the cloud, where it is executed in an independent, replicated fashion at whatever scale of workload it attracts. But fundamentally, even “FaaS done right” is a low-level assembly language for the cloud, a simple infrastructure for launching sequential code, a UDF framework without a programming model to host it.
In this talk, I will describe our new Hydro project for next-generation cloud programming research. The key idea behind Hydro is a new programming interface targeted at cloud applications, in which we decouple cloud programming into four separate concerns, each with an independent language facet: Program semantics, Availability, Consistency, and Targets for optimization (PACT). To use Hydro, programmers first specify the intended functionality of their application (i.e., its semantics) using a declarative language. Then, they describe the failure domains (i.e., availability) that apply once their application is deployed to the cloud. Consistency is then specified, independently of the program semantics, by stating invariants that the application should satisfy (e.g., how concurrent writes to the same data should be handled). Finally, the target facet describes the platform on which the application should be deployed, for instance the expected types of hardware and the minimal network connectivity required.
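To make the separation of the four facets concrete, here is a deliberately simple sketch. It is not Hydro's interface or syntax: the `PactSpec` class, its field names, the `retarget` helper, and every facet value are hypothetical and exist only to show that one facet can change while the others stay untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PactSpec:
    """Hypothetical container for the four PACT facets (not Hydro's real API)."""
    program: str          # P: what the application computes, stated declaratively
    availability: tuple   # A: failure domains the deployment should tolerate
    consistency: tuple    # C: invariants the application should satisfy
    targets: tuple        # T: platform the application should be deployed on


# Illustrative values only; none of these strings come from the Hydro project.
cart = PactSpec(
    program="cart(user) = items added minus items removed",
    availability=("survive the loss of one availability zone",),
    consistency=("concurrent writes to the same cart are merged",),
    targets=("commodity virtual machines", "10 Gbps network"),
)


def retarget(spec, new_targets):
    """Change only the target facet; the other three facets are carried over."""
    return replace(spec, targets=tuple(new_targets))


if __name__ == "__main__":
    moved = retarget(cart, ["ARM instances"])
    print(moved.program == cart.program)
    print(moved.targets)
```

The point of the sketch is that redeploying to a different platform touches only the target facet, leaving the program semantics, availability, and consistency specifications as written.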
Implementing the Hydro stack will open up new research opportunities in query compilation, execution, and runtime monitoring. Examples include using program synthesis and other learning-based techniques to drive PACT program compilation, rather than a static pattern-matching approach targeted at a specific backend; a distributed execution framework for monotonic code; and iterative refinement of programs. Hydro is a joint project with Natacha Crooks and Joe Hellerstein. In this talk I will explain these ideas and describe the early progress we have made thus far, as well as the many exciting opportunities that lie ahead.
Alvin Cheung is an assistant professor in UC Berkeley's EECS Dept. His research focuses on designing new techniques to solve data systems problems. Alvin's research has been recognized through multiple early career awards, such as the US Presidential Early Career Award for Scientists and Engineers and the Sloan Fellowship, along with a number of best paper and demo awards.