Title: Efficient Execution of UDF Queries in Modern Data Engines
Abstract: User-defined functions (UDFs) have been widely used to overcome the expressivity limitations of SQL and to complement its declarative nature with functional capabilities. UDFs are particularly useful in today's applications, which involve complex data analytics and machine learning algorithms and logic. However, UDFs pose significant performance challenges for query processing and optimization, largely because of the mismatch between the UDF execution environment and the SQL processing environment. To address this problem, research and commercial systems employ a broad range of solutions, from algebraic, cost-based optimization to low-level physical query optimization, compilation, and execution. In this talk, we will highlight effective techniques for boosting the performance of UDF queries, including vectorization, parallelization, tracing JIT compilation, and operator fusion, across various types of UDFs (scalar, aggregate, and table UDFs) and relational operators.