Towards Scalable Schema Mapping using Large Language Models
Christopher Buss, Oregon State University
Mahdis Safari, Oregon State University
Arash Termehchy, Oregon State University
David Maier, Portland State University
Stefan Lee, Oregon State University
Abstract: The growing need to integrate information from a large number of diverse sources poses significant scalability challenges for data integration systems. These systems often rely on manually written schema mappings, which are complex, source-specific, and costly to maintain as sources evolve. While recent advances suggest that large language models (LLMs) can assist in automating schema matching by leveraging both structural and natural language cues, key challenges remain. In this paper, we identify three core issues with using LLMs for schema mapping: (1) inconsistent outputs due to sensitivity to input phrasing and structure, which we propose to address through sampling and aggregation techniques; (2) the need for more expressive mappings (e.g., GLAV), which strain the limited context windows of LLMs; and (3) the computational cost of repeated LLM calls, which we propose to mitigate through strategies such as data-type prefiltering.
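To make the sampling-and-aggregation and prefiltering ideas concrete, here is a minimal sketch, assuming a caller-supplied propose() function that wraps one LLM call and returns candidate (source, target) attribute correspondences; the helper names, the type-equality prefilter, and the majority-vote threshold are illustrative stand-ins, not the authors' implementation.

    import random
    from collections import Counter

    def aggregate_matches(src_attrs, tgt_attrs, types, propose,
                          n_samples=5, threshold=0.6):
        # Data-type prefilter: never ask the model about pairs whose
        # declared types are incompatible, cutting the number of LLM calls.
        candidates = [(s, t) for s in src_attrs for t in tgt_attrs
                      if types[s] == types[t]]
        votes = Counter()
        for _ in range(n_samples):
            # Shuffle the prompt ordering so that sensitivity to input
            # phrasing and structure averages out across samples.
            random.shuffle(candidates)
            votes.update(propose(candidates))  # one LLM call per sample
        # Keep only correspondences proposed by a majority of samples.
        return {pair for pair, count in votes.items()
                if count / n_samples >= threshold}

In practice propose() would also be sampled at nonzero temperature and given table context, so that repeated calls genuinely vary rather than returning the same proposal five times.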
Unveiling Challenges for LLMs in Enterprise Data Engineering
Carsten Binnig, TU Darmstadt
Jan-Micha Bodensohn, TU Darmstadt
Ulf Brackmann, TU Darmstadt
Liane Vogel, TU Darmstadt
Anupam Sanghi, TU Darmstadt
Abstract: Large Language Models (LLMs) have demonstrated significant potential for automating data engineering tasks on tabular data, giving enterprises a valuable opportunity to reduce the high costs associated with manual data handling. However, the enterprise domain introduces unique challenges that existing LLM-based approaches for data engineering often overlook, such as large table sizes, more complex tasks, and the need for internal knowledge. In this talk, we present the results of a study that systematically examines the performance of LLMs for enterprise data engineering. As our main results, we identify key enterprise-specific challenges related to data, tasks, and background knowledge and show their impact on recent LLMs for data engineering. Our analysis reveals that LLMs face substantial limitations in real-world enterprise scenarios, resulting in significant accuracy drops. We believe our findings contribute to a systematic understanding of LLMs for enterprise data engineering and can support their adoption in industry.
Optimizing Open-Domain Question Answering with Graph-Based Retrieval-Augmented Generation
Joyce Cahoon, Microsoft - Gray Systems Lab
Nick Litombe, Microsoft - Gray Systems Lab
Jonathan Larson, Microsoft
Yiwen Zhu, Microsoft - Gray Systems Lab
Andreas Mueller, Microsoft - Gray Systems Lab
Fotis Psallidas, Microsoft - Gray Systems Lab
Carlo Curino, Microsoft - Gray Systems Lab
Abstract: In this work, we benchmark various graph-based retrieval-augmented generation (RAG) systems across a broad spectrum of query types, including OLTP-style (fact-based) and OLAP-style (thematic) queries, to address the complex demands of open-domain question answering (QA). Traditional RAG methods often fall short in handling nuanced, multi-document synthesis tasks. By structuring knowledge as graphs, we can facilitate the retrieval of context that captures greater semantic depth and enhances language model operations. We explore graph-based RAG methodologies and introduce TREX, a novel, cost-effective alternative that combines graph-based indexing and vector-based retrieval techniques. Our benchmarking across four diverse datasets highlights the strengths of different RAG methodologies, demonstrates TREX’s ability to handle multiple open-domain QA types, and reveals the limitations of current evaluation methods. We publicly release these datasets to facilitate further research and benchmarking at https://github.com/microsoft/graphrag-benchmarking-datasets. Our findings underscore the potential of augmenting large language models with advanced retrieval capabilities and scalable graph-based AI solutions.
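The abstract does not spell out TREX's internals, but the general pattern it names, pairing a vector index with graph-based structure, can be sketched as follows; the node-summary embeddings, cosine ranking, and one-hop expansion are assumptions for illustration, not TREX itself.

    import numpy as np
    import networkx as nx

    def hybrid_retrieve(graph, node_vecs, query_vec, k=5, hops=1):
        # Vector step: rank graph nodes by cosine similarity between
        # their summary embeddings and the query embedding.
        def cos(a, b):
            return float(np.dot(a, b) /
                         (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        hits = sorted(graph.nodes,
                      key=lambda n: cos(node_vecs[n], query_vec),
                      reverse=True)[:k]
        # Graph step: expand each hit through its neighborhood so the
        # generator sees related entities together, which is what helps
        # thematic (OLAP-style) queries that flat chunk retrieval misses.
        context = set(hits)
        for n in hits:
            context |= set(nx.single_source_shortest_path_length(
                graph, n, cutoff=hops))
        return context

The split matters for cost: the vector step is cheap and scales to many nodes, while the graph step adds semantic depth only around the few nodes that survive ranking.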
Semantic Knowledge Graphs for High-Precision, Low-Latency NL2SQL
Wangda Tan, Waii
Abstract: Enterprise adoption of natural language to SQL interfaces is often blocked by complex database schemas and unclear user intents. We present a metadata-driven knowledge graph that automatically infers table relationships, categorizes columns, and links technical documentation into a unified semantic framework, achieving translation accuracy above 95 percent on challenging schemas. At query time, graph-derived concepts are dynamically retrieved and ranked to resolve ambiguity in user requests.
To optimize performance, our system employs model right-sizing (routing simple intent detection to lightweight models and complex SQL generation to larger ones) and compresses schema references to reduce token usage. A multi-tier, graph-aware cache combined with speculative parallel execution and streamed intermediate artifacts (such as entity extractions and draft queries) cuts end-to-end latency by up to 50 percent without compromising accuracy. This talk will share the design and implementation of our unified semantic and performance framework, along with practical lessons for building scalable, responsive NL2SQL systems.
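As a hedged sketch of the right-sizing idea only, the router below sends one cheap classification call to a small model and reserves the large model for requests it judges complex; small_llm and large_llm are hypothetical callables, and the two-way SIMPLE/COMPLEX split is a simplification of the system described in the talk.

    def nl2sql(question, schema_summary, small_llm, large_llm):
        # Cheap intent/complexity check on the lightweight model.
        verdict = small_llm(
            "Answer SIMPLE or COMPLEX: how hard is this request?\n"
            + question)
        # Route generation: small model for simple lookups, large model
        # for multi-join or ambiguous requests.
        model = large_llm if "COMPLEX" in verdict.upper() else small_llm
        # A compressed schema reference keeps token usage down.
        prompt = f"Schema: {schema_summary}\nTranslate to SQL: {question}"
        return model(prompt)

The caching and speculative-execution layers the abstract mentions would sit around this function, reusing earlier verdicts and draft queries rather than recomputing them.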
Advancing Workload Management with Foundational Models: Challenges in Time Series Similarity and Interpretability
Tiemo Bang, Microsoft - Gray Systems Lab
Sergiy Matusevych, Microsoft - Gray Systems Lab
Yuanyuan Tian, Microsoft - Gray Systems Lab
Georgia Christofidi, IMDEA Software Institute
Giannis Roumpos, IMDEA Software Institute
Thaleia Dimitra Doudali, IMDEA Software Institute
Abstract: Workload management (WLM) is essential for cloud providers to balance performance, reliability, and cost. Many WLM tasks rely on understanding workload behavior through time series similarity analysis, but traditional approaches face scalability challenges due to manual feature engineering and computational overheads. Foundational time series models promise to address these limitations by learning reusable representations with minimal supervision. This paper evaluates their practical potential for WLM through a focused case study on time series similarity. We present concrete use cases, characterize a real-world query arrival dataset from Microsoft Fabric Warehouse, and compare the foundational model MOMENT against conventional similarity methods. Our findings reveal that while foundational models offer computational efficiency, they produce overly generalized similarities with limited interpretability compared to hand-engineered features. We identify key challenges and research directions needed to make foundational models practical for workload management.
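For contrast with the paper's findings, the following sketch shows both sides of the comparison: hand-engineered features versus embeddings from a pretrained encoder, with the same cosine similarity on top. The encode callable stands in for a foundation model such as MOMENT, and the specific feature set is illustrative, not the paper's.

    import numpy as np

    def handcrafted_features(ts):
        # Conventional route: interpretable summary statistics.
        return np.array([ts.mean(), ts.std(), ts.max(), ts.min(),
                         np.abs(np.diff(ts)).mean()])

    def cosine(a, b):
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def similarity(ts_a, ts_b, encode=None):
        # With encode=None we fall back to hand-engineered features;
        # passing a foundation-model encoder swaps in learned embeddings
        # while the similarity computation stays identical.
        featurize = encode if encode is not None else handcrafted_features
        return cosine(featurize(ts_a), featurize(ts_b))

The interpretability gap the paper reports shows up directly in this framing: every handcrafted dimension has a name a workload engineer can reason about, while the embedding dimensions do not.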
Reimagining Databases in the Age of LLMs
Georgia Koutrika, Athena Research Center, Greece
Abstract: The rise of large language models (LLMs) is transforming how we interact with and manage data. Traditional databases, designed for structured, schema-bound querying, are now being challenged by the fluid, context-aware capabilities of LLMs. From natural language interfaces for querying, which allow users to interact with data using everyday language instead of rigid SQL syntax, to semantic query operators that enable reasoning over the meaning of data, LLMs are extending the expressive power of traditional query languages. These semantic capabilities allow databases to interpret intent and operate across structured and unstructured data in novel ways. In parallel, we are witnessing the emergence of learned query optimizers that leverage LLMs and other neural models to predict efficient execution plans, rewrite queries, and adapt indexing strategies. Together, these innovations are driving a new generation of hybrid data architectures redefining the boundaries of what databases can do. This talk will explore these emerging paradigms at the intersection of databases and AI, and what they mean for the future of data systems.
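As one concrete instance of the semantic query operators the talk describes, here is a minimal sketch of an LLM-backed selection, assuming a generic llm callable; it illustrates the operator class, not any particular system's API.

    def semantic_filter(rows, predicate, llm):
        # A semantic analogue of SQL's WHERE: the condition is stated in
        # natural language and judged by a model for each record.
        kept = []
        for row in rows:
            answer = llm(f"Record: {row}\n"
                         f"Does it satisfy: '{predicate}'? Answer YES or NO.")
            if answer.strip().upper().startswith("YES"):
                kept.append(row)
        return kept

    # e.g. semantic_filter(tickets, "the customer sounds frustrated", llm)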