Yuanyuan Tian
Microsoft
The modern cloud has turned data services into easily accessible commodities. With just a few clicks, users can now access a catalog of data processing systems for a wide range of tasks. However, the cloud brings both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to gain the most value from them. For cloud providers, managing every aspect of an ever-increasing set of data services while meeting customer SLAs and minimizing operational cost is becoming more challenging. At the same time, cloud technology enables the collection of significant amounts of workload traces and system telemetry. With the progress in data science (DS) and machine learning (ML), it is feasible and desirable to use a data-driven, ML-based approach to automate various aspects of data services, resulting in autonomous data services. In this talk, I will present our perspectives and insights on creating autonomous data services on Azure. I will also cover the future work we plan to undertake and the open problems that still need attention.
Bio: Dr. Yuanyuan Tian is a Principal Scientist Manager and Graph Architect at Microsoft, and an ACM Distinguished Member. Before Microsoft, she was a Principal Research Staff Member at IBM Almaden Research Center. She obtained her Ph.D. in computer science from the University of Michigan. At Microsoft's Gray Systems Lab (GSL), Dr. Tian leads research efforts in graph queries and analytics, query/workload optimization, and ML-for-Systems. Her broader research interests include HTAP, SQL-on-Hadoop, big data federation, and Systems-for-ML. She has published two books and over 50 articles in top database venues, with more than 5,700 citations. Dr. Tian has been an Associate Editor for the VLDB Journal, PVLDB, and SIGMOD, served on the editorial board of the Encyclopedia of Big Data, and played key roles in major database conferences, including serving as PC Chair for SIGMOD 2027 and SoCC 2023. She has also participated in multiple NSF panels. She is the recipient of the DaMoN 2023 Best Short Paper Award, the SIGMOD 2019 Research Highlight Award, the EDBT 2018 Best Paper Award, and multiple Outstanding Technical Achievement and Research Division Awards from IBM. She also received the Distinguished Academic Achievement Award from the University of Michigan in 2008. Her research has been incorporated into products such as IBM Db2 Event Store and IBM Db2 Graph, as well as open-source projects like Apache SystemML (now SystemDS), Apache Giraph, and Apache DataSketches.
Aditya Parameswaran
UC Berkeley
LLMs are changing the world, but how can they help with data processing? In this talk, we discuss ongoing work in the EPIC Data Lab at Berkeley to rethink the end-to-end data lifecycle, now with LLMs in the mix. We describe our scalable, efficient, and usable text data processing system stack, a.k.a. our document "stack stack", as well as a couple of projects that are having an impact across a number of real-world domains. We will also briefly touch upon our future research vision for better supporting agentic workloads.
Bio: Aditya Parameswaran is an Associate Professor in EECS at UC Berkeley, and a co-director of the EPIC Data Lab. Aditya has published 100+ papers overall at top venues across multiple disciplines, with multiple best paper awards; just this year, his papers have appeared at the top DB (VLDB, SIGMOD), AI (ICLR, NAACL), and HCI (UIST, CSCW) venues. Multiple open-source tools developed in his group (including Modin, Lux, IPyFlow, and DocETL) have received thousands of GitHub stars and have been downloaded tens of millions of times overall across a spectrum of industries. His research was commercialized as a startup, Ponder, in 2021, where he served as Co-founder and President, before its acquisition by Snowflake. Aditya has received the Alfred P. Sloan Research Fellowship, the VLDB Early Career Award, the NSF CAREER Award, and the TCDE Rising Star Award, among other recognitions. His website is at http://adityagp.net