Deniz Altınbüken, Google; Lyric P. Doshi, Google; Martin Maas, Google; Milad Hashemi, Google
ML-driven Cloud Resource Management
Neeraja Yadwadkar, UT Austin
Abstract:
The variety of user workloads, application requirements, and heterogeneous hardware resources, together with the large number of management tasks, has made today's clouds highly complex. Recent work has shown promise in using Machine Learning for efficient resource management in such dynamically changing cloud execution environments. These approaches range from offline to online learning agents. In this talk, I will focus on the challenges that arise when building such agents and when deploying them in real systems. As an example, I will use SmartHarvest, a system that improves resource utilization by dynamically harvesting spare CPU cores from primary workloads to run batch workloads on cloud servers. Building on that, I will briefly discuss SOL, a framework that helps developers build and deploy online learning agents for a variety of use cases.
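The core loop of a SmartHarvest-style harvester can be sketched as follows. This is a minimal illustration rather than the production algorithm: the quantile-based demand predictor, the window length, and the safety buffer are all invented stand-ins for SmartHarvest's online learning machinery.

```python
from collections import deque

class CoreHarvester:
    """Toy sketch of SmartHarvest-style CPU harvesting: predict the
    primary workload's near-future peak core demand and lend the
    remaining cores to batch workloads."""

    def __init__(self, total_cores, window=60, quantile=0.99, buffer_cores=1):
        self.total_cores = total_cores
        self.quantile = quantile          # conservativeness of the prediction
        self.buffer_cores = buffer_cores  # safety margin for demand spikes
        self.history = deque(maxlen=window)

    def observe(self, cores_used):
        """Record the primary workload's current core usage."""
        self.history.append(cores_used)

    def harvestable_cores(self):
        """Cores that can be safely lent to batch workloads right now."""
        if not self.history:
            return 0  # no data yet: harvest nothing
        ranked = sorted(self.history)
        idx = min(len(ranked) - 1, int(self.quantile * len(ranked)))
        predicted_peak = ranked[idx] + self.buffer_cores
        return max(0, self.total_cores - predicted_peak)
```

The key tension the talk addresses shows up even here: a more conservative quantile protects the primary workload's latency but leaves more cores idle.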
Bio:
Neeraja is an assistant professor in the Department of Electrical and Computer Engineering at UT Austin. She is a Cloud Computing Systems researcher with a strong background in Machine Learning (ML). Most of her research straddles the boundary between Systems and ML: using and developing ML techniques for systems, and building systems for ML. Before joining UT Austin, she was a postdoctoral fellow in the Computer Science department at Stanford University, and before that she received her PhD in Computer Science from UC Berkeley. She earned her bachelor's degree in Computer Engineering from the Government College of Engineering, Pune, India.
Abstract:
Machine programming (MP) is principally concerned with the automation of software development. Unlike program synthesis, MP targets all aspects of software development, such as automating the debugging, testing, and profiling of code. In this talk, we discuss the foundations of MP and consider its impact across three views: (i) academia, (ii) established corporations, and (iii) startup ventures.
We begin with “The Three Pillars of Machine Programming” and the formation of the ACM SIGPLAN Machine Programming Symposium (MAPS), both in 2017. We then discuss critical developments in MP over the last five years leading to today, including some potential missteps. Finally, we forecast the future of MP over the next five years, covering some obvious upcoming developments (e.g., AI coding partners) and some less obvious ones (e.g., semantic reasoners, transpilation, and intentional programming languages).
Bio:
Justin Gottschlich is the Founder, CEO & Chief Scientist of Merly, Inc. (http://merly.ai), a company aimed at making software developers more productive using state-of-the-art machine programming systems. Justin also has an academic appointment as an Adjunct Lecturer at Stanford University, where he teaches machine programming. Previously, Justin was a Principal AI Scientist and the Founder & Director of Machine Programming Research at Intel Labs, and an Adjunct Professor at the University of Pennsylvania. In 2017, he co-founded the ACM SIGPLAN Machine Programming Symposium (MAPS) and now serves as its Steering Committee Chair. Justin also serves on the advisory board of the 2020 NSF Expedition “Understanding the World Through Code,” led by MIT Prof. Armando Solar-Lezama. Justin received his PhD in Computer Engineering from the University of Colorado Boulder in 2011. He has 40+ peer-reviewed publications, 70+ issued patents, and 100+ patents pending. His and his team's research has been highlighted in venues such as The New York Times, Communications of the ACM, MIT Technology Review, and The Wall Street Journal.
Jonathan Raiman, OpenAI
Bio:
Jonathan Raiman is a Senior Research Scientist in the NVIDIA Applied Deep Learning Research group, working on large-scale distributed reinforcement learning and AI for systems. Previously, he was a Research Scientist at OpenAI, where he co-created OpenAI Five, a superhuman Deep Reinforcement Learning Dota 2 bot. At Baidu SVAIL, he co-created several neural text-to-speech systems (Deep Voice 1, 2, and 3) and worked on speech recognition (Deep Speech 2) and question answering (the Globally Normalized Reader). He is also the creator of DeepType 1 and DeepType 2, a superhuman entity-linking system. He is completing his Ph.D. at Université Paris-Saclay and previously obtained his master's at MIT.
Limitations of Data-driven based Approaches for Assuring Performance of Enterprise IT Systems
Rekha Singhal, TCS Research; Manoj Nambiar, TCS Research
Real-World Challenges of ML-based Database Auto-tuning
Shohei Matsuura, Yahoo Japan Corporation; Takashi Miyazaki, Yahoo Japan Corporation
Understanding Model Drift in a Large Cellular Network
Shinan Liu, University of Chicago; Francesco Bronzino, Université Savoie Mont Blanc; Paul Schmitt, University of Southern California; Arjun Nitin Bhagoji, University of Chicago; Nick Feamster, University of Chicago; Hector Garcia Crespo, Verizon Inc.; Timothy Coyle, Verizon Inc.; Brian Ward, Verizon Inc.
Benoit Steiner, Meta; Jonathan Raiman, OpenAI; Neeraja Yadwadkar, UT Austin; Siddhartha Sen, Microsoft
CacheSack: Lessons from deploying an admission optimizer for Google datacenter flash caches
Arif Merchant, Google
Abstract:
CacheSack is the admission algorithm for Google datacenter flash caches. CacheSack partitions cache traffic into categories, estimates the benefit and cost of different cache admission policies for each category, and assigns the optimal combination of admission policies while respecting resource constraints. This talk will briefly describe the design and deployment of CacheSack. We will then discuss the challenges of deploying CacheSack in production and the lessons we learned for the future.
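The optimization the abstract describes, choosing one admission policy per traffic category to maximize benefit under a resource budget, can be illustrated with a small knapsack-style search. The category names, policies, and numbers below are invented for illustration, and the brute-force search stands in for whatever solver the production system uses:

```python
from itertools import product

def best_policy_assignment(categories, budget):
    """Pick one admission policy per traffic category to maximize total
    benefit (e.g., disk reads avoided) while total cost (e.g., flash
    writes) stays within budget. Brute force over all combinations,
    which is fine for a handful of categories and policies."""
    best = (None, 0.0)
    names = list(categories)
    for combo in product(*(categories[n] for n in names)):
        cost = sum(p["cost"] for p in combo)
        benefit = sum(p["benefit"] for p in combo)
        if cost <= budget and benefit > best[1]:
            best = ({n: p["name"] for n, p in zip(names, combo)}, benefit)
    return best

categories = {  # invented per-category policy estimates
    "logs":  [{"name": "never_admit", "cost": 0, "benefit": 0},
              {"name": "admit_on_second_miss", "cost": 2, "benefit": 5}],
    "index": [{"name": "never_admit", "cost": 0, "benefit": 0},
              {"name": "admit_on_miss", "cost": 4, "benefit": 9}],
}
assignment, benefit = best_policy_assignment(categories, budget=5)
```

With a budget of 5, the search admits the index category aggressively and rejects logs, since admitting both would exceed the flash-write budget.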
Bio:
Arif Merchant is a Research Scientist at Google and leads the Storage Analytics group, which studies interactions between components of the storage stack. His interests include distributed storage systems, storage management, and stochastic modeling. He holds the B.Tech. degree from IIT Bombay and the Ph.D. in Computer Science from Stanford University. He is an ACM Distinguished Scientist.
Counterfactual Reasoning and Safeguards for ML Systems
Siddhartha Sen, Microsoft
Abstract:
Counterfactual reasoning is a powerful idea from reinforcement learning (RL) that allows us to evaluate new candidate policies for a system without actually deploying those policies. Traditionally, this is done by collecting randomized data from an existing policy and matching this data against the decisions of a candidate policy. In many systems, we observe that a special kind of information exists that can boost the power of counterfactual reasoning. Specifically, system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, implicit feedback about alternative decisions. For example, if a system waits X minutes for an event to occur, then it automatically learns what would have happened had it waited less than X minutes, because time has a cumulative property. In this talk, I will describe a methodology called Sayer that leverages implicit feedback to evaluate and train new RL system policies. I will also show how counterfactual reasoning can provide a foundation for safety in RL systems, by bounding a policy's performance before it is deployed. Since system operators typically desire stronger assurances of safety, we build on the notion of implicit feedback to identify implicit constraints and structure in a system that can guide the design of safe RL policies.
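The timeout example above can be made concrete with a small counterfactual evaluator. Because time is cumulative, a log entry that waited X minutes reveals the outcome of every threshold up to X, so those entries can be replayed against a candidate threshold without deploying it. The log format and reward function below are invented for illustration and are not Sayer's actual methodology:

```python
def evaluate_timeout_policy(logs, candidate_wait, success_reward=1.0,
                            cost_per_min=0.1):
    """Counterfactually evaluate a candidate timeout using implicit
    feedback. Each log entry is (waited, event_time), where event_time
    is None if the event never occurred within the logged wait."""
    total, usable = 0.0, 0
    for waited, event_time in logs:
        if candidate_wait > waited:
            continue  # no implicit feedback: we never waited this long
        usable += 1
        if event_time is not None and event_time <= candidate_wait:
            # event would have occurred in time: pay only for time used
            total += success_reward - cost_per_min * event_time
        else:
            # would have timed out: pay for the full candidate wait
            total += -cost_per_min * candidate_wait
    return (total / usable if usable else None), usable
```

Note the asymmetry the talk exploits: thresholds below the logged wait are evaluable from implicit feedback alone, while longer thresholds still require randomized exploration.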
Bio:
Siddhartha Sen is a Principal Researcher in the Microsoft Research New York City lab. His research trajectory started with distributed systems and data structures, evolved to incorporate machine learning, and is currently most inspired by humans. His current mission is to use AI to design human-oriented and human-inspired systems that advance human skills and empower people to achieve more. Siddhartha received his BS/MEng degrees in computer science and mathematics from MIT, then worked for three years as a developer on Microsoft's Windows Server team before returning to academia to complete his PhD at Princeton University. Siddhartha's work on data structures and human/AI gaming has been featured in several textbooks and podcasts.
We will discuss tips and recipes on how to make ML for Systems work. We will also brainstorm ideas for the PACMI'23 workshop.