Model Monitoring in Practice (Tutorial)
Overview
Artificial Intelligence (AI) plays an increasingly integral role in determining our day-to-day experiences. Its applications are no longer limited to search and recommendation systems, such as web search and movie and product recommendations; AI is now also used in decisions and processes that are critical for individuals, businesses, and society. With AI-based solutions in high-stakes domains such as hiring, lending, criminal justice, healthcare, and education, the personal and professional implications of AI are far-reaching. Consequently, it is critical to ensure that these models make accurate predictions, are robust to shifts in the data, do not rely on spurious features, and do not unduly discriminate against minority groups. To this end, several approaches spanning areas such as explainability, fairness, and robustness have been proposed in recent literature, and many papers and tutorials on these topics have been presented at recent computer science conferences. However, relatively little attention has been paid to the need for monitoring machine learning (ML) models once they are deployed, and to the associated research challenges.
In this tutorial, we first motivate the need for ML model monitoring, as part of a broader AI model governance and responsible AI framework, from societal, legal, customer/end-user, and model developer perspectives, and provide a roadmap for thinking about model monitoring in practice. We then present findings and insights on model monitoring desiderata based on interviews with ML practitioners spanning domains such as financial services, healthcare, hiring, online retail, computational advertising, and conversational assistants, and describe the technical considerations and challenges associated with realizing these desiderata in practice. After an overview of techniques and tools for model monitoring, we focus on the real-world application of model monitoring methods and tools, present practical challenges and guidelines for using such techniques effectively, and share lessons learned from deploying model monitoring tools for several web-scale AI/ML applications. We present case studies across different companies, spanning application domains such as financial services, healthcare, hiring, conversational assistants, online retail, computational advertising, search and recommendation systems, and fraud detection. We hope that our tutorial will inform both researchers and practitioners, stimulate further research on model monitoring, and pave the way for building more reliable ML models and monitoring tools in the future.
Contributors
Krishnaram Kenthapadi (Fiddler AI, USA)
Hima Lakkaraju (Harvard University, USA)
Pradeep Natarajan (Amazon Alexa AI, USA)
Mehrnoosh Sameki (Microsoft Azure AI, USA)
Tutorial Editions
The Web Conference (WWW 2023)
1:30 PM - 3:00 PM CT on Monday, May 1, 2023 in Classroom #203, AT&T Hotel and Conference Center [WWW Program Agenda]
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2022)
1:30 PM - 4:30 PM ET on Tuesday, August 16, 2022 in Room 201 [KDD Program Agenda]
ACM Conference on Fairness, Accountability, and Transparency (FAccT 2022)
2:30 PM - 4:00 PM PT on Thursday, June 23, 2022 [Link for registered attendees] [Recorded Video; Longer Recorded Video]
FAccT'22 and KDD'22 Tutorial Slides
FAccT'22 Tutorial Video Recording (Google Drive folder containing videos of each section)
Contributor Bios
Krishnaram Kenthapadi is the Chief AI Officer & Chief Scientist of Fiddler AI, an enterprise startup building a responsible AI and ML monitoring platform. Previously, he was a Principal Scientist at Amazon AWS AI, where he led the fairness, explainability, privacy, and model understanding initiatives on the Amazon AI platform. Prior to joining Amazon, he led similar efforts on the LinkedIn AI team and served as LinkedIn’s representative on Microsoft’s AI and Ethics in Engineering and Research (AETHER) Advisory Board. Before that, he was a Researcher at Microsoft Research Silicon Valley Lab. Krishnaram received his Ph.D. in Computer Science from Stanford University in 2006. His work has been recognized through awards at NAACL, WWW, SODA, CIKM, the ICML AutoML workshop, and Microsoft’s AI/ML conference (MLADS). He has published 50+ papers, with 4500+ citations, and filed 150+ patents (70 granted). He has presented tutorials on privacy, fairness, explainable AI, and responsible AI in industry at forums such as KDD ’18 and ’19, WSDM ’19, WWW ’19, ’20, and ’21, FAccT ’20 and ’21, AAAI ’20 and ’21, and ICML ’21, and has instructed a course on AI at Stanford.
Hima Lakkaraju is an assistant professor at Harvard University focusing on the explainability, fairness, and robustness of machine learning models. She has also been working with domain experts in policy and healthcare to understand the real-world implications of explainable and fair ML. Hima has been named one of the world’s top innovators under 35 by both MIT Tech Review and Vanity Fair. Her research has received best paper awards at the SIAM International Conference on Data Mining (SDM) and INFORMS, and grants from NSF, Google, Amazon, and Bayer. Hima has given keynote talks at top ML conferences and workshops, including CIKM, ICML, NeurIPS, AAAI, and CVPR, and her research has been showcased by popular media outlets including the New York Times, MIT Tech Review, TIME magazine, and Forbes. More recently, she co-founded the Trustworthy ML Initiative to enable easy access to resources on trustworthy ML and to build a community of researchers and practitioners working on the topic.
Pradeep Natarajan is a principal applied scientist in Amazon’s Alexa AI division. He has over 20 years of experience developing and deploying large-scale machine learning systems across diverse modalities, including computer vision, language understanding, speech recognition, and financial time-series analysis. His work has been published in leading venues including CVPR, ECCV, ACL, EMNLP, ICASSP, and Interspeech. He has served as a Principal Investigator on large DARPA and IARPA programs and has developed industry-leading technology for analyzing unstructured visual and audio data. He joined the Alexa AI team in 2018 and has been leading efforts to develop computer vision technology that enhances Alexa’s voice-based interactions and to leverage large language models in multiple applications across Alexa.
Mehrnoosh Sameki is a senior technical program manager at Microsoft, responsible for leading the product efforts on the open-source machine learning interpretability and fairness toolkits (InterpretML and Fairlearn) and their integration within the Azure Machine Learning platform. She is also an adjunct assistant professor at Boston University, where she earned her PhD in computer science in 2017. She has spoken at several industry forums (including Microsoft Build) and has presented tutorials on fairness and responsible AI in industry at venues such as KDD '19, WWW '21, FAccT '21, AAAI '21, and ICML '21.
Tutorial Outline and Description
The tutorial will consist of two parts: (1) ML model monitoring foundations, covering motivation, definitions, and tools for model monitoring in AI/ML systems (about 1.5 hours), and (2) case studies across different companies, spanning application domains such as hiring, computer vision, lending, fraud detection, sales, and search and recommendation systems, along with open problems and research directions (about 1.5 to 2 hours).
Foundations
Motivation from regulatory, business, and data science perspectives.
Model monitoring desiderata based on interviews with ML practitioners, and the technical challenges of realizing them: data drift, concept drift, bias and feature attribution drift, and data integrity and other operational issues with ML models (a minimal drift-detection sketch follows this list).
Open source and commercial tools for ML model monitoring (e.g., Amazon SageMaker Model Monitor & Clarify; Deequ; Fiddler's Explainable Monitoring; Google Vertex AI Model Monitoring; Microsoft Azure MLOps).
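To make the drift checks above concrete, here is a minimal sketch (in Python, assuming NumPy and SciPy are available) of how a monitoring job might flag data drift in a single numeric feature by comparing a training-time baseline sample against a recent production window, using the Population Stability Index (PSI) and the two-sample Kolmogorov-Smirnov test. The synthetic data, window sizes, and the 0.2 PSI / 0.01 p-value alert thresholds are illustrative assumptions, not settings prescribed by any particular tool.

import numpy as np
from scipy import stats

def population_stability_index(baseline, production, n_bins=10):
    """PSI between baseline and production samples of one numeric feature.

    Bin edges come from baseline quantiles, so each bin holds roughly
    equal baseline mass; a small epsilon guards against empty bins.
    """
    edges = np.quantile(baseline, np.linspace(0, 1, n_bins + 1))
    # Clip production values into the baseline range so every value lands in a bin.
    production = np.clip(production, edges[0], edges[-1])
    base_frac = np.histogram(baseline, bins=edges)[0] / len(baseline)
    prod_frac = np.histogram(production, bins=edges)[0] / len(production)
    eps = 1e-6
    base_frac = np.clip(base_frac, eps, None)
    prod_frac = np.clip(prod_frac, eps, None)
    return float(np.sum((prod_frac - base_frac) * np.log(prod_frac / base_frac)))

# Illustrative (synthetic) data: the production window has drifted.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)   # training-time feature values
production = rng.normal(loc=0.4, scale=1.2, size=2_000)  # recent serving-time values

psi = population_stability_index(baseline, production)
ks = stats.ks_2samp(baseline, production)

# PSI > 0.2 is a commonly used (but assumed here) threshold for significant drift.
if psi > 0.2 or ks.pvalue < 0.01:
    print(f"Drift alert: PSI={psi:.3f}, KS statistic={ks.statistic:.3f}, p={ks.pvalue:.2e}")

In practice, a monitoring system computes such statistics per feature over sliding time windows and routes alerts to on-call workflows; the managed tools listed above implement variants of this pattern alongside concept drift, bias and feature attribution drift, and data integrity checks.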
Case Studies (including practical challenges and lessons learned during deployment in industry)
We will present case studies across different companies, spanning application domains such as financial services, healthcare, hiring, conversational assistants, online retail, computational advertising, search and recommendation systems, and fraud detection.
This tutorial is aimed at attendees with a wide range of interests and backgrounds, including researchers interested in learning about techniques and tools for ML model monitoring as well as practitioners interested in implementing such tools for web-scale AI/ML applications. We will not assume any prerequisite knowledge and will present the intuition underlying the various model monitoring techniques so that the material is accessible to all attendees.
Related Tutorials and Resources
ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT)
AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES)
Sara Hajian, Francesco Bonchi, and Carlos Castillo, Algorithmic bias: From discrimination discovery to fairness-aware data mining, KDD Tutorial, 2016.
Solon Barocas and Moritz Hardt, Fairness in machine learning, NeurIPS Tutorial, 2017.
Kate Crawford, The Trouble with Bias, NeurIPS Keynote, 2017.
Arvind Narayanan, 21 fairness definitions and their politics, FAccT Tutorial, 2018.
Sam Corbett-Davies and Sharad Goel, Defining and Designing Fair Algorithms, Tutorials at EC 2018 and ICML 2018.
Ben Hutchinson and Margaret Mitchell, Translation Tutorial: A History of Quantitative Fairness in Testing, FAccT Tutorial, 2019.
Henriette Cramer, Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miroslav Dudík, Hanna Wallach, Sravana Reddy, and Jean Garcia-Gathright, Translation Tutorial: Challenges of incorporating algorithmic fairness into industry practice, FAccT Tutorial, 2019.
Sarah Bird, Ben Hutchinson, Krishnaram Kenthapadi, Emre Kiciman, and Margaret Mitchell, Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned, Tutorials at WSDM 2019, WWW 2019, and KDD 2019.
Krishna Gade, Sahin Cem Geyik, Krishnaram Kenthapadi, Varun Mithal, and Ankur Taly, Explainable AI in Industry, Tutorials at KDD 2019, FAccT 2020, and WWW 2020.
Freddy Lecue, Krishna Gade, Fosca Giannotti, Sahin Geyik, Riccardo Guidotti, Krishnaram Kenthapadi, Pasquale Minervini, Varun Mithal, and Ankur Taly, Explainable AI: Foundations, Industrial Applications, Practical Challenges, and Lessons Learned, AAAI Tutorial, 2020.
Himabindu Lakkaraju, Julius Adebayo, and Sameer Singh, Explaining Machine Learning Predictions: State-of-the-art, Challenges, and Opportunities, Tutorials at NeurIPS 2020 and AAAI 2021.
Freddy Lecue, Pasquale Minervini, Fosca Giannotti, and Riccardo Guidotti, On Explainable AI: From Theory to Motivation, Industrial Applications and Coding Practices, AAAI Tutorial, 2021.
Kamalika Chaudhuri and Anand D. Sarwate, Differentially Private Machine Learning: Theory, Algorithms, and Applications, NeurIPS Tutorial, 2017.
Krishnaram Kenthapadi, Ilya Mironov, and Abhradeep Guha Thakurta, Privacy-preserving Data Mining in Industry, Tutorials at KDD 2018, WSDM 2019, and WWW 2019.
Krishnaram Kenthapadi, Ben Packer, Mehrnoosh Sameki, and Nashlie Sephus, Responsible AI in Industry, Tutorials at AAAI 2021, FAccT 2021, WWW 2021, and ICML 2021.