The rapid rise of large language models and multi-agent AI systems is transforming healthcare at an unprecedented pace — reshaping how patients access information, how clinicians document care, and how health systems deliver services at scale. Yet as deployment accelerates, so do the risks. Omission errors, bias, automation complacency, and error propagation across complex agentic systems demand rigorous, principled approaches to evaluation and governance.
This tutorial provides a structured introduction to the key technical and methodological challenges in developing trustworthy agentic AI for healthcare. We examine multi-agent workflow design, pre- and post-deployment evaluation methodology, and governance frameworks, drawing on recent empirical findings and real-world deployments spanning virtual primary care, clinical documentation, and other healthcare workflows. A central theme is the inadequacy of current evaluation practice: benchmarks designed for single-model, single-turn settings systematically underestimate failure modes that emerge in interactive, multi-agent clinical systems. We conclude by surfacing open research problems, including the formal characterization of trust hierarchies in agentic systems, evaluation methodology for multi-agent systems, detection of silent performance degradation, and the operationalization of calibrated autonomy across varying healthcare risk contexts. Our goal is to stimulate further research on multi-agent AI systems for healthcare and to enable researchers and practitioners to build more robust and trustworthy healthcare AI applications.
ACM Conference on AI and Agentic Systems (CAIS 2026)
8 AM - 12:30 PM PT on Tuesday, May 26, 2026 in Room TBD [CAIS Program Agenda]
CAIS'26 Tutorial Slides (TBA)
Tutorial Video Recording (TBA)
Dr. Anitha Kannan is a member of the founding team and VP of AI at Curai Health. Curai Health, founded in 2017, is an AI-driven virtual primary care clinic focused on making healthcare easily accessible and affordable by deploying AI and machine learning in clinical workflows. Dr. Kannan leads Curai's AI strategy, development, and deployment so that AI solutions are seamlessly integrated into clinician and patient workflows. Before Curai, she was a senior research scientist at Facebook AI Research and Microsoft Research. Her research impacted Microsoft products, for which she received many technical awards, including multiple Gold Star awards. She received her PhD in machine learning from the University of Toronto in 2006 and was a Darwin Fellow at the University of Cambridge.
Dr. Krishnaram Kenthapadi is the Chief Scientist, Healthcare AI at Oracle Health, where he leads the AI initiatives for Clinical AI Agent and other Oracle Health products. Previously, as the Chief AI Officer & Chief Scientist of Fiddler AI, he led initiatives on generative AI (e.g., Fiddler Auditor, an open-source library for evaluating & red-teaming LLMs before deployment; AI safety, observability & feedback mechanisms for LLMs in production), and on AI safety, alignment, observability, and trustworthiness, as well as the technical strategy, customer-driven innovation, and thought leadership for Fiddler. Prior to that, he was a Principal Scientist at Amazon AWS AI, where he led the fairness, explainability, privacy, and model understanding initiatives in the Amazon AWS AI platform. Before joining Amazon, he led similar efforts on the LinkedIn AI team and served as LinkedIn’s representative on Microsoft’s AI and Ethics in Engineering and Research (AETHER) Advisory Board. Earlier, he was a Researcher at Microsoft Research Silicon Valley Lab. He received his Ph.D. in Computer Science from Stanford University in 2006. His work has been recognized through awards at NAACL, WWW, SODA, CIKM, the ICML AutoML workshop, and Microsoft’s AI/ML conference (MLADS). He has published 50+ papers with 7000+ citations and has filed 150+ patents (75 granted). He has presented tutorials on grounding and evaluation for LLMs, trustworthy generative AI, privacy, fairness, explainable AI, ML model monitoring, and responsible AI in industry at forums such as KDD, WSDM, WWW, FAccT, AAAI, and ICML, and has taught a course on responsible AI at Stanford.
The tutorial will consist of the following parts (more details TBA):
This tutorial is aimed at attendees with a wide range of interests and backgrounds in both academia and industry, including researchers interested in learning about trustworthy healthcare AI challenges, techniques, and best practices, as well as practitioners interested in implementing multi-agent systems for healthcare AI applications. We do not assume any prerequisite knowledge. We present the advances, challenges, and opportunities related to trustworthy agentic AI for healthcare by building intuition, so that the material is accessible to all attendees.
More TBA
Sarah Bird, Ben Hutchinson, Krishnaram Kenthapadi, Emre Kiciman, and Margaret Mitchell, Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned, Tutorials at WSDM 2019, WWW 2019, and KDD 2019.
Krishna Gade, Sahin Cem Geyik, Krishnaram Kenthapadi, Varun Mithal, and Ankur Taly, Explainable AI in Industry, Tutorials at KDD 2019, FAccT 2020, and WWW 2020.
Freddy Lecue, Krishna Gade, Fosca Giannotti, Sahin Geyik, Riccardo Guidotti, Krishnaram Kenthapadi, Pasquale Minervini, Varun Mithal, and Ankur Taly, Explainable AI: Foundations, Industrial Applications, Practical Challenges, and Lessons Learned, AAAI 2020 Tutorial.
Krishnaram Kenthapadi, Ben Packer, Mehrnoosh Sameki, and Nashlie Sephus, Responsible AI in Industry, Tutorials at AAAI 2021, FAccT 2021, WWW 2021, and ICML 2021.
Krishnaram Kenthapadi, Himabindu Lakkaraju, Pradeep Natarajan, and Mehrnoosh Sameki, Model Monitoring in Practice, Tutorials at FAccT 2022, KDD 2022, and WWW 2023.
Krishnaram Kenthapadi, Himabindu Lakkaraju, and Nazneen Rajani, Trustworthy Generative AI, Tutorials at ICML 2023, KDD 2023, and FAccT 2023.