Title: What's next? Vertical Federated Learning
Abstract
Federated learning (FL) enables multiple participants to train machine learning models collaboratively without transmitting or sharing their data. This is in sharp contrast to traditional machine learning, which requires all data to be collected in a central place. Privacy requirements and regulation have been two important drivers of FL, making it a popular approach for creating machine learning models with data that would otherwise be unavailable for training. Federated learning can be classified into horizontal and vertical, depending on what data is available at each party. Horizontal FL has by far dominated the research space; in horizontal FL, all participants have access to the same type of data and can each create a model in isolation. In contrast, vertical FL has received far less attention despite its utility in real-life use cases. In vertical FL, each party collects and contributes diverse knowledge, and not all participants need to have labels available. This makes it suitable for scenarios where individual parties cannot train a model in isolation or where ground-truth data (labels) are not widely available. In this talk, I will start by defining vertical FL, as multiple definitions exist in the literature. I will then present some interesting use cases, contrasting the requirements and pragmatics of vertical FL. I will also cover some of my recent work in vertical FL, showcasing how different privacy requirements can shape drastically different solutions. Finally, I will conclude by discussing some open challenges in this area.
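To make the horizontal/vertical distinction concrete, here is a minimal illustrative sketch; the dataset, column names, and party names are hypothetical and not taken from the talk:

```python
import pandas as pd

# A toy dataset: rows are customers, columns are features plus a label.
data = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "income":      [50, 60, 55, 80],
    "purchases":   [3, 1, 4, 2],
    "defaulted":   [0, 1, 0, 0],   # the label
})

# Horizontal FL: every party holds the SAME columns (features and label)
# for DIFFERENT customers, so each party could train a model in isolation.
bank_a = data.iloc[:2]   # customers 1-2, all columns
bank_b = data.iloc[2:]   # customers 3-4, all columns

# Vertical FL: parties hold DIFFERENT columns for the SAME customers,
# and only one party (here the bank) holds the label.
bank     = data[["customer_id", "income", "defaulted"]]  # label owner
retailer = data[["customer_id", "purchases"]]            # features only

# Joint training requires aligning records on a shared identifier
# (in practice via private set intersection rather than a plain merge).
joined = bank.merge(retailer, on="customer_id")
print(joined)
```

In the horizontal split, either bank could train alone; in the vertical split, no single party holds both the purchase features and the label, which is exactly the setting vertical FL targets.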
Short Bio
Nathalie Baracaldo received a Ph.D. from the University of Pittsburgh, USA. She now leads the AI Security and Privacy Solutions team and is a Research Staff Member at IBM's Almaden Research Center in San Jose, CA.
Nathalie co-edited the book "Federated Learning: A Comprehensive Overview of Methods and Applications". She is the principal investigator for the DARPA program "Guaranteeing AI Robustness Against Deception" (GARD). In 2020, Nathalie received the IBM Master Inventor distinction for her contributions to IBM intellectual property and innovation. She has published highly cited papers in peer-reviewed conferences and journals, receiving multiple best paper awards, and is frequently invited as a keynote speaker and panelist.
Further information about Nathalie can be found on her homepage.
Title: Advances in Private Cross-Device Federated Learning
Abstract
Privacy for users is a central goal of cross-device federated learning. This talk will begin with a broad view of privacy, highlighting key principles and threat models. We will then take a deep dive into recent advances in providing stronger anonymization properties for cross-device federated learning, including the DP-FTRL algorithm. This algorithm has been deployed successfully at scale, enabling the launch of production neural language models trained with user-level differential privacy guarantees. We will also discuss the `federated select` primitive, which provides a useful abstraction for incorporating private information retrieval into federated computations, enabling a variety of new applications.
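As background for the talk: the central trick in DP-FTRL is the tree (dyadic-interval) aggregation mechanism, which releases noisy prefix sums of model updates so that every prefix touches only O(log T) noise draws instead of O(T). Below is a minimal sketch of that mechanism, assuming Gaussian noise and per-round updates that have already been clipped; the class and parameter names are mine, not from the deployed system:

```python
import numpy as np

def dyadic_intervals(t):
    """Decompose [0, t) into O(log t) aligned power-of-two intervals."""
    intervals, start = [], 0
    while start < t:
        length = 1
        while start % (2 * length) == 0 and start + 2 * length <= t:
            length *= 2
        intervals.append((start, length))
        start += length
    return intervals

class TreeAggregator:
    """Noisy prefix sums via the binary-tree mechanism (the core of DP-FTRL).

    Each dyadic interval of rounds gets ONE cached Gaussian noise draw, so a
    prefix sum over t rounds accumulates only O(log t) noise terms.
    """
    def __init__(self, noise_std, dim, seed=0):
        self.noise_std = noise_std
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.node_noise = {}   # (start, length) -> cached noise vector
        self.values = []       # per-round (already clipped) updates

    def _noise(self, node):
        if node not in self.node_noise:
            self.node_noise[node] = self.rng.normal(0.0, self.noise_std, self.dim)
        return self.node_noise[node]

    def append(self, update):
        self.values.append(update)

    def noisy_prefix_sum(self):
        total = np.zeros(self.dim)
        for start, length in dyadic_intervals(len(self.values)):
            total += sum(self.values[start:start + length]) + self._noise((start, length))
        return total

# Example: six rounds of (already clipped) updates feeding an FTRL-style step.
agg = TreeAggregator(noise_std=0.1, dim=1)
for round_update in [np.array([1.0])] * 6:
    agg.append(round_update)
    model_direction = agg.noisy_prefix_sum()
print(model_direction)
```

Because noise is attached to intervals rather than to individual rounds, this approach gives user-level privacy without relying on the subsampling assumptions behind DP-SGD-style analyses.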
Short Bio
Brendan McMahan received a Ph.D. in Computer Science from Carnegie Mellon University, USA. He is now a Principal Research Scientist at Google, where he leads efforts on decentralized and privacy-preserving machine learning.
His team pioneered the concept of federated learning and continues to push the boundaries of what is possible when working with decentralized data using privacy-preserving techniques. Previously, he worked on online learning, large-scale convex optimization, and reinforcement learning.
Additional details on Brendan's activities are available on his homepage.
Title: Bridging the Gap: Federated Learning for Large Language Models and Information Retrieval
Abstract
We will explore the intersection of Federated Machine Learning (FML), Large Language Models (LLMs), Information Retrieval (IR), and Search, with a focus on edge computing. The talk will cover decentralized LLM training, efficient model updates, synchronization, and personalization. We will delve into privacy-preserving search systems, relevance feedback, and query expansion in federated settings. We will start by reviewing recent work on model quantization for FML, which allows neural network models to be trained efficiently and effectively. We will then highlight the risk of adversarial attacks on federated learning systems, such as Byzantine attacks, and discuss potential defense mechanisms. The presentation will also introduce machine unlearning as a promising avenue for future research in federated learning.
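As a taste of the quantization theme, here is a minimal sketch of uniform 8-bit quantization of a model update before uplink, a common building block in communication-efficient FL; the function names and the 8-bit choice are illustrative, not the specific method covered in the talk:

```python
import numpy as np

def quantize(update, num_bits=8):
    """Uniform affine quantization of a float32 update to num_bits integers.

    A client sends (q, scale, lo) instead of raw float32 values,
    cutting uplink traffic roughly 32/num_bits-fold.
    """
    lo, hi = float(update.min()), float(update.max())
    scale = (hi - lo) / (2 ** num_bits - 1)
    if scale == 0.0:          # constant update: avoid division by zero
        scale = 1.0
    q = np.round((update - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Server-side reconstruction before averaging, as in federated averaging."""
    return q.astype(np.float32) * scale + lo

update = np.random.randn(1000).astype(np.float32)
q, scale, lo = quantize(update)
print("max reconstruction error:", np.abs(dequantize(q, scale, lo) - update).max())
```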
Short Bio
Fabrizio Silvestri received a Ph.D. in Computer Science from the University of Pisa, Italy. He is now a Full Professor in the Department of Computer Engineering at Sapienza University of Rome, Italy. Before that, he was a Research Scientist with Facebook AI in London, UK, on the Integrity team, and a Principal Scientist with Yahoo Research.
His research interests include natural language processing, web search, query interpretation, sponsored search, and native advertising.
He has authored around 150 research papers published in top international peer-reviewed journals and conferences. He has been an invited speaker at several events and has co-organized many workshops on information retrieval topics.
More information about Fabrizio can be found on his homepage.
Title: Evaluating Large-Scale Learning Systems
Abstract
To deploy machine learning models in practice, it is critical to have a way to reliably evaluate their effectiveness. Unfortunately, the scale and complexity of modern machine learning systems make it difficult to provide faithful evaluations and gauge performance across potential deployment scenarios. In this talk, I discuss our work addressing challenges in large-scale ML evaluation. First, I explore the problem of evaluating models trained in federated networks of devices, where issues of device subsampling, heterogeneity, and privacy can introduce noise into the evaluation process and make it challenging to provide reliable evaluations. Second, I present ReLM, a system for validating and querying large language models (LLMs). Although LLMs have been touted for their ability to generate natural-sounding text, there is a growing need to evaluate the behavior of LLMs in light of issues such as data memorization, bias, and inappropriate language. ReLM poses LLM validation queries as regular expressions to enable faster and more effective LLM evaluation.
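To give a flavor of the regex-as-query idea: ReLM itself compiles patterns into automata that steer decoding, whereas the sketch below is only a naive post-hoc check over sampled outputs, with hypothetical patterns chosen for illustration:

```python
import re
from collections import Counter

# Hypothetical validation patterns; a real study would tailor these.
PATTERNS = {
    "phone_number": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email":        re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def validate(completions):
    """Count how often each pattern appears in a batch of model outputs."""
    hits = Counter()
    for text in completions:
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                hits[name] += 1
    return hits

# `sample_model(prompt, n)` stands in for any LLM sampling API:
# completions = sample_model("My contact details are", n=1000)
completions = ["Call me at 555-123-4567.", "Nothing sensitive here."]
print(validate(completions))
```

Scanning sampled text like this misses low-probability matches; expressing the query as an automaton over the model's token distribution, as ReLM does, is what makes the evaluation both faster and more complete.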
Short Bio
Virginia Smith is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. Her research spans machine learning, optimization, and distributed systems. Virginia's current work addresses challenges related to optimization, privacy, and robustness in distributed settings to enable trustworthy federated learning at scale.
Virginia's work has been recognized by an NSF CAREER Award, MIT TR35 Innovator Award, Intel Rising Star Award, and faculty awards from Google, Apple, and Meta. Prior to CMU, Virginia was a postdoc at Stanford University and received a Ph.D. in Computer Science from UC Berkeley.
For further details on Virginia, please refer to her homepage.