NeurIPS 2023 Workshop:

Competition on Privacy Preserving
Federated Learning Document VQA

Friday, December 15th 2023 | New Orleans, USA

Overview

This workshop is directly associated with the NeurIPS 2023 competition on "Privacy Preserving Federated Learning Document VQA" (PFL-DocVQA).

The objective of PFL-DocVQA is to develop privacy-preserving solutions for fine-tuning multi-modal language models for document understanding on distributed data. We seek efficient federated learning solutions for fine-tuning a pre-trained generic Document Visual Question Answering (DocVQA) model on a new domain, that of invoice processing.
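To make the task concrete, below is a minimal sketch of the kind of federated fine-tuning loop the competition implies: plain FedAvg over simulated clients, with per-client update clipping and Gaussian noise added at aggregation for client-level differential privacy. This is an illustration only, not the competition's provided framework or baseline; the function names, the linear model (a stand-in for the DocVQA model), and the noise scale are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_finetune(weights, client_data, lr=0.1, steps=5):
    """Toy local update: a few gradient steps on a linear least-squares
    model (hypothetical stand-in for fine-tuning a DocVQA model on a
    client's private invoices)."""
    w = weights.copy()
    X, y = client_data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def dp_fedavg_round(global_w, clients, clip=1.0, noise_mult=0.5):
    """One FedAvg round with client-level DP flavor: clip each client's
    update to L2 norm `clip`, average, and add Gaussian noise."""
    updates = []
    for data in clients:
        delta = local_finetune(global_w, data) - global_w
        norm = np.linalg.norm(delta)
        updates.append(delta * min(1.0, clip / (norm + 1e-12)))
    avg = np.mean(updates, axis=0)
    # Noise scaled to the per-client sensitivity (clip / num_clients).
    avg += rng.normal(0.0, noise_mult * clip / len(clients), size=avg.shape)
    return global_w + avg

# Simulated run: 4 clients, each holding a private (X, y) data shard.
d = 8
true_w = rng.normal(size=d)
clients = []
for _ in range(4):
    X = rng.normal(size=(32, d))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=32)))

w = np.zeros(d)
for _ in range(20):
    w = dp_fedavg_round(w, clients)
print("final distance to target:", np.linalg.norm(w - true_w))
```

The clipping step is what makes the noise meaningful: it bounds any single client's influence on the aggregate, so Gaussian noise calibrated to that bound can provide client-level differential privacy guarantees.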

Automatically managing the information of document workflows is a core aspect of business intelligence and process automation. Reasoning over the information extracted from documents fuels subsequent decision-making processes that can directly affect humans, especially in sectors such as finance, legal or insurance. At the same time, documents tend to contain private information, restricting access to them during training. This common scenario requires training large-scale models over private and widely distributed data.


This workshop aims to highlight the important synergies between the Document Intelligence and the Privacy communities. During the workshop, we will review the NeurIPS competition setup and the results of submitted methods, and invite winning teams from the competition to present their approaches.

This will be coupled with invited talks on differential privacy, federated learning and document intelligence.


The PFL-DocVQA competition and associated workshop are organised under the aegis of the European Network on Safe and Secure AI (ELSA).

Find more details about the associated competition at the competition site.

ELSA Sponsored Prizes for Competition Winners

We are pleased to announce that the European Network on Safe and Secure AI (ELSA) will reimburse travel costs of up to 3,000 EUR for a single representative from each of the top competition teams to facilitate their participation in the workshop!

In addition, winners will receive free registration to NeurIPS.

These prizes are conditional on the winning teams presenting their methods at the NeurIPS workshop.


Program

The workshop will take place at NeurIPS 2023 (Room 356, New Orleans Ernest N. Morial Convention Center) on Friday, December 15th, from 9:00 to 12:00 (New Orleans time). See the NeurIPS page of the workshop for further technical information.

09:00 - 09:05    Opening remarks

09:05 - 09:20    Presentation of the ELSA network - Mario Fritz

09:20 - 09:50    Invited Talk: On Privacy and Personalization in Federated Learning - Virginia Smith

09:50 - 10:20    Invited Talk: Advancing Privacy and Dataset Augmentation in Medical and Chart Data Using AI-Driven Image Editing - David Doermann

10:20 - 10:30    Coffee Break

10:30 - 11:00    Invited Talk: Privacy side-channels in machine learning systems - Florian Tramèr

11:00 - 11:25    Overview of the competition: Datasets, Metrics, and Results

11:25 - 11:55    Presentations from winning teams

11:55 - 12:00    Closing Remarks

Invited Talks

Virginia Smith

Carnegie Mellon University

On Privacy and Personalization in Federated Learning

Abstract: A defining trait of federated learning is the presence of heterogeneity, i.e., that data may differ between clients in the network. In this talk I discuss how heterogeneity affects issues of privacy and personalization in federated settings. First, I present our work on private personalized learning in cross-device settings, where we show that personalized FL provides unique benefits when enforcing client-level differential privacy in heterogeneous networks. Second, I explore cross-silo settings, where differences in privacy granularity introduce new dynamics in terms of the privacy/utility trade-offs of personalized FL. I end by discussing our application of these works to privacy-preserving pandemic forecasting in the recent UK-US privacy-enhancing technologies prize challenge, and highlight promising directions of future work on privacy and personalization in FL. 

Bio: Virginia Smith is the Leonardo Assistant Professor of Machine Learning at Carnegie Mellon University. Her research spans machine learning, optimization, and distributed systems. Virginia’s current work addresses challenges related to optimization, privacy, and robustness in distributed settings to enable trustworthy federated learning at scale. Virginia’s work has been recognized by several awards, including an NSF CAREER Award, MIT TR35 Innovator Award, Intel Rising Star Award, and faculty awards from Google, Apple, and Meta. Prior to CMU, Virginia was a postdoc at Stanford University and received a Ph.D. in Computer Science from UC Berkeley. 

David Doermann

University at Buffalo

Advancing Privacy and Dataset Augmentation in Medical and Chart Data Using AI-Driven Image Editing 

Abstract: In the current landscape where data privacy intersects with the ever-growing demand for comprehensive datasets, this talk introduces a novel approach employing large language models (LLMs) for image-based editing, targeting medical images and chart image data. This technique emphasizes preserving data integrity while ensuring the utmost privacy and confidentiality. We delve into utilizing LLMs to interpret and manipulate data visualizations, including diverse chart forms like bar graphs, pie charts, and line plots, alongside medical imagery such as X-rays, MRIs, and CT scans. The LLMs discern and subtly modify particular data elements or features within these images. In chart data, this pertains to altering specific data points without skewing the overarching trends or statistical relevance. Medical imagery involves modifying or removing identifiable markers while retaining diagnostic value.

A significant aspect of our methodology is its role in data augmentation. For chart data, we generate synthetic images mirroring real data trends and enhancing datasets while adhering to privacy norms. In the realm of medical data, we create realistic, anonymized images that expand the scope of datasets, crucial in areas plagued by data scarcity, such as rare diseases or specific medical conditions.

This talk will showcase the efficacy of our approach through various case studies and experimental analyses. We will also address the ethical implications and potential constraints of using AI in this context, providing a glimpse into the future of secure data handling and augmentation in the AI era. This presentation is an invitation to explore the intersection of AI and data privacy, specifically in medical and chart data. It is a journey through the innovative ways large language models are redefining data enhancement and privacy preservation.

Florian Tramèr

ETH Zürich

Privacy side-channels in machine learning systems

Abstract: Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates than is otherwise possible for standalone models.

Bio: Florian Tramèr is an assistant professor of computer science at ETH Zurich. His research interests lie in computer security, cryptography, and machine learning security. In his current work, he studies the worst-case behavior of deep learning systems from an adversarial perspective, to understand and mitigate long-term threats to the safety and privacy of users.


Organizers