PrivateNLP 2024
Fifth Workshop on Privacy in Natural Language Processing
Colocated with ACL 2024, Aug 15, 2024, Bangkok, Thailand (and on Zoom)
Overview
Privacy-preserving data analysis has become essential in the age of Large Language Models (LLMs), where access to vast amounts of data can yield greater gains than algorithmic tuning. A large proportion of user-contributed data comes from natural language, e.g., text transcriptions from voice assistants.
It is therefore important to curate NLP datasets while preserving the privacy of the users whose data is collected, and to train ML models that retain only non-identifying user data.
The workshop aims to bring together practitioners and researchers from academia and industry to discuss the challenges of, and approaches to, designing, building, verifying, and testing privacy-preserving systems in the context of Natural Language Processing.
Information about the workshop's topics of interest can be found in the Call for Papers.
News
2024-06-03 If you want to present your accepted ACL Findings paper, please upload your camera-ready PDF via this form https://ruhr-uni-bochum.sciebo.de/s/MucTRip06WyvmUx and let us know at ivan.habernal@uni-paderborn.de by June 15 (AoE). We will strongly prefer papers that can be presented in person during the poster session at the workshop at ACL.
2024-05-23 We will be happy to integrate your accepted, relevant ACL* Findings paper into the workshop program! We prefer Findings papers to be presented in person in a poster session. Further decisions about Findings papers will be made after the direct paper submission deadline (depending on the number of submissions made directly to the workshop). You do not have to resubmit your accepted Findings paper through our submission system; just wait until June 1. We will then set up a lightweight submission process (most likely via e-mail) for your Findings paper.
Agenda
Venue: Bangkok, Thailand
Zoom link: Available here, under "Join Workshop Live"
Date: August 15, 2024
Timezone: GMT+7
Key Dates
Submission Deadline: May 17, 2024 (extended to May 30, 2024)
Acceptance Notification: June 17, 2024 (extended to June 21, 2024)
Camera-ready versions: July 1, 2024
Workshop: August 15, 2024
Keynote Speaker
Shomir Wilson, Pennsylvania State University, USA
Program
8:50 - 9:00: Welcome and opening remarks
9:00 - 10:00: Keynote (Shomir Wilson) - Understanding Privacy Through the Lens of NLP
10:00 - 10:30: Oral presentations 1
10:00 - 10:15: Don't forget private retrieval: distributed private similarity search for large language models (Guy Zyskind, Tobin South, Alex Pentland)
10:15 - 10:30: A privacy-preserving approach to ingest knowledge from proprietary to open-source models for medical progress note generation (Sarvesh Soni, Dina Demner-Fushman)
10:30 - 11:00: Coffee break
11:00 - 12:30: Oral presentations 2
11:00 - 11:15: A collocation-based method for addressing challenges in word-level metric differential privacy (Stephen Meisenbacher, Maulik Chevli, and Florian Matthes)
11:15 - 11:30: Noisy neighbors: Efficient membership inference attacks against LLMs (Filippo Galli, Luca Melis, Tommaso Cucinotta)
11:30 - 11:45: Improving authorship privacy: Adaptive obfuscation with the dynamic selection of techniques (Hemanth Kandula, Damianos Karakos, Haoling Qiu, Brian Ulicny)
11:45 - 12:00: Unlocking the potential of large language models for clinical text anonymization: A comparative study (David Pissarra, Isabel Curioso, João Alveira, Duarte Pereira, Bruno Ribeiro, Tomás Souper, Vasco Gomes, André Carreiro, Vitor Rolla)
12:00 - 12:15: PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding (Krishna Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, Xuebing Zhou)
12:15 - 12:30: PrivaT5: A generative language model for privacy policies (Mohammad Zoubi, Santosh T.Y.S.S, Edgar Rosas, Matthias Grabmair)
12:30 - 13:30: Lunch
13:30 - 14:30: Poster session (in-person)
Cache & Distil: Optimising API Calls to Large Language Models (Guillem Ramírez, Matthias Lindemann, Alexandra Birch, Ivan Titov)
Proving membership in LLM pretraining data via data watermarks (Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia)
Differentially Private Knowledge Distillation via Synthetic Text Generation (James Flemings, Murali Annavaram)
Automated Detection and Analysis of Data Practices Using A Real-World Corpus (Mukund Srinath, Pranav Venkit, Maria Badillo, Florian Schaub, C. Lee Giles, Shomir Wilson)
Anonymization Through Substitution: Words vs Sentences (Vasco Alves, Vitor Rolla, João Alveira, David Pissarra, Duarte Pereira, Isabel Curioso, Andre V Carreiro, Henrique Lopes Cardoso)
PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs (Peng Dan, Zhihui Fu, Jun Wang)
TOFU: A Task of Fictitious Unlearning for LLMs (Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary Chase Lipton, J Zico Kolter)
Rethinking LLM Memorization through the Lens of Adversarial Compression (Avi Schwarzschild, Zhili Feng, Pratyush Maini, Zachary Chase Lipton, J Zico Kolter)
Towards More Realistic Extraction Attacks: An Adversarial Perspective (Yash More, Prakhar Ganesh, Golnoosh Farnadi)
14:30 - 15:30: Oral presentations 3
14:30 - 14:45: Can LLMs get help from other LLMs without revealing private information? (Florian Hartmann, Duc-Hieu Tran, Peter Kairouz, Victor Cărbune, Blaise Aguera y Arcas)
14:45 - 15:00: Characterizing stereotypical bias from privacy-preserving pre-training (Stefan Arnold, Rene Gröbner, Annika Schreiner)
15:00 - 15:15: Protecting privacy in classifiers by token manipulation (Re'em Harel, Yair Elboher, Yuval Pinter)
15:15 - 15:30: Cloaked Classifiers: Pseudonymization Strategies on Sensitive Classification Tasks (Arij Riabi, Menel Mahamdi, Virginie Mouilleron, Djamé Seddah)
15:30 - 16:00: Coffee break
16:00 - 17:15: Oral presentations 4
16:00 - 16:15: Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment (Qizhang Feng, Siva Rajesh Kasa, Hyokun Yun, Choon Hui Teo, Sravan Babu Bodapati)
16:15 - 16:30: Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems (Daniel Platnick, Bishoy Abdelnour, Eamon Earl, Rahul Kumar, Zahra Rezaei, Thomas Emmanuel Tsangaris, Faraj Lagum)
16:30 - 16:45: Deconstructing Classifiers: Towards A Data Reconstruction Attack Against Text Classification Models (Adel Elmahdy, Ahmed Salem)
16:45 - 17:00: Reinforcement Learning-Driven LLM Agent for Automated Attacks on LLMs (Xiangwen Wang, Jie Peng, Kaidi Xu, Huaxiu Yao, Tianlong Chen)
17:00 - 17:15: LLM Dataset Inference: Did you train on my dataset? (Pratyush Maini, Hengrui Jia, Nicolas Papernot, Adam Dziedzic)
17:15 - 17:30: Discussion and wrap-up
Contact
For questions regarding the workshop or submissions, contact Ivan Habernal: ivan.habernal@uni-paderborn.de