PrivateNLP 2025
Sixth Workshop on Privacy in Natural Language Processing
Colocated with NAACL 2025, Albuquerque (NM), USA (and on Zoom)
Overview
Privacy-preserving data analysis has become essential in the age of Large Language Models (LLMs), where access to vast amounts of data can provide greater gains than carefully tuned algorithms. A large proportion of user-contributed data comes from natural language, e.g., text transcriptions from voice assistants.
It is therefore important to curate NLP datasets while preserving the privacy of the users whose data is collected, and to train ML models that retain only non-identifying user data.
The workshop aims to bring together practitioners and researchers from academia and industry to discuss the challenges and approaches to designing, building, verifying, and testing privacy-preserving systems in the context of Natural Language Processing.
Information about the workshop's topics of interest can be found in the Call for Papers below.
News
18-03-2025 We will be happy to include your accepted, relevant ACL* Findings paper for presentation at the workshop! Please send your camera-ready PDF to timour.igamberdiev(at)univie.ac.at by March 27th (AoE). We strongly prefer papers that can be presented in person at the workshop's poster session at NAACL.
Call for Papers
PrivateNLP invites high-quality research contributions in several formats:
Original research papers (long and short)
Position and opinion papers
Posters
System Demonstrations
Abstracts for talks or discussion panel proposals
All submissions will undergo a double-blind review process, and accepted submissions will be presented at the workshop.
Topics of interest include but are not limited to:
Privacy-preserving machine learning for language models
Generating privacy-preserving test sets
Data extraction attacks on NLP systems (e.g. membership inference attacks)
Differential privacy for NLP models and data
Generating differentially private derived data
NLP, privacy and regulatory compliance
Private Generative Adversarial Networks
Privacy in Active Learning and Crowdsourcing
Privacy and Federated Learning in NLP
User perceptions of privatized personal data
Auditing provenance in language models
Continual learning under privacy constraints
NLP for studying privacy policies and other texts about privacy
Ethical ramifications of AI/NLP in support of usable privacy
Homomorphic encryption for language models
Important Dates
Submission Deadline: February 7, 2025 (extended from January 30, 2025)
Fast-track Submission Deadline: February 20, 2025
Acceptance Notification: March 7, 2025 (extended from March 1, 2025)
Camera-ready Versions: March 19, 2025 (extended from March 10, 2025)
Submission Deadline for Presenting Findings Papers: March 27, 2025
Workshop: May 4, 2025
All deadlines are 23:59, Anywhere on Earth (AoE)
Submission Instructions
Two types of submissions are invited: full papers and short papers. Please follow the NAACL submission policies.
Full papers should not exceed eight (8) pages of text, plus unlimited references. Final versions of full papers will be given one additional page of content (up to 9 pages) so that reviewers' comments can be taken into account.
Short papers may consist of up to four (4) pages of content, plus unlimited references. Upon acceptance, short papers will be given up to five (5) content pages in the proceedings.
We also ask authors to include a limitations section and a broader impact statement, following the guidelines of the main conference.
We will be using OpenReview for submissions: https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/PrivateNLP
Please note OpenReview's moderation policy for newly created profiles:
New profiles created without an institutional email will go through a moderation process that can take up to two weeks.
New profiles created with an institutional email will be activated automatically.
No anonymity period will be required for papers submitted to the workshop, per the latest updates to the ACL anonymity policy. However, submissions must still remain fully anonymized.
Fast-Track Submission
If your paper has been reviewed by ACL, EMNLP, EACL, or ARR and received an average rating higher than 2.5 (in either average soundness or excitement score), it qualifies for the fast track. In the appendix, please include the reviews and a short statement discussing which parts of the paper have been revised.
Link to fast-track submissions: https://ruhr-uni-bochum.sciebo.de/s/oIRoO307eHepft4
Please upload the following three documents in a single ZIP file:
The ARR reviews (including discussions and the meta-review) as a single PDF (e.g., by printing the review webpage to PDF)
The submitted anonymous paper as PDF
A plain text file with the corresponding author's name and contact email
Dual Submission Policy
In addition to previously unpublished work, we invite papers on relevant topics that have been submitted to other venues (such as other NLP or ML conferences). Please follow the ACL double-submission policy. Accepted cross-submissions will be presented as posters, with an indication of the original venue. Cross-submissions will be selected solely by the organizing committee.
Non-Archival Option
There are no formatting or page restrictions for non-archival submissions. Papers accepted to the non-archival track will be displayed on the workshop website, but will NOT be included in the workshop proceedings or otherwise archived.
Agenda
Venue: Albuquerque (NM), USA
Zoom link: Available on the Underline page for the workshop
Date: May 4, 2025 (Sunday)
Timezone: GMT-6
Keynote Speaker
Wei Xu, Associate Professor at Georgia Institute of Technology
Bio: Wei Xu is an Associate Professor in the College of Computing and the Machine Learning Center at the Georgia Institute of Technology, where she directs the NLP X Lab. Her research interests are in natural language processing and machine learning, with a focus on Generative AI, robustness and multilinguality of large language models, and interdisciplinary research in AI for education, privacy, healthcare, and law. She is a recipient of the NSF CAREER Award, Faculty Research Awards from Google, Sony, and Criteo, the CrowdFlower AI for Everyone Award, and Best Paper Awards and Honorable Mentions at COLING '18, ACL '23, and ACL '24. She has also received research funding from NIH, DARPA, and IARPA.
Title: Empowering Everyday Users to Protect Their Privacy in the Age of AI
Abstract: AI models are rapidly advancing in their ability to answer information-seeking questions. As these models are increasingly deployed in consumer applications, they present significant privacy risks. In this talk, I will share (1) our findings on the alarming extent to which these models can geolocate images with high precision, identify individuals, and even infer sensitive information from seemingly innocuous data; and (2) how we use AI to empower everyday users in protecting their privacy against AI itself.
First, I will demonstrate that current open-source and proprietary VLMs possess highly effective image geolocation capabilities, making widespread geolocation an immediate privacy concern rather than a distant theoretical risk. Our experiments show that GPT-4 can infer precise GPS coordinates from street-view images through multi-turn dialogues, especially when text is present. To address this challenge, we introduce GPTGeoChat, a benchmark for evaluating and training multimodal moderation agents that prevent excessive location disclosure while offering user control over privacy settings.
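To make the moderation idea concrete, below is a minimal, hypothetical sketch of a disclosure gate that answers location questions only up to a user-chosen granularity. The granularity levels and function names are illustrative assumptions, not the GPTGeoChat benchmark or the agents it evaluates.

```python
from enum import IntEnum

class Granularity(IntEnum):
    """Coarse-to-fine location granularity; higher values are more revealing."""
    NONE = 0
    COUNTRY = 1
    CITY = 2
    NEIGHBORHOOD = 3
    EXACT_COORDINATES = 4

def moderate_disclosure(inferred: Granularity, user_limit: Granularity) -> str:
    """Allow disclosure only up to the granularity the user has consented to."""
    if inferred <= user_limit:
        return f"Disclose (level: {inferred.name})"
    return (f"Withhold: model inferred {inferred.name}, "
            f"but the user permits at most {user_limit.name}")

# Example: the VLM pinpoints exact GPS coordinates from a street-view image,
# but the user's privacy setting allows only city-level disclosure.
print(moderate_disclosure(Granularity.EXACT_COORDINATES, Granularity.CITY))
```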
Second, I will discuss our work on probabilistic reasoning for privacy protection, focusing on estimating the k-anonymity of user-generated text, such as social media posts or exchanges with ChatGPT. We introduce BRANCH, a novel probabilistic reasoning method that leverages LLMs to factorize a joint probability distribution and estimate the number of individuals matching a given set of attributes mentioned in the text. Additionally, we explore text-based disclosure abstraction as a proactive strategy for privacy preservation in PrivacyMirror, an AI-driven privacy protection tool we are developing. Our models detect and rephrase specific self-disclosures into more general terms while preserving their conversational utility. For example, the statement "I'm 16F" can be transformed into "I'm a teenage girl," reducing users' privacy risks while maintaining their intended meaning.
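As a rough, self-contained illustration of the factorization idea described above (not the BRANCH method itself), the sketch below chains hand-picked conditional probabilities to estimate how many people in an assumed reference population match a set of disclosed attributes. Every number and attribute name in it is a placeholder; BRANCH estimates such conditionals with LLMs rather than hard-coding them.

```python
# Chain-rule estimate of k-anonymity: k = N * prod_i P(attr_i | attr_<i).
# All probabilities below are invented placeholders for illustration.

POPULATION = 330_000_000  # assumed reference population (roughly the US)

conditional_probs = {
    "is a teenager (13-19)": 0.09,
    "is female, given teenager": 0.49,
    "mentions this hobby, given the above": 0.001,
}

def estimate_k(population: int, conditionals: dict[str, float]) -> float:
    """Multiply conditional match probabilities and scale by population size."""
    joint = 1.0
    for p in conditionals.values():
        joint *= p
    return population * joint

k = estimate_k(POPULATION, conditional_probs)
print(f"Estimated individuals matching all attributes: k = {k:,.0f}")
# A small k signals high re-identification risk, which motivates abstracting
# disclosures (e.g., "I'm 16F" -> "I'm a teenage girl") as described above.
```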
Program
8:50 - 9:00: Welcome and opening remarks
9:00 - 10:00: Keynote (Wei Xu) - Empowering Everyday Users to Protect Their Privacy in the Age of AI
10:00 - 10:30: Oral session 1
10:00 - 10:30: Investigating User Perspectives on Differentially Private Text Privatization (Stephen Meisenbacher, Alexandra Klymenko, Alexander Karpp, Florian Matthes)
10:30 - 11:00: Coffee break
11:00 - 12:15: Oral session 2
11:00 - 11:25: Balancing Privacy and Utility in Personal LLM Writing Tasks: An Automated Pipeline for Evaluating Anonymizations (Stefan Pasch, Minchul Cha)
11:25 - 11:50: TUNI: A Textual Unimodal Detector for Identity Inference in CLIP Models (Songze Li, Ruoxi Cheng, Xiaojun Jia)
11:50 - 12:15: Beyond De-Identification: A Structured Approach for Defining and Detecting Indirect Identifiers in Medical Texts (Ibrahim Baroud, Lisa Raithel, Sebastian Möller, Roland Roller)
12:15 - 13:45: Lunch
13:45 - 15:25: Oral session 3
13:45 - 14:10: TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods (Gabriel Loiseau, Damien Sileo, Damien Riquet, Maxime Meyer, Marc Tommasi)
14:10 - 14:35: Named Entity Inference Attacks on Clinical LLMs: Exploring Privacy Risks and the Impact of Mitigation Strategies (Adam Sutton, Xi Bai, Kawsar Noor, Thomas Searle, Richard Dobson)
14:35 - 15:00: Inspecting the Representation Manifold of Differentially-Private Text (Stefan Arnold)
15:00 - 15:25: Beyond Reconstruction: Generating Privacy-Preserving Clinical Letters (Libo Ren, Samuel Belkadi, Lifeng Han, Warren Del-Pinto, Goran Nenadic)
15:25 - 15:30: Closing remarks
Committee
Organizers
Ivan Habernal - Ruhr-University Bochum (Germany)
Sepideh Ghanavati - University of Maine (USA)
Shomir Wilson - Pennsylvania State University (USA)
Timour Igamberdiev - University of Vienna (Austria)
Vijayanta Jain - University of Maine (USA)
Program Committee
Afsaneh Razi
Andrea Atzeni
Antoine Boutet
Christina Lohr
Eugenio Martínez Camara
Gergely Acs
Isar Nejadgholi
James Flemings
Kambiz Ghazinour Naini
Lizhen Qu
Natasha Fernandes
Peter Story
Pierre Lison
Riccardo Taiello
Ruyu Zhou
Sai Teja Peddinti
Sebastian Ochs
Travis Breaux
Christos Dimitrakakis
Stephen Meisenbacher
Stefan Arnold
Seyi Feyisetan
Contact
For questions/queries regarding the workshop or submission, please contact: privatenlp25-orga@lists.ruhr-uni-bochum.de