Marika Swanberg
🇸🇪 → 🇺🇸
About Me
I am a fifth-year PhD Candidate, grateful to be advised by Adam Smith at Boston University, where I am a Hariri Institute graduate student fellow. My work with the BU security group spans the theory and practice of privacy, cryptography, and their intersections with the law. My current work focuses on real-world deployments of differential privacy, facilitated by research internships at Tumult Labs and Google. Read an article featuring my research here. I am also on the program committees for Theory and Practice of Differential Privacy (TPDP) and the ACM Conference on Computer and Communications Security (CCS).
I am currently on the job market for full-time positions starting Summer 2024.
Prior to joining BU, I received my Bachelor of Arts in Mathematics and Computer Science from Reed College in Portland, OR. I returned to Reed as a visiting assistant professor in Spring 2023, teaching Algorithms and Data Structures and serving as the faculty chaperone for the ski club.
In my free time, I enjoy entertaining my border collie.
I'm interested in operationalizing the theory of privacy in real-world systems.
News
June 2023: Our paper, Control, Confidentiality, and the Right to be Forgotten, was accepted to CCS 2023!
June 2023: Our paper, DP-SIPS, was accepted to PETS 2023!
April 2023: I am excited to announce that I'll be a Student Researcher at Google NYC this summer with the DP+Federated team, hosted by Edo Roth!
April 2023: Our work in progress, "Attaxonomy: Unpacking Differential Privacy Guarantees Against Practical Adversaries," was accepted to FORC's non-archival track! See you there!
March 2023: I am excited to announce that I'll be a PhD Research Intern at Google NYC this summer, hosted by Andres Muñoz Medina and Travis Dick.
January 2023: The algorithm I developed with Sam and Damien at Tumult Labs is now public!
October 2022: After much anticipation, our paper, "Control, Confidentiality, and the Right to be Forgotten," is finally live on arXiv!
September 2022: I am organizing a Boston-area student differential privacy reading group.
August 2022: I will be a visiting professor at Reed College for the Spring 2023 semester!
June 2022: Our paper, "Universally Composable End-to-End Secure Messaging," was accepted to CRYPTO 2022!
May 2022: I started my position as Scientist Intern at Tumult Labs for the summer.
May 2022: I passed my qualifying exam! I'm officially a PhD candidate!
Publications and manuscripts (concise)
A Unified Analysis of Label Inference Attacks. Robert Busa-Fekete, Travis Dick, Claudio Gentile, Andres Muñoz Medina, Adam Smith, Marika Swanberg. In submission (preliminary version appeared at the NeurIPS Regulatable ML Workshop).
DP-SIPS: A simpler, more scalable mechanism for differentially private partition selection. Marika Swanberg, Damien Desfontaines, Sam Haney. PETS 2023
Control, Confidentiality, and the Right to Be Forgotten. Aloni Cohen, Adam Smith, Marika Swanberg, Prashant Nalini Vasudevan. CCS 2023
Universally Composable End-to-End Secure Messaging. Ran Canetti, Mayank Varia, Palak Jain, Marika Swanberg. CRYPTO 2022
Differentially Private Sampling from Distributions. Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith, Marika Swanberg. NeurIPS 2021
Improved Differentially Private Analysis of Variance. Marika Swanberg, Ira Globus-Harris, Iris Griffith, Anna Ritz, Adam Groce, Andrew Bray. PoPETs 2019
Invited Talks and Posters
Conferences
Talk at CCS 2023
Talk at Privacy Enhancing Technologies Symposium (PETS) 2023
Lightning talk at 2022 ACM CS and Law Conference
Lightning talk at CRYPTO 2022
Poster presentation and talk at the Conference on Neural Information Processing Systems (NeurIPS) 2021 (talks were given for 28% of accepted papers)
Workshops
Poster at NeurIPS 2023 Workshop on Regulatable Machine Learning
Invited talk and poster at 2022 Fields Institute Workshop on Differential Privacy and Statistical Data Analysis (invite-only attendance)
Spotlight talk and poster at 2022 Updatable Machine Learning workshop (co-located with ICML)
Featured talk at 2021 Theory and Practice of Differential Privacy
Seminars (full-length)
Internal Google Algorithms seminar (2023)
Invited talk at Data Co-ops seminar led by Kobbi Nissim and Katrina Ligett (2023)
Invited talk at CS + Law Monthly seminar (2023)
Guest lecture on reidentification risks at BU Law course on Healthcare Decisions and Bioethics (2022)
Talk at Boston University Security seminar (2022)
Talk at Boston Area Differential Privacy Seminar (2020)
Talk at Boston University Algorithms and Theory seminar (2019)
Talk at Harvard University Differential Privacy Seminar (2019)
Publications (verbose)
A Unified Analysis of Label Inference Attacks • Robert Busa-Fekete, Travis Dick, Claudio Gentile, Andres Muñoz Medina, Adam Smith, Marika Swanberg • In submission
Randomized response and label aggregation are two common ways of sharing sensitive label information in a private way. In spite of their popularity in the privacy literature, there is a lack of consensus on how to compare the privacy properties of these two mechanisms. In this work, we investigate the privacy risk of sharing label information for these privacy-enhancing technologies through the lens of label reconstruction advantage measures. A reconstruction advantage measure quantifies the increase in an attacker's ability to infer the true label of an unlabeled example when provided with a private version of the labels in a dataset (e.g., averages of labels from different users, or noisy labels output by randomized response), compared to an attacker that only observes the feature vectors but may have prior knowledge of the correlation between features and labels. We extend the Expected Attack Utility (EAU) and Advantage measures of previous work to mechanisms that involve aggregation of labels across different examples. We theoretically quantify these measures for randomized response and random aggregates under various correlation assumptions with public features, and then empirically corroborate these findings by quantifying EAU on real-world data. To the best of our knowledge, these are the first experiments in which randomized response and label proportions are placed on the same privacy footing. Finally, we point out that simple modifications to the random aggregate approach can provide extra DP-like protection.
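For intuition, here is a minimal sketch of the two label-privatization mechanisms the paper compares, assuming binary labels and illustrative parameters; it shows the shape of the mechanisms, not the paper's experimental setup.

```python
import numpy as np

def randomized_response(labels, epsilon, rng):
    # Keep each binary label with probability e^eps / (e^eps + 1)
    # and flip it otherwise; this satisfies epsilon-differential privacy.
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1)
    flip = rng.random(len(labels)) >= p_keep
    return np.where(flip, 1 - labels, labels)

def label_aggregation(labels, bag_size, rng):
    # Release only the average label of each randomly formed bag,
    # as in learning-from-label-proportions pipelines.
    idx = rng.permutation(len(labels))
    bags = [idx[i:i + bag_size] for i in range(0, len(labels), bag_size)]
    return [labels[bag].mean() for bag in bags]

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1_000)
noisy_labels = randomized_response(labels, epsilon=1.0, rng=rng)
bag_averages = label_aggregation(labels, bag_size=50, rng=rng)
```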
DP-SIPS: A simpler, more scalable mechanism for differentially private partition selection • Marika Swanberg, Damien Desfontaines, Sam Haney • PETS 2023
Partition selection, or set union, is an important primitive in differentially private mechanism design: in a database where each user contributes a list of items, the goal is to publish as many of these items as possible under differential privacy.
In this work, we present a novel mechanism for differentially private partition selection. This mechanism, which we call DP-SIPS, is very simple: it consists of iterating the naive algorithm over the data set multiple times, removing the released partitions from the data set while increasing the privacy budget at each step. This approach preserves the scalability benefits of the naive mechanism, yet its utility compares favorably to more complex approaches developed in prior work.
Along the way, this work also gives an alternate definition of approximate zero-concentrated DP, and reports some empirical observations on the utility of other partition selection mechanisms.
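A rough sketch of the loop described above, under assumed details: the naive baseline here is a standard Laplace-threshold set union with one item kept per user, and the budget split is illustrative rather than the paper's tuned parameters.

```python
import math
import numpy as np
from collections import Counter

def naive_partition_selection(user_items, epsilon, delta, rng):
    # Naive DP set union: keep one item per user, noise each item's
    # count with Laplace(1/epsilon), and release items whose noisy
    # count clears a threshold calibrated to (epsilon, delta).
    counts = Counter(next(iter(items)) for items in user_items if items)
    threshold = 1 + math.log(1 / (2 * delta)) / epsilon
    return {item for item, c in counts.items()
            if c + rng.laplace(scale=1 / epsilon) > threshold}

def dp_sips_sketch(user_items, budget_split, delta, rng):
    # Skeleton of the DP-SIPS idea: run the naive mechanism repeatedly,
    # remove already-released items from every user's list, and spend an
    # increasing share of the budget per round. (A careful accounting
    # would also split delta across rounds.)
    released = set()
    remaining = [set(items) for items in user_items]
    for eps_i in budget_split:          # e.g. [eps/4, eps/4, eps/2]
        released |= naive_partition_selection(remaining, eps_i, delta, rng)
        remaining = [items - released for items in remaining]
    return released
```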
Control, Confidentiality, and the Right to Be Forgotten • Aloni Cohen, Adam Smith, Marika Swanberg, Prashant Nalini Vasudevan • CCS 2023
Recent digital rights frameworks stipulate an ability (e.g., the "right to be forgotten" embodied in the GDPR and CCPA) to request that a data controller (roughly, any system that stores and manipulates personal information) delete one's data.
We ask how deletion should be formalized in complex systems that interact with many parties and store derivative information. There are two broad principles at work in existing approaches to formalizing deletion: confidentiality and control.
We build a unified formalism for deletion that encompasses previous approaches as special cases. We argue that existing work on deletion-as-control (in particular, "machine unlearning") can be reframed in the language of (adaptive) history independence, for which we provide a general definition. We also show classes of controllers, such as those that publish differentially private statistics without later updating them, that satisfy intuitive notions of deletion and our formal definition, but which fall outside the scope of previous approaches.
Universally Composable End-to-End Secure Messaging • Ran Canetti, Mayank Varia, Palak Jain, Marika Swanberg • CRYPTO 2022
We provide a full-fledged security analysis of the Signal end-to-end messaging protocol within the UC framework. In particular:
We formulate an ideal functionality that captures end-to-end secure messaging, in a setting with PKI and an untrusted server, against an adversary that has full control over the network and can adaptively and momentarily compromise parties at any time and obtain their entire internal states. In particular, our analysis captures the forward and backward secrecy properties of Signal and the conditions under which they break.
We model the various components of Signal (PKI and long-term keys, backbone “asymmetric ratchet,” epoch-level symmetric ratchets, authenticated encryption) as individual ideal functionalities that are analyzed separately and then composed using the UC and Global-State UC theorems.
We use the Random Oracle Model to model non-committing encryption for arbitrary-length messages, but the rest of the analysis is in the plain model based on standard primitives. In particular, we show how to realize Signal's key derivation functions in the standard model, from generic components, and under minimalistic cryptographic assumptions.
Our analysis improves on previous ones in the guarantees it provides, in its relaxed security assumptions, and in its modularity. We also uncover some weaknesses of Signal that were not previously discussed. Our modeling differs from previous UC models of secure communication in that the protocol is modeled as a set of local algorithms, keeping the communication network completely out of scope. We also make extensive, layered use of global-state composition within the plain UC framework. These innovations may be of separate interest.
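As one small, self-contained illustration of the "epoch-level symmetric ratchet" component mentioned above: each step derives a fresh message key plus the next chain key from the current chain key, and erasing old chain keys is what yields forward secrecy within an epoch. This is a generic HMAC-based sketch for intuition, not the modeled Signal code.

```python
import hmac
import hashlib

def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    # Derive a one-time message key and the next chain key from the
    # current chain key using HMAC-SHA256 with distinct labels.
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

# Walking the chain: each message key is used once and then deleted,
# so compromising the current chain key reveals nothing about past keys.
chain_key = hashlib.sha256(b"per-epoch shared secret").digest()
for _ in range(3):
    message_key, chain_key = ratchet_step(chain_key)
```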
Differentially Private Sampling from Distributions • Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith, Marika Swanberg • NeurIPS 2021 • Featured talk at Theory and Practice of Differential Privacy 2021
We initiate an investigation of private sampling from distributions. Given a dataset with n independent observations from an unknown distribution P, a sampling algorithm must output a single observation from a distribution that is close in total variation distance to P while satisfying differential privacy. Sampling abstracts the goal of generating small amounts of realistic-looking data. We provide tight upper and lower bounds for the dataset size needed for this task for three natural families of distributions: arbitrary distributions on {1,...,k}, arbitrary product distributions on {0,1}^d, and product distributions on {0,1}^d with bias in each coordinate bounded away from 0 and 1. We demonstrate that, in some parameter regimes, private sampling requires asymptotically fewer observations than learning a description of P nonprivately; in other regimes, however, private sampling proves to be as difficult as private learning. Notably, for some classes of distributions, the overhead in the number of observations needed for private learning compared to non-private learning is completely captured by the number of observations needed for private sampling.
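For a feel of the problem, here is a naive baseline sampler for distributions on {1,...,k} (not the paper's sample-optimal algorithms): perturb the empirical histogram with Laplace noise, project it back to a probability vector, and output one draw.

```python
import numpy as np

def naive_private_sample(data, k, epsilon, rng):
    # Empirical counts of observations in {1,...,k}; replacing one
    # observation changes two counts by 1, so the L1-sensitivity is 2.
    counts = np.bincount(data, minlength=k + 1)[1:].astype(float)
    noisy = counts + rng.laplace(scale=2 / epsilon, size=k)
    probs = np.clip(noisy, 0, None)
    total = probs.sum()
    # Fall back to uniform if the noise wiped out every count.
    probs = probs / total if total > 0 else np.full(k, 1 / k)
    return rng.choice(np.arange(1, k + 1), p=probs)

rng = np.random.default_rng(0)
data = rng.integers(1, 6, size=500)   # observations from an unknown P on {1,...,5}
sample = naive_private_sample(data, k=5, epsilon=1.0, rng=rng)
```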
Improved Differentially Private Analysis of Variance • Marika Swanberg, Ira Globus-Harris, Iris Griffith, Anna Ritz, Adam Groce, Andrew Bray • Proceedings on Privacy Enhancing Technologies (PoPETs) 2019
Hypothesis testing is one of the most common types of data analysis and forms the backbone of scientific research in many disciplines. Analysis of variance (ANOVA) in particular is used to detect dependence between a categorical and a numerical variable. Here we show how one can carry out this hypothesis test under the restrictions of differential privacy. We show that the F-statistic, the optimal test statistic in the public setting, is no longer optimal in the private setting, and we develop a new test statistic F1 with much higher statistical power. We show how to rigorously compute a reference distribution for the F1 statistic and give an algorithm that outputs accurate p-values. We implement our test and experimentally optimize several parameters. We then compare our test to the only previous work on private ANOVA testing, using the same effect size as that work. We see an order of magnitude improvement, with our test requiring only 7% as much data to detect the effect.
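The generic recipe underlying this kind of test can be sketched as follows, with placeholder stat and sensitivity arguments standing in for the paper's F1 statistic and its sensitivity analysis (which this sketch does not reproduce): privatize the test statistic with Laplace noise, then obtain p-values by simulating the noisy statistic's reference distribution under the null.

```python
import numpy as np

def private_statistic(groups, stat, sensitivity, epsilon, rng):
    # Compute the test statistic on the (bounded) grouped data and add
    # Laplace noise scaled to its sensitivity: the Laplace mechanism.
    return stat(groups) + rng.laplace(scale=sensitivity / epsilon)

def monte_carlo_p_value(observed, group_sizes, stat, sensitivity, epsilon,
                        rng, trials=10_000):
    # Reference distribution under the null hypothesis of no group effect
    # (here: every group drawn from the same uniform distribution on [0,1]).
    null_stats = np.array([
        private_statistic([rng.uniform(0, 1, size=n) for n in group_sizes],
                          stat, sensitivity, epsilon, rng)
        for _ in range(trials)
    ])
    # One-sided p-value: fraction of null draws at least as extreme.
    return float(np.mean(null_stats >= observed))
```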