Marika Swanberg

marikas@bu.edu

🇸🇪 → 🇺🇸

About Me

I am a fifth-year PhD candidate at Boston University, grateful to be advised by Adam Smith, and a Hariri Institute graduate student fellow. My work with the BU security group spans the theory and practice of privacy, cryptography, and their intersections with the law. My current work focuses on real-world deployments of differential privacy, facilitated by research internships at Tumult Labs and Google. Read an article featuring my research here. I also serve on the program committees for the Theory and Practice of Differential Privacy (TPDP) workshop and the Conference on Computer and Communications Security (CCS).

I am currently on the job market for full-time positions starting in Summer 2024.

Prior to joining BU, I received my Bachelor of Arts in Mathematics and Computer Science from Reed College in Portland, OR. In Spring 2023, I returned to Reed to teach Algorithms and Data Structures as a visiting assistant professor, and I also served as the faculty chaperone for the ski club.

In my free time, I enjoy entertaining my border collie. 


I'm interested in operationalizing the theory of privacy in real-world systems.

News

Publications and manuscripts (concise)

Invited Talks and Posters

Conferences

Workshops

Seminars (full-length)

Publications (verbose)

Robert Busa-Fekete, Travis Dick, Claudio Gentile, Andres Munoz Medina, Adam Smith, Marika Swanberg  •  In submission


Randomized response and label aggregation are two common ways of sharing sensitive label information in a private way. In spite of their popularity in the privacy literature, there is a lack of consensus on how to compare the privacy properties of these two different mechanisms. In this work, we investigate the privacy risk of sharing label information for these privacy-enhancing technologies through the lens of label reconstruction advantage measures. A reconstruction advantage measure quantifies the increase in an attacker's ability to infer the true label of an unlabeled example when provided with a private version of the labels in a dataset (e.g., averages of labels from different users, or noisy labels output by randomized response), compared to an attacker that only observes the feature vectors but may have prior knowledge of the correlation between features and labels. We extend the Expected Attack Utility (EAU) and Advantage measures from previous work to mechanisms that involve aggregation of labels across different examples. We theoretically quantify these measures for randomized response and random aggregates under various correlation assumptions with public features, and then empirically corroborate these findings by quantifying EAU on real-world data. To the best of our knowledge, these are the first experiments where randomized response and label proportions are placed on the same privacy footing. Finally, we point out that simple modifications to the random aggregate approach can provide extra DP-like protection.
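
For intuition, here is a minimal sketch of the two label-sharing mechanisms being compared: randomized response applied independently to each binary label, and aggregation of labels into per-bag averages. The function names, bag sizes, and parameters below are illustrative only, not taken from the paper.

```python
import numpy as np

def randomized_response(labels, epsilon, rng):
    """Standard epsilon-DP randomized response for binary labels:
    keep each label with probability e^eps / (1 + e^eps), else flip it."""
    p_keep = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = rng.random(len(labels)) < p_keep
    return np.where(keep, labels, 1 - labels)

def label_aggregation(labels, bag_size, rng):
    """Release only the average label of each randomly formed bag of
    examples, in the style of learning from label proportions."""
    idx = rng.permutation(len(labels))
    bags = [idx[i:i + bag_size] for i in range(0, len(labels), bag_size)]
    return [(bag, labels[bag].mean()) for bag in bags]

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1_000)
noisy_labels = randomized_response(labels, epsilon=1.0, rng=rng)
bag_averages = label_aggregation(labels, bag_size=10, rng=rng)
```

The paper's contribution is a common yardstick, reconstruction advantage, for comparing what an attacker learns from outputs like noisy_labels versus bag_averages.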

Marika Swanberg, Damien Desfontaines, Sam Haney •  PETS 2023


Partition selection, or set union, is an important primitive in differentially private mechanism design: in a database where each user contributes a list of items, the goal is to publish as many of these items as possible under differential privacy. 


In this work, we present a novel mechanism for differentially private partition selection. This mechanism, which we call DP-SIPS, is very simple: it consists of iterating the naive algorithm over the data set multiple times, removing the released partitions from the data set while increasing the privacy budget at each step. This approach preserves the scalability benefits of the naive mechanism, yet its utility compares favorably to more complex approaches developed in prior work. 


Along the way, this work also gives an alternate definition of approximate zero-concentrated DP, and reports some empirical observations on the utility of other partition selection mechanisms.
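
The iterative structure is simple enough to sketch in a few lines. The snippet below is a rough illustration under simplifying assumptions (per-user contribution cap, Laplace noise, an ad hoc threshold); the actual noise and threshold calibration, budget split, and privacy accounting follow the paper, not this code, and all names here are hypothetical.

```python
import numpy as np

def naive_selection(user_items, epsilon, delta, rng, cap=1):
    """Naive DP partition selection sketch: cap each user's contribution,
    add Laplace noise to per-item user counts, and release items whose
    noisy count clears a threshold."""
    counts = {}
    for items in user_items:
        for item in list(items)[:cap]:
            counts[item] = counts.get(item, 0) + 1
    scale = cap / epsilon
    threshold = 1 + scale * np.log(cap / (2 * delta))
    return {item for item, c in counts.items()
            if c + rng.laplace(0, scale) > threshold}

def dp_sips_sketch(user_items, epsilon, delta, rng, rounds=3):
    """DP-SIPS-style iteration: spend a geometrically increasing share of
    the budget each round, and drop already-released items from every
    user's list before the next pass."""
    shares = np.array([2.0 ** r for r in range(rounds)])
    shares /= shares.sum()
    released = set()
    for share in shares:
        remaining = [set(items) - released for items in user_items]
        released |= naive_selection(remaining, share * epsilon,
                                    share * delta, rng)
    return released
```

Roughly, early small-budget rounds surface the very common items; removing them frees each user's capped contribution for rarer items in later, larger-budget rounds.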

Aloni Cohen, Adam Smith, Marika Swanberg, Prashant Nalini Vasudevan  •  CCS 2023


Recent digital rights frameworks stipulate an ability (e.g., the "right to be forgotten" embodied in the GDPR and CCPA) to request that a data controller -- roughly, any system that stores and manipulates personal information -- delete one's data.


We ask how deletion should be formalized in complex systems that interact with many parties and store derivative information. There are two broad principles at work in existing approaches to formalizing deletion: confidentiality and control.


We build a unified formalism for deletion that encompasses previous approaches as special cases. We argue that existing work on deletion-as-control (in particular, "machine unlearning") can be reframed in the language of (adaptive) history independence, for which we provide a general definition. We also show classes of controllers, such as those that publish differentially private statistics without later updating them, that satisfy intuitive notions of deletion and our formal definition, but which fall outside the scope of previous approaches.
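
As a toy illustration of the history-independence lens (my own example, not one from the paper): a data structure is history independent if its in-memory state depends only on its current logical contents, so the state after a deletion reveals nothing about whether the deleted element was ever there.

```python
class HistoryIndependentSet:
    """A set stored as a canonical sorted tuple. Any two sequences of
    insertions and deletions that end in the same logical set produce
    bit-for-bit identical state; a (toy) instance of history independence."""

    def __init__(self):
        self._items = ()

    def insert(self, x):
        self._items = tuple(sorted(set(self._items) | {x}))

    def delete(self, x):
        self._items = tuple(sorted(set(self._items) - {x}))

    def state(self):
        return self._items

a, b = HistoryIndependentSet(), HistoryIndependentSet()
for x in (3, 1, 2):
    a.insert(x)
a.delete(3)                 # 3 was inserted, then deleted
b.insert(1); b.insert(2)    # 3 was never inserted at all
assert a.state() == b.state() == (1, 2)
```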

Ran Canetti, Mayank Varia, Palak Jain, Marika Swanberg  •  CRYPTO 2022 


We provide a full-fledged security analysis of the Signal end-to-end messaging protocol within the UC framework.


Our analysis improves on previous ones in the guarantees it provides, in its relaxed security assumptions, and in its modularity. We also uncover some weaknesses of Signal that were not previously discussed. Our modeling differs from previous UC models of secure communication in that the protocol is modeled as a set of local algorithms, keeping the communication network completely out of scope. We also make extensive, layered use of global-state composition within the plain UC framework. These innovations may be of separate interest. 

Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith, Marika Swanberg  •  NeurIPS 2022  •  Featured talk at Theory and Practice of Differential Privacy (TPDP) 2021


We initiate an investigation of private sampling from distributions. Given a dataset with n independent observations from an unknown distribution P, a sampling algorithm must output a single observation from a distribution that is close in total variation distance to P while satisfying differential privacy. Sampling abstracts the goal of generating small amounts of realistic-looking data. We provide tight upper and lower bounds for the dataset size needed for this task for three natural families of distributions: arbitrary distributions on {1,...,k}, arbitrary product distributions on {0,1}^d, and product distributions on {0,1}^d with bias in each coordinate bounded away from 0 and 1. We demonstrate that, in some parameter regimes, private sampling requires asymptotically fewer observations than learning a description of P nonprivately; in other regimes, however, private sampling proves to be as difficult as private learning. Notably, for some classes of distributions, the overhead in the number of observations needed for private learning compared to non-private learning is completely captured by the number of observations needed for private sampling.
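
For a concrete point of reference, here is the obvious "learn then sample" baseline for distributions on {1,...,k}: privately estimate the histogram with the Laplace mechanism, project back to the probability simplex, and output a single draw. This is my own sketch, not one of the algorithms analyzed in the paper; part of what the paper shows is that, in some regimes, learn-first approaches need asymptotically more data than sampling actually requires.

```python
import numpy as np

def learn_then_sample(data, k, epsilon, rng):
    """Baseline sketch: Laplace-noise the empirical frequencies on
    {1, ..., k}, clip and renormalize, then emit one draw."""
    n = len(data)
    freq = np.bincount(np.asarray(data) - 1, minlength=k) / n
    # L1 sensitivity of the frequency vector is 2/n under replacement.
    noisy = freq + rng.laplace(0, 2.0 / (n * epsilon), size=k)
    probs = np.clip(noisy, 0, None)
    probs = probs / probs.sum() if probs.sum() > 0 else np.full(k, 1.0 / k)
    return int(rng.choice(np.arange(1, k + 1), p=probs))

rng = np.random.default_rng(1)
data = rng.integers(1, 6, size=500)   # n = 500 draws from an unknown P on {1,...,5}
print(learn_then_sample(data, k=5, epsilon=1.0, rng=rng))
```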


Marika Swanberg, Ira Globus-Harris, Iris Griffith, Anna Ritz, Adam Groce, Andrew Bray  •  PETS 2019


Hypothesis testing is one of the most common types of data analysis and forms the backbone of scientific research in many disciplines. Analysis of variance (ANOVA) in particular is used to detect dependence between a categorical and a numerical variable. Here we show how one can carry out this hypothesis test under the restrictions of differential privacy. We show that the F-statistic, the optimal test statistic in the public setting, is no longer optimal in the private setting, and we develop a new test statistic F1 with much higher statistical power. We show how to rigorously compute a reference distribution for the F1 statistic and give an algorithm that outputs accurate p-values. We implement our test and experimentally optimize several parameters. We then compare our test to the only previous work on private ANOVA testing, using the same effect size as that work. We see an order of magnitude improvement, with our test requiring only 7% as much data to detect the effect.
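
One general pattern for private hypothesis testing, sketched below with a generic statistic rather than the paper's F1 statistic and not necessarily matching the paper's exact construction, is to release the test statistic through the Laplace mechanism and then compute p-values against a reference distribution of the noised statistic under the null, so the added noise is properly accounted for. All names and parameters here are illustrative.

```python
import numpy as np

def private_pvalue(stat_value, sensitivity, epsilon, null_sampler, rng,
                   n_sim=10_000):
    """Illustrative private test: add Laplace noise scaled to the
    statistic's sensitivity, then estimate a p-value by Monte Carlo
    simulation of the noised statistic under the null hypothesis."""
    scale = sensitivity / epsilon
    noisy_stat = stat_value + rng.laplace(0, scale)
    null_draws = np.array([null_sampler(rng) + rng.laplace(0, scale)
                           for _ in range(n_sim)])
    p_value = float(np.mean(null_draws >= noisy_stat))
    return noisy_stat, p_value

# Toy usage: statistic = |difference of two group means| for data in [0, 1],
# groups of size m, so one record changes the statistic by at most 1/m.
m = 200
rng = np.random.default_rng(2)
x, y = rng.random(m), rng.random(m)
stat = abs(x.mean() - y.mean())
null = lambda r: abs(r.random(m).mean() - r.random(m).mean())
print(private_pvalue(stat, sensitivity=1.0 / m, epsilon=1.0,
                     null_sampler=null, rng=rng))
```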