# GraPFiCs

Foundations of Fairness, Privacy and Causality in Graphs

Thursday, October 19 - Friday, October 20, 2023

Santa Cruz, California

Overview

There is a growing appreciation for the need to understand the ethical and societal implications of AI and algorithmic decision making. Of the many concerns, two of the most basic, fairness and privacy, have taken center stage. Despite potential conflicts, these intertwined principles necessitate joint consideration. For instance, while fairness requires access to sensitive protected attributes, data protection often requires these attributes remain private. The choice of fairness criteria—whether aimed at groups or individuals—also shapes privacy definitions. Although notions like differential privacy are well aligned with notions of individual fairness, they may not align with group fairness. Additionally, recognizing these concepts as social processes with temporal and relational interdependencies is crucial. Handling their complexities in relational contexts requires a unified approach to examine their interplay.

One way to understand this connection is through the lens of causality. Causal reasoning helps us understand how current definitions of fairness and privacy breakdown. By explicitly modeling the causal relationships between variables in data, causal inference helps to identify how sensitive attributes can affect a fair outcome, how information may be inadvertently leaked in released data, or both. In relational domains, it is crucial that these models explicitly represent the relationships between individuals. Further, causal inference with interference, the study of how interventions of interconnected units affect their outcomes, can help us understand dynamic processes in relational, temporal and decision making settings.

While there has been work at the intersections of fairness and causality, privacy and fairness, and there has been work on fairness in graphs and privacy in graphs, there is little work that bridges all four topics. Most research on privacy, fairness and causal inference makes strong independence assumptions about the domain. While powerful, these assumptions often do not hold, especially in relational (e.g., graph structured) domains. This workshop aims to delve into their intersection and explore how they can coexist harmoniously while preserving their individual importance.

The workshop will open with several tutorials that will help ground the discussion, and include invited talks, contributed talks and poster sessions. A goal will be to understand the foundational challenges from computational, statistical and mathematical perspectives. A highlight will be the round tables in which we work to identify research agendas in this area.

This is an NSF TRIPODS Sponsored Workshop organized by the Institute for Foundations of Data Science (IFDS)

and the Institute for Data, Econometrics, Algorithms, and Learning (IDEAL).

## Logistics

The workshop will be held Oct 19 and 20, 2023 in Santa Cruz, CA at the Seymour Marine Discovery Center.

Seymour Marine Discovery Center

100 McAllister Way

Santa Cruz, CA 95060

(831) 459-3800

Organizers:

Lise Getoor, UC Santa Cruz

Elena Zheleva, University of Illinois Chicago

Golnoosh Farnadi, McGill University/University of Montréal/MILA

## Speakers

Kristen Altenburger, Meta

Lu Cheng, University of Illinois Chicago

Maryam Fazel, University of Washington

Golnoosh Farnadi, University of Montreal

Yang Liu, University of California, Santa Cruz

Mesrob Ohanessian, University of Illinois Chicago

Krishna Pillutla, Google Research

Bruno Ribeiro, Purdue University

Babak Salimi, University of California, San Diego

Berk Ustun, University of California, San Diego

Elena Zheleva, University of Illinois Chicago

## Schedule

Day 1 - October 19

08:30 - 09:00 am Breakfast

09:00 - 10:30 am Fairness & Recourse Tutorial

Yang Liu, UCSC

10:30 - 11:00 am Coffee Break

11:00 - 12:30 pm Causal Inference from Graphs Tutorial

Elena Zheleva, UIC

12:30 - 01:30 pm Lunch

01:30 - 02:00 pm Addressing Some Incomplete or Unrepresentative Data Challenges to Fairness

Mesrob Ohannessian, UIC

02:00 - 02:30 pm Insights into Causal Link Prediction Through Causal Lifting

Bruno Ribeiro, Purdue University

02:30 - 02:45 pm Coffee Break

02:45 - 04:00 pm Roundtable Discussion 1

04:00 - 04:15 pm Poster Spotlights

04:15 - 06:00 pm Poster Session & Happy Hour

Day 2 - October 20

08:30 - 09:00 am Breakfast

09:00 - 09:30 am Graphs for Product Innovation: Explanation vs. Prediction Problems

Kristen Altenberger, Meta

09:30 - 10:00 am Unleashing the Power of Randomization in Auditing Differentially Private ML

Krishna Pillutla, Google

10:00 - 10:30 am Causal Inference from Social Networks

Babak Salimi, UCSD

10:30 - 11:00 am Coffee Break

11:00 - 11:30 am Algorithmic Fairness in an Uncertain World

Lu Cheng, UIC

11:30 - 12:00 pm When Personalization Harms Performance

Berk Ustun, UCSD

12:00 - 12:30 pm Multiplayer Performative Prediction

Maryam Fazel, UW

12:30 - 01:30 pm Lunch

01:30 - 02:00 pm The Interplay of Fairness and Privacy: Bridging the Gap

Golnoosh Farnadi, McGill University/University of Montréal/MILA

02:00 - 03:00 pm Roundtable Discussion 2

03:00 - 03:15 pm Concluding Remarks

## Talk Abstracts

## Yang Liu, UCSC

Fairness & Recourse Tutorial

Fairness & Recourse Tutorial

## Elena Zheleva, UIC

Causal Inference from Graphs Tutorial

Causal Inference from Graphs Tutorial

This tutorial presents state-of-the-art research on causal inference from network data in the presence of interference. We start by motivating research in this area with real-world applications, such as measuring influence in social networks and market experimentation. We discuss the challenges of applying existing causal inference techniques designed for independent and identically distributed (i.i.d.) data to relational data, some of the solutions that currently exist and the gaps and opportunities for future research. We present existing network experiment designs for measuring different possible effects of interest. Then we focus on causal inference from observational data, its representation, identification, and estimation. We conclude with research on causal discovery in networks.

## Mesrob Ohannessian, UIC

Addressing Some Incomplete or Unrepresentative Data Challenges to Fairness

Addressing Some Incomplete or Unrepresentative Data Challenges to Fairness

## Bruno Ribeiro, Purdue University

Insights into Causal Link Prediction through Causal Lifting

Insights into Causal Link Prediction through Causal Lifting

In this talk we explore the role of symmetries in predicting the evolution of links on graphs both with and without interventions. Specifically, we will see that standard (associational) temporal link prediction tasks can always be solved by static graph machine learning methods. Further, using invariant theory, we will highlight the limitations of matrix factorizations as graph embeddings when predicting intervention outcomes on links within path-dependent graphs (e.g., friendship recommendations in a social network and product recommendations in e-commerce). Finally, I will introduce the symmetry-based concept of “Causal Lifting" for predicting the effect of link interventions on path-dependent graphs and discuss some applications.

## Kristen Altenberger, Meta

Graphs for Product Innovation: Explanation vs. Prediction Problems

Graphs for Product Innovation: Explanation vs. Prediction Problems

Technological innovations have fundamentally transformed information flow and social behavior. As society’s digital platforms continue to promote instantaneous connectivity, graphs can help advance machine learning (prediction) and causal inference (explanation) methods to better account for complex user behavior. This talk will provide an overview of new methods for addressing interference in A/B tests on networks, how we leverage multi-level networks for prediction, and thoughts on future research directions on a unified, graph-based framework for fairness, causality, and privacy for industry practitioners.

## Krishna Pillutla, Google

Unleashing the Power of Randomization in Auditing Differentially Private ML

Unleashing the Power of Randomization in Auditing Differentially Private ML

We present a rigorous methodology for auditing differentially private machine learning algorithms by adding multiple carefully designed examples called canaries. We take a first principles approach based on three key components. First, we introduce Lifted Differential Privacy (LiDP) that expands the definition of differential privacy to handle randomized datasets. This gives us the freedom to design randomized canaries. Second, we audit LiDP by trying to distinguish between the model trained with K canaries versus K − 1 canaries in the dataset, leaving one canary out. By drawing the canaries i.i.d., LiDP can leverage the symmetry in the design and reuse each privately trained model to run multiple statistical tests, one for each canary. Third, we introduce novel confidence intervals that take advantage of the multiple test statistics by adapting to the empirical higher-order correlations. Together, this new recipe demonstrates significant improvements in sample complexity, both theoretically and empirically, using synthetic and real data. Further, recent advances in designing stronger canaries can be readily incorporated into the new framework.

## Babak Salimi, UCSD

Causal Inference from Social Networks

Causal Inference from Social Networks

## Lu Cheng, UIC

Algorithmic Fairness in an Uncertain World

Algorithmic Fairness in an Uncertain World

Significant progress in the field of fair machine learning (ML) has been made to counteract algorithmic discrimination against marginalized groups. However, fairness remains an active research area that is far from settled. One key bottleneck is the implicit assumption that environments where ML is developed and deployed are certain and reliable. In a world that is characterized by volatility, uncertainty, complexity, and ambiguity, whether what has been developed in algorithmic fairness can still serve its purpose is far from obvious. In this talk, I will discuss how to improve algorithmic fairness under two kinds of predictive uncertainties, i.e., aleatoric uncertainty (i.e., randomness and ambiguity in the data) and epistemic uncertainty (i.e., a lack of data or knowledge), respectively. The former regards historical bias reflected in the data and the latter corresponds to the bias perpetuated or amplified during model training due to lack of data or knowledge. In particular, the first work studies pushing fairness-utility trade-off through aleatoric uncertainty, and the second work investigates fair few-shot learning.

## Berk Ustun, UCSD

When Personalization Harms Performance

When Personalization Harms Performance

Machine learning models often include group attributes like sex, age, and HIV status for the sake of personalization – i.e., to assign more accurate predictions to heterogeneous subpopulations. In this talk, I will describe how such practices inadvertently lead to worsenalization, by assigning unnecessarily inaccurate predictions to minority groups. I will discuss how these effects violate our basic expectations from personalization and describe how these violations arise due to standard practices in model development. I will end by highlighting work on how to address these issues in practice – first, by setting"personalization budgets" to test for worsenalization; second, by developing models where individuals can consent to personalization at prediction time.

## Maryam Fazel, UW

Multiplayer Performative Prediction

Multiplayer Performative Prediction

When multiple learning algorithms interact with user populations, the population data reacts to competing decision makers' actions. In this talk we formulate a game-theoretic model to describe and study this phenomenon, called multi-player performative prediction. We show that for this model under mild assumptions, several algorithms, including repeated minimization and repeated stochastic gradient methods, converge to the so-called "performatively stable" points. Nash equilibria of the game are also of interest, but can be found efficiently only when the game is monotone; we present sufficient conditions for strong monotonicity of the game and use them to develop algorithms for finding Nash equilibria. Joint work with A. Narang, E. Faulkner, D. Drusvyatskiy, and L. Ratliff.

## Golnoosh Farnadi, McGill University/University of Montréal/MILA

The Interplay of Fairness and Privacy: Bridging the Gap

The Interplay of Fairness and Privacy: Bridging the Gap

In today's automated decision-making landscape—healthcare, banking, hiring, and education—machine learning's influence is profound. Ensuring responsible development mandates both algorithmic fairness and privacy. While initially studied separately, recent research explores their synergy, leading to interesting questions on how fairness can be achieved under privacy constraints. Given the intriguing inquiries arising at the intersection of these diverse fields, this talk aims to investigate compatibility and trade-offs, with a focus on developing fair and private models. Lastly, I will discuss the limitations of current technical approaches and outline potential future directions to bridge the gap in algorithmic fairness and privacy.

## Posters

## Ahmed Sayeed Faruk, UIC

Leveraging Heterogeneous Spillover Effects in Maximizing Contextual Bandit Rewards

Leveraging Heterogeneous Spillover Effects in Maximizing Contextual Bandit Rewards

Recommender systems relying on contextual multi-armed bandits continuously improve relevant item recommendations by taking into account the contextual information. The objective of these bandit algorithms is to learn the best arm (i.e., best item to recommend) for each user and thus maximize the cumulative rewards from user engagement with the recommendations. However, current approaches ignore potential spillover between interacting users, where the action of one user can impact the actions and rewards of other users. Moreover, spillover may vary for different people based on their preferences and the closeness of ties to other users. This leads to heterogeneity in the spillover effects, i.e., the extent to which the action of one user can impact the action of another. Here, we propose a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user. By experimenting on several real-world datasets using prominent linear and non-linear contextual bandit algorithms, we observe that our proposed method leads to significantly higher rewards than existing solutions that ignore spillover.

## Krishna Pillutla, Google

Correlated Noise Provably Beats Independent Noise for Differentially Private Learning

Correlated Noise Provably Beats Independent Noise for Differentially Private Learning

Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms has shown empirically that introducing correlations in the noise can greatly improve their utility. We characterize the asymptotic learning utility for any choice of the correlation function, giving precise analytical bounds for linear regression and as the solution to a convex program for general convex functions. We show, using these bounds, how correlated noise provably improves upon vanilla DP-SGD as a function of problem parameters such as the effective dimension and condition number. Moreover, our analytical expression for the near-optimal correlation function circumvents the cubic complexity of the semi-definite program used to optimize the noise correlation matrix in previous work. We validate these theoretical results with experiments on private deep learning. Our work matches or outperforms prior work while being efficient both in terms of computation and memory.

## Lijun Ding, UW–Madison

Flat Minima Generalize for Low-Rank Matrix Recovery

Flat Minima Generalize for Low-Rank Matrix Recovery

Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the loss grows slowly -- appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust PCA, covariance matrix estimation, and single hidden layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. For matrix completion, we establish weak recovery, although empirical evidence suggests exact recovery holds here as well. We conclude with synthetic experiments that illustrate our findings and discuss the effect of depth on flat solutions.

## Reilly Raab, UCSC

Fairness in a Dynamical Context

Fairness in a Dynamical Context

An overview of fairness in the context of feedback-driven dynamics, when deployed machine learning modes and affected populations react each other.

## Shishir Adhikari, UIC

Inferring Causal Effects Under Heterogeneous Peer Influence

Inferring Causal Effects Under Heterogeneous Peer Influence

Causal inference in networks should account for interference, which occurs when a unit's outcome is influenced by treatments or outcomes of peers. There can be heterogeneous peer influence between units when a unit's outcome is subjected to variable influence from different peers based on their attributes and relationships, or when each unit has a different susceptibility to peer influence. Existing solutions to causal inference under interference consider either homogeneous influence from peers or specific heterogeneous influence mechanisms (e.g., based on local neighborhood structure). This paper presents a methodology for estimating individual causal effects in the presence of heterogeneous peer influence due to arbitrary mechanisms. We propose a structural causal model for networks that can capture arbitrary assumptions about network structure, interference conditions, and causal dependence. We identify potential heterogeneous contexts using the causal model and propose a novel graph neural network-based estimator to estimate individual causal effects. We show that existing state-of-the-art methods for individual causal effect estimation produce biased results in the presence of heterogeneous peer influence, and that our proposed estimator is robust.

## Yatong Chen, UCSC

Model Transferability with Responsive Decision Subjects

Model Transferability with Responsive Decision Subjects

Given an algorithmic predictor that is accurate on some source population consisting of strategic human decision subjects, will it remain accurate if the population respond to it? In our setting, an agent or a user corresponds to a sample (X,Y) drawn from a distribution D and will face a model h and its classification result h(X). Agents can modify X to adapt to h, which will incur a distribution shift on (X,Y). Our formulation is motivated by applications where the deployed machine learning models are subjected to human agents, and will ultimately face responsive and interactive data distributions. We formalize the discussions of the transferability of a model by studying how the performance of the model trained on the available source distribution (data) would translate to the performance on its induced domain. We provide both upper bounds for the performance gap due to the induced domain shift, as well as lower bounds for the trade-offs that a classifier has to suffer on either the source training distribution or the induced target distribution. We provide further instantiated analysis for two popular domain adaptation settings, including covariate shift and target shift.

## Zahra Fatemi, UIC

Contagion Effect Estimation Using Proximal Embeddings

Contagion Effect Estimation Using Proximal Embeddings

Contagion effect refers to the causal effect of peers' behavior on the outcome of an individual in social networks. Contagion can be confounded due to latent homophily which makes contagion effect estimation very hard: nodes in a homophilic network tend to have ties to peers with similar attributes and can behave similarly without influencing one another. One way to account for latent homophily is by considering proxies for the unobserved confounders. However, as we demonstrate in this paper, existing proxy-based methods for contagion effect estimation have a very high variance when the proxies are high-dimensional. To address this issue, we introduce a novel framework, Proximal Embeddings (ProEmb), that integrates variational autoencoders with adversarial networks to create low-dimensional representations of high-dimensional proxies and help with identifying contagion effects. While VAEs have been used previously for representation learning in causal inference, a novel aspect of our approach is the additional component of adversarial networks to balance the representations of different treatment groups, which is essential in causal inference from observational data where these groups typically come from different distributions. We empirically show that our method significantly increases the accuracy and reduces the variance of contagion effect estimation in observational network data compared to state-of-the-art methods.