## Societal Concerns in Algorithms and Data Analysis

### Kickoff Tutorials and Lectures

### October 3rd, 4th, 10th, 11th, 2018

### Weizmann Institute of Science

To kick off the program on "**Societal Concerns in Algorithms and Data Analysis**" (SCADA 2018), we will hold a series of **tutorials** and** lectures** at the Weizmann Institute. These will provide an introduction to topics central to the program, as well as providing a venue for program participants to get to know one another. The tutorials are meant for a broad and diverse CS-literate audience.

**Location.** Ziskind Building, Lecture Hall 1, Weizmann Institute of Science (see map).

**Parking:** At any available spot on campus (please specify to guards at the gate that you are coming to an event of the computer science department).

**To register:** please follow the link (registration is required).

# Program

## Wednesday, October 3rd, 10AM-4:30PM

- 10:00-10:15 Coffee
- 10:15-10:30 Opening remarks and introductions
- 10:30-11:30
**Stability and Generalization in Machine Learning****(**Ohad Shamir**,**Weizmann Institute**)** - 11:30-11:45 Break
- 11:45-12:45
**Is Depth Needed for Deep Learning? Circuit Complexity in Neural Networks (**Ohad Shamir**,**Weizmann Institute**)** - 12:45-2:15 Lunch
- 2:15-3:15
**The Cryptographic Lens (**Moni Naor,**)** - 3:15-3:30 Break
- 3:30-4:30
**An Introduction to Fully Homomorphic Encryption**(Zvika Brakerski, Weizmann Institute)

## Thursday, October 4th, 10AM-4:30PM

- 10:00-10:15 Coffee
- 10:15-10:45
**Tools for a revolution in data ownership and usage (**Katrina Ligett**,**Hebrew University) - 10:45-11:15
**Why good rules can be bad**(Sigal Oren, Ben Gurion University) - 11:15-11:30 Break
- 11:30-12:15
**Cryptocurrencies and Societal Concerns**(Eli Ben-Sasson, Technion) - 12:15-1:45 Lunch
- 1:45-2:45
**No Man is an Island: Combining Different People’s Views to Reach a Joint Outcome, Part I**(Omer Lev**,**Ben-Gurion University**)** - 2:45-3:00
**Break** - 3:00-4:00
**No Man is an Island: Combining Different People’s Views to Reach a Joint Outcome, Part II**(Omer Lev**,**Ben-Gurion University**)** - 4:00-4:30
**Market Design Patterns**

## Wednesday, October 10th, 10AM-4:30PM

- 10:00-10:15 Coffee
- 10:15-11:15
**Differential Privacy Tutorial: Part I**(Guy Rothblum**,**Weizmann Institute**)** - 11:15-11:30
- 11:30-12:15
**Differential Privacy Tutorial: Part II**(Guy Rothblum**,**Weizmann Institute**)** - 12:15-12:45
**Differentially Private Change-Point Detection**(Rachel Cummings, Georgia Tech) - 12:45-2:15 Lunch
- 2:15-3:15
**Differential Privacy and Adaptive Data Analysis, Part I**(Uri Stemmer**,**Ben-Gurion University**)** - 3:15-3:30 Break
- 3:30-4:30
**Differential Privacy and Adaptive Data Analysis, Part II**(Uri Stemmer**,**Ben-Gurion University**)**

## Thursday, October 11th, 10AM-4:30PM

- 10:00-10:15 Coffee
- 10:15-11:15
**Reinforcement Learning and Markov Decision Processes, Part I**(Yishay Mansour,**)** - 11:15-11:30
- 11:30-12:15
**Reinforcement Learning and Markov Decision Processes, Part II**(Yishay Mansour,**)** - 12:15-12:45
**Societal Concerns in Targeted Advertising**(Aleksandra Korolova, USC) - 12:45-2:15 Lunch
- 2:15-3:15
**A Tutorial on Fairness for Classification, Part I**(Michael Kim**,**Stanford University**)** - 3:15-3:30 Break
- 3:30-4:30
**A Tutorial on Fairness for Classification, Part II**(Michael Kim**,**Stanford University**)**

**Organizers.** Moni Naor and Guy Rothblum

## Abstracts

**Stability and Generalization in Machine Learning**

**Ohad Shamir****, Weizmann Institute**

I will present a short introductory tutorial on statistical generalization in machine learning, and its connections to algorithmic stability and privacy. The tutorial is aimed towards those without prior knowledge about these topics.

**Is Depth Needed for Deep Learning? Circuit Complexity in Neural Networks **

**Ohad Shamir****, Weizmann Institute**

Deep learning, as its name indicates, is based on training artificial neural networks with many layers. A key theoretical question is to understand why such depth is beneficial, and when is it provably necessary to express certain types of functions. In fact, this question is closely related to circuit complexity, which has long been studied in theoretical computer science -- albeit for different reasons, and for circuits which differ in some important ways from modern neural networks. Despite this similarity, the interaction between the circuit complexity and machine learning research communities is currently quite limited. In this talk, I'll survey some of the recent depth separation results developed in the machine learning community, and discuss open questions where insights from circuit complexity might help.

The talk is aimed at a general theoretical computer science audience, and no prior knowledge about deep learning will be assumed.

**The Cryptographic Lens **

**Moni Naor****, Weizmann Institute**

Cryptography deals with methods for protecting the privacy, integrity and functionality of computer and communication systems. In this short tutorial I will introduce some of the key cryptographic ideas that are essential to Societal Concerns in Algorithms and Data Analysis. These include : adversarial thinking, tight relationship with complexity, Indistiguishability, Simulation and Zero-knowledge.

**An Introduction to Fully Homomorphic Encryption**

**Zvika Brakerski****, Weizmann Institute**

A fully homomorphic encryption scheme scheme (FHE) is a method of encryption where an encryption of a message x can be converted into an encryption of a related message f(x), for any f, without compromising the security of the encryption. This means that data processing can be performed in an encrypted manner, without revealing the underlying information. This holds a promise for many applications such as encrypted cloud storage which can also perform computational operations on stored data without compromising its privacy. FHE had been first suggested as an idea by Rivest, Adleman and Dertouzos in 1978, but finding a candidate scheme had been a long lasting open problem, finally solved by Gentry in 2009. Since then, we have seen very rapid progress in terms of security and efficiency.

In the talk I intend to provide some introduction to FHE and describe how to construct FHE based on the hardness of the learning with errors (LWE) problem.

**Tools for a revolution in data ownership and usage**

**Katrina Ligett****, Hebrew University**

We present a rethinking of the model of data ownership and usage, and discuss the research agenda that would support such developments. In particular, we will discuss many open problems in the space of mechanism design, incentives, and valuing data. Joint work with Kobbi Nissim.

**Why good rules can be bad **

**Sigal Oren****, Ben Gurion University**

We will discuss two different settings. The first setting is evolving social groups. These are groups in which the group members decide whether or not to admit a candidate to the group. In our model, every group member is characterized by his opinion, which is represented as a point on the real line. The group evolves in discrete time steps through a voting process carried out by the group's members. Due to homophily, each member votes for the candidate who is more similar to him (i.e., closer to him on the line). An admission rule is then applied to determine which candidate, if any, is admitted. We consider several natural admission rules including majority and consensus and ask how different admission rules affect the composition of the group in the long term.

Time permitting we will discuss a second setting of scientific credit allocation. We present a stylized model and show that in our model allocating credit fairly is at odds with maximizing the social welfare.

Based on joint works with: Noga Alon, Michal Feldman, Jon Kleinberg, Yishay Mansour and Moshe Tennenholtz.

**Cryptocurrencies and Societal Concerns **

**Eli Ben-Sasson****, Technion**

Cryptocurrencies, led by Bitcoin, are attempting to challenge conventional financial, economic and societal structures. This talk will focus on societal aspects of cryptocurrencies: (1) the "old" problems they attempt to solve and (2) the "new" problems they raise.

**No Man is an Island: Combining Different People’s Views to Reach a Joint Outcome**

**Omer Lev****, Ben-Gurion University**

The way people interact with each other and reach decisions and outcomes that involve all of them has been an object of research for at least a thousand years. We will explore two main topics:

1. Voting: Various methods that aggregate different views and beliefs into a single decision. We will discuss some voting systems, and the impossibility results regarding them, shown in the last 70 years. We will then show some methods developed since, particularly in computer science, that deal with these impossibilities, and some use-cases for the voting systems.

2. Cake-cutting: Despite the name, no food will be served during this part. It is, however, a discussion of how a group of people can divide something they are all interested in. We will discuss some desirable properties, and show some techniques to reach them, and note the results which we are still waiting for.

**Market Design Patterns **

**Avinatan Hassidim****, Bar Ilan University**

Market design patterns guide us on the practical aspects of redesigning a market. In the talk, we will define the concept, present some patterns and anti patterns, and discuss how we encountered them in the redesign of several matching markets in Israel.

Based on joint works with Assaf Romm, Ran Shorrer and Scott Kominers.

**Differential Privacy Tutorial**

**Guy Rothblum****, Weizmann Institute**

Differential privacy provides a rigorous and robust privacy guarantee to individuals in the context of statistical data analysis. It has sparked a revolution in the study of privacy-preserving data analysis, impacting fields from computer science to statistics to legal scholarship.

This tutorial will outline some highlights from the differential privacy literature, from composition theorems and algorithmic techniques to impossibility and hardness results. No prior knowledge of the field will be assumed.

**Differentially Private Change-Point Detection**

**Rachel Cummings****, Georgia Tech**

(joint work with Sara Krehbiel, Yajun Mei, Rui Tuo, and Wanrong Zhang)

The change-point detection problem seeks to identify distributional changes at an unknown change-point k* in a stream of data. This problem appears in many important practical settings involving personal data, including biosurveillance, fault detection, finance, signal detection, and security systems. The field of differential privacy offers data analysis tools that provide powerful worst-case privacy guarantees. We study the statistical problem of change-point detection through the lens of differential privacy. We give private algorithms for both online and offline change-point detection, analyze these algorithms theoretically, and provide empirical validation of our results.

Arxiv version: https://arxiv.org/abs/1808.10056

**Differential Privacy and Adaptive Data Analysis**

**Uri Stemmer****, Ben-Gurion University**

One of the primary enemies of data-driven endeavors is *overfitting*, which occurs when algorithms begin to "memorize" data rather than "learning" to generalize from a trend. Traditional methods for avoiding overfitting require that the analysis to be performed on a given data set is selected independently of the data itself. In real life, however, the data is often reused adaptively across multiple stages of the analysis, or across different scientific studies.

Recent work by Dwork et al. initiated the formal study of this problem, and related it to the concept of differential privacy. Informally, they showed that differential privacy *guarantees *generalization, meaning that if the (adaptive) interaction with the dataset is done with differential privacy, then the interaction would not lead to overfitting. Resilience to adaptivity issues follows "for free" from composition theorems for differential privacy.

We will review some known results in this area and highlight the connection to differential privacy.

**Reinforcement Learning and Markov Decision Processes**

**Yishay Mansour****, Tel Aviv University**

Reinforcement learning studies a dynamic environment, where the learner's actions influence the state of the environment, which in turn influences the future rewards of the learner. The goal of the learner is to maximize the long-term reward. The common model for reinforcement learning is Markov Decision Processes (MDP)

I will give a tutorial on reinforcement learning and MDP. I will (try) and cover the following topics:

- Mathematical model of Markov Decision Processes (MDP)
- Planning in MDP: computing an optimal policy
- Learning in (unknown) MDP
- Large (exponential) state MDP
- Partially Observable MDP (time permitting)

This tutorial is intended to be interactive with the audience participation :-)

**Societal Concerns in Targeted Advertising **

**Aleksandra Korolova****, USC**

Targeted advertising finances a large fraction of today’s Internet. In this talk, I will give an overview of the powerful ad targeting capabilities provided by Facebook, and describe their implications for ability to violate privacy, engage in predatory behaviors and, intentionally or unintentionally, run discriminatory campaigns.

**A Tutorial on Fairness for Classification**

**Michael Kim****, Stanford University**

As algorithmic prediction systems have become more widespread, so too have concerns that these systems may be discriminatory against groups of people protected by laws and ethics. To address these concerns, there is a growing research effort to define formally what we mean when we say a procedure is ``discriminatory", and to provide algorithms that provably prevent such forms of discrimination.

This tutorial will give an overview of the progress towards establishing these frameworks for fairness in classification systems. We will highlight the main definitions of fairness including notions of group fairness and individual fairness, as well as recent notions interpolating between the two (multigroup fairness). We will compare these notions of fairness in terms of the protections they provide, their efficiency (both computational and information-theoretic), and the degree to which they do (or do not) constrain utility. We will conclude with a discussion of other recent works and open research questions, particularly focusing on connections between fairness and other SCADA topics (e.g. incentives, privacy, adversarial robustness).