# Rutgers/DIMACS Theory of Computing Seminar

Place: Core 301

Time: Wednesdays 11:00 am -- 12:00 Noon

Organizers: Pranjal Awasthi and Shubhangi Saraf

## Directions

For those arriving by train to New Brunswick station, the best way to get to the seminar room is by Rutgers bus. The directions are available by clicking here.

For those driving in, the best strategy is to pre-arrange to pick up a parking tag from one of the organizers upon arrival, and park in lot 64. For directions to Lot 64, click on this link.

## Schedule (Fall 2017)

September 13 Speaker Title

Pranjal Awasthi Efficient PAC Learning from the Crowd

Abstract:

In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. For example, in most cases there are no known computationally efficient learning algorithms that are robust to the high level of noise that exists in crowdsourced data. Furthermore, efforts to eliminate noise through majority voting often require a large number of queries per example.

In this talk I will describe a model of crowdsourcing and algorithms that interleave the process of labeling and learning to get good classifiers with minimal overhead. In particular, I will show that any function class learnable in the traditional PAC model is also learnable in the crowdsourced model where only a noticeable fraction of the labelers are correct. Furthermore, each labeler is required to label only a constant number of examples and each example is only queried a constant number of times on average.

Joint work with Avrim Blum, Nika Haghtalab and Yishay Mansour.

September 20 Matthäus Kleindessner Title

Machine Learning in a Setting of Ordinal Distance Information

Abstract:

In a typical machine learning scenario we are given numerical dissimilarity values between objects (or feature representations of objects, from which such dissimilarity values can readily be computed). In the relaxed setting of ordinal distance information we are only provided with binary answers to comparisons of dissimilarity values such as d(A,B)<d(A,C) instead. This setting has attracted interest in recent years, mainly due to the possibility of simple collection of such data by means of crowdsourcing.

In my talk, I want to present two of my contributions to this field. First, I will talk about a result that states the asymptotic uniqueness of ordinal embeddings. Ordinal embedding, up to now, is the standard approach to machine learning in a setting of ordinal distance information. The idea is to map the objects of a data set to points in a Euclidean space such that the points preserve the given ordinal distance information as well as possible (with respect to the Euclidean interpoint distances). Second, I will introduce data-dependent kernel functions that can be evaluated given only ordinal distance information about a data set. They provide a generic alternative to the ordinal embedding approach and avoid some of its drawbacks. For both works, I want to address a number of open questions.

September 27 Noah Stephens-Davidowitz Title

On the Quantitative Hardness of CVP

**Abstract: **For odd integers p >= 1 (and p = \infty), we show that the Closest Vector Problem in the \ell_p norm (CVP_p) over rank n lattices cannot be solved in 2^{(1-\eps) n} time for any constant \eps > 0 unless the Strong Exponential Time Hypothesis (SETH) fails. We then extend this result to ``almost all'' values of p \geq 1, not including the even integers. This comes tantalizingly close to settling the quantitative time complexity of the important special case of CVP_2 (i.e., CVP in the Euclidean norm), for which a 2^{n +o(n)}-time algorithm is known.

We also show a similar SETH-hardness result for SVP_\infty; hardness of approximating CVP_p to within some constant factor under the so-called Gap-ETH assumption; and other hardness results for CVP_p and CVPP_p for any 1 <= p < \infty under different assumptions.

October 4 (Location: SERC 111) Christos Papadimitriou* * Title

A computer scientist thinks about the Brain

**Abstract**: When key problems in science are revisited from the computational viewpoint, occasionally unexpected progress results. There is a reason for this: implicit algorithmic processes are present in the great objects of scientific inquiry - the cell, the brain, the market - as well as in the models developed by scientists over the centuries for studying them. This unexpected power of computational ideas, sometimes called "the algorithmic lens", has manifested itself in these past few decades in virtually all sciences: natural, life, or social. For example, in statistical physics through the study of phase transitions in terms of the convergence of Markov chain-Monte Carlo algorithms, and in quantum mechanics through quantum computing. This talk will focus on three other instances. Almost a decade ago, ideas and methodologies from computational complexity revealed a subtle conceptual flaw in the solution concept of Nash equilibrium, which lies at the foundations of modern economic thought. In the study of evolution, a new understanding of century-old questions has been achieved through surprisingly algorithmic ideas. Finally, current work in theoretical neuroscience suggests that the algorithmic point of view may be useful in the central scientific question of our era, namely understanding how behavior and cognition emerge from the structure and activity of neurons and synapses.

October 11 (Location Core 431) Ilya Razenshteyn Title:

Practical Data-Dependent Metric Compression with Provable Guarantees

Abstract:

How well can one compress a dataset of points from a

high-dimensional space while preserving pairwise distances? Indyk and

Wagner have recently obtained almost optimal bounds for this problem,

but their construction (based on hierarchical clustering) is not

practical. In this talk, I will show a new practical, quadtree-based

compression scheme, whose provable performance essentially matches that

of the result of Indyk and Wagner.

In additional to the theoretical results, we will see experimental

comparison of the new scheme and Product Quantization (PQ)--one of the

most popular heuristics for distance-preserving compression--on several

datasets. Unlike PQ and other heuristics that rely on the clusterability

of the dataset, the new algorithm ends up being more robust.

The talk is based on a joint work with Piotr Indyk and Tal Wagner.

October 18 Steven Wu Title:

A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

Abstract: Bandit learning models the common setting when the decisions of an algorithm feed back into its training data, and it cannot observe counter-factuals. These settings include criminal recidivism prediction (would an inmate have committed another crime had he been released?), lending (would an applicant denied a loan have paid it back?), and drug trials (how would a patient have responded to a different drug?) The main tension in bandit learning is balancing exploration with exploitation. However, exploration, which explicitly sacrifices welfare today in exchange for potential future benefit, can be distasteful when decisions pertain to individuals. For example, it might be considered unethical to give drug A to a patient while knowing drug B often cures their illness merely to learn about drug A’s efficacy. In other settings (like predictive policing), a lack of exploration has been blamed for perpetuating unfairness. This motivates studying the performance of the greedy algorithm, which only exploits and never explores. Although this is not a no-regret algorithm, we show that when adversarially selected contexts are perturbed by a small amount of Gaussian noise, it recovers strong no-regret bounds.

October 25 Eshan Chattopadhyay Title: Towards Optimal Randomness Extractors and Ramsey Graphs

Abstract:

I will survey some of the recent exciting progress on

explicit constructions of randomness extractors for independent

sources. Many of the new constructions rely on explicit constructions

of newly introduced pseudorandom primitives, and there remains scope

of finding better explicit constructions of these primitives. I will

also discuss some possible approaches for constructing optimal Ramsey

graphs and Extractors.

November 1 Mark Zhandry Title: The MMap Strikes Back: Conquering Cryptography using Weak Multilinear Maps

Abstract:

Multilinear maps are known to have numerous applications to cryptography. Unfortunately, current proposals for multilinear maps suffer from major security vulnerabilities due to a class of attacks known as “zeroizing” attacks. These attacks have broken many of the desired applications. In this talk, I will outline how these attacks work, and then explain how to use weaked multilinear maps securely for applications, neutralizing the threat of zeroizing attacks. In particular, we show how to build indistinguishability obfuscation, which can be used to build most of cryptography. We also give a “direct” construction of multiparty non-interactive key exchange. We formally prove that our constructions are secure against zeroizing attacks.

* Based on joint work with Sanjam Garg, Fermi Ma, Eric Miles, Pratyay Mukherjee, Amit Sahai, and Akshayaram Srinivasan

November 8 (Location Core 431) Omri Weinstein Title: Crossing the Logarithmic Barrier for Dynamic Boolean Data Structure Lower Bounds

Abstract:

We prove the first super-logarithmic lower bounds on the cell-probe complexity

of dynamic *boolean* (a.k.a. decision) data structure problems, a long-standing

milestone in data structure lower bounds.

We introduce a new technique and use it to prove a (log^{1.5} n) lower bound on the

operational time of a wide range of boolean data structure problems, most notably,

on the query time of dynamic range counting *over F_2* ([Patrascu07]). Proving an

\omega(log n) lower bound for this problem was explicitly posed as one of five important

open problems in the late Mihai Patrascu's obituary [Thorup13]. This also implies the

first \omega(log n) lower bound for the classical 2D range counting problem, one of the

most fundamental data structure problems in computational geometry and spatial databases.

We derive similar lower bounds for boolean versions of dynamic "polynomial evaluation"

and "2D rectangle stabbing", and for the (non-boolean) problems of "range selection"

and "range median".

Our technical centerpiece is a new way of ``weakly" simulating dynamic data structures

using efficient *one-way* communication protocols with small advantage over random

guessing. This simulation involves a surprising excursion to low-degree (Chebychev)

polynomials which may be of independent interest, and offers an entirely new algorithmic

angle on the "cell sampling" method of Panigrahy et al. [PTW10].

Joint work with Kasper Green Larsen and Huacheng Yu.

November 15 Aditya Potukuchi** ** Title: Syndrome decoding of Reed-Muller codes and tensor decomposition over finite fields

Abstract:

In this talk, we will look at decoding Reed-Muller codes beyond their minimum distance when the errors are random (i.e., in the binary symmetric channel). A recent beautiful result of Saptharishi, Shpilka and Volk showed that for binary Reed-Muller codes of length n and degree n - O(1), one can correct polylog(n) random errors in poly(n) time (which is well beyond the worst-case error tolerance of O(1)). We will see two efficient algorithms as well as a different proof of the same result, where the decoding is done given the polylog(n)-bit long syndrome vector of the corrupted codeword:

- The first is via. a connection to the well-studied `tensor decomposition problem'.
- The second via. a reduction to finding all common roots of a space of low degree polynomials, which is also of independent interest.

Joint work with Swastik Kopparty

November 29 Toniann Pitassi Title

December 6 Greg Bodwin Title: The Distance Oracle Hierarchy

Abstract:

A lot of well-studied problems in CS Theory are about making “sketches” of graphs that occupy much less space than the graph itself, but where the shortest path distances of the graph can still be approximately recovered from the sketch. For example, in the literature on Spanners, we seek a sparse subgraph whose distance metric approximates that of the original graph. In Emulator literature, we relax the requirement that the approximating graph is a subgraph. Most generally, in Distance Oracles, the sketch can be an arbitrary data structure, so long as it can approximately answer queries about the pairwise distance between nodes in the original graph.

Research on these objects typically focuses on optimizing the worst-case tradeoff between the quality of the approximation and the amount of space that the sketch occupies. In this talk, we will survey a recent leap in understanding about this tradeoff, overturning the conventional wisdom on the problem. Specifically, the tradeoff is not smooth, but rather it follows a new discrete hierarchy in which the quality of the approximation that can be obtained jumps considerably at certain predictable thresholds.

Includes joint work with Amir Abboud and Seth Pettie.