# Rutgers/DIMACS Theory of Computing Seminar

Place: Core 301

Time: Wednesdays 11:00 am -- 12:00 Noon

Organizers: Pranjal Awasthi and Shubhangi Saraf

## Directions

For those arriving by train to New Brunswick station, the best way to get to the seminar room is by Rutgers bus. The directions are available by clicking here.

For those driving in, the best strategy is to pre-arrange to pick up a parking tag from one of the organizers upon arrival, and park in lot 64. For directions to Lot 64, click on this link.

## Schedule (Spring 2018)

Jaunary 24 Shay Moran Title: On the expressiveness of comparison queries.

Abstract:

Comparisons are a classical and well studied algorithmic tool that is used in a variety of contexts and applications.

I will present three examples that manifest the expressiveness of these queries in information theory, machine learning, and complexity theory.

20 (simple) questions. A basic combinatorial interpretation of Shannon's entropy function is via the "20 questions" game. This cooperative game is played by two players, Alice and Bob: Alice picks a distribution π over the numbers {1,…,n}, and announces it to Bob. She then chooses a number xaccording to π , and Bob attempts to identify x using as few Yes/No queries as possible, on average. An optimal strategy for the "20 questions" game is given by a Huffman code for π: Bob's questions reveal the codeword for x bit by bit. This strategy finds x using fewer than H(π)+1 questions on average. However, the questions asked by Bob could be arbitrary. We investigate the following question: Are there restricted sets of questions that match the performance of Huffman codes? We show that for every distribution π, Bob has a strategy that uses only questions of the form "x<c?" and "x=c?", and uncovers x using at most H(π)+1 questions on average, matching the performance of Huffman codes in this sense.

Active classification with comparison queries. This part concerns an extension of active learning in which the learning algorithm may ask the annotator to compare the distances of two examples from the boundary of their label-class. For example, in a recommendation system application (say for restaurants), the annotator may be asked whether she liked or disliked a specific restaurant (a label query); or which one of two restaurants did she like more (a comparison query). We focus on the class of half-spaces, and show that under natural assumptions, such as large margin or bounded bit-description of the input examples, it is possible to reveal all the labels of a sample of size n using approximately O(logn) queries.

Nearly optimal linear decision trees for k-SUM and related problems. We use the above active classification framework to construct near optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k, we construct linear decision trees that solve the k-SUM problem on n elements using O(n log n) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k-subsets; when viewed as linear queries, comparison queries are 2k-sparse and have only {−1,+1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with optimal number of queries, up to poly-logarithmic terms.

Based on joint works with Yuval Dagan, Yuval Filmus, Ariel Gabizon, Daniel Kane, Shachar Lovett, and Jiapeng Zhang.

January 31 Bhargav Narayanan Title: Monotonicity problems in graph theory

Abstract: I'll talk about a few different simple-looking problems in percolation and random walks that all ask for the same innocuous thing-- monotone behaviour. I'll also say a few words about what (little) I can do. I will assume no background knowledge in the area.

February 7 Mina Ghashami Title: Matrix Sketching over Streams

Abstract:

It is common to represent data in the form of a matrix, and a large set of data analytic tasks rely on obtaining a low-rank approximation of the data matrix. Such approximations can be computed using the Singular Value Decompositions (SVD). In many scenarios, however, data matrices are extremely large and computing their SVD exactly is infeasible. Efficient approximate solutions exist for distributed setting or when data access otherwise is limited. In the data streaming model, the data points are presented to the algorithm one by one in an arbitrary order. The algorithm is tasked with processing the stream in one pass while being severely restricted in its memory footprint. At the end of the stream, the algorithm must provide a sketch matrix which is a good approximation of the original data.

In this talk, we will discuss two recent matrix sketching methods over data streams.

February 14 Gil Cohen

Title: Explicit Binary Tree Codes with Polylogarithmic Size Alphabet

Abstract:

In this talk, we consider the problem of explicitly constructing a binary tree code with constant distance and constant alphabet size. We give an explicit binary tree code with constant distance and alphabet size polylog(n), where n is the depth of the tree. This is the first improvement over a two-decade-old construction that has an exponentially larger alphabet of size poly(n). For analyzing our construction, we prove a bound on the number of integral roots a real polynomial can have in terms of its sparsity with respect to a suitable basis--a result of independent interest.

Joint work with Bernhard Haeupler and Leonard Schulman.

February 21 Mrinal Kumar

Title : Some closure results for polynomial factorization

In a sequence of  extremely fundamental results in the 80's, Kaltofen showed that any factor of n-variate
polynomial with degree and arithmetic circuit size poly(n) has an arithmetic circuit of size
poly(n). In other words, the complexity class VP is closed under taking factors. A very basic
question in this context is to understand if other natural classes of multivariate polynomials, for
instance, arithmetic formulas, algebraic branching programs, bounded depth arithmetic circuits
or the class VNP, are closed under taking factors.

I will talk about the following two results, whose study is motivated by these questions.

1. VNP is closed under taking factors.
This confirms a conjecture of B{\"u}rgisser, and improves upon a recent result of Dutta, Saxena and
Sinhababu who showed a quasi-polynomial upper bound on the number of auxiliary variables, and
the complexity of the verifier circuit of factors of polynomials in VNP.

2. All factors of degree at most (log n)^a of polynomials with poly(n) size depth k circuits have poly(n)
size circuits of depth at most O(k + a).
This partially answers a  question of Shpilka-Yehudayoff and has applications to
hardness-randomness tradeoffs for bounded depth arithmetic circuits.

Based on joint work with Chi-Ning Chou and Noam Solomon.

February 28 Ilya Volkovich

Title: Complete Derandomization of Identity Testing of Read-Once Formulas

Abstract:

In this paper we study the identity testing problem of arithmetic read-once

formulas (ROF) and some related models. A read-once formula is formula (a

circuit whose underlying graph is a tree) in which the operations are {+, ×}

and such that every input variable labels at most one leaf. We obtain the

ﬁrst polynomial-time deterministic identity testing algorithm that operates

in the black-box setting for read-once formulas, as well as some other

related models. As an application, we obtain the ﬁrst polynomial-time

deterministic reconstruction algorithm for such formulas. Our results are

obtained by improving and extending the analysis of the algorithm of

Shpilka-Volkovich 09’.

Joint work with Daniel Minahan.

March 7 Sepideh Mahabadi (Cancelled due to storm)

Title: Set Cover in Sub-linear Time

Abstract:

Given access to a collection of $m$ sets over a ground set of $n$ elements, the classic set cover problem asks for the minimum number of sets in the collection that cover all the elements. We study this problem from the perspective of sub-linear algorithms, where the input can be accessed by querying either the ith element contained in a set, or the jth set containing an element. In this work, we present sub-linear algorithms for the set cover problem and show that they achieve almost tight query complexities.

More specifically, on the one hand, we show an algorithm that returns an $\alpha$-approximate cover using $\tilde O(m(n/k)^{1/(\alpha-1)} + nk)$ queries to the input, where $k$ denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of $k$, the required number of queries is $\tilde Omega(m(n/k)^{1/(2\alpha)})$. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require $\Omega(nk)$ queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter $k$. On the other hand, we show that this bound is not optimal for larger values of the parameter $k$, as there exists a $(1+\eps)$-approximation algorithm with $\tilde O(mn/k\eps^2)$ queries. We show that this bound is also essentially tight by establishing a lower bound of $\tilde Omega(mn/k)$.

This is a joint work with Piotr Indyk, Ronitt Rubinfeld, Ali Vakilian, and Anak Yodpinyanee.

March 14 SPRING BREAK

March 21 Dean Doron (Cancelled due to storm) Title: Explicit two-source extractors for near-logarithmic min-entropy

Abstract:

In this talk, we show an explicit construction of extractors for two independent sources of near-logarithmic min-entropy.

Previous constructions required either polylog(n) min-entropy or more than two sources.

The result extends the breakthrough result of Chattopadhyay and Zuckerman and also uses non-malleable extractors.

The main new ingredient is a somewhere-random condenser with a small entropy gap, used as a sampler.

Our construction can be seen as an efficient reduction to constructing non-malleable extractors, so using recent constructions of non-malleable extractors (by Cohen and Li) - we

will see how to obtain explicit two-source extractor for O(log n loglog n) min-entropy and constant error.

March 28 Sumegha Garg

Title: Hitting Sets with Near-Optimal Error for Read-Once Branching Programs

Abstract: Nisan [Nis92] constructed a pseudorandom generator for length n, width n read-once branching programs (ROBPs) with error ε and seed length O(log^2(n) + log(n) · log(1/ε)). A major goal in complexity theory is to reduce the seed length, hopefully, to the optimal O(log(n) + log(1/ε)), or to construct improved hitting sets, as these would yield stronger derandomization of BPL and RL, respectively.

In this talk, we make the first improvement by constructing a hitting set with seed length ~O􏰆(log^2(n) + log(1/ε)). That is, we decouple ε and n, and obtain near-optimal dependence on the former. The regime of parameters in which our construction strictly improves upon prior works, namely, log(1/ε) ≫ log n, is well- motivated by the work of Saks and Zhou [SZ99] who use pseudorandom generators with error ε = 2^{−log^2(n)} in their proof for BPL ⊆ L^{3/2}.

Joint work with Mark Braverman and Gil Cohen.

Title: General Strong Polarization

A martingale is a sequence of random variables that maintain their future expected value conditioned on the past.  A $[0,1]$-bounded martingale is said to polarize if it converges in the limit to either $0$ or $1$ with probability $1$.  A martingale is said to polarize strongly, if in $t$ steps it is sub-exponentially close to its limit with all but exponentially small probability. In 2008, Arikan built a powerful class of error-correcting codes called Polar codes. The essence of his theory associates a martingale with every invertible square matrix over a field (and a channel) and showed that polarization of the martingale leads to a construction of codes that converge to Shannon capacity. In 2013, Guruswami and Xia,  and independently Hassani et al. showed that strong polarization of the Arikan martingale leads to codes that converge to Shannon capacity at finite block lengths, specifically at lengths that are inverse polynomial in the gap to capacity, thereby resolving a major mathematical challenge associated with the attainment of Shannon capacity.

We show that a simple necessary condition for an invertible matrix to polarize over any non-trivial channel is also sufficient for strong polarization over all symmetric channels over all prime fields.  Previously the only matrix which was known to polarize strongly was the $2\times 2$ Hadamard matrix. In addition to the generality of our result, it also leads to arguably simpler proofs. The essence of our proof is a local definition'' of polarization which only restricts the evolution of the martingale in a single step, and a general theorem showing the local polarization suffices for strong polarization.

In this talk I will introduce polarization and polar codes and, time permitting, present a full proof of our main theorem. No prior background on polar codes will be assumed.

Based on joint work with Jaroslaw Blasiok, Venkatesan Guruswami, Preetum Nakkiran and Atri Rudra.

Title:

K-means clustering with optimization

Abstract::

K-means clustering aims to partition a set of n points into k clusters in such a way that each observation belongs to the cluster with the nearest mean, and such that the sum of squared distances from each point to its nearest mean is minimal. In the worst case, this is a hard optimization problem, requiring an exhaustive search over all possible partitions of the data into k clusters in order to find the optimal clustering. At the same time, fast heuristic algorithms for k-means are widely used for data science applications, despite only being guaranteed to converge to local minimizers of the k-means objective.

In this talk, we consider a semidefinite programming relaxation of the k-means optimization problem. We discuss two regimes where the SDP provides an algorithm with improved clustering guarantees compared to previous results in the literature: (a) for points drawn from isotropic distributions supported in separated balls, the SDP recovers the globally optimal k-means clustering under mild separation conditions; (b) for points drawn from mixtures of distributions with bounded variance, the SDP solution can be rounded to a clustering which is guaranteed to classify all but a small fraction of the points correctly.

An interesting feature about the theoretical tools developed for proving (approximate) optimality of partitions under models (a) and (b) is that they can also be used to a posteriori certify (approximate) optimality of k-means clustering solutions of real data, no model required.

April 18 Karan Singh

Title: Taking Control by Convex Optimization

Abstract: Linear dynamical systems (LDSs) are a class of time-series models widely used in robotics, finance, engineering, and meteorology. In it's general form (when state transition dynamics are unknown), learning LDS is a classic non-convex problem, typically tackled with heuristics like gradient descent ("backpropagation through time") or the EM algorithm.

I will present our new "spectral filtering" approach to the identification and control of discrete-time general LDSs with multi-dimensional inputs, outputs, and a latent state. This approach yields a simple, efficient, and practical algorithm for low-regret prediction (i.e. asymptotically vanishing MSE).

The talk will cover a series of results, which are joint work with Elad Hazan, Cyril Zhang, Sanjeev Arora, Holden Lee, and Yi Zhang.

April 25 Anand Sarwate

Title: Between Shannon and Hamming: the impact of delay on communication capacity

The information theory community has traditionally studied two different models for communication. The Shannon-theoretic model treats the channel’s impact as random, so codes must correct most error patterns of a given weight; this as an average-case analysis. The coding-theoretic (Hamming-theoretic?) model treats the channel as adversarial, so codes must correct all error patterns of a given weight; this is a worst-case analysis. Between the two lie several different channel models which can be usefully described in the language of arbitrarily varying channels (AVCs).

In an AVC, the communication channel has two inputs at each time, one for the encoder and one for a state that is controlled by an adversary who wishes to foil the communication. The difference between average- and worst-case can be captured by changing the information available to the adversary. In this talk I will describe this model and recent results on how the adversary can and cannot benefit from partial knowledge of the transmitted codeword. In particular, I will discuss how knowledge of delayed or future encoder inputs affect the channel capacity in sometimes surprising ways.

Anand D. Sarwate is an Assistant Professor in the ECE Department at Rutgers. He currently works on information theory, machine learning, high-dimensional statistics, and signal processing with applications in distributed systems, privacy and security, and biomedical research.

Given access to a collection of $m$ sets over a ground set of $n$ elements, the classic set cover problem asks for the minimum number of sets in the collection that cover all the elements. We study this problem from the perspective of sub-linear algorithms, where the input can be accessed by querying either the ith element contained in a set, or the jth set containing an element. In this work, we present sub-linear algorithms for the set cover problem and show that they achieve almost tight query complexities.
More specifically, on the one hand, we show an algorithm that returns an $\alpha$-approximate cover using $\tilde O(m(n/k)^{1/(\alpha-1)} + nk)$ queries to the input, where $k$ denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of $k$, the required number of queries is $\tilde Omega(m(n/k)^{1/(2\alpha)})$. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require $\Omega(nk)$ queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter $k$. On the other hand, we show that this bound is not optimal for larger values of the parameter $k$, as there exists a $(1+\eps)$-approximation algorithm with $\tilde O(mn/k\eps^2)$ queries. We show that this bound is also essentially tight by establishing a lower bound of $\tilde Omega(mn/k)$.