Jessica Shi, [June 13, 2022, 1pm EST], Bridging Theory and Practice in Parallel Subgraph Computations
Abstract: Discovering dense substructures in graphs is a fundamental topic in graph mining, and has been studied across many areas including computational biology, spam and fraud detection, and large-scale network analysis. In particular, k-cliques are the core building blocks of dense substructures, and finding k-cliques in a graph is a classic problem with a long history of study in both theory and practice. In this work, we present new parallel algorithms for k-clique counting and listing, which we then use in new parallel subgraph decomposition algorithms that capture higher-order structures in networks. Specifically, we design new parallel algorithms for the k-clique densest subgraph problem and the nucleus decomposition problem, which generalizes the notions of k-cores and k-trusses.
We show that our parallel k-clique counting and listing algorithm has polylogarithmic span (parallel time) and is work-efficient (matching the work of the best sequential algorithm) for sparse graphs. We also design two new parallel work-efficient algorithms for approximating the k-clique densest subgraph, the first of which is a 1/k-approximation and the second of which is a 1/(k(1+epsilon))-approximation with polylogarithmic span. Our first algorithm does not have polylogarithmic span, but we prove that the problem it solves is P-complete, suggesting that this limitation is unlikely to be avoidable. Furthermore, we present a novel work-efficient parallel algorithm with low span for the nucleus decomposition problem, which significantly improves upon the only existing parallel nucleus decomposition algorithm (Sariyuce et al., PVLDB 2018). In addition to our theoretical results, we implement these algorithms and propose several new practical optimizations. We show that our implementations are fast in practice and outperform the existing state-of-the-art implementations.
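As a toy sequential illustration of the ordering-based idea that such k-clique algorithms build on (a sketch only, not the parallel algorithm from the talk; the degree-based ordering here is my own cheap stand-in for a degeneracy ordering): direct each edge toward the later vertex in the order, then count cliques by recursively intersecting forward neighborhoods, so each clique is enumerated exactly once.

```python
def count_k_cliques(n, edges, k):
    """Count k-cliques by directing edges along a vertex order and
    recursively intersecting forward (higher-ranked) neighborhoods."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # cheap proxy for a degeneracy ordering: sort vertices by degree
    order = sorted(range(n), key=lambda v: (len(adj[v]), v))
    rank = {v: i for i, v in enumerate(order)}
    out = [{w for w in adj[v] if rank[w] > rank[v]} for v in range(n)]

    def rec(cand, depth):
        # depth vertices have been chosen so far; cand holds the vertices
        # adjacent (in the forward direction) to all of them
        if depth == k:
            return 1
        return sum(rec(cand & out[v], depth + 1) for v in cand)

    return rec(set(range(n)), 0)

# K4 contains four triangles and one 4-clique:
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
assert count_k_cliques(4, k4, 3) == 4
assert count_k_cliques(4, k4, 4) == 1
```

Because every clique is visited only in the unique rank-increasing order of its vertices, no clique is double-counted.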
Based on joint work with Laxman Dhulipala and Julian Shun.
https://arxiv.org/abs/2002.10047 (ACDA 2021)
https://arxiv.org/abs/2111.10980 (VLDB 2022)
Santoshini Velusamy, [June 6, 2022, 2pm EST], Approximating CSPs in the streaming setting
Abstract: Constraint satisfaction problems (CSPs) are ubiquitous in computer science and encompass optimization problems such as Max-CUT, Max-DICUT, Max-k-SAT, Max-q-Coloring, Unique Games, etc. Until recently, Max-CUT and a few other Max-2CSPs were the only CSPs studied in the streaming setting. In this talk, I will describe our recent results giving new upper and lower bounds on the approximability of every CSP in the single-pass streaming setting. In particular, we prove the following theorem: for every CSP, there is a constant alpha such that the CSP is alpha-approximable in polylogarithmic space by a linear sketching algorithm, while any sketching algorithm that beats alpha-approximation requires polynomial space. We also prove lower bounds against general streaming algorithms for a broad subclass of CSPs, significantly generalizing previous works. I will conclude with some interesting open problems in this direction. No background in streaming, sketching, or constraint satisfaction problems will be needed to enjoy this talk!
Based on joint works with Chi-Ning Chou, Alexander Golovnev, Madhu Sudan, and Ameya Velingker.
Nicole Wein, [May 23, 2022, 1pm EST], Online List Labeling: Breaking the log^2 n Barrier
Abstract: The online list labeling problem is a basic primitive in data structures. The goal is to store a dynamically-changing set of n items in an array of m slots, while keeping the elements in sorted order. To do so, some items may need to be moved over time, and the goal is to minimize the number of items moved per insertion/deletion. When m = Cn for some constant C>1, an upper bound of O(log^2 n) items moved per insertion/deletion has been known since 1981. There is a matching lower bound for deterministic algorithms, but for randomized algorithms, the best known lower bound is Omega(log n), leaving a gap between upper and lower bounds. We improve the upper bound, providing a randomized data structure with expected O(log^{3/2} n) items moved per insertion/deletion.
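To make the cost model concrete, here is a naive baseline (my own sketch, not the data structure from the talk): keep the items sorted in an array with gaps and, on each insertion, shift items toward the nearest gap, counting one move per shifted item. This baseline can incur Theta(n) moves on a single insertion, which is exactly what the classical O(log^2 n) structure and the new randomized O(log^{3/2} n) structure avoid. The code assumes at least one empty slot always remains (m > n).

```python
def insert_sorted(slots, x):
    """Insert x into the sorted array-with-gaps `slots` (None = empty),
    shifting items to the nearest gap; returns the number of items moved."""
    n = len(slots)
    # insertion point: first occupied slot holding a value greater than x
    j = next((i for i, v in enumerate(slots) if v is not None and v > x), n)
    gap = next((i for i in range(j, n) if slots[i] is None), None)
    if gap is not None:
        for i in range(gap, j, -1):      # shift slots[j..gap-1] one step right
            slots[i] = slots[i - 1]
        slots[j] = x
        return gap - j
    gap = max(i for i in range(j) if slots[i] is None)
    for i in range(gap, j - 1):          # shift slots[gap+1..j-1] one step left
        slots[i] = slots[i + 1]
    slots[j - 1] = x
    return j - 1 - gap

slots = [None] * 6
moves = sum(insert_sorted(slots, x) for x in (10, 5, 7))
assert [v for v in slots if v is not None] == [5, 7, 10]
```

An adversary who keeps inserting into the same crowded region forces this baseline to shift almost the whole array; the clever part of list-labeling structures is deciding how far to spread items when rebalancing.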
Joint work with Michael Bender, Alexander Conway, Martin Farach-Colton, Hanna Komlos, and William Kuszmaul
Goran Žužić, [May 9, 2022, 1pm EST], Universal optimality in distributed computing and its connections to diverse areas of theoretical computer science
Abstract: The modern computation and information processing systems shaping our world have become massively distributed, and a fundamental understanding of distributed algorithmics has never been more important. At the same time, despite 40 years of intense study, we often do not have an adequate understanding of the fundamental barriers that rule out the existence of ultra-fast distributed algorithms. This is true even for the most well-studied problems in computer science, including shortest paths, minimum spanning trees, minimum cuts, and many other well-known tasks.
In this talk, I will present a high-level overview of a sequence of papers that give a near-complete answer to the above question. Its culmination is the following pie-in-the-sky result called *universal optimality*: for all of the tasks mentioned, there exists a single distributed algorithm that, when run on any communication network G, is provably competitive with the fastest algorithm on G.
The pursuit of universal optimality has led to the development of many new connections between distributed computing and other seemingly unrelated areas of theoretical computer science. Curiously, these connections have been mutually-beneficial and have already led to many breakthroughs not only in distributed computing, but also in the theory of metric embedding, information theory, and oblivious packet routing. I will briefly explore these connections.
Nithin Varma, [May 2, 2022, 1pm EST], Improved sublinear algorithms for testing permutation freeness
Abstract: Given a permutation pi of length k, an array A is pi-free if there are no k array values that, when considered from left to right, have the same relative order as the permutation. For example, an array A is (2,1)-free if there are no two indices i < j such that A[i] > A[j], and it is (1,3,2)-free if there are no three indices i < j < k such that A[j] > A[k] > A[i]. In particular, the set of (2,1)-free arrays is simply the set of all sorted arrays.
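The definition can be checked directly by brute force for small inputs (an O(n^k) sketch of the definition, illustrative only and nothing like the sublinear tester from the talk; it assumes distinct array values):

```python
from itertools import combinations

def is_pi_free(A, pi):
    """True iff no k positions of A (left to right) realize the relative
    order of the permutation pi. Assumes the values of A are distinct."""
    k = len(pi)
    for idx in combinations(range(len(A)), k):
        vals = [A[i] for i in idx]
        # rank pattern of vals: smallest value gets rank 1, etc.
        ranks = [sorted(vals).index(v) + 1 for v in vals]
        if ranks == list(pi):
            return False
    return True

assert is_pi_free([1, 2, 3, 5], (2, 1))        # sorted, hence (2,1)-free
assert not is_pi_free([3, 1, 2], (2, 1))       # 3 before 1 realizes (2,1)
assert not is_pi_free([1, 3, 2], (1, 3, 2))    # realizes (1,3,2) itself
```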
The problem of testing pi-freeness is to distinguish arrays that are pi-free from arrays that need to be modified in at least a constant fraction of their values to become pi-free. This problem was first studied systematically by Newman, Rabinovich, Rajendraprasad and Sohler (Random Structures and Algorithms, 2019), who proved that for all permutations pi of length at most 3, one can test for pi-freeness with polylog(n) queries, where n is the array length. For permutations of length k > 3, they described a simple testing algorithm that makes O(n^{1-1/k}) queries. Most of the follow-up work has focused on improving the query complexity of testing pi-freeness for monotone pi.
In this talk, I will present a recent algorithm with query complexity n^{o(1)} that tests pi-freeness for arbitrary permutations of constant length, which significantly improves the state of the art. I will also give an overview of the analysis, which involves several combinatorial ideas.
Joint work with Ilan Newman.
Amartya Shankha Biswas, [April 25, 2022, 1pm EST], Distributed Graph Spanners: Trade-offs in Stretch vs Runtime
Abstract: A spanner is a sparsified subgraph in which each distance increases by a factor of at most k (the stretch). It is always possible to sparsify down to n^(1+1/k) edges and obtain a spanner of stretch k, and this is conjectured to be tight. I'll present a new algorithm for spanner construction that generalizes the classic algorithm of Baswana-Sen and obtains a tradeoff between parallel depth/number of rounds and the stretch of the spanner. One concrete result is that we can improve the number of rounds from k to O(log^2 k) and obtain a spanner with stretch k^(1+o(1)).
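For intuition about what a spanner is, here is the classic sequential greedy construction (Althöfer et al.), which is a centralized baseline and not the distributed Baswana-Sen-style algorithm from the talk: keep an edge only if its endpoints are currently more than `stretch` hops apart in the spanner built so far. With stretch 2k-1 on unweighted graphs this keeps O(n^{1+1/k}) edges.

```python
from collections import deque

def bfs_dist(adj, s, t, limit):
    """Hop distance from s to t in adjacency lists `adj`, capped at
    `limit`; returns limit + 1 if t is farther than that."""
    if s == t:
        return 0
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        if dist[u] >= limit:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                if w == t:
                    return dist[w]
                q.append(w)
    return limit + 1

def greedy_spanner(n, edges, stretch):
    """Keep edge (u,v) iff u and v are > `stretch` hops apart so far."""
    adj = [[] for _ in range(n)]
    kept = []
    for u, v in edges:
        if bfs_dist(adj, u, v, stretch) > stretch:
            adj[u].append(v)
            adj[v].append(u)
            kept.append((u, v))
    return kept

# on K4 with stretch 3, the first three (star) edges already span everything:
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
assert len(greedy_spanner(4, k4, 3)) == 3
```

Every discarded edge already has a path of length at most `stretch` in the spanner, which is exactly why distances blow up by at most that factor.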
Tom Gur, [April 11, 2022, 1pm EST], Worst-Case to Average-Case Reductions via Additive Combinatorics
Abstract: We present a new framework for designing worst-case to average-case reductions. For a large class of problems, it provides an explicit transformation of algorithms running in time T that are only correct on a small (subconstant) fraction of their inputs into algorithms running in time O(T \log T) that are correct *on all* inputs.
Using our framework, we obtain such efficient worst-case to average-case reductions for fundamental problems in a variety of computational models; namely, algorithms for matrix multiplication, streaming algorithms for the online matrix-vector multiplication problem, and static data structures for all linear problems as well as for the multivariate polynomial evaluation problem.
Our techniques crucially rely on additive combinatorics. In particular, we show a local correction lemma that relies on a new probabilistic version of the quasi-polynomial Bogolyubov-Ruzsa lemma.
Joint work with Vahid Asadi, Alexander Golovnev, and Igor Shinkar (to be presented at STOC ’22).
David Doty, [December 6, 2021, 1pm EST], Crystals that think about how they’re growing
Abstract: Biology offers inspiring examples of molecules that can store and process information to construct and control the sophisticated nanoscale devices that regulate the machinery of life. Yet biology offers few effective design principles for manufacturing such molecules ourselves. Much of synthetic biology relies on "alien technology": evolved proteins that, had evolution not furnished them, we would not know how to design ourselves.
DNA nanotechnology offers a different approach, enabling design of smart molecular systems from first principles. Theory that combines mathematical tiling and statistical-mechanical models of crystallization has shown that algorithmic behavior can be embedded within molecular self-assembly processes. Previous results had experimentally demonstrated algorithmic "tile" self-assembly with up to 22 tile types, creating algorithmically generated patterns such as Sierpinski triangles and binary counters. Despite that success, many information technologies exhibit a complexity threshold -- such as the minimum transistor count needed for a general-purpose computer -- beyond which there is a qualitative increase in the power of a reprogrammable system, and it has not been clear whether the biophysics of DNA self-assembly would allow that threshold to be exceeded.
Here we report the design and experimental validation of a DNA tile set containing 355 single-stranded tiles, reprogrammable by tile selection to implement a wide variety of 6-bit algorithms, including copying, sorting, recognizing palindromes and multiples of 3, random walking, obtaining an unbiased choice from a biased random source, electing a leader, simulating Turing-universal cellular automata, generating deterministic and randomized patterns, and serving as a period 63 counter. The system is quite reliable: averaged across the 21 implemented circuits, the per-tile error rate is less than 1 in 3000.
paper: https://www.nature.com/articles/s41586-019-1014-9
additional links related to paper: http://web.cs.ucdavis.edu/~doty/papers/#drmaurdsa
Jakub Tetek, [November 29, 2021, 1pm EST], Edge Sampling and Graph Parameter Estimation via Vertex Neighborhood Accesses
Abstract:
In this talk, I will consider three problems from the area of sublinear-time algorithms: edge sampling, edge counting, and triangle counting. Part of the contribution of our work is that we consider three different settings, differing in the way in which one may access the neighborhood of a given vertex.
Previous work has considered indexed neighbor access, where a query returns the i-th neighbor of a given vertex. Full neighborhood access, where a query returns the whole neighborhood at unit cost, has recently been considered in the applied community. Between these two, we propose hash-ordered neighbor access, inspired by coordinated sampling: we have a global hash function and can access neighbors in order of their hash values, paying unit cost for each accessed neighbor.
We get both new algorithms and lower bounds in these new settings, as well as in some more standard and previously considered settings. In this talk, I will focus on a few of them and will describe the techniques that lead to these developments. I will also focus on the motivation behind our model choice.
Based on joint work with Mikkel Thorup
Sebastian Brandt, [November 22, 2021, 1pm EST], Proving Locality Lower Bounds
Abstract: Traditionally, proving substantial time complexity lower bounds in the LOCAL model of distributed computing has been a challenging and little-understood task. Recent years, however, have seen the emergence of a conceptually simple, yet powerful technique, called "round elimination", that provides a blueprint for proving such lower bounds and has been responsible for a number of lower bounds for problems central to the LOCAL model, such as the Lovász Local Lemma, Maximal Matching, or Maximal Independent Set. While the technique itself is truly local, the results obtained are also quite relevant for not-so-local models: for instance, some of the bounds have already been lifted to models such as Local Computation Algorithms and Massively Parallel Computation.
The fundamental idea behind round elimination is that the existence of a t-round algorithm for some given problem P can be used to infer the existence of a (t-1)-round algorithm for either the same or a different problem P_1. By applying this idea iteratively, obtaining problems P_1, P_2, ..., the round complexity of P can then be related to the question of which of the P_i is the first problem in the sequence that can be solved in 0 rounds. As we will see, the above sequence of problems can be obtained in an automated way, opening new and exciting possibilities for obtaining improved complexity bounds.
In this talk, we will take a close look at the round elimination technique: its inner workings, its potential, and its limitations.
Sayantan Sen, [November 15, 2021, 1pm EST], Interplay between Graph Isomorphism and Earth Mover’s Distance in the Query and Communication Worlds
Abstract: The graph isomorphism distance between two graphs $G_u$ and $G_k$ is the fraction of entries in the adjacency matrix that have to be changed to make $G_u$ isomorphic to $G_k$. We study the problem of estimating, up to a constant additive factor, the graph isomorphism distance between two graphs in the query model. In other words, if $G_k$ is a known graph and $G_u$ is an unknown graph whose adjacency matrix has to be accessed by querying the entries, what is the query complexity of testing whether the graph isomorphism distance between $G_u$ and $G_k$ is less than $\gamma_1$ or more than $\gamma_2$, where $\gamma_1$ and $\gamma_2$ are two constants with $0\leq \gamma_1 < \gamma_2 \leq 1$? This is also known as tolerant property testing of graph isomorphism in the dense graph model. The non-tolerant version (where $\gamma_1$ is $0$) has been studied by Fischer and Matsliah (SICOMP'08).
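The distance being estimated can be computed exactly by brute force on tiny graphs (an n!-time sketch of the definition only, nowhere near the query-efficient estimation the talk is about):

```python
from itertools import permutations

def iso_distance(A, B):
    """Graph isomorphism distance between n-vertex graphs given as 0/1
    adjacency matrices: the minimum, over relabelings pi of B, of the
    fraction of adjacency-matrix entries on which A and pi(B) differ."""
    n = len(A)
    best = n * n
    for pi in permutations(range(n)):
        diff = sum(A[i][j] != B[pi[i]][pi[j]]
                   for i in range(n) for j in range(n))
        best = min(best, diff)
    return best / (n * n)

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
assert abs(iso_distance(triangle, empty) - 2 / 3) < 1e-9
```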
In this paper, we prove an interesting connection between tolerant graph isomorphism testing and tolerant testing of the well-studied Earth Mover's Distance (EMD). We prove that deciding tolerant graph isomorphism is equivalent to deciding tolerant EMD testing between multi-sets in the query setting. Moreover, the reductions between tolerant graph isomorphism and tolerant EMD testing (in the query setting) can also be extended directly to the two-party Alice-Bob communication model (where Alice and Bob have one graph each and want to solve the tolerant graph isomorphism problem by communicating bits), and possibly to other sublinear models as well.
Tolerant EMD testing between two probability distributions is equivalent to testing EMD between two multi-sets, where the multiplicity of each element is taken appropriately and we sample elements from the unknown multi-set \textbf{with} replacement. In this paper, our main contribution is to introduce the problem of \emph{tolerant EMD testing between multi-sets (over the Hamming cube) when we get samples from the unknown multi-set \textbf{without} replacement} and to show that \emph{this variant of tolerant EMD testing is as hard as tolerant testing of graph isomorphism between two graphs}. Thus, while testing of equivalence between distributions is at the heart of non-tolerant testing of graph isomorphism, we show that estimation of the EMD over the Hamming cube (when we are allowed to sample \textbf{without} replacement) is at the heart of tolerant graph isomorphism. We believe that the introduction of the problem of testing EMD between multi-sets (with samples taken \textbf{without} replacement) opens an entirely new direction in the world of testing properties of distributions.
This is a joint work with Sourav Chakraborty, Arijit Ghosh and Gopinath Mishra.
Dean Leitersdorf, [November 8, 2021, 1pm EST], Extremely Efficient Distance Computations using Distributed Sparsity-Aware Algorithms
Abstract: Given a distributed network, represented by a graph of computation nodes connected by communication edges, a fundamental problem is computing distances between nodes. Our recent line of work shows that while exactly computing all distances (All-Pairs Shortest Paths, APSP) in certain distributed settings currently has $O(n^{1/3})$ complexity, it is possible to find very good distance approximations with only $O(1)$ complexity. This talk will present a unified view of our various papers developing these interesting advances in distributed computing.
The central theme uniting these developments is designing sparsity-aware algorithms, and then applying them to problems on general, non-sparse, graphs.
The main tools are a series of distributed sparse matrix multiplication algorithms that we develop. We use these tools in novel ways to build a toolkit for basic distance computations, such as computing, for each node, the distances to the O(n^{2/3}) nodes closest to it. Next, we use these tools to solve end-goal distance problems in $O(\log^2 n)$ rounds.
Subsequently, we adapt the load-balancing techniques used in the matrix multiplication algorithms to show combinatorial algorithms which directly compute APSP approximations, without passing through other intermediate distance tools. This allows us to show $O(1)$ round algorithms.
Finally, the talk concludes by connecting the above works to other developments in distributed computing, specifically MST computation and subgraph existence problems.
Michal Dory, [October 18, 2021, 1pm EST], Fault-Tolerant Labeling and Compact Routing Schemes
Abstract: Assume that you have a huge graph, where edges and vertices may fail, and you want to answer questions about this graph quickly, without inspecting the whole graph. A fault-tolerant labeling scheme allows us to do just that. It is a distributed data structure, in which each vertex and edge is assigned a short label, such that given the labels of a pair of vertices s,t and a set of failures F, you can answer questions about s and t in G\F. For example, one can determine whether s and t are connected in G\F just by looking at their labels.
I will discuss a recent work where we show the first fault-tolerant connectivity labeling schemes for general graphs. I will also show applications for fault-tolerant routing. Here the goal is to provide each vertex a small routing table, such that given these tables we can efficiently route messages in the network, even in the presence of failures.
Based on a joint work with Merav Parter.
Michael Kapralov, [June 21, 2021, 11am EST], Space Lower Bounds for Approximating Maximum Matching in the Edge Arrival Model
Abstract: The bipartite matching problem in the online and streaming settings has received a lot of attention recently. The classical vertex arrival setting, for which the celebrated Karp, Vazirani and Vazirani (KVV) algorithm achieves a $1-1/e$ approximation, is rather well understood: the $1-1/e$ approximation is optimal in both the online and semi-streaming settings, where the algorithm is constrained to use $n\cdot \log^{O(1)} n$ space. The more challenging edge arrival model has seen significant progress in the past few years in the online algorithms literature. For the strictly online model (no preemption), approximations better than the trivial factor $1/2$ have been ruled out [Gamlath et al., FOCS'19]. For the less restrictive online preemptive model, a better than $\frac1{1+\ln 2}$-approximation [Epstein et al., STACS'12] and even a better than $(2-\sqrt{2})$-approximation [Huang et al., SODA'19] have been ruled out.
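The trivial factor-$1/2$ baseline mentioned above is simply greedy maximal matching under edge arrivals (a sketch of the folklore baseline, not of any construction from the paper): accept an edge iff both endpoints are still free. Since every unmatched optimal edge shares an endpoint with some accepted edge, the greedy matching is always at least half the maximum.

```python
def greedy_edge_arrival_matching(edges):
    """Strictly online (no preemption) greedy matching under edge
    arrivals: accept an edge iff both endpoints are still unmatched."""
    matched = set()
    M = []
    for u, v in edges:
        if u not in matched and v not in matched:
            M.append((u, v))
            matched.update((u, v))
    return M

# an adversarial arrival order on a path achieves exactly the 1/2 ratio:
assert greedy_edge_arrival_matching([(2, 3), (1, 2), (3, 4)]) == [(2, 3)]
```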
The recent hardness results for online preemptive matching in the edge arrival model are based on the idea of stringing together multiple copies of a KVV hard instance using edge arrivals. In this paper, we show how to implement such constructions using ideas developed in the literature on Ruzsa-Szemer\'edi graphs. As a result, we show that any single-pass streaming algorithm that approximates the maximum matching in a bipartite graph with $n$ vertices to a factor better than $\frac1{1+\ln 2}\approx 0.59$ requires significantly more than $n \log^{O(1)} n$ space. This gives the first separation between the classical one-sided vertex arrival setting and the edge arrival setting in the semi-streaming model.
Noga Ron-Zewi, [May 24, 2021, 11am EST], Local Proofs Approaching the Witness Length
Abstract:
Interactive oracle proofs are a hybrid between interactive proofs and probabilistically-checkable proofs, where the prover is allowed to interact with a verifier by sending relatively long messages to the verifier, who in turn is only allowed to query a few of the bits that were sent. Efficient interactive oracle proofs are useful building blocks in constructing doubly-efficient interactive proofs, and are at the core of leading practical implementations of highly efficient proof-systems.
In this work we construct, for any NP relation for which membership can be decided in polynomial-time and bounded polynomial space (e.g., SAT, Hamiltonicity, Clique, Vertex-Cover, etc.), interactive oracle proofs in which the proof length approaches the witness length. The number of rounds, as well as the number of queries made by the verifier, are constant. Our proof leverages recent constructions of high-rate locally testable tensor codes. In particular, we bypass the barrier imposed by the low rate of multiplication codes (e.g., Reed-Solomon, Reed-Muller or AG codes) - a core component in prior constructions.
Joint work with Ron Rothblum.
Dor Minzer, [May 3, 2021, 2pm EST], On the Structure of Approximate Polymorphisms
Abstract:
For g : {0,1}^k \to {0,1}, a function f:{0,1}^n \to {0,1} is called a g-polymorphism if the following two operations, when applied on any X\in {0,1}^{k \times n}, are equivalent:
(a) apply g on each column of X to get a Boolean vector of length n, then apply f on it. This is the function (f\circ g^n)(X);
(b) apply f on each row of X to get a Boolean vector of length k, then apply g on it. This is the function (g\circ f^{k})(X).
We say f is an approximate polymorphism if (a) and (b) above yield the same result with probability close to 1, where X is sampled uniformly.
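For small k and n, conditions (a) and (b) can be compared exhaustively (a brute-force sketch of the definition only; the function names are mine). For example, parity is an exact XOR-polymorphism, matching the classification discussed in the talk:

```python
import functools
import itertools
import operator

def parity(bits):
    """XOR of a list of bits."""
    return functools.reduce(operator.xor, bits, 0)

def is_polymorphism(f, g, k, n):
    """Check that (a) and (b) agree on every X in {0,1}^{k x n}:
    (a) f applied to the column-wise g's; (b) g applied to the row-wise f's."""
    for flat in itertools.product((0, 1), repeat=k * n):
        X = [flat[i * n:(i + 1) * n] for i in range(k)]
        a = f([g([X[i][j] for i in range(k)]) for j in range(n)])
        b = g([f(X[i]) for i in range(k)])
        if a != b:
            return False
    return True

# parity commutes with coordinate-wise XOR (g = XOR of k = 2 bits) ...
assert is_polymorphism(parity, parity, k=2, n=3)
# ... but AND does not:
assert not is_polymorphism(lambda bits: int(all(bits)), parity, k=2, n=2)
```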
What can be the structure of approximate polymorphisms?
Particular instances of the approximate polymorphisms question include the classical linearity testing problem of Blum, Luby and Rubinfeld, in which k = 2 and g(a,b) = a\oplus b, as well as the AND testing problem, in which k = 2 and g(a,b) = a\land b.
In this talk, we will discuss the problem of classifying the functions g for which there are non-trivial approximate polymorphisms. We show that the only such functions g are ANDs, ORs, and XORs.
Based on a joint work with Gilad Chase, Yuval Filmus, Nitin Saurabh
Esty Kelman, [Apr 26, 2021, 11am EST], KKL Theorem via Random Restrictions and Log-Sobolev Inequality
Abstract:
We give alternate proofs for three related results in analysis of Boolean functions, namely the KKL Theorem, Friedgut’s Junta Theorem, and Talagrand’s strengthening of the KKL Theorem. We follow a new approach: we look at the first Fourier level of the function after a suitable random restriction and we apply the Log-Sobolev inequality appropriately. In particular, we avoid using the hypercontractive inequality that is common to the original proofs. Our proofs might serve as an alternative, uniform exposition to these theorems and the techniques might benefit other research.
In this talk we will focus on our techniques and along the way we will prove the KKL Theorem using our new approach.
Based on a joint work with Subhash Khot, Guy Kindler, Dor Minzer and Muli Safra
Yuval Filmus, [Apr 12, 2021, 11am EST], AND testing
Abstract: Call f: {0,1}^n→{0,1} linear if f(x⊕y)=f(x)⊕f(y) for all x,y, and multiplicative if f(x·y)=f(x)·f(y) for all x,y. A function f is linear if it is an XOR of a subset of inputs, and multiplicative if it is an AND of a subset of inputs (or zero).
Blum, Luby and Rubinfeld (1993) showed that we can test whether f is linear by checking the condition f(x⊕y)=f(x)⊕f(y) for random x,y. Answering (in a sense) a question of Parnas, Ron and Samorodnitsky (2002), and improving on results of Ilan Nehama (2013), we show that a similar statement holds for testing whether f is multiplicative.
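The BLR test itself is a one-liner per trial (a sketch of the classical linearity test only; the multiplicative analogue discussed in the talk replaces XOR with coordinate-wise AND). Here n-bit inputs are encoded as integers and ⊕ is bitwise XOR:

```python
import random

def blr_test(f, n, trials=100, rng=None):
    """BLR linearity test: sample x, y in {0,1}^n (as n-bit integers) and
    check f(x^y) == f(x)^f(y). Returns the fraction of passing trials."""
    rng = rng or random.Random(0)
    passed = 0
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        passed += f(x ^ y) == f(x) ^ f(y)
    return passed / trials

# XOR of a fixed subset S of the inputs is linear, so it always passes:
S = 0b1011
f = lambda x: bin(x & S).count("1") % 2
assert blr_test(f, n=4) == 1.0
```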
Joint work (2020) with Noam Lifshitz (HUJI), Dor Minzer (MIT), and Elchanan Mossel (MIT).
Eldar Fischer, [Apr 5, 2021, 11am EST], Crafting hard to test properties with very short PCPPs
Abstract: A Probabilistically Checkable Proof of Proximity (PCPP) allows a verifier, when provided with a proof that an input satisfies a given property, to verify this using a constant number of queries to the input and the provided proof. Just as with Property Testing, since the input is not even read in full, it is only required to reject inputs that are far from any input satisfying the property (a "bad" input should be rejected with high probability regardless of the provided proof).
PCPPs were first defined by Ben-Sasson, Goldreich, Harsha, Sudan and Vadhan, and independently by Dinur and Reingold, in a style and context related to PCPs (probabilistically checkable proofs). The original PCPPs had a proof length that is polynomial in the input length, if the given property is polynomial-time decidable. Later advances achieved a proof length that is quasilinear in the computation time of the decision algorithm.
A question that remains is whether we can go below a quasilinear proof length, maybe even a linear one. However, some common computational models already have logarithmic gaps between each other, so this question as stated is not even completely well-defined.
A simpler question would be whether there exist some properties that are "non-trivial enough", and yet admit a PCPP whose proof length is less than quasilinear. A natural criterion for non-triviality here would be that the property is maximally hard to test (considering that a property test is essentially just a "proof-less PCPP"). Such properties necessarily require at least a linear-size proof for a PCPP, lending some "sense-making" to the question. Additionally, such properties immediately strengthen the known gap between intolerant testing and tolerant testing.
We construct properties that are maximally hard to test, and yet have a PCPP whose proof size is only larger than linear by a factor of the l-times-iterated-log function, for every fixed l. This also provides properties with an efficient intolerant test whose tolerant testing would require querying a one-over-(l-times-iterated-log) fraction of the entire input. Along the way, we construct a PCPP-related primitive that might be useful in future constructions.
The talk is based on the following paper:
O. Ben-Eliezer, E. Fischer, A. Levi, and R. D. Rothblum, Hard properties with (very) short PCPPs and their applications, Proceedings of the 11th ITCS (2020)
Krzysztof Onak, [Feb 22, 2021], Dynamic Algorithms for Tracking Patterns in Sparse Graphs
Abstract: Counting graph patterns has various practical applications, including detecting fraud and anomalies in financial transactions and social networks. While the vast majority of previous work focuses on the static case and on computing the global number of occurrences of a given pattern, I will show that using simple techniques, it is possible to efficiently track the number of copies of a pattern in which each vertex appears. I will demonstrate our techniques for simple graph patterns such as triangles, as well as cliques and cycles of size 4. As long as the graph has low arboricity, the update time of our algorithms is significantly sublinear in the number of vertices.
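For the triangle case, the core update is a single neighborhood intersection per edge insertion (a minimal sketch of the standard dynamic idea, not the paper's algorithm; the arboricity-aware versions intersect via the lower-degree endpoint to bound the update time):

```python
from collections import defaultdict

def insert_edge(adj, tri, u, v):
    """Insert edge (u,v) and update per-vertex triangle counts `tri`.
    Each new triangle (u,v,w) arises from a common neighbor w, so one
    set intersection finds them all."""
    for w in adj[u] & adj[v]:
        tri[u] += 1
        tri[v] += 1
        tri[w] += 1
    adj[u].add(v)
    adj[v].add(u)

adj, tri = defaultdict(set), defaultdict(int)
for e in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    insert_edge(adj, tri, *e)
assert tri[0] == tri[1] == tri[2] == 1 and tri[3] == 0
```

Deletions are symmetric: intersect first, then decrement the three counters for each common neighbor.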
Joint work with Karthikeyan Shanmugam and Shay Solomon.
Andrew McGregor, [Dec 14, 2020], Recent Results on Cycle Counting in the Data Stream Model
Abstract: Estimating the number of cycles in a graph is one of the most widely studied graph problems in the data stream model. From the theoretical perspective, it is a convenient problem to explore the efficacy of various sampling techniques and to understand the limits of stream computation. From an applied perspective, the frequency of cycles and other subgraphs captures interesting structural properties of massive real-world graphs. Three relevant variants of the data stream model include: the arbitrary order model in which the stream consists of the edges of the graph in arbitrary order, the random order model in which the edges are randomly permuted, and the adjacency list order model in which all edges incident to the same vertex appear consecutively. In this talk, we survey recent work in all these models.
Nithin Varma, [Dec 7, 2020], New Lower Bounds and Sublinear Algorithms for LIS estimation
Abstract: Estimating the length of the longest increasing subsequence (LIS) in an array is a problem of fundamental importance. Despite the significance of LIS estimation and the amount of attention it has received, there are important aspects of the problem that are not yet fully understood. There are no better lower bounds for LIS estimation than the obvious bounds implied by testing monotonicity (for adaptive or nonadaptive algorithms). In this talk, I will present some recent results on LIS estimation, including the first nontrivial lower bound on its complexity. Additionally, I will discuss the high-level ideas involved in the design of new LIS estimation algorithms that complement our lower bound and improve the state of the art.
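For reference, the exact (non-sublinear) baseline is the classical O(n log n) patience-sorting computation of the LIS length; the talk concerns algorithms that estimate this quantity while reading far fewer than n entries.

```python
from bisect import bisect_left

def lis_length(A):
    """Exact LIS length via patience sorting: tails[i] is the smallest
    possible tail value of a strictly increasing subsequence of
    length i + 1 seen so far."""
    tails = []
    for x in A:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

assert lis_length([3, 1, 4, 1, 5, 9, 2, 6]) == 4   # e.g. 1, 4, 5, 9
```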
Joint work with Ilan Newman.
Jane Lange, [Nov 30, 2020], A noise-stabilization procedure for learning and testing decision trees
Abstract: In this talk, we consider three problems related to decision trees: learning, testing, and reconstruction. We show that small decision trees -- and functions that are close to small decision trees -- have a structural property that yields algorithms for all three of these tasks. Specifically, a function f that is opt-close to any decision tree of size s is O(opt + ε)-close to another decision tree of size S = S(s, ε), whose nodes can be computed efficiently by estimating the noise sensitivity of f under restrictions. The proof of this property draws on techniques from the Fourier analysis of boolean functions. I will describe the proof of this structural property, as well as its connections to learning, testing, and reconstruction.
Joint works with Guy Blanc, Neha Gupta, and Li-Yang Tan.
Iden Kalemaj, [Nov 23, 2020], Isoperimetric Inequalities for Real-Valued Functions with Applications to Monotonicity Testing
Abstract: We study sublinear algorithms for monotonicity of real-valued functions. An algorithm for testing monotonicity queries a function at a few locations and distinguishes between when the function is monotone and when it is far from monotone. An algorithm for approximating the distance to monotonicity returns the distance to monotonicity of the function with some additive and multiplicative error. We improve previous algorithms for these two tasks by showing that several isoperimetric inequalities about Boolean functions on the hypercube can be generalized to real-valued functions.
A celebrated line of work [Margulis '74, Talagrand '93, Chakrabarty and Seshadhri '13, Khot Minzer Safra '15] studied the size of the ''boundary'' between the points on which a Boolean function takes value 0 and the points on which it takes value 1. This boundary is defined in terms of the edges of the n-dimensional hypercube, where the edges can be directed or undirected. In particular, the inequality of Khot, Minzer, and Safra '15 implies all previous inequalities. It bounds the average square root of the number of decreasing edges incident on a vertex of the hypercube by the distance to monotonicity of the function. We generalize all the inequalities in this line of work to real-valued functions. Our main contribution is a Boolean decomposition technique that represents a real-valued function as a collection of Boolean functions that roughly capture the distance to monotonicity of the original function. In this talk, we prove the Boolean decomposition theorem and touch on the improved algorithms for monotonicity testing and distance approximation.
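For small n, the averaged square-root quantity is easy to compute exhaustively (a sketch of the Boolean case only; here a decreasing edge at x is a coordinate i with x_i = 0, f(x) = 1, and f flipping to 0 on x + e_i, which is my simplified convention for illustration):

```python
import math
from itertools import product

def talagrand_objective(f, n):
    """Average over vertices x in {0,1}^n of sqrt(#decreasing edges
    leaving x); this is the kind of boundary measure that the KMS
    inequality relates to the distance to monotonicity."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        dec = sum(
            1 for i in range(n)
            if x[i] == 0 and f(x) == 1 and f(x[:i] + (1,) + x[i + 1:]) == 0
        )
        total += math.sqrt(dec)
    return total / 2 ** n

# the anti-dictator 1 - x_0 is at distance 1/2 from monotone:
assert talagrand_objective(lambda x: 1 - x[0], 3) == 0.5
# monotone functions have no decreasing edges at all:
assert talagrand_objective(lambda x: x[0], 3) == 0.0
```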
This is joint work with Hadley Black and Sofya Raskhodnikova.
Michal Dory, [Oct 19, 2020], Exponentially Faster Shortest Paths in the Congested Clique
Abstract: I will describe recent algorithms for approximate shortest paths in the distributed Congested Clique model. We obtain constant-factor approximations for the fundamental all-pairs shortest paths (APSP) and multi-source shortest paths (MSSP) problems that take just poly(log log n) rounds. Previous algorithms required at least a polylogarithmic number of rounds.
The talk will give an overview of the techniques for computing shortest paths in this model, with a focus on two main ingredients: distance tools based on sparse matrix multiplication, and a new fast construction of a near-additive emulator, a sparse graph that preserves the distances of the graph up to a near-additive stretch.
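The algebraic connection between matrix multiplication and distances can be seen in a few lines (a centralized sketch of the folklore reduction only, not the distributed sparse algorithms from the talk): squaring the weighted adjacency matrix under (min, +) doubles the number of hops covered, so O(log n) squarings yield APSP.

```python
INF = float("inf")

def min_plus(A, B):
    """Min-plus (tropical) matrix product: C[i][j] = min_k A[i][k] + B[k][j].
    Starting from the weighted adjacency matrix with zero diagonal,
    repeated squaring converges to all-pairs shortest path distances."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# unit-weight path 0-1-2: one squaring already reveals dist(0,2) = 2
D = [[0, 1, INF], [1, 0, 1], [INF, 1, 0]]
D2 = min_plus(D, D)
assert D2[0][2] == 2
```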
Based on a joint work with Merav Parter.
Ilan Newman, [Oct 16, 2020], On the Characterization of $1$-sided error Strongly-Testable Graph Properties for bounded-degree graphs
Abstract: We study property testing of (di)graph properties in bounded-degree graph models. Despite the many results and the extensive research effort, there is no characterization of the properties that are strongly testable (i.e., testable with constant query complexity), even for $1$-sided error tests.
We give a characterization of the $1$-sided error strongly-testable {\em monotone} graph properties, and the $1$-sided error strongly-testable {\em hereditary} graph properties in the bounded degree graph model.
The bounded-degree model can naturally be generalized to directed graphs resulting in two models that were considered in the literature. Our characterization is generalized to these models as well.
This is a joint work with Areej Khoury and Hiro Ito.