HPZZ21

The Communication Complexity of Set Intersection and Multiple Equality Testing

In this paper we explore fundamental problems in randomized communication complexity such as computing Set Intersection on sets of size k and Equality Testing between vectors of length k. Sağlam and Tardos [FOCS 2013] and Brody et al. [Algorithmica 2016] showed that for these types of problems, one can achieve optimal communication volume of O(k) bits, with a randomized protocol that takes O(log^* k) rounds. They also proved that this is one point along the optimal round-communication trade-off curve.

Aside from rounds and communication volume, there is a third parameter of interest, namely the error probability p_{err}, which we write 2^{-E}. It is straightforward to show that protocols for Set Intersection or Equality Testing need to send at least Ω(k+E) bits, regardless of the number of rounds. Is it possible to simultaneously achieve optimality in all three parameters, namely O(k+E) communication and O(log^* k) rounds?

In this paper we prove that there is no universally optimal algorithm, and we complement the existing round-communication trade-offs [FOCS 2013, Algorithmica 2016] with a new trade-off between rounds, communication, and probability of error. In particular, any protocol for solving Multiple Equality Testing in r rounds with failure probability p_{err}=2^{-E} has communication volume Ω(Ek^{1/r}). We present several algorithms for Multiple Equality Testing (and its variants) that match or nearly match our lower bound and the lower bound of [FOCS 2013, Algorithmica 2016]. Lower bounds on Equality Testing extend to Set Intersection, for every r, k, and p_{err} (which is trivial); in the reverse direction, we prove that upper bounds on Equality Testing for r, k, p_{err} imply similar upper bounds on Set Intersection with parameters r+1, k, and p_{err}.

Our original motivation for considering p_{err} as an independent parameter came from the problem of enumerating triangles in distributed (CONGEST) networks having maximum degree Δ. We prove that this problem can be solved in O(Δ / log n + log log Δ) time with high probability 1-1/poly(n). This beats the trivial (deterministic) O(Δ)-time algorithm and is superior to the Õ(n^{1/3}) algorithm of [SODA 2019, PODC 2019] when Δ=Õ(n^{1/3}).