## Yale Combinatorics and Probability Seminar

The Yale Combinatorics and Probability Seminar will usually meet on Thursdays at 4:00 in LOM 215. It is being organized by Dan Spielman and Van Vu. We expect to host a variety of lectures by faculty, graduate students, and visitors.

## Talks in Spring 2016:

Feb 4: H. Zhou (Yale) "Community Detection: Minimaxity and a Computationally Efficient Algorithm"

Recently network analysis has gained more and more attention in Statistics, as well as in Computer Science, Probability, and Applied Mathematics. Community detection for stochastic block model (SBM) is possibly the most studied topic in network analysis. Many methodologies have been proposed. Several beautiful and significant phase transition results are obtained in various settings. In this talk, we provide a general minimax theory for community detection. It gives the minimax rates of mis-match ratio for a wide rage of settings including homogeneous and inhomogeneous SBM, dense and sparse networks, finite and growing number of communities. The result immediately implies threshold phenomenon for consistent community detection, exact recovery as well as a convergence rate sandwiched in-between. The rate is in an exponential form. We obtain the upper bound by a penalized likelihood approach. The lower bound is achieved by novel reduction from a global mis-match ratio to a local clustering problem for one node through the exchangability property. In addition, we present a computationally feasible two-stage method that achieves optimal statistical performance in misclassification proportion for stochastic block model under very weak regularity conditions. Our two-stage procedure consists of a generic refinement step that can take a wide range of weakly consistent community detection procedures as initializer, to which the refinement stage applies and outputs a community assignment achieving optimal misclassification proportion with high probability. If time permits, we will discuss briefly extensions to more
general models.

February 25: Y. Do (Univ of Virginia): Repulsion between  roots of random polynomials

Consider a polynomial P of large degree n with random +-1  coefficients (Rademacher polynomials).
The object of our study is the  distance between consecutive real roots. Among others, we
obtain an optimal bound for the probability that  P has a double real root.

Estimating the number of real roots of random polynomials is a long standing question that has been attacked by many leading mathematicians  (including Waring, Kac, Littlewood-Offord, Ibragimov-Mashlova, Erdos,  Polya and others).
Prior to our consideration, a precise answer was known only in the case the coefficients are gaussian (Kac's formula).  Our result allows us to obtain similar estimates for a much larger class of polynomials, including Rademacher polynomials.

(joint work with  Hoi Nguyen and Van Vu)

March 3: P. Borgade (NYU): Universality for random matrices beyond mean-field models

The goal of this talk is to introduce new ideas to prove universality for a class of random band matrices.  We will show that quantum unique ergodicity provides a key role in proving random matrix statistics for such non-mean field models.

March 10: S. Vempala (GaTech): The Complexity of Random Constraint Satisfaction Problems
 Satisfiability problems appear to be hard even when generated randomly. To understand their complexity, we consider planted k-SAT/k-CSP where the clauses are drawn at random from those satisfying a fixed assignment. There are currently large gaps between the point at which the planted solution becomes unique (information-theoretically) and when it can be found efficiently (in time polynomial in the size of the input). For random k-SAT/k-CSP, the current best algorithms need roughly nk/2 clauses to find the planted assignment, although it becomes unique after roughly n log n clauses. Is this gap inherent? In this talk, we describe a paring" transition as the number of clauses grows, and show how it directly affects the complexity of any statistical algorithm to detect planted solutions. Since known algorithmic techniques for these problems all have statistical counterparts, this is concrete evidence that planted k-SAT/k-CSP are intractable.The talk is based on joint work with Vitaly Feldman and Will Perkins.

March 31: Emmanuel Ebbe (Princeton): Reaching the thresholds for block models

The talk overviews our recent results on the stochastic block model thresholds, both for exact recovery (the CH-divergence threshold) and weak recovery (the Kesten-Stigum threshold). While Laplacians fail to achieve the KS threshold, it is shown that Acyclic BP - or equivalently a spectral method on a nonbacktracking operator of generalized order - achieves the threshold in quasi linear time. It is also shown that starting from 5 communities, information theoretic methods can cross the KS threshold. Joint works with C. Sandon:http://arxiv.org/abs/1503.00609 http://arxiv.org/abs/1512.09080

April 7: J. H. Kim (KIAS): How to find counterfeit coins?  An algorithmic version.

In this talk, we consider a well-known combinatorial search problem.
Suppose that there are n identical looking coins and  some of them are
counterfeit.  The weights of all authentic coins are the same and known a priori.
The weights of counterfeit coins vary but different from the weight of
an authentic coin.

Without loss of generality, we may assume the weight of authentic coins is 0.
The problem is to find all counterfeit coins by weighing (queries) sets ofcoins
on a spring scale. Finding the optimal number of queries is difficult even
when there are only 2 counterfeit coins.

We introduce a polynomial time randomized algorithm to find all
counterfeit coins when the number of them is known to be at most
m > 1  and the weight w(c) of each counterfeit coin c satisfies
a < |w(c)| < b
for fixed constants a, b>0. The query complexity of the
algorithm is  O(frac{m log n }{log m}), which is optimal up to a
constant factor. The algorithm uses, in part, random walks.

The algorithm may be generalized to find all edges of  a hidden
weighted graph  using a similar type of queries. This graph finding algorithm
has various applications including DNA sequencing.

April 21: Anup Rao (GaTech):  Agnostic Estimation of Mean and Covariance

In this talk, we consider the problem of estimating the mean and covariance of a distribution from iid samples in $\R^n$, in the presence of an $\eta$ fraction of malicious noise; this is in contrast to much recent work where the noise itself is assumed to be from a distribution of known type. We will give polynomial-time algorithms to compute estimates for the mean, covariance and operator norm of the covariance matrix, and show that the dimensional dependence of the error is optimal up to a $O(\sqrt{\log n})$ factor. This gives polynomial-time solutions to some of the questions studied in robust statistics. No background other than linear algebra and probability will be required for the talk.

This is a joint work with Kevin Lai and Santosh Vempala.