# Statistical foundations for evolving networks

## Mini-WORKshop @ Freiburg, March 11--13, 2020

## Description

Networks represent relational structure in a sparse and compact manner. Very commonly, these relationships evolve over time, and the number of objects in the studied system may also increase or decrease over time. Describing such complex phenomena is difficult, especially if the evolution of the network depends on previously observed structure. For this reason, efficient statistical inference of evolving networks becomes a key challenge in the sciences. In this context, it is crucial to provide confidence statements for statistical inference which requires establishing sophisticated probabilistic limit theorems.

The aim of the workshop is to bring together experts from probability theory with experts on high-dimensional statistics and adaptive inference together with young researchers working on these aspects of dynamic networks. This way, ideas and methods coming from mathematical statistics and probability cross-fertilize in order to trigger further advances in both, the theory of statistical inference in dynamic networks and the development of corresponding probabilistic limit theorems.

## Keynote Talks by

**Shankar Bhamidi**(University of North Carolina): Probabilistic and Statistical Problems Pertaining to Dynamic Networks

The last few years have witnessed an explosion in the amount of empirical data on real networks motivating an array of mathematical models for the evolution of such networks. Examples range from biological networks (brain networks of interacting neurons), information transmission (Internet), transportation, social networks and swarm intelligence and the evolution of self-organized behavior through the interactions of simple agents. This has stimulated vigorous activity in a multitude of fields, including biology, statistical physics, statistics, mathematics and computer science to understand these models and quantify their predictions and relevance to real systems.

The aim of these two lectures is to introduce researchers to one corner of this vast field, Dynamic networks: systems that evolve over time through probabilistic rules. The following two main themes will be pursued:

(I) Macroscopic connectivity and higher order structures: A host of real world networks have structural connectivity that evolves over time or structural connectivity arising indirectly owing to measurements on vertices in the network (for example time series measurements on vertices in the network resulting in correlation networks which are then thresholded to obtain a binary network representation). We will discuss various probabilistic techniques to understand asymptotic for such systems including both techniques from the world of sparse networks such as Aldous’s multiplicative coalescent as well as from the world of dense graph limits.

(II) Evolving networks and continuous time branching: The second lecture will study the so-called preferential attachment family of network models emphasizing one particular technical tool: continuous time branching processes. We will show how this technique leads to rigorous asymptotic descriptions to a number of problems in the statistical modeling of real world systems ranging from Twitter event networks to change point detection in evolving networks.

**Marc Hoffmann**(Paris Dauphine University): Statistical inference in mean-field networks: a PDE approach

In this talk, we study certain statistical inference problems for interacting particles systems in a mean-field limit. By studying the fluctuations of the (properly normalised) data toward a deterministic limit that solves a nonlinear transport or parabolic equation, we take advantage of the limiting object structure to estimate nonparametrically the parameters of interests. Two typical examples of applications are: nonlinear Hawkes processes and McKean-Vlasov interacting diffusions.

## Invited Talks by

Social networks are usually modeled as graphs, where edges between vertices represent friendship. There exist many graph models describing their statistical properties or the growth behaviour with an increasing number of vertices. Compared to that, antagonistic relationships are only rarely modeled and their structural properties are poorly understood. An important theory in the social sciences is the *balance theory* which predicts that social networks are locally balanced. This means that friends should have the same friends and the same enemies. If edges are signed (+ for friends, - for enemies), this translates for the graph to have more *balanced* triangles with an even number of negative edges, while all other triangles are deemed *unbalanced*.

Balance theory has received only little empirical evaluation, also because antagonistic relationships are infrequently measured and often biased. Previous statistical tests for balance are usually permutation tests, where the null hypothesis is a graph without balance. These tests always assume that negative and positive edges are interchangeable. Based on a novel dataset of signed networks, collected from 32 isolated, rural villages in Honduras, we show, however, empirically that the statistical properties of positive and negative edges are very different. This leads to inflated test statistics, wrongly indicating balance where it is not present.

We improve on this by introducing a new permutation test that incorporates the structure of negative edges. The new test is indeed more accurate at detecting balance than the old one, which is demonstrated in several examples. Moreover, we prove asymptotic normality of our test statistic under the null model. Compared to other permutation tests, this property does not follow from a combinatorial central limit theorem, but is instead based on stochastic monotonicity and a central limit theorem for random variables in a fixed Rademacher chaos. The results are illustrated for the dataset mentioned above. Using our test and contrary to previous results, we find that there is only marginal evidence for balance.

Most of the complex networks evolve over time and this can have substantial impact on the processes unfolding on them. We will give a brief overview of the very recent steps towards the asymptotic theory of evolving networks. In particular, we will discuss the ideas related to the graph limit theory. Partially based on joint work with Jiří Černý.

We focus on modeling the dynamic attributes of a dynamic network with a fixed number of vertices. These attributes are considered as time series which dependency structure is influenced by the underlying network. They are modeled by a multivariate doubly stochastic time series framework, that is we assume linear processes for which the coefficient matrices are stochastic processes themselves. We explicitly allow for dependence in the dynamics of the coefficient matrices as well as between the two stochastic processes driving the time series. This framework allows for a separate modeling of the attributes and the underlying network. In this setting, we define network autoregressive models and discuss their stationarity conditions. Furthermore, anestimation approach is discussed in a low- and high-dimensional setting and how this can be applied to forecasting. The finite sample behavior of the forecast approach is investigated. This approach is applied to real data whereby the goal is to forecast the GDP of 33 economies.

**Alexander Kreiß**

We will consider multivariate stochastic processes indexed either by vertices or pairs of vertices of a dynamic network. Under a dynamic network we understand a network with a fixed vertex set and an edge set which changes randomly over time. We will assume that the spatial dependence-structure of the processes is linked with the network in the following way: Two neighbouring vertices (or two adjacent pairs of vertices) are dependent, while we assume that the dependence decreases as the distance in the network increases. We make this intuition mathematically precise by considering three concepts based on correlation, beta-mixing with time-varying mixing-coefficients and conditional independence. Then, we will use these concepts in order to prove weak-dependence results, e.g. an exponential inequality, which might be of independent interest. In order to demonstrate the use of these concepts in an application we study the asymptotics (for growing networks) of a goodness of fit test in a dynamic interaction network model based on a multiplicative Cox-type hazard model. This model is then applied to bike-sharing data.

## Schedule

**Wednesday, March 11**

13:00 Meeting of the scientific network

14:15 Opening

14:30 Shankar Bhamidi I

15:30 Coffee Break

16:00 Marc Hoffmann I

17:00 Discussion

19:00 Workshop Dinner

**Thursday, March 12**

08:30 Discussion

09:45 Marc Hoffmann II

10:45 Coffee Break

11:15 Jonas Krampe

12:00 Lunch Break

14:00 Randolf Altmeyer

14:45 Shankar Bhamidi II

15:45 Coffee Break

16:15 Alexander Kreiß

17:00 Discussion

**Friday, March 13**

09:45 Anton Klimovsky

10:30 Discussion

## Local Organizer

## Venue

- University of Freiburg, Natural Sciences Campus ("Institutsviertel"), Lecture Hall at the Crystallography,
- Hermann-Herder-Str. 5, 79104 Freiburg i. Br., Germany