Entropy of Riemann zeta zero sequence
We represent the non-trivial zeros of the Riemann zeta function by a sequence n1 n2 ...,
where ni is the number of zeros in Gram interval i. The mean value of ni
over the sequence is then 1. Table 1 shows the probability that a given Gram interval
contains a specified number of zeros.
Zero count : Probability
0 : 0.162
1 : 0.678
2 : 0.158
3 : 0.002
Table 1: Distribution of zero counts in a Gram interval
Given a semi-infinite sequence of symbols like the sequence of zero counts in
successive Gram intervals, one would like to measure the amount of structure
present in the sequence. The concept of entropy provides such a measure. We
will first review the general concept as presented by Shannon, and then apply
it to the sequence of Riemann zeta zeros.
Following Shannon, consider a general source of information (like a source
generating English sentences in a message). In general the sequences are not
completely random. They have the statistical structure of the language. In English,
the letter E occurs more frequently than Q, the sequence TH occurs more
frequently than XP, etc. We can think of the information source as generating
the message, emitting symbol after symbol. The source will choose successive
symbols according to certain probabilities. The probabilities could depend on
the preceding choices. Such a system is known as a stochastic process. The number
of preceding symbols on which the probability of the next symbol depends
gives the length to which the structure extends.
In the simplest case (albeit artificial) a choice depends only on the preceding
symbol and not on the symbols before that. The statistical structure can then be
described by a set of transition probabilities p_i(j), the probability that symbol i
is followed by the symbol j. The indices i and j range over all possible symbols.
A second, equivalent way of describing the structure is to give the "digram"
probabilities p(i, j), i.e., the relative frequency of the digram ij. More complex
processes would involve trigram frequencies, tetragram frequencies, etc.
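As an illustration, the digram probabilities p(i, j) and the transition probabilities p_i(j) can be estimated from any symbol sequence by counting adjacent pairs. This is a minimal sketch (the function name and the toy sequence are ours, not taken from the study):

```python
from collections import Counter

def digram_stats(seq):
    """Estimate digram probabilities p(i, j) and transition
    probabilities p_i(j) from a symbol sequence by counting."""
    digrams = Counter(zip(seq, seq[1:]))            # counts of adjacent pairs ij
    total = sum(digrams.values())
    p_digram = {ij: c / total for ij, c in digrams.items()}
    # p_i(j) = p(i, j) / p(i), where p(i) counts i at digram-start positions
    starts = Counter(seq[:-1])
    p_trans = {(i, j): c / starts[i] for (i, j), c in digrams.items()}
    return p_digram, p_trans

# Toy example with a short zero-count sequence:
p_digram, p_trans = digram_stats([1, 0, 2, 1, 1, 1, 0, 2, 1, 1])
```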
In our study, the symbols represent the number of zeros in a Gram interval,
and the sequence is defined by specifying the number of zeros in successive Gram
intervals. For example, if a given Gram interval contained 1 zero, the next
one contained no zeros, and the one after that contained 2 zeros, we represent
the sequence as 102.
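For small indices this encoding can be computed directly, for example with mpmath's zetazero and grampoint routines. This is a sketch for illustration only (it is far too slow for samples of millions of zeros):

```python
from mpmath import grampoint, zetazero

def gram_interval_counts(first_zero, n_zeros, first_gram, n_intervals):
    """Count zeta zeros per Gram interval (g_n, g_{n+1}], using
    mpmath's zetazero and grampoint; practical only for small indices."""
    zeros = [zetazero(k).imag for k in range(first_zero, first_zero + n_zeros)]
    grams = [grampoint(n) for n in range(first_gram, first_gram + n_intervals + 1)]
    return [sum(1 for t in zeros if lo < t <= hi)
            for lo, hi in zip(grams, grams[1:])]

# Zeros 2..7 fall one per interval into Gram intervals (g_0, g_1] ... (g_5, g_6]:
print(gram_interval_counts(2, 6, 0, 6))  # -> [1, 1, 1, 1, 1, 1]
```

(Gram's law holds without exception this early in the sequence; the first violations occur much further up the critical line.)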
These processes can also be described by a state machine, with transition
probabilities defined between the states. To make the state machine an
information source, we need only specify the symbol that is emitted when a
transition between states occurs. The states correspond to the "residue of
influence" from the preceding letters. For example, we can represent the Riemann
zeta zero sequence generating process as a state machine with three states, state
A representing a sequence in which the number of zeros is exactly equal to the
number of Gram intervals (e.g., sequences like 1111 or 0121), state B representing
a sequence in which the number of zeros is smaller than the number of Gram
intervals (e.g., sequences like 1011 or 0111), and state C representing a sequence
in which the number of zeros is greater than the number of Gram intervals (e.g.,
sequences like 1211 or 1121). It is known that the sequence of Riemann
zeta zeros is almost always in state A. A transition from state A to state B,
for example, is accompanied by the emission of a 0 symbol; e.g., 1111
transitions to 11110 when the succeeding Gram interval contains no zeros.
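The bookkeeping behind these states can be sketched as follows: the state after each symbol is determined by the running difference between the total number of zeros and the number of Gram intervals seen so far (the function name is ours, for illustration):

```python
def state_trajectory(symbols):
    """Track the A/B/C state after each emitted symbol: the state
    reflects the running excess of zeros over Gram intervals."""
    states, excess = [], 0
    for n in symbols:
        excess += n - 1   # each Gram interval "expects" exactly one zero
        states.append('A' if excess == 0 else ('B' if excess < 0 else 'C'))
    return states

print(state_trajectory([1, 1, 1, 1, 0, 2, 1]))
# -> ['A', 'A', 'A', 'A', 'B', 'A', 'A']
```

The 0 symbol at position five moves the machine from A to B, and the following 2 symbol returns it to A, matching the transitions described above.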
If P_i denotes the probability of being in state i, and p_i(j) denotes the
probability of producing the next symbol j when in state i, then the entropy is
defined as

H = - \sum_i P_i \sum_j p_i(j) \log p_i(j).
We make use of the following theorem from Shannon to estimate the entropy.
Theorem 1. Let p(B_i, S_j) be the probability of the sequence B_i followed by the symbol
S_j. Let p_{B_i}(S_j) = p(B_i, S_j) / p(B_i) be the conditional probability of S_j after B_i.
Let

F_N = - \sum_{i,j} p(B_i, S_j) \log p_{B_i}(S_j),

where the sum is over all blocks B_i of N - 1 symbols and over all symbols S_j. Then
F_N is a monotonically decreasing function of N, and \lim_{N \to \infty} F_N = H.
A series of approximations to H can be obtained by considering the statistical
structure of the sequences extending over 1, 2, ..., N symbols. If there are
no statistical influences extending over more than N symbols, then F_N = H.
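Theorem 1 suggests a direct plug-in estimator: count the empirical frequencies of N-symbol blocks and of their (N - 1)-symbol prefixes. A minimal sketch (not the code used to produce Table 2):

```python
from collections import Counter
from math import log

def F_N(seq, N):
    """Plug-in estimate of Shannon's
    F_N = -sum_{i,j} p(B_i, S_j) log p_{B_i}(S_j)
    from a symbol sequence, using empirical block frequencies."""
    positions = range(len(seq) - N + 1)
    blocks_sym = Counter(tuple(seq[k:k + N]) for k in positions)    # B_i S_j
    prefixes = Counter(tuple(seq[k:k + N - 1]) for k in positions)  # B_i
    total = sum(blocks_sym.values())
    h = 0.0
    for bs, c in blocks_sym.items():
        p_joint = c / total                # p(B_i, S_j)
        p_cond = c / prefixes[bs[:-1]]     # p_{B_i}(S_j)
        h -= p_joint * log(p_cond)
    return h
```

For N = 1 the prefix block is empty and F_1 reduces to the entropy of the single-symbol distribution; for a perfectly periodic sequence such as 0101..., F_2 is already 0.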
The ratio of the entropy of a source to the maximum it could have while still
restricted to the same symbols is called its relative entropy. For the sequence
n1 n2 ... nk (where ni is the number of zeros in Gram interval i and k is the length
of the sequence), the mean value of ni over the sequence is 1
(i.e., the sum of the ni is k on average). With four possible symbols
(0 through 3 zeros), the maximum entropy is log 4, and the normalized F_N
values in Table 2 are F_N / log 4.
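As a consistency check, the first-order entropy implied by the probabilities in Table 1 can be computed directly (a minimal sketch; natural logarithms are assumed, which the tabulated values appear to use):

```python
from math import log

# Single-interval probabilities from Table 1
p = [0.162, 0.678, 0.158, 0.002]

F1 = -sum(q * log(q) for q in p)   # first-order entropy, in nats
relative = F1 / log(4)             # four possible symbols -> max entropy log 4

print(round(F1, 3), round(relative, 3))  # -> 0.862 0.622
```

This reproduces the first row of Table 2 to within the rounding of the tabulated probabilities.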
Sequence length N    F_N    Normalized F_N
1 0.860 0.620
2 0.772 0.557
3 0.741 0.535
4 0.725 0.523
5 0.715 0.516
6 0.707 0.510
7 0.702 0.506
8 0.697 0.503
9 0.693 0.500
10 0.689 0.497
11 0.684 0.493
12 0.677 0.489
13 0.669 0.482
14 0.656 0.474
15 0.639 0.461
16 0.616 0.445
17 0.586 0.423
18 0.550 0.397
19 0.508 0.367
20 0.464 0.334
21 0.417 0.301
22 0.371 0.268
23 0.325 0.235
24 0.281 0.203
25 0.240 0.173
Table 2: Approximations to H by considering sequences of N symbols.
We collected statistics on a sample of 9,000,000 zeros near t = 10^26. Table 2
shows F_N for N from 1 to 25. We see that the entropy is low and that the
structure extends quite far out. Figure 1 gives a plot of the results and a fitted function.
The exponential decay at large sequence lengths is an artifact of limited statistics.
Unfortunately, F_N itself decreases with N, so it becomes
difficult to decide exactly where the low-statistics effect takes over.
Matiyasevich has also demonstrated a remarkable ability to predict new zeros
from the preceding zeros, which lends further support to the presence of
significant structure in the sequence of zeros.
We find that the entropy is very low. This provides a quantitative measure of the large
amount of structure present in the sequence of zeros. The presence of structure
is encouraging for attempts to predict the positions of the zeros using machine
learning techniques.
Figure 1: F_N vs. sequence length N (*), and exponential fit function (+)