Entropy of Riemann zeta zero sequence
We represent the non-trivial zeros of the Riemann zeta function by a sequence n1 n2 ...,
where ni is the number of zeros in Gram interval i. The mean value of ni
over the sequence is then 1. Table 1 shows the probability that a given Gram interval
contains a specified number of zeros.
Zero count : Probability
0 : 0.162
1 : 0.678
2 : 0.158
3 : 0.002
Table 1: Distribution of zero counts in a Gram interval
Given a semi-infinite sequence of symbols like the sequence of zero counts in
successive Gram intervals, one would like to measure the amount of structure
present in the sequence. The concept of entropy provides such a measure. We
will first review the general concept as presented by Shannon, and then apply
it to the sequence of Riemann zeta zeros.
Following Shannon, consider a general source of information (like a source
generating English sentences in a message). In general the sequences are not
completely random. They have the statistical structure of the language. In English,
the letter E occurs more frequently than Q, the sequence TH occurs more
frequently than XP, etc. We can think of the information source as generating
the message, emitting symbol after symbol. The source will choose successive
symbols according to certain probabilities. The probabilities could depend on
the preceding choices. Such a system is known as a stochastic process. The number
of preceding symbols on which the probability of the next symbol depends
gives the length to which the structure extends.
In the simplest case (albeit artificial) a choice depends only on the preceding
symbol and not on the symbols before that. The statistical structure can then be
described by a set of transition probabilities p_i(j), the probability that symbol i
is followed by the symbol j. The indices i and j range over all possible symbols.
A second, equivalent way of describing the structure is to give the "digram"
probabilities p(i, j), i.e., the relative frequency of the digram ij. More complex
processes would involve trigram frequencies, tetragram frequencies, etc.
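As an illustration, the digram probabilities p(i, j) and the transition probabilities p_i(j) can be estimated from any symbol sequence by counting adjacent pairs. This is a minimal sketch (the function name and the toy sequence are ours, not taken from the study):

```python
from collections import Counter

def digram_stats(seq):
    """Estimate digram probabilities p(i, j) and transition
    probabilities p_i(j) from a symbol sequence by counting."""
    digrams = Counter(zip(seq, seq[1:]))            # counts of adjacent pairs ij
    total = sum(digrams.values())
    p_digram = {ij: c / total for ij, c in digrams.items()}
    # p_i(j) = p(i, j) / p(i), where p(i) counts i at digram-start positions
    starts = Counter(seq[:-1])
    p_trans = {(i, j): c / starts[i] for (i, j), c in digrams.items()}
    return p_digram, p_trans

# Toy example with a short zero-count sequence:
p_digram, p_trans = digram_stats([1, 0, 2, 1, 1, 1, 0, 2, 1, 1])
```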
In our study, the symbols represent the number of zeros in a Gram interval,
and the sequence is defined by specifying the number of zeros in successive Gram
intervals. For example, if a given Gram interval contained 1 zero, the next
one contained no zeros, and the one after that contained 2 zeros, we represent
the sequence as 102.
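For small indices this encoding can be computed directly, for example with mpmath's zetazero and grampoint routines. This is a sketch for illustration only (it is far too slow for samples of millions of zeros):

```python
from mpmath import grampoint, zetazero

def gram_interval_counts(first_zero, n_zeros, first_gram, n_intervals):
    """Count zeta zeros per Gram interval (g_n, g_{n+1}], using
    mpmath's zetazero and grampoint; practical only for small indices."""
    zeros = [zetazero(k).imag for k in range(first_zero, first_zero + n_zeros)]
    grams = [grampoint(n) for n in range(first_gram, first_gram + n_intervals + 1)]
    return [sum(1 for t in zeros if lo < t <= hi)
            for lo, hi in zip(grams, grams[1:])]

# Zeros 2..7 fall one per interval into Gram intervals (g_0, g_1] ... (g_5, g_6]:
print(gram_interval_counts(2, 6, 0, 6))  # -> [1, 1, 1, 1, 1, 1]
```

(Gram's law holds without exception this early in the sequence; the first violations occur much further up the critical line.)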
These processes can also be described by a state machine, with transition
probabilities defined between the states. To make the state machine an
information source, we need only specify the symbol that is emitted when a
transition between states occurs. The states correspond to the "residue of
influence" from the preceding letters. For example, we can represent the Riemann
zeta zero sequence generating process as a state machine with three states, state
A representing a sequence in which the number of zeros is exactly equal to the
number of Gram intervals (e.g., sequences like 1111 or 0121), state B representing
a sequence in which the number of zeros is smaller than the number of Gram
intervals (e.g., sequences like 1011 or 0111), and state C representing a sequence
in which the number of zeros is greater than the number of Gram intervals (e.g.,
sequences like 1211 or 1121). It is known that the sequence of Riemann
zeta zeros is almost always in state A. A transition from state A to state B,
for example, is accompanied by the emission of a 0 symbol; e.g., 1111
transitions to 11110 when the succeeding Gram interval contains no zeros.
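The bookkeeping behind these states can be sketched as follows: the state after each symbol is determined by the running difference between the total number of zeros and the number of Gram intervals seen so far (the function name is ours, for illustration):

```python
def state_trajectory(symbols):
    """Track the A/B/C state after each emitted symbol: the state
    reflects the running excess of zeros over Gram intervals."""
    states, excess = [], 0
    for n in symbols:
        excess += n - 1   # each Gram interval "expects" exactly one zero
        states.append('A' if excess == 0 else ('B' if excess < 0 else 'C'))
    return states

print(state_trajectory([1, 1, 1, 1, 0, 2, 1]))
# -> ['A', 'A', 'A', 'A', 'B', 'A', 'A']
```

The 0 symbol at position five moves the machine from A to B, and the following 2 symbol returns it to A, matching the transitions described above.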
If P_i denotes the probability of being in state i, and p_i(j) denotes the
probability of producing the next symbol j when in state i, then the entropy is
defined as

H = - \sum_i P_i \sum_j p_i(j) \log p_i(j).
We make use of the following theorem from Shannon to estimate the entropy.
Theorem 1. Let p(B_i, S_j) be the probability of the sequence B_i followed by the symbol
S_j. Let p_{B_i}(S_j) = p(B_i, S_j) / p(B_i) be the conditional probability of S_j after B_i.
Let

F_N = - \sum_{i,j} p(B_i, S_j) \log p_{B_i}(S_j),

where the sum is over all blocks B_i of N - 1 symbols and over all symbols S_j. Then
F_N is a monotonically decreasing function of N, and \lim_{N \to \infty} F_N = H.
A series of approximations to H can be obtained by considering the statistical
structure of the sequences extending over 1, 2, ..., N symbols. If there are
no statistical influences extending over more than N symbols, then F_N = H.
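Theorem 1 suggests a direct plug-in estimator: count the empirical frequencies of N-symbol blocks and of their (N - 1)-symbol prefixes. A minimal sketch (not the code used to produce Table 2):

```python
from collections import Counter
from math import log

def F_N(seq, N):
    """Plug-in estimate of Shannon's
    F_N = -sum_{i,j} p(B_i, S_j) log p_{B_i}(S_j)
    from a symbol sequence, using empirical block frequencies."""
    positions = range(len(seq) - N + 1)
    blocks_sym = Counter(tuple(seq[k:k + N]) for k in positions)    # B_i S_j
    prefixes = Counter(tuple(seq[k:k + N - 1]) for k in positions)  # B_i
    total = sum(blocks_sym.values())
    h = 0.0
    for bs, c in blocks_sym.items():
        p_joint = c / total                # p(B_i, S_j)
        p_cond = c / prefixes[bs[:-1]]     # p_{B_i}(S_j)
        h -= p_joint * log(p_cond)
    return h
```

For N = 1 the prefix block is empty and F_1 reduces to the entropy of the single-symbol distribution; for a perfectly periodic sequence such as 0101..., F_2 is already 0.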
The ratio of the entropy of a source to the maximum it could have while still
restricted to the same symbols is called its relative entropy. For the sequence
n1 n2 ... nk (where ni is the number of zeros in Gram interval i and k is the length
of the sequence), the mean value of ni over the sequence is 1
(i.e., the sum of the ni is k on average). With four possible symbols
(0 through 3 zeros), the maximum entropy is log 4, and the normalized F_N
values in Table 2 are F_N / log 4.
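As a consistency check, the first-order entropy implied by the probabilities in Table 1 can be computed directly (a minimal sketch; natural logarithms are assumed, which the tabulated values appear to use):

```python
from math import log

# Single-interval probabilities from Table 1
p = [0.162, 0.678, 0.158, 0.002]

F1 = -sum(q * log(q) for q in p)   # first-order entropy, in nats
relative = F1 / log(4)             # four possible symbols -> max entropy log 4

print(round(F1, 3), round(relative, 3))  # -> 0.862 0.622
```

This reproduces the first row of Table 2 to within the rounding of the tabulated probabilities.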
Sequence length N    F_N    Normalized F_N
1 0.860 0.620
2 0.772 0.557
3 0.741 0.535
4 0.725 0.523
5 0.715 0.516
6 0.707 0.510
7 0.702 0.506
8 0.697 0.503
9 0.693 0.500
10 0.689 0.497
11 0.684 0.493
12 0.677 0.489
13 0.669 0.482
14 0.656 0.474
15 0.639 0.461
16 0.616 0.445
17 0.586 0.423
18 0.550 0.397
19 0.508 0.367
20 0.464 0.334
21 0.417 0.301
22 0.371 0.268
23 0.325 0.235
24 0.281 0.203
25 0.240 0.173
Table 2: Approximations to H by considering sequences of N symbols.
We collected statistics on a sample of 9,000,000 zeros near t = 10^26. Table 2
shows F_N for N from 1 to 25. We see that the entropy is low and that the
structure extends quite far out. Figure 1 gives a plot of the results and a fitted function.
The exponential decay at large sequence lengths is an artifact of limited statistics.
Unfortunately, F_N itself decreases with N, so it becomes
difficult to decide exactly where the low-statistics effect takes over.
Matiyasevich has also demonstrated a remarkable ability to predict new zeros
from the preceding zeros, which lends further support to the presence of
significant structure in the sequence of zeros.
We find that the entropy is very low. This provides a quantitative measure of the large
amount of structure present in the sequence of zeros. The presence of structure
is encouraging for attempts to predict the positions of the zeros using machine
learning techniques.
Figure 1: F_N vs. sequence length N (*), and exponential fit function (+)