The application of machine learning to the Riemann zeta function is discussed in Good to Bad Gram point Ratio For Riemann Zeta Function. More techniques for applying Machine Learning to the study of the Riemann zeta function can be found at Machine Learning. In the 1850s Riemann put forward an intriguing hypothesis about the location of the roots of the Riemann zeta function, which challenges us to this day. The Riemann zeta function is defined for Re(s) > 1 by

ζ(s) = ∑_{n=1}^∞ n^(−s).    (1)
ζ(s) has an analytic continuation to the complex plane and satisfies a functional equation:

ξ(s) = ξ(1 − s),  where ξ(s) = π^(−s/2) Γ(s/2) ζ(s).    (2)
ξ(s) is analytic in the whole complex plane except for simple poles at s = 0 and s = 1. We write the zeros of ξ(s) as 1/2 + iγ. The Riemann Hypothesis asserts that γ is real for the non-trivial zeros. We order the γs in increasing order, with
… ≤ γ−1 < 0 < γ1 ≤ γ2 ≤ …    (3)
Then γj = −γ−j for j = 1,2,..., and γ1, γ2, ... are roughly 14.1347, 21.0220, ....
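As a quick check of these values (not part of the original computation), the first few γj can be reproduced with the mpmath library, which is assumed here purely for illustration:

```python
# Minimal sketch: print the imaginary parts of the first few non-trivial zeros
# of zeta(s) using mpmath's zetazero (illustrative; not the code used in the study).
from mpmath import mp, zetazero

mp.dps = 15  # working precision in decimal digits

for j in range(1, 4):
    rho = zetazero(j)      # j-th non-trivial zero, 1/2 + i*gamma_j
    print(j, rho.imag)     # 14.1347..., 21.0220..., 25.0108...
```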
There is an impressive body of empirical evidence for Riemann's hypothesis, but a formal proof has been elusive. The numerical studies have found regions where the hypothesis is almost violated (Lehmer's phenomenon), but no counter-example has been found. The numerical calculations are time-consuming, and would benefit from tools which give approximate locations for the zeros of the function. We applied neural network regression to aid the empirical studies of the locations of the zeros.
Asymptotically, the mean number of zeros of the Riemann zeta function with height less than γ is

N(γ) ≈ (γ/2π) ln(γ/2π) − γ/2π.    (4)
Thus, the mean spacing of the zeros at height γ is 2π/ln(γ/2π). Numerical analysis of the zeta function zeros takes advantage of the functional equation (2). One defines
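For example, at the heights studied below, γ ≈ 2.7 × 10^11, so γ/2π ≈ 4.3 × 10^10 and ln(γ/2π) ≈ 24.5; consecutive zeros are therefore separated by about 2π/24.5 ≈ 0.26 on average.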
θ(t) = arg Γ(1/4 + it/2) − (t/2) ln π,    (5)
where the argument is defined by continuous variation of t starting with the value 0 at t = 0. For large t, θ has the asymptotic expansion

θ(t) ≈ (t/2) ln(t/2π) − t/2 − π/8 + 1/(48t) + ⋯
A consequence of the zeta functional equation is that the function Z(t) = exp(iθ(t)) ζ(1/2 + it), known as the Riemann-Siegel Z-function, is real valued for real t. Moreover, |Z(t)| = |ζ(1/2 + it)|. Thus the zeros of Z(t) are the imaginary parts of the zeros of ζ(s) which lie on the critical line, and finding zeros on the critical line reduces to locating the sign changes of a real-valued function. This is a very convenient property in the numerical verification of the Riemann Hypothesis.

Another very helpful property is that many of the zeros are separated by the "Gram points". When t ≥ 7, the θ function is monotonically increasing. For n ≥ 1, the n-th Gram point gn is defined as the unique solution > 7 of θ(gn) = nπ. The Gram points are as dense as the zeros of ζ(s) but are much more regularly distributed, and their locations can be found without any evaluations of the Riemann-Siegel series. Gram's law is the empirical observation that Z(t) usually changes its sign in each Gram interval Gn = [gn, gn+1). This law fails infinitely often, but it holds in a large proportion of cases. The average value of Z(gn) is 2 for even n and −2 for odd n [3], and hence Z(gn) undergoes an infinite number of sign changes. Given these desirable properties of the Gram points, it seems natural to use the values at these points as the feature set for the neural network regression.
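To make the last point concrete, the sketch below locates Gram points by solving θ(t) = nπ with Newton's method, using mpmath's siegeltheta for θ and the leading term (1/2) ln(t/2π) for θ′; mpmath's built-in grampoint serves only as a cross-check. The starting guess, the number of iterations, and the use of mpmath are illustrative assumptions, not the procedure used in the study.

```python
# Sketch: find the n-th Gram point g_n, defined by theta(g_n) = n*pi,
# by Newton iteration -- no evaluation of the Riemann-Siegel series is needed.
from mpmath import mp, siegeltheta, grampoint, lambertw, exp, log, pi, e

mp.dps = 30

def gram_newton(n, steps=6):
    # Initial guess from the leading terms of theta's asymptotic expansion:
    # (t/2) ln(t/2pi) - t/2 - pi/8 = n*pi  =>  t0 = 2*pi*exp(1 + W((n + 1/8)/e)).
    t = 2 * pi * exp(1 + lambertw((n + mp.mpf(1) / 8) / e).real)
    for _ in range(steps):
        # theta'(t) is well approximated by (1/2) ln(t/(2*pi)) for large t.
        t -= (siegeltheta(t) - n * pi) / (log(t / (2 * pi)) / 2)
    return t

n = 1000
print(gram_newton(n))   # Newton iterate
print(grampoint(n))     # mpmath's built-in Gram point, for comparison
```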
The Riemann-Siegel Z-function is evaluated using the Riemann-Siegel series

Z(t) = 2 ∑_{n=1}^{m} n^(−1/2) cos(θ(t) − t ln n) + R(t),
where m is the integer part of √(t/2π), and R(t) is a small remainder term which can be evaluated to the desired level of accuracy. The most important source of loss of accuracy at large heights is the cancellation between the large numbers that occur in the arguments of the cos terms. We use a high-precision module to evaluate the arguments; the rest of the calculation is done in regular double precision. The range of t studied is 267653395648.87 to 267653398472.51, which encompasses the zeros with index from 10^12 to 10^12 + 10^4.
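The following sketch illustrates this mixed-precision strategy: the phases θ(t) − t ln n are computed and reduced modulo 2π with mpmath, while the sum itself is accumulated in ordinary double precision. The remainder term R(t) is omitted, so the result is only an approximation to Z(t), and mpmath's siegelz is used as a reference; the choice of mpmath, the working precision, and the demonstration height are assumptions for illustration, not the production code of the study.

```python
# Sketch: Riemann-Siegel main sum with high-precision phase reduction.
# The arguments theta(t) - t*ln(n) are evaluated with mpmath and reduced
# mod 2*pi before being passed to double-precision cos(); the remainder
# term R(t) is omitted, so the result is only approximate.
import math
from mpmath import mp, mpf, siegeltheta, siegelz, pi, log

def z_main_sum(t_str, dps=40):
    mp.dps = dps
    t = mpf(t_str)                                  # parse t at full precision
    theta = siegeltheta(t)
    m = int(math.sqrt(float(t) / (2 * math.pi)))    # m = floor(sqrt(t/(2*pi)))
    total = 0.0
    for n in range(1, m + 1):
        phase = (theta - t * log(n)) % (2 * pi)     # high-precision reduction
        total += math.cos(float(phase)) / math.sqrt(n)
    return 2.0 * total

t = "1000000.25"            # modest demonstration height; at t ~ 2.7e11 the
                            # high-precision phase reduction becomes essential
print(z_main_sum(t))        # main sum only
print(siegelz(mpf(t)))      # mpmath reference value (includes the remainder)
```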
Neural Network Regression (NNR) models have been applied to forecasting and are among the state-of-the-art forecasting techniques. Suitable neural network models can approximate any continuous function if their hidden layer is sufficiently large. Furthermore, neural network models are equivalent to several nonlinear nonparametric models, i.e. models where no decisive assumption about the generating process has to be made in advance. Most forecasting models (ARMA models, bilinear models, autoregressive models with thresholds, nonparametric models with kernel regression, etc.) are embedded in neural network models. The advantage of neural network models can be summarised as follows: if, in practice, the best model for a given problem cannot be determined, it is a good idea to use a modelling strategy which generalises a large number of models, rather than to impose a given model specification a priori.
We use values defined at the Gram points for the input feature set provided to the neural network, because the positions of the Gram points can be determined to the desired accuracy without extensive costly calculations. We use a two-layer neural network with 200 neurons in the hidden layer.
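A minimal sketch of this setup is given below, assuming scikit-learn's MLPRegressor as the network and mpmath to generate a small demonstration data set at low height (the study itself works near the 10^12-th zero). The feature window, the train/test split, and the libraries are illustrative assumptions; the target used here is the first of the two quantities described in the next paragraph, the distance from a Gram point to the next zero.

```python
# Sketch: regress the distance from a Gram point to the next zero on the
# Z-values at neighbouring Gram points.  Libraries, window width and split
# are assumptions for illustration only.
import numpy as np
from mpmath import mp, grampoint, siegelz, zetazero
from sklearn.neural_network import MLPRegressor

mp.dps = 20
N = 200                                                    # demo range of Gram indices
gram = np.array([float(grampoint(n)) for n in range(N)])
zval = np.array([float(siegelz(float(g))) for g in gram])  # features: Z at Gram points
zeros = np.array([float(zetazero(k).imag) for k in range(1, N + 20)])

def next_zero_distance(t):
    """Distance from t to the first zero above t."""
    return zeros[np.searchsorted(zeros, t)] - t

window = 5                                                 # Gram points on each side
X, y = [], []
for i in range(window, N - window):
    X.append(zval[i - window:i + window + 1])              # Z at 2*window+1 Gram points
    y.append(next_zero_distance(gram[i]))                  # target: gap to the next zero
X, y = np.array(X), np.array(y)

model = MLPRegressor(hidden_layer_sizes=(200,), max_iter=2000, random_state=0)
model.fit(X[:-50], y[:-50])                                # train on all but the last 50
print(model.score(X[-50:], y[-50:]))                       # held-out R^2 score
```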
We train the neural network to separately predict two quantities. The first is the distance from a Gram point to the next zero. The second is the smallest zero difference in the Gram interval, with the left-hand zero of the pair lying in the Gram interval. The reason for choosing these output functions is that close pairs of zeros are interesting: they correspond to cases for which the Riemann Hypothesis is nearly violated. Calculating the Riemann zeta function in regions where two zeros are very close is a stringent test of the Riemann Hypothesis (often described as Lehmer's phenomenon, since Lehmer was the first to observe such situations). We use the scaled difference:
Figure 1: Comparison of neural network prediction vs actual value for the distance from a Gram point to the next zero.
Figure 1 shows the comparison of the neural network prediction vs the actual value for the distance from a Gram point to the next zero, for 50 Gram points beginning at index 1000000008584. Figure 2 shows the comparison of the neural network prediction vs the actual value for the smallest zero difference in the Gram interval (with the left-hand zero lying in the Gram interval), for the same 50 Gram points.
In both cases, we see that the neural network gives fairly good predictions. It follows the peaks and dips of the output function faithfully. It is successful in reproducing the mean value as well as the variation of the output it is attempting to predict. Given this encouraging result, it seems worthwhile to investigate using more features in the input layer and more neurons in the hidden layer to increase the fidelity of the predictions.
Figure 2: Comparison of neural network prediction vs actual value for the smallest zero difference in the Gram interval (the left-hand zero lies in the Gram interval).
References
https://people.math.osu.edu/hiary.1/amortized.html gives sample data for Z(t): 10^7 zeros near each height T, for T from roughly 10^12 to 10^28.
Algorithms (described here and here) for the numerical evaluation of ζ(σ + iT) in about T^(1/3) time.
Gourdon Calculation.