Machine Learning

Deep Learning for the Riemann zeta function

The study of large values of the Riemann zeta function on the critical line is a topic of mathematical interest; one such topic is the Karatsuba problem. Empirical results on the distribution of large values of the zeta function would be useful, to give insights into the theoretical studies. Since evaluating the Riemann zeta function at large heights is a non-trivial task, requiring much computer time (and some knowledge of special techniques to find the roots), we applied machine learning[1] to the problem. We trained an LSTM model[LSTM] to predict large values of the Riemann zeta function on short intervals. The comparison of actual values with the predictions of the LSTM model is shown in the figure.
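To make the setup concrete, here is a minimal PyTorch sketch of a sequence-to-one LSTM regressor of the kind described above. It is not the model from [1]; the window length, hidden size, optimizer settings, and the choice of target (a large value of |zeta| on the following interval) are illustrative assumptions.

import torch
import torch.nn as nn

class ZetaLSTM(nn.Module):
    """Maps a window of |zeta(1/2 + it)| samples to one predicted value."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden)
        return self.head(out[:, -1])   # one prediction per sequence

model = ZetaLSTM()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Placeholder batch: in practice x would hold sampled |zeta| values on a short
# interval, and y the large value (e.g. the maximum) to be predicted.
x = torch.randn(32, 100, 1)
y = torch.randn(32, 1)
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()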

[1] "Deep Learning for Riemann Zeta Function: Large Values and Karatsuba problem", DOI: 10.13140/RG.2.2.25733.83681.

[LSTM] Sepp Hochreiter, Jürgen Schmidhuber, "Long Short-Term Memory", Neural Computation, 9, 1735-1780, (1997).

Machine Learning for predicting Riemann zeta zeros

The zeros of the Riemann zeta function have been studied from many different angles, because the Riemann Hypothesis is a deep and fundamental problem which has defied solution for more than a century. On this page we explore how machine learning can be applied to possibly shed light on the patterns underlying the distribution of the zeros (see also Advanced Modeling and Optimization, Volume 14, Number 3, pp 717-728, 2012). This angle of attack has not received the attention it deserves, given the importance of the problem. The application of machine learning to the Riemann zeta function is discussed in Good to Bad Gram Point Ratio For Riemann Zeta Function.

Why do we think that machine learning is a good tool to apply to the study of the zeros? Because machine learning excels at extracting patterns when one has a huge amount of data, and with tens of trillions of zeros of the Riemann zeta function having been computed, one has exactly the kind of data set which enables the application of machine learning techniques. Another encouraging indication that machine learning will be useful is a study of the entropy of the Riemann zeta zeros, which shows that the pattern of zeros exhibits a good amount of order.

Neural Networks: feature selection

How can we apply machine learning techniques to the problem? This is an open question, and one can think of different possible applications. Here we will concentrate on one potential use. A great deal of effort has gone into finding the zeros of the Riemann zeta function. At large heights finding the zeros is a non-trivial task, requiring much computer time (and some knowledge of special techniques to find the roots). It would be useful to have some guide to the location of the roots. A preliminary study of features which could help locate the roots has been done in Neural Network Prediction of Riemann Zeta Zeros, which also has many useful definitions of terminology related to the Riemann zeta function. We will continue the exploration of this problem. Probably the biggest open task is to identify a feature set which is good at predicting zero differences, while not requiring excessive computing time.

As explained[2] in Neural Network Prediction of Riemann Zeta Zeros, we believe that the behavior of the zeta function at Gram points is a good starting point to extract features for use in prediction (see also "Conjectures"). This reference found the following feature set to be useful for predicting the locations of the zeta zeros and for finding close pairs of zeros: the values of the Riemann-Siegel Z-function at the 21 Gram points surrounding the Gram interval in which we wish to predict the location of the zeros, the first ten terms in the Riemann-Siegel series, and nine terms with the cos function replaced by the respective sin function, for a pair of consecutive Gram points. The first sin term is not useful because it is always zero at Gram points. Thus, the size of the input feature set is 40. The first cos term is +1 or -1, and we retained it as a way of including information about the odd or even character of the Gram point.

Actually, it may be useful to separate the odd and even Gram points, and build separate models for them. Given the symmetry properties of the zeta values, we will probably find that the two models are closely related to each other. Furthermore, one can use the symmetry properties of the zeta values at Gram points to impose restrictions on the models to be fitted. Also, it would be useful to see whether one can use a partial sum of the terms in the Riemann-Siegel Z-function, instead of the complete sum. If that restriction does not decrease the predictive power of the model too much, then it would give us a feature set which is significantly more economical to evaluate.
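To make the feature set concrete, here is a minimal sketch of the feature computation using the mpmath library (mpmath.grampoint, mpmath.siegelz, mpmath.siegeltheta). The working precision and the choice to evaluate the series terms at the left Gram point of the pair are our assumptions, not specifics from [2].

from mpmath import mp, grampoint, siegelz, siegeltheta, cos, sin, log, sqrt

mp.dps = 25  # working precision; would need tuning for large heights

def series_terms(t, n_terms=10):
    """First n_terms cos terms of the Riemann-Siegel main sum at t, plus the
    matching sin terms for n = 2..n_terms (the n = 1 sin term is sin(theta(t)),
    which vanishes at Gram points, so it is omitted)."""
    theta = siegeltheta(t)
    cos_terms = [cos(theta - t*log(n)) / sqrt(n) for n in range(1, n_terms + 1)]
    sin_terms = [sin(theta - t*log(n)) / sqrt(n) for n in range(2, n_terms + 1)]
    return cos_terms, sin_terms

def features(m, window=10):
    """40 features for the Gram interval [g_m, g_{m+1}]: Z at the 21 Gram
    points g_{m-10}, ..., g_{m+10}, plus 10 cos and 9 sin series terms."""
    z_values = [siegelz(grampoint(m + k)) for k in range(-window, window + 1)]
    cos_terms, sin_terms = series_terms(grampoint(m))
    return [float(v) for v in z_values + cos_terms + sin_terms]

Note that the first entry of cos_terms is cos(theta(g_m)) = (-1)^m, encoding the odd or even character of the Gram point, as discussed above.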
[2] O. Shanker, "Neural Network prediction of Riemann zeta zeros", Advanced Modeling and Optimization, 14, 717-728, (2012), tinyurl.com/4scve3nj.

Machine Learning: detection of rare events

Let us now specify more concretely the problem we are trying to solve, so that we can come up with a good approach to the solution. We want to identify a set of features (for definiteness, say the feature set specified above), and use the features to predict zeros which are located very close to each other. We have available to us extensive compilations of the locations of the zeros. The features themselves are not available on the web, but we can calculate them. The main challenge is that the phenomenon we are trying to detect, namely zeros which are located close to each other, is very rare.

The approach we propose[3] is to estimate the expected probability density for the feature set, and to use the estimated probability density to identify values of the observed features which are "anomalous", in that the estimated probability for the occurrence of the observed values is below a set threshold. The hope is that the places where the anomalous feature values occur are the places where the Riemann zeta zeros deviate from the normal pattern of behavior. The deviations could be of different types (small zero differences, large zero differences, large absolute values for the "S" function, etc.). If the classifier is sufficiently selective, then we will have a smaller number of roots that we have to evaluate to check for interesting behavior.

Density estimation is a hard problem. We will use an approach described in T. Hastie, R. Tibshirani, and J. Friedman, "The Elements of Statistical Learning", Springer, 2001 (Section 14.2.4), which transforms the density estimation problem into one of supervised function approximation. To learn the input density, we generate two sets of data. The first set (class 0) contains the actual (sampled) inputs from the training data, whereas the second set (class 1) consists of randomly generated feature values. That is, class 0 is a sample drawn from the actual density function, and class 1 is a sample drawn from the same input space with a uniform density function. The two data sets are then used to learn a logistic regression model; since the class-1 density is uniform, the model's log-odds output is, up to a constant, an approximate estimate of the log of the actual input density.
[3] O. Shanker, "Good to Bad Gram Point Ratio For Riemann Zeta Function", Experimental Mathematics, 30, 76-85, (2021), tinyurl.com/mwd5uwc5.

§ the feature set values occurring in the training data are assumed to be drawn from a probability density function g(x)

§ consider a uniform probability density function g0(x) over the same feature space

§ generate as many random examples from g0(x) as there are training examples

§ learn a logistic regression model for the log-odds f(x) = log(g(x)/g0(x)), as sketched in the code below
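Here is a minimal sketch of this procedure, assuming the actual 40-dimensional feature vectors are already collected in an array X_real; the bounding-box construction of the uniform sample and the use of scikit-learn's logistic regression are our implementation choices.

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_density_model(X_real, rng=np.random.default_rng(0)):
    """Class 0: actual feature vectors, drawn from g(x). Class 1: uniform
    samples from g0(x) over the bounding box of the actual data."""
    n, d = X_real.shape
    lo, hi = X_real.min(axis=0), X_real.max(axis=0)
    X_unif = rng.uniform(lo, hi, size=(n, d))
    X = np.vstack([X_real, X_unif])
    y = np.concatenate([np.zeros(n), np.ones(n)])   # 0 = actual, 1 = uniform
    return LogisticRegression(max_iter=1000).fit(X, y)

def log_odds(clf, X):
    # scikit-learn's decision_function gives the log-odds of class 1 (uniform),
    # so its negative approximates f(x) = log(g(x)/g0(x)) in our labeling.
    return -clf.decision_function(X)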

Once the model is built, all the actual inputs (class 0) are run through it, and the log-odds outputs f(x) (which, because g0(x) is uniform, are monotone in the estimated density of valid inputs) are sorted in ascending order. A threshold score is then determined, chosen so that a pre-determined fraction of the inputs is classified as outliers. The model and the threshold are then used to detect anomalous inputs among newly generated feature sets: all inputs for which the model predicts a value below the threshold are flagged for detailed analysis of the patterns of zeros. For all this to be useful, the cost of generating new feature sets must be lower than the cost of calculating all the zeros.
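Continuing the sketch above, the threshold and the flagging step might look as follows; the 1% outlier fraction is an illustrative choice, and X_new stands for feature vectors of Gram intervals not yet analyzed.

clf = fit_density_model(X_real)
f_real = log_odds(clf, X_real)
threshold = np.quantile(f_real, 0.01)   # flag the lowest-scoring 1%

f_new = log_odds(clf, X_new)
candidates = np.flatnonzero(f_new < threshold)   # intervals worth a detailed zero search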
