The equation x = dlog(x)

We give six algorithms to solve the equation:

(1) x = dlog(x), 1 < x < p-1.

which can also be written:

(2) x = r**x (mod p), 1 < x < p-1.

where in both cases I always choose r to be the least positive primitive root of the prime p. With a definite choice of r (such as this) there may not be any solutions. In [1] it is proved that for any prime p>3 there exists some primitive root r for which there is a solution.

Example 1. p=65537 (the Fermat number F4), r=3. There are 3 solutions, x = 6504, 27376, 28185.

It is unusual to find three solutions. I make the (rather obvious) conjecture that the number of solutions is Poisson with mean 1, i.e. P(i) = 1/(e*i!), i = 0, 1, ... Here are a few values:

number   probability

  0        0.3679
  1        0.3679
  2        0.1839
  3        0.0613
  4        0.0153
  5        0.0031
 ...       ......
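Under this conjecture the table is trivial to reproduce; a two-line Python check (the function name is mine):

```python
from math import exp, factorial

# Conjectured distribution: the number of solutions is Poisson with
# mean 1, so P(i) = 1/(e * i!).
def poisson1(i):
    return exp(-1) / factorial(i)

for i in range(6):
    print(i, round(poisson1(i), 4))
```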

(2) gives an easy way to find the solutions in time O(p), which we will call Algorithm S (slow). Put f(x) = r**x (mod p). We run through the values of f(x), x = 1, ... , (p-1)/2.

i. If x = f(x), then x is a solution of (2).

ii. If x + f(x) = (p+1)/2, then X = x + (p-1)/2 is a solution of (2), since then X = p - f(x), while f(X) = f(x) * r**((p-1)/2) = -f(x) = p - f(x) (mod p), because r is a primitive root.
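Algorithm S is short enough to give in full. Here is a Python sketch (the function name is mine; it reproduces Example 1):

```python
def algorithm_S(p, r):
    """Find all solutions of x = r**x (mod p), 1 < x < p-1, in time O(p).
    r must be a primitive root of p, so that r**((p-1)/2) = -1 (mod p)."""
    half = (p - 1) // 2
    sols = []
    f = r % p                      # f = f(x) = r**x (mod p), starting at x = 1
    for x in range(1, half + 1):
        if x == f and x > 1:       # case i: x = f(x)
            sols.append(x)
        if x + f == (p + 1) // 2:  # case ii: X = x + (p-1)/2 solves (2)
            sols.append(x + half)
        f = (f * r) % p
    return sorted(sols)
```

For Example 1, algorithm_S(65537, 3) returns the three solutions 6504, 27376, 28185.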

Two ways to improve on Algorithm S are described later.

Example 2. I solved (2) for the first 100 primes above 10**n, for n = 3, ..., 9:

number   n=3   n=4   n=5   n=6   n=7   n=8   n=9   average   Poisson

  0       41    38    42    36    39    39    36    38.71     36.79
  1       40    34    36    44    36    28    41    37.00     36.79
  2       14    21    16    15    17    27    13    17.57     18.39
  3        4     5     4     4     8     4     9     5.43      6.13
  4        1     2     2     1     0     1     1     1.14      1.53
  5        0     0     0     0     0     1     0     0.14      0.31

This note will mainly describe three faster algorithms, which run in time O(p**(2/3)) and find any existing solution with any required probability a < 1. The best one is:

Algorithm Q

We choose a small integer K and numbers A and B around p**(1/3) (A < B), and T around p**(2/3), such that (K+1)p < AT.

The idea of the algorithm is to hope that one of the numbers x + kp, 0 <= k <= K, is of the form:

(3) x + kp = ut, A <= u <= B, t <= T.

If so, we will find the solution. We will consider two versions of the algorithm, Q1 and Q2, according as u is or is not restricted to prime values in the interval.

From (3) we have:

(4) dlog(x) = dlog(ut) = dlog(u) + dlog(t) - m*(p-1), where m = 0 or 1.

Combining (1), (3) and (4) we get

(5) ut = kp + dlog(u) + dlog(t) - m*(p-1), where m = 0 or 1.

Choose an integer q around p**(1/3). The idea of algorithms Q1 and Q2 is to find solutions of (5) taken as a congruence (mod q), and then to check each solution to see if it satisfies (5) as written.

The outline of the algorithms is simple:

1. Compute a table of dlogs (mod p) up to T. (It is also possible to use a smaller table, and compute further dlogs as required).

2. for j from 0 to q-1

3. search the integers t<=T, t = j (mod q) for solutions of (5)

Step 3 breaks down into:

3.1 Run through the numbers t <= T, t = j (mod q). Partition these integers into q sets S[i], 0 <= i <= q-1, according to the value of dlog(t) (mod q).

3.2 Run through the ranges of m, k, and u. (In Algorithm Q1, u is restricted to prime values.) For each set of values, we can compute from (5) the required value of dlog(t) (mod q). From the appropriate set S[i] we can find the pairs (t, dlog(t)), if any, which meet this condition. For each such pair, put x = (dlog(u) + dlog(t)) mod (p-1), and test whether ut = x + kp. If so we have found a solution. In general, we have to try m = 0 and 1 separately. But if q divides p-1, it suffices to try m = 0.
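To make the outline concrete, here is a toy Python sketch of Algorithm Q2 (u not restricted to primes). The parameters in the test values below are my own choices for the small prime p = 65537 of Example 1, satisfying (K+1)p < AT and B <= T; the dlog table in step 1 is built by brute force, which is only sensible for such a toy p.

```python
def algorithm_Q2(p, r, K, A, B, T, q):
    """Sketch of Algorithm Q2 for x = r**x (mod p): find solutions x with
    x + kp = ut for some 0 <= k <= K, A <= u <= B, t <= T.
    Assumes B <= T, so dlog(u) comes from the same table."""
    # Step 1: table of dlogs up to T.  Brute force over all powers of r
    # here -- fine for a toy p, but a large p needs smarter methods.
    dlog = {}
    val = 1
    for e in range(p - 1):
        if val <= T:
            dlog[val] = e
        val = (val * r) % p
    sols = set()
    for j in range(q):                        # step 2: t = j (mod q)
        # Step 3.1: partition these t by dlog(t) (mod q).
        S = {}
        for t in range(j if j else q, T + 1, q):
            S.setdefault(dlog[t] % q, []).append(t)
        # Step 3.2: run through u, k, m; (5) mod q forces dlog(t) (mod q).
        for u in range(A, B + 1):
            du = dlog[u]
            for k in range(K + 1):
                for m in (0, 1):
                    need = (u * j - k * p - du + m * (p - 1)) % q
                    for t in S.get(need, []):
                        x = (du + dlog[t]) % (p - 1)
                        if u * t == x + k * p:   # check (5) as written
                            sols.add(x)
    return sorted(sols)
```

With parameters K=3, A=40, B=100, T=8000, q=41, this finds all three solutions of Example 1 (for instance 6504 + p = 61*1181).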

Note that these algorithms can, and often do, find the same solution several times.

Example 3. I tried algorithms Q1 and Q2 on the first 20 primes above 10**10, with the parameters: K=3, A=4000, B=8000, T=10**7, q = 3001.

The total number of solutions was 23. Algorithm Q2 found 20 of these solutions, but Q1 only 4.

For the prime p = 10**10 + 33, for which r = 5, there are three solutions. Algorithm Q1 found none of them, but Algorithm Q2 found all three, one of them being found twice. The details were as follows:

      x          u         t      k

29001,15299    6385   35,86549    2
56642,32779    7831    7,23309    0
56642,32779    6794   23,05598    1
85719,26329    5026   76,74478    3
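Reading the commas as grouping the digits in fives, the rows of this table can be checked directly in Python:

```python
p, r = 10**10 + 33, 5

# (x, u, t, k) rows from the table above, with the digit-grouping
# commas removed.
rows = [
    (2900115299, 6385, 3586549, 2),
    (5664232779, 7831,  723309, 0),
    (5664232779, 6794, 2305598, 1),
    (8571926329, 5026, 7674478, 3),
]

for x, u, t, k in rows:
    assert u * t == x + k * p    # the decomposition (3)
    assert pow(r, x, p) == x     # x really solves (2)
```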

To estimate the probability of success of Algorithm Q, we consider the following problem:

Let P1(A,B) and P2(A,B) be the probabilities that a random large integer has, respectively, a prime factor or an integer factor in a given interval [A,B].

Example 4. P1(4,6) = 1/5, P2(4,6) = 1/4 + 1/5 + 1/6 - 1/20 - 1/12 - 1/30 + 1/60 = 7/15 (by the inclusion/exclusion principle).
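This inclusion/exclusion calculation is easy to mechanise with Python's fractions module; a sketch (exponential in the interval length, so only for tiny intervals like this one; the function name is mine):

```python
from fractions import Fraction
from math import lcm
from itertools import combinations

def P2_exact(A, B):
    """Probability that a random large integer is divisible by some
    integer in [A, B], by inclusion/exclusion over all subsets."""
    ds = range(A, B + 1)
    total = Fraction(0)
    for size in range(1, len(ds) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(ds, size):
            total += sign * Fraction(1, lcm(*subset))
    return total
```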

It is well-known [2, Theorem 429] that the product of (1 - 1/p) over primes p<=x is asymptotic to C/ln x, where C = 1/(e**gamma) = 0.56146.

So P1(A,B) is asymptotic to 1 - ln(A)/ln(B). For example, P1(A,A**2) tends to 1/2.

I think there is no such simple result for P2, but it is easy to estimate both by a Monte Carlo calculation.

Example 3 (contd). With A=4000, B=8000, we calculate P1(A,B) = 0.0771. I took 10**6 random integers in the range [1,10**10], and obtained P1(A,B) = 0.0769 and P2(A,B)=0.340.

In this case, we have P2/P1 = 4.42, while the number of integers in the interval divided by the number of primes is 4001/457 = 8.75.

We can now estimate the probabilities that Q1 and Q2 will find any given solution. With K=3, we have 4 chances of finding any given solution, so our probability of success is

1 - (1 - z)**4, where z is the probability of success for each value of k. Putting z = P1(A,B) and P2(A,B) gives the results 0.274 (Q1) and 0.810 (Q2).
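These estimates can be reproduced along the following lines; this sketch (sample size and function names mine) uses a much smaller sample than the 10**6 above, so its estimates are rougher:

```python
import random
from math import log

def primes_in(A, B):
    """Primes in [A, B] by a simple sieve up to B."""
    sieve = bytearray([1]) * (B + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(B**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(sieve[i*i::i]))
    return [n for n in range(A, B + 1) if sieve[n]]

def estimate(A, B, hi, samples, seed=1):
    """Monte Carlo estimates of P1(A,B) and P2(A,B) for integers in [1, hi]."""
    random.seed(seed)
    ps = primes_in(A, B)
    hit1 = hit2 = 0
    for _ in range(samples):
        n = random.randint(1, hi)
        if any(n % q == 0 for q in ps):
            hit1 += 1
        if any(n % d == 0 for d in range(A, B + 1)):
            hit2 += 1
    return hit1 / samples, hit2 / samples

p1_est, p2_est = estimate(4000, 8000, 10**10, 2000)
p1_asym = 1 - log(4000) / log(8000)   # the asymptotic value, about 0.0771
succ_Q1 = 1 - (1 - 0.0769) ** 4       # success probability, about 0.274
succ_Q2 = 1 - (1 - 0.340) ** 4        # success probability, about 0.810
```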

I first ran a version of Algorithm Q in 1991 on a computer with very small storage as follows:

Algorithm E (Early version of Q)

The idea of this is to take the integer t to be a product t = vw of integers v and w in specified ranges. We use dlogs (mod q) to obtain the solutions of vw = j (mod q), v <= w. We discard pairs (v,w) which repeat a value vw already obtained. For the others we compute dlog(vw) = (dlog(v) + dlog(w)) mod (p-1) and continue as in Algorithm Q. The algorithm is less reliable, since only some of the integers t = j (mod q) will be obtained.

Here is another small-storage algorithm, which however has the same defect as Algorithm E.

Algorithm U

Going back to (5), another idea is to fix u, k and m, and try to find t<=T such that the right hand side is divisible by u. Apparently this can only be done efficiently by again taking t as a product t = vw of smaller integers. We replace (5) by:

(6) uvw = kp + dlog(u) + dlog(v) + dlog(w) - m*(p-1), 0 <= m <= 2.

Then the method is as follows. Choose u, A <= u <=B. Divide the values of w into u sets S[i] according to the value of dlog(w) (mod u). Now run through the values of v, k and m and use (6) to find the required values of dlog(w) (mod u). Then the stored information gives the possible values of w (if any) to make the right hand side of (6) divisible by u. In each case, put x = ( dlog(u) + dlog(v) + dlog(w) ) mod (p-1), and test if uvw = x + kp.
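Here is a toy Python sketch of Algorithm U, with ranges (mine) sized for the small prime p = 65537 of Example 1; as with Algorithm E, it can miss solutions, namely those x for which no x + kp factors as uvw with all three factors in range.

```python
def algorithm_U(p, r, K, A, B, V, W):
    """Sketch of Algorithm U: look for x + kp = uvw via (6),
    with A <= u <= B, 2 <= v <= V, 2 <= w <= W."""
    # Brute-force dlog table up to max(B, V, W) -- toy-sized p only.
    top = max(B, V, W)
    dlog = {}
    val = 1
    for e in range(p - 1):
        if val <= top:
            dlog[val] = e
        val = (val * r) % p
    sols = set()
    for u in range(A, B + 1):
        du = dlog[u]
        # Divide the values of w into u sets by dlog(w) (mod u).
        S = {}
        for w in range(2, W + 1):
            S.setdefault(dlog[w] % u, []).append(w)
        for v in range(2, V + 1):
            dv = dlog[v]
            for k in range(K + 1):
                for m in (0, 1, 2):
                    # (6) mod u: uvw = 0 (mod u) forces dlog(w) (mod u).
                    need = (-(k * p + du + dv) + m * (p - 1)) % u
                    for w in S.get(need, []):
                        x = (du + dv + dlog[w]) % (p - 1)
                        if u * v * w == x + k * p:  # check (6) as written
                            sols.add(x)
    return sorted(sols)
```

With K=3, A=40, B=100, V=W=90, this finds only the solution 27376 of Example 1 (e.g. 27376 = 58*8*59) and misses the other two, illustrating the defect.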

We now describe two ways to solve (2) with certainty in time O(p/log(p)). Since these are more complicated than the obvious algorithm S, the advantage is at best small and might be non-existent (in my programs in Python they do gain a factor of around 3). Both these ways work best when r=2, and we consider only this case. As before put

(7) f(x) = 2**x (mod p),

so that we are trying to find x, 1 < x < (p-1)/2 such that

(8) x = f(x), or

(9) x + f(x) = (p+1)/2

In both algorithms we use the current values of x and f(x) to obtain a small integer k, 1 <= k <= K (with K in the range 10 to 20) such that we can make a jump from x to x+k (by multiplying by 2**k (mod p)), knowing that we will not miss a solution of (8) or (9).

Algorithm M (modular).

We choose an integer q, such that we have enough storage for a 2-dimensional array of q*q small integers. We use the numbers i = x (mod q) and j = f(x) (mod q) to access the (i,j) element of the array, obtaining the required jump k. The array is computed once and for all at the start of the algorithm.

The idea is as follows. To get from f(x) to f(x+1) we multiply by 2 and possibly subtract p. Thus given the value of f(y) (mod q), there are two possible values for f(y+1) (mod q). Thus there are two values of y (mod q) which might lead to a solution of (8) with x = y+1, and two more values which might lead to a solution of (9). Any other value cannot lead to a solution and so a jump of at least two is then possible. Continuing this argument leads to the precomputed array.
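Here is a Python sketch of Algorithm M (parameter choices mine). The jump table uses a superset of the residues reachable in d steps: f(x+d) = 2**d * f(x) - c*p for some 0 <= c < 2**d, so the jumps may be conservative but a skipped position can never satisfy (8) or (9).

```python
def algorithm_M(p, q=37, K=10):
    """Find all x with x = 2**x (mod p), 1 < x < p-1, assuming 2 is a
    primitive root of p.  Scans x = 1..(p-1)/2 with precomputed jumps."""
    half = (p - 1) // 2
    target = (p + 1) // 2
    # jump[i][j] = safe jump when x = i and f(x) = j (mod q)
    jump = [[K] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            for d in range(1, K):
                pd = 2 ** d
                # possible residues of f(x+d) (mod q): 2**d * j - c*p;
                # once 2**d >= q, every residue class of c occurs
                cs = range(pd) if pd < q else range(q)
                poss = {(pd * j - c * p) % q for c in cs}
                if (i + d) % q in poss or (target - i - d) % q in poss:
                    jump[i][j] = d      # a solution at x+d is possible
                    break
    sols = []
    x, f = 1, 2                         # f = 2**x (mod p)
    while x <= half:
        if x == f and x > 1:            # equation (8)
            sols.append(x)
        if x + f == target:             # equation (9): X = x + (p-1)/2
            sols.append(x + half)
        k = jump[x % q][f % q]
        x += k
        f = (f * pow(2, k, p)) % p
    return sorted(sols)
```

On any small prime with 2 as primitive root this agrees with the slow scan; for example algorithm_M(13) returns [10].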

Algorithm I (Intervals).

We divide the interval [0,p] into N equal parts, which we will call big intervals (e.g. N=100). Each big interval is divided into 2**(K-1) small intervals, making Q = N * 2**(K-1) small intervals in all. We use a 1-dimensional array of length Q. We can find which small interval contains f(x) by computing i = int( f(x)*Q/p ), where "int" means integer part. The i-th element of the array gives the required jump k. As x moves across a particular big interval, say [L,U], to have any chance of a solution of (8) we must have f(x) also in that interval. If so, f(x-1) must lie in one of two half-intervals:

[L/2, U/2] or [L/2 +/- p/2, U/2 +/- p/2],

where the sign is chosen to get an interval within [0,p]. When this occurs the corresponding array elements must have value 1. Considering equation (9) in the same way leads to more elements of value 1. We continue this process for K-1 stages. Those array elements not then decided take the value K. The array must be recalculated each time x moves into a new big interval - but only half of the big intervals need to be considered.

References:

1. M. Levin, C. Pomerance and K. Soundararajan, Fixed points for discrete logarithms, ANTS IX Proceedings, LNCS 6197 (2010), 6-15.

2. G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers (4th edition).

(Last modified 5th July 2021)