Optimization of Miller-Rabin algorithm with Sieve of Eratosthenes accelerator

An optimum number of trial-division primes as a function of Miller-Rabin generated prime number size.

Introduction.

Although probabilistic, Miller-Rabin is still computationally expansive algorithm, especially when it is used to generate large prime numbers. One way to unload some of computational complexity is “pre-screen” generated prime number candidates with a simple trial division algorithm. Sieve of Eratosthenes may be used to generate an array of small primes (please see http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes for more details on Sieve of Eratosthenes algorithm). This array may be cached in memory and reused to “pre-screen” prime candidates whenever prime number generating function is invoked. If none of the array elements can divide prime candidate without a remainder then Miller-Rabin algorithm should be launched to complete prime test. Complexity of trial division method depends on a size of prime candidate and a size of the array containing small primes. Complexity of Miller-Rabin method in general case depends on a size of prime candidate and number of “witnesses” of prime candidate compositeness (please see http://en.wikipedia.org/wiki/Miller–Rabin_primality_test for the detailed description of Miller-Rabin prime test method). Since number of “witnesses” is usually a fixed constant in practical implementations, Miller-Rabin complexity will only depend on a size of prime candidate. So combination of trial division method and Miller-Rabin method may be optimized by varying length of the array of small primes used in trial division method to minimize complexity and hence, time required to run combined trial-division-Miller-Rabin prime test implementation. This idea came to me when I was working on my own long exponent implementation of RSA algorithm (please see http://en.wikipedia.org/wiki/RSA for the detailed description of RSA algorithm). My custom implementation as a variety of other implementations offers selection of public/private key size. This size determines the size of prime number candidates verified by trial-division and Miller-Rabin prime tests. Reading GNU RSA developer message boards I found that GNU implementation at that time used fixed size array of 300 small primes to run through trial-division prime test stage. Surprising to me this number was fixed and didn’t depend on generated prime number size. For my own RSA implementation, based on long exponents, I decided to find a functional dependency of optimum number of small primes used in trial division from the size of generated prime number candidate.

Optimization Problem.

From prime number theorem number of primes not exceeding number x can be easily estimated as

pi(x) = x/ln(x)

(see http://en.wikipedia.org/wiki/Prime_number_theorem). If the largest small prime number used for trial division is not exceeding number x then trial division algorithm will cover approximately x^2/2^k of all possible divisors of a large prime number candidate which can be recorded using k bits. Total computational costs incurred by division by prime numbers ranging from 2 to x will be equal to Cd*x/ln(x), where Cd is a cost of one division operation. Now if we assume that Miller-Rabin algorithm will cover rest of possible dividers, then we can write down the following utility function:

u(x)=(1–x^2/2^k)*Cmr+x^2/2^k*x/ln(x)*Cd, (1)

where Cmr – computational costs incurred by Miller-Rabin algorithm to verify that k-bit long prime number candidate is actually a prime number. We can find number x value which will minimize value of our utility function u(x) (1) and hence costs which are incurred by prime number verification. To do this we simply need to solve the following equation which involves finding of first derivative of (1):

du(x)/dx=0 (2.1)

(2.1) should be solved for x and then we need to verify that second derivative is positive, i.e.

d^2u(x)/dx^2>0 (2.2)

When both inequality (2.2) and equation (2.1) hold, then argument x gives minimum value of u(x) (1).

Solution To Optimization Problem.

First derivative (2.1) of the utility function (1) is equal to:

du(x)/dx=-2*x/2^k*Cmr+3*x^2/ln(x)/2^k*Cd-x^3/ln^2(x)/x/2^k*Cd=(-2*x*Cmr+x^2/ln(x)*Cd*(3-1/ln(x)))/2^k

equating du(x)/dx to 0 and further simplifying this we have:

Cd/Cmr/2*x-ln(x)/(3-1/ln(x))=0 (3)

Carefully measuring Miller-Rabin and large number division computation complexities I found that

Cmr=42585663 and Cd=163759,

where Cmr and Cd are expressed in number of elementary arithmetic operations necessary to perform one Miller-Rabin test or one large number division of 4096 bit long number by one small primes. Cmr and Cd values were determined for my custom implementation by overloading +,-,*,/ operators with versions that used static counters to measure usage of each operator. This turned out to be very simple but extremely useful instrumentation approach.

Now solving (3) numerically yields the following result:

x=1304.232275 and pi=x/ln(x)=181.8158413 (4)

This just means that running trial division on ~182 small primes before launching Miller-Rabin test will minimize combined complexity of trial-division and Miller-Rabin methods on a condition that we can show that (2.2) holds for (4). To do this let’s find second derivative (2.2) and compute its value with x argument equal to (4):

d^2u(x)/dx^2=(-2*Cmr+Cd*x/ln(x)*(6-5/ln(x)+2/ln^2(x)))/2^k>0 (5)

Further simplifying (5) we have:

-2*Cmr/Cd+ x/ln(x)*(6-5/ln(x)+2/ln^2(x))>0 (6)

Substituting (4) into (6) yields:

-2*Cmr/Cd+ x/ln(x)*(6-5/ln(x)+2/ln^2(x))=-520.1016494+971.2319979=451.1303485>0 (7)

Since (7) is positive, (4) minimizes (1).

In real world several supported key sizes are known in advance. Hence developers may use this technique described above to pre-calculate optimal number of small primes that corresponds to each supported key length and create static lookup table that references them. On initialization Sieve of Eratosthenes algorithm should generate number of small primes for trial division part of combined prime test which corresponds to the lookup table entry for a given key length.