mega_storage

A Cubic Runtime algorithm for the last 1024 digits of Mega

For reasons that I will elaborate on, mega is actually a relatively inexpensive number to compute. The exponential runtime/space I encountered initially was due to trying either to store the entire order (not necessary), or to compute any position in the order by going through all previous positions (also not necessary). It turns out that modular exponentiation works such that there is a simple procedure that will give an arbitrary position in time log(order). In 2015 I revisited the problem of computing the digits of mega and other numbers defined in Steinhaus-Moser notation as part of a research project in computer science at my college. The result of this research was an exponential improvement in the number of digits that could be computed. To understand how this is possible, we first have to understand how modular exponentiation works.

For our purposes, when we want to carry out modular exponentiation for the last "n" digits, we want to evaluate:

(b^p)%(10^n)

Where the "%" is the modulo operator. Let m = 10^n. There are some useful properties of modular exponentiation. Namely:

Property I

(b^p)%m = ((b%m)^p)%m

and

Property II

b^(x+y)%m = [ (b^x)%m * (b^y)%m ]%m

The first property means we only ever need the last "n" digits of the base, and the second property means that we needn't compute every (b^i)%m for i < p in order to compute (b^p)%m.

Next we have the property that for fixed b and n, (b^p)%(10^n) must always become a periodic function of p. That is to say, there always exists some k such that:

b^(p+k)%m = (b^p)%m

for every p greater than or equal to some constant c.
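This is easy to see numerically. The following minimal C++ sketch simply iterates the congruences for a small, illustrative choice of b and n (not values from the actual program) and reports the first repeated congruence and the resulting period:

#include <cstdint>
#include <cstdio>
#include <vector>

// Iterate a_p = (b^p) % m and report the first repeated congruence.
// The constants b and n here are illustrative choices only.
int main() {
    const uint64_t b = 256, n = 4, m = 10000;   // m = 10^n
    std::vector<int64_t> first_seen(m, -1);     // exponent at which each congruence first appeared
    uint64_t a = b % m;                         // a_1
    for (int64_t p = 1;; ++p) {
        if (first_seen[a] != -1) {
            std::printf("a_%lld recurs at p = %lld; period k = %lld\n",
                        (long long)first_seen[a], (long long)p,
                        (long long)(p - first_seen[a]));
            return 0;
        }
        first_seen[a] = p;
        a = (a * (b % m)) % m;                  // next congruence via Property II
    }
}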

To understand this, recall that the integers modulo "m" always have exactly "m" possible congruences (remainders), namely 0 through m-1. Therefore %(10^n) has exactly 10^n congruences, which is strictly finite. Next we have the following iterative relation:

b^(p+1)%m = [ (b^p)%m * b%m ]%m

via Property II. In other words, to find the next congruence in the sequence a_p = (b^p)%m, we need only take the prior congruence, multiply it by the base modulo m, and then take modulo m of the result. Note that the end result depends solely on the previous result, the base, and the modulus. We assume here that the "base" and "modulus" are fixed for a particular case. Therefore, every congruence in the sequence must be followed by a predetermined next congruence. Combining these two facts guarantees that such sequences are always periodic. The length of their period I refer to as their "order". To understand why this leads to periodic behavior, consider a finite set of congruences:

{0,1,2,3, ... ,m-1}

Generate a sequence by first choosing a member of this set for a_1. Choose some positive integer constant called b. Compute every successive member as follows:

a_(n+1) = (a_n * b)%m

Because we are taking modulo m, we are guaranteed that every successive member must be in the initial set of congruences. What we are now left with is an iterative process, where each congruence has a predetermined path to the next congruence. Regardless of the base, the initial value, or the modulus, the following must hold.

Assume there exist i,j such that a_i = a_j

If this holds for any pair i,j, then the sequence must be periodic. Let's assume j is the larger of the pair. In this case:

a_(j+1) = (a_j * b)%m = (a_i * b)%m = a_(i+1)

this implies:

a_(j+2) = (a_(j+1) * b)%m = (a_(i+1) * b)%m = a_(i+2)

by induction we must conclude that:

a_(j+n) = a_(i+n)

for all n. Therefore the sequence is periodic, since we can set k = j-i and, for any n >= i, we obtain the following:

a_n = a_(n+j-i)

Therefore, in order for such a sequence to be non-periodic, there must not exist any i,j such that a_i = a_j.

We then consider the sequence inductively.

a_1 must be in the congruence set. a_2 must also be in the congruence set, but

a_2 != a_1. Since a_2 is not a_1, there are only m-1 possible congruences it could be.

Furthermore a_3 != {a_1,a_2}. There are only m-2 possible congruences it could be.

In general

a_(k+1) != {a_1,a_2, ... ,a_k}

and therefore has only m-k possible congruences. Since there are only a finite number of possible congruences, no matter how these congruences are chosen, eventually we must exhaust the set. The latest this can possibly happen is at a_(m+1), since a_(m+1) must not equal any of {a_1,a_2,a_3,...,a_m}, but this set must already represent every possible congruence. Therefore there must exist an i, for 1 <= i <= m, such that a_(m+1) = a_i. Thus, no matter what happens, all such sequences are periodic. It should be noted that the exponentiation in this case is not actually essential. Any recursively defined sequence where the successive congruences are predetermined will fall into this same periodic behavior. At minimum we just need an integer-to-integer function, f(n), and then we define:

a_1 is a member of {0,1,2,3,...,m-1}

and

a_(n+1) = f(a_n)%m

This actually becomes important when we look at computing the last digits of numbers even larger and more recursively complex than mega.
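To make this concrete, here is a short C++ sketch of the general case: an arbitrary integer-to-integer function (the f below is a made-up example) iterated modulo m. A table records when each congruence first appears, and the first repeat it finds is exactly the forced cycle argued above:

#include <cstdint>
#include <cstdio>
#include <vector>

// An arbitrary illustrative successor function; any f(n) would do.
uint64_t f(uint64_t x) { return x * x + 1; }

int main() {
    const uint64_t m = 1000;
    std::vector<int64_t> seen(m, -1);   // first index at which each congruence appeared
    uint64_t a = 2;                     // a_1: any member of {0,1,...,m-1}
    for (int64_t i = 1;; ++i) {
        if (seen[a] != -1) {
            std::printf("a_%lld = a_%lld, period %lld\n",
                        (long long)i, (long long)seen[a], (long long)(i - seen[a]));
            return 0;
        }
        seen[a] = i;
        a = f(a) % m;                   // a_(n+1) = f(a_n) % m
    }
}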

While this property is handy for proving that all such sequences are periodic, it doesn't put many restrictions on the order of the sequence. Still, we can draw a few useful conclusions. Firstly, there always exists a pair i,j such that a_i = a_j. If this is the case, then we know that the congruences in the sequence repeat at most every j-i members. However, it's important to note that j-i will not be identical for every pair i,j we could choose. To understand why, let's apply the condition that the congruences repeat every j-i members to our equality and see what happens. Firstly note that:

a_i = a_(i+(j-i)) = a_j

Therefore it must also be true that:

a_j = a_(j+(j-i)) = a_(2j-i)

a_(2j-i) = a_(2j-i+(j-i)) = a_(3j-2i)

...

and in general:

a_j = a_(j+q(j-i))

So we might choose:

a_i = a_(j+q(j-i))

in which case the distance between these members is:

j+q(j-i)-i = (q+1)(j-i) = (q+1)k

This shows that if we have a period of k, we also have a period of any multiple of k. This is implicit in the definition of a periodic function since:

a_n = a_(n+k)

implies...

a_(n+k) = a_(n+2k)

So just because we've found an i,j such that a_i = a_j, it doesn't mean that j-i is the order. However, j-i must be an integer multiple of the period, so we can say that the order must be a factor of j-i. Is there a maximum possible size to j-i? Yes. Since a_(m+1) must be equal to some a_i with 1<=i<=m, it follows that the maximum possible difference is:

m+1-i = m+1-1 = m

where i = 1

The minimum possible difference would be:

m+1 - m = 1

Nothing about this guarantees that the order has to be any particular value in this range, at least not for arbitrary recursively defined sequences. This presents something of a problem for the computation of the terminal digits of mega and other recursively defined numbers. The basic idea is that for any sequence b^p%m there must be an order, for the above reasons. Therefore, we may generally find b^p%m by simply reducing the power, p, modulo the order:

(b^p)%m = b^(p%order)%m

The problem is that, unless the order is a factor of m, we might need to know the full value of p in order to compute p%order. On the other hand, if the order is a factor of m, then we don't have a problem: knowing p%m, that is, the last n digits of p, is all we need to determine p%order.
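Here is a small C++ illustration of this point, using a hypothetical easy case: b = 3 and m = 100, where the order of 3 mod 100 happens to be 20, a factor of m. The exponentiation itself is done by standard square-and-multiply, not the digit-by-digit scheme described below:

#include <cstdint>
#include <cstdio>

// Standard square-and-multiply modular exponentiation.
uint64_t powmod(uint64_t b, uint64_t p, uint64_t m) {
    uint64_t r = 1 % m;
    b %= m;
    while (p) {
        if (p & 1) r = (r * b) % m;
        b = (b * b) % m;
        p >>= 1;
    }
    return r;
}

int main() {
    const uint64_t b = 3, m = 100;   // order of 3 mod 100 is 20, which divides 100
    uint64_t p = 123456789;          // stand-in for a very large exponent
    // Because the order divides m, the last two digits of p are all we need:
    std::printf("%llu = %llu\n",
                (unsigned long long)powmod(b, p, m),
                (unsigned long long)powmod(b, p % m, m));
    return 0;
}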

The iteration for mega then goes something like this. First we generate a congruence sequence:

a_i = b^i % 10^n

Conveniently we have that:

a_1 = b^1 % 10^n = b % 10^n

So the first member of the sequence is always just the base value.

We know that a_(10^n+1) must equal some previous member. Ideally we want a_(10^n+1) to equal a_1. If this is the case then we have returned to the base in order 10^n, or some factor thereof, and this means that whenever we attempt to compute b^p % 10^n for very large "p" we only need to know the last "n" digits of "p", since this is equivalent to b^(p%order)%10^n. Even if the base occurs multiple times in this sequence, this is okay as long as it also occurs at a_(10^n+1). Now if we were to compute these congruences in order, the runtime would be 10^n, which is exponential. This means that we would only ever be able to check the orders for a handful of digits. Given a processing speed in the GHz range and a couple of hours of computation, this would still only amount to around 12 digits, for example. Thankfully this is not necessary. Due to Properties I and II, we can instead break up "p" into its decimal digits:

example:

p = 894750987345

b^894750987345 % m

=

( b^800000000000%m * b^90000000000%m * ... * b^300%m * b^40%m * b^5%m )%m

Furthermore we can quickly compute b^(10^i)%m for any power of ten by simply iterating:

b^100%m = (b^10%m)^10%m

b^1000%m = (b^100%m)^10%m

b^10000%m = (b^1000%m)^10%m

etc.
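In code, this scheme looks something like the following C++ sketch. It uses built-in 64-bit integers in place of the program's digit arrays (so m must stay small here), and takes the exponent as a decimal string to mimic a value far too large to store directly:

#include <cstdint>
#include <cstdio>
#include <string>

// Compute b^p % m where p is given as a decimal string, walking p from its
// ones digit upward. t holds b^(10^place) % m at each step.
uint64_t powmod_decimal(uint64_t b, const std::string& p, uint64_t m) {
    uint64_t result = 1 % m;
    uint64_t t = b % m;                              // b^(10^0) % m, for the ones digit
    for (int i = (int)p.size() - 1; i >= 0; --i) {   // from the ones digit upward
        int d = p[i] - '0';
        for (int j = 0; j < d; ++j)                  // multiply in (b^(10^place))^d
            result = (result * t) % m;
        uint64_t t10 = 1;                            // t <- t^10, the next power b^(10^(place+1))
        for (int j = 0; j < 10; ++j) t10 = (t10 * t) % m;
        t = t10;
    }
    return result;
}

int main() {
    // A small hypothetical check: the last six digits of 7^843.
    std::printf("%llu\n", (unsigned long long)powmod_decimal(7, "843", 1000000));
    return 0;
}

Each decimal digit of p costs at most 19 multiplications here (up to 9 for the digit itself plus 10 for the tenth power), which is where the 10log(p) multiplication count used below comes from.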

Because of this we needn't take 10^n steps to check the sequence. We can do it in 10log(10^n) steps, by computing the following sequence:

b^1%m

b^2%m

b^3%m

...

b^9%m

b^10%m

b^20%m

b^30%m

b^40%m

...

b^90%m

b^100%m

b^200%m

b^300%m

...

etc.

b^(10^n)%m

10log(10^n) simplifies to 10n steps, but we may ignore the constant factor. So we can simply say the time is proportional to "n", the number of digits. This potentially opens the doorway to computing millions or even billions of digits, but for reasons that come up later this turns out not to be practical for the entire algorithm.

If for whatever reason a_1 != a_(10^n+1), then the program simply terminates. So it cannot get stuck in an infinite loop searching for the base value, as my previous program did.

If the sequence checks out, it then proceeds to compute the values 256[3]_i in order.

If 256[3]_i = b^p%10^n for some p, 1<=p<=10^n, we call p the "mode" of the congruence. Not only must the congruences of 256[3]_i be stored, but at every stage we must also store their respective modes. So by definition we have:

256[3]_i = b^(mode_i)%10^n

Computing the next value in the sequence then becomes fairly routine. We have:

256[3]_(i+1) %10^n = (256[3]_i)^(256[3]_i)%10^n

= (b^mode_i)^(256[3]_i)%10^n

= b^( mode_i * 256[3]_i ) %10^n

= b^( mode_i * 256[3]_i % 10^n )%10^n
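Putting these pieces together, the whole iteration can be sketched in a few lines of C++. This toy version uses built-in 64-bit integers, so n is limited to a handful of digits (the real program substitutes digit arrays), and it assumes, as argued above, that exponents may be reduced mod 10^n. The step count below is a placeholder; the actual program iterates 256[3]_i until it reaches mega:

#include <cstdint>
#include <cstdio>

// Standard square-and-multiply modular exponentiation.
uint64_t powmod(uint64_t b, uint64_t p, uint64_t m) {
    uint64_t r = 1 % m; b %= m;
    while (p) { if (p & 1) r = (r * b) % m; b = (b * b) % m; p >>= 1; }
    return r;
}

int main() {
    const uint64_t n = 6, m = 1000000;   // the last n = 6 digits
    const uint64_t b = 256;
    uint64_t mode = 1;                   // 256[3]_1 = 256 = b^1, so mode_1 = 1
    uint64_t value = b % m;              // 256[3]_1 % 10^n
    const int steps = 255;               // placeholder iteration count
    for (int i = 0; i < steps; ++i) {
        mode = (mode * value) % m;       // mode_(i+1) = mode_i * 256[3]_i % 10^n
        value = powmod(b, mode, m);      // 256[3]_(i+1) % 10^n = b^mode_(i+1) % 10^n
    }
    std::printf("%0*llu\n", (int)n, (unsigned long long)value);
    return 0;
}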

In theory each one of these calculations can also be done in log(order) time, and so we might naively conclude that the entire runtime is proportional to "n". However, for technical reasons it's not. Firstly, the values computed in this program are too large to be stored even in the largest available integer data type, so a special arbitrary precision data type has to be written instead. This is done with arrays.

An array is simply a sequence of same-sized memory addresses in a computer. For the purposes of my program, each of these addresses is 8 bytes (64 bits) in size. The addresses are indexed in the source code by the non-negative integers. For example, array[0] would be the value stored at "address 0", array[1] would be the value stored at "address 1", and array[i] would be the value stored at "address i". Because of read and write issues, the total number of addresses in an array must be specified upon allocation.

How can we use arrays to store very large numbers? Very simply. Let's say we have a large integer, N. Since N is an integer it must have a one's place digit, a ten's place digit, a hundred's place digit and so on. In this case we store each digit of N in a separate address in the array. So we let:

array[0] = N%10

array[1] = (N/10)%10

array[2] = (N/100)%10

and in general

array[i] = (N/10^i)%10

It may seem that we are storing the digits backwards, but in fact this order is very convenient, as the position corresponds to the power of 10 that the place value represents. For example, array[3] would represent the 10^3, or 1000s, place value. There is no strict limit on the size of arrays, so they can be as large as we want provided they can be allocated in the available memory. For my laptop the maximum available RAM is about 3GB. With 8-byte addresses this works out to a maximum of 402,653,184 addresses. So raw data limits us only to about 400 million digits. But we never end up needing that many, because the runtime quickly becomes infeasible well below this limit.
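As a tiny C++ illustration of this storage scheme (the value of N here is just an example):

#include <cstdint>
#include <cstdio>

int main() {
    const int n = 8;                 // fixed digit count
    uint64_t N = 65536;              // example value to store
    uint64_t digits[n];
    for (int i = 0; i < n; ++i) {    // digits[i] = (N / 10^i) % 10
        digits[i] = N % 10;
        N /= 10;
    }
    for (int i = n - 1; i >= 0; --i) // print most significant digit first
        std::printf("%llu", (unsigned long long)digits[i]);
    std::printf("\n");
    return 0;
}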

To understand why, we need to consider how the calculations are actually carried out. Both mode_i and 256[3]_i are stored as arrays of digits. To multiply them together, a special function is written to "multiply" the two arrays. It does this by taking every possible pair of digits, selecting one digit from each number, multiplying them together, and then accumulating the result in the correct place value of the resulting array. Since every number is fixed at "n" digits (determined at the outset of the program by the user), there are n^2 multiplications to be carried out. So every time we perform a multiplication between two arrays it actually takes n^2 time. How many multiplications must be carried out? Well, after we compute the new mode, mode_(i+1), by computing mode_i * 256[3]_i % 10^n, we next have to compute b^mode_(i+1)%10^n. This will take 10log(mode_(i+1)) multiplications, and since mode_(i+1) < 10^n (because it's a congruence modulo 10^n), it always takes fewer than 10log(10^n) multiplications. But since each multiplication takes n^2 steps we have:

10log(10^n) * n^2 = 10 * n * n^2 = 10 * n^3 ~ n^3

Thus the algorithm has a cubic runtime. What about space? The great thing is that we only need to store a single mode and congruence at a time. So space is minimally 2n. In practice a few additional arrays are used to handle data transfer and so on, but this is always just a small constant factor, less than say 10. So space is proportional to "n", and we can say the program uses only linear space. This is a vast improvement over the previous implementation.
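For reference, here is a stripped-down C++ sketch of that n^2 array multiplication: every pair of digits is multiplied and accumulated at its place value, and only the low n digits (the product mod 10^n) are kept:

#include <cstdint>
#include <cstdio>

const int n = 8;   // fixed digit count, one digit per 8-byte address

// Multiply two n-digit arrays (least significant digit first) mod 10^n.
void mul_arrays(const uint64_t a[], const uint64_t b[], uint64_t out[]) {
    uint64_t acc[n] = {0};
    for (int i = 0; i < n; ++i)
        for (int j = 0; i + j < n; ++j)   // place values >= n vanish mod 10^n
            acc[i + j] += a[i] * b[j];
    uint64_t carry = 0;
    for (int i = 0; i < n; ++i) {         // resolve carries back into single digits
        acc[i] += carry;
        out[i] = acc[i] % 10;
        carry = acc[i] / 10;
    }
}

int main() {
    uint64_t a[n] = {6, 5, 2, 0, 0, 0, 0, 0};   // 256, least significant digit first
    uint64_t out[n];
    mul_arrays(a, a, out);                      // 256 * 256 = 65536
    for (int i = n - 1; i >= 0; --i) std::printf("%llu", (unsigned long long)out[i]);
    std::printf("\n");
    return 0;
}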

This new implementation was codified in a new C++ program, mega91.cpp. Interestingly, mega91.cpp was an upgraded version of mega89.cpp, which was a C++ implementation of the same program I had written on my TI-89 calculator (hence the name). The results were impressive.

16, 32, and 64 digits were computed in a matter of seconds. Next, 128 digits were computed in a mere 90 seconds. Then I computed 256 digits in approximately 9 minutes. I estimated that every time I doubled the number of digits, the runtime was multiplied by a factor of 8. To compute the last 512 digits took 1 hour and 12 minutes. Lastly, I computed the last 1024 digits in 8 hours and 12 minutes. Because the next calculation would take approximately 64 hours I didn't immediately run the program. But about a month later I decided to go for it. I set the computer up to compute the last 2048 digits. It took 77 hours and 18 minutes, computed over the course of 4 days spanning the Independence Day weekend. Here is a table of the runtimes:

digits | runtime
16     | seconds
32     | seconds
64     | seconds
128    | 90 seconds
256    | ~9 minutes
512    | 1 hr 12 min
1024   | 8 hr 12 min
2048   | 77 hr 18 min

The last digits were then stored in a series of screenshots:

The last 256 digits of mega are listed under "MODULUS(256):"

Here are the last 512 digits of mega

The last 1024 digits of mega

... and finally the last 2048 digits of mega