Prime numbers play a vital role in the Fundamental Theorem of Arithmetic and Cryptography. Thus, it is important to be able to calculate the amount ("count") of the Prime Numbers which exist in an arbitrary range of natural numbers. The easiest way (in my opinion) to do this is to calculate the "number of primes less/equal to x", and use different values of x for the desired range. For example, "How many prime numbers exist between 10 and 20?" ... Well, calculate the number of primes less/equal to 20, and then subtract the number of primes less than 10. I hope we all agree this formula will give the desired answer!
The problem thus becomes, how to calculate the number of primes less/equal to x... So far, mathematics can't give us a firm/precise answer with a simple formula. This means if one wants an answer to the "counting" problem, then (so far) only two options are available: brute-force computer examination of all numbers in a range, or approximation.
π(N) = count of prime numbers less than or equal to N
A few exact values include: - π(2) = 1 (only 2)
- π(3) = 2 (two prime numbers: 2 and 3)
- π(4) = 2 (same as above)
- π(5) = 3 (three prime numbers: 2, 3, and 5)
- π(6) = 3 (same as above)
- π(7) = 4 (four prime numbers: 2, 3, 5, and 7)
- π(8) = 4 (same as above)
- π(9) = 4 (same as above)
- π(10) = 4 (same as above)
- π(11) = 5 (five prime numbers: 2, 3, 5, 7, and 11)
- π(12) = 5 (same as above)
- π(13) = 6 (six prime numbers: 2, 3, 5, 7, 11, and 13)
- π(14) = 6 (same as above)
- π(15) = 6 (same as above)
- π(16) = 6 (same as above)
It is fairly easy for a computer (or modestly skilled mathematician) to determine if a single number (X) is prime... just test weather it is exactly divisible by any lower natural number (except 1). If the test number (X)
is exactly divisible by some other (lesser) number, then (by definition) it is not prime. Hopefully you see (and agree) that exhaustively testing all lower numbers (all N less than X) is reliable, but better suited to computers than humans.Even if we decide to use computers (eliminate the human error/slowness factors), there is still the problem of scale. For example, if we wish to test if 16 is prime, we need to test if it is exactly divisible by any lower primes: is it divisible by 2, 3, 5, 7, 11, or 13? This is a minor nuisance for a human, and no problem for a computer -- simply do 6 different division problems, if any have a remainder of zero, the number in question (16 in this example) is NOT prime.
Also remember, we want to count the total number of primes in a range. So we have to repeat the same process just described for every number in a range. For example, if we wanted to know the number of primes less than 20, we would have to ask our "brute-force" computer to test 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. That is 18 tests each which requires testing division by all lower numbers. In case this isn't obvious, it means our poor computer (if it uses "brute force") must do 1 calculation (to answer primes less than 2), 2 calculations (to answer primes less than 3), 3 calculations (to answer primes less than 4) .... etc. To summarize, the brute force method will require N * (N - 1) calculations, which is roughly N
^{2} calculations.The above may be too abstract, so let me give you a real example... if you wanted your computer to calculate how many numbers between 1 and 1000 were prime, the "brute-force" method would require approximately 999
^{2} = 998,001 tests of division. Modern computers (which can do millions of math ops per second) wouldn't complain too much about this example... but it is a problem of SCALE. What if you needed to do the same/similar calculation (998,001 tests) several thousand times (for example, to make a video)... is your computer becoming "slushy" yet? And worse, what if you wanted (or in the case of cryptography) needed a prime number in the range of (for example) 9 digits... this would require (approximately) 999,999,999 * 999,999,999 "brute force" calculations... or about 1,000,000,000,000,000,000 (1*10^{18}) division tests. If each test required a microsecond , then all tests would require 10^{18} operations / 10^{6} operations/second = 10^{12} seconds = 2.778 x 10^{9} hours = 116 x 10^{6} days = 316 887 years! That's right, over 300 thousand years with "brute force"Going a bit off-topic, a 9-digit (decimal) number is roughly equivalent to a 24-digit (binary) number. Nobody uses 24-bit encryption because it can be cracked with a moderate amount of effort. Modern standards include 128-bit and 256-bit encryption. This is equivalent to 37 and 76 decimal digits, respectively. So, as a further example, a "simple" 37-digit [decimal] encryption would require roughly 10
^{37} * 10^{37} = 10^{74} calculations for a preliminary prime number analysis (the begins of cracking the encryption). This would take many many years with modern technology.... even if you used clusters of computers to magnify the processing speed!!
The alternative is to sacrifice precision for speed. With a well-tested formula, one can calculate the approximate number of primes very quickly (1 microsecond or less with modern computers). One of the earliest (and most common) approximations is:
where ln(X) is the natural logarithm (base e), and the symbol "~=" means "asymptotically equal" (never perfect, but increasingly better as N becomes larger). From now on, I will call this PNC_1. A better approximation (in my opinion) is the following:
The above approximation has been proven to be the lower bound of the Prime Number Counting function (for moderately large N... 55 or more). From now on, I will call this PNC_2. Another, more complex, relation is:
where 'err' is a positive real number... Please note the above is true/proven fact of prime numbers! In other words, the relation does not give an exact value, but the possible values are strictly limited.... this is what I call a good (or at least acceptable) approximation! From now on, I will this PNC_3.
If you look at the above two equations (PNC_2 and PNC_3), then hopefully you will see that PNC_2 is a "stronger" limit than the left side of PNC_3. If you combine both of these facts, you get following (more precise) relation:
The only unknown above (besides the user-value N), is the value of 'err'. What should this value be? Many mathematicians have produced very complex equations to give a better approximation... and as far as I can tell, they all involve the the Riemann Zeta Function. This "zeta" function requires Complex Analysis in order to understand its true effects on prime numbers. (I'm not qualified to speak further... but I truly believe this complex analysis is not strictly needed for prime number approximation). Based on everything above, and with some "real world testing" I developed my own approximation, which is "very good" in a generic sense, and which can be actually quantified (in "true math" sense)....
The best relation I have discovered so far (you can test for yourself) is the following relation:
My "best relation" is simply the "proven" relation with 'err' equal to "2 / ln(N)"... and also selecting the "better" first term [N / ln(N)-1 is 'better' than N / ln(N) or N / ln(N)-1+err]. I tried using a slightly larger value than 2 in the 'err' [ for example 2.14159 / ln(N) ], and this gives better answers (if used in the following equation), but for very large N (like 10 ^{20} or more), the results got really bad. Using 2.1111, I never saw things go bad, but I can't calculate arbitrarily high values (with accuracy), so I suspect that might fail (also, 2.1111 is something I just made up). I feel pretty confident with 2.0 ...I realize the above may be "too abstract" to understand (not just you... but me!) Anyway, my simple/final approximation is this:
Although my formula is more complex than the classic approximation (N / ln(N), it converges MUCH faster , and doesn't involve anything fancy like integrals or the zeta function). The accuracy reaches 3 digits rather quick (around N = 100,000). Beyond that (bigger N), the accuracy improves slowly; for example, at N=10 ^{20}, you only get four or five digits of accuracy.Below is a table showing the actual values of π(N), the classic approximation, my approximation, and a better approximation called the Li(N) function on Wikipedia [it is equal to ∫N 2 dx/ln(x) ].
Because the above table might be confusing (or intimidating), let me give a verbose description of a single row (before we dive into the following analysis). So the 3rd row has: - N = 10
^{3}= 1000 = 1 thousand (how many prime numbers exist which are less/equal to one thousand?) - π(N) = 168 (there are 168 prime numbers less/equal to one thousand)
- classic approx. = 145 (the "classic" approximation says there are only 145 primes less/equal to 1000)
- classic error = -13.7% (the "classic" approximation is 13.7%
**less**than the true number of primes)
- my approx. = 174 (my approximation says here are 174 primes less/equal to 1000)
- my error = 3.57% (my approximation is 3.57%
**more**than the true number of primes) - Li approx. = 178 (the 'logarithmic integral' approximation says there are 178 primes less/equal to 1000)
- Li error = 5.95% (the 'logarithmic integral' approximation is 5.95%
**more**than the true number of primes)
Hopefully, now, you can understand each row in the table (as described above). If not, stop reading now! The following is an analysis of the full table (if you don't understand one row, you can't understand the summary of all rows). - The "classic" approximation always under-estimates the actual value (except for N < 100)
- The "classic" approximation is EXTREMELY slow to converge to the real value... there is still a 3% error when N = 10
^{15}(1 quadrillion)! - My approximation is the worst below N=100
- My approximation is the best between N=1,000 and N=5,000,000
- My approximation is better than the 'classic' for all N > 100 (but in particular, for N > 5,000,000)
- My approximation always under-estimates the actual value when N > 100,000
- The "logarithmic integral" function is the best for all N > 5,000,000
- The "logarithmic integral" function ALWAYS over-estimates the actual value
- The "logarithmic integral" converges the fastest (but is the most complex)
Because the "logarithmic integral" always over-estimates the real value, and "classic" approximation always under-estimates the real value (except for N<100), it seems there is room for improvement. My approximation begins with over-estimation, but soon (for N > 100,000) becomes an under-estimation. I'm thinking the error value (e) described above, which is used in my approximation, can (should?) be changed for better results. By limited trial-and-error, I came upon 2 / ln(N) as the error value (used for all my approximations)... but by reading more on Wikipedia, I'm thinking the error value might be better expressed as a function of arctan? [ Something like 1/π * arctan( π/ln(N) ) ? ] If I find a better approximation, which is simpler than the Logarithmic Integral (Li[x]) and simpler than the Zeta function, I will update this page.
Okay, I tried using an 'err' expression of e/2 * actan( π/2 / ln(N)
), and was really excited because it converged quicker than previous
approximation... however, after further investigation, I realized the
approximation was getting worse for large N (such as 10
^{22} or more). Next I tried slightly different 'err' expression of 4/3 * actan( π/2 / ln(N) ), this converged almost as fast and never 'failed' up to values of N = 10^{22}. However, I can't guarantee it will work (be a good estimate) at much larger N (like 10^{25}).Although better than previous approximation (see above), it is a bit more complicated. That, combined with my non-gaurantee just described, it probably good reason NOT to use my new and 'better' approximation. I'll publish the results, in case you are curious...
This new approximation is based upon "Mobius Inversion" (where the arctan originates).... I don't know enough advanced math to explain it, so check out Wikipedia if you are curious! ANYWAY, below is yet another table about "prime number counting".... except it now uses my "improved" formula... ,
I won't attempt to describe the layout of each row in this table (see comments for previous table if you are curious). However I will point out a few facts:
- My NEW approximation converges faster than my old/original approximation
- My NEW approximation is better than the fancy Li(N) function, when N < 20,000,000 (approximate)
- My NEW approximation is less than the real/true value, when N < 500,000 (approximate).
- The "Li(N) approximation" is the best when (approximately) N > 20,000,000
If you're curious, the highest value I tested was N=10 © H2Obsession, 2016, 2017 The final two tables of 'Prime Number Counting' are based on data from Wikipedia (http://en.wikipedia.org/wiki/Prime-counting_function, retrieved 11 July 2016) |

Miscellaneous >