OpenStax-CNX module: m12764

The Prime Number Theorem (PNT) ∗

F. Michel

This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 1.0 †

Abstract

The Prime Number Theorem is of key interest to number theorists owing to the importance of Riemann's work on the subject. But it can also be viewed as a fitting function that approximates the distribution of the prime numbers. We provide such a "pedestrian" viewpoint and development using elementary mathematics.

The famous mathematician Bernhard Riemann did some novel and exciting work in 1859 on the old question of how many primes there are below a given number $n$, a count usually written $\pi(n)$. Asking for this function is equivalent to asking, "what is the value of the k-th prime for any given k?" The latter is actually the more specific question, since an arbitrary $n$ will in general fall between primes. Indeed, the notation is most curious because $\pi$ refers to the number of primes up to $n$, not to any prime per se, while $n$ actually refers to the values of the primes (since $\pi$ changes by unity every time $n$ passes through a prime value). Thus if $n$ is a prime, it is the $\pi$-th prime, pretty much the inverse of defining a function $p(n)$ where $p$ is the value of the n-th prime. We will encounter this inversion as a practical matter later.

It is a testament to Riemann's genius that he derived closed expressions that give the exact number of primes. For example, the 4th prime is 7, so if you choose n = 8, 9, or 10, Riemann's formula will give the same answer (4) for each of these values of n. For this reason, the notation seems reversed, but actually makes sense.

The PNT as a theorem asserts that the following interpolation formulas, which attempt to fit smoothly onto the trends in the distribution of the primes, are asymptotically exact.
An early estimate was that $\pi(n)$ was approximately

$$\frac{n}{\ln(n)}$$

This estimate systematically underestimates the number of primes (it is known to fall below $\pi(n)$ for every $n \geq 17$). As a theorem, it states that in the limit of $n$ going to infinity, the ratio of $\pi(n)$ to $n/\ln(n)$ is unity. The theorem is of particular interest to mathematicians because of its connection to the zeta function: it was proved in 1896 using properties of the zeros of the zeta function, and the sharpest conjectured bounds on its error follow from an as-yet unproven conjecture by Riemann on the distribution of those zeros, a complicated topic which is not required for our simple analysis here.

An improved approximation is $\mathrm{Li}(n)$, where Li, the log integral function, is defined to be

$$\mathrm{Li}(n) = \int_2^n \frac{dt}{\ln(t)}$$

This estimate is substantially better and systematically overestimates the number of primes (again until one gets to staggeringly large values), as the positive differences in Table 1 show. Notice that the first approximation results from considering the log term to be slowly varying and taking it out of the integral to get the (crude) estimate of $n/\ln(n)$. The integrand of the log integral function diverges at $t = 1$, but this has no relevance to its applications at large $n$.

Some large values (from Derbyshire, p. 116 [1]):

    n         π(n)                    Li(n) − π(n)
    10^8      5,761,455               754
    10^9      50,847,534              1,701
    10^10     455,052,511             3,104
    10^11     4,118,054,813           11,588
    10^12     37,607,912,018          38,263
    10^13     346,065,536,839         108,971
    10^14     3,204,941,750,802       314,890

Table 1

Anyone used to examining errors for trends would see that the steady increase in the difference Li − π, by about a factor of 3 per decade in $n$, suggests that there is a scaling correction that needs to be applied.

∗ Version 1.1: Apr 25, 2005 1:48 pm -0500
† http://creativecommons.org/licenses/by/1.0
http://cnx.org/content/m12764/1.1/

1 Calculating the primes

The method of choice to calculate a list of primes is to use the "sieve" of Eratosthenes. Here one starts with a list of the natural numbers. Then, starting with 2, one removes every second number after it (the even numbers larger than 2). The next number is 3, so now one removes every third number after it (half of these have already been removed by the 2).
Here it is easiest to simply replace the values with zero, rather than literally removing the numbers. Now the next (non-zero) number is 5, and we set every fifth number after it to zero. Then 7 and then 11. By the time we reach eleven, all of the non-zero numbers from 2 up to, but not including, 121 (i.e., 11 squared) are primes. The alternative of testing each successive number to see if a smaller prime divides it would be hopelessly inefficient. Here is a simple MATLAB program (easily translated into the program of your choice):

    %make-prime routine using sieve
    N = 100;                      %largest possible prime in list
    rpr = 1:N;                    %starting list = all numbers up to N
    nextp = 2;                    %next prime to sieve with (this is ALSO its location)
    while nextp*nextp <= N        %once nextp^2 > N, all remaining nonzeros are primes
        for k = 2:floor(N/nextp)  %starts at 2 to preserve the prime itself
            rpr(nextp*k) = 0;     %the sieve!
        end
        for n = nextp+1:N         %run up list until first nonzero value
            if rpr(n) ~= 0
                nextp = rpr(n);   %update nextp
                break             %stop looking further
            end
        end
    end
    primes = [];                  %start with empty vector (this name shadows MATLAB's built-in primes function)
    for n = 2:length(rpr)         %now list the prime values only, excluding "1"
        if rpr(n) ~= 0
            primes = [primes, rpr(n)];  %add to existing vector of primes
        end
    end

Here one simply chooses the value of N, and after executing the program the vector "primes" will contain the list 2, 3, 5, 7, 11, 13, ..., 97 (if N = 100). Note that the "sieve" itself never removes "1", and this number has all of the properties of a prime (it is not divisible by any other number besides 1 and itself). But it has no use in the unique factoring of natural (whole) numbers into primes, and so it is excluded. This program has been checked against the list given in Abramowitz and Stegun (the primes up to 99,991, which are the first 9,592 primes, or roughly the first 10,000).
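For readers without MATLAB, the same sieve can be sketched in Python. This is only an illustrative translation, not the author's program: the function name `sieve_primes` is hypothetical, and it marks composites with boolean flags instead of overwriting the values with zero.

```python
def sieve_primes(limit):
    """Return the list of primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_candidate[i] stays True while i has not been crossed out.
    is_candidate = [True] * (limit + 1)
    is_candidate[0] = is_candidate[1] = False  # 0 and 1 are excluded by convention
    p = 2
    while p * p <= limit:  # once p*p exceeds limit, every survivor is prime
        if is_candidate[p]:
            # Start at 2*p so that the prime itself is preserved.
            for multiple in range(2 * p, limit + 1, p):
                is_candidate[multiple] = False
        p += 1
    return [i for i, keep in enumerate(is_candidate) if keep]


if __name__ == "__main__":
    primes = sieve_primes(100)
    print(primes)       # the primes up to 100
    print(len(primes))  # pi(100), the number of primes up to 100
```

The length of the returned list is $\pi(n)$ for the chosen limit, so the same routine can be used to spot-check small entries of a prime-counting table.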
2 Derivation of a PNT

The sieve suggests a simple way of estimating the distribution of primes. After any given prime, there will be a density of non-zero numbers remaining. For example, after 2, half the remaining numbers will be non-zero. After the 3, one removes $\frac{1}{3}$ of the numbers, but $\frac{1}{2}$ of these are already gone, so we get $\frac{1}{2} - \frac{1}{6} = \frac{1}{3}$ left. Since there are an infinite number of numbers left, it is easier to think in terms of remaining blocks of numbers; after removing 2 and 3, we have blocks of 2 × 3 = 6, and in any arbitrary block of 6 consecutive numbers beyond the 3, there will be exactly 2 non-zero numbers. If we go to 5, the blocks become 2 × 3 × 5 = 30, and there will remain exactly 8 non-zero numbers in any block of 30 numbers beyond 5.

It is easy to verify that this density evolves quite systematically. After each successive prime $p$, the average density per block drops by exactly a factor of

$$\frac{p-1}{p}$$

This density then corresponds to a mean distance between primes, since the next prime is chosen from the first non-zero number in the following block, and we can guess this distance from the mean density. In particular, after each successive prime $p$, the density decreases by the factor $\frac{p-1}{p}$ and the mean distance correspondingly increases by the factor $\frac{p}{p-1}$. Let us call this mean distance after the k-th prime $\Delta_k$, while the k-th prime itself we would call $P_k$. It follows then that the next prime will be approximately

$$P_{k+1} = P_k + \Delta_k$$

but given this new prime, the mean step size increases to

$$\Delta_{k+1} = \Delta_k \frac{P_{k+1}}{P_{k+1} - 1}$$

But now we are all done. If we choose the first prime ($P_1$) and the distance to the next prime ($\Delta_1$), we can simply iterate these relationships indefinitely.
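The block counts quoted earlier in this section (2 survivors per block of 6, 8 per block of 30) and the factor $(p-1)/p$ can be checked by brute force. The following Python sketch is an illustration only; the helper names `survivors_per_block` and `predicted_density` are hypothetical.

```python
from fractions import Fraction
from math import gcd


def survivors_per_block(primes):
    """Count the numbers in one block of length prod(primes) that are
    not divisible by any of the given (distinct) primes."""
    block = 1
    for p in primes:
        block *= p
    # m survives the sieve exactly when it shares no factor with the block length.
    count = sum(1 for m in range(1, block + 1) if gcd(m, block) == 1)
    return count, block


def predicted_density(primes):
    """Product of (p - 1)/p over the given primes, as an exact fraction."""
    density = Fraction(1)
    for p in primes:
        density *= Fraction(p - 1, p)
    return density


if __name__ == "__main__":
    for sieved in ([2], [2, 3], [2, 3, 5], [2, 3, 5, 7]):
        count, block = survivors_per_block(sieved)
        print(sieved, count, block, predicted_density(sieved))
```

Because the pattern of survivors repeats with the block length, counting one block is enough; the counted fraction and the product of $(p-1)/p$ agree exactly for each prefix of the primes tried here.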
Indeed, we know these values:

$$P_1 = 2$$

and

$$\Delta_1 = 1$$

A MATLAB program would simply be

    %recursion on steps to generate prime number estimates
    N = 9500;       %index of the prime to estimate (number of iteration steps)
    pr(1) = 2;      %initial prime
    del(1) = 1;     %initial step
    for k = 1:N
        pr(k+1) = pr(k) + del(k);                   %next estimated prime
        del(k+1) = del(k)*pr(k+1)/(pr(k+1)-1);      %next mean step size
    end

We chose N = 9500 because this value is available in the tabulated lists; or, if you created the list of primes, you should get primes(9500) = 98,947. In contrast, the above program gives 102,014, which is too large by only about 3%. This result is rather astonishingly good given that the initial primes and steps between primes are anything but regular.

3 Obtaining the PNT Results

It is easy to rearrange the equations as difference equations and then write them as differential equations. First we write

$$P_{k+1} - P_k = \Delta_k$$

and

$$\Delta_{k+1} - \Delta_k = \Delta_k \frac{1}{P_{k+1} - 1}$$

For large $k$, we can approximate these as

$$\frac{d}{dk} P(k) = \Delta(k)$$

and

$$\frac{d}{dk} \Delta(k) = \frac{\Delta(k)}{P(k)}$$

One might worry about simplifying $P_{k+1} - 1$ to $P(k)$, but the "corrections" are tiny and unimportant, as we will show. The ratio of the two equations is

$$\frac{dP}{d\Delta} = P$$

which integrates to give

$$P = P_1 e^{\Delta - \Delta_1}$$

Solving this for $\Delta = \Delta_1 + \log\frac{P}{P_1}$ and substituting into $dk = dP/\Delta$ then gives

$$\int_1^N dk = \int_{P_1}^{P} \frac{dP}{\Delta_1 + \log\frac{P}{P_1}}$$

Notice that this is in fact the log integral function of the improved PNT. However, because of the curious choice of notation, the natural symbols for $\pi$ (here $N$) and $n$ (here $P$) are reversed.

Notice that the $\Delta_1$ in the denominator can be absorbed into the $\log\frac{P}{P_1}$, since $\Delta_1 + \log\frac{P}{P_1} = \log\frac{e^{\Delta_1} P}{P_1}$, and rescaling $P$ then moves these terms to the limits on the integral. Effectively, this simply rescales $\mathrm{Li}(n)$. Accordingly, we can choose a $\Delta_1$ such that the curve runs through the primes instead of being always off to one side. Such a rescaling makes perfect sense given that the PNT is just a fit to the primes and such a fit has to be global.
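To experiment with the starting values, the recursion can be wrapped in a function whose first prime and first step are parameters. The Python sketch below is an illustrative translation of the MATLAB loop, not the author's code; the names `prime_recursion`, `first_prime` and `first_step` are hypothetical, and the second starting step in the usage lines is an arbitrary illustrative value.

```python
def prime_recursion(count, first_prime=2.0, first_step=1.0):
    """Iterate P[k+1] = P[k] + D[k] and D[k+1] = D[k] * P[k+1] / (P[k+1] - 1).

    Returns a list of `count` estimates; element k-1 estimates the k-th prime.
    """
    estimates = [first_prime]
    step = first_step
    for _ in range(count - 1):
        nxt = estimates[-1] + step       # P_{k+1} = P_k + Delta_k
        step = step * nxt / (nxt - 1.0)  # Delta_{k+1} from the new estimate
        estimates.append(nxt)
    return estimates


if __name__ == "__main__":
    default_run = prime_recursion(9500)                  # Delta_1 = 1
    shifted_run = prime_recursion(9500, first_step=0.7)  # arbitrary smaller Delta_1
    print(default_run[-1], shifted_run[-1])  # compare with the tabulated 9500th prime, 98,947
```

Changing `first_step` shifts the whole estimated curve, which is the freedom in $\Delta_1$ discussed in the text.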
Thus the fit is not obliged either to pass exactly through the prime 2 or to have the initial step size be exactly 1. Somehow the PNT derivation of $\mathrm{Li}(n)$ has effectively slipped in an implicit assumption that $\Delta_1 = \log P_1$ (i.e., $\Delta_1 = \log 2 = 0.6931...$). In fact, choosing $\Delta_1 = 0.625$ provides a better fit that at large $k$ gives an estimate for the value of the primes that oscillates about the correct values, being off a few percent to either side. This oscillation shows that there is little point in trying to do "better" (e.g., trying to solve a more exact differential equation above; in fact we just iterated instead). A significant improvement could be gotten by letting $\Delta_1$ be a weak function of $k$. Notice that we have not proven the PNT; we have just derived the same asymptotic functions cited in the proofs.

One can create a number of interesting relationships using these approximate expressions and the PNT itself. For example, if we integrate the integrand of the log integral function between two (large) successive primes, the result should be just unity (i.e., one new prime). Since the $\log P_k$ denominator will hardly change, the integral itself will be approximately

$$1 = \frac{P_{k+1} - P_k}{\log P_k} \qquad (1)$$

which can be rewritten as the recursion relation

$$P_{k+1} = P_k + \log P_k \qquad (2)$$

which must be one of the simplest possible recursion relations for primes. If we start it with the obviously simplest choice $P_1 = 2$, we get Figure 1.

Figure 1: The first thousand or so primes, plotting $P_k$ against $k$. The uppermost (ragged) line shows the actual primes, the dashed line just below it is Li(k), and the solid line is the simple recursion relation $P_{k+1} = P_k + \log P_k$ starting with $P_1 = 2$.

Another step is to replace $P_{k+1} - P_k$ with $\Delta_k$, namely the mean distance between primes. Since iterating the step recursion makes $\Delta_k$ proportional to the product of the factors $P_j/(P_j - 1)$, this gives us

$$\lim_{k \to \infty} \log P_k \left(1 - \frac{1}{P_1}\right)\left(1 - \frac{1}{P_2}\right) \cdots \left(1 - \frac{1}{P_k}\right) = \mathrm{const}$$

and the result for the first million primes is plotted in Figure 2.
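The behaviour of this product can also be explored numerically without a plot. The Python sketch below is an illustration under stated assumptions: the helper names `primes_up_to` and `mertens_sequence` are hypothetical, and the cutoff of 100,000 is an arbitrary choice that is much smaller than the million primes used for the figure.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes <= limit."""
    if limit < 2:
        return []
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return [i for i, keep in enumerate(flags) if keep]


def mertens_sequence(primes):
    """Return log(P_k) * prod_{j<=k} (1 - 1/P_j) for k = 1..len(primes)."""
    values = []
    product = 1.0
    for p in primes:
        product *= 1.0 - 1.0 / p  # running product over the primes so far
        values.append(math.log(p) * product)
    return values


if __name__ == "__main__":
    seq = mertens_sequence(primes_up_to(100000))
    # Print the value at a few indices spread logarithmically along the list.
    for k in (1, 10, 100, 1000, len(seq)):
        print(k, seq[k - 1])
```

The first value is $\frac{1}{2}\log 2 \approx 0.347$, and the later values change more and more slowly as $k$ grows.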
The constant is actually $e^{-\gamma}$, where $\gamma$ is Euler's constant, 0.5772157..., so that the exponential is 0.56145948..., and we can see the convergence to this value. This result was derived in 1874 and is known as Mertens' theorem (but the proof was based on Riemann's (1859) paper and not on a causal idea of a "mean distance between primes"). The above product form is a favorite of number theorists because its inverse is a special case of the zeta function, namely the value of that function at unity, which happens to be an infinity that just cancels the infinity of $\log P_k$ as $k$ grows (in the sense of the above limit).

Figure 2: Plot of $\log P_k$ divided by $\Delta_k$. If we were to plot this function on an ordinary linear scale, it would look like a step function starting at 0.346 and almost immediately jumping to what looks like the asymptotic value, and then being constant out to a value of almost 100,000. So we have expanded the vertical scale to show the trend and plotted the horizontal scale as a log plot to show how the "fluctuations" die off with increasing size of the primes. Here a linear plot suppresses that trend because almost all of the points to the right of the origin would correspond to "large" primes.

In Mertens' theorem we have used the "natural" definition in which the density after the first prime is $\frac{1}{2}$ (equivalently $\Delta_1 = 2$, which is why the plot in Figure 2 starts at $\frac{1}{2}\log 2 \approx 0.346$), whereas when we used it in the recursion relation for primes we instead defined $\Delta_1 = 1$, since that was the actual spacing between the starting prime 2 and the next prime, 3. Alternatively, to agree with the log integral expression we would have had to define $\Delta_1 = \log 2$. In recursion relations there is often such a freedom of choice, as well as a "natural" choice.

References

[1] J. Derbyshire. Prime Obsession. Joseph Henry Press, 2003.