A STUDY OF THE ANATOMY OF THE INTEGERS VIA LARGE PRIME FACTORS AND AN APPLICATION TO NUMERICAL FACTORIZATION

By TODD MOLNAR

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA 2012

© 2012 Todd Molnar

ACKNOWLEDGMENTS

There are far too many people without whom this thesis could not have been written; however, I owe special thanks to my parents, William and Deborah Molnar, and my brothers, Bradley and Andrew Molnar. Without their support this thesis simply would not have been possible. I would also like to thank Dr. Peter Sin and Dr. Andrew Rosalsky for their many useful comments, and in particular I would like to thank my advisor, Dr. Krishnaswami Alladi, for his superb guidance and exceptional advice.

TABLE OF CONTENTS

ACKNOWLEDGMENTS ... 3
ABSTRACT ... 5
CHAPTER
1 INTRODUCTION AND HISTORY ... 7
2 THE DISTRIBUTION OF PRIMES AND PRIME FACTORS ... 35
  2.1 Notation and Preliminary Observations ... 35
  2.2 The Prime Number Theorem ... 44
  2.3 The Hardy-Ramanujan Theorem ... 53
  2.4 Remarks ... 66
3 ARITHMETIC FUNCTIONS INVOLVING THE LARGEST PRIME FACTOR ... 69
  3.1 The Alladi-Erdős Functions ... 69
  3.2 Ψ(x, y) ... 87
  3.3 Generalized Alladi-Duality ... 92
4 THE KNUTH-PARDO ALGORITHM ... 101
REFERENCES ... 109
BIOGRAPHICAL SKETCH ... 111

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

A STUDY OF THE ANATOMY OF THE INTEGERS VIA LARGE PRIME FACTORS AND AN APPLICATION TO NUMERICAL FACTORIZATION

By Todd Molnar

December 2012
Chair: Krishnaswami Alladi
Major: Mathematics

This thesis discusses some important aspects of the theory surrounding the anatomy of the integers (to borrow a term from de Koninck, Granville, and Luca [15]), by which I mean the theory surrounding the prime factorization of an integer and the properties of these prime factors. The anatomy of the integers is a dense topic which has attracted the attention of theoretical and applied mathematicians for many years, due largely to the fact that the questions involved are often difficult and can be approached from many different angles. Virtually every branch of mathematics has benefited from the study of the prime factorization of the integers, including (but certainly not limited to) number theory, combinatorics, algebra, and ergodic theory. This thesis concerns itself with both the theoretical and computational aspects of this study, and uses that theory to understand the algorithmic factorization technique introduced by Knuth and Pardo in 1976. To fully understand this algorithmic procedure, we will discuss a good deal of theory related to the distribution of prime numbers and the largest prime factor of an integer. The thesis is divided into four chapters. Chapter 1 is an introduction which gives a brief overview of the motivation and rich history surrounding these deep problems.
Chapter 2 sketches two different proofs of the Prime Number Theorem, which is an indispensable tool in addressing problems related to integer factorization, and contains several comments on (currently) unproven results concerning the distribution of primes (in particular, the Riemann Hypothesis). One of the proofs of the Prime Number Theorem included in this chapter appears to be novel, although it could undoubtedly be deduced by any individual with sufficient knowledge of analytic number theory. Following the work of Alladi, Erdős, Knuth, and Pardo, Chapter 3 develops the necessary theoretical results concerning the largest prime factors of an integer. To this end, we supply proofs of the average value of the Alladi-Erdős functions and the average value of the largest prime factor, a proof of a special case of Alladi's duality principle, and certain density estimates. The proofs presented here related to the Alladi-Erdős functions differ from the originals in that we use the theory of complex variables, whereas in their original paper Alladi and Erdős derive their results elementarily. This allows us to improve their estimates for bounded functions, as well as to show the connection of these estimates with the Riemann Hypothesis. The proof of Alladi's duality principle is also derived in an analytic fashion, although not in its most general form. This differs from Alladi's original treatment of the problem, which is entirely elementary; however, in using analytic techniques we must impose certain bounds on the arithmetic functions in question, whereas the elementary approach holds unconditionally. Further results on the largest prime factors of integers are also included, but these are due to Knuth and Pardo.
The final chapter, Chapter 4, formally introduces the Knuth-Pardo factorization algorithm and includes their proof that, for any given x ≥ 0, the probability that a random integer between 1 and N has k-th largest prime factor ≤ N^x approaches a limiting distribution; furthermore, we quote several results from the paper of Knuth and Pardo that will prove useful for studying their algorithm.

CHAPTER 1
INTRODUCTION AND HISTORY

This thesis concerns itself primarily with questions related to numerical factorization, that is, the study of the prime decomposition of integers, both from a theoretical angle and from a practical point of view, namely, the numerical factorization of integers into primes and prime powers and the running time of such a procedure. We take the basic factorization algorithm introduced by Knuth and Pardo as a prototypical example due to its rudimentary nature and intuitive appeal. A study of this algorithm provides theoretical insights into the multiplicative structure of the integers, and leads to a useful application of the algorithm itself. Following the work of Knuth and Pardo [14], their algorithm will be treated more formally in Chapter 4; however, a brief overview of the method will serve as an enlightening introduction to factorization techniques. For a given integer n, we test for a prime divisor of n among the numbers 1 < p ≤ n^{1/2}, and when we find a prime p such that p | n, we divide n by p and repeat the process for m = n/p < n until all prime divisors of n have been determined, which occurs when we find a prime p | m with p ≥ √m. It is to be noted that when the Knuth-Pardo algorithm terminates, it has already determined the largest prime factor of the integer being factored, and therefore it offers a way to compute the largest prime factor. There has been a considerable amount of work done on the distribution of the prime factors of integers, but this work is scattered in the literature.
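The procedure just sketched can be written down in a few lines. The following is an illustrative trial-division sketch in the spirit of the overview above, not the algorithm exactly as Knuth and Pardo present it; the function name and interface are chosen here for illustration only.

```python
def factor(n):
    """Factor n > 1 by repeated trial division: divide out the
    smallest divisor found up to sqrt(m) and repeat. When no trial
    divisor d with d*d <= m divides m, the remaining cofactor m is
    prime, and it is the largest prime factor of n."""
    factors = []
    m = n
    d = 2
    while d * d <= m:          # only trial divisors d <= sqrt(m) are needed
        while m % d == 0:      # divide out d as often as it divides m
            factors.append(d)
            m //= d
        d += 1
    if m > 1:                  # leftover cofactor: the largest prime factor
        factors.append(m)
    return factors
```

For example, factor(840) returns [2, 2, 2, 3, 5, 7], and the last entry of the returned list is always the largest prime factor, in agreement with the remark above that the algorithm has determined the largest prime factor by the time it terminates.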
Also, while there are many excellent books on factorization and analytic number theory, a systematic study of the prime factors of integers with the goal of understanding their anatomy is not included in these books. This thesis will focus on developing the necessary theoretical results on the anatomy of the integers, using both elementary and analytic number theory, in order to understand the Knuth-Pardo factorization method [14]. It is therefore the hope of the author that this thesis will be of independent interest to those who are already familiar with the published literature, as well as offering a useful introduction to those who are not. To fully appreciate the dense theory surrounding the largest prime factor of an integer, the distribution of the prime numbers, and the algorithmic techniques developed by Knuth and Pardo, we take a brief digression in this chapter to discuss the history of, and motivation for, the questions of this thesis. It is not clear when the classification of integers into the two distinct categories of prime and composite was first considered. Prime numbers are those integers n > 1 such that if n = ab then a = 1 or a = n; composite numbers are those for which there exist integers a and b with n = ab and a, b > 1. Note that it is a trivial fact that there exist infinitely many composite numbers; however, it is an important and highly non-trivial fact that there exist infinitely many prime numbers. Again, it is unclear who first proved this observation rigorously, as it was made so long ago that no records survive. However, the result was known to the Greek mathematician Euclid, who included its proof in Book IX of his textbook The Elements around 300 B.C., and therefore the result is typically attributed to Euclid by modern authors [5].
While earlier mathematicians may have been aware of this result, it was Euclid who first recognized the necessity of systematically developing the theory of numbers and geometry from basic axioms, and it is for this reason that Euclid is considered the father of both geometry and number theory. Euclid's proofs are also brilliant in their simplicity; for example, Euclid's proof that there exist infinitely many primes generalizes easily to prove that any infinite commutative ring with unity contains infinitely many prime ideals (for a proof of this fact see the introductory text [13] by I. Martin Isaacs). Hence, this basic result of Euclid anticipated the work of later mathematicians by almost two millennia. Euclid was also a great disseminator of mathematical knowledge, and his magnum opus, The Elements, is considered by many to be the single most influential textbook ever published in mathematics. This can hardly be overstated: The Elements is the second most published text in history (second only to the Bible), was used by Arab and European mathematicians throughout the ancient and medieval world, was one of the first mathematical texts to be set in type (by Venetian printers in 1482), and is still widely referenced in our modern age [5]. For example, the physicist Sir Isaac Newton learned geometry from Euclid's texts as late as 1667, and the philosophers Immanuel Kant (who died in 1804) and Arthur Schopenhauer (who died in 1860) vehemently defended Euclid's geometry in their writings. The re-examination of Euclid's work in the 18th and 19th centuries at the hands of N. I. Lobachevsky, J. Bolyai, C. F. Gauss, and B. Riemann led to the discovery of so-called non-Euclidean geometries. Furthermore, it should be noted that while over two thousand years have elapsed since its original publication, virtually every introductory number theory textbook still uses Euclid's proof to show the infinitude of the prime numbers.
Around about Euclid’s lifetime there lived another mathematician known as Eratosthenes of Cyrene, who in addition to being a mathematician was also a celebrated poet, athlete, geographer, and astronomer. Eratosthenes is perhaps best remembered for being the first person to accurately measured the circumference of the Earth. Sometime around 270 B.C. Eratosthenes developed what may be considered the first algorithm for determining the prime numbers less than a given integer. This method, known as the sieve of Eratosthenes, rests on the following simple observation (which we √ will use throughout this thesis): if n = ab and a, b > n then n = ab > n, which is a √ contradiction; hence, if n = ab and a ≥ b then a ≥ n ≥ b. Therefore, if n is composite, √ then it must always contain a prime factor less than or equal to n. Eratosthenes noted that if one lists all the numbers from 1 to N, then starting by circling the prime two, strike off every second (i.e. even) number beyond it, then go to circle the prime three and strike off every third number beyond it, and continue this process until you reach the √ N, then all the numbers which have been crossed off are composite. All the numbers which have not been crossed off are precisely the prime numbers less than or equal to N. This 9 process is successful precisely because we are eliminating those numbers less than or equal to N which are composite, i.e. since each of these composite numbers must have √ a prime factor p such that p ≤ N, all of these numbers have been eliminated. Thus, we have an effective method for determining prime numbers. It is interesting to compare the two methods utilized by Euclid and Eratosthenes to address the theory of prime numbers. 
Euclid’s proof is very useful from a theoretical standpoint, as it always guarantees (within certain bounds) the existence of a prime number, and in fact a direct application of his proof shows that the number of primes p ≤ x is greater than log log(x), for x > 2, and a slightly more sophisticated argument allows us to improve this result to log(x)/2 log(2) (see Propositions 2.4.1 and 2.4.2 in [12]). However, despite its theoretical utility and its ability to derive nontrivial lower bounds, Euclid’s method does not offer any hope of explicitly computing the prime numbers p ≤ x. By contrast, Eratosthenes method allows one to explicitly compute all primes p ≤ x; nevertheless, at present no sieving procedure (be it the sieve of Eratosthenes or its more modern generalizations such as the Legendre sieve or Brun’s sieve) can even prove the infinitude of prime numbers (which is easily proved using Euclid’s theorem). This highlights a major theme which will follow us for the remainder of the thesis: theoretical results, however powerful, are often useless when attempting to answer questions related to the explicit computation of the prime factors of integers. However, these questions may be answered quite easily by using computational methods (such as the Sieve of Eratosthenes or the Knuth-Pardo algorithm). Of course, computational methods rarely address questions related to the distribution of primes, fortunately, this is precisely where theoretical results show their mettle. Hence, if we truly wish to understand the prime factorization of an integer we must familiarize ourselves with the allied disciplines of computational and theoretical number theory. After the fall of the Ptolemaic Dynasty at the hands of the Roman Empire, the era of Greek mathematics in which Euclid and Eratosthenes lived came to an end. The 10 waning of the great Greek Empires brought with it a stagnation in the state of number theory, and particularly the theory of prime numbers. 
For many years number theory, algorithms, and algebra (relying upon personal research and translations of Greek mathematical works) were the exclusive domain of Eastern mathematicians living in the Arab world, and it is for this reason that the words "algebra" and "algorithm" have their roots in the Arabic and Farsi languages. It was not until after the Middle Ages that modern Western authors, such as P. Fermat and L. Euler, noted how fundamental the prime numbers were to basic questions of arithmetic, and began to reexamine their properties. It is often the case in number theory that one studies a given arithmetic function f(n) with the property that if n = ab, where a and b share no common prime factors (i.e., are relatively prime), then f(n) = f(ab) = f(a)f(b). Arithmetic functions with this property are called multiplicative functions, and many fundamental arithmetic functions are indeed multiplicative. Although Euclid essentially proved in antiquity that every integer can be written uniquely as a product of prime powers, this statement was not rigorously proved until the turn of the 19th century, by the mathematician C. F. Gauss [5], and it is of such importance to the theory of numbers that it is often referred to as the Fundamental Theorem of Arithmetic. Hence, given a multiplicative function f, and since distinct prime powers are relatively prime, it follows that if we can determine the values of f at the prime powers, then we can always determine the value of f(n) for composite n. This fact alone motivates us to delve deeper into the properties of the prime numbers. An interesting digression follows from studying primes not only in the ring Z but in a general commutative ring R. Of course, as should be expected, there are a few technicalities when dealing with R instead of Z. If I is a submodule of R (viewed as a module over itself) then I is called an ideal (although R itself satisfies this property, it is customary to consider only ideals I ≠ R, which are called proper ideals).
An element u ∈ R is called a unit if u⁻¹ ∈ R, i.e., u is a unit if it is invertible. A nonunit a ∈ R is said to be irreducible if whenever a = bc, with b, c ∈ R, either b or c is a unit (note that this definition of irreducibility was the original definition of being prime, until later developments in algebra separated the two ideas). A nonunit a ∈ R is said to be prime if the ideal generated by a, (a) = P, has the property that if bc ∈ P then either b ∈ P or c ∈ P. Of course, in Z the two concepts of irreducible elements and prime elements coincide; however, in general it is not necessary that irreducible elements be prime. Furthermore, Gauss's observation that integers factor uniquely into prime powers does not hold in general rings, as the example 6 = (2)(3) = (1 + √−5)(1 − √−5) demonstrates: in Z[√−5] the integer 6 may be factored into a product of two irreducible elements in two different ways. Nevertheless, the concept of unique factorization can be generalized to ideals in R by imposing the condition that such ideals factor uniquely as products of powers of prime ideals. This leads to the concept of Dedekind domains, which have a central place in the modern theory of commutative algebra (unfortunately it would take us too far afield to discuss these algebraic structures in their full generality). Essentially, it is our hope that the reader will recognize how important Gauss's theorem on unique factorization is. It is of further interest to note that while the Greek and Arab mathematicians of antiquity were well aware of the infinitude of the prime numbers, and were able to prove theorems concerning prime numbers, there is no recorded evidence that any mathematician from this era considered the question of how many prime numbers p ≤ x there are, or, essentially, how the primes are distributed throughout the integers.
Furthermore, there is no reference to the elementarily equivalent question of how large p_i (the i-th prime number) is. While one can easily speculate as to why these questions were not asked (namely, with hindsight, we now know these questions are very deep), it is entirely possible that the ancients' lack of more advanced mathematical techniques made it nearly impossible for them to address such questions. Also, it is only natural that the Greeks would have published only what they could prove; hence, while there is no evidence of anyone trying to prove results concerning the distribution of prime numbers in antiquity, this should not be taken as evidence that the question was of no interest to someone in the ancient world. The resolution of this question would occupy the attention of many mathematicians for several centuries, before ultimately being resolved at the end of the 19th century. Let π(x) denote the number of primes less than or equal to x. In the late 18th century, the French mathematician A. M. Legendre, using extensive numerical evidence, conjectured that π(x) would be about x/(log(x) − A) for some constant A. He further conjectured that A ≈ 1.08366...; this constant appears to have been chosen largely so that the estimate would fit the data available. At around the same time that Legendre made this conjecture, the prodigious German mathematician C. F. Gauss (also using extensive numerical data) speculated that π(x) would be about

li(x) = ∫_2^x dt/log(t) = x/log(x) + x/log²(x) + ...    (1–1)

Note that while

lim_{x→∞} li(x)(log(x) − A)/x = 1

for any value of A, the function li(x) is a far stronger approximation than what was conjectured by Legendre. Another reason why Gauss's estimate is superior to Legendre's is that, had the two individuals had the extensive lists of prime numbers that we have today, they would have noted that

|li(x) − π(x)| < |x/(log(x) − A) − π(x)|

for larger values of x [7].
To point out the superiority of Gauss's estimate over Legendre's: of the 78,498 primes less than one million, li(x) gives an estimate of 78,628, a difference of 130, whereas x/(log(x) − A) gives an estimate of 72,372, a difference of about 6,116 (for further results see [7]). For larger values of x this discrepancy only becomes larger, and by simple numerical estimates it became apparent that li(x) was the better approximation to π(x). The only problem was that, for all their brilliance, neither Gauss nor Legendre could prove this conjecture, thereafter referred to as the Gauss-Legendre Conjecture by many notable authors, which later came to be called the Prime Number Theorem. While the proof of the Prime Number Theorem eluded mathematicians for the majority of the 19th century, its utility could not be ignored. As was stated before, many useful functions in number theory are multiplicative; therefore, if one could determine the approximate distribution of the prime numbers, then many other questions in number theory could also be solved. Thus, a proof of the Prime Number Theorem became a sort of "holy grail" for 19th-century mathematicians. The next notable contribution to the study of π(x) came from the prolific Russian mathematician P. L. Tchebyschev. Around 1850 Tchebyschev proved that

0.89 x/log(x) < π(x) < 1.11 x/log(x),    (1–2)

so that the order of magnitude of π(x) conjectured by Gauss and Legendre was indeed correct. Tchebyschev also proved that if there exists an A minimizing the relative error of the approximation x/(log(x) − A) to π(x) (i.e., if there exists a "best possible" A in the Gauss-Legendre estimate), then A = 1, disproving Legendre's observation that A = 1.083.... Furthermore, Tchebyschev demonstrated that if the limit

lim_{x→∞} π(x) log(x)/x = C    (1–3)

exists, then C = 1 (proofs of all of these results can be found in [7]).
This result illustrates the subtleties which arise when attempting to prove the Prime Number Theorem: Tchebyschev's result states that the Gauss-Legendre conjecture is correct if and only if π(x) log(x)/x approaches a limit. It is tempting to take this result too far and conclude the Prime Number Theorem, but recall that there is no reason (given the results known to mathematicians before 1850) that π(x) log(x)/x must approach a limit at all! Tchebyschev's methods were subsequently refined and given sharper bounds in the years following the publication of his memoir on π(x), with the best known result due to the mathematician J. J. Sylvester, who improved Tchebyschev's estimate to

0.956 x/log(x) < π(x) < 1.045 x/log(x);

unfortunately, neither the brilliance of Sylvester nor that of any of his contemporaries was capable of improving the Tchebyschev estimate enough to demonstrate the validity of the Gauss-Legendre conjecture. This fact was lamented by Sylvester, who concluded his article improving Tchebyschev's estimate [28] with the statement that in order to prove the Prime Number Theorem "...we shall probably have to wait until someone is born into the world as far surpassing Tchebyschev in insight and penetration as Tchebyschev has proved himself superior in these qualities to the ordinary run of mankind." In many ways the dream of J. J. Sylvester had already been realized several decades earlier by the next major advancement in the theory of prime numbers, which came from the German mathematician G. B. Riemann. Riemann's approach to the Prime Number Theorem differed significantly from that of his predecessors in that he began to unleash the powerful techniques of complex analysis on the question. If we let

ζ(s) = ∑_{n=1}^∞ 1/n^s,

then a well-known identity due to L. Euler shows that, for ℜ(s) > 1,

∑_{n=1}^∞ 1/n^s = ∏_p (1 − 1/p^s)^{−1}.    (1–4)

Hence, the connection between the above series and the prime numbers was known from the time of Euler, and was used to great effect by Euler and Tchebyschev; however, they appear to have considered this series only as a function of a real variable s ∈ R [5]. Riemann's great insight was that by considering the above series as a function of a complex variable, one could extend it to a function defined for any s ∈ C, s ≠ 1 (a technique which in modern terminology is known as analytic continuation); hence, this function is now called the Riemann Zeta function in his honor, and following Riemann's notation is denoted ζ(s) for s ∈ C, s ≠ 1. Using Mellin inversion (a variation of Fourier's inversion technique), Riemann obtained an analytic expression for the function π(x), giving the solution as

π(x) + (1/2)π(x^{1/2}) + ... = li(x) − ∑_{ℑρ>0} [li(x^ρ) + li(x^{1−ρ})] − log(2) + ∫_x^∞ dt/(t(t² − 1) log(t)),    (1–5)

for x > 1, where ρ ranges over the complex zeros of the function ζ(s) [7]. This equality demonstrates the very important connection between the complex zeros of the zeta function and the distribution of the prime numbers; in particular, it demonstrates that if one can show that ℜ(ρ) < 1 for all complex zeros ρ, then the Prime Number Theorem can be proved. In addition, Riemann speculated that ℜ(ρ) = 1/2 for all complex zeros ρ of the zeta function. This last statement is the still unproven Riemann Hypothesis, and is considered by many to be one of the most important unsolved problems in mathematics (see [7]). Riemann summarized all of his results concerning the distribution of prime numbers in his influential 8-page paper Ueber die Anzahl der Primzahlen unter einer gegebenen Grösse (On the number of primes less than a given magnitude), which he submitted to the Prussian Academy in 1859 in thanks for his induction there.
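The Euler product identity (1–4) can be checked numerically. The following is a minimal sketch (function name illustrative): it truncates the sum over n and the product over primes p up to a common bound and observes that, for s = 2, both sides approach ζ(2) = π²/6.

```python
import math

def euler_product_check(s, limit):
    """Compare a truncation of sum(1/n^s) for n <= limit with the
    Euler product over primes p <= limit; as the limit grows, both
    sides approach zeta(s)."""
    # small sieve of Eratosthenes to list the primes up to `limit`
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    primes = [p for p in range(2, limit + 1) if is_prime[p]]

    partial_sum = sum(1.0 / n ** s for n in range(1, limit + 1))
    product = 1.0
    for p in primes:                       # one factor (1 - p^{-s})^{-1} per prime
        product *= 1.0 / (1.0 - p ** (-s))
    return partial_sum, product

series, product = euler_product_check(2, 10_000)
# both sides are close to zeta(2) = pi^2/6 = 1.6449...
```

The agreement of the two truncations, each within about 10^{-4} of π²/6, is of course no proof, but it illustrates why the product over primes carries the same information as the series.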
Riemann states in its first paragraph that he wishes to share some of his observations with the Academy, and it is perhaps for this reason that Riemann does not include rigorous proofs of many of the results which he derives. Sadly, Riemann died shortly after the publication of this memoir, so we will never truly know whether these statements were conjectures or well-reasoned theorems whose proofs were simply omitted for brevity. Nevertheless, Riemann's paper essentially outlined for future generations a way to prove the Prime Number Theorem, and over the course of the next 40 years several mathematicians would fill these gaps and resolve the conjecture of Gauss and Legendre. The last decade of the 19th century saw great strides in the theory of functions of a complex variable. These results allowed the German mathematician H. von Mangoldt to rigorously prove identity (1–5) in 1894 [7], as well as several other assertions made by Riemann concerning the complex zeros of ζ(s). The French mathematician J. Hadamard also succeeded in proving several results on the function ζ(s) necessary to resolve some of Riemann's statements; and in 1896 the two mathematicians J. Hadamard and C. de la Vallée Poussin finally demonstrated that ℜ(ρ) < 1, a necessary result which they used to successfully prove the Prime Number Theorem (for more on the history and proofs of these theorems see [7]). Of the two proofs, de la Vallée Poussin's is the deeper and will be sketched in Section 2.2, while Hadamard's is the simpler. Hadamard's proof demonstrates that π(x) = li(x) + R(x), where R(x)/li(x) → 0 as x → ∞, whereas de la Vallée Poussin's proof demonstrates that we may take R(x) to be a function which grows no faster than a constant times x e^{−C√log(x)}, for some constant C > 0, i.e.,

π(x) = li(x) + O(x e^{−C√log(x)}).    (1–6)

De la Vallée Poussin established this bound by showing that if ρ = σ + iτ, with σ, τ ∈ R, is a complex zero of ζ(s), then

σ > 1 − c/log(τ),    (1–7)

for some constant c > 0; hence, ζ(s) has no zeros in some region about the line ℜ(s) = 1. However, if one could prove Riemann's hypothesis that ℜ(ρ) = 1/2, then the error term could be improved substantially; that is, one may take R(x) to be a function which grows no faster than a constant times x^{1/2} log(x). Therefore, optimizing the error term in the Prime Number Theorem required knowledge of the complex zeros of ζ(s) which was not available to mathematicians of the 19th century. Largely from this motivation, number theorists of the 20th century began to further explore the properties of functions of a complex variable. The advent of the 20th century brought with it a vast increase in the applications and worldliness of number theory research. Advances in technology made communication amongst mathematicians easier, and results could be disseminated more quickly than in the past. One may note that the majority of the work done on the Prime Number Theorem during the 19th century was accomplished by mathematicians working in continental Europe. England, at the time, was still stymied by its reverence for the past (then the cornerstone of the English educational system) and lagged in the theory of functions of a complex variable. As an example, students at Cambridge University in 1910 would still prefer to use Newton's cumbersome notation for differentiation over that of Leibniz, and, with all due respect to Newton's mathematical talents, Leibniz's notation is clearly superior. However, the first two decades of the 20th century brought English mathematicians into the continental discussion of the distribution of the prime numbers, with great effect. During this time G. H. Hardy and J. E.
Littlewood advanced the theory of functions of a complex variable, and even succeeded in giving their own proofs of the Prime Number Theorem, which were far simpler than the originals supplied by Hadamard and de la Vallée Poussin. They also demonstrated the logical equivalence of the Prime Number Theorem with the statement that ζ(s) ≠ 0 where ℜ(s) = 1. G. H. Hardy also succeeded in showing that there are infinitely many zeros ρ of ζ(s) such that ℜ(ρ) = 1/2, and went on to give more specific estimates of how many zeros of ζ(s) lie on the line ℜ(s) = 1/2 (for these results, and their extensions, see [7]). To this day G. H. Hardy's work is some of the strongest evidence that Riemann's Hypothesis is true, although his work does not provide a proof of the statement. The work of Hardy and Littlewood on functions of a complex variable, while motivated primarily by questions related to number theory, extends far beyond the implications of these theorems for arithmetic functions. A classical result due to the celebrated Norwegian mathematician N. H. Abel states that if

f(z) = ∑_{k=0}^∞ b_k z^k    (1–8)

is a power series with radius of convergence 1, converging for z = 1, such that all b_k ≥ 0 and

∑_{k=0}^∞ b_k < +∞,

then

lim_{z→1−} f(z) = ∑_{k=0}^∞ b_k = f(1).    (1–9)

This theorem has an analogue for Dirichlet series, which are basic objects in the study of analytic number theory; therefore, it became apparent that if a Dirichlet series could be shown to satisfy certain conditions, then Abel's theorem would imply important properties of the arithmetic functions generating these Dirichlet series. One irritating fact for mathematicians who lived during Abel's time is that the full converse of the above theorem is not true, and in fact this oversight led even the very talented Abel to publish some erroneous results concerning infinite series. It would not be until 1897 that a partial converse to Abel's theorem would be discovered by the Austrian mathematician A. Tauber.
Tauber’s theorem states the following: let f(z) be the power series in equation (1–8), with radius of convergence 1, and suppose that there exists an ℓ ∈ ℂ such that

lim_{z→1−} f(z) = ℓ,

where z is restricted to 0 ≤ z < 1. If, furthermore,

∑_{n≤x} n b_n = o(x),

then

lim_{z→1−} f(z) = ∑_{k=0}^∞ b_k = ℓ < +∞.    (1–10)

This theorem, which allows one to solve for the sum of the coefficients of a power series based solely on the analytic nature of the given power series, led to a new and powerful analytic approach to answering questions in the theory of numbers (though it should be noted that Tauber was not a number theorist, and the application of his theorem to prime number theory would have to wait for the work of later authors). Hardy and Littlewood, recognizing the utility of this approach, generalized the results of A. Tauber in order to deduce their own proofs of the prime number theorem; furthermore, they added a new word to the mathematical lexicon by referring to results of this type as Tauberian theorems, in honor of the work of Alfred Tauber. Informally, any result that deduces properties of a function from the average of that function we will refer to as a Tauberian theorem; and conversely, any result which deduces properties of the average of a function from the properties of that function we will refer to as an Abelian theorem. In general, Tauberian theorems are of greater interest to number theorists than are Abelian theorems because the latter class of theorems is essentially an exercise (one is given a function and so one must calculate its average) whereas the former class of theorems attempts to impose restrictions upon what functions, if any, can possess a given average. In 1913 Hardy and Littlewood showed that the conclusion of Tauber’s theorem would follow from far more general assumptions. In 1931 the Serbian mathematician J.
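As a concrete numerical illustration of Tauber’s theorem (a sketch of our own; the function name and parameters are illustrative), take b_n = (−1)^n/n for n ≥ 1. Then ∑_{n≤x} n b_n = ∑_{n≤x} (−1)^n is bounded, hence o(x), and f(z) = −log(1 + z), so the theorem predicts that the radial limit of f at z = 1 equals ∑ b_n = −log(2):

```python
import math

def f(z, terms=10**5):
    """Partial sum of f(z) = sum_{n>=1} (-1)^n z^n / n  (= -log(1+z) for |z| < 1)."""
    total, power, sign = 0.0, 1.0, 1
    for n in range(1, terms + 1):
        power *= z
        sign = -sign
        total += sign * power / n
    return total

# Tauber's side condition: sum_{n<=x} n*b_n = sum_{n<=x} (-1)^n stays bounded, so o(x).
# Tauber's theorem then equates the radial limit of f at 1 with sum b_n = -log(2).
radial_limit = f(0.9999)
coefficient_sum = -math.log(2)
```

With this many terms the two quantities agree to within about 10⁻⁴, consistent with the theorem.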
Karamata succeeded in giving a much simpler proof of the Hardy-Littlewood Tauberian theorem, and it is for this reason that the Tauberian theorem of Hardy-Littlewood is often referred to as the Hardy-Littlewood-Karamata theorem. Unfortunately, even using the Hardy-Littlewood-Karamata theorem it is not particularly straightforward to deduce the prime number theorem in a “simple” manner, and it was not until 1971 that Littlewood supplied a “quick” proof of the prime number theorem using this method; however, it should be noted that Littlewood’s proof requires several deep results on the analytic nature of the Riemann Zeta function (see [29]).

One disadvantage of the Hardy-Littlewood-Karamata theorem is that it can only deal with singularities of a standard type, i.e. singularities of the form (s − a)^{−b}, where a ∈ ℝ and b ∈ ℤ. This deficiency was overcome by the combined work of several prominent mathematicians. In 1931 the American mathematician N. Wiener and the Japanese mathematician S. Ikehara (an erstwhile student of Wiener’s) extended the work of Hardy, Littlewood, and Karamata to include singularities of the type (s − 1)^{−ω−1}, where ω > −1 is any real number ([29]). Their approach, in its modern formulation, also owes much to the work of the English mathematician A. Ingham. It is also interesting to note that the Tauberian theorem of Wiener-Ikehara-Ingham allows one to deduce the prime number theorem with only a single minimal assumption, namely, that the Riemann Zeta function ζ(s) is nonzero for any complex number s such that ℜ(s) = 1; hence, the prime number theorem and the non-vanishing of ζ(s) for ℜ(s) = 1 are logically equivalent ([29]). As the non-vanishing of ζ(s) for ℜ(s) = 1 is an entirely complex-analytic property of ζ(s), this result led many prominent mathematicians (such as G.H. Hardy) to believe that the prime number theorem could not be deduced without using (either implicitly or explicitly) the theory of complex variables.
In fact, Hardy was so sure that the Prime Number Theorem could not be proved without the theory of complex functions that he famously stated that if such an elementary proof could be found, then all of our number theory textbooks would have to be taken off of the shelves and rewritten. It therefore came as quite a surprise when, in 1948, P. Erdős and A. Selberg succeeded in proving the prime number theorem without the use of the Riemann Zeta function or the properties of complex functions. Their proof, while “elementary” in the sense that it does not require the use of complex function theory, is not lacking in subtlety; hence, it would be unwise to confuse “elementary” with “simple” (for the motivated reader, the proof may be found in [7]).

Around the same time that Tauber, Hardy, and Littlewood were conducting their investigations into Tauberian theorems, the Finnish mathematician H. Mellin began considering an integral transform now referred to as the Mellin transform. Note that if we are given a convergent Dirichlet series

D(s) = ∑_{n=1}^∞ d_n/n^s,

and denote

S(x) = ∑_{n≤x} d_n,

where S(x) = O(x^α), then we may interpret D(s), for ℜ(s) > α, as the Stieltjes integral

D(s) = ∫_1^∞ dS(t)/t^s = s ∫_1^∞ S(t)/t^{s+1} dt,    (1–11)

obtained by using integration by parts (and using the fact that ℜ(s) > α). Identity (1–11) can also be viewed as a special case of the following much older formula due to Abel, which is typically referred to as the Abel summation formula. If

S_N = ∑_{n=0}^N a_n b_n  and  B_n = ∑_{k=0}^n b_k,

then

S_N = a_N B_N − ∑_{n=0}^{N−1} B_n (a_{n+1} − a_n).    (1–12)

It is interesting to note that (1–11) can be deduced from (1–12).
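Identity (1–12) is purely algebraic, and it is easy to verify numerically; the following sketch (our own illustration, with function names of our choosing) checks the rearranged sum against a direct computation of S_N for randomly chosen sequences:

```python
import random

def abel_rearranged(a, b):
    """Right-hand side of (1-12): a_N * B_N - sum_{n=0}^{N-1} B_n (a_{n+1} - a_n)."""
    N = len(a) - 1
    B, running = [], 0.0          # B[n] = b_0 + b_1 + ... + b_n
    for value in b:
        running += value
        B.append(running)
    return a[N] * B[N] - sum(B[n] * (a[n + 1] - a[n]) for n in range(N))

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(100)]
b = [random.uniform(-1, 1) for _ in range(100)]
direct = sum(x * y for x, y in zip(a, b))     # S_N computed term by term
rearranged = abel_rearranged(a, b)
```

The two values agree up to floating-point rounding, which is exactly the content of (1–12).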
The integral in (1–11), however, is now a Mellin transform, and it is an astoundingly fortunate property that many Dirichlet series can easily be viewed as the Mellin transform of certain interesting arithmetic functions; it is even more fortunate that the theory of the Mellin transform has achieved such a level of maturity in our modern age (being a generalization of the work of several famous mathematicians). As early as 1744 the Swiss mathematician Leonhard Euler considered (in a rather unrigorous manner) what was essentially a Mellin transform, and his work was expanded upon by Joseph Louis Lagrange (a great admirer of Euler’s). The modern theory of integral transforms began in 1785 with the French mathematician and physicist Pierre-Simon, marquis de Laplace, who gave the theory a more solid theoretical basis and made two important observations regarding integral transforms. Firstly, Laplace showed that by applying a particular integral transform (now called the Laplace transform) to a given function, one could deduce important properties about the derivatives of the transformed function, making the Laplace transform an important tool in the study of differential equations. Laplace’s second observation was that one could recover the original function from its Laplace transform by applying another integral transform, which for obvious reasons is called the inverse Laplace transform. Another important contributor to the theory of such inversion techniques was the French scientist Jean Baptiste Joseph Fourier, whose most famous publication, Théorie analytique de la chaleur, appeared in 1822. The key observation of Laplace, Fourier, and others was that (subject to certain conditions on the function being transformed) one could apply an integral transform to a function, deduce useful properties about that function, and then invert the procedure to recover the original function, thereby elucidating new properties about the function in question.
It is also interesting to note that this technique was developed largely to answer questions in heat conduction, wave propagation, celestial mechanics, and probability. Perhaps owing to the emphasis of integral transforms in the applied sciences, Mellin was less interested in the number-theoretic ramifications of his work than in the function-theoretic implications of his inversion procedure. While the application of this inversion procedure had essentially already been used by Hadamard and de la Vallée-Poussin in their proofs of the Prime Number Theorem, it was not until the 1908 paper [19], in which the German mathematician O. Perron applied Mellin’s inversion procedure to a general Dirichlet series, that the very important arithmetic consequences of Mellin’s work could be appreciated. Perron succeeded in deriving a formula (now referred to as Perron’s formula) which equates the partial sums of the coefficients of a given Dirichlet series with the inverse Mellin transform of that Dirichlet series. More precisely, if a Dirichlet series given by

D(s) = ∑_{n=1}^∞ a_n/n^s

is convergent for all s ∈ ℂ such that ℜ(s) > σ_c, and κ > max(0, σ_c), then

S*(x) = (1/2πi) ∫_{κ−i∞}^{κ+i∞} D(s) x^s s^{−1} ds,    (1–13)

where x > 0 and S*(x) = ∑_{n≤x} a_n if x ∈ ℝ − ℕ, while S*(x) = ∑_{n<x} a_n + (1/2)a_x if x ∈ ℕ. For more on the derivation of Perron’s formula see Theorem 3.1.7.

Formula (1–13), which we will use throughout the thesis, is a remarkably useful tool for deriving results in analytic number theory, and is one of the most useful Abelian theorems currently known (see Chapter II.2 in [29]). In fact, the formula of Perron would easily supersede most Tauberian theorems if it weren’t for the fact that, in general, it is a rather difficult task to evaluate the inverse Mellin transform of a Dirichlet series.

In 1953 the Indian mathematician L.G. Sathe succeeded in deriving an asymptotic formula for the number of integers n ≤ x with exactly k distinct prime factors, denoted π_k(x).
While other mathematicians (such as E. Landau) were capable of deriving this asymptotic formula for fixed k, Sathe demonstrated that the result (equation (2–32)) holds uniformly in k. Sathe’s original proof uses the principle of mathematical induction and is very involved (see [24]). Indeed, Sathe’s original proof spanned over 70 pages and was in fact so involved that one year later, in 1954, Sathe saw fit to publish a simplified account of his original argument which, although shorter than the original, still involves 54 pages of very difficult mathematics. In the words of the highly influential Norwegian mathematician Atle Selberg (who in fact was the referee of Sathe’s second paper), “While the results of Sathe’s paper are very beautiful and highly interesting, the way the author has proceeded in order to prove these results is a rather complicated and involved one, and this by necessity since the proof by induction...presents overwhelming difficulties in keeping track of the estimates of the remainder terms...” It is nevertheless to Sathe’s credit that he could derive his results inductively, however complicated his arguments may have been. Selberg noted that Sathe’s results could be derived, and expanded upon, by attacking the problem from a more classical approach (i.e. using Mellin’s inversion theorem, which does not require Sathe’s complicated inductive argument; this approach is discussed in Section 2.3). In a somewhat unorthodox moment, Selberg authored a short note, [26], on Sathe’s paper which appeared in the same issue of the Journal of the Indian Mathematical Society as Sathe’s second paper (and recall that Selberg was the referee for Sathe’s paper!).
Selberg successfully approximated the partial sums of the coefficients of Dirichlet series of the type F(s) = G(s; z)ζ(s)^z, where ζ(s) is the Riemann Zeta function, ℜ(s) > 1, z ∈ ℂ is an arbitrary complex number, and G(s; z) is a function which is analytic in ℜ(s) > 1/2 and satisfies rather modest growth conditions (see Chapter II.5 of [29], or the discussion of the Sathe-Selberg technique in Section 2.3); moreover, these results either reproved or expanded upon all of Sathe’s theorems. Sadly, perhaps owing to the length of his own papers, or the brevity of Selberg’s arguments, Sathe never published another paper in the field of mathematics.

Essentially, the Sathe-Selberg method allows us to treat Dirichlet series with singularities of the form (s − 1)^z, z ∈ ℂ. However, it is often the case in analytic number theory that we must consider singularities of a different type, such as logarithmic singularities of the form log(1/(s − 1)). In 1954 the French mathematician H. Delange expanded upon the work of Sathe and Selberg to provide a theorem which applies to all Dirichlet series satisfying modest conditions and possessing singularities of the type (s − 1)^{−ω} log^k(1/(s − 1)), where ω ∈ ℝ and k ∈ ℤ with k ≥ 0 (see note 5.1 at the end of Chapter II.5 in [29]). As a result of the combined efforts of Sathe, Selberg, and Delange, the method utilized to derive results of this type is typically referred to as the Sathe-Selberg or Selberg-Delange method (see Chapter II.5, “The Selberg-Delange method,” in [29]). In Theorem 2.2.1 of Chapter II we will state a major generalization of the Wiener-Ikehara-Ingham theorem first proved by Delange in 1954. In many ways this result can be viewed as the pinnacle of known Tauberian theorems, as it applies in an astoundingly general setting (that is, to sufficiently regular Dirichlet series possessing singularities which are both monomial and logarithmic).
In Chapters I and II it will further be demonstrated that most of our results follow from simple applications of the Selberg-Delange method in the guise of Theorem 2.2.1.

At the same time that G.H. Hardy and J.E. Littlewood were conducting their research in analytic number theory, the largely self-taught Indian mathematician S. Ramanujan drafted a letter to Hardy in an attempt to gain recognition of his work. Hardy, overwhelmed by the brilliance of Ramanujan’s work, arranged for the young mathematician to come work with him at Cambridge. This began a significant collaboration which inspired, amongst other things, the study of additive arithmetic functions. Recall that an arithmetic function f(n) is multiplicative if n = ab with a and b relatively prime implies that f(n) = f(ab) = f(a)f(b). An additive function is defined very similarly: if n = ab with a and b relatively prime, then an arithmetic function f is additive if f(n) = f(ab) = f(a) + f(b). One example of an additive function is log(n). A more interesting number-theoretic example is the function ω(n), which denotes the number of distinct prime divisors of an integer n; similarly, the number of prime power divisors of an integer n, denoted Ω(n), is also additive. Hardy and Ramanujan proved that

lim_{x→∞} (∑_{n≤x} ω(n)) / (∑_{n≤x} Ω(n)) = 1,

and that (as will be discussed in Section 2.3) the averages of both ω(n) and Ω(n) for n ≤ x have order of magnitude log log(x) (see [11] or [29]). This in itself is a rather surprising result, as it states that on average most prime factors occur square-free. It was also one of the first results concerning the average value of an additive arithmetic function. Note that

ω(n) = ∑_{p|n} 1

and that

Ω(n) = ∑_{p^α ∥ n} α,

where p^α ∥ n means that p^α | n but p^{α+1} does not divide n.
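For modest x these averages can be observed directly. The sketch below (our own illustration; the function name is ours) computes ω(n) and Ω(n) by trial division and compares the average of ω(n) for n ≤ 10⁴ with the Hardy-Ramanujan order of magnitude log log x:

```python
from math import log

def omega_pair(n):
    """Return (omega(n), Omega(n)): the number of distinct prime factors of n,
    and the number of prime factors of n counted with multiplicity."""
    distinct = with_mult = 0
    d = 2
    while d * d <= n:
        if n % d == 0:
            distinct += 1
            while n % d == 0:
                with_mult += 1
                n //= d
        d += 1
    if n > 1:                      # remaining cofactor is a prime
        distinct += 1
        with_mult += 1
    return distinct, with_mult

x = 10**4
avg_omega = sum(omega_pair(n)[0] for n in range(2, x + 1)) / x
# Hardy-Ramanujan: avg_omega should be comparable to log(log(x)) ~ 2.22 here.
```

That the average is so small, even at x = 10⁴, reflects the fact that a typical integer has very few prime factors.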
With these definitions it is only natural to ask about the similar functions

A*(n) = ∑_{p|n} p

and

A(n) = ∑_{p^α ∥ n} αp,

which are the sum of the distinct prime divisors of n and the sum of the prime divisors of n weighted according to multiplicity, respectively. These functions, just like ω(n) and Ω(n), are additive, and they are intimately connected with those functions. It is therefore something of a surprise that it took over 50 years from the time of Hardy and Ramanujan’s work before these functions were studied in earnest. The theory of these functions is also the story of a collaboration between an eminent European and a young Indian mathematician, and as these functions have a central role in the following thesis we will go into some detail about this collaboration and its ultimate results.

The prolific Hungarian mathematician P. Erdős visited Calcutta in 1974, where he met the theoretical physicist and educator Alladi Ramakrishnan. Ramakrishnan’s son, Krishnaswami Alladi, had been independently investigating the functions A(n) and A*(n) (referred to in what follows as the Alladi-Erdős functions) and had obtained several interesting results, as well as raising many deeper questions which he was unable to answer. Erdős, ever eager to collaborate with young mathematicians, rerouted his flight from Calcutta to Australia to stop in Madras so that he could visit the young K. Alladi, who was at the time still an undergraduate. After their discussions, many of which were conducted while walking along the beach, Erdős and Alladi published several papers which proved, amongst other things, that ∑_{n≤x} A(n) and ∑_{n≤x} A*(n) both have order of magnitude

π²x²/(12 log(x)),

[2] (the Alladi-Erdős collaborative encounter is also nicely recounted in Chapter 1 of Bruce Schechter’s biography of Erdős, My Brain is Open, [25]).
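The order of magnitude π²x²/(12 log x) is already visible for quite small x. The following sketch (our own illustration; the function name and the cutoff x = 20000 are choices of ours) computes A(n) and A*(n) by trial division and compares ∑_{n≤x} A(n) against the main term:

```python
from math import log, pi

def alladi_erdos(n):
    """Return (A(n), A_star(n)): the sum of the prime factors of n counted with
    multiplicity, and the sum of the distinct prime factors of n."""
    with_mult = distinct = 0
    d = 2
    while d * d <= n:
        if n % d == 0:
            distinct += d
            while n % d == 0:
                with_mult += d
                n //= d
        d += 1
    if n > 1:                      # remaining cofactor is the largest prime factor
        with_mult += n
        distinct += n
    return with_mult, distinct

x = 20000
total_A = sum(alladi_erdos(n)[0] for n in range(2, x + 1))
main_term = pi**2 * x**2 / (12 * log(x))   # Alladi-Erdos order of magnitude
ratio = total_A / main_term                # should be of size roughly 1
```

The ratio is close to 1 but not extremely so; the relative error in the Alladi-Erdős asymptotic decays only like 1/log x, so convergence is slow.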
Note that the results of Alladi and Erdős further validate the observation from the Hardy-Ramanujan Theorem that most integers do not have large prime power divisors. Alladi and Erdős’s work also concerned the largest prime factors of an integer. Let P_1(n) be the largest prime factor of n; then clearly

∑_{n≤x} P_1(n) ≤ ∑_{n≤x} A(n);    (1–14)

however, it is a fact (first shown by Alladi and Erdős) that ∑_{n≤x} P_1(n) also has order of magnitude

π²x²/(12 log(x)),

which is somewhat surprising. In fact, this result shows that the majority of the value of A(n) is accounted for by P_1(n) (see [2]). In a later paper (see [3]) Alladi and Erdős proceed to evaluate the sum

∑_{n≤x} (A(n) − P_1(n) − ... − P_{k−1}(n)),

and (rather surprisingly) demonstrate that

lim_{x→∞} (∑_{n≤x} (A(n) − P_1(n) − ... − P_{k−1}(n))) / (∑_{n≤x} P_k(n)) = 1.    (1–15)

This result states that not only is the majority of the contribution of A(n) accounted for by P_1(n), but the majority of the contribution of A(n) − P_1(n) is accounted for by P_2(n), i.e. the second largest prime divisor, and so forth.

Another useful outcome of the Alladi-Erdős collaboration was that Erdős drew Alladi’s attention to the function Ψ(x, y), which is defined as the number of positive integers n ≤ x all of whose prime factors are at most y, i.e. P_1(n) ≤ y. This function, whose first published appearance is in work of R. Rankin on the differences between consecutive prime numbers, has been studied extensively by many mathematicians. It plays an important role in the Alladi-Erdős papers, and we will have reason to refer to its basic properties in this thesis as well. While the original study of the function Ψ(x, y) is typically attributed to the paper [23] of Rankin (with contemporaneous investigations conducted by A. Buchstab [6], K. Dickman [10], and V. Ramaswami [22]), it is interesting that the study of Ψ(x, y) (and related functions) actually appears in the much earlier work of S. Ramanujan.
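For small parameters Ψ(x, y) can be computed by brute force directly from the definition. The sketch below (our own illustration, with helper names of our choosing) counts the integers n ≤ x with P_1(n) ≤ y:

```python
def largest_prime_factor(n):
    """P_1(n): the largest prime factor of n, for n >= 2."""
    largest = 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            largest = d
            n //= d
        d += 1
    if n > 1:                      # remaining cofactor is the largest prime factor
        largest = n
    return largest

def psi(x, y):
    """Psi(x, y): the number of positive n <= x all of whose prime factors are <= y
    (n = 1 is counted, since it has no prime factors at all)."""
    return 1 + sum(1 for n in range(2, x + 1) if largest_prime_factor(n) <= y)

full_count = psi(100, 100)     # every n <= 100 qualifies, so this is 100
powers_of_two = psi(30, 2)     # the 2-smooth numbers up to 30: 1, 2, 4, 8, 16
```

Counting by brute force is of course only feasible for tiny x; the analytic theory of Ψ(x, y) developed later in the thesis exists precisely because direct enumeration does not scale.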
As the story surrounding this area of Ramanujan’s research is particularly fascinating, we will pause briefly to appreciate his life and the rediscovery of his work related to Ψ(x, y). As was noted previously, Ramanujan was a largely self-taught Indian mathematician who collaborated later in life with G.H. Hardy. Ramanujan had a tremendous love for mathematics; he not only studied it in his free time, but went on to derive numerous identities which he recorded in his personal papers. Unfortunately, Ramanujan was very poor, and his poverty made paper a precious commodity; as a result Ramanujan would often simply write down a formula without any reference to how he had arrived at the result or (more importantly) how one could prove his claim. Ramanujan filled his notes with literally thousands of identities which run the gamut from simple geometric observations (which can be proven in a page or two) to absurdly complicated identities whose proofs require hundreds of pages to establish. Sadly, perhaps due to overwork, living in the foreign climate of Cambridge, England, or general ill-health (typically attributed to amoebic dysentery, malnutrition, or tuberculosis), Ramanujan died in 1920 at the age of 32. After Ramanujan passed away, many of his personal papers were given to the University of Madras by his wife, and the University passed a large number of Ramanujan’s notebooks to his friend and collaborator G.H. Hardy. After Hardy received Ramanujan’s papers, he and his contemporaries spent years attempting to prove Ramanujan’s claims, and after Hardy’s death in 1947, many other mathematicians resumed the work of proving Ramanujan’s identities. Most of the posthumously published work of Ramanujan was already proven and well-known to mathematicians by 1976, when the fabled “lost notebook” of Ramanujan was discovered by the American mathematician G. Andrews at the Wren Library at Trinity College, Cambridge.
Apparently, sometime between 1934 and 1947 G.H. Hardy handed a number of Ramanujan’s manuscript pages to the English mathematician G.N. Watson, who lost these pages in the sea of cluttered mathematical papers which occupied his office. After Watson’s death, these pages were saved from incineration (but not obscurity) by the mathematicians J.M. Whittaker and R. Rankin, who had them stored in the Wren Library. They remained there until Andrews caused a sensation in the mathematical world by locating them in 1976. Andrews’ own account of the story is somewhat less apocryphal, and definitely less swashbuckling; apparently he had a good idea of where to look for these lost notes in the library, although he was not certain about their content or exact location. Amongst the many topics studied by Ramanujan during the last years of his life was the function Ψ(x, y), as well as a related function called the Dickman function. Ramanujan managed to deduce many results concerning these functions decades before the individuals for whom they are named, and were it not for the misplacement of his papers they might well be named for him. Ramanujan’s mathematical abilities were truly amazing: of the 1600 or so identities contained in his lost notebook, its editor B. Berndt speculates that fewer than five are incorrect. As an example of this self-taught individual’s insight, in 1930 K. Dickman re-derived a function related to Ψ(x, y) called the Dickman function, denoted ρ(u), where u = log(x)/log(y), x ≥ y ≥ 2. The Dickman function is related to Ψ(x, y) by the fact that, uniformly for x ≥ y ≥ 2, we have

Ψ(x, x^{1/u}) ∼ xρ(u),

and, in addition to this important connection, ρ(u) satisfies some rather surprising identities. For instance, ρ(u) is continuous at u = 1, differentiable for u > 1, and for u > 1 satisfies the difference-differential equation

uρ′(u) + ρ(u − 1) = 0;

these facts were all known to de Bruijn [9].
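The difference-differential equation makes ρ(u) easy to tabulate numerically. The sketch below (our own illustration; the forward-Euler scheme and the step size are choices of ours) integrates uρ′(u) = −ρ(u − 1) starting from ρ(u) = 1 on [0, 1], and checks the result against the closed form ρ(u) = 1 − log(u), valid for 1 ≤ u ≤ 2:

```python
from math import log

def dickman_rho(u, h=1e-4):
    """Numerically integrate u * rho'(u) = -rho(u - 1) with rho = 1 on [0, 1].

    A simple forward-Euler scheme on a grid of spacing h, storing past values
    so that rho(t - 1) can be looked up; adequate for moderate u."""
    if u <= 1:
        return 1.0
    n_steps = int((u - 1) / h)
    grid = [1.0]                   # rho at the points 1, 1+h, 1+2h, ...
    rho = 1.0
    for i in range(n_steps):
        t = 1.0 + i * h
        back = t - 1.0             # argument of the delayed term rho(t - 1)
        rho_back = 1.0 if back <= 1.0 else grid[int((back - 1.0) / h)]
        rho += h * (-rho_back / t)
        grid.append(rho)
    return rho

approx = dickman_rho(2.0)
exact = 1 - log(2)                 # known closed form of rho(u) on [1, 2]
```

The rapid decay de Bruijn established is also visible numerically: ρ(2) ≈ 0.307 already, and each further unit increase of u shrinks ρ(u) dramatically.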
De Bruijn used the difference-differential equation to demonstrate that ρ(u) decreases very rapidly (see Section 3.2 for explicit bounds). Unknown to anyone, many of the results related to the Dickman function had already been deduced by Ramanujan almost 20 years earlier (for more details concerning the rediscovery of these interesting identities see [27]). Seven years after Dickman published his work, A. Buchstab derived a functional equation for Ψ(x, y), and two decades after Buchstab’s work, N.G. de Bruijn succeeded in significantly improving upon the results of Rankin, Ramaswami, and Dickman concerning this function. For this reason Ψ(x, y) is often referred to as the Buchstab-de Bruijn function, as opposed to the Ramanujan (or some other more appropriately named) function.

Around the time that Buchstab, Rankin, Dickman, and de Bruijn were investigating Ψ(x, y) (a period from about 1930 to 1960) the world entered what may be termed the computer age. Computing machines began to outpace human calculating skills to the point that, in a very short period of time, mathematicians had access to amounts of data far in excess of what their predecessors had known (or even could have known). In the present day, most of us use the internet, a personal computer, or some sort of mobile computing device on a daily basis, and it would be difficult for us to imagine a world without digital communication. This relatively cheap, reliable, and instantaneous form of communication has one major problem: simply put, it is difficult to ensure that a message is sent in a secure fashion. To solve this problem, many computer programmers and mathematicians have developed sophisticated means of encrypting messages so that only the intended recipient can read the transmitted message.
These encryption techniques have become very sophisticated; however, most rely upon one simple principle, namely, that it takes a relatively short amount of time for a computer to multiply a given set of integers, but an extraordinarily large amount of time to factor an integer into prime numbers. For example, some numbers used in computer encryption take only fractions of a second to multiply, but would require billions of years to factor into primes using our best known factoring algorithms and supercomputers. Hence, numerical factorization is a very important application of mathematics to our modern world.

In 1976, while Alladi and Erdős were conducting their theoretical investigations into the functions A(n) and A*(n), D. Knuth and L.T. Pardo published a paper [14] in which they investigated the running time of what is perhaps the simplest way to factor an integer algorithmically, namely, the procedure described in the first paragraph of this introduction (which we will call the Knuth-Pardo factorization algorithm). This familiar procedure for factoring an integer is taught to most of us in pre-calculus (albeit not in an algorithmic fashion) and is a very intuitive way to factor an integer. However, it does raise the question: how fast is this procedure? Knuth and Pardo essentially answer this question in [14]. The connection between Ψ(x, y) and integer factorization algorithms is quite transparent: if we could guarantee that an integer n ≤ x had all prime factors less than or equal to y, i.e. P_1(n) ≤ y ≤ x, then we need only check for prime divisors of n which are less than or equal to y (which would decrease the number of trial divisions, and hence increase the efficiency of the Knuth-Pardo algorithm, provided this y is appreciably smaller than x, of course). We will have much more to say about the connection between Ψ(x, y) and the running time of the Knuth-Pardo algorithm in the body of the thesis.
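The procedure in question can be sketched in a few lines. The following is an illustrative implementation of ours (not Knuth and Pardo’s own presentation), which factors n by trying the divisors 2, 3, 4, ... in turn:

```python
def trial_division(n):
    """Factor n >= 2 by trial division: try d = 2, 3, 4, ... in turn and divide
    out each d that divides the running cofactor (such a d is automatically
    prime, since all of its prime factors were removed earlier).

    Returns the prime factorization as a list of (prime, exponent) pairs."""
    factors = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            e = 0
            while n % d == 0:
                n //= d
                e += 1
            factors.append((d, e))
        d += 1
    if n > 1:
        factors.append((n, 1))     # the remaining cofactor is the largest prime factor
    return factors
```

The loop stops once d² exceeds the remaining cofactor, so the number of trial divisions is governed by the large prime factors of n (roughly, by the larger of P_2(n) and √P_1(n)); this is precisely why the distribution of the P_k(n) controls the running time.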
A similar observation holds if we know the largest prime factor P_1(n) of an integer n: we may then apply the algorithm to m = n/P_1(n) ≤ n, which can be factored faster by the Knuth-Pardo algorithm (as m ≤ n). Letting P_k(n) denote the kth largest prime factor of an integer n, it is fairly clear that we could decrease the running time of the Knuth-Pardo algorithm even further if we had knowledge not only of P_1(n), but also of P_k(n) for k ≥ 2. For this reason Knuth and Pardo investigate the mean and variance of P_k(n) theoretically in their paper, as well as supplying extensive tables of how this knowledge affects the running time of their algorithm (see [14]).

CHAPTER 2
THE DISTRIBUTION OF PRIMES AND PRIME FACTORS

2.1 Notation and Preliminary Observations

We begin by proving the classical result that there exist infinitely many primes, in a perhaps unfamiliar fashion. The following proof, due to Paul Erdős [8], is particularly beautiful in that it not only shows the stronger result that ∑ 1/p diverges, but uses the underlying structure (or anatomy) of the integers to do so.

Theorem 2.1.1. ∑_p 1/p diverges.

Proof (Erdős): Let p_1, p_2, ... be the sequence of primes, listed in increasing order. Assume, by way of contradiction, that ∑_p 1/p converges. Then there must exist a natural number k such that

∑_{i≥k+1} 1/p_i < 1/2.

Call the primes p_1, p_2, ..., p_k the small primes and p_{k+1}, p_{k+2}, ... the large primes. Let N be an arbitrary natural number. Then

∑_{i≥k+1} [N/p_i] < N/2.

Now, let N_B be the number of positive integers m ≤ N which are divisible by at least one large prime, and let N_S be the number of positive integers n ≤ N which have only small prime divisors. First we estimate N_B by noting that [N/p_i] counts the positive integers n ≤ N which are multiples of p_i. Then

N_B ≤ ∑_{i≥k+1} [N/p_i] < N/2.

Now we estimate N_S by noting that every integer n ≤ N with only small prime factors may be written in the form n = a_n b_n², where a_n is square-free.
Every a_n must therefore be a product of distinct small primes; hence, there are at most 2^k different square-free parts. Furthermore, as b_n ≤ √n ≤ √N, we see that there are at most √N different square parts, and so

N_S ≤ 2^k √N;

therefore,

N = N_S + N_B < N/2 + 2^k √N,

which holds for any N. Letting, say, N = 2^{2k+2} implies that 2^k √N = 2^k · 2^{k+1} = 2^{2k+1} = N/2. Then for this choice of N,

N = N_S + N_B < N/2 + N/2 = N,

which is the desired contradiction.

Let ζ(s) be the Riemann Zeta function; as was alluded to in the introduction, this function plays a central role in analytic number theory. We will state several of the more important results concerning ζ(s) which will allow us to deduce the later results of this paper. We begin with the formal definition:

Definition 2.1.2. For ℜ(s) > 1 the Riemann Zeta function is defined as

ζ(s) = ∑_{n=1}^∞ 1/n^s.

The zeta function was first studied by Euler, who derived the following product formula, referred to by some authors as an analytic form of the fundamental theorem of arithmetic. Euler noted that for s > 1

∑_{n=1}^∞ 1/n^s = ∏_p (1 − 1/p^s)^{−1},    (2–1)

and in fact this identity holds for all complex s such that ℜ(s) > 1, [7]. This product formula is the key to understanding how one may deduce properties of the prime numbers from those of ζ(s). By simply taking the logarithm of ζ(s), for ℜ(s) > 1, and noting that the Taylor series expansion of −log(1 − x) is valid for x ∈ ℂ such that |x| < 1, we easily derive the following identity:

log ζ(s) = −log ∏_p (1 − 1/p^s) = −∑_p log(1 − p^{−s})
         = ∑_p 1/p^s + ∑_p 1/(2p^{2s}) + ... = ∑_p 1/p^s + h(s),    (2–2)

where h(s) is bounded for ℜ(s) > 1/2, and identity (2–2) holds for ℜ(s) > 1. Hence, the logarithm of the zeta function essentially gives us the sum of the reciprocals of the primes, raised to the s power.
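The product formula (2–1) is easy to test numerically. The sketch below (our own illustration; the truncation point is a choice of ours) compares a truncated Dirichlet series and a truncated Euler product at s = 2 against the known value ζ(2) = π²/6:

```python
from math import pi

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n**0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [p for p in range(2, n + 1) if is_prime[p]]

s = 2.0
N = 200000
dirichlet_side = sum(1.0 / n**s for n in range(1, N + 1))   # sum over n <= N
euler_side = 1.0
for p in primes_up_to(N):                                   # product over p <= N
    euler_side *= 1.0 / (1.0 - p**(-s))
# Both truncations should approximate zeta(2) = pi^2 / 6 = 1.6449...
```

That both truncations agree with π²/6 to several decimal places is a numerical shadow of the fundamental theorem of arithmetic, as the text above explains.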
The key to unraveling the asymptotic distribution of the prime numbers will be to apply some sort of inversion technique to log ζ(s); ideally we would like to evaluate the finite sums ∑_{p≤x} 1/p^s at s = 1, but s = 1 is not in the domain of absolute convergence of log ζ(s), so we cannot simply set s = 1 in the above identity and conclude any meaningful result. Instead, motivated by the applications of Fourier inversion, we may consider ζ(s) as a function of a complex variable s ∈ ℂ; then, applying a version of Fourier’s inversion theorem called the Mellin inversion theorem, we may evaluate finite sums such as ∑_{p≤x} 1. This sum is of such central importance to number theory that it warrants its own definition as a function:

Definition 2.1.3. We call the sum

π(x) = ∑_{p≤x} 1

the prime counting function; that is, π(x) is the number of primes less than or equal to x, x ∈ ℝ. Furthermore, define

Π(x) = ∑_{n=1}^∞ π(x^{1/n})/n

to be the special prime counting function.

Note that as lim_{n→∞} x^{1/n} = 1 and π(1) = 0, the special prime counting function is actually a finite sum. It has been discovered in the personal papers of Euler, who was the first to note the connection between the prime numbers and the zeta function, that he attempted to extend ζ(s) to a function of a complex number s. Euler, however, was only partially successful in this attempt ([7]). As an interesting historical anecdote, it is now known that the Euler-Maclaurin summation formula (which Euler was certainly aware of) can be used to analytically continue the function ζ(s) to the entire complex plane (see [30]); unfortunately, this marks one of those rare instances when Euler did not exhaust his mathematical talents to attack the problem he was addressing. However, a straightforward application of Stieltjes integration will allow us to analytically continue ζ(s) beyond the line ℜ(s) = 1. Consider the integral representation

ζ(s) = ∫_{1−}^∞ d[t]/t^s = s ∫_1^∞ [t]/t^{s+1} dt.
t s+1 Upon replacing [t] = t − {t} we then obtain ∫ s 1 ∞ [t] dt = s t s+1 ∫ 1 ∞ ( {t} 1 − s+1 s t t ) s dt = −s s −1 ∫ 1 ∞ {t} dt t s+1 (2–3) and as {x} is bounded, it can easily be seen that the final integral in (2-3) is bounded for ℜ(s) > 0. Therefore, (2-3) gives the analytic continuation of the function ζ(s) to the region ℜ(s) > 0, s ̸= 1 [30]. About 50 years after Euler’s time, Riemann succeeded in analytically continuing ζ(s), s ̸= 1, to the entire complex plane, and in his seminal memoir supplied several proofs of this continuation. Although it will not be necessary for 38 the purposes of this paper, it is interesting to see these complex analytic properties of ζ(s). If we let Γ(s) be the gamma function of Euler and set ξ(s) = (s ) 1 s(s − 1)π −s/2 Γ ζ(s), 2 2 (2–4) then Riemann demonstrated that ξ(s) = ξ(1 − s), (2–5) and thus from (2-4) and (2-5) ζ(s) is defined for all complex numbers s, such that ℜ(s) < 0 and ℜ(s) > 1. Equation (2-3) supplies the analytic continuation for the remaining complex numbers s such that 0 ≤ ℜ(s) ≤ 1, s ̸= 1; these complex numbers constitute what is referred to as the critical strip of the function ζ(s). For several proofs of (2-3), (2-4), and (2-5) consult [29], [7], and [30]. Riemann also stated, without proof, that the function ζ(s) satisfied an infinite product over the zeros of the zeta function. Although he was correct, this product representation was not proved rigorously until several decades later as a consequence of Hadamard’s proof of his more general factorization theorem, and for this reason identity (2-6) is typically referred to as the Hadamard product. For all s ̸= 1, there exists a constant b such that: ) ∏( s e bs 1− ζ(s) = e s/ρ 2(s − 1)Γ( 2s + 1) ρ ρ (2–6) where the ρ are complex zeros of ζ(s), i.e. the zeros of ζ(s) within the critical strip. 
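Formula (2–3) can also be verified numerically. On each interval [n, n+1] the fractional part {t} equals t − n, so each piece of the integral has a closed form; the sketch below evaluates the truncated integral and recovers ζ(2). The truncation point M is an illustrative choice.

```python
# Numerical check of the analytic-continuation formula (2-3) at s = 2:
#   zeta(s) = s/(s-1) - s * Integral_1^inf {t}/t^(s+1) dt.
def frac_integral(s, M=10000):
    total = 0.0
    for n in range(1, M):
        a, b = n, n + 1
        # Integral_n^(n+1) (t - n)/t^(s+1) dt evaluated in closed form
        term = (a ** (1 - s) - b ** (1 - s)) / (s - 1) \
             - n * (a ** (-s) - b ** (-s)) / s
        total += term
    return total                       # tail beyond M is O(1/M^s)

def zeta_via_continuation(s, M=10000):
    return s / (s - 1) - s * frac_integral(s, M)
```

For s = 2 this agrees with ζ(2) = π²/6 to many digits, even though the point of (2–3) is its validity for 0 < ℜ(s) ≤ 1, where the original series diverges.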
It follows from (2–3), Riemann's functional equation (2–5), and the properties of Γ(s) that (s−1)ζ(s) is an analytic function for all s ∈ C; hence, ζ(s) is a meromorphic function in C with a sole singularity at s = 1. However, in order to answer questions concerning the prime numbers, we must consider the principal branch of the function log ζ(s), which by the above observations is now a function of the complex variable s, with a branch point singularity extending to the left of 1 in a zero-free region of ζ(s). Furthermore, by applying Stieltjes integration we may derive the useful identity for ℜ(s) > 1:

log ζ(s) = s ∫_2^∞ Π(x)/x^{s+1} dx, (2–7)

which justifies the need to define two different prime counting functions, as Π(x) is more easily handled using analytic techniques than π(x). The two functions are nevertheless very close to one another, as

|Π(x) − π(x)| ≤ √x log^A(x)

for some A ≥ 1. Given that π(x) is of the order of magnitude of x/log(x), this difference is actually quite small. One difficulty with log ζ(s) as a complex function is that it may have other singularities, which correspond to the zeros of ζ(s). For this and other reasons, the zeros of ζ(s) are of great interest to number theorists. The zeros of the Riemann zeta function are among the most intriguing objects in all of mathematics; however, for the purpose of proving the prime number theorem we need only show that ζ(s) ≠ 0 for ℜ(s) = 1. The following theorem, first proved by de la Vallée-Poussin, while very interesting, is therefore more than is required for the proof of the Prime Number Theorem. This result, when combined with suitable bounds for ζ′(s)/ζ(s), will be sufficient to prove a much stronger form of the prime number theorem.
The next identity follows from taking the logarithmic derivative of equation (2–6), and is valid for all s ∈ C where s ≠ 1, s ≠ −2n − 2 for n ∈ Z^+ ∪ {0}, and s ≠ ρ (ρ a complex zero of ζ(s)):

ζ′(s)/ζ(s) = log(2π) − 1/(s−1) − (1/2) · Γ′(s/2 + 1)/Γ(s/2 + 1) + ∑_ρ (1/(s−ρ) + 1/ρ). (2–8)

It is clear that equation (2–8) holds for ℜ(s) > 1; that it holds for general complex s ≠ 1, −2n − 2, ρ follows from Mittag-Leffler's theorem. The restriction that s ≠ −2n − 2 comes from the poles of the logarithmic derivative of the Gamma function.

Theorem 2.1.4. Let s = σ + iτ with σ, τ ∈ R. Then there exists an absolute constant c > 0 such that ζ(s) has no zeros in the region σ ≥ 1 − c/log(|τ| + 2).

Proof: For σ > 1 we have

ℜ(−ζ′(s)/ζ(s)) = ∑_{p,m} (log(p)/p^{mσ}) cos(mτ log(p)).

Hence, for σ > 1 and any real γ,

−3 ζ′(σ)/ζ(σ) − 4ℜ(ζ′(σ + iγ)/ζ(σ + iγ)) − ℜ(ζ′(σ + 2iγ)/ζ(σ + 2iγ)) = ∑_{p,m} (log(p)/p^{mσ}) [3 + 4cos(mγ log(p)) + cos(2mγ log(p))], (2–9)

and by the simple trigonometric identity

3 + 4cos θ + cos 2θ = 2(1 + cos θ)² ≥ 0,

it follows that (2–9) is greater than or equal to 0. Now

−ζ′(σ)/ζ(σ) < 1/(σ−1) + O(1), (2–10)

and by (2–8)

−ζ′(s)/ζ(s) = O(log(t)) − ∑_ρ (1/(s−ρ) + 1/ρ),

where ρ = β + iγ runs through the complex zeros of ζ(s). Hence,

−ℜ(ζ′(s)/ζ(s)) = O(log(t)) − ∑_ρ ( (σ−β)/((σ−β)² + (t−γ)²) + β/(β² + γ²) ).

Now, as every term in the last sum is positive, it follows that

−ℜ(ζ′(s)/ζ(s)) ≪ log(t), (2–11)

and, if β + iγ is a particular zero of ζ(s), then

−ℜ(ζ′(σ + iγ)/ζ(σ + iγ)) < O(log γ) − 1/(σ−β). (2–12)

Now, from the comments following (2–9), together with (2–10), (2–11), and (2–12), we may obtain the inequality

3/(σ−1) − 4/(σ−β) + O(log γ) ≥ 0,

or, say,

3/(σ−1) − 4/(σ−β) ≥ −c₁ log γ.

Solving for β yields

1 − β ≥ (σ−1) · (1 − c₁(σ−1) log γ)/(3 + c₁(σ−1) log γ).

Note that the right-hand side is positive if, say, σ − 1 = 1/(2c₁ log γ), and with this choice

1 − β ≥ c₂/log γ,

which is the desired result.
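The entire non-negativity step in the proof rests on the elementary identity 3 + 4cos θ + cos 2θ = 2(1 + cos θ)². A quick numerical confirmation over a grid of angles (the grid itself is an arbitrary illustrative choice):

```python
# Check the trigonometric identity behind (2-9):
#   3 + 4cos(t) + cos(2t) = 2(1 + cos(t))^2 >= 0 for all real t.
import math

def lhs(t):
    return 3 + 4 * math.cos(t) + math.cos(2 * t)

def rhs(t):
    return 2 * (1 + math.cos(t)) ** 2
```

The identity follows from cos 2θ = 2cos²θ − 1, and it is precisely the factor 4 (rather than, say, 2) that makes the de la Vallée-Poussin argument work.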
In particular, Theorem 2.1.4 implies that log ζ(s) is analytic in a region extending beyond the half-plane ℜ(s) ≥ 1, while avoiding the singularity at s = 1. Although one can deduce the prime number theorem from these properties alone, the function log ζ(s) will contain a branch point singularity at s = 1 as well as at every ρ for which ζ(ρ) = 0. In general, it is more difficult, analytically, to handle branch point singularities than it is to analyze a function without such singularities (although in this case Theorem 2.2.1 actually applies if we are dealing with a logarithmic or monomial singularity). Therefore, following the classical approach to proving the Prime Number Theorem, we may avoid this difficulty by differentiating log ζ(s) to obtain ζ′(s)/ζ(s), which is a single-valued function whose only singularities are simple poles (as can be seen from equation (2–8)), and hence a meromorphic function in C. This motivates the following definition:

Definition 2.1.5. Let n be an integer with n = p^α, where p is a prime number; then define Λ(n) = log(p). If an integer n is not the power of a prime then Λ(n) = 0. The function Λ(n) is called the von Mangoldt function. Furthermore, define ψ(x) = ∑_{n≤x} Λ(n), called the Tchebyschev function.

The motivation for these definitions becomes apparent when we consider the Dirichlet series generated by Λ(n); that is, for ℜ(s) > 1,

∑_{n=1}^∞ Λ(n)/n^s = −ζ′(s)/ζ(s), (2–13)

which is simply the negative of the logarithmic derivative of ζ(s). The von Mangoldt function is closely related to the function Π(x), as

Π(x) = ∑_{n≤x} Λ(n)/log(n), (2–14)

and applying formula (1-10) (the Abel summation formula) gives the identity

Π(x) = ψ(x)/log(x) + ∫_2^x ψ(t)/(t log²(t)) dt, (2–15)

relating the Tchebyschev function ψ(x) to Π(x). Note that from equation (2–15) the ratio ψ(x)/x approaches a limit as x → ∞ if and only if Π(x) log(x)/x approaches the same limit. By a theorem of Tchebyschev's (mentioned in the introduction), if this limit exists it must equal 1. The Prime Number Theorem is the statement that this limit does in fact exist; however, rather than dealing with the function Π(x) directly, we will prove this statement in the elementarily equivalent form

lim_{x→∞} ψ(x)/x = 1. (2–16)

It will be useful to apply Abel summation to the Dirichlet series generated by Λ(n) to obtain the following equality, which we will use in the proof of the Prime Number Theorem, and which holds for all complex s such that ℜ(s) > 1:

−ζ′(s)/ζ(s) = ∑_{n=1}^∞ Λ(n)/n^s = s ∫_1^∞ ψ(t)/t^{s+1} dt; (2–17)

furthermore, by Theorem 2.1.4 this function may be analytically continued to a neighborhood of any point on the line ℜ(s) = 1, s ≠ 1 (as the ζ(s) in the denominator will be nonzero).

2.2 The Prime Number Theorem

First I will remind the reader of Landau's big-O and little-o notation. Let g(x) and h(x) > 0 be two real-valued functions. If there exists some positive constant C such that for all sufficiently large x

|g(x)| ≤ C h(x),

then we say that g(x) = O(h(x)), or g(x) ≪ h(x). Similarly, if

lim_{x→∞} g(x)/h(x) = 0,

then we say that g(x) = o(h(x)). If

lim_{x→∞} g(x)/h(x) = 1,

then we say that g(x) and h(x) are asymptotically equal, and denote this by g(x) ∼ h(x). Finally, if there exist constants c₁ and c₂ such that

c₁ h(x) ≤ g(x) ≤ c₂ h(x),

then we write g(x) ≍ h(x). We now require only one additional tool to prove the Prime Number Theorem, and that is an upper bound on the function −ζ′(s)/ζ(s) in the zero-free region of ζ(s) supplied by Theorem 2.1.4. Equation (2–11) gives a satisfactory upper bound for the real part of this function; however, it is a fact (which will be necessary to obtain later estimates) that for s = σ + it where σ ≥ 1 − c/log(t) and t ≥ 3:

−ζ′(s)/ζ(s) = O(log(t)), (2–18)

and in fact this result can be improved to O(log^{3/4}(t) (log log(t))^{3/4}) by using the methods of I.M. Vinogradov [30].
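The definitions above are easy to compute directly for small x, which makes identity (2–14) concrete: Λ(p^a)/log(p^a) = 1/a, so summing Λ(n)/log(n) counts each prime with weight 1, each prime square with weight 1/2, and so on, matching Definition 2.1.3. A small illustrative sketch (the bound x = 100 is arbitrary):

```python
# Compare the two sides of (2-14): Pi(x) computed from Lambda(n)/log(n)
# versus Pi(x) computed from its definition sum_n pi(x^(1/n))/n.
import math

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def lam(n):
    """von Mangoldt function: log p if n = p^a (a >= 1), else 0."""
    for p in range(2, n + 1):
        if n % p == 0:              # p = smallest prime factor of n
            m = n
            while m % p == 0:
                m //= p
            return math.log(p) if m == 1 else 0.0
    return 0.0

def psi(x):
    return sum(lam(n) for n in range(2, int(x) + 1))

def Pi_via_lambda(x):
    return sum(lam(n) / math.log(n) for n in range(2, int(x) + 1) if lam(n) > 0)

def Pi_via_pi(x):
    total, n = 0.0, 1
    while x ** (1.0 / n) >= 2:
        y = int(x ** (1.0 / n))
        total += sum(1 for p in range(2, y + 1) if is_prime(p)) / n
        n += 1
    return total
```

For x = 100 both sides give Π(100) = 25 + 4/2 + 2/3 + 2/4 + 1/5 + 1/6, and ψ(100)/100 is already close to the limit 1 asserted by (2–16).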
We may now proceed to prove the Prime Number Theorem. The first proof of the "strong form" of the Prime Number Theorem (that is, a proof of the Prime Number Theorem with a nontrivial error term) was supplied by the Baron de la Vallée-Poussin. This classical proof also closely parallels the proof of Theorem 3.1.7 in Section 3.1 (although Theorem 3.1.7 has a branch point singularity, which requires greater care). As this proof is so well known we omit some details; however, a full justification of any comments can be found in [4], [7], [29], or [30].

Theorem 2.2.1. (de la Vallée-Poussin) There exists a constant h > 0 such that

ψ(x) = x + O(x e^{−h√log(x)}).

Proof: Recall that Perron's formula (equation (1-13)) gives for κ > 1:

ψ(x) = (1/2πi) ∫_{κ−i∞}^{κ+i∞} (−ζ′(s)/ζ(s)) (x^s/s) ds.

We now deform the straight line contour as follows: fix T and let a = 1 + c/log(T) and b = 1 − c/log(T), with c the constant in Theorem 2.1.4. Then deform the path of integration between a − iT and a + iT to go horizontally from a − iT to b − iT, vertically from b − iT to b + iT, and horizontally from b + iT to a + iT. Taking into account the simple pole at s = 1, with residue 1, of −ζ′(s)/ζ(s), we obtain:

(1/2πi) ∫_{a−iT}^{a+iT} (−ζ′(s)/ζ(s)) (x^s/s) ds = x + (1/2πi) ( ∫_{a−iT}^{b−iT} + ∫_{b−iT}^{b+iT} + ∫_{b+iT}^{a+iT} ) = x + I₁ + I₂ + I₃;

consequently,

ψ(x) = x + (1/2πi) ( ∫_{a−i∞}^{a−iT} + ∫_{a+iT}^{a+i∞} ) (−ζ′(s)/ζ(s)) (x^s/s) ds + I₁ + I₂ + I₃ = x + I₄ + I₅ + I₁ + I₂ + I₃ = x + R(x).

It may then be demonstrated using the bounds of equation (2–18) that the following estimates hold:

I₁ + I₃ = O( x^a log(T)/T ),

I₂ = O( x^b log²(T) ),

I₄ + I₅ = O( (x^a/T)( 1/(a−1) + x^{1−a} log(x) log(T) ) ),

with the dominant terms arising from (x^a log(T))/T and x^b log²(T). To balance these effects we set

x^a log(T)/T = x^b log²(T),

that is,

a − b = log(T)/log(x),

up to factors of log(T). Solving for T we obtain, for some constant c₁ > 0,

T = e^{c₁√log(x)};

thus

R(x) = O(x^b log(T)) = O(x e^{−c₂ log^{1/2}(x) + c₃ log log(x)}) = O(x e^{−h√log(x)}).

Therefore, we obtain the estimate that for some h > 0,

ψ(x) = x + O(x e^{−h log^{1/2}(x)}),

and in particular

lim_{x→∞} ψ(x)/x = 1;

thus ψ(x) ∼ x, implying that π(x) ∼ x/log(x). Applying partial summation to the results of the above theorem we obtain the immediate corollary:

Corollary 2.2.2. π(x) = ∫_2^x dt/log(t) + O(x e^{−h√log(x)}).

It is also worth noting that the above corollary, due originally to de la Vallée-Poussin, was the best estimate for π(x) for over thirty years. It should be somewhat obvious from the above proof that the error term in the Prime Number Theorem is directly related to how far we may enlarge the zero-free region of ζ(s). Some notable improvements include those of Littlewood, who in 1922 demonstrated that there exists a k > 0 such that ζ(s) ≠ 0 for any s = σ + it where σ > 1 − k log log(|t| + 3)/log(|t| + 3). This corresponds to an improved estimate of

π(x) = ∫_2^x dt/log(t) + O(x e^{−α√(log(x) log log(x))});

the proof can be found in [30], although it is significantly deeper than Theorem 2.1.4. Still better estimates are known, although the results are very difficult to establish. The best estimate known to date (see [29]) is

π(x) = ∫_2^x dt/log(t) + O(x e^{−β log^{3/5}(x) (log log(x))^{−1/5}}),

and is obtained using the zero-free region of Korobov and Vinogradov. This discussion should give one the impression that improving the zero-free region of ζ(s) appears to be a very difficult problem. For example, it is still unknown whether there exists a single ϵ > 0 such that ζ(s) ≠ 0 for ℜ(s) > 1 − ϵ. It should also be noted that if θ is an upper bound for the real parts of the zeros of ζ(s), then

π(x) = ∫_2^x dt/log(t) + O(x^θ log(x)).

However, this formula is worthless if θ = 1, and (as the above comments note) thus far every zero-free region which has been established for ζ(s) cannot guarantee that we may choose θ = 1 − ϵ for any ϵ > 0.
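Corollary 2.2.2 is easy to see in data: π(x) and the logarithmic integral ∫_2^x dt/log(t) stay close together. The following sketch compares them at x = 10^4 (an illustrative bound), using a sieve for π(x) and Simpson's rule for the integral.

```python
# Compare pi(x) with the logarithmic integral Integral_2^x dt/log(t).
import math

def pi_count(x):
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, x + 1, p):
                sieve[m] = False
    return sum(sieve)

def li(x, steps=10000):
    # Simpson's rule on [2, x]; steps must be even
    h = (x - 2) / steps
    total = 1 / math.log(2) + 1 / math.log(x)
    for i in range(1, steps):
        t = 2 + i * h
        total += (4 if i % 2 else 2) / math.log(t)
    return total * h / 3
```

At x = 10^4 one finds π(x) = 1229 while the integral is about 1245, a relative discrepancy of roughly 1%, consistent with the (much stronger) error term of the corollary.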
Having demonstrated the proof of the Prime Number Theorem along classical lines, I will now state the 1954 theorem of Delange alluded to in the introduction. This result itself utilizes very careful estimates of ζ(s) which are much stronger than what is required to prove the Prime Number Theorem; therefore, Delange's theorem is hardly the easiest way in which one could conclude the Prime Number Theorem from the properties of ζ(s). But Delange's theorem will allow us to evaluate the finite sums of a much broader class of functions than Λ(n), especially additive functions. Delange was actually motivated to establish his general theorem in order to estimate the moments of certain additive functions, and in doing so to give a proof of the Erdős-Kac theorem by the method of moments. However, we are going to use this theorem to study the moments of Λ(n). Of particular interest will be the function Λ²(n), which arises in the Selberg asymptotic formula, used by Erdős and Selberg to give the first elementary proofs of the prime number theorem. It should also be noted that Delange's theorem is an exceedingly deep result which relies upon complicated methods of contour integration, and can in many ways be viewed as a generalization of the Sathe-Selberg method (to be outlined in greater detail shortly). As Delange's theorem applies to Dirichlet series which may have branch point singularities (at s = a, say), it is necessary to deform the path of integration beyond the line ℜ(s) = a into some zero-free region of the series and evaluate so-called Hankel contours. These contours are formed by the circle |s − a| = r excluding the point s = a − r, together with the line (δ, a − r] (for some 0 < δ < a − r + ϵ) traced out twice, with respective arguments +π and −π.
These contours are also intimately related to the function Γ(s) (see Chapter II.5 in [29] for more on this connection); furthermore, one may wish to consult Theorem 3.1.8 in Section 3.1, where we explicitly calculate the Hankel contour of a branch point singularity. While Delange's theorem will be an indispensable tool for the remainder of this thesis, the reader should note that its general applicability follows from difficult properties (such as the evaluation of Hankel contours around branch-point singularities). We include this theorem to allow for greater continuity of the arguments in the following theorems, as we will have to evaluate monomial singularities such as 1/(s−1)² as well as logarithmic singularities such as log(1/(s−1)). The evaluation of general monomial singularities was already known to Selberg; it is the evaluation of mixed monomial and logarithmic singularities which was Delange's major contribution to the theory. Hence, we stress that this theorem is a highly nontrivial application of the Selberg-Delange method (for a full proof, and a more thorough discussion of the Selberg-Delange method, consult Chapters II.5 and II.7 of [29]).

Theorem 2.2.3. Let F(s) := ∑_{n=1}^∞ b_n/n^s be a Dirichlet series with non-negative coefficients, converging for σ > a > 0. Suppose that F(s) is holomorphic at all points lying on the line σ = a other than s = a, and that, in the neighborhood of this point and for σ > a, we have

F(s) = (s − a)^{−ω−1} ∑_{j=0}^q g_j(s) log^j(1/(s − a)) + g(s),

where ω is some real number, the g_j(s) and g(s) are functions holomorphic at s = a, and the number g_q(a) is nonzero. If ω is a non-negative integer, then we have:

B(x) := ∑_{n≤x} b_n ∼ (g_q(a)/(a Γ(ω + 1))) x^a log^ω(x) (log log(x))^q,

where Γ(s) is Euler's Gamma function. If ω = −m − 1 for some non-negative integer m and if q ≥ 1, then:

B(x) ∼ ((−1)^m m! q g_q(a)/a) x^a log^{−m−1}(x) (log log(x))^{q−1}.

Note that this theorem applies so long as the function is holomorphic at all points on the line σ = a, s ≠ a. It is a subtle point, but worth noting, that this does not imply that the function need be holomorphic in any open neighborhood of the entire line. For our purposes we will be most interested in applying the Selberg-Delange method to various manifestations of log ζ(s), and at present it is not known for any constant ϵ > 0 whether log ζ(s) is holomorphic in the region ℜ(s) = σ > 1 − ϵ, s ≠ 1; however, it can be shown that log ζ(s) is holomorphic at every point on the line σ = 1, s ≠ 1. With this theorem in hand, the prime number theorem follows easily from the known properties of ζ(s), the Riemann zeta function. As ζ(s) is absolutely convergent in σ > 1, has a simple pole at s = 1 with residue 1, and is holomorphic and nonzero at all points lying on the line σ = 1, s ≠ 1, we may apply the Selberg-Delange method to −ζ′(s)/ζ(s), which is also an absolutely convergent Dirichlet series in σ > 1, has a simple pole at s = 1 with residue 1, has non-negative coefficients, and is holomorphic at all points lying on the line σ = 1, s ≠ 1 (for a more detailed discussion of these facts see [7] or [29]). This immediately implies the prime number theorem in the form

ψ(x) = x + o(x).

Of course, as was shown above, there is no need to appeal to such a deep theorem for such a result. Deriving the prime number theorem via the function ψ(x) is the classical approach to solving this problem, and was used by both Hadamard and de la Vallée-Poussin in their original proofs of this theorem. In the following discussion, we will take a slightly different approach from this traditional way of proving the prime number theorem. This approach is implicit in the work of Hadamard, de la Vallée-Poussin, Selberg, and Delange (to name a few), although I have never seen the result stressed in the published literature.
Let ψ₂(x) = ∑_{n≤x} Λ²(n). If we define

Π₂(x) = ∑_{n≤x} Λ²(n)/log²(n), (2–19)

then we may view the above sum as a Stieltjes integral, and applying the integration by parts formula yields

Π₂(x) = ∫_2^x dψ₂(t)/log²(t) = ψ₂(x)/log²(x) − ψ₂(2)/log²(2) + 2 ∫_2^x ψ₂(t)/(t log³(t)) dt. (2–20)

The second term in equation (2–20) is easily seen to be ψ₂(2)/log²(2) = log²(2)/log²(2) = 1. By a classical result due to Tchebyschev ([7]), ψ(x) = O(x), implying that

ψ₂(x) = ∑_{n≤x} Λ²(n) ≤ log(x) ∑_{n≤x} Λ(n) = ψ(x) log(x) = O(x log(x)).

Using Tchebyschev's estimate we may conclude that the third term in equation (2–20) is thus

2 ∫_2^x ψ₂(t)/(t log³(t)) dt = O( ∫_2^x (t log(t))/(t log³(t)) dt ) = O( ∫_2^x dt/log²(t) ) = O( x/log²(x) ), (2–21)

obtained by a further application of the integration by parts formula to ∫_2^x dt/log²(t); hence, taking into account (2–20) and the above estimates we may infer that

Π₂(x) = ψ₂(x)/log²(x) − 1 + O( x/log²(x) ) = ψ₂(x)/log²(x) + o( x/log(x) ). (2–22)

Noting that the special prime counting function Π(x) is very close to Π₂(x) (as Π(x) − Π₂(x) ≤ (√x log(x))/4), then if we could show that ψ₂(x) = x log(x) + o(x log(x)), it would follow that Π₂(x) = x/log(x) + o(x/log(x)), and hence Π(x) = x/log(x) + o(x/log(x)). We begin by differentiating

−ζ′(s)/ζ(s) = ∑_{n=1}^∞ Λ(n)/n^s = 1/(s−1) + ⋯

to obtain:

ζ″(s)/ζ(s) − (ζ′(s)/ζ(s))² = ∑_{n=1}^∞ Λ(n)log(n)/n^s = 1/(s−1)² + O(1). (2–23)

Moreover,

∑_{n=1}^∞ Λ(n)log(n)/n^s = ∑_{n=1}^∞ Λ²(n)/n^s + h(s), (2–24)

where h(s) is analytic for s ∈ C, ℜ(s) > 1/2. By applying partial summation to this Dirichlet series for ℜ(s) > 1 we find that:

ζ″(s)/ζ(s) − (ζ′(s)/ζ(s))² = s ∫_1^∞ ψ₂(x)/x^{s+1} dx, (2–25)

and as the Dirichlet series ζ″(s)/ζ(s) − (ζ′(s)/ζ(s))² has non-negative coefficients, is holomorphic at all points lying on the line ℜ(s) = 1, s ≠ 1, with a sole singularity at s = 1 of multiplicity 2 and leading coefficient 1, we may invoke Theorem 2.2.3 (Delange's Theorem) to conclude that:

ψ₂(x) = x log(x) + o(x log(x)); (2–26)

inserting equation (2–26) into equation (2–22) yields

Π₂(x) = x/log(x) + o(x/log(x)),

and therefore

π(x) = x/log(x) + o(x/log(x)),

which is the Prime Number Theorem. It is again stressed that we need not approach the problem using the Selberg-Delange method, as the proof of Theorem 2.2.1 shows that we can evaluate singularities of the type 1/(s−1), and this method is easily adapted to the singularity 1/(s−1)² in the above theorem.

2.3 The Hardy-Ramanujan Theorem

The functions ω(n) and Ω(n), which denote the number of distinct prime divisors of n and the number of prime powers dividing n, respectively, were first studied by Hardy and Ramanujan in their 1917 paper [11]. Not only are they interesting additive functions in their own right, but they also motivated the study of probabilistic number theory and the Alladi-Erdős functions (which will be discussed in the next chapter). Hardy and Ramanujan demonstrated that

∑_{n≤x} ω(n) = x log log(x) + c₁x + O(x/log(x)) (2–27)

and

∑_{n≤x} Ω(n) = x log log(x) + c₂x + O(x/log(x)), (2–28)

where c₁ = 0.261497... and c₂ = 1.0345653... [29]. All that is needed to prove (2–27) and (2–28) is the fact that

∑_{p≤x} 1/p = log log(x) + c₁ + O(1/log(x)); (2–29)

however, we note that Delange's Theorem yields (2–27) and (2–28) analytically.
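Estimate (2–27) is visible at quite modest heights. Since each prime p ≤ x divides exactly ⌊x/p⌋ integers up to x, the sum ∑_{n≤x} ω(n) equals ∑_{p≤x} ⌊x/p⌋, which a sieve computes quickly. The bound x = 10^5 below is an illustrative choice.

```python
# Compare sum_{n<=x} omega(n) with the main term x*loglog(x) + c1*x
# from (2-27), where c1 = 0.261497... is the Mertens constant.
import math

def omega_sum(x):
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    total = 0
    for p in range(2, x + 1):
        if sieve[p]:
            total += x // p          # p divides floor(x/p) integers <= x
            for m in range(p * p, x + 1, p):
                sieve[m] = False
    return total

x = 100000
main_term = x * math.log(math.log(x)) + 0.261497 * x
# the discrepancy per integer should be of order 1/log(x)
```

The discrepancy per integer n is of order 1/log(x), in line with the O(x/log(x)) error term of (2–27).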
With this method of proof in mind, we will later introduce a companion function to Ω(n) and ω(n) which, when taken with a well-known result of Hardy and Ramanujan, will give the result with little effort. Before proceeding to the Hardy-Ramanujan Theorem, we require one additional result from [29]:

∑_{n≤x} (Ω(n) − ω(n)) ∼ Cx

for some well-known constant C = ∑_p 1/(p(p−1)) ≈ 0.773156.

Theorem 2.3.1. ∑_{n≤x} ω(n) = x log log(x) + c₁x + O(x/log(x)).

Proof: The classical approach to this theorem is given by the simple observation that:

∑_{n≤x} ω(n) = ∑_{n≤x} ∑_{p|n} 1 = ∑_{pm≤x} 1 = ∑_{p≤x} [x/p] = x ∑_{p≤x} 1/p + O(π(x));

from equation (2–29), and by Tchebyschev's estimate (or the Prime Number Theorem),

∑_{n≤x} ω(n) = x log log(x) + c₁x + O(x/log(x)).

Therefore, we obtain equation (2–27), and by the comments preceding this theorem we obtain equation (2–28). Let π_k(x) denote the number of integers n ≤ x such that ω(n) = k, and let N_k(x) denote the number of integers n ≤ x such that Ω(n) = k. By using induction on k, Hardy and Ramanujan succeeded in proving that there exists a constant C such that:

π_k(x) = O( x(log log(x) + C)^{k−1} / ((k−1)! log(x)) ) (2–30)

holds uniformly in k. The essence of their proof is the observation that

k π_{k+1}(x) ≤ ∑_{p≤√x} π_k(x/p),

from which the theorem easily follows by using the Prime Number Theorem as the base case k = 1 and applying induction on k. For the case of N_k(x), Hardy and Ramanujan demonstrated that there exists a constant D such that:

N_k(x) = O( x(log log(x) + D)^{k−1} / ((k−1)! log(x)) ) (2–31)

holds uniformly for k ≤ (2 − δ) log log(x), δ > 0. The additional restriction on k in equation (2–31) follows from some subtle aspects concerning the function Ω(n) which will be discussed below. Hardy and Ramanujan made many more observations in their seminal paper, one of which was to introduce the concept of the normal order of an arithmetic function.
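The Poisson-like profile in k described by (2–30) can be observed directly by tabulating π_k(x) with a sieve. The following sketch is illustrative only; the bound x = 10^5 is arbitrary, and at such small heights the counts only roughly follow the asymptotic shape, peaking near k ≈ log log(x).

```python
# Tabulate pi_k(x) = #{n <= x : omega(n) = k} with a sieve.
def omega_table(x):
    w = [0] * (x + 1)
    for p in range(2, x + 1):
        if w[p] == 0:                # no smaller prime divides p, so p is prime
            for m in range(p, x + 1, p):
                w[m] += 1
    return w

def pi_k_counts(x):
    w = omega_table(x)
    counts = {}
    for n in range(2, x + 1):
        counts[w[n]] = counts.get(w[n], 0) + 1
    return counts
```

For x = 10^5 (where log log(x) ≈ 2.44) the most populated class is k = 2 or k = 3, in accordance with the Poisson interpretation with parameter λ = log log(x).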
An average order is a very natural estimate to seek, for given an arithmetic function f(n) the average order is simply (1/x) ∑_{n≤x} f(n). Furthermore, by summing the function f(n) over n ≤ x we are in effect smoothing out the irregularities of the function which will almost certainly exist; hence, the average order may be influenced by sporadically occurring irregular values. These irregular values can sometimes give misleading results, and in this section we will give a famous example of one such situation. However, Hardy and Ramanujan developed a different type of statistic called the normal order, which is in many ways more natural and informative than the average order, and which we now define.

Definition 2.3.2. We say that an arithmetic function f(n) has normal order g(n) if g(n) is a monotone arithmetic function such that for any ϵ > 0

|f(n) − g(n)| ≤ ϵ|g(n)|

on a set of integers n having natural density 1.

The observations of Hardy and Ramanujan allow us to obtain interesting probabilistic interpretations of various arithmetic functions. Recall that a random variable X is Poisson with parameter λ > 0 if

P(X = j) = (λ^j/j!) e^{−λ}

for j = 0, 1, 2, .... The expected value of a Poisson random variable is given by E[X] = λ; see [21]. Equation (2–30) of Hardy and Ramanujan demonstrates that π_k(x) may be interpreted as being bounded by a Poisson random variable with parameter λ = log log(x); furthermore, this same equation shows that π_k(x) is small compared to x when |k − log log(x)| is large. This is because for every δ > 0

lim_{t→∞} e^{−t} ∑_{|k−t|>t^{1/2+δ}} t^k/k! = 0

(see [21]), so that (2–30) demonstrates that

lim_{x→∞} (1/x) ∑_{|k−log log(x)|>(log log(x))^{1/2+δ}} π_k(x) = 0, (2–32)

and, hence, that the inequality

|ω(n) − log log(n)| < (log log(n))^{1/2+δ}

holds on a set of integers with density 1. These observations led Hardy and Ramanujan to deduce that ω(n) not only has average order log log(n), but also has normal order log log(n). In 1934 P.
Turán succeeded in proving a stronger result concerning ω(n) than that derived by Hardy and Ramanujan. This result, known as Turán's inequality, has further probabilistic implications for the Hardy-Ramanujan functions and is the subject of the next theorem.

Theorem 2.3.3. (Turán)

∑_{n≤x} (ω(n) − log log(x))² = O(x log log(x)).

Proof: Consider:

∑_{n≤x} (ω(n) − log log(x))² = ∑_{n≤x} ω²(n) − 2 log log(x) ∑_{n≤x} ω(n) + x(log log(x))² + O(x log log(x)) = ∑_{n≤x} ω²(n) − x(log log(x))² + O(x log log(x)).

Now,

∑_{n≤x} ω²(n) = ∑_{n≤x} ( ∑_{p|n} 1 )² = ∑_{n≤x} ∑_{pq|n} 1 + ∑_{n≤x} ∑_{p|n} 1,

where it is understood that p ≠ q are distinct primes; the diagonal terms give the second sum, which is ∑_{n≤x} ω(n) = O(x log log(x)). For the first sum,

∑_{n≤x} ∑_{pq|n} 1 ≤ ∑_{pq≤x} x/(pq),

where on the right-hand side the sum is over all products pq ≤ x of distinct primes. Now, as

∑_{pq≤x} x/(pq) ≤ x ( ∑_{p≤x} 1/p )( ∑_{q≤x} 1/q ) ≤ x(log log(x))² + O(x log log(x)),

we obtain

∑_{n≤x} ω²(n) ≤ x(log log(x))² + O(x log log(x)).

We may therefore conclude that:

∑_{n≤x} (ω(n) − log log(x))² = O(x log log(x)),

which is Turán's result.

We may further note that if T > 0 is large and N_T(x) denotes the number of integers n ≤ x such that |ω(n) − log log(x)| > T√(log log(x)), that is,

N_T(x) = ∑_{n≤x, |ω(n)−log log(x)| > T√(log log(x))} 1,

then

T² log log(x) N_T(x) ≤ ∑_{|ω(n)−log log(x)| > T√(log log(x))} (ω(n) − log log(x))² ≤ O(x log log(x)).

Therefore, we may conclude that N_T(x) = O(x/T²), and in particular that N_T(x)/x → 0 as T → ∞.

We now offer an alternative proof of equations (2–27) and (2–28) via the following companion function:

Definition 2.3.4. Let

w(n) = ∑_{d|n} Λ(d)/log(d);

that is,

w(n) = ∑_{p^α | n} 1/α.

With this definition it should be clear that ω(n) ≤ w(n) ≤ Ω(n) for all integers n ≥ 2. With these inequalities we can immediately deduce that:

∑_{n≤x} w(n) = x log log(x) + O(x).
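Turán's inequality can also be seen empirically: the normalized second moment (1/x) ∑_{n≤x} (ω(n) − log log x)² stays within a modest constant multiple of log log x. The bound x = 10^5 below is an illustrative choice, not part of the theorem.

```python
# Empirical second moment from Turan's inequality (Theorem 2.3.3).
import math

def omega_table(x):
    w = [0] * (x + 1)
    for p in range(2, x + 1):
        if w[p] == 0:                # p prime
            for m in range(p, x + 1, p):
                w[m] += 1
    return w

x = 100000
ll = math.log(math.log(x))
w = omega_table(x)
second_moment = sum((w[n] - ll) ** 2 for n in range(2, x + 1)) / x
```

This is exactly the Chebyshev-inequality mechanism used above: a bounded second moment about log log(x) forces ω(n) to concentrate near log log(x).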
However, we can improve the O-term by noting that w(n) generates the following Dirichlet series:

ζ(s) log ζ(s) = ∑_{n=2}^∞ w(n)/n^s.

Recall that in order to prove the Prime Number Theorem it was necessary to show that ζ(s) ≠ 0 for s = 1 + it; hence, the function log ζ(s) has a sole singularity on the line ℜ(s) = 1, at s = 1. This was necessary as Delange's theorem cannot be applied to functions which do not satisfy conditions of holomorphy on some vertical line, and from the properties of log ζ(s) the function ζ(s) log ζ(s) must be holomorphic at all points on the line ℜ(s) = 1, save the singularity at s = 1. Of course, as was previously mentioned, this result follows easily from equation (2–28), and is therefore not as deep a theorem as the Prime Number Theorem. However, we wanted to see how the Hardy-Ramanujan results fit within the context of Delange's Theorem, as we will provide a similar treatment when addressing the Alladi-Erdős functions, which will be developed in the next chapter. While the above observations are sufficient to apply Delange's Theorem to evaluate the sum ∑_{n≤x} w(n), we could actually prove much stronger results through more elaborate methods. This is because ζ(s) log ζ(s) may be analytically continued for all complex s ≠ 1 in the same zero-free region as ζ(s); although, for our purposes, the gain is minor for the additional work that would be involved.

Theorem 2.3.5. There exists a constant c₃ such that

∑_{n≤x} w(n) = x log log(x) + c₃x + O(x/log(x)).

Proof: By the above discussion ζ(s) log ζ(s) satisfies the conditions of Delange's theorem; moreover, ζ(s) log ζ(s) has a singularity of the type

(1/(s−1)) log(1/(s−1)) + c₃/(s−1) + c₄ log(1/(s−1)) + ⋯

for s near 1. Therefore, Delange's Theorem implies that:

∑_{n≤x} w(n) = x log log(x) + c₃x + O(x/log(x)),

as was to be demonstrated.
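The sandwich ω(n) ≤ w(n) ≤ Ω(n) is immediate from Definition 2.3.4 (each prime p with p^a ‖ n contributes the harmonic number 1 + 1/2 + ⋯ + 1/a, which lies between 1 and a), and is easy to verify directly:

```python
# Check omega(n) <= w(n) <= Omega(n), where w(n) = sum over prime
# powers p^a | n of 1/a (Definition 2.3.4), by direct factorization.
def factorize(n):
    fac = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            fac[d] = fac.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    return fac

def omega(n): return len(factorize(n))
def Omega(n): return sum(factorize(n).values())

def w(n):
    # each prime with exponent e contributes 1 + 1/2 + ... + 1/e
    return sum(sum(1.0 / a for a in range(1, e + 1))
               for e in factorize(n).values())
```

For example, 12 = 2²·3 gives w(12) = (1 + 1/2) + 1 = 5/2, squeezed between ω(12) = 2 and Ω(12) = 3.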
In the above proof it should be noted that while Delange's powerful theorem allows us to easily derive our result, his theorem is in many respects invoked unnecessarily. In particular, the first proof implies that (as was mentioned before) the only real analytic result necessary is equation (2–29), which is far more easily derived than Delange's theorem. It is a rather amazing fact that many deterministic arithmetic functions can be interpreted in a probabilistic manner, much like how the prime number theorem can be considered a statement about the arithmetic mean of Λ(n). Furthermore, many of these probabilistic interpretations shed much more light on various arithmetic functions than what can be derived analytically or elementarily. The motivation for Delange's powerful theorem was supplied by the work of Sathe and Selberg, whose theorems offer further probabilistic interpretations for the functions π_k(x) and N_k(x). Historically, Edmund Landau derived the estimates

π_k(x) ∼ x(log log(x))^{k−1}/((k−1)! log(x)) (2–33)

and

N_k(x) ∼ x(log log(x))^{k−1}/((k−1)! log(x)) (2–34)

in 1909 [16], but he only proved these results for fixed k. Hardy and Ramanujan derived their uniform estimates (2–30) and (2–31) in 1917 [11], yet it would take a further 36 years before Sathe proved uniform estimates in k which were comparable in accuracy to those of Landau. Sathe no doubt found his inspiration in the papers of Hardy and Ramanujan and proceeded to prove his results inductively, although it should be noted that Sathe's methods are very complicated and his results are difficult to derive. However, by using Selberg's argument one may derive the results of Sathe in a more classical and natural fashion.
To arrive at these results, consider the functions: F (s, z) = ∞ ∑ z ω(n) n=1 ns = ∏( p z 1+ s p −1 ) = ζ(s)z f (s, z) (2–35) = ζ(s)z g(s, z) (2–36) and G (s, z) = ∞ ∑ z Ω(n) n=1 ns = ∏( p z 1− s p )−1 where the function in (2-35) represents an analytic function of s and z for ℜ(s) > 1, and (2-36) represents an analytic function in s and z for ℜ(s) > 1, |z| < 2. When factored in 61 terms of ζ(s)z , f (s, z) becomes analytic for all z where ℜ(s) > 1/2, and g(s, z) becomes analytic when ℜ(s) > 1/2 and |z| < 2. The additional restriction that |z| < 2 arises from the pole which the prime p = 2 contributes to (2-36). In order to deduce an asymptotic formula for πk (x), Selberg first applies Perron’s formula to (2-35) S(z, x) = ∑ z ω(n) n≤x 1 = 2πi ∫ k+i∞ ζ(s)z f (s, z) k−i∞ xs ds s where k > 1, in order to get an estimate of the partial sums of the coefficients of that series; however, this requires one to deform the straight line contour in Perron’s formula into a zero-free region for ζ(s) in the strip 1/2 < ℜ(s) < 1. Furthermore, in general ζ(s)z may contain branch point singularities, which will necessitate taking a Hankel contour around the singularity at s = 1. After deforming the line of integration, one will quickly notice that (as in Theorem 3.1.8) the majority of the contribution from the integral comes from this branch point singularity; therefore, after Selberg demonstrated that the other contours contribute little to the estimate of S(z, x), he obtained the result that S(z, x) = ∑ ( z ω(n) = x log z−1 ( (x)f (1, z) 1 + O n≤x 1 log(x) )) which is uniform if z is bounded. In [26] Selberg then solves for πk (x) by applying Cauchy’s theorem to the sum S(z, x): 1 πk (x) = 2πi ∫ |z|=r S(z, x) dz z k+1 and then optimizing the radius r of the contour. The optimal value turns out to be k r = , and utilizing this result we may obtain an improvement on Landau’s log log(x) asymptotic πk (x) ∼ k f (1, log log(x) )x(log log(x))k−1 (k − 1)! 
which is valid uniformly for k ≤ M log log(x), M an arbitrarily large positive constant. We now apply the Sathe-Selberg technique to derive an asymptotic for N_k(x) which holds uniformly (in some range of k). Applying Perron's formula to (2-36) in a manner analogous to the derivation of S(z,x) (i.e. by deforming the path of integration and evaluating the Hankel contour around the singularity at s = 1), we obtain

\[ T(z,x) = \sum_{n \le x} z^{\Omega(n)} = \frac{g(1,z)}{\Gamma(z)}\, x\, (\log x)^{z-1} \left( 1 + O\!\left( \frac{1}{\log x} \right) \right), \]

which, by utilizing Cauchy's theorem in a manner similar to the derivation of π_k(x), gives

\[ N_k(x) = \frac{1}{2\pi i} \oint_{|z|=r} \frac{T(z,x)}{z^{k+1}}\, dz, \]

with the optimal value of the radius again r = k/log log(x). However, as |z| < 2 this forces k ≤ (2 − δ) log log(x), for any δ > 0. Evaluating this contour integral yields the asymptotic

\[ N_k(x) \sim \frac{x\, (\log\log x)^{k-1}}{(k-1)!\, \log x}, \]

which holds uniformly in k ≤ (2 − δ) log log(x). In order to study functions of this type in general, Selberg in [26] then derives the following deep theorem:

Theorem 2.3.6. (Selberg) Let

\[ B(s,z) = \sum_{n=1}^{\infty} \frac{b(z,n)}{n^s} \]

for ℜ(s) > 1/2, and let

\[ \sum_{n=1}^{\infty} \frac{|b(z,n)|}{n} \left( \log(2n) \right)^{B+3} \]

be uniformly bounded for |z| ≤ B. Furthermore, let

\[ B(s,z)\,\zeta(s)^z = \sum_{n=1}^{\infty} \frac{a(z,n)}{n^s} \]

for ℜ(s) > 1. Then we have

\[ A(z,x) = \sum_{n \le x} a(z,n) = \frac{B(1,z)}{\Gamma(z)}\, x \log^{z-1}(x) + O(x \log^{z-2}(x)), \]

uniformly for |z| ≤ B, x ≥ 2.

With this theorem in hand we may obtain not only the above asymptotic values, but also those for a much wider class of functions. As was mentioned previously, the average order of a function can sometimes lead to misleading results. To illustrate this phenomenon consider τ(n), the number of divisors of an integer n. It is a well-known fact, first demonstrated by Dirichlet, that

\[ \sum_{n \le x} \tau(n) = x \log x + (2\gamma - 1)x + O(\sqrt{x}), \]

and we may conclude that the arithmetic mean of τ(n) is log(n). It is therefore tempting to assume that τ(n) will be about the size of log(n) quite often; however, this is completely false.
Given the canonical decomposition of an integer n = \prod_{i=1}^{\omega(n)} p_i^{\alpha_i}, it is easily seen that

\[ 2^{\omega(n)} \le \tau(n) = \prod_{i=1}^{\omega(n)} (\alpha_i + 1) \le \prod_{i=1}^{\omega(n)} 2^{\alpha_i} = 2^{\Omega(n)}, \qquad (2\text{--}37) \]

and from the Hardy-Ramanujan results both ω(n) and Ω(n) have average and normal order log log(n); hence (2-37) implies

\[ \tau(n) = (\log n)^{\log 2 + o(1)} \]

on some subset of the integers having natural density 1. Thus τ(n) is more often than not equal to (log n)^{log 2 + o(1)}, and is therefore significantly less than its arithmetic mean on a set of density 1 (in particular, τ(n) cannot have its average order equal to its normal order). This fact can now be explained, but the explanation requires the power of the Sathe-Selberg techniques and is therefore much deeper than the earlier results concerning τ(n) described above. It will be shown presently that the sum Σ_{n≤x} τ(n) is dominated by a small number of integers with many divisors. We have

\[ \sum_{n \le x} \tau(n) \sim \sum_{n \le x} 2^{\omega(n) + o(1)} \sim \sum_{k \ge 1} 2^k \pi_k(x) \sim \sum_{k \ge 1} \frac{2^k\, x\, (\log\log x)^{k-1}}{(k-1)!\, \log x}, \]

and this is an exponential series in 2 log log(x). Thus

\[ \sum_{|k - 2\log\log x| < (\log\log x)^{1/2+\delta}} \frac{x\, (2\log\log x)^{k-1}}{(k-1)!\, \log x} \asymp x \log x. \qquad (2\text{--}38) \]

That is, the main contribution in (2-38) comes from the terms for which |k − 2 log log(x)| < (log log(x))^{1/2+δ}. For such values of k,

\[ \tau(n) = 2^{k(1+o(1))} = 2^{2\log\log x\,(1+o(1))} = (\log x)^{2\log 2 + o(1)}, \]

and as 2 log(2) > 1, these values are larger than the average order log(n) of τ(n). Another way to interpret this is that the average order of τ(n) is thrown off by the small set of integers with the property that ω(n) ∼ 2 log log(n), which, in light of the fact that the average and normal orders of ω(n) are log log(n), should be quite rare. This explains why τ(n) can have an average order which is larger than its normal order.

To close this section we give one more interesting probabilistic interpretation of the function ω(n). The Erdős-Kac Theorem states that for all λ ∈ R,

\[ \lim_{x \to \infty} \frac{1}{x} \#\left\{ n \le x : \omega(n) \le \log\log x + \lambda \sqrt{\log\log x} \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda} e^{-u^2/2}\, du. \]

It follows that the function ω(n) behaves as a normally distributed random variable with mean and variance log log(n).

2.4 Remarks

It should be noted that the prime number theorem is a rather deep theorem, and while elementary proofs of it exist (that is, proofs which do not require the use of complex function theory), they are hardly simple. The brevity of the above arguments proving the Prime Number Theorem (and in particular the very short proof of the Hardy-Ramanujan Theorem) follows from the fact that most of the particulars of the theorem are subsumed in Delange's Theorem (Theorem 2.2.1), which is itself a highly nontrivial result relying upon the theory of analytic functions. The above proofs do not use any new techniques from the theory of complex variables; however, I have never seen any published results on the asymptotic value of the summation of the powers of the von Mangoldt function:

\[ \psi_k(x) = \sum_{n \le x} \Lambda^k(n). \]

The above approach for ψ_2(x) generalizes easily to ψ_k(x) for k ∈ Z, k > 2, and only requires one to differentiate log ζ(s) k times and then apply Delange's theorem. It is easily deduced that any asymptotic result for ψ_k(x) is equivalent to the prime number theorem, as asymptotic results concerning π(x) follow by applying partial summation to ψ_k(x) in the manner outlined above, yielding

\[ \psi_k(x) \sim x \log^{k-1}(x). \]

Furthermore, the formula

\[ \psi_2(x) = x \log x + x + O\!\left( \frac{x}{\log x} \right) \]

allows us to analyze the second moment (the variance) of the function Λ(n). While Λ(n) = 0 unless n is a prime power, we should expect its variance to be quite large. This fact is in many ways surprising, as the Prime Number Theorem assures us that the mean of Λ(n) is

\[ \frac{\psi(x)}{x} = 1 + O\!\left( \frac{1}{\log x} \right), \]

which is quite small.
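The contrast between the small mean and the large variance of Λ(n) is easy to observe numerically. The script below (our own sketch, at a very small x where the o(1) terms are still visible) sieves Λ(n) and computes both quantities directly; the mean sits near 1 while the variance is already of the size of log x.

```python
import math

def mangoldt(x):
    """Lambda(n) for n <= x: log p if n is a power of the prime p, else 0,
    computed from a smallest-prime-factor sieve."""
    spf = list(range(x + 1))
    for p in range(2, math.isqrt(x) + 1):
        if spf[p] == p:
            for m in range(p * p, x + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0.0] * (x + 1)
    for n in range(2, x + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:                  # n = p^a is a prime power
            lam[n] = math.log(p)
    return lam

x = 10**5
lam = mangoldt(x)
mean = sum(lam) / x                          # psi(x)/x, near 1 by the PNT
var = sum((v - mean) ** 2 for v in lam) / x  # grows like log x
print(mean, var, math.log(x))
```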
However, by direct calculation this variance is given by

\[ \frac{1}{x} \sum_{n \le x} \left( \Lambda(n) - \left( 1 + O\!\left( \frac{1}{\log n} \right) \right) \right)^2 = \frac{1}{x} \sum_{n \le x} \left( \Lambda^2(n) - 2\Lambda(n) + 1 + O\!\left( \frac{\Lambda(n)}{\log n} \right) \right) \]
\[ = \frac{1}{x} \left( \psi_2(x) - 2\psi(x) + x + O(\pi(x)) \right) = \frac{1}{x} \left( x \log x + x - 2x + x + O\!\left( \frac{x}{\log x} \right) \right) \]
\[ = \log x + O\!\left( \frac{1}{\log x} \right) = \log x + o(1). \]

This is far in excess of the average order of Λ(n), which is in many ways to be expected. In closing this section we make several remarks on the still unproven Riemann Hypothesis, which would have major consequences for the estimate of π(x). As was mentioned earlier, the error term in the Prime Number Theorem depends on the elusive zero-free region of ζ(s), and Riemann hypothesized that if ζ(ρ) = 0 and 0 < ℜ(ρ) < 1, then ℜ(ρ) = 1/2. This conjecture (if true) would allow us to improve our asymptotic estimates to

\[ \psi(x) = x + O(x^{1/2} \log^2 x) \]

and

\[ \pi(x) = \mathrm{li}(x) + O(x^{1/2} \log x) \]

(for proofs of these results and further consequences of Riemann's Hypothesis see [7] and [29]). It should be noted that while there is a great deal of evidence to support Riemann's conjecture, it appears to be far beyond the scope of our present understanding of mathematics. Despite efforts by some of the most talented mathematicians in history, the Riemann Hypothesis has eluded all attempts to either prove or disprove its validity.

CHAPTER 3
ARITHMETIC FUNCTIONS INVOLVING THE LARGEST PRIME FACTOR

3.1 The Alladi-Erdős Functions

In this section we will introduce various arithmetic functions and identities which will be of use later in applications to numerical factorization. To begin, we define the following functions:

Definition 3.1.1. Let

\[ A^*(n) = \sum_{p \mid n} p, \]

the sum of the distinct prime factors of n, and let

\[ A(n) = \sum_{p^{\alpha} \| n} \alpha p \]

be the sum of the prime factors of n, weighted according to multiplicity (note that p^α ∥ n means that p^α | n but p^{α+1} does not divide n). We will call these two functions the first and second Alladi-Erdős functions, respectively.
Furthermore, we now add a companion function to the two just introduced; namely, we define

\[ A'(n) = \sum_{p^{\alpha} \mid n} \frac{p^{\alpha}}{\alpha}. \]

In their 1976 paper [2], Alladi and Erdős introduced the functions A(n) and A*(n) and demonstrated some of their basic properties. We have introduced the function A'(n) because it generates a Dirichlet series which is very convenient to work with in light of Delange's Theorem. Firstly, the functions A(n), A*(n), and A'(n) are all additive, and A*(n) ≤ A'(n): as A'(p²) = p + p²/2 and A*(p²) = p, we see that A*(p²) ≤ A'(p²), and it follows by a simple inductive argument that A*(n) ≤ A'(n) for all values of n ∈ Z⁺. Other interesting properties follow by considering the asymptotic values of the functions f(x) = Σ_{n≤x} A(n), f*(x) = Σ_{n≤x} A*(n), and f'(x) = Σ_{n≤x} A'(n), all of which are equal to

\[ \frac{\pi^2}{12} \frac{x^2}{\log x} + O\!\left( \frac{x^2}{\log^2 x} \right). \qquad (3\text{--}1) \]

This asymptotic was also established for the first time by Alladi and Erdős; however, we will improve their asymptotic value to

\[ \zeta(2) \int_2^x \frac{t}{\log t}\, dt + O\!\left( x^2 e^{-C\sqrt{\log x}} \right) \qquad (3\text{--}2) \]

(where ζ(2) = π²/6), as well as showing how Riemann's Hypothesis would allow us to further improve the error term. We will now analytically derive the asymptotic of f'(x), which can be approached more easily through analysis than f(x) or f*(x). We begin with a simple lemma:

Lemma 3.1.2. f'(x) − f*(x) = O(x^{3/2}).

Proof: The difference f'(x) − f*(x) is easily seen to be equal to

\[ \sum_{n \le x} (A'(n) - A^*(n)) = \sum_{p^2 \le x} \frac{p^2}{2} \left[ \frac{x}{p^2} \right] + \sum_{p^3 \le x} \frac{p^3}{3} \left[ \frac{x}{p^3} \right] + \cdots + \sum_{p^{\alpha} \le x} \frac{p^{\alpha}}{\alpha} \left[ \frac{x}{p^{\alpha}} \right] + \cdots \]

Clearly this value is majorized by

\[ \sum_{n \le x} (A'(n) - A^*(n)) \le \sum_{p^2 \le x} \frac{x}{2} + \cdots + \sum_{p^{\alpha} \le x} \frac{x}{\alpha} + \cdots = x \left( \sum_{p \le x^{1/2}} \frac{1}{2} + \cdots + \sum_{p \le x^{1/\alpha}} \frac{1}{\alpha} + \cdots \right) \]
\[ = x \cdot O\!\left( \frac{x^{1/2}}{\log x} + \frac{x^{1/3}}{\log x} + \cdots + \frac{x^{1/\alpha}}{\log x} + \cdots \right) \le x \left( \frac{x^{1/2}}{\log x} + O\!\left( \frac{x^{1/2}}{\log^2 x} \right) \right) = O(x^{3/2}), \]

where we have used the fact that π(x) = x/log x + O(x/log² x), and the fact that

\[ \sum_{n \le \alpha} 1 = \alpha \le \frac{\log x}{\log 2} = O(\log x). \]

Thus we have obtained the result of the lemma.

It is interesting to contrast the difference f'(x) − f*(x) from the above lemma with the difference f(x) − f*(x), which is much smaller. It was first proved by Alladi and Erdős in [2] that:

Lemma 3.1.3. (Alladi-Erdős) f(x) − f*(x) = x log log(x) + O(x).

Proof: It is easily seen that the difference f(x) − f*(x) is given by

\[ \sum_{n \le x} (A(n) - A^*(n)) = \sum_{p^2 \le x} p \left[ \frac{x}{p^2} \right] + \sum_{p^3 \le x} p \left[ \frac{x}{p^3} \right] + \cdots; \]

furthermore,

\[ \sum_{p^2 \le x} p \left[ \frac{x}{p^2} \right] = \sum_{p \le \sqrt{x}} \frac{x}{p} + O\!\left( \sum_{p \le \sqrt{x}} p \right) = x \log\log x + O(x), \]

and

\[ \sum_{p^i \le x} p \left[ \frac{x}{p^i} \right] = x \sum_{p^i \le x} \frac{1}{p^{i-1}} + O\!\left( \sum_{p \le x^{1/i}} p \right). \]

Therefore, as

\[ \sum_{i \ge 3} \sum_{p} \frac{x}{p^{i-1}} = O(x), \]

we may conclude that

\[ \sum_{n \le x} (A(n) - A^*(n)) = x \log\log x + O(x), \qquad (3\text{--}3) \]

which is the result of Alladi and Erdős.

Given the results of the preceding lemmas, the difference between any two of the arithmetic functions f(x), f'(x), and f*(x) will not exceed O(x^{3/2}) = o(x²/log x). It will be shown later in this chapter that f'(x) is asymptotically equivalent to π²x²/(12 log x), implying that the three functions must have the same asymptotic values. It is also interesting to note how much faster f'(x) grows away from f*(x) than f(x) does. While for our purposes this discrepancy will not be important, Lemma 3.1.3 shows that f(x) − f*(x) = x log log(x) + O(x), which is a far smaller quantity than f'(x) − f*(x) = O(x^{3/2}). This is most easily explained by the fact that f'(x) essentially counts prime powers, whereas the summations of the Alladi-Erdős functions do not. Therefore, it is interesting (and perhaps somewhat counterintuitive) that each function has the same dominant term in its asymptotic value.
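To make the definitions concrete, here is a small illustrative script (ours, not from [2]) that computes A(n), A*(n), and A'(n) by trial-division factorization, accumulates the partial sums f, f*, f' at a very small x, and exhibits both the inequality A*(n) ≤ A'(n) and the modest size of f(x) − f*(x) relative to x log log x.

```python
import math

def factorize(n):
    """Prime factorization of n as (p, alpha) pairs, by trial division."""
    out, d = [], 2
    while d * d <= n:
        if n % d == 0:
            a = 0
            while n % d == 0:
                n //= d
                a += 1
            out.append((d, a))
        d += 1
    if n > 1:
        out.append((n, 1))
    return out

def A_star(n):   # sum of the distinct prime factors of n
    return sum(p for p, _ in factorize(n))

def A(n):        # prime factors weighted by multiplicity
    return sum(a * p for p, a in factorize(n))

def A_prime(n):  # sum of p^j / j over all prime powers p^j dividing n
    return sum(p**j / j for p, a in factorize(n) for j in range(1, a + 1))

x = 2000
f = sum(A(n) for n in range(2, x + 1))
f_star = sum(A_star(n) for n in range(2, x + 1))
f_prime = sum(A_prime(n) for n in range(2, x + 1))
print(f, f_star, f_prime)
print((f - f_star) / (x * math.log(math.log(x))))  # of moderate size, per Lemma 3.1.3
```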
This is best explained by saying that the contribution from the primes p which divide an integer more than once (i.e. p^α | n with α > 1) is quite small compared to the contribution from the largest prime divisor. To make this discussion more precise, let P₁(n) denote the largest prime divisor of an integer n. In [2] Alladi and Erdős showed that

\[ \sum_{n \le x} P_1(n) = \frac{\zeta(2)}{2} \frac{x^2}{\log x} + O\!\left( \frac{x^2}{\log^2 x} \right), \qquad (3\text{--}4) \]

and hence Σ_{n≤x} P₁(n) has the same asymptotic as f'(x). Therefore, the majority of the contribution to the sum Σ_{n≤x} A(n) comes from the largest prime factors of the integers n ≤ x. This is consistent with Hardy and Ramanujan's theorem that Σ_{n≤x} (Ω(n) − ω(n)) = O(x), where ω(n) is the number of distinct prime divisors of an integer n and Ω(n) is the number of prime divisors of n counted with multiplicity; furthermore, from equations (2-27) and (2-28) both Σ_{n≤x} ω(n) and Σ_{n≤x} Ω(n) are asymptotically x log log(x) + O(x). That ω(n) and Ω(n) are asymptotically very close allows us to surmise (albeit heuristically) that the prime factors of most integers occur squarefree. An interesting result due to Erdős (Theorem 3.1.5 below) gives further evidence of how the largest prime factor of n is quite dominant, and dictates the behavior of a large class of arithmetic functions. These observations further validate the fact that Σ_{n≤x} P₁(n) is asymptotically the same order of magnitude as f(x): most integers will only be divisible by a given prime p once, and the higher powers of p will be rare; therefore, they should contribute little to the value of f(x), as indeed they do. This last observation raises a very interesting and natural question. Let P_m(n) be the mth largest prime factor of an integer n, 1 ≤ m ≤ ω(n) (that is, P_{m+1}(n) < P_m(n) for 1 ≤ m ≤ ω(n) − 1).
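The dominance of P₁(n) is already visible numerically. The sketch below (ours; x = 10⁵ is of course tiny compared with the asymptotic regime) computes Σ_{n≤x} P₁(n) with a largest-prime-factor sieve and f(x) = Σ_{n≤x} A(n) via the identity Σ_{n≤x} A(n) = Σ_{p^a≤x} p⌊x/p^a⌋; the ratio of the two sums is already very close to 1.

```python
def largest_prime_factor_sieve(x):
    """lpf[n] = P1(n): each prime p overwrites lpf[m] for its multiples m,
    so the last (largest) prime divisor wins."""
    lpf = [0] * (x + 1)
    for p in range(2, x + 1):
        if lpf[p] == 0:              # p is prime
            for m in range(p, x + 1, p):
                lpf[m] = p
    return lpf

x = 10**5
lpf = largest_prime_factor_sieve(x)
sum_P1 = sum(lpf[n] for n in range(2, x + 1))

# f(x) = sum_{n<=x} A(n) via  f(x) = sum over prime powers p^a <= x of p * floor(x/p^a)
sum_A = 0
for p in range(2, x + 1):
    if lpf[p] == p:                  # p is prime
        pa = p
        while pa <= x:
            sum_A += p * (x // pa)
            pa *= p

ratio = sum_P1 / sum_A
print(sum_P1, sum_A, ratio)
```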
We have already noted that Σ_{n≤x} P₁(n) accounts for the dominant term in the functions f(x), f*(x), and f'(x); consequently, it is only natural to ask whether the sum Σ_{n≤x} P₂(n) accounts for the dominant term in the modified Alladi-Erdős functions

\[ \sum_{n \le x} (A(n) - P_1(n)), \qquad \sum_{n \le x} (A'(n) - P_1(n)), \qquad \text{and} \qquad \sum_{n \le x} (A^*(n) - P_1(n)). \]

This question was posed to Paul Erdős by Krishnaswami Alladi during their first collaborative encounter, and was proved not long after in a much more general form in [2]. Their solution to the problem is supplied by equation (1-13); however, we will now state it formally as a theorem:

Theorem 3.1.4. (Alladi-Erdős) For all integers m ≥ 1, we have

\[ \sum_{n \le x} (A(n) - P_1(n) - \cdots - P_{m-1}(n)) \sim \sum_{n \le x} P_m(n) \sim k_m \frac{x^{1+(1/m)}}{\log^m(x)}, \]

where k_m > 0 is a constant depending only on m, and is a rational multiple of ζ(1 + 1/m), where ζ(s) is the Riemann zeta function.

Thus, for m ≥ 2 the sums Σ_{n≤x} P_m(n) = O(x^{1+(1/m)}/log^m(x)), while asymptotically bounded by f(x), f'(x), and f*(x), do grow appreciably. Note that for the case m = 1, the above theorem implies that

\[ \sum_{n \le x} P_1(n) = r \frac{\pi^2 x^2}{6 \log x} + o\!\left( \frac{x^2}{\log x} \right), \]

where r ∈ Q, r > 0. In fact, for m = 1 we may take r = 1/2, so that

\[ \sum_{n \le x} P_1(n) = \frac{\pi^2 x^2}{12 \log x} + o\!\left( \frac{x^2}{\log x} \right); \]

however, at present we cannot prove that r = 1/2 by simply appealing to the above theorem. Nevertheless, these observations further validate the previous comments concerning the sum Σ_{n≤x} P₁(n).

To further emphasize the dominance of the largest prime factor of an integer n, we will prove an interesting result first derived by P. Erdős. Recall that if p_n denotes the nth prime number, then Tchebyschev's estimate states that p_n = O(n log n); also, a weak form of the well-known theorem due to F. Mertens (1874) states that

\[ \prod_{p \le x} \left( 1 - \frac{1}{p} \right)^{-1} = O(\log x). \qquad (3\text{--}5) \]

All of these results can be made more precise, and can be found in [29].
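Estimate (3-5) is easy to observe numerically; in fact, by Mertens' third theorem (sharper than the O-bound needed here), the product divided by log x approaches e^γ ≈ 1.781. A quick check (our own, with a plain Eratosthenes sieve):

```python
import math

def primes_upto(x):
    """Simple sieve of Eratosthenes."""
    s = bytearray([1]) * (x + 1)
    s[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if s[p]:
            s[p * p :: p] = b"\x00" * len(range(p * p, x + 1, p))
    return [i for i in range(2, x + 1) if s[i]]

x = 10**5
prod = 1.0
for p in primes_upto(x):
    prod *= 1.0 / (1.0 - 1.0 / p)

ratio = prod / math.log(x)   # Mertens: tends to e^gamma = 1.78107...
print(prod, ratio)
```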
Our next theorem, due to Paul Erdős, gives an interesting (and in many ways surprising) characterization of an arithmetic function in terms of its largest prime factor:

Theorem 3.1.5. (Erdős) Let f(n) > 0 be a non-decreasing arithmetic function. Then the sum

\[ \sum_{n=1}^{\infty} \frac{1}{f(n)\, n} \]

converges if and only if the sum

\[ \sum_{n=1}^{\infty} \frac{1}{f(P_1(n))\, n} \]

converges.

Proof: As f(n) is non-decreasing, f(P₁(n)) ≤ f(n); thus

\[ \sum_{n=1}^{\infty} \frac{1}{f(n)\, n} \le \sum_{n=1}^{\infty} \frac{1}{f(P_1(n))\, n}, \]

so that if Σ 1/(f(P₁(n)) n) converges, Σ 1/(f(n) n) must also converge.

To prove the converse, consider the following:

\[ \sum_{n \le x} \frac{1}{f(P_1(n))\, n} = \sum_{p \le x} \frac{1}{f(p)} \sum_{\substack{n \le x \\ P_1(n) = p}} \frac{1}{n} \le \sum_{p \le x} \frac{1}{f(p)} \sum_{P_1(m) \le p} \frac{1}{mp} = \sum_{p \le x} \frac{1}{f(p)\, p} \prod_{q \le p} \left( 1 - \frac{1}{q} \right)^{-1} \]
\[ = O\!\left( \sum_{p \le x} \frac{\log p}{f(p)\, p} \right) \le O\!\left( \sum_{n \le x} \frac{\log p_n}{f(p_n)\, p_n} \right), \]

where p_n denotes the nth prime number and we have used Mertens' estimate (3-5). Now, by Tchebyschev's estimate,

\[ \sum_{n \le x} \frac{\log p_n}{f(p_n)\, p_n} = O\!\left( \sum_{n \le x} \frac{1}{f(p_n)\, n} \right), \]

and as n ≤ p_n it follows that

\[ \sum_{n \le x} \frac{1}{f(p_n)\, n} \le \sum_{n \le x} \frac{1}{f(n)\, n}. \]

Taking the limit as x → ∞ implies, by virtue of the above inequalities, that if Σ 1/(f(n) n) converges, then Σ 1/(f(P₁(n)) n) must also converge, and the result follows.

To analytically derive the desired asymptotic bounds for the Alladi-Erdős functions we will consider the Dirichlet series generated by A'(n), which arises naturally from the study of the Riemann zeta function ζ(s) discussed in Chapter 2. As

\[ A'(n) = \sum_{p^{\alpha} \mid n} \frac{p^{\alpha}}{\alpha} = \sum_{d \mid n} \frac{\Lambda(d)}{\log d}\, d, \]

where Λ(n) is the von Mangoldt function, it follows that

\[ \sum_{n=1}^{\infty} \frac{A'(n)}{n^{s+1}} = \left( \sum_{n=1}^{\infty} \frac{1}{n^{s+1}} \right) \left( \sum_{n=1}^{\infty} \frac{\Lambda(n)}{\log n} \frac{n}{n^{s+1}} \right) = \zeta(s+1) \log \zeta(s). \qquad (3\text{--}6) \]

Using the fact that ζ(s) is analytic in ℜ(s) > 1 with a sole singularity at s = 1, and that ζ(1 + it) ≠ 0 for all t ∈ R, we may deduce that ζ(s+1) log ζ(s) is holomorphic at all points lying on the line ℜ(s) = 1, s ≠ 1; therefore, we may apply Delange's theorem to conclude the following:

Theorem 3.1.6.

\[ f'(x) = \sum_{n \le x} A'(n) = \frac{\pi^2}{12} \frac{x^2}{\log x} + o\!\left( \frac{x^2}{\log x} \right). \]

Proof: As equation (3-6) demonstrates, Σ A'(n)/n^{s+1} = ζ(s+1) log ζ(s). Applying Delange's theorem (Theorem 2.2.1) yields

\[ \sum_{n \le x} \frac{A'(n)}{n} = \zeta(2) \frac{x}{\log x} + o\!\left( \frac{x}{\log x} \right) = \frac{\pi^2}{6} \frac{x}{\log x} + o\!\left( \frac{x}{\log x} \right), \]

and a straightforward application of Abel summation (equation (1-12)) demonstrates that

\[ f'(x) = \sum_{n \le x} A'(n) = \frac{\pi^2}{12} \frac{x^2}{\log x} + o\!\left( \frac{x^2}{\log x} \right), \]

which is the desired result.

Using Mellin's inversion theorem we may improve this asymptotic estimate by taking into account the zero-free region of ζ(s) supplied by Theorem 2.1.3. To improve the estimate we will not need the full power of Mellin's inversion theorem; in fact, for our purposes the following theorem will suffice:

Theorem 3.1.7. Let k > 1 and x > 0. Then the integral

\[ \frac{1}{2\pi i} \int_{k-i\infty}^{k+i\infty} \frac{x^{s+1}}{s+1}\, ds \]

equals 1 if x > 1, equals 1/2 if x = 1, and equals 0 if 0 < x < 1.

If D(s) = Σ_{n=1}^{∞} d(n)/n^{s+1} is any Dirichlet series, where k > σ_a ≥ 1 is chosen such that D(s) lies in a domain of absolute convergence, then

\[ \frac{1}{2\pi i} \int_{k-i\infty}^{k+i\infty} D(s)\, \frac{x^{s+1}}{s+1}\, ds = \frac{1}{2\pi i} \int_{k-i\infty}^{k+i\infty} \left( \sum_{n=1}^{\infty} \frac{d(n)}{n^{s+1}} \right) \frac{x^{s+1}}{s+1}\, ds = \sum_{n=1}^{\infty} d(n)\, \frac{1}{2\pi i} \int_{k-i\infty}^{k+i\infty} \frac{(x/n)^{s+1}}{s+1}\, ds = \sum_{n \le x} d(n) \]

if x ∈ R − Z. If x ∈ Z, then the above integral equals

\[ \frac{d(x)}{2} + \sum_{n < x} d(n). \]

Note that the assumption that s lies in a domain of absolute convergence is essential in order to justify interchanging the infinite sum and the integral. As was mentioned in the introduction, the first person to apply the above inversion technique to Dirichlet series was Oskar Perron in his 1908 article [19]. It is for this reason that the formula equating the sum of the coefficients of a Dirichlet series D(s) with the inverse Mellin transform of D(s) is often referred to as Perron's formula. Recall that equation (3-6) shows that

\[ \sum_{n=1}^{\infty} \frac{A'(n)}{n^{s+1}} = \zeta(s+1) \log \zeta(s), \]

which is now in a form where we may apply Perron's formula.
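Theorem 3.1.6 can be sanity-checked numerically (a sketch of ours; at x = 10⁵ the o(x²/log x) term still contributes several percent, so the ratio should only be near, not at, 1). Since A' is additive with A'(p^a) = Σ_{j≤a} p^j/j, the partial sum f'(x) equals a sum over prime powers:

```python
import math

def primes_upto(x):
    s = bytearray([1]) * (x + 1)
    s[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if s[p]:
            s[p * p :: p] = b"\x00" * len(range(p * p, x + 1, p))
    return [i for i in range(2, x + 1) if s[i]]

def f_prime(x):
    """f'(x) = sum_{n<=x} A'(n) = sum over prime powers p^a <= x
    of (p^a / a) * floor(x / p^a)."""
    total = 0.0
    for p in primes_upto(x):
        pa, a = p, 1
        while pa <= x:
            total += (pa / a) * (x // pa)
            pa *= p
            a += 1
    return total

x = 10**5
main = (math.pi**2 / 12) * x**2 / math.log(x)
ratio = f_prime(x) / main
print(f_prime(x), main, ratio)
```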
As the evaluation of integrals of this form is now classical, we will only sketch how one may use Perron's formula to derive the desired estimate. For the full details of how to evaluate contour integrals of this form one may consult Chapter 5 of [7], Chapter II.4 of [29], or, for a step-by-step demonstration, Chapter 2 of [4]. We motivate the discussion with the following somewhat informal argument. As we will be taking a line integral, it follows from Cauchy's theorem that we may deform the line of integration into a suitable contour. For our purposes we will simply take the classical contour of integration chosen by de la Vallée-Poussin, which requires us to take ℜ(s) = σ > 1 − c/log(t) for some constant c > 0. However, for s in this region the function ζ(s+1) is analytic, hence bounded, so we should suspect that it does not make any major contribution to the integral. Thus, the majority of the contribution from the contour will occur at the logarithmic singularity of ζ(s+1) log ζ(s) at s = 1, which carries a coefficient of ζ(2); also, the error term in our estimate will be directly related to how far we may push our contour into the critical strip. As we will not be taking our contour further than σ = 1/2, and ζ(s+1) ≤ ζ(3/2) is bounded in this region, the error term will be bounded by a constant times the error term implied by the Prime Number Theorem. Hence, we should suspect that the integral is closely approximated by

\[ \frac{\zeta(2)}{2\pi i} \int_{k-i\infty}^{k+i\infty} \log \zeta(s)\, \frac{x^{s+1}}{s+1}\, ds = \zeta(2) \sum_{n \le x} \frac{\Lambda(n)}{\log n}\, n = \zeta(2) \int_2^x \frac{t}{\log t}\, dt + O\!\left( x^2 e^{-c\sqrt{\log x}} \right), \]

which follows from the Prime Number Theorem. The next theorem is simply the statement that these observations are in fact accurate. Furthermore, we remind the reader that the following proof is merely a sketch, as many of the details have been omitted.

Theorem 3.1.8. There exists a constant c > 0 such that

\[ f'(x) = \zeta(2) \int_2^x \frac{t}{\log t}\, dt + O\!\left( x^2 e^{-c\sqrt{\log x}} \right). \]

Proof: First, note that for a > 1,

\[ f'(x) = \frac{1}{2\pi i} \int_{a-i\infty}^{a+i\infty} \zeta(s+1) \log \zeta(s)\, \frac{x^{s+1}}{s+1}\, ds \]

for non-integral x, whereas the integral equals f'(x−1) + A'(x)/2 = f'(x) + o(x^{3/2}) if x ∈ Z; thus the choice of an integral or non-integral x > 0 will not affect our estimate. In order to evaluate this line integral we will deform the path of integration as far into the critical strip, 0 ≤ σ ≤ 1, as possible while avoiding the singularities of the integrand; moreover, the integrand has a branch cut on the real line which must be handled with care. Therefore, we choose the contour as follows: first let T be fixed, and set

\[ a = 1 + \frac{c}{\log T}, \qquad b = 1 - \frac{c}{\log T}, \]

where c is the constant in de la Vallée-Poussin's zero-free region. The classical path of integration is given by the vertical line from a + i∞ to a + iT, followed by the horizontal line from a + iT to b + iT, then the vertical line from b + iT to b + iϵ. The contour then proceeds around the branch cut, avoiding the singularity at s = 1 with a semicircle of radius ϵ and center 1, then follows the horizontal path below the branch cut to b − iϵ. The remainder of the path is merely the contour above the real axis reflected about the real line, that is, the vertical line from b − iϵ to b − iT, the horizontal line from b − iT to a − iT, and the vertical line from a − iT to a − i∞. Note that by de la Vallée-Poussin's theorem and our choice of a and b, this contour lies in a domain of analyticity of ζ(s+1) log ζ(s). Hence, by Cauchy's theorem, we may conclude that

\[ f'(x) = \frac{1}{2\pi i} \int_{a-i\infty}^{a+i\infty} \zeta(s+1) \log \zeta(s)\, \frac{x^{s+1}}{s+1}\, ds \]
\[ = \frac{1}{2\pi i} \left( \int_{a-i\infty}^{a-iT} + \int_{a-iT}^{b-iT} + \int_{b-iT}^{b-i\epsilon} + \int_{\mathrm{cut}} + \int_{b+i\epsilon}^{b+iT} + \int_{b+iT}^{a+iT} + \int_{a+iT}^{a+i\infty} \right) \zeta(s+1) \log \zeta(s)\, \frac{x^{s+1}}{s+1}\, ds \]
\[ = D(x) + E(x), \]

where D(x) denotes the dominant term in our asymptotic estimate and E(x) denotes the error term, i.e. E(x) = o(D(x)).
It is a well-known fact that the error term in the Prime Number Theorem is directly related to how far we may move the path of integration of log ζ(s) into the critical strip, and as our integrand ζ(s+1) log ζ(s) is very similar to the function evaluated in the classical proofs of the prime number theorem, it is not difficult to justify that the error term of f'(x) is also related to how far we may deform our contour into the critical strip. In fact, for our purposes the error term E(x) will be supplied by the contours which are not around the branch point (for a justification of this fact and further details see [4]). Specifically,

\[ E(x) = \frac{1}{2\pi i} \left( \int_{a-i\infty}^{a-iT} + \int_{a-iT}^{b-iT} + \int_{b-iT}^{b-i\epsilon} + \int_{b+i\epsilon}^{b+iT} + \int_{b+iT}^{a+iT} + \int_{a+iT}^{a+i\infty} \right) \zeta(s+1) \log \zeta(s)\, \frac{x^{s+1}}{s+1}\, ds. \]

For ℜ(s) = σ > 0 the function ζ(s+1) is maximized on the real line, and as the path of integration does not penetrate as far as σ = 1/2 into the critical strip, we may conclude that

\[ E(x) \le \frac{\zeta(3/2)}{2\pi i} \left( \int_{a-i\infty}^{a-iT} + \int_{a-iT}^{b-iT} + \int_{b-iT}^{b-i\epsilon} + \int_{b+i\epsilon}^{b+iT} + \int_{b+iT}^{a+iT} + \int_{a+iT}^{a+i\infty} \right) \log \zeta(s)\, \frac{x^{s+1}}{s+1}\, ds. \]

From here it is a straightforward, though detailed, process to estimate E(x). However, we may side-step the issue of directly evaluating the six contours by noting that

\[ \sum_{p^{\alpha} \le x} \frac{p^{\alpha}}{\alpha} = \frac{1}{2\pi i} \int_{a-i\infty}^{a+i\infty} \log \zeta(s)\, \frac{x^{s+1}}{s+1}\, ds; \]

then, choosing the same contour as before, it follows from the Prime Number Theorem that

\[ \sum_{p^{\alpha} \le x} \frac{p^{\alpha}}{\alpha} = \int_2^x \frac{t}{\log t}\, dt + O\!\left( x^2 e^{-c\sqrt{\log x}} \right). \]

Thus the error term is given by

\[ \frac{1}{2\pi i} \left( \int_{a-i\infty}^{a-iT} + \int_{a-iT}^{b-iT} + \int_{b-iT}^{b-i\epsilon} + \int_{b+i\epsilon}^{b+iT} + \int_{b+iT}^{a+iT} + \int_{a+iT}^{a+i\infty} \right) \log \zeta(s)\, \frac{x^{s+1}}{s+1}\, ds = O\!\left( x^2 e^{-c\sqrt{\log x}} \right), \]

which differs from the contour integral we wish to evaluate by a constant. Therefore

\[ E(x) \le O\!\left( \zeta(3/2)\, x^2 e^{-c\sqrt{\log x}} \right) = O\!\left( x^2 e^{-c\sqrt{\log x}} \right). \]

Note that the implicit constant in the O-term is far from optimal; however, for our purposes we need only show that such a constant exists.
The final step necessary to establish Theorem 3.1.8 is to evaluate the integral along the branch cut, which we now proceed to do:

\[ \frac{1}{2\pi i} \int_{\mathrm{cut}} \zeta(s+1) \log \zeta(s)\, \frac{x^{s+1}}{s+1}\, ds = \frac{1}{2\pi i} \int_{\mathrm{cut}} \zeta(s+1) \log((s-1)\zeta(s))\, \frac{x^{s+1}}{s+1}\, ds - \frac{1}{2\pi i} \int_{\mathrm{cut}} \zeta(s+1) \log(s-1)\, \frac{x^{s+1}}{s+1}\, ds. \]

The first integral is zero because ζ(s+1) log((s−1)ζ(s)) is regular and single-valued along the cut; hence the integrand is regular, and the integral along the upper side cancels the integral along the lower side. To evaluate the second integral we make the substitution s − 1 = ϵe^{iθ} for −π < θ < π. With this substitution log(s−1) = log(ϵ) + iθ, and therefore the value of log(s−1) along the lower portion of the branch cut differs from its value along the upper portion of the branch cut by 2πi. Letting γ be the semicircle of radius ϵ centered at s = 1, the above integral reduces to evaluating

\[ \frac{1}{2\pi i} \int_{\mathrm{cut}} \zeta(s+1) \log(s-1)\, \frac{x^{s+1}}{s+1}\, ds = \frac{1}{2\pi i} \int_1^b \zeta(2+\epsilon)\, (\log(\epsilon) - i\pi)\, \frac{x^{s+1}}{s+1}\, ds \]
\[ + \frac{1}{2\pi i} \int_b^1 \zeta(2+\epsilon)\, (\log(\epsilon) + i\pi)\, \frac{x^{s+1}}{s+1}\, ds + \frac{1}{2\pi i} \int_{\gamma} \zeta(2+\epsilon)\, (\log(\epsilon) + i\theta)\, \frac{x^{s+1}}{s+1}\, ds. \qquad (3\text{--}7) \]

Now we need only evaluate the three integrals in (3-7). Letting ϵ → 0, the third integral is easily seen to be

\[ \frac{1}{2\pi i} \int_{\gamma} \zeta(2+\epsilon)\, (\log(\epsilon) + i\theta)\, \frac{x^{s+1}}{s+1}\, ds = O\!\left( \log(\epsilon + \pi)\, x^{2+\epsilon}\, 2\pi\epsilon \right) = o(1). \]

To obtain the desired value of D(x) we note that

\[ \frac{1}{2\pi i} \int_1^b \zeta(2+\epsilon)\, (\log(\epsilon) - i\pi)\, \frac{x^{s+1}}{s+1}\, ds + \frac{1}{2\pi i} \int_b^1 \zeta(2+\epsilon)\, (\log(\epsilon) + i\pi)\, \frac{x^{s+1}}{s+1}\, ds \]
\[ = -\int_b^1 \zeta(2+\epsilon)\, \frac{x^{s+1}}{s+1}\, ds \longrightarrow -\int_b^1 \zeta(2)\, \frac{x^{s+1}}{s+1}\, ds, \]

as ϵ → 0 (the log(ϵ) terms cancel, leaving the 2πi jump of the logarithm). Thus we are left with evaluating the integral

\[ \zeta(2) \int_b^1 \frac{x^{s+1}}{s+1}\, ds. \qquad (3\text{--}8) \]

Letting u² = x^{s+1} in (3-8) gives

\[ \zeta(2) \int_b^1 \frac{x^{s+1}}{s+1}\, ds = \zeta(2) \int_{x^{(b+1)/2}}^{x} \frac{u}{\log u}\, du = \zeta(2) \int_2^x \frac{u}{\log u}\, du - \zeta(2) \int_2^{x^{(b+1)/2}} \frac{u}{\log u}\, du; \]

furthermore, as log(u) > log(2) for 2 ≤ u ≤ x, we see that

\[ \zeta(2) \int_2^{x^{(b+1)/2}} \frac{u}{\log u}\, du = O\!\left( x^{b+1} \right), \]

which will be absorbed into the error term E(x) = O(x² e^{−c√log x}). Therefore

\[ D(x) = \zeta(2) \int_2^x \frac{t}{\log t}\, dt, \]

and as f'(x) = D(x) + E(x) we have our desired estimate

\[ f'(x) = \zeta(2) \int_2^x \frac{t}{\log t}\, dt + O\!\left( x^2 e^{-c\sqrt{\log x}} \right), \qquad (3\text{--}9) \]

which completes the proof.

We reiterate that the above proof has omitted several important details, specifically the explicit evaluation of the contours which are away from the branch point. However, the reader need not worry about the rigor of the above proof, as integrals of this form are now classical in analytic number theory and can be found in most texts on the topic. In fact, in 1903, using a variant of the above integral representation, Edmund Landau succeeded in giving his own proof of the Prime Number Theorem using the very same contour chosen above, and it is this proof which Landau included in his classic text [16] (a text which achieved so much fame that mathematicians such as G.H. Hardy simply referred to it as the Handbuch). The evaluation of each of these contours is not terribly difficult, although there are several particulars which must be taken into account. For example, log ζ(s) is a multiple-valued function with a singularity at s = 1, and this necessitates taking a branch of the logarithm, which in turn makes the contour integration more difficult. Furthermore, some of the contours are not absolutely convergent, which further complicates their evaluation. In the above proof we avoided these difficulties by noting that the contours away from the branch point correspond to the error term in the Prime Number Theorem, and hence can be handled in a simple fashion from this observation. However, it should be mentioned that the error term in the Prime Number Theorem itself arises from the evaluation of these contours. In essence, we have not avoided the task of evaluating these contours; rather, we have invoked a theorem which allows us to avoid their explicit computation.
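At small x the refined estimate (3-9) already fits noticeably better than the bare main term. The following sketch (ours; the constant-free comparison only, since the error constant c is unspecified) compares f'(x) against ζ(2)∫₂ˣ t/log t dt, with the integral computed by a simple midpoint rule:

```python
import math

def primes_upto(x):
    s = bytearray([1]) * (x + 1)
    s[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if s[p]:
            s[p * p :: p] = b"\x00" * len(range(p * p, x + 1, p))
    return [i for i in range(2, x + 1) if s[i]]

def f_prime(x):
    """f'(x) as a sum over prime powers p^a <= x of (p^a/a) * floor(x/p^a)."""
    total = 0.0
    for p in primes_upto(x):
        pa, a = p, 1
        while pa <= x:
            total += (pa / a) * (x // pa)
            pa *= p
            a += 1
    return total

def J(x, steps=200000):
    """Midpoint-rule approximation of the integral of t/log(t) from 2 to x."""
    h = (x - 2) / steps
    return h * sum((2 + (i + 0.5) * h) / math.log(2 + (i + 0.5) * h)
                   for i in range(steps))

x = 10**5
ratio = f_prime(x) / ((math.pi**2 / 6) * J(x))
print(ratio)
```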
Furthermore, note that by applying integration by parts to the integral in (3-9) we obtain

\[ f'(x) = \zeta(2) \int_2^x \frac{t}{\log t}\, dt + O\!\left( x^2 e^{-c\sqrt{\log x}} \right) = \frac{\zeta(2)}{2} \frac{x^2}{\log x} + O\!\left( \frac{x^2}{\log^2 x} \right), \]

re-verifying our previous estimate of f'(x). Assuming the Riemann Hypothesis, all of the complex singularities of log ζ(s) have ℜ(s) = 1/2. If one makes this assumption when evaluating the above contour integral, then we may improve our estimate to

\[ f'(x) = \zeta(2) \int_2^x \frac{t}{\log t}\, dt + O\!\left( x^{3/2} \log x \right), \]

which is the best possible estimate using these methods. As f'(x) − f(x) = O(x^{3/2}) and f'(x) − f*(x) = O(x^{3/2}), this also improves the error terms of f(x) and f*(x). Furthermore, it should be noted that improving the error term in the asymptotic estimate of f'(x) (or in the estimate of f(x) or f*(x)) to O(x^{3/2} log x) is equivalent to the Riemann Hypothesis. Recall that Theorem 3.1.4 above, due to Alladi and Erdős, demonstrates that the sum Σ_{n≤x} P_m(n) = o(x^{3/2}) for all m ≥ 2. This shows that sums of this form are far smaller than even the best error terms for f(x), f'(x), and f*(x), which is to be expected, as their combined sum (in many ways) determines this error term. As was already stated, Alladi and Erdős derived the asymptotic results

\[ f(x) = \frac{\pi^2}{12} \frac{x^2}{\log x} + O\!\left( \frac{x^2}{\log^2 x} \right) \]

and

\[ \sum_{n \le x} P_1(n) = \frac{\pi^2}{12} \frac{x^2}{\log x} + O\!\left( \frac{x^2}{\log^2 x} \right) \]

in [2] using entirely elementary methods. Further results of this nature were also derived by Knuth and Pardo who, using only elementary methods, derived the asymptotic values for the mean and standard deviation of the largest prime factor P₁(n). The following theorem is a restatement of this result, which we have included to provide a clearer picture of the functions under discussion. The emphasis of Knuth and Pardo in [14] is their algorithmic process, and for this reason they only sketch how one may derive their theorem.
Of course, we wish to emphasize the mathematical aspects of their paper, so while the proof below is essentially due to Knuth and Pardo, we include certain details to make their proof more rigorous. Using Knuth and Pardo’s notation, let Φ(t) be the probability that P1 (n) ≤ t when n is in the range 1 ≤ n ≤ N. This functions can be identified with the Dickman function ρ(u) (see Definition: 3.2.5) discussed in the first section, and which will be discussed in log(N) greater length in section 3.2. In fact if one sets u = then for 1 ≤ u ≤ 2 we have log(t) ρ(u) = Φ(t). In [14] the authors demonstrate that: 84 ( Φ(t) = 1 − log for √ log(N) log(t) ) 1 + log(N) ∫ {u} du +O u2 N/t 1 ( 1 log(N) )2 N ≤ t ≤ N. Now, let ∫ ∫ N N t dΦ(t) = Φ(N)N − Φ(1) − k k Ek (P1 (n)) = k Φ(t)t k−1 dt, 1 1 that is, Ek((P1 (n)) is the kth)moment of P1 (n). Note that as the above integral from 1 to ∫ √N √ dΦ(t) = O(N k/2 ), it will be absorbed into the error term below. N is O N k/2 1 ( ) ζ(k + 1) N k Nk Theorem 3.1.9. Ek (P1 (n)) = +O . k + 1 √log(N) log2 (N) Proof: Ignoring the integral from 1 to N we are left with ∫ √ N = √ k 1 + log log(t) − log log(N) + tkd N ∫ k/2 N −k Φ(t)t k−1 dt, 1 ( N ∫ t dΦ(t) = Φ(N)N − Φ( N)N k ∫ √ N N 1 = √ t d(log log(t)) + log(N) N 1 log(N) ∫ 1 N/t {u} du +O u2 ( 1 log (N) )) 2 ) ( )k ∫ v ( N Nk {u} du d +O √ v u2 log2 (N) N 1 ∫ k 1 by replacing t by N/v in the second integral. The O-estimate is justified by the simple ∫b ∫b observation that if a f (t)dg(t) and a f (t)dh(t) exist, where h(t) = O(g(t)), and where both f and g are positive monotone functions on [a, b], then ∫ N √ N t k−1 dt = Nk log(t) Nk = log(N) (∫ 1 √ N ∫ √ N 1 dv + v k+1 ∫ dv v k+1 (log(N) − log(v )) √ 1 85 N log(v )dv k+1 v (log(N) − log(v )) ) Nk = +O k log(N) ( Nk log2 (N) ) . 
Note that the second integral above is $-\frac{N^k}{\log N}$ times $\int_1^{\sqrt N}\{v\}v^{-(k+2)}\,dv$, which is within $O(N^{-(k+1)/2})$ of

$$\int_1^\infty\frac{\{v\}}{v^{k+2}}\,dv = \sum_{j\ge1}\int_j^{j+1}\frac{v-j}{v^{k+2}}\,dv = \sum_{j\ge1}\left(\frac1k\left(\frac{1}{j^k}-\frac{1}{(j+1)^k}\right) - \frac{j}{k+1}\left(\frac{1}{j^{k+1}}-\frac{1}{(j+1)^{k+1}}\right)\right).$$

The first part telescopes to $\frac1k$, while partial summation gives $\sum_{j\ge1} j\big(j^{-(k+1)}-(j+1)^{-(k+1)}\big) = \sum_{j\ge1} j^{-(k+1)} = \zeta(k+1)$, so that

$$\int_1^\infty\frac{\{v\}}{v^{k+2}}\,dv = \frac1k - \frac{\zeta(k+1)}{k+1}.$$

Hence, combining the two integrals, we have shown that

$$E_k(P_1(n)) = \frac{N^k}{k\log N} - \frac{N^k}{\log N}\left(\frac1k - \frac{\zeta(k+1)}{k+1}\right) + O\!\left(\frac{N^k}{\log^2 N}\right) = \frac{\zeta(k+1)}{k+1}\,\frac{N^k}{\log N} + O\!\left(\frac{N^k}{\log^2 N}\right),$$

which is the desired result. □

This verifies the result of Alladi and Erdős that the first moment (the mean) of $P_1(n)$ is asymptotically $\frac{\pi^2}{12}\frac{N}{\log N}$. Furthermore, the above theorem demonstrates that $P_1(n)$ has asymptotic standard deviation $\sqrt{\frac{\zeta(3)}{3}}\,\frac{N}{\sqrt{\log N}}$ (within a factor of $1 + O(1/\log N)$). In [14] Knuth and Pardo also make the important observation that the ratio of the standard deviation to the mean diverges as $N\to\infty$. This will be important for the analysis of their factorization algorithm (in Chapter 4), as it shows that a traditional "mean and variance" approach is unsuitable when dealing with such factorization algorithms.

3.2 Ψ(x, y)

It will be very informative to consider the most frequently occurring value of the largest prime factor $P_1(n)$ of a given integer $n \le x$. This value cannot be obtained from the above estimates, which give only the average of the largest prime factors of integers $n \le x$. It turns out that the most frequently occurring value of the largest prime factor of an integer $n \le x$ is far smaller than $f(x)/x$, owing to a small number of integers with very large prime factors which inflate the estimate for $f(x)/x$. In order to carry out this analysis we must introduce the function $\Psi(x,y)$:

Definition 3.2.1. The Buchstab-de Bruijn function $\Psi(x,y)$ is defined to be the number of positive integers $n \le x$ such that $P_1(n) \le y$. That is, $\Psi(x,y)$ is the number of integers at most $x$ all of whose prime factors are at most $y$.
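Definition 3.2.1 invites direct experiment; the brute-force count below (our own sketch, practical only for small arguments) computes $\Psi(x,y)$ exactly:

```python
def largest_prime_factor(n):
    """P1(n), with P1(1) = 1 by convention."""
    p, m, d = 1, n, 2
    while d * d <= m:
        while m % d == 0:
            p, m = d, m // d
        d += 1
    return max(p, m) if m > 1 else p

def Psi(x, y):
    """Buchstab-de Bruijn function: #{n <= x : P1(n) <= y}."""
    return sum(1 for n in range(1, int(x) + 1) if largest_prime_factor(n) <= y)

print(Psi(100, 5))   # counts the 5-smooth numbers up to 100
```

Note that $n = 1$ is counted, in keeping with the convention $P_1(1) = 1$.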
As the quantity $\log x/\log y$ plays an important role in the behavior of this function, it is customary and convenient to define

$$u := \frac{\log x}{\log y},$$

provided $x \ge y \ge 2$; for the remainder of this section any reference to $u$ will correspond to this definition.

As was mentioned in the introduction, $\Psi(x,y)$ was first studied (in isolation) by S. Ramanujan and (publicly) by R. Rankin in [23], who used it to investigate the differences between prime numbers. However, Rankin's results only pertained to the function $\Psi(x,y)$ in a rather limited range for the value of $y$. One year prior to Rankin's publication, A.A. Buchstab derived in [6] a very general recurrence formula to evaluate recurrences which commonly arise in sieve theory. In its most general form this formula can quickly become complicated to the point of losing much of its usefulness, but as it pertains to $\Psi(x,y)$ the formula is far less daunting. The following identity will be referred to as Buchstab's recurrence relation (the proof is fairly easy and can be found in [29]): for $x \ge 1$, $y > 0$, we have

$$\Psi(x,y) = 1 + \sum_{p\le y}\Psi\!\left(\frac{x}{p},\,p\right), \qquad (3–10)$$

where, as usual, the sum runs over the primes $p \le y$. It was a recurrence of this sort (though in a different guise) which was discovered by Ramanujan, and whose rediscovery is recounted in the interesting article [27]. Iterating this recurrence gives the following theorem, known as Buchstab's identity, whose proof can also be found in [29]:

Theorem 3.2.2. For $x \ge 1$, $z \ge y > 0$, we have

$$\Psi(x,y) = \Psi(x,z) - \sum_{y<p\le z}\Psi\!\left(\frac{x}{p},\,p\right).$$

Up to this point there is no doubt some confusion about the naming convention for $\Psi(x,y)$, which we now hope to clarify: in 1951 N.G. de Bruijn exploited Buchstab's identity to significantly improve the range of $y$ for which $\Psi(x,y)$ satisfies an asymptotic formula. Moreover, de Bruijn's results all held uniformly, making $\Psi(x,y)$ into a rather convenient function to work with (for a given range of $y$).
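Buchstab's recurrence (3-10) is an exact combinatorial identity (every $n \ge 2$ counted by $\Psi(x,y)$ factors uniquely as $n = pm$ with $p = P_1(n) \le y$ and $P_1(m) \le p$), so it can be checked mechanically. The sketch below (our own) verifies it for a few small pairs $(x,y)$:

```python
def P1(n):
    """Largest prime factor of n (P1(1) = 1)."""
    p, m, d = 1, n, 2
    while d * d <= m:
        while m % d == 0:
            p, m = d, m // d
        d += 1
    return max(p, m) if m > 1 else p

def Psi(x, y):
    return sum(1 for n in range(1, int(x) + 1) if P1(n) <= y)

def buchstab_rhs(x, y):
    primes = [p for p in range(2, int(y) + 1) if P1(p) == p]
    return 1 + sum(Psi(x / p, p) for p in primes)   # right side of (3-10)

for x, y in [(100, 10), (500, 7), (1000, 31)]:
    assert Psi(x, y) == buchstab_rhs(x, y)          # (3-10) holds exactly
print("Buchstab recurrence verified")
```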
At last we have a good historical justification for calling $\Psi(x,y)$ the Buchstab-de Bruijn function! The Buchstab-de Bruijn function has been the subject of much research in recent years, and we will only state those properties of $\Psi(x,y)$ necessary to answer the above question concerning the most frequently occurring value of the largest prime factor of an integer $n$. For our purposes we require only an asymptotic estimate, due to de Bruijn [9], which holds for large values of $y$:

Theorem 3.2.3. If $x \ge y \ge 2$ then

$$\Psi(x,y) = O\!\left(xe^{-u/2}\right)$$

holds uniformly.

It is worth noting that the value $u$ may vary with $x$ in the above theorem, which is one of the major reasons for the superiority of de Bruijn's result over those of his predecessors. However, this result can be strengthened: de Bruijn demonstrated in [9] that if $x > 0$, $y \ge 2$, with $\log^2 x \le y \le x$ and $u$ defined as before, then

$$\Psi(x,y) < x\,\log^2(y)\,e^{-u\log(u+3)-u\log\log(u+3)+O(u)}. \qquad (3–11)$$

This estimate is obtained by noting that the sum over all integers $n$ with $P_1(n) \le y$ is given by an Euler product,

$$\sum_{P_1(n)\le y}\frac{1}{n^s} = \prod_{p\le y}\left(1-\frac{1}{p^s}\right)^{-1}.$$

Then, for $\kappa > 0$, we may apply Perron's formula to obtain

$$\Psi(x,y) = \frac{1}{2\pi i}\int_{\kappa-i\infty}^{\kappa+i\infty}\prod_{p\le y}\left(1-\frac{1}{p^s}\right)^{-1}\frac{x^s}{s}\,ds,$$

which de Bruijn then evaluates, obtaining (3-11). For our purposes, the uniform estimate of Theorem 3.2.3 will suffice.

We will now study the Dickman function, a function which frequently arises in the study of $\Psi(x,y)$ and which itself satisfies many interesting and useful equations. A study of the Dickman function, and in particular its rate of decay, will help us to better understand the asymptotic relation in Theorem 3.2.3.

Definition 3.2.4. Let $u = \log x/\log y$ with $2 \le u \le 3$; the Dickman function is defined as

$$\rho(u) := 1 - \log u + \int_2^u \log(v-1)\,\frac{dv}{v}.$$

It may not be immediately clear how the Dickman function is related to the Buchstab-de Bruijn function; however, in the proof of Theorem 3.1.9, what Knuth and Pardo refer to as $\Phi(t)$ is essentially $\rho(u)$ with $u = \log N/\log t$. The following theorem makes the connection between $\rho(u)$ and $\Psi(x,y)$ more explicit (see [29], [9], and [27]).

Theorem 3.2.5. For $x \ge y \ge 2$ and fixed $u = \log x/\log y$, we have

$$\lim_{x\to\infty}\frac{\Psi(x,x^{1/u})}{x} = \rho(u).$$

Hence, the above theorem demonstrates that for fixed $u$, $\Psi(x,x^{1/u}) \sim \rho(u)x$. It is a fact that $\rho(u)$ is uniquely determined by the initial condition $\rho(u) = 1$ for $0 \le u \le 1$ together with the recurrence

$$\rho(u) = \rho(k) - \int_k^u \rho(v-1)\,\frac{dv}{v} \qquad (k < u \le k+1).$$

This property is deduced by applying Buchstab's recurrence to the function $\Psi(x,y)$. This relationship implies that the Dickman function satisfies the difference-differential equation

$$u\rho'(u) + \rho(u-1) = 0, \qquad (3–12)$$

provided $u > 1$. Furthermore, the above properties of $\rho(u)$ demonstrate that the Dickman function is differentiable for $u > 1$, and using Theorems 3.2.3 and 3.2.5 de Bruijn showed that for $u > 3$

$$\rho(u) = e^{-u\log u - u\log\log u + O(u)}. \qquad (3–13)$$

We are now in a position to answer the motivating question of this section.

Theorem 3.2.6. For all $\epsilon > 0$, the most frequently occurring value of $P_1(n)$ for $n \le x$ lies between $e^{(1-\epsilon)\sqrt{\log x\log\log x}}$ and $e^{(1+\epsilon)\sqrt{\log x\log\log x}}$.

Proof: The number of integers $m \le x$ with the property that $P_1(m) = p$ is $\Psi\!\left(\frac{x}{p},p\right)$, so we need to maximize this quantity over the primes $p$. From equation (3-13) we may conclude that

$$\Psi\!\left(\frac{x}{p},p\right) = \frac{x}{p}\,e^{-u\log u - u\log\log u + O(u)}, \qquad u = \frac{\log(x/p)}{\log p},$$

so maximizing the function $\frac{x}{t}\exp\!\left(-\frac{\log x}{\log t}\log\frac{\log x}{\log t}\right)$ over real $t \ge 2$ gives a rough estimate of the most frequently occurring size of the largest prime factor of $n$, for $n \le x$.
This is a simple optimization problem, hence, maximizing the function √ log(x) the greatest value of the above equation occurs when t = e log log(x) and inserting this √ (1+o(1)) log(x) log log(x) into the equation yields e , from which we may conclude the desired result. The function Ψ(x, y ) was generalized by Knuth and Pardo in [14], to Ψk (x, y ) = | {n ≤ x : Pk (n) ≤ y } |. Clearly Ψ1 (x, y ) = Ψ(x, y ), but there are several similarities between Ψ(x, y ) and Ψk (x, y ), namely, one may study these functions inductively. If α > 0 then ( α α Ψk (x , x) = ρk (α)x + O xα log(x α ) ) (3–14) where ρk (α) are functions analogous to the Dickman function in the case when k = 1. Knuth and Pardo demonstrate that the ρk (α) functions satisfy similar recurrence relations to ρ(u). If α > 1 and k ≥ 1 then, ∫ ρk (α) = 1 − α (ρk (t − 1) − ρk−1 (t − 1)) 1 for 0 < α ≤ 1 and k ≥ 1, ρk (α) = 1, 91 dt , t and if α ≤ 0 or k = 0 ρk (α) = 0. Furthermore, Knuth and Pardo demonstrate the following important asymptotic results, for k=2 there exist constants c0 , c1 , ..., cr such that ρ2 (α) = e γ (c 0 α + cr −1 ) c1 + ... + + O(α−r −1 ) α2 αr (3–15) and for k ≥ 3 we have ρk (α) = e k−2 (α) γ log α(k − 2)! ( +O logk−3 (α) α ) . (3–16) Equations (3-14), (3-15), and (3-16) show that there is a dramatic difference between the functions Ψk (x, y ) for k ≥ 2 and Ψ1 (x, y ), in particular, Ψk (x, y ) decreases much more rapidly for k = 1 than it does for k ≥ 2. As a simple example of this difference, consider the functions Ψ1 (x, 2) and Ψ2 (x, 2). As there are no primes p ≤ 2 except p = 2, it follows that Ψ1 (x, 2) will simply be a count of the number of powers of 2 less ] [ log(x) ; however, notice that every number of than or equal to x. Hence, Ψ1 (x, 2) = log(2) x the form 2p where p ≤ will be counted by the function Ψ2 (x, 2). By the prime number 2 x theorem this implies that there are about such numbers of the form 2p ≤ x, 2 log(x) x thus, ≤ Ψ2 (x, 2). 
Although a simple example, it is clear that Ψ1 (x, 2) grows far 2 log(x) more slowly as a function of x than does Ψ2 (x, 2). 3.3 Generalized Alladi-Duality There is an interesting duality between the kth largest and the kth smallest prime factors of an integer n, first noted by K. Alladi in [1], which will be of use in later observations concerning numerical factorization. Whereas Alladi’s treatment is entirely elementary, and holds for all arithmetic functions, the following proof will demonstrate Alladi’s duality analytically. We also note that although Alladi’s proof of the duality in 92 [1] generalizes to give the following result, he only supplies a proof of the principle for the special case of k = 1 (which is the duality amongst the largest and smallest prime factors of n). The only detriment to the analytic approach is that we must place bounds on the arithmetic functions being discussed to ensure the convergence of the Dirichlet series which they generate; however, this is a relatively minor restriction, and is in fact equivalent to the statement that the Dirichlet series generated by the arithmetic function has an abscissa of convergence σa ̸= −∞. Furthermore, as a bonus, we will derive a new representation for the function ζ(s). Let g(n) be an arithmetic function such that g(1) = 0 and let pk (n) and Pk (n) denote the kth smallest and kth largest prime factors of n, respectively (while these two values may coincide, the hope is that with this definition the notation will not cause any confusion). Furthermore, let µ(n) be Möbius’s number theoretic function. In this section, unless otherwise indicated, sums are to be taken for all n ≥ 2. Lemma 3.3.1. 
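Before turning to the analytic proof, the reader may find it reassuring to test the duality on actual integers. The sketch below (our own; the test function $g(p) = p^2$ with $g(1) = 0$ is an arbitrary choice) exhaustively checks the identity $\sum_{d|n}\mu(d)\binom{\omega(d)-1}{k-1}g(P_1(d)) = (-1)^k g(p_k(n))$ of Lemma 3.3.1 below for $n \le 300$ and $k = 1, 2, 3$, where $p_k(n)$ is the $k$th smallest distinct prime factor (taken to be $1$ when $\omega(n) < k$):

```python
from math import comb

def distinct_primes(n):
    """Distinct prime factors of n in increasing order."""
    ps, d, m = [], 2, n
    while d * d <= m:
        if m % d == 0:
            ps.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        ps.append(m)
    return ps

def g(p):                         # arbitrary test function with g(1) = 0
    return 0 if p == 1 else p * p

def binom(a, b):                  # binomial coefficient, zero outside range
    return comb(a, b) if 0 <= b <= a else 0

for n in range(2, 301):
    ps = distinct_primes(n)
    for k in (1, 2, 3):
        pk = ps[k - 1] if len(ps) >= k else 1      # kth smallest prime factor
        total = 0
        for d in range(1, n + 1):
            if n % d:
                continue
            qs = distinct_primes(d)
            if not all(d % (q * q) for q in qs):   # mu(d) = 0 unless squarefree
                continue
            mu = (-1) ** len(qs)
            P1 = qs[-1] if qs else 1               # P1(d), with P1(1) = 1
            total += mu * binom(len(qs) - 1, k - 1) * g(P1)
        assert total == (-1) ** k * g(pk)
print("Alladi duality verified for n <= 300")
```

For $k = 1$ the binomial coefficient is $1$ and the check reduces to the special case $\sum_{d|n}\mu(d)g(P_1(d)) = -g(p_1(n))$ described above.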
Lemma 3.3.1. (Alladi) We have

$$\sum_{d|n}\mu(d)g(P_k(d)) = (-1)^k\binom{\omega(n)-1}{k-1}g(p_1(n)),$$

$$\sum_{d|n}\mu(d)g(p_k(d)) = (-1)^k\binom{\omega(n)-1}{k-1}g(P_1(n)),$$

$$\sum_{d|n}\mu(d)\binom{\omega(d)-1}{k-1}g(P_1(d)) = (-1)^k g(p_k(n)),$$

$$\sum_{d|n}\mu(d)\binom{\omega(d)-1}{k-1}g(p_1(d)) = (-1)^k g(P_k(n));$$

in particular,

$$\sum_{d|n}\mu(d)g(P_1(d)) = -g(p_1(n)) \qquad\text{and}\qquad \sum_{d|n}\mu(d)g(p_1(d)) = -g(P_1(n)).$$

Proof: Consider the Dirichlet series generated by $\mu(n)z^{\omega(n)-1}g(p_1(n))$, for $|z| \le 1$, and the easy identity

$$E(z;s) = \sum_{n=1}^\infty\frac{\mu(n)z^{\omega(n)-1}g(p_1(n))}{n^s} = -\sum_p\frac{g(p)}{p^s}\prod_{q>p}\left(1-\frac{z}{q^s}\right), \qquad (3–17)$$

obtained by grouping the squarefree $n$ according to their smallest prime factor, subject only to the restriction that $g(n)$ grows at a rate for which the Dirichlet series under consideration converges for all $s$ with $\Re(s) > \sigma_a$. This identity is valid for all $s\in\mathbb{C}$, $\Re(s) > \sigma_a$, where $\sigma_a$ is the abscissa of absolute convergence of the Dirichlet series $\sum_{n=1}^\infty\mu(n)g(p_1(n))n^{-s}$, provided $|z|\le1$. Therefore,

$$\frac{1}{(k-1)!}\frac{\partial^{k-1}}{\partial z^{k-1}}\,\zeta(s)E(z;s) = \sum_{n=1}^\infty\left(\sum_{d|n}\mu(d)\binom{\omega(d)-1}{k-1}z^{\omega(d)-k}g(p_1(d))\right)\frac{1}{n^s}. \qquad (3–18)$$

However,

$$\zeta(s)E(z;s) = -\left[\sum_p\frac{g(p)}{p^s}\prod_{q>p}\left(1-\frac{z}{q^s}\right)\right]\prod_r\left(1-\frac{1}{r^s}\right)^{-1}$$

$$= -\sum_p\frac{g(p)}{p^s}\prod_{r\le p}\left(1-\frac{1}{r^s}\right)^{-1}\prod_{q>p}\left(1-\frac{z}{q^s}\right)\left(1-\frac{1}{q^s}\right)^{-1} \qquad (3–19)$$

$$= -\sum_p\frac{g(p)}{p^s}\prod_{r\le p}\left(1-\frac{1}{r^s}\right)^{-1}\prod_{q>p}\left(1+(1-z)\left(\frac{1}{q^s}+\frac{1}{q^{2s}}+\cdots\right)\right)$$

$$= -\sum_p\frac{g(p)}{p^s}\prod_{r\le p}\left(1-\frac{1}{r^s}\right)^{-1}\sum_{p_1(m)>p}\frac{(1-z)^{\omega(m)}}{m^s};$$

hence, since $\frac{1}{(k-1)!}\frac{\partial^{k-1}}{\partial z^{k-1}}(1-z)^{\omega(m)}$ tends to $(-1)^{k-1}$ as $z\to1^-$ when $\omega(m) = k-1$, and to $0$ otherwise,

$$\lim_{z\to1^-}\frac{1}{(k-1)!}\frac{\partial^{k-1}}{\partial z^{k-1}}\,\zeta(s)E(z;s) = (-1)^k\sum_p\frac{g(p)}{p^s}\prod_{r\le p}\left(1-\frac{1}{r^s}\right)^{-1}\sum_{\substack{\omega(m)=k-1\\ p_1(m)>p}}\frac{1}{m^s} \qquad (3–20)$$

$$= (-1)^k\sum_{n=1}^\infty\frac{g(P_k(n))}{n^s}.$$

By equating (3-18) and (3-20) as $z\to1^-$ we see that

$$(-1)^k\sum_{n=1}^\infty\frac{g(P_k(n))}{n^s} = \sum_{n=1}^\infty\left(\sum_{d|n}\mu(d)\binom{\omega(d)-1}{k-1}g(p_1(d))\right)\frac{1}{n^s}.$$

Now, from the uniqueness of Dirichlet series we may equate the corresponding coefficients of this equality to conclude that

$$(-1)^k g(P_k(n)) = \sum_{d|n}\mu(d)\binom{\omega(d)-1}{k-1}g(p_1(d)),$$

proving the fourth identity.

Similarly, consider the Dirichlet series generated by $\mu(n)z^{\omega(n)-1}g(P_1(n))$:

$$D(z;s) := \sum_{n=1}^\infty\frac{\mu(n)z^{\omega(n)-1}g(P_1(n))}{n^s} = -\sum_p\frac{g(p)}{p^s}\prod_{q<p}\left(1-\frac{z}{q^s}\right); \qquad (3–21)$$

then,

$$\frac{1}{(k-1)!}\frac{\partial^{k-1}}{\partial z^{k-1}}\,\zeta(s)D(z;s) = \sum_{n=1}^\infty\left(\sum_{d|n}\mu(d)\binom{\omega(d)-1}{k-1}z^{\omega(d)-k}g(P_1(d))\right)\frac{1}{n^s}. \qquad (3–22)$$

However,

$$\zeta(s)D(z;s) = -\sum_p\frac{g(p)}{p^s}\prod_{r\ge p}\left(1-\frac{1}{r^s}\right)^{-1}\prod_{q<p}\left(1-\frac{z}{q^s}\right)\left(1-\frac{1}{q^s}\right)^{-1} \qquad (3–23)$$

$$= -\sum_p\frac{g(p)}{p^s}\prod_{r\ge p}\left(1-\frac{1}{r^s}\right)^{-1}\sum_{P_1(m)<p}\frac{(1-z)^{\omega(m)}}{m^s},$$

and, differentiating and taking the limit as $z\to1^-$ exactly as before, by (3-21), (3-22), and (3-23) we have

$$\lim_{z\to1^-}\frac{1}{(k-1)!}\frac{\partial^{k-1}}{\partial z^{k-1}}\,\zeta(s)D(z;s) = (-1)^k\sum_p\frac{g(p)}{p^s}\prod_{r\ge p}\left(1-\frac{1}{r^s}\right)^{-1}\sum_{\substack{\omega(m)=k-1\\ P_1(m)<p}}\frac{1}{m^s} \qquad (3–24)$$

$$= (-1)^k\sum_{n=1}^\infty\frac{g(p_k(n))}{n^s}.$$

Now, equating the two Dirichlet series given by (3-22) (at $z = 1$) and (3-24):

$$(-1)^k\sum_{n=1}^\infty\frac{g(p_k(n))}{n^s} = \sum_{n=1}^\infty\left(\sum_{d|n}\mu(d)\binom{\omega(d)-1}{k-1}g(P_1(d))\right)\frac{1}{n^s},$$

from which we may deduce that

$$\sum_{d|n}\mu(d)\binom{\omega(d)-1}{k-1}g(P_1(d)) = (-1)^k g(p_k(n)),$$

which proves the third identity.

The first two identities follow by applying Möbius inversion to the last two; as this process is relatively straightforward we will only carry out the inversion procedure for the third identity. Note that as $\mu(n)$ is supported on the squarefree integers, we need only carry out the inversion over squarefree numbers; hence $\mu(n/d) = \mu(n)\mu(d)$, as this holds generally for squarefree $n$ and $d \mid n$. Then, as

$$\sum_{d|n}\mu(d)\binom{\omega(d)-1}{k-1}g(P_1(d)) = (-1)^k g(p_k(n)),$$

we may invert to obtain

$$\mu(n)\binom{\omega(n)-1}{k-1}g(P_1(n)) = \sum_{d|n}\mu(n/d)(-1)^k g(p_k(d)) = \mu(n)(-1)^k\sum_{d|n}\mu(d)g(p_k(d));$$

thus,

$$\sum_{d|n}\mu(d)g(p_k(d)) = (-1)^k\binom{\omega(n)-1}{k-1}g(P_1(n)),$$

which is the second identity. The first identity follows similarly, proving the lemma. □

The next lemma can also be viewed as a generalization of Alladi's duality principle to subsets $\Omega\subset\mathcal{P}$, where $\mathcal{P}$ denotes the set of all prime numbers. Note that as we have already proved the Alladi duality identities for all $n$, the following lemma follows trivially if we consider only sums taken over the integers $n \ge 2$ such that $p\mid n$ implies $p\in\Omega$.

Lemma 3.3.2.
We have

$$-g(p_1(n)) = \sum_{d|n}\mu(d)g(P_1(d)) \qquad\text{and}\qquad -g(P_1(n)) = \sum_{d|n}\mu(d)g(p_1(d)),$$

where it is understood that $g(n)$ is a bounded function (in the sense of Lemma 3.3.1) and the sums are taken over the integers $n \ge 2$ such that $p\mid n$ implies $p\in\Omega$.

From Lemma 3.3.2 and the properties of $\zeta(s)$ we may estimate sums of the form $\sum_{n\le x,\,P_1(n)\in\Omega} 1$ using Theorem 2.2.1 (Delange's theorem). Thus

$$\sum_{\substack{n\le x\\ P_1(n)\in\Omega}} 1 = Cx + o(x) \qquad\text{if and only if}\qquad \sum_{p_1(n)\in\Omega}\frac{\mu(n)}{n} = -C,$$

the series on the right being convergent. Now consider

$$\sum_{p_1(n)\in\Omega}\frac{\mu(n)}{n^s} = -\frac{1}{\zeta(s)}\sum_{p\in\Omega}\frac{1}{p^s}\prod_{q\le p}\left(1-\frac{1}{q^s}\right)^{-1} = -\frac{1}{\zeta(s)}\sum_{P_1(n)\in\Omega}\frac{1}{n^s} \qquad (3–25)$$

for $s\in\mathbb{C}$, $\Re(s) > 1$. From Theorem 2.2.1 it follows that

$$\sum_{\substack{n\le x\\ p_1(n)\in\Omega}} 1 = e_1x + o(x) \qquad\text{if and only if}\qquad -e_1 = \sum_{P_1(n)\in\Omega}\frac{\mu(n)}{n},$$

and

$$\sum_{\substack{n\le x\\ P_1(n)\in\Omega}} 1 = d_1x + o(x) \qquad\text{if and only if}\qquad -d_1 = \sum_{p_1(n)\in\Omega}\frac{\mu(n)}{n}.$$

In particular, if $\Omega = \mathcal{P}$ then the above identities become

$$\frac{1}{\zeta(s)} - 1 = -\sum_p\frac{1}{p^s}\prod_{q<p}\left(1-\frac{1}{q^s}\right), \qquad\text{or}\qquad 1 - \frac{1}{\zeta(s)} = \sum_p\frac{1}{p^s}\prod_{q<p}\left(1-\frac{1}{q^s}\right).$$

From Tchebyschev's estimate $p_n = O(n\log n)$, where $p_n$ denotes the $n$th prime number, and the fact that $\lim_{s\to1^+}\frac{1}{\zeta(s)} = 0$, we see that

$$\sum_{n=2}^\infty\frac{1}{n\log n}\prod_{q<p_n}\left(1-\frac1q\right) < +\infty, \qquad (3–26)$$

and as a result of (3-26),

$$\prod_{q<p_n}\left(1-\frac1q\right) \ll \frac{1}{\log\log n},$$

which is a weak form of Mertens' theorem. If we use Mertens' theorem directly (see [29]),

$$\prod_{q<p_n}\left(1-\frac1q\right) = O\!\left(\frac{1}{\log n}\right),$$

then the convergence of

$$1 - \frac{1}{\zeta(s)} = \sum_n\frac{1}{p_n^s}\prod_{q<p_n}\left(1-\frac{1}{q^s}\right)$$

as $s\to1^+$ implies that $p_n \gg n\log\log n$; equivalently, $\pi(x) \ll \frac{x}{\log\log x}$. Of course, this result is much weaker than the prime number theorem derived in Chapter 2. However, for such a nontrivial result its derivation is remarkably simple.

It is worth mentioning the result of Alladi [1] that if we take the set $\Omega$ in Lemma 3.3.2 to be the set of primes $p$ with $p \equiv l \pmod m$, and then apply this lemma for any $l, m \in \mathbb{Z}$, $(l,m) = 1$, we obtain

$$\sum_{p_1(n)\equiv l(m)}\frac{\mu(n)}{n} = -\frac{1}{\phi(m)} = -\lim_{s\to1^+}\frac{1}{\zeta(s)}\sum_{P_1(n)\equiv l(m)}\frac{1}{n^s},$$

where, again, it is understood that the sum is taken over all $n \ge 2$; hence, for $s$ near $1$,

$$\sum_{P_1(n)\equiv l(m)}\frac{1}{n^s} \sim \frac{\zeta(s)}{\phi(m)}.$$

We further remark that

$$\sum_{p_1(n)\equiv l(m)}\frac{\mu(n)}{n} = -\frac{1}{\phi(m)} \qquad (3–27)$$

follows as a consequence of the prime number theorem for arithmetic progressions; see [1]. That is, if $\pi_{l,m}(x)$ denotes the number of primes $p \le x$ such that $p \equiv l \pmod m$, then the prime number theorem for arithmetic progressions states that

$$\pi_{l,m}(x) = \frac{1}{\phi(m)}\frac{x}{\log x} + O\!\left(\frac{x}{\log^2 x}\right).$$

It is speculated by Alladi in [1], and by this author, that equation (3-27) is elementarily equivalent to the prime number theorem for arithmetic progressions. However, at present this is only a conjecture.

This completes Chapter 3 and the theoretical results necessary to analyze the factorization algorithm of Knuth and Pardo.

CHAPTER 4
THE KNUTH-PARDO ALGORITHM

The algorithm introduced by Knuth and Pardo in their 1976 paper [14] is perhaps the simplest way to factor an integer algorithmically. The essential idea rests in attempting to divide an integer $n$ by successive trial divisors $2, 3, 4, 5, \ldots$, "casting out" each factor as it is discovered, and repeating the process. As this process discovers all prime divisors up to $\sqrt n$, the algorithm terminates when the trial divisors exceed the square root of the remaining unfactored part.

The reason why we may restrict our attention to trial divisors at most $\sqrt n$ is that if $n$ is composite, then it must have a prime divisor $\le\sqrt n$; stated another way, half of the divisors of an integer $n$ are $\le\sqrt n$, the other half being obtained by evaluating $n/d$ for $d\mid n$, $d\le\sqrt n$. Also, if $p > \sqrt n$ and $p\mid n$, then $p^2$ does not divide $n$.

The speed of this algorithm is intimately related to the size of the prime factors of $n$. If $n$ is itself a prime number then there will be approximately $\sqrt n$ trial divisions (of course, we do not know a priori whether or not $n$ is prime); whereas if $n$ is a power of $2$ (i.e., $n = 2^a$ for some positive integer $a$), the number of trial divisions will be $O(\log n)$.
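The two extremes just described are easy to exhibit by instrumenting the trial-division loop (a sketch of our own; the prime $999983$ is an arbitrary example near $10^6$):

```python
def trial_divisions(n):
    """Count the trial divisions performed when factoring n by trial division."""
    m, d, D = n, 2, 0
    while d * d <= m:
        D += 1                 # one trial division of m by d
        if m % d == 0:
            m //= d            # cast out the factor found
        else:
            d += 1
    return D

print(trial_divisions(2 ** 20))   # 19 divisions: a power of 2 is factored quickly
print(trial_divisions(999983))    # 998 divisions: a prime costs about sqrt(n)
```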
Knuth and Pardo consider a random integer $n$ and determine the approximate probability that the number of trial divisions is $\le n^x$, for $0 \le x \le 1/2$. They then demonstrate that the number of trial divisions will be $\le n^{0.35}$ about half of the time (for these results see [14]). Knuth and Pardo reach their conclusions by analyzing the $k$th largest prime factor of an integer, and then determine the running time of their algorithm in terms of the sizes of the largest two prime factors. We will now introduce their standard "divide and factor" algorithm.

For $n \ge 2$, at any point of the computation write $n = p_1(n)p_2(n)\cdots p_t(n)m$, where $p_1(n), \ldots, p_t(n)$ are the prime factors found so far, listed in non-decreasing order, and every prime factor of the unfactored part $m$ is at least the current trial divisor $d$, i.e. $p_i(m) \ge d$. It is understood that any list of trial divisors needed is available in advance, and hence producing the trial divisors adds no time to the algorithm. What follows is an informal ALGOL-like description of the algorithm, supplied by the authors in [14]:

t := 0; m := n; d := 2;
while d² ≤ m do
begin (* increase d or decrease m *)
    if d | m then
    begin
        t := t + 1;
        p_t(n) := d;
        m := m / d
    end
    else d := d + 1
end;
t := t + 1; p_t(n) := m; m := 1; d := 1

If we denote by $D$ the number of trial divisions performed and by $T$ the number of prime factors of $n$ (counted with multiplicity), then the while-test is executed approximately $D + 1$ times, the if-test $D$ times, the inner begin-block $T - 1$ times, and the else-branch $D - T + 1$ times.

Knuth and Pardo remark in [14] that their algorithm can be refined in several simple ways by avoiding large numbers of non-prime trial divisors; for instance, once $d > 3$ we may consider only divisors of the form $6k \pm 1$. This has the effect of dividing the number of trial divisions performed, $D$, by a constant. They further comment that the analysis of the simple case applies to these more refined settings with only minor variations.
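A direct Python transcription of the listing above (our own sketch, instrumented to report the number $D$ of trial divisions) reads:

```python
def divide_and_factor(n):
    """Knuth-Pardo style trial division: returns (prime factors of n in
    non-decreasing order, number D of trial divisions performed)."""
    assert n >= 2
    factors, m, d, D = [], n, 2, 0
    while d * d <= m:
        D += 1                       # one trial division of m by d
        if m % d == 0:
            factors.append(d)
            m //= d                  # cast out the factor just found
        else:
            d += 1
    factors.append(m)                # the remaining unfactored part is prime
    return factors, D

print(divide_and_factor(360))        # ([2, 2, 2, 3, 3, 5], 6)
```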
Let $P_k(n)$ be the $k$th largest prime factor of $n$; therefore $P_k(n) = p_{T+1-k}(n)$ after the algorithm terminates, with $1 \le k \le T$. If $n$ has fewer than $k$ prime factors we let $P_k(n) = 1$, and for convenience we let $P_0(n) = \infty$. Knuth and Pardo observe in [14] that the while-loop in the algorithm can terminate in three different ways, depending upon the final inputs into the loop:

Case 1: If $n < 4$ then $D = 0$, as $d = 2$ implies that $d^2 > n$; hence the algorithm terminates immediately.

Case 2: If $n \ge 4$ and the $D$th trial division succeeds, then the final trial division was by $d = P_2(n)$, where $d^2 > P_1(n)$. As $d$ is initially $2$, the operation $d := d + 1$ is performed $D - T + 1$ times, and hence $D - T + 1 = P_2(n) - 2$, or

$$D = P_2(n) + T - 3, \qquad\text{for } P_2(n)^2 > P_1(n).$$

Case 3: If $n \ge 4$ and the $D$th trial division fails, then the final trial division was by a $d$ with $P_2(n) \le d$, $d^2 < P_1(n)$, and $(d+1)^2 > P_1(n)$. Thus

$$D = \left\lceil\sqrt{P_1(n)}\right\rceil + T - 3, \qquad\text{for } P_2(n)^2 < P_1(n). \qquad (4–1)$$

Note that in all three cases we have

$$D = \max\!\left(P_2(n),\,\left\lceil\sqrt{P_1(n)}\right\rceil\right) + T - 3. \qquad (4–2)$$

We will now present Knuth and Pardo's largely heuristic derivation that the limiting distribution of the $k$th largest prime factor of an integer $\le N^x$ exists. In their paper it is Knuth and Pardo's desire to analyze $D$, and to that end they analyze the distributions of $P_1(n)$ and $P_2(n)$. In fact, they go a good deal further and consider the distributions of $P_k(n)$ in general. Let

$$\mathrm{Prob}_k(x,N) = |\{n : 1\le n\le N,\ P_k(n)\le N^x\}|, \qquad x \ge 0.$$

Hence $\mathrm{Prob}_k(x,N)/N$ is the probability that a random integer between $1$ and $N$ has $k$th largest prime factor $\le N^x$. In [14] Knuth and Pardo demonstrate that $\lim_{N\to\infty}\mathrm{Prob}_k(x,N)/N = F_k(x)$ exists, and derive various properties of the function $F_k(x)$. The following is their heuristic argument that $F_k(x)$ exists, analogous to the exposition by Dickman in [10]:

Theorem 4.0.3. The limit $\displaystyle\lim_{N\to\infty}\frac{\mathrm{Prob}_k(x,N)}{N} = F_k(x)$ exists.

"Proof:" Consider $\mathrm{Prob}_k(t+dt,N) - \mathrm{Prob}_k(t,N)$, the number of $n \le N$ with $N^t \le P_k(n) \le N^{t+dt}$, where $dt$ is small. To count the number of such $n$ we take all primes $p$ with $N^t \le p \le N^{t+dt}$ and multiply by the number of $m \le N^{1-t}$ such that $P_k(m) \le p$ and $P_{k-1}(m) \ge p$. If $n = mp$ then $n \le N^{1+dt} \approx N$ and $P_k(n) = p$; conversely, every $n \le N$ with $N^t \le P_k(n) \le N^{t+dt}$ has the form $n = mp$ for the above stated $p$ and $m$.

Note that the number of $m \le N^{1-t}$ with $P_k(m) \le p$ is approximately $\mathrm{Prob}_k\!\big(\frac{t}{1-t}, N^{1-t}\big)$ (since $p \approx N^t = (N^{1-t})^{t/(1-t)}$), while the unwanted subset consisting of those $m$ with $P_{k-1}(m) < p$ has approximately $\mathrm{Prob}_{k-1}\!\big(\frac{t}{1-t}, N^{1-t}\big)$ members. It follows that the number of such $m$ with $mp \le N$, $P_k(m) \le p$, and $P_{k-1}(m) \ge p$ is

$$\mathrm{Prob}_k\!\left(\frac{t}{1-t},\,N^{1-t}\right) - \mathrm{Prob}_{k-1}\!\left(\frac{t}{1-t},\,N^{1-t}\right). \qquad (4–3)$$

Ignoring second-order terms gives

$$\mathrm{Prob}_k(t+dt,N) - \mathrm{Prob}_k(t,N) \approx \left[\pi(N^{t+dt}) - \pi(N^t)\right]\left[\mathrm{Prob}_k\!\left(\frac{t}{1-t},N^{1-t}\right) - \mathrm{Prob}_{k-1}\!\left(\frac{t}{1-t},N^{1-t}\right)\right], \qquad (4–4)$$

where $\pi(x)$ is the function counting the number of primes $p \le x$. By the Prime Number Theorem, $\pi(x) = \frac{x}{\log x} + O\!\big(\frac{x}{\log^2 x}\big)$; hence, using $\pi(N^{t+dt}) - \pi(N^t) \approx N^t\,\frac{dt}{t}$ in (4-4) yields

$$\frac{\mathrm{Prob}_k(t+dt,N) - \mathrm{Prob}_k(t,N)}{N} \approx \frac{dt}{t}\cdot\frac{\mathrm{Prob}_k\!\left(\frac{t}{1-t},N^{1-t}\right) - \mathrm{Prob}_{k-1}\!\left(\frac{t}{1-t},N^{1-t}\right)}{N^{1-t}}. \qquad (4–5)$$

As $N\to\infty$, equation (4-5) gives the differential equation

$$F_k'(t)\,dt \approx \frac{dt}{t}\left(F_k\!\left(\frac{t}{1-t}\right) - F_{k-1}\!\left(\frac{t}{1-t}\right)\right). \qquad (4–6)$$

As $F_k(0) = 0$ we may integrate equation (4-6) to obtain

$$F_k(x) = \int_0^x\left(F_k\!\left(\frac{t}{1-t}\right) - F_{k-1}\!\left(\frac{t}{1-t}\right)\right)\frac{dt}{t}. \qquad (4–7)$$

In accordance with the convention $P_0(n) = \infty$ we define $F_0(x) = 0$ for all $x$. We must also have $F_k(x) = 1$ for $x \ge 1$, $k \ge 1$. Note that equation (4-7), together with these initial conditions, uniquely determines $F_k(x)$ for $0 \le x \le 1$; and as we also have the equation

$$F_k(x) = 1 - \int_x^1\left(F_k\!\left(\frac{t}{1-t}\right) - F_{k-1}\!\left(\frac{t}{1-t}\right)\right)\frac{dt}{t} \qquad (4–8)$$

for $0 \le x \le 1$, $F_k(x)$ is also uniquely determined in terms of its values at points $> x$. Hence the limit is well defined, and therefore exists. □
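Equation (4-2) above can be confirmed mechanically; the following exhaustive check (our own sketch) compares the instrumented division count $D$ against $\max(P_2(n), \lceil\sqrt{P_1(n)}\rceil) + T - 3$ for every $n$ below a small bound:

```python
from math import isqrt

def trial_divide(n):
    """Return (prime factors of n in non-decreasing order, trial divisions D)."""
    factors, m, d, D = [], n, 2, 0
    while d * d <= m:
        D += 1
        if m % d == 0:
            factors.append(d)
            m //= d
        else:
            d += 1
    factors.append(m)
    return factors, D

def ceil_sqrt(n):
    r = isqrt(n)
    return r if r * r == n else r + 1

for n in range(2, 5000):
    fs, D = trial_divide(n)
    T = len(fs)
    P1 = fs[-1]                                   # largest prime factor
    P2 = fs[-2] if T >= 2 else 1                  # second largest, 1 if absent
    assert D == max(P2, ceil_sqrt(P1)) + T - 3    # equation (4-2)
print("equation (4-2) verified for 2 <= n < 5000")
```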
We now return to the generalized functions $\Psi_k(x,y)$ to better understand the Knuth-Pardo algorithm. In their paper, Knuth and Pardo use the $k$th moment calculated in Theorem 3.1.9 to deduce many useful properties of $\Psi_k(x,y)$ and, consequently, to derive better results on $P_k(n)$. It should be clear that $\mathrm{Prob}_k(x,N) = \Psi_k(N, N^x)$, so that $\Psi_k(N,N^x) \sim F_k(x)N$. By analyzing the values $E_k(P_1(n))$ (as in Theorem 3.1.9), Knuth and Pardo go on to show that

$$\Psi_k(x^\alpha, x) = \rho_k(\alpha)x^\alpha + O\!\left(\frac{x^\alpha}{\log x}\right)$$

for all fixed $\alpha > 1$.

To close the discussion of this factorization algorithm, we will make several remarks about the model. For one thing, the model is largely probabilistic in that it considers a random $n$ between $1$ and $N$ and asks for the probability of relations such as $P_k(n) \le N^x$; however, the authors note in [14] that from an intuitive standpoint it may be more natural to ask for the probability of a relation such as $P_k(n) \le n^x$, without reference to $N$. Furthermore, they comment that it is quite easy to convert from the one model to the other, as most numbers between $1$ and $N$ are large.

To make this discussion more precise, the authors of [14] consider the number of integers $n$, $\tfrac12 N \le n \le N$, such that $P_k(n) \le N^x$. This is

$$\mathrm{Prob}_k(x,N) - \mathrm{Prob}_k\!\left(x,\tfrac{N}{2}\right) = \tfrac12 NF_k(x) + O\!\left(\frac{N}{\log N}\right),$$

which easily follows from $\mathrm{Prob}_k(x,N) = NF_k(x) + O(N/\log N)$. Furthermore, consider how many of these $n$ have $n^x < P_k(n) \le N^x$. The latter relation implies that

$$N^x \ge P_k(n) > \left(\tfrac12 N\right)^x = N^{x - \frac{x\log2}{\log N}} \ge N^{x - \frac{\log2}{\log N}},$$

and $F_k\!\big(x - \frac{\log2}{\log N}\big) = F_k(x) + O(1/\log N)$, as $F_k(x)$ is differentiable. Hence the number of such $n$ is at most

$$\mathrm{Prob}_k(x,N) - \mathrm{Prob}_k\!\left(x - \frac{\log2}{\log N},\,N\right) = O\!\left(\frac{N}{\log N}\right), \qquad (4–9)$$

where the constant implied by the $O$-term in (4-9) is independent of $x$ in a bounded region about $x$. Hence we have shown that $\tfrac12 NF_k(x) + O(N/\log N)$ of the integers $n$ with $\tfrac12 N \le n \le N$ satisfy $P_k(n) \le n^x$.
Therefore, if $Q_k(x,N)$ denotes the total number of $n \le N$ such that $P_k(n) \le n^x$, we have

$$Q_k(x,N) = \sum_{1\le j\le \log_2\log N}\frac{N}{2^j}\left(F_k(x) + O\!\left(\frac{1}{\log(N/2^j)}\right)\right) + O\!\left(\frac{N}{\log N}\right) = NF_k(x) + O\!\left(\frac{N}{\log N}\right), \qquad (4–10)$$

by dividing the range $\frac{N}{\log N} \le n \le N$ into $\log_2\log N$ dyadic parts. Define the probability of a statement $S(n)$ about the positive integer $n$ by the formula

$$\Pr(S(n)) = \lim_{N\to\infty}\frac1N\,|\{n : n\le N \text{ s.t. } S(n)\text{ is true}\}| \qquad (4–11)$$

when the limit exists. Hence we may conclude that $\Pr(P_k(n)\le n^x) = F_k(x)$, for each fixed $x$.

Another important observation concerning the theoretical model used in this paper is that results were stated in terms of the probability that the number of operations performed is $\le N^x$ (or $n^x$). Typically it is customary to describe the average number of operations of an algorithm in terms of mean values and standard deviations; however, this approach turns out to be particularly uninformative for this algorithm. Knuth and Pardo comment in [14] that this phenomenon is apparent when considering the average number of operations performed over all $n \le N$, which will be relatively near the worst case $n^{1/2}$; yet in more than 70 percent of the cases the actual number of operations performed will be less than $n^{0.4}$. Another reason why the typical mean-variance approach is uninformative is that, as was noted in Chapter 3, the ratio of the standard deviation of the $k$th largest prime factor to its mean is a divergent quantity.

One further point worth noting is that (as the name suggests) the Knuth-Pardo algorithm is a very simple algorithm, and in recent years far more efficient algorithms have been introduced (such as the elliptic curve method [17] and the quadratic sieve [20]) which can factor integers $n$ in (expected or heuristic) running time $e^{(1+o(1))\sqrt{\log n\log\log n}}$. These algorithms render much of our analysis superfluous, as they take far fewer steps to factor an integer completely than the Knuth-Pardo algorithm.

In closing, we note that there are some interesting avenues for future research using the results of this thesis. Alladi showed in [1] that for $l$ and $m$ relatively prime, we have equation (3-27), which is the identity

$$\sum_{p_1(n)\equiv l(m)}\frac{\mu(n)}{n} = -\frac{1}{\phi(m)},$$

where, as in Section 3.3, the sum is taken over all $n \ge 2$. This identity follows as a consequence of the Prime Number Theorem for arithmetic progressions, the case $k = 1$ of Lemma 3.3.1, and some results on $\Psi(x,y)$. Now that Knuth and Pardo have supplied us with results on the more general functions $\Psi_k(x,y)$, it would be interesting to see whether further results would follow from similar sums taken over those integers $n \ge 2$ such that $p_k(n) \equiv l \pmod m$. However, as was noted in Section 3.2, there is a significant difference in the behavior of $\Psi_1(x,y)$ and $\Psi_k(x,y)$ for $k \ge 2$: $\Psi_1(x,y) = O(xe^{-u/2})$ decays exponentially in $u$, whereas $\Psi_k(x,y) \sim \rho_k(u)x$, and for $k \ge 2$ equations (3-15) and (3-16) show that the functions $\rho_k(u)$ do not decay nearly as rapidly. It would be interesting to see what consequences this would have for sums similar to (3-27).

REFERENCES

[1] Krishnaswami Alladi, Duality between Prime Factors and an Application to the Prime Number Theorem for Arithmetic Progressions, Journal of Number Theory, vol. 9 (1977), p. 436-451.

[2] K. Alladi, P. Erdős, On an Additive Arithmetic Function, Pacific Journal of Mathematics, vol. 71 (1977), no. 2, p. 275-294.

[3] K. Alladi, P. Erdős, On the Asymptotic Behavior of Large Prime Factors of Integers, Pacific Journal of Mathematics, vol. 82 (1979), no. 2, p. 295-315.

[4] Raymond Ayoub, An Introduction to the Analytic Theory of Numbers, Mathematical Surveys, no. 10, 1963.

[5] Carl B.
Boyer, A History of Mathematics, John Wiley and Sons, 1968.

[6] A.A. Buchstab, An asymptotic estimation of a general number-theoretic function, Mat. Sbornik, vol. 2 (1937), no. 44, p. 1239-1246.

[7] H.M. Edwards, Riemann's Zeta Function, Dover Publications, Inc., 2001.

[8] P. Erdős, Über die Reihe $\sum\frac1p$, Mathematica, Zutphen B 7 (1938), p. 1-2.

[9] N.G. de Bruijn, On the number of positive integers ≤ x and free of prime factors > y, Indag. Math., vol. 13 (1951), p. 50-60.

[10] Karl Dickman, On the frequency of numbers containing prime factors of a certain relative magnitude, Ark. Mat., Astronomi och Fysik 22A (1930), no. 10, p. 1-14.

[11] G.H. Hardy, S. Ramanujan, On the normal number of prime factors of a number n, Quarterly Journal of Mathematics, Oxford, vol. 48 (1917), p. 76-92.

[12] Kenneth Ireland, Michael Rosen, A Classical Introduction to Modern Number Theory, Springer-Verlag, 1990.

[13] I. Martin Isaacs, Algebra, a Graduate Course, Brooks/Cole Publishing Company, 1994.

[14] Donald E. Knuth, Luis Trabb Pardo, Analysis of a Simple Factorization Algorithm, Theoretical Computer Science, vol. 3 (1976), no. 3, p. 321-348.

[15] Jean-Marie De Koninck, Andrew Granville, and Florian Luca (editors), Anatomy of Integers, American Mathematical Society, 2008.

[16] Edmund Landau, Handbuch der Lehre von der Verteilung der Primzahlen, Leipzig: Teubner, 1909. Reprinted by Chelsea, 1953.

[17] H.W. Lenstra Jr., Factoring integers with elliptic curves, Annals of Mathematics, vol. 126 (1987), no. 3, p. 649-673.

[18] Hans von Mangoldt, Zu Riemanns Abhandlung "Ueber die Anzahl der Primzahlen unter einer gegebenen Grösse", Journal für die reine und angewandte Mathematik, Bd. 114 (1895), p. 225-305.

[19] Oskar Perron, Zur Theorie der Dirichletschen Reihen, Journal für die reine und angewandte Mathematik, Bd. 134 (1908), p. 95-143.

[20] Carl Pomerance, The quadratic sieve factoring algorithm, Advances in Cryptology, Proc. Eurocrypt '84, Lecture Notes in Computer Science,
Springer-Verlag, 1985, p. 169-182.

[21] Jean Jacod and Philip Protter, Probability Essentials, second edition, Springer-Verlag, 2004.

[22] V. Ramaswami, The number of positive integers < x and free of prime divisors > x^c, and a problem of S.S. Pillai, Duke Math. J., vol. 16 (1949), p. 99-109.

[23] R. Rankin, The difference between consecutive prime numbers, J. London Math. Soc., vol. 13 (1938), p. 242-247.

[24] L.G. Sathe, On a problem of Hardy on the distribution of integers having a given number of prime factors I-IV, J. Indian Math. Soc., vol. 17 (1953), p. 63-82, 83-141; vol. 18 (1954), p. 27-42, 43-81.

[25] Bruce Schechter, My Brain is Open: The Mathematical Journeys of Paul Erdős, Simon and Schuster, Inc., 1998.

[26] A. Selberg, Note on a paper by L.G. Sathe, J. Indian Math. Soc., vol. 18 (1954), p. 83-87.

[27] Kannan Soundararajan, An asymptotic expansion related to the Dickman function, Ramanujan Journal, vol. 29 (2012), to appear.

[28] J.J. Sylvester, On Tchebycheff's theorem of the totality of prime numbers comprised within given limits, Amer. J. Math., vol. 4 (1881), p. 230-247.

[29] Gérald Tenenbaum, Introduction to Analytic and Probabilistic Number Theory, Cambridge University Press, 1995.

[30] E.C. Titchmarsh, The Theory of the Riemann Zeta-function, Oxford, Clarendon Press, 1951.

BIOGRAPHICAL SKETCH

Todd Molnar graduated from the University of Delaware in 2007 with a B.S. in mathematics and economics. He has been a graduate student at the University of Florida since 2008.