Math 105 notes, week 3
C. Pomerance

1 The convolution inverse of an arithmetic function

[I did not lecture about this, but you may find it useful.]

If $f$ is an arithmetic function, then its (convolution) inverse is an arithmetic function $g$ such that $f * g = I$. If $g$ exists, it is unique. Indeed, if both $f * g = I$ and $f * h = I$, then
$$g = g * I = g * f * h = f * g * h = I * h = h.$$
Good, but does an inverse $g$ of $f$ have to exist? A necessary condition is that $f(1) \ne 0$. Indeed, if $g$ is the inverse of $f$, then $1 = I(1) = (f * g)(1) = f(1)g(1)$.

We now show that this necessary condition is also sufficient. We assume that $f(1) \ne 0$ and we try to solve for $g$. What this means is that we are solving for infinitely many unknowns: $g(1), g(2), \dots$. From the above, we see that the first unknown $g(1)$ is $1/f(1)$, so at least that part is solved. Suppose the problem is solved for numbers below $n$, where $n \ge 2$. That is, we are supposing that we know the values $g(1), g(2), \dots, g(n-1)$, and for each number $m$ with $2 \le m \le n-1$, we have $(f * g)(m) = 0$. We would like to adjust $g$ so that also $(f * g)(n) = 0$, but we don't want to mess up the earlier equations $(f * g)(m) = 0$. So, the earlier ones are not affected if we do not change the values of $g(1), g(2), \dots, g(n-1)$ already chosen. To find $g(n)$, we have
$$0 = (f * g)(n) = \sum_{ab=n} f(a)g(b) = f(1)g(n) + \sum_{\substack{ab=n \\ b<n}} f(a)g(b), \tag{1}$$
where we have singled out the one term in the sum with $b = n$. So, in this equation, think of what's known and what's unknown. The left side is 0, so that's known. The values of $f$ are given to us, so they're known. The values of $g$ with arguments smaller than $n$ are assumed to be known. The only unknown in the equation is $g(n)$. So, (1) is a linear equation in this unknown, where it appears a single time multiplied by a nonzero number. And so there is a unique choice for $g(n)$ that makes the equation valid. We have proved the following.

Proposition 1. An arithmetic function $f$ has an inverse under convolution if and only if $f(1) \ne 0$. Moreover, if it has an inverse, the inverse is unique.

Part of the definition of a multiplicative function $f$ is that it is not the zero function. This implies that $f(1) = 1$, since if for some $n$ we have $f(n) \ne 0$, then $f(n) = f(1 \cdot n) = f(1)f(n)$, so $f(1) = 1$. Thus, Proposition 1 implies that a multiplicative function has an inverse under convolution. But must the inverse itself be a multiplicative function? The answer is yes.

Proposition 2. If $f$ is a multiplicative function, then $f$ has an inverse under convolution, and this inverse is also multiplicative.

Proof. We have seen that $f$ has an inverse, call it $g$. The proof that $g$ exists shows that $g(1) = 1/f(1)$, so that $g(1) = 1$. Thus, $g(1 \cdot 1)$ is the same as $g(1)g(1)$, which is square one, so to speak, for showing that $g$ is multiplicative. Suppose that $K > 1$ and that whenever $ab < K$ with $\gcd(a,b) = 1$, we have $g(ab) = g(a)g(b)$. Suppose too that $mn = K$ with $\gcd(m,n) = 1$. We'll show that $g(mn) = g(m)g(n)$, and so by induction, $g$ is multiplicative. From (1), we have
$$0 = I(K) = I(mn) = (f * g)(mn) = g(mn) + \sum_{\substack{ab=mn \\ b<mn}} f(a)g(b).$$
As in the proof that the convolution of multiplicative functions is multiplicative, we write $a = a_1 a_2$, $b = b_1 b_2$ so that $a_1 b_1 = m$, $a_2 b_2 = n$. Thus,
$$g(mn) = -\sum_{\substack{ab=mn \\ b<mn}} f(a)g(b) = -\sum_{\substack{a_1 b_1 = m,\ a_2 b_2 = n \\ b_1 b_2 < mn}} f(a_1 a_2)\, g(b_1 b_2).$$
By the induction hypothesis $g(b_1 b_2) = g(b_1)g(b_2)$, and we are assuming that $f$ is multiplicative, so that $f(a_1 a_2) = f(a_1)f(a_2)$. Thus,
$$g(mn) = -\sum_{\substack{a_1 b_1 = m,\ a_2 b_2 = n \\ b_1 b_2 < mn}} f(a_1) f(a_2) g(b_1) g(b_2) = -\sum_{\substack{a_1 b_1 = m \\ a_2 b_2 = n}} f(a_1) f(a_2) g(b_1) g(b_2) + g(m)g(n)$$
$$= -(f * g)(m)\,(f * g)(n) + g(m)g(n) = -I(m)I(n) + g(m)g(n).$$
Either $m$ or $n$ is larger than 1, so the last expression reduces to $g(m)g(n)$. That is, $g(mn) = g(m)g(n)$, and our inductive proof that $g$ is multiplicative is complete.
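The proof of Proposition 1 is really an algorithm: equation (1) determines $g(n)$ from the earlier values, one $n$ at a time. Here is a minimal Python sketch (mine, not from the notes; the helper names conv and inverse_values are hypothetical) that runs this recursion for $f = u$ and checks numerically that the resulting inverse satisfies $u * g = I$ and is multiplicative, as Propositions 1 and 2 predict. The inverse that emerges is, of course, $\mu$.

```python
from math import gcd

def conv(f, g, n):
    # Dirichlet convolution: (f * g)(n) = sum of f(a) g(n/a) over divisors a of n.
    return sum(f(a) * g(n // a) for a in range(1, n + 1) if n % a == 0)

def inverse_values(f, N):
    # Solve (f * g)(n) = I(n) for g(1), ..., g(N), exactly as in the proof of
    # Proposition 1: equation (1) is linear in the single unknown g(n).
    g = {1: 1 / f(1)}
    for n in range(2, N + 1):
        s = sum(f(a) * g[n // a] for a in range(2, n + 1) if n % a == 0)  # terms with b < n
        g[n] = -s / f(1)
    return g

u = lambda n: 1                        # the constant function u
g = inverse_values(u, 60)
gf = lambda n: g[n]

# Check that u * g = I (Proposition 1) ...
assert all(abs(conv(u, gf, n) - (1 if n == 1 else 0)) < 1e-9 for n in range(1, 61))
# ... and that g is multiplicative (Proposition 2).
assert all(abs(g[m * n] - g[m] * g[n]) < 1e-9
           for m in range(1, 8) for n in range(1, 8)
           if m * n <= 60 and gcd(m, n) == 1)

print([round(g[n]) for n in range(1, 13)])   # the Mobius values: 1, -1, -1, 0, -1, 1, ...
```

Exact arithmetic with fractions.Fraction would remove the floating-point tolerances; floats keep the sketch short.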
Corollary 1. If $f, g, h$ are arithmetic functions with $f * g = h$, then if any two of $f, g, h$ are multiplicative, so is the third.

2 Writing a characteristic function as a sum

If $S$ is a set of natural numbers, let $f_S(n) = 1$ if $n \in S$ and $f_S(n) = 0$ if $n \notin S$. This is the characteristic function of $S$. (In analysis, the characteristic function of a subset of the real numbers is defined similarly, but the domain is the set of reals.) Sometimes $f_S$ is called the indicator function of $S$.

An example: $S = \{1\}$ and $f_S = I$. Even this simple example can be useful, if we realize that $I = \mu * u$. Let us use this to prove that $\varphi = \mu * E$. We have
$$\varphi(N) = \sum_{\substack{n \le N \\ \gcd(n,N)=1}} 1;$$
this is essentially the definition of $\varphi$. We have
$$I(\gcd(n,N)) = \sum_{d \mid \gcd(n,N)} \mu(d) = \sum_{d \mid n,\ d \mid N} \mu(d),$$
since the common divisors of $n$ and $N$ are exactly the divisors of the greatest common divisor of $n$ and $N$. Thus, using the definition above of $\varphi$,
$$\varphi(N) = \sum_{n \le N} \sum_{d \mid n,\ d \mid N} \mu(d) = \sum_{d \mid N} \mu(d) \sum_{\substack{n \le N \\ d \mid n}} 1 = \sum_{d \mid N} \mu(d)\, \frac{N}{d}.$$
This is precisely the assertion that $\varphi = \mu * E$.

Here's a less trivial example. For a positive integer $n$, write $n = a^2 b$, where $a^2$ is the largest square dividing $n$. Then from elementary number theory, $d^2 \mid n$ if and only if $d^2 \mid a^2$, if and only if $d \mid a$. Thus,
$$\sum_{d^2 \mid n} \mu(d) = \sum_{d \mid a} \mu(d) = I(a).$$
In particular, this sum is 1 if $n$ is squarefree (meaning the largest square divisor of $n$ is 1), and it is 0 if $n$ is not squarefree. Thus, if $Q$ denotes the set of squarefree numbers, then $f_Q$, the characteristic function of $Q$, satisfies
$$f_Q(n) = \sum_{d^2 \mid n} \mu(d).$$
There are two other formulas that give $f_Q(n)$, namely $\mu(n)^2$ and $|\mu(n)|$, but the more complicated formula above, since it's written as a sum, is quite useful. Say we wish to count the number of squarefree numbers in $[1, x]$. This is
$$\sum_{\substack{n \le x \\ n \in Q}} 1 = \sum_{n \le x} f_Q(n) = \sum_{n \le x} \sum_{d^2 \mid n} \mu(d) = \sum_{d \le \sqrt{x}} \mu(d) \left\lfloor \frac{x}{d^2} \right\rfloor = \sum_{d \le \sqrt{x}} \mu(d) \left( \frac{x}{d^2} + O(1) \right).$$
The sum of the $O(1)$ terms is $O(\sqrt{x})$ since the summands are bounded and there are at most $\sqrt{x}$ of them. For the main part, it is
$$x \sum_{d \le \sqrt{x}} \frac{\mu(d)}{d^2} = x \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} - x \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2}.$$
The sum from 1 to infinity is $6/\pi^2$, as we saw in week 2 in connection with the average order of $\varphi$. We also saw there how to estimate the tail sum over $d > \sqrt{x}$: it is $O(1/\sqrt{x})$. Putting these things together, we have that the number of squarefree numbers in $[1, x]$ is
$$\frac{6}{\pi^2}\, x + O(\sqrt{x}).$$
Getting a sharper error estimate than $O(\sqrt{x})$ is a hard problem, and though progress has been made, it is still not known what the "truth" is. We shall see later that writing a characteristic function as a sum will be a key tool in proving Dirichlet's theorem that a residue class $a \pmod{n}$, where $\gcd(a,n) = 1$, contains infinitely many primes.
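Since $f_Q$ is expressed as a sum over $d \le \sqrt{x}$, the count above is computable in roughly $\sqrt{x}$ steps once the Möbius values are sieved. A short Python sketch (mine, not from the notes; mobius_sieve is a hypothetical helper name) comparing the exact count against $6x/\pi^2$:

```python
from math import isqrt, pi

def mobius_sieve(N):
    # mu[n] for 1 <= n <= N, by sieving over primes.
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for k in range(2 * p, N + 1, p):
                is_prime[k] = False
            for k in range(p, N + 1, p):
                mu[k] = -mu[k]          # one more prime factor
            for k in range(p * p, N + 1, p * p):
                mu[k] = 0               # p^2 | k, so mu(k) = 0
    return mu

x = 10**6
mu = mobius_sieve(isqrt(x))
# Number of squarefree n <= x: sum over d <= sqrt(x) of mu(d) * floor(x / d^2).
count = sum(mu[d] * (x // (d * d)) for d in range(1, isqrt(x) + 1))
print(count, 6 * x / pi**2)             # the two agree to within O(sqrt(x))
```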
3 Dirichlet series

For an arithmetic function $f$, consider the series
$$L(s, f) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}.$$
Here you can take "$s$" to be a complex variable, but that may be too scary at the moment, so let's take it to be a real variable. An issue is whether the series converges. This will depend on the choice of $s$ and the choice of the arithmetic function $f$. For example, if $f(n) = n!$ or $f(n) = n^n$, it never converges. For $f(n) = 1$, that is, $f = u$, it converges for $s > 1$ and diverges for $s \le 1$. In general, there will be some number $s_0$ such that the series for $L(s, f)$ converges for $s > s_0$, diverges for $s < s_0$, and it's a gray area for $s = s_0$. The number $s_0$ is known as the "abscissa of convergence". So for $f = u$, it is 1, and for $f(n) = (-1)^n$, it is 0.

Can you show the following? If $L(s_1, f)$ converges and if $s > s_1$, then $L(s, f)$ converges. Hint: use partial summation.

What do you get when you multiply $L(s, f)$ and $L(s, g)$? Let's work it out:
$$L(s, f)\, L(s, g) = \sum_{m=1}^{\infty} \frac{f(m)}{m^s} \sum_{n=1}^{\infty} \frac{g(n)}{n^s} = \sum_{m,\, n \ge 1} \frac{f(m) g(n)}{(mn)^s}.$$
Note that the denominator here is a positive integer raised to the power $s$. But the positive integers appearing can appear in multiple ways. For example, the denominator $6^s$ appears when $m = 1$, $n = 6$, again when $m = 2$, $n = 3$, when $m = 3$, $n = 2$, and when $m = 6$, $n = 1$. Thus, generalizing, we have
$$L(s, f)\, L(s, g) = \sum_{k=1}^{\infty} \frac{1}{k^s} \sum_{mn=k} f(m) g(n) = \sum_{k=1}^{\infty} \frac{(f * g)(k)}{k^s} = L(s, f * g).$$
Hence multiplying Dirichlet series introduces the Dirichlet convolution. How tidy! If the series $L(s, f)$ and $L(s, g)$ both converge absolutely, then the above argument, which involves rearranging terms, shows that $L(s, f * g)$ also converges absolutely.

Can you prove that the abscissa of convergence for the series $L(s, \tau)$ is 1? Let $\zeta(s) = L(s, u)$. What is the Dirichlet series for $1/\zeta(s)$?

4 The functions log and Λ

The function $\log$ is of course not multiplicative. In fact it is (totally) additive, meaning that $\log(mn) = \log m + \log n$ for all positive integers $m, n$ (in fact, for all positive reals). But the Möbius inversion formula does not require that a function be multiplicative. Define $\Lambda = \mu * \log$, so that $\Lambda$ is an arithmetic function that satisfies $\Lambda * u = \log$. Known as the von Mangoldt function, $\Lambda$ is very important to us. Let's try to figure out what it is.

One way is to figure out a function $f$ which satisfies $f * u = \log$; this must be $\Lambda$ by the Möbius inversion formula. That is, we're looking for a function $f$ which satisfies
$$\sum_{d \mid n} f(d) = \log n \tag{2}$$
for all positive integers $n$. Say $n$ has the prime factorization $p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}$. Then
$$\log n = \log\left(p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}\right) = a_1 \log p_1 + a_2 \log p_2 + \cdots + a_k \log p_k.$$
Now view an expression $a \log p$ as $\log p$ added to itself $a$ times. Or, if $p^a$ is one of the $p_i^{a_i}$, we can view $a \log p$ as $\log p$ added to itself once for each $j \ge 1$ with $p^j \mid n$. That is,
$$\log(p^a) = a \log p = \sum_{p^j \mid p^a} \log p,$$
and so
$$\log n = \sum_{p^j \mid n} \log p.$$
Slowly the function $f$ that satisfies (2) is appearing out of the mist. We have $f(m) = 0$ unless $m$ is of the form $p^j$ with $p$ prime and $j$ a positive integer, and then $f(p^j) = \log p$. That's it: we have $f$, and this $f$ is $\Lambda$, since a function $f$ satisfying (2) is unique. So, again,
$$\Lambda(m) = \begin{cases} \log p, & m \text{ is a power of a prime } p, \\ 0, & m \text{ is not a power of any prime.} \end{cases}$$
And our key identity:
$$\log n = \sum_{d \mid n} \Lambda(d). \tag{3}$$
An important way to think about $\Lambda$ is that it is almost the same as the function that takes a prime $p$ to $\log p$ and is zero at 1 and at composites. The difference is that $\Lambda$ is also nonzero at higher powers of primes. Thinking statistically, squares and higher powers are rare, being sparsely distributed within the natural numbers, so the word "almost" almost makes sense!
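It is worth checking identity (3) mechanically. Here is a small Python sketch (mine, not from the notes; the name vonmangoldt is hypothetical) that computes $\Lambda$ straight from the description just derived and verifies that $\sum_{d \mid n} \Lambda(d) = \log n$:

```python
from math import log, isclose

def vonmangoldt(n):
    # Lambda(n) = log p if n = p^j for a prime p and j >= 1, else 0.
    for p in range(2, n + 1):
        if n % p == 0:                # p is the least prime factor of n
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
    return 0.0                        # n = 1

# The key identity (3): log n = sum of Lambda(d) over d | n.
for n in range(1, 200):
    total = sum(vonmangoldt(d) for d in range(1, n + 1) if n % d == 0)
    assert isclose(total, log(n), abs_tol=1e-9)
print("identity (3) holds for n < 200")
```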
5 Back to the prime harmonic sum

When last we were discussing the prime harmonic sum (week 1), we had managed to show that
$$\sum_{p \le x} \frac{1}{p} = \log \log x + O(1).$$
In this section we shall show that the von Mangoldt function helps us to prove a sharper version of this result.

Theorem 1. There is a constant $c$ such that for all $x \ge 2$,
$$\sum_{p \le x} \frac{1}{p} = \log \log x + c + O\!\left(\frac{1}{\log x}\right).$$

Proof. For a positive integer $N$, we estimate the sum
$$S(N) := \sum_{n \le N} \log n$$
by two different methods and then equate the estimates. The first method uses calculus. Since the function $\log t$ is increasing, we have the double inequality
$$\int_{n-1}^{n} \log t \, dt < \log n < \int_{n}^{n+1} \log t \, dt.$$
Note that the first inequality holds for $n$ starting at 2, but the second one holds for all positive integers $n$. Noting that $\log 1 = 0$,
$$\int_{1}^{N} \log t \, dt < \sum_{n \le N} \log n < \int_{1}^{N+1} \log t \, dt.$$
The antiderivative of $\log t$ (use integration by parts) is $t \log t - t$, and so evaluating the two integrals, we arrive at
$$S(N) = \sum_{n \le N} \log n = N \log N - N + O(\log N). \tag{4}$$
For our second way of evaluating the sum, we use (3). We have
$$\sum_{n \le N} \log n = \sum_{n \le N} \sum_{d \mid n} \Lambda(d) = \sum_{d \le N} \Lambda(d) \left\lfloor \frac{N}{d} \right\rfloor = N \sum_{d \le N} \frac{\Lambda(d)}{d} + O\!\left( \sum_{d \le N} \Lambda(d) \right). \tag{5}$$
From the definition of $\Lambda$, we have
$$\sum_{d \le N} \Lambda(d) = \sum_{p \le N} \log p + \sum_{p \le N^{1/2}} \sum_{\substack{j \ge 2 \\ p^j \le N}} \log p.$$
We use the Erdős–Chebyshev estimate (week 1) for the first sum here, getting
$$\sum_{p \le N} \log p \le N \log 4 = O(N).$$
For the double sum above, we have
$$\sum_{p \le N^{1/2}} \sum_{\substack{j \ge 2 \\ p^j \le N}} \log p \le \sum_{p \le N^{1/2}} \sum_{\substack{j \ge 2 \\ p^j \le N}} \log N.$$
The number of terms in the inner sum is $O(\log N)$, since the inequality $p^j \le N$ means that $j \le \log N / \log p$. Also the number of terms in the outer sum is at most $N^{1/2}$, so the double sum is $O(N^{1/2} (\log N)^2)$. This shows that
$$\sum_{d \le N} \Lambda(d) = O(N),$$
and so from (5), we have
$$S(N) = \sum_{n \le N} \log n = N \sum_{d \le N} \frac{\Lambda(d)}{d} + O(N). \tag{6}$$
The equations (4) and (6) are our two estimates for $S(N)$. Equating them gives us that
$$\sum_{d \le N} \frac{\Lambda(d)}{d} = \log N + O(1). \tag{7}$$
Again using the definition of $\Lambda$, we have
$$\sum_{d \le N} \frac{\Lambda(d)}{d} = \sum_{p \le N} \frac{\log p}{p} + \sum_{p \le N^{1/2}} \sum_{\substack{j \ge 2 \\ p^j \le N}} \frac{\log p}{p^j}.$$
Using the sum of a geometric progression, we can extend the inner sum in the second term to all $j \ge 2$, getting
$$\sum_{p \le N^{1/2}} \sum_{\substack{j \ge 2 \\ p^j \le N}} \frac{\log p}{p^j} < \sum_{p \le N^{1/2}} \frac{\log p}{p^2 - p} = O(1),$$
since we can extend this last sum over all primes $p$, or even all integers $n \ge 2$, seeing that this infinite sum is convergent. Thus,
$$\sum_{d \le N} \frac{\Lambda(d)}{d} = \sum_{p \le N} \frac{\log p}{p} + O(1).$$
With (7), this gives us
$$\sum_{p \le N} \frac{\log p}{p} = \log N + R(N), \quad \text{where } R(N) = O(1). \tag{8}$$
We are interested in the sum of $1/p$, so guess what comes next. That's right, partial summation. We have
$$\sum_{p \le N} \frac{1}{p} = \sum_{p \le N} \frac{\log p}{p} \cdot \frac{1}{\log p} = \frac{1}{\log N} \sum_{p \le N} \frac{\log p}{p} + \int_{2}^{N} \left( \sum_{p \le t} \frac{\log p}{p} \right) \frac{1}{t \log^2 t} \, dt$$
$$= \frac{1}{\log N} \left( \log N + R(N) \right) + \int_{2}^{N} \left( \log t + R(t) \right) \frac{1}{t \log^2 t} \, dt.$$
The first term here is $1 + O(1/\log N)$. The second term falls into two parts:
$$\int_{2}^{N} \frac{1}{t \log t} \, dt + \int_{2}^{N} \frac{R(t)}{t \log^2 t} \, dt.$$
The first part is exactly $\log \log N - \log \log 2$, since the antiderivative of $1/(t \log t)$ is $\log \log t$. The second part can be written as
$$\int_{2}^{\infty} \frac{R(t)}{t \log^2 t} \, dt - \int_{N}^{\infty} \frac{R(t)}{t \log^2 t} \, dt.$$
The first integral converges to a constant $c_0$, since $R(t) = O(1)$ and the antiderivative of $1/(t \log^2 t)$ is $-1/\log t$. For the same reason, the second integral is $O(1/\log N)$. Putting all of this together, we have
$$\sum_{p \le N} \frac{1}{p} = \log \log N + c + O\!\left(\frac{1}{\log N}\right), \quad \text{where } c = 1 - \log \log 2 + c_0.$$
Changing $N$ to $x$ is insignificant; in fact, if $N = \lfloor x \rfloor$, then $|\log \log x - \log \log N| = O(1/x)$, so we have the theorem.

Doing better than Theorem 1: We could say more about the constant $c$, and we could try to get a better error term. Also, in the proof we saw the sums
$$\sum_{n \le N} \Lambda(n), \qquad \sum_{n \le N} \frac{\Lambda(n)}{n}$$
appearing. We estimated the first as $O(N)$ and the second as $\log N + O(1)$. In both cases, we can do better, but it isn't easy!
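Theorem 1 is easy to watch numerically. The sketch below (mine, not from the notes) computes $\sum_{p \le x} 1/p - \log \log x$ for a few values of $x$; the differences settle down near $0.2615$, consistent with the known value of the constant that the next section identifies.

```python
from math import log

def primes_upto(N):
    # Sieve of Eratosthenes.
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(range(p * p, N + 1, p))
    return [p for p in range(2, N + 1) if sieve[p]]

for x in (10**3, 10**5, 10**7):
    s = sum(1 / p for p in primes_upto(x))
    print(x, s - log(log(x)))   # drifts toward c = 0.2614..., with error O(1/log x)
```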
We'll next take up the problem of identifying the constant $c$ in Theorem 1, since it is a pretty result, and also historically correct: this was done by Mertens a few decades before further progress was made on the other questions.

6 Selberg's identity

Among many other accomplishments, Atle Selberg is famous for an "elementary" proof (meaning, not using complex analysis) of the prime number theorem. Paul Erdős found a similar elementary proof around the same time, but it was Selberg who won the Fields Medal for it. It is somewhat tragic that because of some misunderstandings, they were not able to collaborate. Here we shall see a starting point for the Selberg–Erdős approach, known as Selberg's identity.

Theorem 2. We have
$$\Lambda \cdot \log + \Lambda * \Lambda = \mu * \log^2.$$
That is, for each $n$,
$$\Lambda(n) \log n + \sum_{ab=n} \Lambda(a) \Lambda(b) = \sum_{ab=n} \mu(a) \log^2 b.$$

There is a slick proof of this identity using the idea of a "derivative" $f'$ of an arithmetic function $f$. This is not so mysterious: $f'$ is defined as $f \cdot \log$. That is, $f'(n) = f(n) \log n$. So why is this simple-minded concept called the derivative? It's because it is a linear operator, like the familiar derivative from calculus, and it satisfies the product rule, where now "product" is convolution. That is, it is easy to check that for any two arithmetic functions $f, g$ and constant $c$, we have
$$(cf)' = cf', \qquad (f+g)' = f' + g', \qquad (f * g)' = f' * g + f * g'.$$
The least trivial of these three formulas is the last, and all the last uses is that $\log(ab) = \log a + \log b$. Let's check it:
$$(f * g)'(n) = (f * g)(n) \log n = \sum_{ab=n} f(a) g(b) \log(ab) = \sum_{ab=n} f(a) g(b) \log a + \sum_{ab=n} f(a) g(b) \log b = (f' * g)(n) + (f * g')(n).$$
A loose end from before: we have a distributive law for convolution and addition. Namely, if $f, g, h$ are arithmetic functions, then $f * (g + h) = f * g + f * h$, where we assume the usual convention on the right side, that addition is done after the convolutions; that is, the right side is $(f * g) + (f * h)$. The formula just involves following your nose through the definition of convolution.

To prove Theorem 2, we start with the identity
$$u' = \log = \Lambda * u,$$
where the first equality comes from the definition of the derivative of an arithmetic function, and the second we established earlier this week (see (3)). Next, we take the derivative of both sides, using the product rule on the right side. We have
$$u'' = (\Lambda * u)' = \Lambda' * u + \Lambda * u' = \Lambda' * u + \Lambda * \log = \Lambda' * u + \Lambda * \Lambda * u.$$
The left side here is $\log^2$, so all we need do now is take the convolution of both sides with $\mu$, noticing that doing so kills the "$* u$" in the two terms on the right side, giving us
$$\log^2 * \, \mu = \Lambda' + \Lambda * \Lambda = \Lambda \cdot \log + \Lambda * \Lambda.$$
This completes the proof.
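Selberg's identity can be confirmed numerically as well. A final Python sketch (mine, not part of the notes; the helper names are hypothetical) checks $\Lambda(n) \log n + (\Lambda * \Lambda)(n) = (\mu * \log^2)(n)$ for small $n$:

```python
from math import log, isclose

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    # mu(n) by trial division: 0 if n has a square factor,
    # else (-1)^(number of distinct prime factors).
    val, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            val = -val
        p += 1
    return -val if n > 1 else val

def Lam(n):
    # The von Mangoldt function, as in Section 4.
    for p in range(2, n + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
    return 0.0

for n in range(1, 300):
    lhs = Lam(n) * log(n) + sum(Lam(a) * Lam(n // a) for a in divisors(n))
    rhs = sum(mobius(a) * log(n // a) ** 2 for a in divisors(n))
    assert isclose(lhs, rhs, abs_tol=1e-9)
print("Selberg's identity verified for n < 300")
```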