The Distribution of Prime Numbers

advertisement
The Distribution of Prime Numbers
Angel Kumchev
Spring 2005
Preface
These notes cover the material for the graduate course with the same title that I taught at the University of Texas at Austin during the Spring 2005 semester. They draw heavily on The Distribution of
Prime Numbers by M. Huxley and Multiplicative Number Theory by H. Davenport (as revised by
H. L. Montgomery). I also acknowledge the use of notes by Jeff Vaaler and numerous discussions
with him that helped improve the exposition.
i
Notation
For functions f and g with g ≥ 0, we write f (x) = O(g(x)) or f (x) g(x) when there is a constant
c such that | f (x)| ≤ cg(x); when f and g are both non-negative, we may also write f g instead of
g f . We write f (x) ∼ g(x) when lim f (x)/g(x) = 1 as x tends to some limit to be specified at each
occurrence. We use c, c0 , c1 , c2 , . . . to denote implicit constants; these and the constants implied by
O- and -symbols are presumed absolute, unless stated otherwise. Throughout these notes (and
much of number theory outside them) the letter p, with or without subscripts or superscripts, is
reserved for prime numbers. We also use s = σ + it to denote a complex variable.
For the most part, we use the standard notations for common number-theoretic functions. These
are usually defined at their first appearance, but for convenience we also list them here:
kθk
[θ]
{θ}
e(z)
Log z
d(n)
φ(n)
µ(n)
Λ(n)
π(x)
π(x; q, a)
θ(x)
ψ(x)
ψ(x; q, a)
ψ(x, χ)
τ(χ, a)
N(σ, T )
the distance from the real number θ to the nearest integer;
the integral part of the real number θ;
the fractional part of the real number θ;
e2πiz ;
the principal branch of the complex logarithm (Log x = ln x when x > 0);
the number of positive divisors of n;
Euler’s totient function: the number of reduced residue classes modulo n;
the Möbius function (see (1.1));
von Mangoldt’s function (see (1.3));
the number of primes p ≤ x;
the number of primes p ≤ x, with p ≡ a (mod q);
Chebyshev’s function (see (1.4));
the sum of the values of Λ(n) over n ≤ x;
the sum of the values of Λ(n) over n ≤ x, with n ≡ a (mod q);
the sum of the values of Λ(n)χ(n) over n ≤ x (see (3.50));
the Gaussian sum (see (3.7));
the number of zeros ρ = β + iγ of ζ(s) with σ ≤ β ≤ 1 and |γ| ≤ T ;
ii
Contents
Preface
i
Notation
ii
0 Historical background
0.1 Early history . . . . . . . . . . . . . . . . . . . . . . .
0.2 The Riemann ζ-function and the prime number theorem
0.3 Primes in arithmetic progressions . . . . . . . . . . . .
0.4 Primes in short intervals . . . . . . . . . . . . . . . .
.
.
.
.
1
1
2
4
7
.
.
.
.
9
9
10
14
18
.
.
.
.
22
22
28
33
35
.
.
.
.
.
39
39
45
47
53
56
1 Introduction: basic estimates
1.1 Multiplicative functions .
1.2 Partial summation . . . .
1.3 Dirichlet series . . . . .
1.4 Divisor functions . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
2 The prime number theorem
2.1 Definition of ζ(s). The functional equation
2.2 The zeros of ζ(s) . . . . . . . . . . . . .
2.3 The zerofree region . . . . . . . . . . . .
2.4 Proof of the prime number theorem . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3 Prime numbers in arithmetic progressions
3.1 Characters . . . . . . . . . . . . . . . . . . . . . . . .
3.2 Dirichlet L-functions . . . . . . . . . . . . . . . . . .
3.3 The zeros of L(s, χ) . . . . . . . . . . . . . . . . . . .
3.4 The exceptional zero . . . . . . . . . . . . . . . . . .
3.5 The prime number theorem for arithmetic progressions
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4 The large sieve
63
4.1 Two results from analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2 Large-sieve inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3 Dirichlet polynomials with characters: a hybrid sieve . . . . . . . . . . . . . . . . 69
iii
5 Applications of the large sieve
5.1 Sums over primes and double sums . . . . . .
5.2 The Bombieri–Vinogradov theorem . . . . .
5.3 The Barban–Davenport–Halberstam theorem
5.4 The three primes theorem . . . . . . . . . . .
5.5 Primes in short intervals . . . . . . . . . . .
5.6 Primes in almost all short intervals . . . . . .
5.7 The linear sieve . . . . . . . . . . . . . . . .
5.8 Almost primes in short intervals . . . . . . .
iv
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
75
75
77
81
82
89
98
100
106
Chapter 0
Historical background
0.1 Early history
The first result on the distribution of primes is Euclid’s theorem (circa 300 B.C.) on the infinitude
of the primes. In 1737 Euler went a step further and proved that, in fact, the series of the reciprocals
of the primes diverges. In the opposite direction, Euler observed that the rate of divergence of this
series is much slower than the rate of divergence of the harmonic series:
“The sum of the series of the reciprocals of the prime numbers,
1 1 1 1
1
1
+ + + +
+
+ ··· ,
2 3 5 7 11 13
is infinitely large, but it is infinitely many times less than the sum of the harmonic
series,
1 1 1 1
1+ + + + +··· .
2 3 4 5
Furthemore, the sum of the former series is like the logarithm of the sum of the latter
series.”
This statement appears to be the earliest attempt to quantify the frequency of the primes among the
positive integers.
Consider the prime counting function
X
π(x) =
1.
p≤x
In 1798 Legendre conjectured that π(x) satisfies the asymptotic relation
lim
x→∞
π(x)
= 1;
x/(log x)
(0.1)
this is the prime number theorem (PNT). Years later, Gauss wrote that back in his adolescent years
he had observed that the logarithmic integral
Z x
dt
Li x =
2 log t
1
seemed to provide a very good approximation to π(x). This, of course, is consistent with (0.1), as
can be seen from the formula
1!x
k!x
x
x
+
+ ···+
+O
.
(0.2)
Li x =
log x (log x)2
(log x)k+1
(log x)k+2
The first theoretical evidence in support of the PNT was given by Chebyshev in the 1850s. He
proved that:
• (0.1) predicts correctly the order of magnitude of π(x), that is, there exist absolute constants
c2 > c1 > 0 such that
c2 x
c1 x
≤ π(x) ≤
.
(0.3)
log x
log x
Chebyshev showed that for sufficiently large x one may take c1 = 0.9212 and c2 = 1.1056.
In his honor, bounds for π(x) of this type are now known as Chebyshev’s estimates.
• If the limit on the left side of (0.1) exists, then it must be equal to 1.
Chebyshev used the methods that he developed for the proof of (0.3) to establish Bertrand’s postulate: the interval (n, 2n] contains a prime number for all integers n ≥ 1. Furthermore, in 1874
Mertens used Chebyshev’s estimates (0.3) to show that
X1
= log log x + B + O (log x)−1 ,
p
p≤x
(0.4)
B being an absolute constant. This provided the first rigorous proof of Euler’s observation that “the
sum of the [series of the reciprocals of the primes] is like the logarithm of the sum of the [harmonic
series].” We sketch the proofs of (0.3), (0.4), and some related results in §1.2.
0.2 The Riemann ζ-function and the prime number theorem
The Riemann zeta-function is defined in the half-plane Re(s) > 1 as
ζ(s) =
∞
X
n=1
n−s =
Y
p
1 − p−s
−1
.
(0.5)
The identity between the infinite series and the infinite product on the right (which runs over all
primes) is an analytic expression of the fundamental theorem of arithmetic and was discovered by
Euler in 1737 (in the same paper as his proof of the infinitude of the primes), at least in the case
when s is real. The first to consider ζ(s) as a function of a complex variable was Riemann. In
1859 he published his seminal paper [47] (his only paper on number theory), in which he observed
that ζ(s) is holomorphic in the half-plane Re(s) > 1 and that it can be continued analytically to a
meromorphic function, whose only singularity is a simple pole at s = 1. Riemann was interested
in ζ(s), because Euler’s identity (0.5) provides a connection between the analytic properties of ζ(s)
2
and the PNT. It is not difficult to deduce from (0.5) that ζ(s) does not vanish in the half-plane
Re(s) > 1. Riemann proved that ζ(s) satisfies the functional equation
s
1−s
−s/2
(s−1)/2
π Γ
ζ(s) = π
Γ
ζ(1 − s),
2
2
from which it is easy to deduce that the only zeros of ζ(s) in the half-plane Re(s) < 0 are the
negative even integers; these are the trivial zeros of ζ(s). Besides the trivial zeros, ζ(s) has infinitely
many zeros in the strip 0 ≤ Re(s) ≤ 1: the non-trivial zeros of ζ(s). Riemann proposed several
conjectures about the non-trivial zeros of ζ(s):
C1. If
then
N(T ) = # ρ ∈ C : ζ(ρ) = 0, 0 ≤ Re(ρ) ≤ 1, 0 < Im(ρ) ≤ T ,
T
N(T ) =
log
2π
C2. The entire function
ξ(s) =
has a product representation
T
2πe
+ O(log T ).
s
1
s(s − 1)π−s/2 Γ
ζ(s)
2
2
ξ(s) = e
A+Bs
Y
ρ
s
1−
ρ
e−s/ρ ,
the product being over all non-trivial zeros of ζ(s).
C3. If x > 1, there is an explicit formula that represents π(x) as a series over the non-trivial zeros
of ζ(s).
C4. Riemann Hypothesis (RH). All zeros of ζ(s) with 0 ≤ Re(s) ≤ 1 lie on the line Re(s) = 12 .
By the end of the 19th century, conjectures C1–C3 were proved: C1 and C3 were established
by von Mangoldt, and C2 is a consequence of the general theory of entire functions of finite
order developed by Hadamard. In particular, the Riemann–Mangoldt explicit formula for π(x)
demonstrated that the PNT follows from the nonvanishing of ζ(s) on the line Re(s) = 1. Thus,
when in 1896 Hadamard and de la Vallée Poussin proved (independently) that ζ(1 + it) , 0 for all
real t, the PNT was finally proved. In contrast, the Riemann Hypothesis is still an open problem
that has been selected by the Clay Mathematics Institute as one of the seven Millennium Problems.
We remark that under RH, the Riemann–Mangoldt formula implies the asymptotic formula
π(x) = Li x + O x1/2 log x ,
(0.6)
which is essentially best possible.
3
The last observation has motivated the investigations of the error term in the PNT. In 1899 de
la Vallée Poussin refined the original proof that ζ(1 + it) , 0 and showed that, in fact, ζ(σ + it)
does not vanish in the region
c
,
(0.7)
σ≥1−
log(|t| + 10)
for some absolute constant c > 0. This suffices to establish the following quantitative version of
the PNT, which will be the main subject of Chapter 2 of these notes.
Theorem 1. There exists an absolute constant c > 0 such that
√
π(x) = Li x + O x exp − c log x .
Further improvements on the error term in the PNT have been quite limited. In 1922 Littlewood
proved that
√
π(x) − Li x x exp − c log x log log x ,
(0.8)
while the best result to date was obtained by Korobov [38] and I. M. Vinogradov [58] in 1958:
π(x) − Li x x exp − c(log x)3/5 (log log x)−1/5 .
(0.9)
Both (0.8) and (0.9) are consequences of repsective improvements on the estimate of the zerofree
region (0.7). Unfortunately, it is known that the approach employed in these works can never yield
a bound of the form π(x) − Li x xθ , with a fixed θ < 1.
0.3 Primes in arithmetic progressions
In a couple of memoirs published in 1837 and 1840, Dirichlet proved that if a and q are natural
numbers with (a, q) = 1, then the arithmetic progression a, a + q, a + 2q, . . . contains infinitely
many primes. By refining Dirichlet’s argument, Mertens established the asymptotic formula
X
p≤x
p≡a (mod q)
1 X1
1
∼
p φ(q) p≤x p
as x → ∞,
(0.10)
where φ(q) is Euler’s totient function. Fix q and consider the various reduced residue classes
modulo q. Since all but finitely many primes lie in residue classes a mod q with (a, q) = 1, (0.10)
suggests that the primes are uniformly distributed among the reduced residue classes to a given
modulus q. Thus, one may expect that if (a, q) = 1, then
π(x; q, a) =
X
p≤x
p≡a (mod q)
1∼
Li x
φ(q)
as x → ∞.
(0.11)
This is the prime number theorem for arithmetic progressions. We may approach this statement
in two different ways. First, we may fix a and q and ask whether (0.11) holds (allowing the
4
convergence to depend on q and a). Posed in this form, the problem is a minor generalization of
the PNT. In fact, shortly after proving Theorem 1, de la Vallée Poussin showed that
π(x; q, a) =
√
Li x
+ O x exp − c log x ,
φ(q)
where c = c(q, a) > 0 and the O-implied constant depends on q and a. The problem becomes much
more difficult if we want an estimate that is explicit in q and uniform in a. The first result of this
kind was obtained by Page [43], who proved that there exists a (small) positive number δ such that
π(x; q, a) =
Li x
+ O x exp − (log x)δ ,
φ(q)
whenever 1 ≤ q ≤ (log x)2−δ and (a, q) = 1. In 1935 Siegel [49] proved the following result, which
we will establish in Chapter 3.
Theorem 2. For any fixed A > 0, there exists a constant c = c(A) > 0 such that
π(x; q, a) =
√
Li x
+ O x exp − c log x ,
φ(q)
whenever 1 ≤ q ≤ (log x)A and (a, q) = 1.
Remark. While this result is clearly sharper than Page’s, it does have one significant drawback: it
is ineffective, that is, given a particular value of A, the proof does not allow the constant c(A) or
the O-implied constant to be computed.
The proofs of the above results rely on the analytic properties of a class of generalizations of
the Riemann zeta-function known as Dirichlet L-functions. For each positive integer q there are
φ(q) functions χ : Z → C called Dirichlet characters modulo q (we will define these in Chapter 3).
Given a character χ modulo q, we define the respective Dirichlet L-function by
L(s, χ) =
∞
X
χ(n)n−s
(Re(s) > 1).
n=1
Similarly to ζ(s), L(s, χ) is holomorphic in the half-plane Re(s) > 1 and can be continued analytically to a meromorphic function that has at most one pole, which (if present) must be a simple
pole at s = 1. Furthermore, just as ζ(s), the continued L(s, χ) has infinitely many zeros in the strip
0 ≤ Re(s) ≤ 1, and the horizontal distribution of those zeros has important implications on the
distribution of primes in arithmetic progressions. For example, the results of de la Vallée Poussin,
Page, and Siegel mentioned above were proved by showing that no L-function has zeros “close” to
the line Re(s) = 1. We also have the following generalization of the Riemann Hypothesis:
Generalized Riemann Hypothesis (GRH). Let L(s, χ) be a Dirichlet L-function.
Then all zeros of L(s, χ) with 0 ≤ Re(s) ≤ 1 lie on the line Re(s) = 21 .
5
Assuming GRH, we can deduce easily that
π(x; q, a) =
Li x
+ O x1/2 log x ,
φ(q)
(0.12)
whenever (a, q) = 1. This estimate establishes (0.11) for 1 ≤ q ≤ xθ , θ < 21 .
In many applications one only needs approximations like (0.12) in some average sense over
the moduli q. During the 1950s and 1960s several authors obtained such results. In particular, the
following quantity was studied extensively:
X
Li
y
.
E(x, Q) =
max max π(y; q, a) −
(a,q)=1 y≤x
φ(q)
q≤Q
The trivial bound for this sum is E(x, Q) x, whereas (0.12) implies
E(x, Q) Qx1/2 log x.
(0.13)
In 1965 Bombieri [5] and A. I. Vinogradov [55] proved (independently) the following result.
Theorem 3. For any fixed A > 0, there exists a constant B = B(A) > 0 such that
E(x, Q) x(log x)−A ,
whenever Q ≤ x1/2 (log x)−B .
We observe that this result provides a nontrivial estimate for E(x, Q) under essentially the
same restrictions on Q as GRH. Because of this fact, the Bombieri–Vinogradov theorem has seen
numerous applications in which it has been used as a de facto replacement for GRH. In Chapter 5
we will give a proof of Theorem 3 with B = A + 4.
It should be noted that unlike the error term in (0.6), the error term in (0.12) is not necessarily
best possible. In fact, there is some evidence in support of the bold conjecture that
π(x; q, a) =
Li x
+ O (x/q)1/2+
φ(q)
for any fixed > 0. In Chapter 5 we will establish the so-called Barban–Davenport–Halberstam
theorem, which asserts that this bound holds in the mean-square over all arithmetic progressions
with differences q ≤ x1− . We should also mention that during the mid 1980s Bombieri, Friedlander, and Iwaniec [6, 7, 8] obtained several variants of the Bombieri–Vinogradov theorem, in which
the value of Q can exceed x1/2 . However, since their methods go beyond the reach of these notes,
we will only state one of their results (see [7]).
Theorem 4. Let a , 0 and x ≥ y ≥ 3. Then
2
X Li
x
log
y
π(x; q, a) −
(Li x)
(log log x)c .
φ(q)
log
x
√
q≤ xy
(q,a)=1
Here c is an absolute constant and the -implied constant depends only on a.
6
0.4 Primes in short intervals
It is an old problem in the theory of prime numbers to prove that for any integer n ≥ 1, the interval
(n2 , (n + 1)2 ] contains a prime number. This problem leads quickly to the more general question of
estimating the differences between consecutive primes. Cramér was the first to study this question
systematically. Let pn denote the nth prime number. In 1920 Cramér [12] proved that under RH
pn+1 − pn p1/2
n log pn .
Cramér also proposed a probabilistic model of the prime numbers that leads to very precise (and
very bold) predictions of the asymptotic properties of the primes. In particular, he conjectured [13]
that
pn+1 − pn
= 1.
(0.14)
lim sup
(log pn )2
n→∞
Nontrivial upper bounds for pn+1 − pn can be derived from the quantitative versions of the PNT
stated above, but the ensuing results are rather poor, because the known bounds for the error term
in the PNT are barely smaller than the main term. However, in 1930 Hoheisel [26] obtained a
much sharper result. He proved (unconditionaly) the asymptotic formula
π(x + xθ ) − π(x) ∼ xθ (log x)−1
as x → ∞,
(0.15)
with θ = 1 − (33000)−1. Subsequently several authors made further contributions that produced the
following improvements on Hoheisel’s result:
Heilbronn [25] (1933)
Chudakov [11] (1936)
Ingham [30] (1937)
Montgomery [41] (1971)
Huxley [27] (1972)
Heath-Brown [23] (1988)
θ = 0.996
θ > 3/4 = 0.750
θ > 5/8 = 0.625
θ > 3/5 = 0.600
θ > 7/12 = 0.583 . . .
θ = 7/12 = 0.583 . . .
We will see the proof of Huxley’s result in Chapter 5 of these notes. Furthermore, since the
late 1970s, several mathematicians have shown that even shorter intervals must contain primes
(without establishing an asymptotic formula for the number of primes in such intervals). Such
results usually take the form
π(x + xθ ) − π(x) xθ (log x)−1
for x ≥ x0 (θ).
The following list displays the progress in that direction over the last 30 years:
Iwaniec and Jutila [34] (1979)
Heath-Brown and Iwaniec [24] (1979)
Iwaniec and Pintz [35] (1984)
Lou and Yao [40] (1992)
Baker and Harman [1] (1996)
Baker, Harman, and Pintz [2] (2001)
7
θ = 13/23 = 0.565 . . .
θ > 11/20 = 0.550
θ = 23/42 = 0.547 . . .
θ = 6/11 = 0.545 . . .
θ = 0.535
θ = 0.525
(0.16)
Selberg [48] considered the distribution of primes in “almost all short intervals.” Let h(x) be an
increasing function of x. We say that almost all intervals (x, x + h(x)] contain primes if the measure
of the set of x ∈ (1, X] for which the interval (x, x + h(x)] contains no prime is o(X). Selberg proved
that if h(x) grows faster than (log x)2 as x → ∞, the Riemann Hypothesis implies that almost all
intervals (x, x+h(x)] contain a prime number (and also that the asymptotic formula (0.15) holds for
each inexceptional interval). Further, Selberg showed unconditionally that if θ > 1/4, then almost
all intervals (x, x+ xθ ] contain a prime number. The latter result has been the subject of a long series
of successive improvements, similar to the improvements on Hoheisel’s result described above. In
particular, the best result to date obtained in 1996 by Jia [36] extends the range for θ in Selberg’s
result to θ > 1/20.
In the opposite direction, Erdös [16] showed in 1935 that
pn+1 − pn ≥ c log pn log log pn (log log log pn )−2
(0.17)
infinitely often. In 1938 Rankin [46] showed that one can replace the right side of (0.17) by
(1/3 + o(1)) log pn log log pn log log log log pn (log log log pn )−2 ,
but subsequent attempts at further improvements have not been very successful: the best result
to date (see Pintz [44]) replaces the constant 1/3 in Rankin’s bound by 2eγ , where γ is Euler’s
constant. In fact, the problem appears to be so notoriously difficult that Erdös—who was known
for offerring monetary prizes for solutions of problems he was intrigued by—announced that he
would pay $10,000 to anyone who proved that the constant 1/3 in Rankin’s result can be taken
arbitrarily large!
8
Chapter 1
Introduction: basic estimates
The purpose of this chapter is to introduce some of the basic techniques and functions appearing
in the later chapters. The results are mostly elementary and the reader may be familiar with some
(and possibly all) of them.
1.1 Multiplicative functions
A function f : N → C is said to be multiplicative if it is not identically zero and
f (mn) = f (m) f (n)
whenever gcd(m, n) = 1.
If f satisfies the stronger condition that f (mn) = f (m) f (n) for all pairs m, n, it is said to be completely (or totally) multiplicative.
Some functions, such as f (n) = n s (s ∈ C), are obviously multiplicative. Others are defined so
that they are. One such function is the Möbius function


if n = 1,
1
r
µ(n) = (−1) if n is the product of r distinct primes,
(1.1)


0
if n is divisible by the square of a prime.
The following lemma provides an easy way to deduce the multiplicativity of a large class of arithmetic functions. We leave its proof as an exercise.
Lemma 1.1. Suppose that f and g are multiplicative functions. Then the arithmetic function f ∗ g,
defined by
X
( f ∗ g)(n) =
f (d)g(n/d),
d|n
is also multiplicative.
The next lemma contains the most important property of the Möbius function.
9
Lemma 1.2.
X
d|n
(
1 if n = 1,
µ(d) =
0 otherwise.
Proof. It suffices to consider the case when n is squarefree. Suppose that n = p1 p2 · · · pr , where
p1 , p2 , . . . , pr are distinct primes, and write m = p1 p2 · · · pr−1 . Then
X
X
X
X
X
µ(d) =
µ(d) +
µ(d pr ) =
µ(d) +
(−µ(d)) = 0,
d|n
d|m
d|m
d|m
d|m
where the second to last step uses the multiplicativity of µ.
Corollary 1.3. (Möbius inversion formula) Suppose that f : N → C is an arithmetic function and
define
X
F(n) =
f (d).
d|n
Then f = F ∗ µ, that is,
f (n) =
X
F(d)µ(n/d).
d|n
In particular, if F is multiplicative, so is f .
Proof. We have
X
XX
XX
X X
F(d)µ(n/d) =
f (k)µ(n/d) =
f (k)µ(n/d) =
f (k)µ(n/km).
d|n
d|n
k|d
k|n
d|n
k|d
k|n m|(n/k)
By Lemma 1.2, the sum over m vanishes unless k = n, so the first claim follows. The second claim
is a consequence of the first, Lemma 1.1, and the multiplicativity of µ.
1.2 Partial summation
We now discuss a simple trick that is put to a great use in analytic number theory.
Lemma 1.4 (Abel). Suppose that an are complex numbers and f (x) is continuously differentiable
on [α, β]. Then
Z β
X
an f (n) = A(β) f (β) −
A(x) f 0 (x) dx,
α
α<n≤β
where A(x) =
P
α<n≤x
an .
Proof. Using Stieltjes integration by parts, we have
Z β+
Z
X
β
an f (n) =
f (x) dA(x) = f (x)A(x) α −
α<n≤β
α+
and the desired result follows.
10
β
A(x) d f (x),
α
Corollary 1.5. There is a constant c1 such that
X1
n≤x
Corollary 1.6.
X
n≤x
n
= log x + c1 + O x−1 .
log n = x log x − x + O(log x).
Remark. The constant c1 is known as Euler’s constant and usually is denoted by γ:
X
1
γ = lim
− log x ≈ 0.5772 . . . .
x→∞
n
n≤x
(1.2)
Next, we define three arithmetic functions that play an important role in prime number theory.
These are von Mangoldt’s function
(
log p if n is a power of a prime p,
(1.3)
Λ(n) =
0
otherwise,
and the functions
θ(x) =
X
log p
and
ψ(x) =
p≤x
X
Λ(n),
(1.4)
n≤x
which were introduced first by Chebyshev. Our first result about these functions is a Chebyshevtype bound for ψ(x).
Theorem 1.7. Suppose that 0 < c2 < log 2 and c3 > log 4. Then for sufficiently large x,
c2 x ≤ ψ(x) ≤ c3 x.
Proof. Define
X
T (x) =
log n.
n≤x
Taking logarithms in the prime factorization of n, we see that
X
log n =
Λ(d),
d|n
so we can rewrite T (x) as
T (x) =
XX
n≤x
Λ(d) =
d|n
X
Λ(d)
Thus,
T (x) − 2T (x/2) =
1=
n≤x
d|n
d≤x
X
X
Λ(d)
d≤x
11
h x i
X
d≤x
Λ(d)
h xi
h x i
−2
.
d
2d
d
.
We now note that
0≤
and
h xi
d
−2
h xi
h xi
−2
=1
d
2d
Hence,
h xi
≤1
2d
for x/2 < d ≤ x.
ψ(x) − ψ(x/2) ≤ T (x) − 2T (x/2) ≤ ψ(x).
On the other hand, by Corollary 1.6,
T (x) − 2T (x/2) = x log 2 + O(log x).
We deduce that
ψ(x) ≥ x log 2 + O(log x)
and
ψ(x) ≤ ψ(x/2) + x log 2 + O(log x)
≤ ψ(x/4) + 1 + 21 x log 2 + O(log x)
≤ ψ(x/8) + 1 + 12 + 14 x log 2 + O(log x)
..
.
≤ ψ(x/2r ) + 1 + 12 + 14 + · · · x log 2 + O(r log x)
≤ x log 4 + O (log x)2 ,
on choosing r so that 2r ≤ x < 2r+1 .
Lemma 1.8.
X log p
p≤x
p
= log x + O(1).
Theorem 1.9 (Mertens). There is an absolute constant B such that
X1
= log log x + B + O (log x)−1 .
p
p≤x
Proof. Define the function
R(x) =
X log p
− log x
p
2<p≤x
and the sequence
an =
(
(log n)/n if n is a prime number,
0
otherwise.
12
(1.5)
Then, by Lemma 1.4,
X an
X1
1
=
+
p 2<n≤x log n 2
p≤x
Z x X
1 X log p
dy
1
log p
=
+
+
2
log x 2<n≤x p
p
y(log y)
2
2
2<n≤y
Z x
log y + R(y)
1
1
log x + R(x) +
dy +
=
2
log x
y(log y)
2
2
Z x
3
R(x)
R(y)
= log log x + − log log 2 +
dy +
.
2
2
log x
2 y(log y)
Using Lemma 1.8 to bound R(y), we obtain
Z x
Z ∞
R(x)
R(y)
R(y)
dy +
=
dy + O (log x)−1 ,
2
2
log x
2 y(log y)
2 y(log y)
and the desired conclusion follows with
Z
3
B = − log log 2 +
2
2
∞
R(y)
dy.
y(log y)2
The final result of this section quantifies the relation between the error term in the PNT and the
difference ψ(x) − x.
Theorem 1.10. Suppose that f is an integrable function such that x1/2 f (x) x and
Z x
f (t)
dt f (x) log x.
t
2
Then
ψ(x) − x f (x)
⇔
π(x) − Li x f (x)(log x)−1 .
Proof. Theorem 1.7 implies that
θ(x) = ψ(x) + O x1/2 ,
whence
θ(x) = x + O( f (x)).
Let an be the sequence
an =
(
log n if n is a prime number,
0
otherwise.
13
As in the proof of Theorem 1.9,
X an
π(x) =
+1
log n
2<n≤x
Z x
θ(y) dy
θ(x)
+
+ O(1)
=
2
log x
2 y(log y)
Z x
Z x
x
f (x)
f (y) dy
dy
=
+O
+
+O
2
2
log x
log x
2 y(log y)
2 (log y)
= Li x + O f (x)(log x)−1 ,
by (0.2) and the properties of f . This proves the direct implication, the converse is left as an
exercise.
1.3 Dirichlet series
A Dirichlet series is an infinite series of the form
∞
X
an n−s ,
(1.6)
n=1
where an are complex numbers and s = σ + it is a complex variable. The following lemma shows
that if a Dirichlet series converges at any finite complex number s0 = σ0 + it0 , then it converges to
a holomorphic function in the half-plane Re(s) > σ0 .
Lemma 1.11. Suppose that s0 = σ0 + it0 and the series
∞
X
an n−s0
n=1
converges. Then the Dirichlet series (1.6) converges uniformly on the compact subsets of the halfplane Re(s) > σ0 and the sum-function
f (s) =
∞
X
an n−s
n=1
is holomorphic in that half-plane.
Proof. It suffices to show that (1.6) converges uniformly in the regions
s ∈ C : Re(s) ≥ σ0 + δ, | Im(s) − t0 | ≤ T
where δ, T > 0. By Lemma 1.4 with an = an n−s0 and f (n) = n−(s−s0 ) ,
X
X
−s
−s0 an n δ,T max an n ,
α<n≤β
α<x≤β
so the uniform convergence of (1.6) follows from the convergence of
criterion.
14
(1.7)
α<n≤x
P
n
an n−s0 and Cauchy’s
The number
inf Re(s) : (1.6) converges
is called the abscissa of convergence of the Dirichlet series (1.6). Here, of course, we allow the
possibility
that the infimum could be ±∞. The abscissa of convergence of the Dirichlet series
P
−s
n |an |n is called the abscissa of absolute convergence of (1.6). The two abscissas are related by
the following inequality.
Lemma 1.12. Suppose that σc and σa are the abscissa of convergence and the abscissa of absolute
convergence of the Dirichlet series (1.6). Then σc ≤ σa ≤ σc + 1.
Dirichlet series are an important class of generating functions in number theory. In the remainder of this section, we discuss their properties related to their use in number theory. We first
consider the relation between the sum-function of a Dirichlet series,
f (s) =
∞
X
an n−s ,
n=1
and the running sums of its coefficient sequence,
A(x) =
X
an .
n≤x
The passage from A(x) to f (s) is easy (at least, when Re(s) is sufficiently large):
f (s) =
∞
X
an
n=1
Z
∞
−s−1
sx
dx =
n
Z
1
∞
X
n≤x
−s−1
an sx
dx =
Z
∞
A(x)sx−s−1 dx.
1
The inverse relation requires a little bit more work.
Lemma 1.13 (Perron’s formula). Suppose that α > 0. Then

α
−1

1
+
O
u
(T
|
log
u|)
if u > 1,
Z α+iT s

1
u
1
−1
ds = 2 + O αT
if u = 1,

2πi α−iT s

O uα (T | log u|)−1
if 0 < u < 1.
Proof. This is an exercise in contour integration.
Corollary 1.14. Let f (s) be the sum-function of the Dirichlet series (1.6). Suppose that x < Z and
α > σa , where σa is the abscissa of absolute convergence of (1.6). Then
X
n≤x
1
an =
2πi
Z
α+iT
∞
xα X |an |n−α
f (s)x s ds + O
.
T n=1 | log(x/n)|
s −1
α−iT
15
Corollary 1.15. Suppose that a1 , a2 , a3, . . . and b1 , b2, b3 , . . . are sequences of complex numbers
and the holomorphic functions f (s) and g(s) are defined in the half-plane Re(s) > σ0 by
f (s) =
∞
X
an n
−s
and
g(s) =
n=1
∞
X
bn n−s .
n=1
If f (s) = g(s) whenever Re(s) > σ0 , then an = bn for all n = 1, 2, 3, . . . .
Proof. We apply Corollary 1.14 with x < Z, α = σ0 + 2 (this ensures the absolute convergence of
the series on the line Re(s) = α), and T = ∞. We get
Z α+i∞
Z α+i∞
X
X
1
1
s −1
an =
f (s)x s ds =
g(s)xs s−1 ds =
bn .
2πi
2πi
α−i∞
α−i∞
n≤x
n≤x
Since this holds for all non-integer x > 1, it follows that an = bn for all n = 1, 2, 3, . . . .
The next two lemmas and their corollaries illustrate why Dirichlet series are convenient generating functions in multiplicative number theory.
Lemma 1.16. Suppose that f (n) is a multiplicative function. Then the identity
∞
X
f (n)n−s =
Y
p
n=1
1 + f (p)p−s + f (p2 )p−2s + · · ·
holds whenever the series on the left converges absolutely.
P
Proof. P
The absolute convergence of the series n f (n)n−s implies the absolute convergence of the
series m f (pm )p−ms for all primes p. Let x ≥ 2 and r = π(x). Then
Y
−s
2
1 + f (p)p + f (p )p
−2s
p≤x
+··· =
=
=
∞
X
m1 =0
∞
X
···
···
m1 =0
∞
X
∞
X
mr =0
∞
X
mr =0
f (pm1 1 ) · · · f (pmr r )(pm1 1 · · · pmr r )−s
f (pm1 1 · · · pmr r )(pm1 1 · · · pmr r )−s
f (n)n−s ,
n=1
p|n⇒p≤x
where we have used the multiplicativity of f . Noting that the last sum contains, in particular, all
the terms with n ≤ x, we conclude that
X
Y
X
−s
−s
2 −2s
| f (n)|n−σ ,
f (n)n −
1 + f (p)p + f (p )p + · · · ≤
n≤x
n>x
p≤x
which establishes the desired identity.
16
Corollary 1.17. Suppose that f (n) is a completely multiplicative function. Then the identity
∞
X
Y
f (n)n−s =
p
n=1
1 − f (p)p−s
−1
holds whenever the series on the left converges absolutely.
Lemma 1.18. Suppose that a1 , a2 , a3, . . . and b1 , b2, b3 , . . . are sequences of complex numbers such
that the Dirichlet series
f (s) =
∞
X
an n
−s
and
g(s) =
n=1
∞
X
bn n−s
n=1
converge absolutely in the half-plane Re(s) > σ0 . Then the Dirichlet series
h(s) =
∞
X
cn n−s ,
cn =
X
au bv ,
uv=n
n=1
is also absolutely convergent in Re(s) > σ0 and h(s) = f (s)g(s).
Proof. Suppose first that σ > σ0 . Then
X
n≤x
|cn |n−σ
XX
−σ
n
a
b
=
u
v
n≤x uv=n
X
XX
|au bv | n−σ =
|au bv |(uv)−σ
≤
n≤x
≤
≤
X
u≤x
X
∞
u=1
uv=n
|au |u
|au |u
uv≤x
−σ
X
v≤x
−σ
X
∞
v=1
−σ
|bv |v
,
|bv |v
−σ
P
which proves the absolute convergence of n cn n−s for Re(s) = σ. In particular, we have that
XX
|au bv | n−σ → 0
as x → ∞,
n>x
uv=n
so the second part of the lemma follows from the inequality
X
X
X X
X −s
−s
−s |au bv | n−σ .
cn n −
au u
bv v ≤
n≤x
u≤x
v≤x
n>x
In the next series of corollaries ζ(s) is the Riemann zeta-function.
17
(1.8)
uv=n
Corollary 1.19. Suppose that Re(s) > 1. Then
∞
X
µ(n)n−s =
n=1
Proof. By Lemmas 1.2 and 1.18,
X
∞
µ(n)n
−s
n=1
1
.
ζ(s)
X
∞
n
n=1
−s
= 1.
Corollary 1.20. Suppose that Re(s) > 1. Then
∞
X
n=1
Λ(n)n−s = −
ζ 0 (s)
.
ζ(s)
1.4 Divisor functions
In this section we collect several standard estimates for the number of divisors function d(n) and its
averages. First of all, we note that d(n) is multiplicative (by Lemma 1.1) and satisfies d(pk ) = k +1.
These two observations lead (after some work) to the following upper bound for d(n).
Lemma 1.21. For any > 0, d(n) n .
The bound in Lemma 1.21 is not tight, but it is also not too far from the best possible general
bound (see Exercise 21). On the other hand, the next lemma shows that for most values of n, d(n)
is significantly smaller: its average value is log n.
Theorem 1.22 (Dirichlet). Suppose that x ≥ 2. Then
X
d(n) = x log x + (2γ − 1)x + O x1/2 ,
(1.9)
n≤x
where γ is Euler’s constant.
Proof. Let D(x) denote the left side of (1.9). We have
X
X
XX
X
X
1+
1−
1 = D1 (x) + D2 (x) − D3 (x),
D(x) =
1=
1=
n≤x uv=n
uv≤x
uv≤x
√
u≤ x
say.
√
uv≤x
√
v≤ x
u,v≤ x
Thus, (1.9) follows from the estimates
X h xi X x
+ O x1/2
D1 (x) = D2 (x) =
=
u
√
√ u
u≤ x
u≤ x
√
= x log x + γx + O x1/2
(by Corollary 1.5);
√ 2
D3 (x) =
x = x + O x1/2 .
18
Remark. The estimation of the error term in (1.9) is a famous problem in analytic number theory.
It is not too difficult to show that
X
∆(x) =
d(n) − x log x − (2γ − 1)x x1/3 log x.
n≤x
Attempts to improve further on this and other similar bounds have stimulated the development of
the theory of exponential sums (see Graham and Kolesnik [17] and Huxley [28]). The best result
to date was obtained recently by Huxley [29]:
∆(x) x131/416+ ,
where 131/416 = 0.3149 . . . . It is conjectured that
∆(x) x1/4+ ,
which if proven would be essentially best possible, as it is an old result of Hardy [20] that the
bound ∆(x) x1/4 does not hold for all x.
Often one needs upper bounds for higher moments of d(n). The following theorem provides
such an estimate.
Theorem 1.23. Suppose that x ≥ 1 and k ∈ N. Then
X
k
(d(n))k k x(log x)2 −1 + 1.
(1.10)
n≤x
Proof. By induction on k. The case k = 1 follows from Theorem 1.22. Now suppose that (1.10)
holds for some k ≥ 1. Then
X
X
X
(d(n))k+1 =
(d(uv))k ≤
(d(u)d(v))k ,
n≤x
uv≤x
uv≤x
where the last step uses that d(mn) ≤ d(m)d(n). Hence, by the inductive hypothesis,
X
X
X
(d(v))k
(d(u)d(v))k ≤
(d(u))k
uv≤x
u≤x
v≤x/u
k
k x(log x)2 −1
X (d(u))k
u
u≤x
+
X
u≤x
k+1 −1
(d(u))k k x(log x)2
,
on using the bound
X (d(u))k
u≤x
u
k
k (log x)2 ,
which follows from the inductive hypothesis by partial summation.
19
Exercises
1. Prove (0.2).
2. Prove that Euler’s function φ(n) is multiplicative.
3. Prove Lemma 1.1.
R∞
4. Prove Corollary 1.5. H: The value of c1 is 1 − 1 {x}x−2 dx.
5. Prove Corollary 1.6.
−1
6. Prove Lemma 1.8. H: First
show that the given sum equals x T (x)+O(1), where T (x) is the sum appearing
in the proof of Theorem 1.7.
7. Prove that
Y
1
1
C
1−
=
1+O
,
p
log x
log x
p≤x
where C is an absolute constant. (It can be shown that, in fact, C = e−γ , where γ is Euler’s constant.)
8. Let B be the constant appearing in Theorem 1.9 and C be the constant appearing in the last problem. Prove that
X1
1
+ log 1 −
B + log C =
.
p
p
p
9. Prove that under the hypotheses of Theorem 1.10,
Z x
f (y) dy
f (x)
.
2
log x
2 y(log y)
10. Prove the converse part of Theorem 1.10.
11. Modify the proof of Theorem 1.10 to show that the PNT is equivalent to the statement that ψ(x) ∼ x as x → ∞.
12. Verify (1.7).
P
13. Suppose that in Lemma 1.11 the assumption that the series n an n−s0 converges is weakened to the assertion
that the partial sums
X
an n−s0
(N = 1, 2, 3, . . . )
n≤N
are bounded. Prove that the conclusion of the lemma stays true.
14. Prove Lemma 1.12.
15. Prove Lemma 1.13.
16. Prove (1.8).
17. Prove Corollary 1.20.
18. Prove the following identities:
20
(a)
∞
X
d(n)n−s = ζ 2 (s) whenever Re(s) > 1;
n=1
(b)
∞
X
n=1
(c)
∞
X
n=1
|µ(n)|n−s = ζ(s)/ζ(2s) whenever Re(s) > 1;
φ(n)n−s = ζ(s − 1)/ζ(s) whenever Re(s) > 2.
19. Define a multiplicative function f : N → C by
k − 1/2
k
k −1/2
f (p ) =
= (−1)
,
k
k
where the generalized binomial coefficient ks , s ∈ C, is the coefficient of zk in the Maclaurin expansion of
(1 + z) s :
s
s(s − 1) · · · (s − k + 1)
.
=
k!
k
P
(a) Prove that the Dirichlet series F(s) = n f (n)n−s converges absolutely and uniformly on the compact
subsets of the half-plane Re(s) > 1.
(b) Prove that F(s)2 = ζ(s) whenever Re(s) > 1.
√
20. Prove that d(n) ≤ 3n for all n ∈ N.
21.
(a) Prove that there exists an absolute constant c1 > 0 such that
c1 log n
d(n) exp
.
log log n
(b) Let n = p1 p2 · · · pk , where pk denotes the kth prime. Prove that there exists an absolute constant c2 > 0
such that
c2 log n
d(n) exp
.
log log n
21
Chapter 2
The prime number theorem
In this chapter we develop the basic theory of the Riemann zeta-function to the level needed for the
proof of the PNT in the form of Theorem 1. However, for technical reasons, instead of Theorem 1
we will establish the following result, which is equivalent to it (by Theorem 1.10).
Theorem 2.1. There exists an absolute constant c > 0 such that for x ≥ 2
√
ψ(x) = x + O x exp −c log x .
2.1 Definition of ζ(s). The functional equation
We start by providing a rigorous definition of the zeta-function. As we said in the Introduction, we
initially define ζ(s) for Re(s) > 1 by
∞
X
ζ(s) =
n−s .
(2.1)
n=1
Since the series converges uniformly on compact subsets of the half-plane Re(s) > 1, it follows that
ζ(s) is holomorphic in this half-plane. Moreover, since the convergence is absolute, Corollary 1.17
applies to ζ(s). In this way, we get the Euler product representation of ζ(s):
ζ(s) =
Y
p
1
1− s
p
−1
.
(2.2)
In particular, we deduce from (2.2) that ζ(s) is does not vanish in the half-plane Re(s) > 1.
Our next goal is to extend the definition of ζ(s) to the whole complex plane C. There are numerous ways to do this. We will follow the original approach of Riemann, which yields one of
the most elegant and illuminating treatments of the analytic continuation of ζ(s) even today. Alternative proofs can be found in most monographs on the theory of the zeta-function (for example,
Titchmarsh [50] gives seven such proofs). However, before we are in position to present Riemann’s
argument, we need to build up our knowledge about two classical functions.
22
2.1.1 The theta-series
If Re(z) > 0 and α ∈ C, we define the theta-function ϑ(z; α) by
ϑ(z; α) =
∞
X
exp −πz(n + α)2 .
n=−∞
(2.3)
The property of ϑ(z; α) that we are interested in is the following transformation formula.
Lemma 2.2. Let x > 0 and 0 ≤ α < 1 be real. Then
ϑ(x; α) = x−1/2 exp − πα2 x ϑ x−1 ; −iαx .
(2.4)
Proof. Suppressing the dependence on x, we write f (α) = ϑ(x; α). Since the series
∞
X
2
n=−∞
exp −πx(n + α)
∞
X
and
n=−∞
n exp −πx(n + α)2
converge uniformly in α, f (α) is continuously differentiable. It is also clear that f (α) is 1-periodic.
Hence, f (α) equals its Fourier series:
∞
X
f (α) =
fˆn e(nα).
n=−∞
Here fˆn is the nth Fourier coefficient of f (α),
Z 1
ˆfn =
f (t)e(−nt) dt.
0
Thus, it suffices to show that
Z
1
0
f (t)e(−nt) dt = x−1/2 exp −πn2 x−1 .
By the absolute convergence of the theta-series, we can integrate it term-by-term, whence
Z 1
XZ 1
f (t)e(−nt) dt =
exp −πx(m + t)2 e(−nt) dt
0
m∈Z
=
m∈Z
=
0
XZ
m+1
m
XZ
m+1
m
exp −πxt2 e(−n(t − m)) dt
exp −πxt2 e(−nt) dt
Zm∈Z
= exp −πxt2 e(−nt) dt
R
Z
−1/2
=x
exp −πu2 e − nx−1/2 u du
R
−1/2
=x
exp −πn2 x−1 ,
where the last step uses that the function exp(−πu2 ) equals its Fourier transform.
23
While the above proof can be generalized to all complex α and z with Re(z) > 0, it is easier
to extend Lemma 2.2 to those cases by means of the identity theorem for analytic functions. This
yields the following result.
Corollary 2.3. Let Re(z) > 0 and α ∈ C. Then
Here z−1/2
ϑ(z; α) = z−1/2 exp − πα2 z ϑ z−1 ; −iαz .
= exp − 21 Log z denotes the principal branch of the function z−1/2 .
(2.5)
2.1.2 The gamma-function
The other special function that will figure prominently in our analysis is Euler’s gamma-function
Γ(s). We define Γ(s) by
∞ Y
s −s/n
1
1+
e ,
(2.6)
= seγs
Γ(s)
n
n=1
where γ is Euler’s constant (recall (1.2)). As the infinite product is uniformly convergent on the
compact subsets of C − {0, −1, −2, . . . }, (2.6) defines Γ(s) as a meromorphic function on C with
simple poles at 0 and at the negative integers and with no zeros. We now state several important
properties of Γ(s) that will be needed later.
Lemma 2.4. Suppose that s , 0, −1, −2, . . . . Then
Γ(s + 1) = sΓ(s).
(2.7)
Proof. Define
ε(x) =
X1
k≤x
k
− log x − γ,
(2.8)
so that ε(x) → 0 as x → ∞ (in fact, ε(x) = O(x−1 )). By (2.6),
n
Y
1 + s/k e−s/k
seγs
Γ(s + 1)
=
lim
Γ(s)
(s + 1)eγ(s+1) n→∞ k=1 1 + (s + 1)/k e−(s+1)/k
s
(s + 1)(s + 2) · · · (s + n)
1
1
= γ
lim
exp 1 + + · · · +
e (s + 1) n→∞ (s + 2)(s + 3) · · · (s + n + 1)
2
n
ε(n)
exp{log n + γ + ε(n)}
ne
= se−γ lim
= s lim
= s.
n→∞
n→∞ n + s + 1
n+ s+1
Similar arguments can be used to establish other relations between the function values of Γ(s).
In particular, we have the following two formulas:
Γ(s)Γ(1 − s) =
π
,
sin πs
Γ(s)Γ(s + 1/2) = π1/2 21−2s Γ(2s).
24
(2.9)
Lemma 2.5 (Euler’s integral formula). Suppose that Re(s) > 0. Then
Z ∞
Γ(s) =
e−t t s−1 dt.
0
Proof. We first show that
Γ(s) = lim
n→∞
n! · n s
.
s(s + 1)(s + 2) · · · (s + n)
By (2.6),
−1 −γs
Γ(s) = s e
n Y
s −1 s/k
e
lim
1+
n→∞
k
k=1
s
s
1 · 2···n
· exp s + + · · · +
n→∞ (s + 1)(s + 2) · · · (s + n)
2
n
n! exp{s(log n + ε(n))}
n! · n s
= lim
= lim
,
n→∞ s(s + 1)(s + 2) · · · (s + n)
n→∞ s(s + 1)(s + 2) · · · (s + n)
= s−1 e−γs lim
where ε(x) is the function defined in (2.8). We now observe that
Z n
n! · n s
t n s−1
t dt =
.
1−
n
s(s + 1)(s + 2) · · · (s + n)
0
Indeed, when s > 0 the above integral converges and we have
Z 1
Z n
t n s−1
s
t dt = n
(1 − u)n u s−1 du
1−
n
0
0
Z 1
n
s
=n
(1 − u)n−1 u s du
s 0
Z 1
s n(n − 1)
=n
(1 − u)n−2 u s+1 du
s(s + 1) 0
..
.
Z 1
n(n − 1) · · · 1
s
=n
u s+n−1 du
s(s + 1) · · · (s + n − 1) 0
n! · n s
.
=
s(s + 1)(s + 2) · · · (s + n)
Thus, it suffices to prove that
Z ∞
Z n
t n s−1
t dt =
e−t t s−1 dt.
lim
1−
n→∞ 0
n
0
To this end, we consider the functions
(
fn (t) =
(1 − t/n)n t s−1
0
25
if 0 ≤ t ≤ n,
if t > n.
(2.10)
Each of these functions is in L1 [0, ∞) and satisfies the inequality
| fn (t)| ≤ e−t tσ−1 ,
where σ = Re(s). The last inequality is easily verified by taking logarithms and noting that
t
t2
t3
n log 1 −
= −t −
− 2 − · · · < −t.
n
2n 3n
Furthermore,
t n
= e−t t s−1 .
lim fn (t) = t s−1 lim 1 −
n→∞
n→∞
n
Since the function e−t tσ−1 is in L1 [0, ∞), the dominated convergence theorem yields
Z ∞
Z ∞
Z ∞
lim
fn (t) dt =
lim fn (t) dt =
e−t t s−1 dt,
n→∞
0
0
n→∞
0
which completes the proof of the lemma.
We conclude our discussion of Γ(s) with Stirling’s formula, which provides an asymptotic
expansion for log Γ(s) when |s| → ∞.
Lemma 2.6 (Stirling’s formula). Suppose that | arg s| < π. Then
Z
√
log Γ(s) = (s − 1/2) log s − s + log 2π +
0
∞
Ψ(u)
du.
u+s
Here log s denotes the principal branch of the logarithm and Ψ(u) = {u} − 1/2.
Corollary 2.7. Suppose that 0 < δ < π and | arg s| < π − δ. Then
√
log Γ(s) = (s − 1/2) log s − s + log 2π + O |s|−1
and
Γ0 (s)
= log s + O |s|−1 ,
Γ(s)
the implied constants depending at most on δ.
Corollary 2.8. Suppose that α ≤ σ ≤ β and |t| ≥ 1. Then
√
|Γ(σ + it)| = 2π|t|σ−1/2 exp(−π|t|/2) 1 + O |t|−1 ,
the implied constant depending at most on α and β.
26
2.1.3 The functional equation
We are now in position to obtain the analytic continuation of ζ(s).
Proposition 2.9. Suppose that Re(s) > 1. Then
Z
s
1 ∞ s/2−1
1
−s/2
+
x
+ x−s/2−1/2 (ϑ(x; 0) − 1) dx,
π Γ
ζ(s) =
2
s(s − 1) 2 1
(2.11)
where ϑ(x; 0) is defined by (2.3).
Proof. From (2.10), we have
Z ∞
s Z ∞
2
−y s/2−1
s/2 s
Γ
=
e y
dy = π n
e−πxn xs/2−1 dx.
2
0
0
Summing this identity over n, we get
−s/2
π
Γ
s
2
ζ(s) =
∞ Z
X
n=1
∞
2
e−πxn xs/2−1 dx.
0
After interchanging the order of summation and integration, the right side becomes
Z ∞X
Z
∞
1 ∞
−πxn2 s/2−1
e
x
dx =
(ϑ(x; 0) − 1)xs/2−1 dx.
2
0
0
n=1
Next we write
Z
0
1
(ϑ(x; 0) − 1)x
s/2−1
dx =
Z
1
∞
ϑ(t−1 ; 0) − 1 t−s/2−1 dt
and use (2.4) with α = 0 to put the last integral in the form
Z ∞
Z ∞
−s/2−1
1/2
t ϑ(t; 0) − 1 t
dt =
(ϑ(t; 0) − 1)t−s/2−1/2 dt +
1
1
2
.
s(s − 1)
Clearly this completes the proof of (2.11).
Theorem 2.10. The function ζ(s) can be continued to a meromorphic function on C, whose only
singularity is a simple pole at s = 1 with residue 1. Furthermore, the ζ(s) satisfies the functional
equation
s
1−s
(s−1)/2
−s/2
ζ(s) = π
Γ
ζ(1 − s).
(2.12)
π Γ
2
2
Proof. Proposition 2.9 was obtained under the assumption Re(s) > 1, but since for x > 1
ϑ(x; 0) − 1 e−πx ,
the integral on the right side of (2.11) is an entire function. Therefore, the function
Z
s(s − 1) ∞
(ϑ(x; 0) − 1) xs/2−1 + x(1−s)/2−1 dx + 1
ξ(s) =
2
1
27
(2.13)
is entire. Thus, the right side of the equation
s −1
πs/2
Γ
ξ(s)
ζ(s) =
s(s − 1)
2
(2.14)
(which is equivalent to (2.11) when Re(s) > 1) is a meromorphic function whose only singularity
is a simple pole at s = 1 with residue
π1/2 ξ(1)
= 1.
Γ( 21 )
This establishes the first part of the theorem.
The functional equation (2.12) follows from (2.14) and the invariance of ξ(s) under the transformation s 7→ 1 − s.
2.2 The zeros of ζ(s)
Since we want to work with the logarithmic derivative ζ 0 (s)/ζ(s), we need to understand the zeros
and the poles of the zeta-function. Theorem 2.10 provides the necessary information about the
single pole of ζ(s). In this section, we concentrate on the zeros. So far we know that there are
no zeros in the half-plane Re(s) > 1. Since 1/Γ(s) is entire with simple zeros at the nonpositive
integers, it follows from the functional equation (2.12) that the only zeros of ζ(s) in the half-plane
Re(s) < 0 are simple zeros at the negative even integers and that ζ(0) , 0. The remaining part of
the complex plane—the vertical strip 0 ≤ Re(s) ≤ 1— is a twilight zone which may, and indeed
does, contain more zeros of the zeta-function. These come from the factor ξ(s) on the right of
(2.14). In this section we study ξ(s) as an entire function of order 1.
In general, an entire function f with f (0) , 0 is said to be of finite order if there is a number
η > 0 such that
M f (r) = max | f (z)| : |z| ≤ r f,η exp(rη ).
When it is finite, the infimum of all such η > 0 is called the order of f . Entire functions of finite
order enter our discussion because of the following result.
Lemma 2.11. The function ξ(s) is an entire function of order 1. Furthermore,
lim sup
|s|→∞
log |ξ(s)| 1
= .
|s| log |s| 2
(2.15)
Proof. Since ξ(s) = ξ(1 − s), it suffices to bound |ξ(s)| in the half-plane Re(s) ≥ 1/2. There, we can
estimate ξ(s) by means of (2.14), Stirling’s formula, and elementary upper bounds for ζ(s). When
Re(s) > 1, a variant of Corollary 1.5 yields
Z ∞
s
−s
{x}x−s−1 dx,
(2.16)
ζ(s) =
s−1
1
28
where {x} denotes the fractional part of x. However, since the last integral represents a holomorphic
function in Re(s) > 0, the identity holds in this larger domain. In particular, we have
(s − 1)ζ(s) |s|2
whenever Re(s) ≥ 1/2.
Moreover, since log |Γ(s)| ≤ | log Γ(s)|, Stirling’s formula yields
log |Γ(s)| ≤ |s| log |s| + O(|s|)
whenever Re(s) ≥ 1/2.
Combining the last two estimates and (2.14), we obtain
log |ξ(s)| ≤ 21 |s| log |s| + O(|s|)
whenever Re(s) ≥ 1/2,
which establishes the first claim of the lemma. For the second claim, we note that
log |Γ(|s|)| = log Γ(|s|) = |s| log |s| + O(|s|),
whence
log ξ(|s|) = 21 |s| log |s| + O(|s|)
as |s| → ∞.
We now take a short detour into the theory of entire functions of finite order.
Lemma 2.12. Suppose that f (s) is holomorphic in the closed disk |s| ≤ R. Let 0 < r < R and let
a1 , . . . , an be the zeros of f (s) lying inside the disk |s| ≤ r (listed according to multiplicities). Then
| f (0)|Rn
≤ max | f (s)|.
|a1 a2 · · · an | |s|=R
Proof. Consider the function
n
Y
R2 − sāk
F(s) = f (s)
.
R(s − ak )
k=1
This function is holomorphic in |s| ≤ R and |F(s)| = | f (s)| on the circle |s| = R. Thus, by the
maximum modulus principle,
| f (0)|Rn
= |F(0)| ≤ max |F(s)| = max | f (s)|.
|s|=R
|s|=R
|a1 a2 · · · an |
Lemma 2.13. Suppose that f (s) is an entire function of order η, and let N(R) denote the number of
zeros of f (s) inside the disk |s| < R, counted according to their multiplicities. Then for any R ≥ 1
and > 0,
N(R) f,η, Rη+ .
29
Proof. Without loss of generality, we may assume that f (0) = 1. Let a1 , . . . , an be the zeros of f (s)
inside the disk |s| < R, listed according to multiplicities, and choose an r > 0 so that
max |ak | : 1 ≤ k ≤ n ≤ r < R.
We apply Lemma 2.12 with r and 2R and obtain
Y 2R N(R)
2
≤
≤ max | f (s)| f, exp((2R)η+ ).
|s|=2R
|a
|
k
|a |<R
k
The desired conclusion now follows by taking logarithms.
Lemma 2.14. Let R > 0 and suppose that f (s) is holomorphic in the disk |s| ≤ R. Then
(n) f (0) ≤ 2n!R−n max Re( f (s) − f (0)).
|s|=R
Proof. It suffices to consider the case f (0) = 0. We write
f (n) (0)
= rn eiφn ,
n!
Then
iθ
Re f (Re ) =
∞
X
s = Reiθ .
rn Rn cos(nθ + φn ).
n=1
Since the last series is absolutely convergent, we can integrate it term-by-term to obtain the identity
Z 2π
1 + cos(nθ + φn ) Re f (Reiθ ) dθ = πrn Rn .
0
Hence,
n
πrn R ≤ M
where M = max{Re f (s) : |s| = R}.
Z
2π
0
1 + cos(nθ + φn ) dθ = 2πM,
Corollary 2.15. Suppose that f (s) is an entire function such that Re f (s) = o(|s|n ) as |s| → ∞.
Then f (s) is a polynomial of degree at most n − 1.
Theorem 2.16. Suppose that f (s) is an entire function of order 1 with f (0) , 0, and let a1 , a2, a3 , . . .
denote the zeros of f (s) listed according to their multiplicities and arranged so that
0 < |a1 | ≤ |a2 | ≤ · · · ≤ |an | ≤ · · · .
Then f (s) can be written as
f (s) = e
A+Bs
∞ Y
n=1
where A and B are constants.
30
s
1−
an
es/an ,
Proof. From Lemma 2.13 we deduce by partial summation that
product
∞ Y
s
P(s) =
1−
es/an
a
n
n=1
P
n
|an |−2 converges. Hence, the
converges uniformly on compact sets. Thus, P(s) is an entire function having the same zeros as
f (s) and F(s) = f (s)/P(s) is an entire function with no zeros. Therefore, F(s) has a holomorphic
logarithm, that is, there is an entire function g(s) such that F(s) = eg(s) . We now prove that
g(s) = A + Bs
(2.17)
for some constants A, B.
In view of Corollary 1.14, (2.17) follows from the estimate Re g(s) = o(|s|2 ) as |s| → ∞. Since
Re g(s) is a harmonic function, it suffices to establish this on a sequence of circles |s| = R with
R → ∞. We choose the radii so that
R − |an | R−1.1
(2.18)
for all zeros an of f (s); this is possible because of Lemma 2.13. With such a choice of R, we have
(
R|an |−1 + log R if |an | ≤ 2R,
− log (1 − s/an )es/an R2 |an |−2
if |an | > 2R,
whenever |s| = R. Hence, by Lemma 2.13 and partial summation (see Exercise 6),
X
X
X
− log |P(s)| log R
1+R
|an |−1 + R2
|an |−2 R1+
|an |≤2R
|an |≤2R
|an |>2R
for any 0 < < 1. Therefore,
Re g(s) = log eg(s) = log | f (s)| − log |P(s)| R1+
whenever |s| = R and R is chosen so that (2.18) holds. This establishes (2.17) and completes the
proof of the theorem.
Theorem 2.17. The function ξ(s) has infinitely many zeros in the strip 0 ≤ Re(s) ≤ 1 and no zeros
outside that strip. It can be written as
Y
s
Bs
1−
ξ(s) = e
es/ρ ,
(2.19)
ρ
ρ
where ρ runs through the zeros of ξ(s) counted according to their multiplicities and B is a constant.
Proof. Because of Lemma 2.11, we can apply Theorem 2.16 to ξ(s). Upon noting that ξ(0) = 1
(recall (2.13)), this proves (2.19). The infinitude of the zeros of ξ(s) is also a consequence of
Lemma 2.11. Indeed, if ξ(s) had only a finite number of zeros, (2.19) would imply the estimate
log |ξ(s)| |s|,
which contradicts (2.15).
31
Lemma 2.18. There exists a constant B1 such that
X
∞ X
−1
1
1
1
ζ 0 (s)
1
=
+ B1 +
−
+
+
,
ζ(s)
s−1
s
+
2n
2n
s
−
ρ
ρ
ρ
n=1
(2.20)
where ρ runs through the zeros of ξ(s) counted according to their multiplicities.
Proof. By logarithmic differentiation of (2.14), (2.6), and (2.19), we get
√
−1
1
Γ0 (s/2) ξ 0 (s)
ζ 0 (s)
=
− + log π −
+
ζ(s)
s−1 s
2Γ(s/2) ξ(s)
∞ X
1
1
Γ0 (s) 1
= +γ+
−
−
Γ(s)
s
s
+
n
n
n=1
X
ξ 0(s)
1
1
.
= B+
+
ξ(s)
s
−
ρ
ρ
ρ
Combining these three formulas, we obtain (2.20) with B1 = log
constant and B is the constant appearing in (2.19).
√
π + 12 γ + B, where γ is Euler’s
Theorem 2.19. Suppose that s = σ + it, −1 ≤ σ ≤ 2. Then
X
1
ζ 0 (s)
−1
=
+
+ O(log(|t| + 2)).
ζ(s)
s − 1 | Im ρ−t|≤1 s − ρ
(2.21)
Proof. We write τ = |t| + 2. Under the hypotheses of the theorem, we have
∞ X
1
1 X 3 X |s|
+
log τ.
s + 2n − 2n ≤
2
2n
n
n≤τ
n>τ
n=1
Substituting this into (2.20), we obtain
X
ζ 0 (s)
−1
=
+
ζ(s)
s−1
ρ
1
1
+
s−ρ ρ
+ O(log τ).
(2.22)
We now apply (2.22) to s = 2 + it. Logarithmic differentiation of the Euler product (2.2) yields
0
ζ (2 + it) X log p X log p
=
≤
,
ζ(2 + it) 2+it − 1 2−1
p
p
p
p
so (2.22) with s = 2 + it gives
Re
X
ρ
1
1
+
2 + it − ρ ρ
32
log τ.
(2.23)
Writing a typical zero of ξ(s) in the form ρ = β + iγ and noting that 0 ≤ β ≤ 1, we now find
Re
2−β
1
1
=
,
2 + it − ρ (2 − β)2 + (t − γ)2
1 + (t − γ)2
Re
1
β
= 2
≥ 0.
ρ β + γ2
These inequalities and (2.23) give
X
ρ
1
log τ.
1 + (t − γ)2
(2.24)
To prove (2.21) we subtract from (2.22) the respective formula for s = 2 + it and obtain
X 1
ζ 0 (s)
−1
1
+ O(log τ).
(2.25)
=
+
−
ζ(s)
s−1
s − ρ 2 + it − ρ
ρ
By (2.24),
X 1
X
X
3
6
1
≤
−
≤
log τ
s − ρ 2 + it − ρ (t − γ)2
1 + (t − γ)2
ρ
| Im ρ−t|>1
| Im ρ−t|>1
and
X 1
2 + it − ρ ≤
| Im ρ−t|≤1
X
| Im ρ−t|≤1
1≤
X
ρ
2
log τ.
1 + (t − γ)2
Hence, (2.21) follows from (2.25).
We conclude this section by recording a direct consequence of inequality (2.24).
Corollary 2.20. Suppose that T ≥ 2. The number of the zeros of ζ(s) in the region
0 ≤ Re(s) ≤ 1,
T ≤ | Im(s)| ≤ T + 1
is O(log T ).
2.3 The zerofree region
Theorem 2.21 (de la Vallée Poussin). There exists an absolute constant c1 > 0 such that ζ(s) has
no zero ρ = β + iγ with
c1
β≥1−
.
(2.26)
log(|γ| + 2)
Proof. Suppose that s = σ + it with σ > 1. Taking real parts in Corollary 1.20, we obtain
− Re
ζ 0 (s)
ζ(s)
=
∞
X
Λ(n)n−σ cos(t log n).
n=1
33
Since for real θ,
we have
3 + 4 cos θ + cos 2θ = 2(1 + cos θ)2 ≥ 0,
(2.27)
0
0
ζ (σ + 2it)
ζ (σ + it)
ζ 0 (σ)
− Re
≥ 0.
(2.28)
− 4 Re
−3
ζ(σ)
ζ(σ + it)
ζ(σ + 2it)
We now consider a particular zero ρ = β0 + iγ0 and write inequalities for the terms on the left
side of (2.28) when t = γ0 . Since ζ(s) has a pole of residue 1 at s = 1, we have
−
ζ 0 (σ)
1
=
+ O(1).
ζ(σ)
σ−1
(2.29)
In view of Exercise 10, we have |γ0 | ≥ c2 > 0. Thus, we obtain from (2.21) that
0
X
1
ζ (σ + iγ0 )
− Re
+ c3 log(|γ0 | + 2)
≤ − Re
ζ(σ + iγ0)
(σ − β) + i(γ0 − γ)
|γ−γ |≤1
0
and similarly,
−1
≤
+ c3 log(|γ0 | + 2),
(σ − β0 )
ζ 0 (σ + 2iγ0)
≤ c4 log(|γ0 | + 2).
− Re
ζ(σ + 2iγ0 )
Inserting (2.29)–(2.31) into (2.28), we deduce that for σ close to 1,
(2.30)
(2.31)
4(σ − β0 )−1 − 3(σ − 1)−1 ≤ c5 log(|γ0 | + 2).
Choosing
σ=1+
we obtain
1
,
2c5 log(|γ0 | + 2)
1
,
14c5 log(|γ0 | + 2)
which establishes (2.26) when |γ0 | ≥ c2 . We dispense with the last condition by noting that there
are only O(1) zeros with |γ0 | ≤ c2 and that none of them can be too close to the pole at s = 1. β0 ≤ 1 −
As we mentioned in the Introduction, there are more precise estimates for the zerofree region
of ζ(s) than that in Theorem 2.21. While their proofs are too technical to present here in full
detail, we will describe briefly the main idea. In the proof of Theorem 2.21, we derived bounds
for ζ 0 (s)/ζ(s) from Theorem 2.19, which in turn relied on estimates for Γ0 (s)/Γ(s). The more
sophisticated approach towards the zerofree region bounds ζ 0 (s)/ζ(s) using estimates for the sum
X
X
nit =
exp(it log n).
n≤N
n≤N
In fact, both Littlewood’s and the Vinogradov–Korobov improvements on Theorem 1 (recall (0.8)
and (0.9)) stem from improved estimates for this exponential sum. Those interested in learning
more about those results should consult the specialized monographs on the theory of the zetafunction, e.g., Ivić [31], Karatsuba and Voronin [37], or Titchmarsh [50].
34
2.4 Proof of the prime number theorem
Our proof of Theorem 2.1 combines Theorem 2.21 and the following result, known as the Riemann–
Mangoldt explicit formula for ψ(x). An alternative proof is sketched in Exercise 17.
Theorem 2.22. Suppose that 2 ≤ T ≤ x and {x} = 1/2. Then
X xρ
x(log x)2
ψ(x) = x −
,
+O
ρ
T
| Im ρ|≤T
(2.32)
where the summation is over the nontrivial zeros of ζ(s) with | Im ρ| ≤ T .
Proof. We apply Corollary 1.14 with f (s) = −ζ 0 (s)/ζ(s) and α = 1 + (log x)−1 . In view of Corollary 1.20, this gives
X
Z α+iT 0 s
∞
Λ(n)
1
ζ (s) x
x
ψ(x) =
−
.
ds + O
2πi α−iT
ζ(s) s
T n=1 nα | log(x/n)|
The error term is easily seen to be
X
X 1
x log x X 1
x(log x)2
1
+
+
.
T
n x/2<n≤2x |x − n| n>2x nα
T
n≤x/2
Hence,
Z α+iT 0 s
ζ (s) x
x(log x)2
1
.
(2.33)
ds + O
−
ψ(x) =
2πi α−iT
ζ(s) s
T
Observe that, by Corollary 2.20, we can always choose a number T of a certain size so that
T − |γ| (log T )−1
(2.34)
for all zeros ρ = β + iγ of ζ(s). With such a value of T we move the path of integration to the
contour C on Fig. 2.1. The contribution from the poles of the integrand lying between the two
contours is
X xρ ζ 0 (0)
−
,
x−
ρ
ζ(0)
|ρ|≤T
so it remains to show that the integral along C is negligible. To this end, we note that by Theorem 2.19 and Corollary 2.20,
0 ζ (s) 2
ζ(s) (log T )
whenever s ∈ C. Hence,
Z 0 s Z α
Z T −1/2 ζ
(s)
1
x
dy
x
u
2
−
x du +
ds (log T )
ζ(s) s
T −1/2
C
−T 1 + |y|
xT −1 (log T )2 + x−1/2 (log T )3 xT −1 (log x)2 .
Once (2.32) has been established for T subject to (2.34), removing that constraint by means of
Corollary 2.20 is straightforward.
35
− 21 + iT
α + iT
− 12 − iT
α − iT
Figure 2.1:
Proof of Theorem 2.1. Suppose that T ≥ 10 and set δ = c1 (log T )−1 , where c1 > 0 is the constant
from Theorem 2.21. By Corollary 2.20 and Theorem 2.21, the sum over the zeros on the right of
(2.32) is
X
X
2−r
x1−δ
1 x1−δ (log T )2 .
| Im ρ|≤2r
r≤2 log T
Thus, the result follows from (2.32) on choosing
log T = (log x)1/2 + O(1).
Exercises
1. Prove identities (2.9).
2. Prove Stirling’s formula.
3. Suppose that Re(s) > 0 and N ∈ N. Prove that
ζ(s) =
X
n
n≤N
−s
N 1−s
−s
+
1−s
Z
∞
N
{u}u−s−1 du.
4. Prove Corollary 2.15.
5. Prove that the convergence of the series
product
P
n
|an |−2 implies the uniform convergence on compact sets of the
∞ Y
s
1−
e s/an .
a
n
n=1
36
6. Suppose that f (s) is an entire function of order 1 with f (0) , 0 and a1 , a2 , a3 , . . . are its zeros, labeled as in
Theorem 2.16. Prove that:
X
(a)
|an |α α, Rα+1+ + 1 for any > 0;
|an |≤R
(b)
X
|an |>R
|an |α α, Rα+1+ for any α < −1 and 0 < < −α − 1.
7. Prove that ζ(0) = −1/2 and ζ(−1) = −1/12.
8. Prove that the Laurent expansion of ζ 0 (s)/ζ(s) about s = 1 is
−1
ζ 0 (s)
=
+γ + ··· ,
ζ(s)
s−1
where γ is Euler’s constant. H: Use (2.16).
9. Prove that the value of B in (2.19) is B =
1
2
log 4π − 21 γ − 1, where γ is Euler’s constant.
10. Prove that any zero ρ of the function ξ(s) satisfies
√
| Im ρ| ≥ −B−1 − 1 > 6.503695 . . . ,
where B is the constant appearing in (2.19) (and in the last problem).
11. The purpose of this problem is to show that the value of the constant B in Theorem 1.9 is
X1
1
+ log 1 −
B=γ−
,
p
p
p
(∗)
where γ is Euler’s constant. Note that combining this identity and the result of Exercise 1.8, we find that the
constant C in Exercise 1.7 equals e−γ .
P
(a) Let S (x) denote the sum on the left side (1.5) and define f (s) = p p−s . Prove that if Re(s) > 1, then
Z ∞
f (s) = (s − 1)
S (x)x−s dx.
1
(b) Suppose that σ > 1. Combining Theorem 1.9 and part (a), prove that
f (σ) = − log(σ − 1) − γ + B + O(−(σ − 1) log(σ − 1)).
(c) Suppose that Re(s) > 1. Prove that f (s) = log ζ(s) + g(s), where g(s) is holomorphic in the half-plane
Re(s) > 1/2.
(d) Derive (∗) from parts (b) and (c).
12. Suppose that T ≥ 10 and N(T ) denotes the number of zeros ρ of ξ(s) with 0 < Im ρ ≤ T . Prove that
T
T
log
+ O(log T ).
N(T ) =
2π
2πe
H: Apply the argument principle to ξ(s) and the rectangle with vertices −1 ± iT , 2 ± iT . Use the results in
§2.2 to estimate the contribution from the horizontal lines. Use the functional equation to replace the integral
over the line Re(s) = −1 by an integral overthe line Re(s) = 2. Finally, use that on the line Re(s) = 2,
ζ 0 (s)/ζ(s) has a Dirichlet series representation.
37
13. Suppose that ρ1 , ρ2 , ρ3 , . . . are the zeros of ξ(s) in the upper half-plane, listed according to multiplicities and
arranged so that 0 < γ1 ≤ γ2 ≤ γ3 ≤ · · · , where γn = Im ρn . Prove that
γn log n
= 1.
n→∞ 2πn
lim
14. Prove Theorem 2.22 for an arbitrary x ≥ 2. H: Use Lemma 2.18.
15. Define ψ0 (x) = 21 ψ(x+ ) + ψ(x− ) . Prove that for x > 1,
ψ0 (x) = x −
Here
P
ρ
= lim
T →∞
16. Define ψ1 (x) =
P
P
X xρ ζ 0 (0) 1
−
− log 1 − x−2 .
ρ
ζ(0)
2
ρ
|ρ|≤T .
n≤x (x
− n)Λ(n). Prove that for x > 1,
ψ1 (x) =
∞
ζ 0 (0) ζ 0 (−1) X x1−2k
x2 X xρ+1
−
−x
+
−
.
2
ρ(ρ + 1)
ζ(0)
ζ(−1) k=1 2k(2k − 1)
ρ
17. The purpose of this exercise is to deduce the PNT directly from (2.33) instead from the explicit formula (2.32).
Starting with (2.33) move the integration to a polygonal contour C that is similar to the contour displayed on
Fig. 2.1 but has vertices at α ± iT and η ± iT , where η = 1 − 21 c1 (log T )−1 . Estimate the integral over C to obtain
the PNT.
38
Chapter 3
Prime numbers in arithmetic progressions
In this chapter we study Dirichlet characters and L-functions and prove Theorem 2.
3.1 Characters
3.1.1 Characters of finite abelian groups
Let G be a finite abelian group of order m, written multiplicatively. A group homomorphism
χ : G → C× is called a character of G, that is,
for all x, y ∈ G.
χ(xy) = χ(x)χ(y)
In particular, χ(e) = 1 and (by Lagrange’s theorem on finite groups) χ(x)m = χ(xm ) = χ(e) = 1.
That is, χ(x) is an mth root of unity.
The characters of G form a group Ĝ under pointwise multiplication:
(χ1 χ2 )(x) = χ1 (x)χ2 (x)
for all x ∈ G.
The identity element of Ĝ is the trivial character
χ0 (x) = 1
for all x ∈ G,
and the inverse of χ is its complex-conjugate character χ̄.
Theorem 3.1. Ĝ G.
Proof. Suppose first that G is cyclic, G = hgi. Writing a generic element x of G as x = gy , we find
that every character χ ∈ Ĝ must be of the form
χa (x) = χa (gy ) = e(ay/m)
that is, Ĝ is a cyclic group of order m generated by χ1 .
39
(a ∈ Z),
Now, suppose that G is an arbitrary finite abelian group. By the structural theorem for finite
abelian groups, G can be written as the direct product of cyclic groups, G = C1 × C2 × · · · × Ck .
Given an x = x1 x2 · · · xk , x j ∈ C j , we define a character χ ∈ Ĝ by
χ(y) = χ1 (y1 )χ2 (y2 ) · · · χk (yk )
for y = y1 y2 · · · yk ∈ G, y j ∈ C j ;
here χ j is the character in Ĉ j corresponding to x j under the above isomorphism. Since the map
x 7→ χ is an isomorphism of abelian groups, the result follows.
Corollary 3.2. Suppose that G is a finite abelian group and x is an element of G other than the
identity. Then there is a character χ ∈ Ĝ such that χ(x) , 1.
Proof. This is a consequence of the proof of the theorem. As in that proof, we write G as the direct
product of cyclic groups, G = C1 × C2 × · · · × Ck . Then x = x1 x2 · · · xk , x j ∈ C j , and some x j is not
the identity. Without loss of generality, we may assume that x1 , e. Let g be the generator of C1 .
The character χ corresponding to ge · · · e under the isomorphism from the proof of the theorem has
the desired property.
Lemma 3.3. Let G be a finite abelian group and denote by e and χ0 the identity element and the
trivial character of G. Then the following two orthogonality relations hold:
(
X
|G| if χ = χ0 ,
χ(x) =
(3.1)
0
otherwise;
x∈G
and
X
χ∈Ĝ
(
|G|
χ(x) =
0
if x = e,
otherwise.
(3.2)
Proof. Suppose that χ , χ0 . Then for some x0 ∈ G, χ(x0 ) , 1. We now observe that
X
X
X
X
χ(x0 )
χ(x) =
χ(x0 x) =
χ(y) =
χ(y).
x∈G
x∈G
y∈x0 G
y∈G
Since χ(x0 ) , 1, the sum on the right must be equal to 0. This establishes (3.1) when χ is nontrivial;
the alternative case is straightforward.
Now suppose that x , e. By Corollary 3.2, there is a character χ0 ∈ Ĝ such that χ0 (x) , 1. But
X
X
X
X
χ0 (x)
χ(x) =
χ0 χ(x) =
ψ(x) =
ψ(x),
χ∈Ĝ
χ∈Ĝ
ψ∈χ0 Ĝ
ψ∈Ĝ
and since χ0 (x) , 1, the sum on the right must be equal to 0. Again, the remaining case is
straightforward.
40
3.1.2 Dirichlet characters
Let q ≥ 1 be an integer. Then Z/qZ is a commutative ring. Let Gq = (Z/qZ)× denote the multiplicative group of Z/qZ. (Recall that a residue class in Z/qZ is invertible if and only if it is relatively
prime to the modulus q.) Then Gq is an abelian group of order φ(q), where φ(q) is Euler’s function.
Of course, a character of Gq is a homomorphism χ : Gq → C× . It will be convenient to extend the
domain of each character to all elements of the ring Z/qZ by setting
χ(n) = 0
if gcd(n, q) > 1.
This extended function will be called a Dirichlet character modulo q, or simply a Dirichlet character. We will often regard each Dirichlet character modulo q as a q-periodic function χ : Z → C.
Although Dirichlet characters are not group homomorphisms, they are still completely multiplicative:
χ(mn) = χ(m)χ(n)
for all m, n ∈ Z.
(3.3)
We refer to the extension of the trivial group character χ0 as the principal character modulo q;
we will denote the principal character by χ0 . The orthogonality relations in Lemma 3.3 yield the
following orthogonality relations among Dirichlet characters.
Lemma 3.4. Suppose that q ≥ 1. If χ is a Dirichlet character modulo q, then
(
q
X
φ(q) if χ = χ0 ,
χ(n) =
0
otherwise.
n=1
Furthermore,
X
χ mod q
χ(n) =
(
φ(q) if n ≡ 1 (mod q),
0
otherwise,
(3.4)
(3.5)
where the sum on the right side is over all Dirichlet characters modulo q.
Let χ be a non-principal Dirichlet character modulo q, let q1 be a proper divisor of q, and let χ1
be a non-principal character modulo q1 such that
χ(n) = χ1 (n)χ0 (n)
for all n ∈ Z,
(3.6)
where χ0 is the principal character modulo q. Then we say that χ1 induces χ. If χ is a nonprincipal Dirichlet character modulo q and there exists a character χ1 as in (3.6), then χ is called
imprimitive; otherwise, χ is called primitive. Note that principal characters are neither primitive,
nor imprimitive. If χ is an imprimitive Dirichlet character modulo q, we define1 its conductor
to be the least modulus q∗ such that there exists a (necessarily primitive) character χ∗ modulo q∗
which induces χ. If χ is primitive, we define its conductor to be equal to the modulus q, and if χ is
principal, we define the conductor to be equal to 1.
1
This definition requires some justification; see Exercise 2.
41
3.1.3 Gaussian sums
We now introduce the Gaussian sum. If χ is a Dirichlet character modulo q and a is an integer, we
define
X
τ(χ, a) =
χ(m)e(am/q),
(3.7)
m mod q
where the summation is over any complete system of residues modulo q.
Lemma 3.5. Let χ be a Dirichlet character modulo q and suppose that either gcd(a, q) = 1 or χ is
primitive. Then
τ(χ, a) = χ̄(a)τ(χ, 1).
(3.8)
Proof. First, suppose that (a, q) = 1. Then
X
X
τ(χ, a) = χ̄(a)
χ(am)e(am/q) = χ̄(a)
χ(n)e(n/q) = χ̄(a)τ(χ, 1).
m mod q
n mod q
Here, we used that if m runs through a complete system of residues modulo q, then so does am.
Now, suppose that χ is primitive and (a, q) = k > 1. We write a = ka1 , q = kq1 , and note that
there exists an integer b such that
(b, q) = 1,
Then
χ(b)τ(χ, a) =
X
b ≡ 1 (mod q1 ),
χ(bm)e(a1 m/q1 ) =
m mod q
X
χ(b) , 1.
χ(bm)e(a1 bm/q1 ) = τ(χ, a).
m mod q
Since χ(b) , 1, it follows that τ(χ, a) = 0, which establishes the second claim of the lemma.
Identity (3.8) is useful for transforming exponential sums into character sums and vice versa.
However, for such applications it is crucial to be sure that τ(χ, 1) is nonzero. The next lemma
determines exactly the characters for which this is the case.
Lemma 3.6. Let χ be a Dirichlet character modulo q induced by a primitive character χ∗ modulo
q∗ . Then
q
q
τ(χ, 1) = µ
χ∗
τ(χ∗ , 1).
(3.9)
∗
∗
q
q
√
Moreover, if χ is primitive, then |τ(χ, 1)| = q.
Proof. Assume first that χ is primitive. Summing (3.8) over all a modulo q, we get
X
X
|τ(χ, 1)|2
|χ(a)|2 =
|τ(χ, a)|2
a mod q
a mod q
=
X
X
χ(m)e(am/q)
a mod q m mod q
=
X
X
m mod q n mod q
42
X
χ̄(n)e(−an/q)
n mod q
χ(m)χ̄(n)
X
a mod q
e(a(m − n)/q).
(3.10)
The innermost sum on the right side of (3.10) is q or 0 according as a ≡ b (mod q) or not. Hence,
X
X
|τ(χ, 1)|2
|χ(a)|2 = q
|χ(m)|2 .
a mod q
m mod q
This proves the second claim of the lemma.
We now turn to (3.9). Using Lemma 1.2, we can write the principal character χ0 modulo q as
X
χ0 (n) =
µ(d).
d|(n,q)
Thus,
τ(χ, 1) =
X
χ∗ (m)χ0 (m)e(m/q) =
X
µ(d)
d|q
=
X
χ∗ (m)e(m/q)
X
X
µ(d)
d|(m,q)
m mod q
m mod q
=
X
χ∗ (nd)e(nd/q)
n mod q/d
µ(d)χ∗ (d)
d|q
X
χ∗ (n)e(nd/q).
n mod q/d
Note that the terms with (d, q∗ ) > 1 do not contribute to the last sum. Thus, we may restrict the
summation over d to the divisors of q0 = q/q∗ :
X
X
τ(χ, 1) =
µ(d)χ∗ (d)
χ∗ (n)e(nd/q).
d|q0
n mod q/d
We now write the summation variable n modulo q/d as q∗ v + u, where u runs over a complete
system of residues modulo q∗ and v runs over a complete system of residues modulo q/dq∗ = q0 /d.
We get
X
X
X
τ(χ, 1) =
µ(d)χ∗ (d)
χ∗ (q∗ v + u)e((q∗ v + u)d/q)
u mod q∗ v mod q0 /d
d|q0
=
X
d|q0
µ(d)χ∗ (d)
X
χ∗ (u)e(ud/q)
u mod q∗
X
e(vd/q0 ).
v mod q0 /d
Since the innermost sum vanishes when q0 /d > 1, the result follows.
3.1.4 The Pólya–Vinogradov theorem
Suppose that χ is a non-principal character modulo q. From the orthogonality relation (3.4),
X
χ(n) ≤ φ(q)/2
M<n≤M+N
for all M, N ≥ 1. In this section, we will improve on this trivial bound. The next result was
obtained independently in 1918 by Pólya [45] and I. M. Vinogradov [56] and is known as the
Pólya–Vinogradov inequality.
43
Theorem 3.7. Suppose that M, N are positive integers and χ is a non-principal character modulo
q. Then
p
X
χ(n) ≤ 3q log q.
(3.11)
M<n≤M+N
Proof. First, suppose that χ is primitive. Then, by (3.8),
X
X
X
X
τ(χ̄, 1)
χ(n) =
χ̄(m)e(nm/q) =
χ̄(m)
M<n≤M+N
M<n≤M+N m mod q
m mod q
X
e(mn/q).
M<n≤M+N
Hence, we deduce from Lemma 3.6 that
X
M<n≤M+N
q−1 X
−1/2
χ(n) ≤ q
m=1
X
M<n≤M+N
e(mn/q).
On noting that the modulus of the inner sum is | sin(πmN/q)/ sin(πm/q)|, we get the inequality
X
M<n≤M+N
q−1
X
1/2
csc(πm/q).
χ(n) ≤ q
m=1
We now use apply the inequality csc(πx) ≤ (2x)−1 for 0 < x ≤ 1/2. When q = 2k, we obtain
q−1
X
m=1
csc(πm/q) ≤ q
k−1
X
m
−1
m=1
k−1
X
2m + 1
+1 ≤q
log
2m − 1
m=1
+1
= q log(q − 1) + 1 ≤ q log q;
and when q = 2k + 1,
q−1
X
m=1
csc(πm/q) ≤ q
k
X
m
−1
m=1
k
X
2m + 1
≤q
log
2m − 1
m=1
= q log q.
This establishes (3.11) for primitive characters.
On the other hand, if χ is induced by a primitive character χ∗ modulo r, r < q, we have
X
X
X
X
X
χ(n) =
χ∗ (n)
µ(d) =
µ(d)χ∗ (d)
χ∗ (m).
M<n≤M+N
M<n≤M+N
d|(q,n)
d|q
M/d<m≤(M+N)/d
Since χ∗ is primitive, the sum over m is bounded above by r1/2 log r. Hence,
X
X
≤ r1/2 log r
χ(n)
|χ∗ (d)| ≤ d(q/r)r1/2 log r,
M<n≤M+N
d|q
where the last step uses that the terms with (d, r) > 1 do not √contribute to the sum over d. The
desired result now follows from the elementary bound d(n) ≤ 3n.
44
3.2 Dirichlet L-functions
If χ is a Dirichlet character modulo q, we define the Dirichlet L-function L(s, χ) by
L(s, χ) =
∞
X
χ(n)n−s
(Re(s) > 1).
(3.12)
n=1
Since |χ(n)| ≤ 1, this series converges absolutely and uniformly on the compact subsets of the halfplane Re(s) > 1. Furthermore, when χ is a non-principal character, the series in (3.12) converges
uniformly (but not absolutely) on the compact subsets of Re(s) > 0:
Z ∞
X
L(s, χ) =
S (x)sx−s−1 dx,
S (x) =
χ(n).
(3.13)
1/2
n≤x
Thus, L(s, χ) is holomorphic in the half-plane Re(s) > 1, and for non-principal χ even in Re(s) > 0.
By Lemma 1.16, every L-function has an Euler product:
Y
−1
L(s, χ) =
1 − χ(p)p−s
(Re(s) > 1).
(3.14)
p
In particular, (3.14) implies that L(s, χ) , 0 when Re(s) > 1.
Next, we want to obtain an analytic continuation of L(s, χ) to a meromorphic function on C.
We observe that it suffices to consider the case when χ is primitive. Indeed, if χ is an imprimitive
character modulo q induced by a primitive character χ∗ modulo q∗ , then (3.14) yields
Y
Y
−1
−1 Y
= L(s, χ∗ )
1 − χ∗ (p)p−s .
=
1 − χ∗ (p)p−s
L(s, χ) =
1 − χ(p)p−s
p
p|q
p-q
Therefore, the analytic continuation of L(s, χ) is a straightforward consequence from the analytic
continuation of L(s, χ∗ ) and the holomorphy of the finite product on the right. Similarly, if χ0 is
the principal character modulo q, we have
Y
L(s, χ0 ) = ζ(s)
1 − p−s ,
p|q
so L(s, χ0 ) is holomorphic in C − {1} and has a simple pole at s = 1 with residue
Y
φ(q)
.
Res L(s, χ0 ); 1 =
1 − p−1 =
q
p|q
We now turn toward primitive characters.
Lemma 3.8. Suppose that χ is a primitive Dirichlet character modulo q, a ∈ {0, 1}, and define
θa (x; χ) =
∞
X
n=−∞
Then for all x > 0,
na χ(n) exp − πxn2 /q .
τ(χ̄, 1)θa (x−1 ; χ) = (ix)a (qx)1/2 θa (x; χ̄).
45
(3.15)
Proof. Suppose that a = 0. Because χ is primitive, we can use Lemma 3.5 to obtain
−1
τ(χ̄, 1)θa (x ; χ) =
∞
X
n=−∞
τ(χ̄, n) exp − πn2 /qx
X
=
∞
X
χ̄(k)
n=−∞
k mod q
exp − πn2 /qx e(kn/q).
Introducing the theta-series ϑ(z; α) defined in (2.3), we can write this identity as
X
τ(χ̄, 1)θa (x−1 ; χ) =
χ̄(k) exp πk2 x/q ϑ (qx)−1 ; −ikx .
k mod q
We now appeal to Lemma 2.2 and get
X
τ(χ̄, 1)θa (x−1 ; χ) = (qx)1/2
χ̄(k)ϑ(qx; k/q)
k mod q
1/2
= (qx)
X
χ̄(k)
n=−∞
k mod q
= (qx)1/2
X
k mod q
∞
X
χ̄(k)
exp − πx(nq + k)2 /q
X
m≡k (mod q)
exp − πxm2 /q = (qx)1/2 θa (x; χ̄).
The proof for a = 1 is similar, except that instead of Lemma 2.2 it uses the identity
∞
X
n=−∞
2
(n + α)e−πx(n+α) = −ix−3/2
∞
X
n=−∞
n exp −πn2 x−1 + 2πiαn ,
which follows from (2.4) via term-by-term differentiation with respect to α.
Theorem 3.9. Suppose that χ is a primitive Dirichlet character modulo q and choose a ∈ {0, 1}
so that χ(−1) = (−1)a . Then the Dirichlet L-function L(s, χ) can be extended to an entire function
satisfying the functional equation
q (s+a)/2 s + a 1−s+a
ia q1/2 q (1−s+a)/2
Γ
L(s, χ) =
Γ
L(1 − s, χ̄).
(3.16)
π
2
τ(χ, 1) π
2
Proof. As in the proof of Theorem 2.10, we start with a change of variables in the integral representation for the gamma-function:
(s+a)/2 Z ∞
s + a
π
s
Γ
=m
ma exp − πxm2 /q x(s+a)/2−1 dx.
2
q
0
Multiplying this identity by (q/π)(s+a)/2 χ(m)m−s and then summing over m, we get
Z ∞
∞
q (s+a)/2 s + a X
L(s, χ) =
χ(m)
Γ
ma exp − πxm2 /q x(s+a)/2−1 dx
π
2
0
m=1
Z ∞
1
θa (x, χ)x(s+a)/2−1 dx,
=
2 0
46
(3.17)
since ma χ(m) is an even function. By a change of variables and an appeal to Lemma 3.8,
Z 1
Z ∞
(s+a)/2−1
θa (x, χ)x
dx =
θa (t−1 , χ)t−(s+a)/2−1 dt
0
1
Z ∞
ia q1/2
=
θa (t, χ̄)t−(s−a)/2−1/2 dt.
τ(χ̄, 1) 1
Combining (3.17) and (3.18), we find that the left side of (3.16) equals
Z
Z ∞
1 ∞
ia q1/2
(s+a)/2−1
θa (x, χ)x
dx +
θa (x, χ̄)x−(s−a)/2−1/2 dx.
2 1
2τ(χ̄, 1) 1
(3.18)
(3.19)
Since θa (x, χ) decays exponentially for x → ∞, this expression represents an entire function and
thus provides an analytic continuation of L(s, χ) to C. Furthermore, the substitution s 7→ 1 − s
transforms (3.19) into
Z
Z ∞
ia q1/2
1 ∞
−(s−a)/2−1/2
θa (x, χ)x
dx +
θa (x, χ̄)x(s+a)/2−1 dx.
(3.20)
2 1
2τ(χ̄, 1) 1
Noting that when χ is primitive
τ(χ, 1)τ(χ̄, 1) = χ(−1)q = (−1)a q,
(3.21)
we see that (3.20) is equal to
Z
Z ∞
ia q1/2
τ(χ̄, 1) 1 ∞
(s+a)/2−1
−(s−a)/2−1/2
θa (x, χ̄)x
dx +
θa (x, χ)x
dx .
ia q1/2 2 1
2τ(χ, 1) 1
This establishes the functional equation (3.16).
3.3 The zeros of L(s, χ)
Again, we want to use the logarithmic derivative L0 (s, χ)/L(s, χ), so we need first to study the
zeros and the poles of the Dirichlet L-functions. As we already mentioned in the previous section,
L(s, χ) is entire, unless χ is principal, in which case L(s, χ) has a single singularity—a simple pole
at s = 1. We already know (from (3.14)) that L(s, χ) is non-zero in the half-plane Re(s) > 1.
Furthermore, using the functional equation (3.16), we can show that the only zeros of L(s, χ) in the
half-plane Re(s) < 0 are simple zeros at the even or odd integers, depending on the sign of χ(−1).
Also, if χ is a non-principal character with χ(−1) = 1, we see that s = 0 must be a zero. To study
the zeros of L(s, χ) in the strip 0 ≤ Re(s) ≤ 1, we introduce the entire function
q (s+a)/2 s + a Γ
L(s, χ),
ξ(s, χ) =
π
2
which will play the same role the function ξ(s) defined by (2.13) played in the study of the zeros
of the zeta-function.
47
Theorem 3.10. If χ is a Dirichlet character modulo q, then L(1, χ) , 0.
Proof. For principal characters the result is trivial (the L-function has a pole at s = 1), so we may
assume that χ is non-principal. The case when χ is complex is easy. Suppose that L(1, χ) = 0 for
a complex character χ. Then χ̄ is another character modulo q with
L(1, χ̄) = L(1, χ) = 0.
Therefore, the product
f (s) =
Y
L(s, χ)
χ mod q
represents an entire function, which vanishes at s = 1. On the other hand, when σ > 1, we have
X
∞
X XX
χ(pm )
log L(σ, χ) =
χ mod q
χ mod q
p
m=1
mpmσ
∞
XX
φ(q)
=
≥ 0.
mpmσ
p m=1
pm ≡1 (mod q)
Hence, f (σ) ≥ 1 for σ > 1, which is inconsistent with f (1) = 0. Thus, our assumption must be
false.
To prove the lemma for a non-principal real character, we consider the function
f (s) =
L(s, χ)L(s, χ0 )
,
L(2s, χ0 )
where χ0 is the principal character modulo q. If L(1, χ) = 0, this function is holomorphic in
the half-plane Re(s) > 1/2 and vanishes at s = 1/2 (since the denominator has a pole and the
numerator is entire). On the other hand, when Re(s) > 1, (3.14) yields
∞
Y ps + 1 X
f (s) =
=
an n−s .
s −1
p
χ(p)=1
n=1
Note that the coefficients an are nonnegative. We now look at the Taylor expansion of f (s) in
|s − 2| < 3/2. We have
∞
X
f (m) (2)
f (s) =
(s − 2)m ,
m!
m=0
where from the Dirichlet series representation,
f
(m)
m
(2) = (−1)
∞
X
an (log n)m n−2 = (−1)m bm ,
say.
n=1
Thus,
f (s) = f (2) +
∞
X
bm
m=1
m!
(2 − s)m ,
with non-negative coefficients bm . Letting s → 1/2, we find that all the coefficients on the right
are zero (since f (1/2) = 0), and in particular, that f (2) = 0. This, however, contradicts the Euler
product representation of f (s).
48
We now commence our investigation of the zeros of ξ(s, χ). The following two results are
analogues of Lemma 2.11 and Theorem 2.17.
Lemma 3.11. Suppose that χ is a primitive character modulo q. There is an absolute constant
c1 > 0 such that
|ξ(s, χ)| ec1 |s| log(q|s|) .
(3.22)
Proof. Because ξ(s, χ) satisfies the functional equation
ξ(s, χ) = w(χ)ξ(1 − s, χ̄),
|w(χ)| = 1,
it suffices to prove (3.22) when Re(s) ≥ 1/2. For such s, the desired bound follows from the
definition of ξ(s, χ), Stirling’s formula, and the estimate
|L(s, χ)| q|s|.
Theorem 3.12. Suppose that χ is a primitive character modulo q. Then the function ξ(s, χ) has
infinitely many zeros in the strip 0 ≤ Re(s) ≤ 1 and can be written as
Y
s
A+Bs
ξ(s, χ) = e
1−
es/ρ .
ρ
ρ
Here A = A(χ) and B = B(χ) are constants depending only on the character χ and the product is
over the zeros of ξ(s, χ) listed according to their multiplicities.
Corollary 3.13. Suppose that χ is a primitive character modulo q and a ∈ {0, 1} is such that
χ(−1) = (−1)a . Then
1
L0 (s, χ)
1
1 Γ0 ((s + a)/2) X
1
.
(3.23)
= B(χ) − log(q/π) −
+
+
L(s, χ)
2
2 Γ((s + a)/2)
s−ρ ρ
ρ
Here B(χ) is the constant appearing in Theorem 3.12.
Proof. This follows from Theorem 3.12 by logarithmic differentiation.
Corollary 3.14. The constant B(χ) satisfies
Re B(χ) = − Re
X1
ρ
ρ
.
(3.24)
Proof. We first observe that, by the functional equation of ξ(s, χ), if ρ is a zero of ξ(s, χ), then so
is 1 − ρ̄, while ρ̄ and 1 − ρ are zeros of ξ(s, χ̄). From Theorem 3.12,
X 1
ξ 0(s, χ)
1
,
(3.25)
= B(χ) +
+
ξ(s, χ)
s
−
ρ
ρ
ρ
49
and from the functional equation,
ξ 0 (1 − s, χ̄)
ξ 0 (s, χ)
=−
.
ξ(s, χ)
ξ(1 − s, χ̄)
Substituting s = 0, we get
X
ξ 0 (1, χ̄)
ξ 0 (0, χ)
=−
= −B(χ̄) −
B(χ) =
ξ(0, χ)
ξ(1, χ̄)
ρ
1
1
+
.
1 − ρ̄ ρ̄
(3.26)
Note that since ξ(s, χ) = ξ(s, χ), (3.25) implies B(χ) = B(χ). Thus, by (3.25) and our starting
remark,
X 1 1
X1
+
.
2 Re B(χ) = B(χ) + B(χ̄) = −
= −2 Re
ρ ρ̄
ρ
ρ
ρ
Theorem 3.15. Suppose that χ is a primitive character modulo q and s = σ + it, −1/2 ≤ σ ≤ 2.
Then
X
1
−1
L0 (s, χ)
=
+
+ O(log q(|t| + 2)),
(3.27)
L(s, χ)
s + a | Im ρ−t|≤1 s − ρ
where a ∈ {0, 1} is such that χ(−1) = (−1)a .
Proof. We write τ = q(|t| +2). The starting point is (3.23). The term involving the gamma-function
can be estimated as
1
+ O(log τ).
s+a
Thus, (3.23) may be rewritten as
X 1
L0 (s, χ)
1
1
+ O(log τ).
(3.28)
= B(χ) −
+
+
L(s, χ)
s+a
s
−
ρ
ρ
ρ
We view this approximate equation as an analogue of (2.22) and want to deduce from it an analogue
of (2.21). We let s = 2 + it and take real parts. Then the left side of (3.28) is bounded, so using
(3.24), we obtain
X
1
log τ.
Re
2 + it − ρ
ρ
Since ρ = β + iγ, 0 ≤ β ≤ 1, we have
Re
1
2−β
1
=
,
2
2
2 + it − ρ (2 − β) + (t − γ)
1 + (t − γ)2
whence
X
ρ
1
log τ.
1 + (t − γ)2
50
(3.29)
Subtracting from (3.28) the corresponding equation with s = 2 + it, we deduce that
X 1
−1
1
L0 (s, χ)
=
+
−
+ O(log τ).
L(s, χ)
s+a
s − ρ 2 + it − ρ
ρ
When |γ − t| > 1,
Hence, in view of (3.29),
1
1
1
−
s − ρ 2 + it − ρ (t − γ)2 .
X 1
−1
1
L0 (s, χ)
=
+
−
+ O(log τ).
L(s, χ)
s + a |γ−t|≤1 s − ρ 2 + it − ρ
Using (3.29) once more, we see that the terms (2 + it − ρ)−1 are also superfluous, and so (3.27)
follows from the last equation.
From (3.29), we obtain the following result.
Corollary 3.16. Suppose that T ≥ 2. The number of zeros of L(s, χ) in the region
0 ≤ Re s ≤ 1,
T ≤ | Im s| ≤ T + 1
is O(log qT ).
Theorem 3.17. Suppose that χ is a primitive character modulo q. There exists an absolute constant
c2 > 0 such that at most one zero ρ = β + iγ of the function L(s, χ) does not satisfy
β≤1−
c2
.
log q(|γ| + 2)
If such a zero does exist, the character χ must be real and the zero itself must be simple and real.
Proof. As in the proof of Theorem 2.21, we deduce from (2.27) that
−3
L0 (σ, χ0)
L0 (σ + it, χ)
L0 (σ + 2it, χ2)
− 4 Re
− Re
≥ 0.
L(σ, χ0 )
L(σ + it, χ)
L(σ + 2it, χ2)
(3.30)
Here, σ > 1 and χ0 is the principal character modulo q. Let ρ = β0 + iγ0 be a particular zero
of L(s, χ) and write τ = q(|γ0 | + 2). We now estimate the left side of (3.30) when t = γ0 . From
Theorem 3.15,
X
1
−1
L0 (σ + iγ0 , χ)
≤ − Re
+ O(log τ) ≤
+ O(log τ).
(3.31)
− Re
L(σ + iγ0, χ)
σ
+
iγ
−
ρ
σ
−
β
0
0
| Im ρ−γ |≤1
0
To estimate the term involving χ2 , we observe that if χ2 is induced by a character χ1 modulo q1 ,
then
X
X
L0 (s, χ2 ) L0 (s, χ1 )
−σ
−σ
−2σ
−
Λ(m)m
log
p
p
+
p
+
·
·
·
log q.
(3.32)
L(s, χ2 )
L(s, χ1 )
(m,q)>1
p|q
51
Moreover, a similar relation holds for L0 (s, χ0 )/L(s, χ0 ) and ζ 0 (s)/ζ(s). In particular, combining
that relation with (2.29), we obtain
−
L0 (σ, χ0)
1
≤
+ O(log q).
L(σ, χ0 )
σ−1
(3.33)
When χ2 is non-principal, (3.27) and (3.32) give
L0 (σ + 2iγ0, χ2 )
− Re
log τ.
L(σ + 2iγ0 , χ2 )
(3.34)
Combining (3.30), (3.31), (3.33), and (3.34), we deduce that
4(σ − β0 )−1 ≤ 3(σ − 1)−1 + c3 log τ.
(3.35)
On choosing σ = 1 + (2c3 log τ)−1 , this establishes the theorem in the case of complex characters.
We now turn to real characters χ (so that χ2 = χ0 ). In this case, we replace (3.34) by
− Re
L0 (σ + 2iγ0 , χ0 )
σ−1
≤
+ O(log τ),
L(σ + 2iγ0 , χ0 )
(σ − 1)2 + 4γ02
(3.36)
the extra term accounting for the pole at s = 1. Accordingly, (3.35) becomes
3
σ−1
4
+ c4 log τ.
≤
+
σ − β0 σ − 1 (σ − 1)2 + 4γ02
Thus, if |γ0 | ≥ δ(log τ)−1 , the desired conclusion follows on choosing
σ = 1 + c5 (log τ)−1 ,
0 < c5 < min (4c4 )−1 , 4c4 δ2 .
Finally, suppose that |γ0 | ≤ δ(log q)−1 . For σ > 1, Theorem 3.15 yields
−
X −1
L0 (σ, χ)
≤
+ c6 log q.
L(σ, χ) | Im ρ|≤1 σ − ρ
(3.37)
(Note that for a real character χ, ρ and ρ̄ are both zeros of L(s, χ), so the sum on the right is real.)
On the other hand,
∞
∞
X
L0 (σ, χ) X
ζ 0 (σ)
−
=
Λ(n)χ(n)n−σ ≥ −
Λ(n)n−σ =
.
L(σ, χ)
ζ(σ)
n=1
n=1
(3.38)
Combining (2.29), (3.37), and (3.38), we obtain
1
1
≤
+ c7 log q.
σ−ρ σ−1
| Im ρ|≤1
X
52
(3.39)
We now assume, as we may, that δ < (10c7 )−1 and set σ = 1 + 2δ(log q)−1 . If ρ = β0 + iγ0
is a complex zero or a double real zero, we estimate the left side of (3.39) from below by the
contribution from ρ and ρ̄ or by the doubled contribution from ρ. We get
2(σ − β0 )
≤ 0.6δ−1 log q.
(σ − β0 )2 + γ02
(3.40)
Also, by our choices,
σ − β0
σ − β0
0.8
≥
≥
.
2
2
2
2
−2
(σ − β0 ) + γ0 (σ − β0 ) + δ (log q)
σ − β0
Combining this inequality with (3.40), we obtain
1.6(σ − β0 )−1 ≤ 0.6δ−1 log q
⇒
β0 < 1 − 0.5δ(log q)−1 .
Finally, if β0 is a real zero and β1 is another real zero, we replace (3.40) by
(σ − β0 )−1 + (σ − β1 )−1 ≤ 0.6δ−1 log q,
whence
min(β0 , β1 ) ≤ 1 − δ(log q)−1 .
3.4 The exceptional zero
The possible real zero appearing in Theorem 3.17 is known as an exceptional zero, a Siegel zero,
or a Siegel–Landau zero. The purpose of this section is to show that such zeros cannot lie too close
to 1. First, we prove that if two L-functions both have exceptional zeros, one of them must have a
modulus that is much larger than the modulus of the other.
Theorem 3.18 (Landau). Let χ1 and χ2 be distinct primitive real characters modulo q1 and q2 ,
respectively. Suppose that β1 and β2 are real numbers such that
L(β1 , χ1 ) = L(β2 , χ2 ) = 0.
There exists an absolute constant c8 > 0 such that
min(β1 , β2 ) ≤ 1 − c8 (log q1 q2 )−1 .
Proof. Using the inequality
(1 + χ1 (m))(1 + χ2 (m)) ≥ 0,
we find that
ζ 0 (σ) L0 (σ, χ1) L0 (σ, χ2 ) L0 (σ, χ1 χ2 )
−
−
−
≥ 0.
(3.41)
ζ(σ)
L(σ, χ1 )
L(σ, χ2 )
L(σ, χ1 χ2 )
Since χ1 and χ2 are distinct, χ1 χ2 is non-principal and we can deduce the theorem from (3.41) by
referring to (2.29), (3.31), and an obvious analogue of (3.34).
−
53
Corollary 3.19. Let Q > 1. There is an absolute constant c9 > 0 such that no L-function L(s, χ)
modulo q, q ≤ Q, has a zero ρ = β + iγ with
β < 1 − c9 (log Q(|γ| + 2))−1 ,
(3.42)
except possibly at a point β0 on the real axis, where L(s, χ) may have a simple zero. Furthermore,
any character χ modulo q, q ≤ Q, for which this zero does occur is real and induced by the same
primitive real character.
So far, we know from Theorem 3.10 that the exceptional zero (if it exists) is less than 1, but
we have no quantitative form of this result. By a classical result of Dirichlet—the analytic class
number formula (see Davenport [14, §6, (15)]), for a primitive quadratic character χ modulo q, we
have
L(1, χ) = C(χ)h(q)q−1/2 ,
where C(χ) ≥ 1 and h(q) is a positive integer (the number of “classes” of binary quadratic forms
of discriminant q). Using the trivial observation that h(q) ≥ 1, we conclude that, in fact, we may
strenghten Theorem 3.10 to
L(1, χ) q−1/2 ,
which in turn leads to the following bound for the exceptional zero β0:
β0 ≤ 1 − c10 q−1/2 (log q)−2 .
(3.43)
The following remarkable result of Siegel’s provides a much stronger bound on β0 .
Theorem 3.20 (Siegel). Let > 0 be fixed. There is a constant c0 () > 0 such that if χ is a
primitive real character modulo q, then L(s, χ) , 0 in the region
| Im(s)| ≤ 1,
Re(s) ≥ 1 − c0 ()q− .
(3.44)
Proof. We consider primitive real characters χ1 and χ2 with moduli q1 and q2 , respectively, and
introduce the function
F(s) = ζ(s)L(s, χ1 )L(s, χ2 )L(s, χ1 χ2 ).
(3.45)
Since χ1 χ2 is a non-principal (though not necessarily primitive) character modulo q1 q2 , F(s) is
holomorphic everywhere except at s = 1, where it has a simple pole with residue
λ = L(1, χ1 )L(1, χ2 )L(1, χ1 χ2 ).
We now proceed to show that
F(σ) > 1/2 − c11 λ(q1 q2 )8(1−σ) (1 − σ)−1
When Re(s) > 1, we can write F(s) as a Dirichlet series
F(s) =
∞
X
n=1
54
an n−s .
for 7/8 < σ < 1.
(3.46)
It follows from the Euler product representations of the factors in (3.45) that a1 = 1 and that an ≥ 0
for all n = 1, 2, . . . . When |s − 2| < 1, F(s) has also a Taylor expansion
F(s) =
∞
X
n=0
where
n
(n)
bn = (−1) F (2)/n! =
bn (2 − s)n ,
∞
X
m=1
Hence,
−1
G(s) = F(s) − λ(s − 1)
=
am (log m)n m−2 ≥ 0.
∞
X
n=0
(bn − λ)(2 − s)n ,
(3.47)
and this representation is, in fact, valid in the larger disk |s − 2| ≤ 3/2, because G(s) is an entire
function. We now estimate the coefficients of the series in (3.47). On the circle C : |s − 2| = 3/2,
we have
|ζ(s)| 1,
|s − 1|−1 1,
and
|L(s, χ)| q
for any non-principal character χ modulo q. Thus,
G(s) (q1 q2 )2
(s ∈ C).
We now use Cauchy’s formula for the Taylor coefficients. Integrating along C, we find that
n
Z
1
G(s)
2 (q1 q2 )2 .
|bn − λ| = ds
n+1
2πi C (s − 2)
3
When N > 1 and 7/8 ≤ σ ≤ 1, this gives
∞
X
n=N
|bn − λ|(2 − σ)n (q1 q2 )2 (3/4)N (q1 q2 )2 e−N/4 .
Hence,
−1
F(σ) − λ(σ − 1)
N−1
X
≥
(bn − λ)(2 − σ)n − c12 (q1 q2 )2 e−N/4
n=0
≥ 1−λ
N−1
X
n=0
(2 − σ)n − c12 (q1 q2 )2 e−N/4 ,
upon noting that b0 = F(2) ≥ 1. Thus, choosing N so that
c12 (q1 q2 )2 e−N/4 < 1/2 ≤ c12 (q1 q2 )2 e−(N−1)/4 ,
we obtain
F(σ) ≥ 1/2 − λ(2 − σ)N (1 − σ)−1 ,
55
and (3.46) follows from the inequality
(2 − σ)N ≤ exp N(1 − σ) ≤ c13 (q1 q2 )8(1−σ) .
We may assume that L(s, χ) has a real zero β0 ≥ 1 − (log q)−1 and that there is some primitive
character χ2 such that L(s, χ2 ) has a real zero β2 with
1 − /10 ≤ β2 < 1.
We then consider F(s) with χ1 = χ and χ2 being this special character. Since F(β2 ) = 0, (3.46)
yields
λq8(1−β2 ) 1;
(3.48)
here the constant depends on the choice of χ2 , and hence, on . By the bounds in Exercise 12,
λ L(1, χ) log q.
(3.49)
Furthermore, Lagrange’s mean-value theorem and another appeal to Exercise 12 give
L(1, χ) = L(1, χ) − L(β0 , χ) = (1 − β0 )L0 (σ, χ) (1 − β0 )(log q)2 ,
for some σ ∈ (β0 , 1). Combining this estimate, (3.48) and (3.49), we obtain
1 (1 − β0 )q8(1−β2 ) (log q)3 (1 − β0 )q ,
and (3.44) follows.
Remark. Theorem 3.20 is ineffective. That is, given > 0, the proof does not allow us to calculate
the constant c0 (). Indeed, in the above proof, we essentially used a possible counterexample to a
strong conjecture2 (i.e., β2 ) to show that any possible counterexample to a weaker conjecture does
not fail that conjecture too miserably. In particular, in order to compute c0 (), we must exhibit a
particular character χ2 as in the proof of Siegel’s theorem. Of course, if GRH is true—as is the
popular belief—neither β0 nor β2 exist and we will never find an actual character χ2 that we can
use to calculate c0 ().
3.5 The prime number theorem for arithmetic progressions
For a Dirichlet character χ, define
ψ(x, χ) =
X
Λ(n)χ(n).
n≤x
The next theorem is an analogue of Theorem 2.22.
2
Namely, that all real zeros β of L-functions with real characters satisfy β ≤ 1 − /10.
56
(3.50)
Theorem 3.21. Suppose that χ is a non-principal character modulo q and 2 ≤ T ≤ x, where
{x} = 1/2. Then
X xρ
+ O xT −1 + q1/2 (log qx)2 ,
(3.51)
ψ(x, χ) = −
ρ
| Im ρ|≤T
where the summation is over the nontrivial zeros ρ of L(s, χ) with | Im ρ| ≤ T .
Proof. First, suppose that χ is primitive. As in the proof of Theorem 2.22, we set α = 1+(log qx)−1
and apply Corollary 1.14 with f (s) = −L0 (s, χ)/L(s, χ). Then an argument similar to that leading
to (2.33) yields
Z α+iT 0
1
L (s, χ) xs
x(log qx)2
.
(3.52)
ψ(x, χ) =
−
ds + O
2πi α−iT
L(s, χ) s
T
Because of Corollary 3.16, we may assume that T is chosen so that
| Im ρ| − T (log qT )−1
whenever L(ρ, χ) = 0.
(3.53)
It then follows from Theorem 3.15 and Corollary 3.16 that the inequality
0
L (s, χ) 2
L(s, χ) (log qT )
holds on the contour C shown on Fig. 2.1. Thus,
Z 0
L (s, χ) xs
x(log qx)2
−
ds ,
L(s, χ) s
T
C
(3.54)
the details being similar to those in the proof of the respective bound in the proof of Theorem 2.22.
Combining (3.52) and (3.54), we find that
x(log qx)2
,
ψ(x, χ) = Σ + O
T
where Σ is the sum of the residues of the function
0
L (s, χ) xs
−
L(s, χ) s
at its poles lying between C and the vertical line Re(s) = α. This function has simple poles at the
zeros of L(s, χ) in the critical strip and a simple or double pole at s = 0 (according as L(0, χ) , 0
or L(0, χ) = 0). Hence,
X xρ
x(log qx)2
ψ(x, χ) = −
,
(3.55)
+ C(χ) + O
ρ
T
| Im ρ|≤T
where C(χ) is the residue at 0. It remains to estimate C(χ).
57
Let b(χ) be the constant term in the Laurent expansion of L0 (s, χ)/L(s, χ) near s. Then
(
b(χ)
if L(0, χ) , 0,
C(χ) =
b(χ) + log x if L(0, χ) = 0.
From (3.27),
b(χ) =
X −1
|ρ|≤1
ρ
+ O(log q).
By Theorem 3.17, the last sum contains O(log q) terms and each of them, except for possibly one,
is O(log q). The exceptional term occurs only when χ is a real character such that L(s, χ) has an
exceptional zero; furthermore, by (3.43), the exceptional term is q1/2 (log q)2 . Altogether, we
have
C(χ) q1/2 (log qx)2 .
The desired result follows from this estimate and (3.55).
Finally, we remove the restriction to primitive characters. Suppose that χ is a character modulo
q induced by a primitive character χ∗ modulo r, 1 < r < q. Then
X
X
X
1 (log qx)2 .
(3.56)
Λ(n) ≤
(log p)
ψ(x, χ) − ψ(x, χ∗ ) ≤
n≤x
(n,q)>1
Hence,
p|q
X xρ
ψ(x, χ) = −
+O
ρ
| Im ρ|≤T
k:
pk ≤x
xT −1 + r1/2 (log qx)2 ,
where the summation is over the nontrivial zeros of L(s, χ∗ ). Thus, (3.51) follows on noting that
L(s, χ) and L(s, χ∗ ) have the same nontrivial zeros.
Theorem 3.22. Suppose that x ≥ 2, q ≥ 1, and (a, q) = 1. There is an absolute constant c14 > 0
such that
√
x
χ1 (a) xβ1
ψ(x; q, a) =
− δq
+ O x exp − c14 log x ,
(3.57)
φ(q)
φ(q) β1
where δq = 1 if there is a real character χ1 modulo q such that L(s, χ1 ) has an exceptional zero β1
and δq = 0 otherwise.
Proof. We assume, as we may, that c14 < 1/2. It suffices to consider the case
√
1 ≤ q ≤ exp log x ,
for otherwise (3.57) is trivial. By (3.5),
ψ(x; q, a) =
1 X
χ̄(a)ψ(x, χ).
φ(q) χ mod q
58
(3.58)
Suppose that χ is a nonprincipal character modulo q and set
√
T = exp log x ,
δ(T ) = c9 (2 log T )−1 .
By Corollary 3.19,
Re ρ ≤ 1 − c9 (log qT )−1 ≤ 1 − δ(T )
for all zeros of L(s, χ), except possibly for the exceptional zero β1 . Thus, (3.51) yields
X 1
xβ1
1−δ(T )
+O x
ψ(x, χ) = −δ(χ)
+ O xT −1 (log x)2 ,
β1
|ρ|
| Im ρ|≤T
where δ(χ) is 1 or 0 according as χ = χ1 or not. Using Corollary 3.16 to bound the last sum, we
deduce that
X0
√
xβ1
+ O φ(q)x exp − c15 log x ,
(3.59)
χ̄(a)ψ(x, χ) = −δq χ1 (a)
β1
χ mod q
where the summation on the right side is restricted to the non-principal characters modulo q. Furthermore, by a variant of (3.56) and Theorem 2.1,
√
ψ(x, χ0 ) = ψ(x) + O (log x)2 = x + x exp − c16 log x
(3.60)
Clearly, the desired result follows from (3.58)–(3.60).
Note that the constant c14 is effective, but Theorem 3.22 itself is not, since in general we do not
know whether the exceptional zero exists.
Proof of Theorem 2. We now deduce Theorem 2 from Theorem 3.22. We use Siegel’s theorem
with = (2A)−1 . It gives
β1 ≤ 1 − c0 (A)q−1/(2A) ≤ 1 − c0 (A)(log x)−1/2 ,
whence
(3.61)
√
xβ1 /β1 x exp − c1 (A) log x .
Therefore, even if there is an exceptional zero, the corresponding term on the right side of (3.57)
is superfluous and
√
x
ψ(x; q, a) =
+ O x exp − c2 (A) log x .
(3.62)
φ(q)
Theorem 2 now follows by partial summation.
Remark. Observe that the constants c0 (A), c1 (A), . . . above depend on the constant c0 () in Theorem 3.20 and are therefore ineffective.
59
Exercises
1. Suppose that q ≥ 1 is an integer and f : Z → C is a nontrivial, q-periodic, completely multiplicative function
such that f (n) = 0 whenever gcd(n, q) > 1. Prove that f is a Dirichlet character modulo q.
2. Suppose that the character χ modulo q is induced by the characters χ1 modulo q1 and χ2 modulo q2 . Write
q3 = gcd(q1 , q2 ). Prove that χ is induced also by a character χ3 modulo q3 .
3. Suppose that gcd(q1 , q2 ) = 1, χ1 is a character modulo q1 , and χ2 is a character modulo q2 . Prove that the
character χ1 χ2 modulo q1 q2 is primitive if and only if both χ1 and χ2 are primitive.
4.
(a) Suppose that p is an odd prime. Prove that there is no primitive real character modulo pα , α ≥ 2, and
that the only primitive real character modulo p is the Legendre symbol
(
+1 if n ≡ (mod p),
n
=
p
−1 if n . (mod p).
(b) Prove that there is no primitive real character modulo 2α , α ≥ 4.
(c) Prove that the only primitive real character modulo 4 is the character χ4 given by
(
+1 if n ≡ 1 (mod 4),
χ4 (n) =
−1 if n ≡ 3 (mod 4).
(d) Prove that the only primitive real characters modulo 8 are χ8 and χ4 χ8 , where χ8 is given by
(
+1 if n ≡ ±1 (mod 8),
χ8 (n) =
−1 if n ≡ ±3 (mod 8).
5. Suppose that m is a positive integer and define the exponential sum
G(m) =
m
X
n=1
e n2 /m .
(a) Define the function f : R → C by
f0 (x − 0) + f0 (x + 0)
f (x) =
,
2
f0 (x) =
(
e x2 /m
0
if 0 ≤ x ≤ m,
otherwise.
We can apply to f the Poisson summation formula (see Zygmund [59, eq. (II.13.4)]):
Z
X
X
ˆ
ˆ
f (n) =
f (n),
f (t) =
f (x)e(−xt) dx.
n∈Z
R
n∈Z
Use this to prove that
G(m) = m
X
n∈Z
2
e − mn /4
Z
−n/2+1
−n/2
e my2 dy.
(b) Using the result of (a), show that
√
m,
G(m) = C 1 + i
−m
60
where C =
Z
∞
−∞
e t2 dt.
(c) Deduce that
1 + i−m √
m.
1 + i−1
G(m) =
(d) Deduce the formula for the Fresnel integrals:
Z ∞
Z
2
cos x dx =
0
∞
sin x
0
2
dx =
r
π
.
2
6. The purpose of this problem is to establish the law of quadratic reciprocity:
q
p
= (−1)(p−1)(q−1)/4
q
p
(∗)
for all pairs of distinct odd primes p, q.
(a) Let G(m) be the exponential sum defined in the last problem. Prove that
p
q
G(pq) =
G(p)G(q).
q
p
(b) Deduce (∗) from part (a) and the explicit formula for G(m).
7. Suppose that χ is a character modulo q induced by a primitive character χ∗ modulo q∗ . Suppose also that a is
an integer and write q1 = q/(a, q), a1 = a/(a, q). Prove that:
(a) If q∗ - q1 , then τ(χ, a) = 0.
(b) If q∗ | q1 , then
τ(χ, a) = µ
q1
q∗
∗
χ
q1
q∗
φ(q)
τ(χ∗ , a1 ).
φ(q1 )
8. Suppose that gcd(q1 , q2 ) = 1, χ1 is a character modulo q1 , and χ2 is a character modulo q2 . Prove that
τ(χ1 χ2 , 1) = χ1 (q2 )χ2 (q1 )τ(χ1 , 1)τ(χ2 , 1).
9.
(a) Suppose that p is an odd prime and χ is the primitive real character modulo p (i.e., χ is the Legendre
symbol (·/p)). Prove that τ(χ, 1) = G(p), where G(m) is the exponential sum defined in Problem 5.
(b) Suppose that q is an odd squarefree integer and χ is the primitive real character modulo q. Prove that
(
1 if q ≡ 1 (mod 4),
√
εq =
τ(χ, 1) = εq q,
i if q ≡ 3 (mod 4).
10. Verify (3.21).
11. If χ1 and χ2 are two characters modulo q, the Jacobi sum is defined by
X
J(χ1 , χ2 ) =
χ1 (n)χ2 (1 − n).
n mod q
(a) Prove that when χ1 χ2 is primitive, τ(χ1 , 1)τ(χ2 , 1) = J(χ1 , χ2 )τ(χ1 χ2 , 1).
(b) Prove that when χ is primitive, J(χ, χ̄) = χ(−1)µ(q).
12. Suppose that χ is a non-principal character modulo q, k ≥ 0, and σ ≥ 1 − (log q)−1 . Prove that:
61
(a)
q
X
n=1
(b)
χ(n)(log n)k n−σ (log q)k+1 .
∞
X
n=q+1
χ(n)(log n)k n−σ (log q)k .
(c) L(k) (σ, χ) (log q)k+1 .
13. The purpose of this exercise is to establish Dirichlet’s theorem on primes in arithmetic progressions: if a and
q are integers with gcd(a, q) = 1, then the arithmetic progression a mod q contains infinitely many prime
numbers.
(a) Suppose that gcd(a, q) = 1 and Re(s) > 1. Observe that
X
p−s =
p≡a (mod q)
X
1 X
χ(p)p−s.
χ̄(a)
φ(q)
p
χ mod q
(b) Suppose that Re(s) > 1 and χ is a Dirichlet character. Show that
X
χ(p)p−s = log L(s, χ) + f (s, χ),
p
where f (s, χ) is holomorphic in Re(s) > 1/2.
(c) Suppose that χ is non-principal. Then L(s, χ) is holomorphic in Re(s) > 0. Together with Theorem 3.10,
this establishes that log L(s, χ) is holomorphic near s = 1. Combine this observation with the results of
(a) and (b) to conclude that when Re(s) > 1,
X
p−s =
p≡a (mod q)
1
log ζ(s) + g(s),
φ(q)
where g(s) is holomorphic near s = 1.
X
(d) Prove that
p−1 diverges. This establishes Dirichlet’s theorem.
p≡a (mod q)
62
Chapter 4
The large sieve
In modern analytic number theory we use the term “large sieve” to describe any among several
analytic lemmas, none of which is really a “sieve” (in the usual sense attached to that word in
number theory). Suppose that we have a sequence A = (an ) of some arithmetic interest and that
we want to study its distribution in a certain sense. A typical example is the case where A is an
integer sequence and we want to understand its distribution in arithmetic progressions or in short
intervals. Often such problems can be reduced to the estimation of generating functions
X
X(a),
(4.1)
a∈A
where the function X belongs to a suitably chosen class X of “harmonics”; nontrivial bounds for the
sums (4.1) then lead to information about the distribution of the sequence A. For example, in the
proof of Theorem 2 we used this approach to establish that the primes are uniformly distributed
among the reduced residue classes modulo q. In that case, the “harmonics” were the Dirichlet
characters modulo q and the generating functions (4.1) were the sums ψ(x, χ), with χ non-principal.
In a large-sieve inequality, we seek estimates for mean-square averages over X of general linear
forms in the “harmonics” X ∈ X. That is, we want to bound
2
XX
,
a
X(n)
n
X∈X
n≤N
for any choice of the coefficients an . In this chapter, we prove several estimates of the form
2
X
XX
≤ C(X, N)
|an |2 ,
a
X(n)
n
X∈X
n≤N
n≤N
where the harmonics are additive characters e(αm), Dirichlet characters χ(m), or powers m−it . In
the next chapter, we will see several applications of these results to the distribution of primes in
progressions and in intervals.
63
4.1 Two results from analysis
We say that an entire function f (z) is of exponential type τ, if
| f (z)| e(τ+)|z|
for all fixed > 0.
The entire functions of exponential type τ whose restrictions to the real line are in L2 (R) are
characterized by the Paley–Wiener theorem (see Boas [4, §6.8] or Zygmund [59, §XVI.7]): an
entire function f (z) has these two properties if and only if its Fourier transform on the real line,
Z ∞
f (x)e(−xt) dx,
fˆ(t) =
−∞
is supported in the interval |t| ≤ τ/(2π).
Consider the entire function
sin2 πz
B(z) =
π2
X
∞
sgn(n)
−2
−1
+ z + 2z
,
(z − n)2
n=−∞
where for a real number x,


+1
sgn(x) =
0


−1
if x > 0,
if x = 0,
if x < 0.
This function, discovered by Beurling in 1930 and then rediscovered by Selberg in 1974, has the
following three important properties:
(B1 ) sgn(x) ≤ B(x) for all x ∈ R;
(B2 ) the Fourier transform of the function B(x) − sgn(x) is a continuous function, supported in
[−1, 1];
R∞
(B3 ) −∞ B(x) − sgn(x) dx = 1.
Furthermore, the function B(z) is the unique entire function that satisfies (B1 ) and (B2 ) and minimizes the integral appearing in (B3 ). The proofs of these facts can be found in Graham and
Kolesnik [17, Appendix] or in Vaaler [51] (see also Exercise 1 after the chapter). We can use the
function B(z) to establish the following result.
Lemma 4.1. Suppose that α, β, δ are real numbers such that α < β and δ > 0. There exists an
entire function F(z) = F(z; α, β, δ) such that:
(i) F(x) ≥ (x; α, β), where (x; α, β) is the characteristic function of the interval [α, β];
(ii) its Fourier transform F̂(t) is supported in [−δ, δ];
(iii) F̂(0) = β − α + δ−1 .
64
Proof. The function
F(z) =
has all the desired properties.
1
2
B(δ(β − z)) + B(δ(z − α))
Lemma 4.2. Suppose that F(z) is an entire function of exponential type τ. Suppose furhter that
F ∈ L2 (R) and F(x) ≥ 0 for all real x. Then there exists an entire function f (z) of exponential type
τ/2 such that F(x) = | f (x)|2 when x is real.
Proof. This follows from the main result in Boas [4, §7.5]. Hypothesis (7.5.2) in [4] follows from
our hypothesis that F ∈ L2 (R) and the discussion in [4, §8.1].
Corollary 4.3. Suppose that α, β, δ are real numbers such that α < β and δ > 0. There exists an
entire function f (z) = f (z; α, β, δ) such that:
(i) | f (x)|2 = F(x) for all x ∈ R, where F(z) = F(z; α, β, δ) is the function from Lemma 4.1;
R∞
(ii) the Fourier transform fˆ(t) = −∞ f (x)e(−xt) dx is supported in [−δ/2, δ/2].
Proof. The function F(z) is of exponential type δ. Since by construction F ∈ L p (R) for all p ≥ 1,
we can apply Lemma 4.2 to F(z). The resulting function f (z) has property (i), and therefore, is of
exponential type δ/2 and square integrable. Hence, an appeal to the Paley–Wiener theorem proves
that fˆ is supported in [−δ/2, δ/2].
4.2 Large-sieve inequalities
Suppose that ξ1 < ξ2 < · · · < ξR are real numbers such that
T 0 + 21 δ ≤ ξr ≤ T 0 + T − 21 δ
(4.2)
and
|ξr − ξ s | ≥ δ > 0
whenever r , s.
(4.3)
Further, suppose that ν1 < ν2 < · · · < νK are real numbers such that
M ≤ νk ≤ M + N
and
|νk − νl | ≥ ∆ > 0
whenever k , l.
(4.4)
Lemma 4.4. Suppose that ξ1 , ξ2 , . . . , ξR and ν1 , ν2, . . . , νK are as above and define
S (α) =
K
X
ak e(ανk ),
k=1
where a1 , a2 , . . . , aK are complex numbers. Then
R
X
r=1
|S (ξr )|2 ≤ N + δ−1
65
T + ∆−1
K
X
k=1
|ak |2 .
(4.5)
Proof. Assuming the notation of Lemma 4.1 and Corollary 4.3, we introduce the functions
G(z) = F(z; M, M + N, δ),
g(z) = g(z; M, M + N, δ),
and define the sum
∗
S (α) =
K
X
H(z) = F(z; T 0 , T 0 + T, ∆).
ak g(νk )−1 e(ανk ).
k=1
By Fourier inversion,
Z
S (α) =
∞
ĝ(u)S ∗ (u + α) du.
−∞
Recalling that ĝ(u) is supported in [−δ/2, δ/2], we obtain
Z ∞
Z δ/2
2
2
∗
2
|S (ξr )| ≤
|ĝ(u)| du
|S (u + ξr )| du .
−∞
−δ/2
Since, by Plancherel’s theorem,
Z
Z ∞
2
|ĝ(u)| du =
∞
−∞
−∞
2
|g(u)| du =
it follows that
2
|S (ξr )| ≤ Ĝ(0)
Hence, by (4.2) and (4.3),
R
X
r=1
2
Z
G(u) du = Ĝ(0),
−∞
ξr +δ/2
ξr −δ/2
|S (ξr )| ≤ Ĝ(0)
∞
Z
Z
|S ∗ (x)|2 dx.
T 0 +T
|S ∗ (x)|2 dx.
T0
(4.6)
Next, we bound the integral on the right side of (4.6). Let bk = ak g(νk )−1 . We have
Z T 0 +T
Z ∞
K X
K
X
∗
2
∗
2
|S (x)| dx ≤
H(x)|S (x)| dx =
bk b̄l Ĥ(νl − νk ).
−∞
T0
k=1 l=1
By (4.4), Ĥ(νl − νk ) vanishes unless k = l. Thus,
Z T 0 +T
K
K
K
X
X
X
∗
2
2
2
−1
|S (x)| dx ≤ Ĥ(0)
|bk | = Ĥ(0)
|ak | G(νk ) ≤ Ĥ(0)
|ak |2 .
T0
k=1
k=1
k=1
Combining this inequality and (4.6), we get
R
X
r=1
2
|S (ξr )| ≤ Ĝ(0)Ĥ(0)
so the desired conclusion follows from the identities
Ĝ(0) = N + δ−1 ,
K
X
k=1
|ak |2 ,
Ĥ(0) = T + ∆−1 .
66
Corollary 4.5. Assume the notation of Lemma 4.4. Suppose that T ≤ 1 and all νk ’s are integers.
Then
R
K
X
X
2
−1
|S (ξr )| ≤ N + δ
|ak |2 .
r=1
k=1
Proof. Under the present hypotheses, we can estimate the right side of (4.6) as
Z T 0 +T
Z 1
K
K
X
X
∗
2
∗
2
2
|S (x)| dx ≤
|S (x)| dx =
|bk | ≤
|ak |2 ,
T0
0
k=1
k=1
where we have used Parseval’s identity.
Corollary 4.6. Suppose that N, Q are positive integers and a M , . . . , a M+N are complex numbers.
Then
2
X X M+N
X
X
M+N
2
a
e(bn/q)
≤
N
+
Q
|an |2 .
(4.7)
n
q≤Q 1≤b≤q
(b,q)=1
n=M
n=M
Proof. When Q = 1, (4.7) follows from Cauchy’s inequality, so we may assume that Q ≥ 2. Then,
we can view the double sum over q and b as a sum over all reduced fractions b/q, q ≤ Q, such that
(b/q) ∈ Q−1 , 1 ⊂ 2Q−2 , 1 .
If b0 /q0 and b00 /q00 are two such fractions, we have
0
00 0 00
00 0
b
− b = |b q − b q | ≥ 1 ,
q0 q00 q0 q00
Q2
unless b0 = b00 and q0 = q00 . Thus, (4.7) follows from Corollary 4.5 with δ = Q−2 .
We now turn to averages of character sums.
Lemma 4.7. Suppose that q, M, N are positive integers and a M+1 , . . . , a M+N are complex numbers.
Then
2
X M+N
X
X
M+N
a
χ(n)
≤
N
+
φ(q)
|an |2 .
(4.8)
n
χ mod q
n=M+1
n=M+1
Proof. The sum on the left of (4.8) equals
X∗
X
X∗
X
am ān
χ(m)χ̄(n) =
am ān
χ(mn̄),
m,n
m,n
χ mod q
χ mod q
P∗
where m,n denotes a summation restricted to integers relatively prime to q and m̄ is the multiplicative inverse of m modulo q: mm̄ ≡ 1 (mod q). By (3.5), the latter sum is
X
X
X
1
|am |2 + |an |2 ≤ φ(q) Nq−1 + 1
≤ φ(q)
|am an | ≤ φ(q)
|am |2 ,
2
m≡n (mod q)
m
m≡n (mod q)
and (4.8) follows.
67
If we want to sum over q as well, we need to restrict our attention to primitive characters or to
impose some restrictions on the coefficients an . The next lemma provides such a result.
Lemma 4.8. Suppose that Q, M, N are positive integers and a M+1 , . . . , a M+N are complex numbers.
Then
2
X
X q X∗ M+N
X
M+N
2
a
χ(n)
≤
N
+
Q
|an |2 .
n
φ(q)
q≤Q
χ mod q n=M+1
n=M+1
Proof. When χ is primitive, Lemmas 3.5 and 3.6 yield
M+N
2 M+N
2 2
X
X
X
q
an χ(n) = an τ(χ̄, n) = χ̄(b)S (b/q) ,
n=M+1
where
n=M+1
S (α) =
N
X
1≤b≤q
(b,q)=1
|ãn | = |an |.
ãn e(αn),
n=1
Hence,
2
M+N
2
X X
1
q X∗ X
≤
a
χ(n)
χ̄(b)S
(b/q)
n
φ(q)χ mod q n=M+1
φ(q) χ mod q 1≤b≤q
(b,q)=1
=
X
1
S (b1 /q)S (b2 /q)
|S (b/q)|2 ;
χ(b2 b̄1 ) =
φ(q) 1≤b ,b ≤q
1≤b≤q
χ mod q
X
1
X
2
(b1 b2 ,q)=1
(b,q)=1
here b̄1 denotes the multiplicative inverse of b1 modulo q. Thus, the lemma follows from (4.7). Next, we consider averages of Dirichlet polynomials of the form
D(s) =
N
X
an n−s .
(4.9)
n=1
Lemma 4.9. Suppose that δ > 0 and t1 < t2 < · · · < tR are real numbers such that
T 0 + 12 δ ≤ tr ≤ T 0 + T − 21 δ
and
|tr1 − tr2 | ≥ δ > 0
whenever r1 , r2 .
Further, suppose that a1 , . . . , aN are complex numbers and D(s) is defined by (4.9). Then
R
X
r=1
|D(itr )|2 ≤ δ−1 +
1
2π
N
X
log N T + 2πN
|an |2 .
n=1
68
(4.10)
Proof. We apply Lemma 4.4 with ξr = tr and νn = − 2π1 log n. Then
− 2π1 log N ≤ νn ≤ 0,
and for 1 ≤ m < n ≤ N,
2π|νm − νn | = log(n/m) ≥
(
log 2 if n ≥ 2m,
N −1 if n < 2m.
We note that the estimate in the case m < n < 2m uses the inequalities
n n−m
n − m
1 1 1
log
≥
1−
≥
1−
≥ .
m
m
2m
N−1
2N − 2
N
Thus, in the application of Lemma 4.4, N =
consequence of (4.5).
1
2π
log N and ∆ = (2πN)−1 and (4.10) is an immediate
Lemma 4.10. Suppose that a1 , . . . , aN are complex numbers and D(s) is defined by (4.9). Then
Z T 0 +T
N
X
2
|D(it)| dt ≤ T + 2πN
|an |2 .
T0
n=1
Proof. We argue similarly to the second part of the proof of Lemma 4.4. Let ∆ = (2πN)−1 and
G(z) = F(z; T 0 , T, ∆), where F(z) is the function from Lemma 4.1. Then
Z ∞
Z T 0 +T
N X
N
X
2
2
1
am ānĜ 2π
G(t)|D(it)| dt =
|D(it)| dt ≤
log(n/m) .
−∞
T0
m=1 n=1
1
| log(n/m)|
2π
Recalling from the proof of the previous lemma that
≥ ∆ unless m = n, we get
Z T 0 +T
N
N
X
X
2
2
|D(it)| dt ≤ Ĝ(0)
|an | = T + 2πN
|an |2 .
T0
n=1
n=1
4.3 Dirichlet polynomials with characters: a hybrid sieve
So far we have obtained large-sieve results in the form of inequalities with two terms on the right
side, one of which corresponds to the maximum size of the summands and the other to the mean
square of the function times the number of terms. In applications to the distribution of primes, we
sometimes consider Dirichlet polynomials of the form
D(s, χ) =
N
X
an χ(n)n−s .
(4.11)
n=1
We can sieve D(s, χ) over χ or over s; we can also sieve over both χ and s and obtain a hybrid
result. Oftentimes, such hybrid results are superior to either of the estimates resulting from sieving
over one of χ or s and then summing (or integrating) over the other.
69
Lemma 4.11. Suppose that q, Q, N are positive integers, T 0 and T are real, and D(s, χ) is defined
by (4.11). Then
N
X Z T 0 +T
X
|D(it, χ)|2 dt N + φ(q)T
|an |2
(4.12)
χ mod q
and
T0
n=1
N
X q X∗ Z T 0 +T
X
2
2
|D(it, χ)| dt N + Q T
|an |2 .
φ(q)
q≤Q
n=1
χ mod q T 0
(4.13)
Proof. Let G(z) = F(z; T 0 , T, T −1 ) and g(z) = f (z; T 0, T, T −1 ), where F and f are the functions
from Lemma 4.1 and Corollary 4.3, respectively. Then
Z T 0 +T
Z ∞
Z ∞
2
2
|D(it, χ)| dt ≤
G(t)|D(it, χ)| dt =
|g(t)D(it, χ)|2 dt.
(4.14)
−∞
T0
−∞
Note that
g(−t)D(−it, χ) = Ŝ χ (t),
S χ (x) =
N
X
an χ(n)ĝ x +
log n .
1
2π
n=1
Applying Plancherel’s theorem to the right side of (4.14), we find that
Z T 0 +T
Z ∞
2
|D(it, χ)| dt ≤
|S χ (x)|2 dx.
−∞
T0
Hence,
T 0 +T
X Z
χ mod q
2
|D(it, χ)| dt ≤
T0
Z
∞
X
−∞ χ mod q
|S χ (x)|2 dx.
(4.15)
Since ĝ(x) is supported in the interval |x| ≤ (2T )−1 , the summation in S χ (x) is supported in an
interval of length
say.
≤ e−2πx eπ/T − e−π/T T −1 e−2πx = H(x),
Thus, an appeal to Lemma 4.7 gives
X
χ mod q
N
X
an ĝ x +
|S (x, χ)| H(x) + φ(q)
2
n=1
Inserting this bound into the right side of (4.15), we obtain
X Z
χ mod q
T 0 +T
T0
2
|D(it, χ)| dt Z
∞
−∞
N
X
n=1
1
2π
2
log n .
N
X
an ĝ x +
H(x) + φ(q)
1
2π
n=1
|an |2
Z
∞
−∞
70
H(x) + φ(q) ĝ x +
2
log n dx
1
2π
2
log n dx.
1
We now observe that the last integral is supported in the interval x + 2π
log n ≤ (2T )−1 . For such
values of x, we have H(x) nT −1 , whence
Z
N
X Z T 0 +T
X
2 ∞
2
−1
ĝ x + 1 log n 2 dx.
(4.16)
|D(it, χ)| dt nT + φ(q) |an |
2π
χ mod q
T0
−∞
n=1
Finally, we observe that the integral on the right side of (4.16) equals
Z ∞
Z ∞
Z ∞
2
2
|ĝ(x)| dx =
|g(t)| dt =
G(t) dt = 2T,
−∞
−∞
−∞
and so (4.12) follows from (4.16). The proof of (4.13) is similar, using Lemma 4.8 instead of
Lemma 4.7.
An important alternative form of the hybrid large sieve is the following. Consider a collection
of Dirichlet characters, such as the collection of all characters modulo q or the collection of all
primitive characters to moduli q ≤ Q. Suppose that for each character χ in that collection we have
selected real numbers t1 (χ) < t2 (χ) < · · · < tR (χ), R = R(χ), such that
T 0 + δ/2 ≤ tr (χ) ≤ T 0 + T − δ/2
and
|tr (χ) − t s (χ)| ≥ δ > 0
whenever r , s.
We will refer to such a collection S of pairs (χ, tr (χ)) of characters and real numbers as a δ-spaced
set of points. When the given collection of characters is that of all characters modulo q or of all
primitive characters to moduli q ≤ Q, we further define
|S| = φ(q)T
|S| = Q2 T,
or
respectively.
Lemma 4.12. Suppose that D(s, χ) is defined by (4.11) and S is a δ-spaced set of points (χ, tr (χ))
of one of the two special kinds described above. Then
X
(χ,tr (χ))∈S
N
X
|D(itr (χ), χ)|2 δ−1 + log N N + |S|
|an |2 .
n=1
Proof. Fix a character χ and consider the respective numbers t1 (χ), . . . , tR (χ). Following the proof
of Lemma 4.4 (with νn = log n, M = 0, N = log N, and ∆ = N −1 ) up to (4.6), we get
R(χ)
X
r=1
2
−1
|D(itr (χ), χ)| ≤ δ
where
∗
D (s, χ) =
+ log N
N
X
n=1
71
Z
T 0 +T
T0
bn χ(n)n−s ,
|D∗ (it, χ)|2 dt,
(4.17)
with coefficients satisfying
N
X
n=1
N
X
2
|bn | ≤
n=1
|an |2 .
(4.18)
Summing (4.17) over all characters appearing in S, we obtain
Z
X
X T 0 +T ∗
2
−1
|D(itr (χ), χ)| ≤ δ + log N
|D (it, χ)|2 dt.
(4.19)
T0
χ
(χ,tr (χ))∈S
The desired result follows from (4.18), (4.19), and Lemma 4.11.
Exercises
1. For z ∈ C, define
H(z) =
sin2 πz
π2
X
∞
sgn(n)
−1
,
+
2z
(z − n)2
n=−∞
K(z) =
The Beurling–Selberg function B(z) from §4.1 is then B(z) = H(z) + K(z).
Rx
(a) Let ψ(x) = {x} − 1/2 and ψ1 (x) = 0 ψ(u) du. Show that when x > 0,
∞
X
n=1
1
1
= −2
(x + n)2
x
Z
∞
0
sin πz 2
1
1
{u} du
= − 2 +6
(u + x)3
x 2x
πz
Z
0
∞
.
ψ1 (u) du
.
(u + x)4
(b) Prove that |H(x)| ≤ 1 for all x ∈ R.
(c) Prove that |H(x) − sgn(x)| ≤ K(x) for all x ∈ R.
(d) Prove that B(x) satisfies property (B1 ) in §4.1.
(e) Prove that B(z) has exponential type 2π.
(f) By part (c), B(x) − sgn(x) is integrable, and so
Z
∞
−∞
B(x) − sgn(x) dx = lim
Use this to show that
Z
∞
−∞
(g) Show that
K(z) =
Z
N→∞
Z
N
B(x) − sgn(x) dx.
−N
B(x) − sgn(x) dx =
Z
∞
K(x) dx.
−∞
1
−1
(1 − |t|)e(tz) dt
and
K̂(t) = max(1 − |t|, 0).
Combine the latter identity and the result of part (f) to prove that B(x) satisfies property (B3 ) in §4.1.
Remark. Note that we stopped just short of establishing property (B2 ) in §4.1: if B(x) belonged to L2 (R),
we would be able to deduce (B2 ) from (e) above and the Paley–Wiener theorem, but of course, B(x) does not
belong to L p (R) for any p < ∞. On the other hand, the function F(z) constructed in Lemma 4.1 does belong to
L2 (R), so the above properties of B(z) suffice to prove that that function has all the desired properties.
72
2. Suppose that T > 0 and ν1 , ν2 , . . . , νK are real numbers, and define
K
X
S (t) =
ak e(νk t),
k=1
where a1 , a2 , . . . , aK are complex numbers. Prove that
Z
T
−T
2
|S (t)| dt ≤ (πT )
2
Z
∞
−∞
X
|νk −x|≤(2T )−1
2
ak dx.
H: Let δ = (2T )−1 . Start by arguing similarly to the proof of (4.15), but choose g(t) = (sin πδt)/(πδt)
−2
and G(t) = g(t)2 . Then
G(t) ≥ 4π for all t ∈ [−T, T ] and ĝ(x) is the characteristic function of [−δ/2, δ/2],
1
normalized in L (R).
3. Consider the Ramanujan sum
cq (n) =
X
e(bn/q).
1≤b≤q
(b,q)=1
(a) Prove that cq (n) is multiplicative as a function of q, that is, cq1 q2 (n) = cq1 (n)cq2 (n) whenever (q1 , q2 ) = 1.
(b) Prove that
cq (n) = φ(q)µ
4.
q
(q, n)
−1
q
.
φ
(q, n)
(a) Let N x denote the set of integers not divisible by primes p > x. Prove that
X
X
µ(n)2 φ(n)−1 =
n−1 ≥ log x.
n≤x
n∈N x
(b) Suppose that q is a positive integer. Prove that
X
n≤x
(n,q)=1
φ(q) X
µ(n)2 φ(n)−1 .
q n≤x
µ2 (n)φ(n)−1 ≥
5. Suppose that N is a set of positive integers contained in [M, M + N]. For each q ≤ Q, define
Rq = h ∈ Z : 1 ≤ h ≤ q, (n − h, q) = 1 for all n ∈ N ,
ω(q) = |Rq |.
The purpose of this exercise is to prove that
|N| ≤ N + Q
2
Y ω(p) −1
.
µ(q)
p − ω(p)
q≤Q
p|q
X
2
(a) Prove that ω(q) is multiplicative.
(b) Define
S (α) =
X
e(αn).
n∈N
Use the result of Exercise 3 to show that
X X
S (b/q)e(−bh/q) = µ(q)ω(q)|N|.
h∈Rq 1≤b≤q
(b,q)=1
73
(∗)
(c) Suppose that q is squarefree. Prove that
2
X X
X dµ(q/d)
Y
e(−bh/q) = ω(q)2
= ω(q)
p − ω(p) .
ω(d)
1≤b≤q h∈Rq
(b,q)=1
d|q
p|q
(d) Show that
Y ω(p) X X
≤
|S (b/q)|2.
|N|
µ(q)
p
−
ω(p)
q≤Q 1≤b≤q
q≤Q
p|q
2
X
2
(b,q)=1
Use this inequality and Corollary 4.6 to establish (∗).
6. Suppose that M, N, Q are positive integers with Q ≤ M and let N be the set of primes in [M + 1, M + N]. Apply
the result of the previous exercise to show that
π(M + N) − π(M) ≤ N + Q2
X
µ2 (q)φ(q)−1
q≤Q
−1
.
Deduce that
π(M + N) − π(M) ≤ N + Q2 (log Q)−1 .
Upon choosing Q = N 1/2 (log N)−1/2 , this provides a Chebyshev-type upper bound for primes in short intervals:
N
log log N
π(M + N) − π(M) ≤
2+O
,
log N
log N
whenever N > 2 and M ≥ N 1/2 .
7. Suppose that M, N, q are positive integers and a is an integer with (a, q) = 1. Generalize the result of the
previous exercise to prove that
log log(N/q)
N
2+O
,
π(M + N; q, a) − π(M; q, a) ≤
φ(q) log(N/q)
log(N/q)
whenever N ≥ 3q and M ≥ (N/q)1/2. This result is one of the many versions of the Brun–Titchmarsh inequality.
The sharpest result in this direction was obtained by Montgomery and Vaughan [42]:
π(M + N; q, a) − π(M; q, a) ≤
whenever N > q.
74
2N
,
φ(q) log(N/q)
Chapter 5
Applications of the large sieve
5.1 Sums over primes and double sums
Suppose that the function f : N → C is such that, when x is large, the sums
X
f (n)
n≤x
exhibit certain cancellation, and that we want to show that the same is true for the sums
X
f (p).
(5.1)
p≤x
The first general method for obtaining such results was developed by I. M. Vinogradov in the late
1930s. His starting point is the sieve of Eratosthenes. Let P(z) denote the product of all primes
p ≤ z and write P x = P(x1/2 ). Then
X
X
f (n) = f (1) +
f (p),
(5.2)
n≤x
(n,P x )=1
x1/2 <p≤x
since the only numbers n ≤ x that are not divisible by any prime ≤ x1/2 are 1 and the primes in
(x1/2 , x]. Using the properties of the Möbius function, we can write the sum on the left side of (5.2)
as
X
X
X
X
X
f (n) =
f (n)
µ(d) =
µ(d)
f (md).
n≤x
(n,P x )=1
n≤x
d|(n,P x )
d|P x
m≤x/d
The crux of Vinogradov’s method is a clever (and complicated) combinatorial argument that decomposes the latter sum into several subsums of two major types:
• type I sums: double sums of the form
X X
am f (mn),
(5.3)
m≤M n≤x/m
where M is not too large and the coefficients am are small on average, but otherwise arbitrary;
75
• type II sums: double sums of the form
XX
am bn f (mn),
(5.4)
m≤M n≤N
where M and N are neither too small nor too large and the coefficients am , bn are small on
average, but otherwise arbitrary.
This reduces the estimation of (5.1) to the estimation of type I and type II double sums.
In 1977 Vaughan [53] found an alternative way for decomposing sums over primes into double
sums that is much more straightforward than Vinogradov’s. His result is as follows.
Lemma 5.1 (Vaughan). Suppose that 2 ≤ U, V < X. Then
X
Λ(n) f (n) = Σ1 − Σ2 − Σ3 ,
(5.5)
U<n≤X
where
Σ1 =
X X
µ(m)(log k) f (mk),
Σ2 =
m≤V U<mk≤X
and
X X
am f (mk),
m≤UV U<mk≤X
Σ3 =
XX
Λ(m)bk f (mk),
m>U k>V
mk≤X
with coefficients |am | ≤ log m and |bk | ≤ d(k).
Proof. Our main tool is the identity
ζ 0 (s)
= L(s) − M(s)ζ 0 (s) − L(s)M(s)ζ(s) +
−
ζ(s)
ζ 0 (s)
−
− L(s) 1 − M(s)ζ(s) ,
ζ(s)
(5.6)
in which we choose L(s) and M(s) to be the Dirichlet polynomials
X
X
L(s) =
Λ(n)n−s
and
M(s) =
µ(n)n−s .
n≤U
n≤V
Suppose that n > U. Comparing the coefficients of n−s in the Dirichlet series representations of the
left and right sides of (5.6) we obtain the following identity for Λ(n):
X
X
X
X
µ(u) .
Λ(u)µ(v) +
Λ(m) −
µ(m)(− log k) −
Λ(n) = −
mk=n
m>U,k>V
uvk=n
u≤U,v≤V
mk=n
m≤U
uv=k
u≤V
Multiplying both sides of by f (n) and summing over U < n ≤ X, we obtain (5.5) with
X
X
am =
Λ(u)µ(v),
bk =
µ(u).
uv=m
u≤U,v≤V
Clearly, |am | ≤
P
u|m
uv=k
u≤V<uv
Λ(u) = log m and |bk | ≤ d(k).
76
Heath-Brown [22] proposed a different decomposition for von Mangoldt’s function, which
provides more flexibility than Lemma 5.1 and sometimes leads to superior results. Like Vaughan’s,
Heath-Brown’s identity arises from an identity for ζ 0 (s)/ζ(s). In this case, the underlying identity
is
k
ζ 0 (s) X
j−1 k
ζ(s) j−1 ζ 0 (s)M(s) j + ζ(s)−1 (1 − ζ(s)M(s))k ζ 0 (s),
(5.7)
=
(−1)
ζ(s)
j
j=1
here k ≥ 1 is an integer and M(s) is the Dirichlet polynomial
X
M(s) =
µ(n)n−s .
n≤X
Suppose that n is an integer with n ≤ X k and consider the coefficients of n−s on both sides of (5.7).
The coefficient of n−s on the left side of (5.7) is −Λ(n) and the last term on the right side of (5.7)
does not contribute to the coefficient of n−s . We thus find that
k X
X
k
(−1) j
(log m1 )µ(m j+1 ) · · · µ(m2 j ),
(5.8)
Λ(n) =
j
m1 ···m2 j =n
j=1
m1 ,...,m j ≤X
whenever n ≤ X k . We will come back to Heath-Brown’s identity when we discuss the distribution
of primes in short intervals later in this chapter.
5.2 The Bombieri–Vinogradov theorem
In this section we will use Vaughan’s identity, the large sieve for character sums in the form of
Lemma 4.8, and the Pólya–Vinogradov theorem to establish the following result equivalent to
Theorem 3.
Theorem 5.2. Suppose that 2 ≤ Q ≤ x. Then, for any fixed A > 0,
X
y max max ψ(y; q, a) −
x(log x)−A + Qx1/2 (log x)5 .
(a,q)=1 y≤x
φ(q)
q≤Q
5.2.1 Preparations
Define
(
1
δχ =
0
if χ is principal,
otherwise.
By (3.58),
ψ(y; q, a) −
y
1 X
χ̄(a) ψ(y, χ) − δχ y ,
=
φ(q) φ(q) χ mod q
77
(5.9)
whence
1 X y max ψ(y; q, a) −
ψ(y, χ) − δχ y.
≤
(a,q)=1
φ(q)
φ(q) χ mod q
Writing Σ(x, Q) for the left side of (5.9), we find that
X 1 X
max ψ(y, χ) − δχ y = Σ0 + Σ1 ,
Σ(x, Q) ≤
φ(q) χ mod q y≤x
q≤Q
say,
(5.10)
where Σ0 denotes the contribution from the principal characters and Σ1 denotes the contribution
from all the other characters. By (3.60) (that inequality holds for all q ≤ x) and the elementary
bound (see Exercise 1)
X
φ(mn)−1 φ(m)−1 log z,
(5.11)
n≤z
we have
X 1
√
Σ0 x exp − c1 log x
x(log x)−A .
φ(q)
q≤Q
(5.12)
As usual, for a non-principal character χ modulo q, we denote by χ∗ the primitive character
inducing χ. By (3.56),
X 1 X
Σ1 max |ψ(y, χ∗ )| + Q(log x)2 .
y≤x
φ(q)
q≤Q
χ mod q
Rearranging the sum over the characters as to combine the contributions of all characters to moduli
q = rq1 ≤ Q induced by the same primitive character χ modulo r, we deduce that
X X∗
X
1
Σ1 max |ψ(y, χ)|
+ Q(log x)2
y≤x
φ(rq1 )
r≤Q χ mod r
q1 ≤Q/r
X 1 X∗
(log x)
max |ψ(y, χ)| + Q(log x)2 ,
(5.13)
φ(r)χ mod r y≤x
r≤Q
where we have used (5.11) again. We can estimate the contribution from the “small” moduli r
using the Siegel–Walfisz theorem. Indeed, by (3.60) and (3.61), we have
√
max |ψ(y, χ)| x exp − c(A) log x
y≤x
for all primitive characters to moduli r ≤ (log x)A+5 = Q0 , say. Thus,
X 1 X∗
√
max |ψ(y, χ)| xQ0 exp − c(A) log x x(log x)−A−1 .
φ(r)χ mod r y≤x
r≤Q
0
Combining this inequality and (5.13), we obtain
Σ1 (log x)Σ2 + x(log x)−A + Q(log x)2 ,
where
Σ2 =
1 X∗
max |ψ(y, χ)|.
φ(r)χ mod r y≤x
Q <r≤Q
X
0
78
(5.14)
5.2.2 Application of Vaughan’s identity
By (5.5) with f (n) = χ(n), X = y, and U = V ≤ x1/2 (we will specify our choice of U later),
ψ(y, χ) = S 1 (y, χ) − S 2 (y, χ) − S 3 (y, χ) + ψ(U, χ),
where S j (y, χ) denotes the sum Σ j on the right side of (5.5). Hence,
Σ2 Σ3 + Σ4 + Σ5 + QU,
where
Σj =
1 X∗
max |S j−2 (y, χ)|
φ(r)χ mod r y≤x
Q <r≤Q
X
(5.15)
( j = 3, 4, 5).
0
We can estimate Σ3 right away. By partial summation,
X X
X X
S 1 (y, χ) ≤
(log k)χ(k) (log y)
χ(k)
m≤U
m≤U
U<mk≤y
U<mk≤z
for some z with U < z ≤ y. Thus, by the Pólya–Vinogradov inequality,
max |S 1 (y, χ)| r1/2 U(log x)2 .
y≤x
We conclude that
Σ3 Q3/2 U(log x)2 .
(5.16)
Next we estimate Σ5 using the large sieve. We then split Σ4 into two subsums: one similar to
Σ3 and one similar to Σ5 .
5.2.3 Estimation of Σ5
Suppose that a1 , . . . , a M and b1 , . . . , bK are complex numbers. Then, by Cauchy’s inequality and
Lemma 4.8,
M K
X
X r X∗ X
a
b
χ(mk)
m
k
φ(r)
r≤R
χ mod r m=1 k=1
2 1/2 2 1/2
M
K
X
X r X∗ X
r X∗ X
am χ(m)
bk χ(k)
φ(r)χ mod r m=1
φ(r)χ mod r k=1
r≤R
r≤R
X
1/2 X
1/2
M
K
2 1/2
2 1/2
2
2
M+R
K +R
|am |
|bk |
.
m=1
(5.17)
k=1
We would like to apply this bound to Σ5 , but before we can do that we must deal with the summation
condition mk ≤ y appearing in the definition of S 3 (y, χ).
79
We start by splitting the interval U < m ≤ xU −1 into O(log x) subintervals M < m ≤ M1 such
that M1 ≤ 2M. Then, for some choice of M, M1, we have
X
X
S 3 (y, χ) (log x)
Λ(m)bk χ(mk).
(5.18)
M<m≤M1 U<k≤y/m
Next, we use Perron’s formula (Lemma 1.13) with α = (log x)−1 , T = x2 , and u = y/(mk). We get
Z T
X
X
yα+it
1
S (χ, t)
dt + O(∆),
(5.19)
Λ(m)bk χ(mk) =
2π
α
+
it
−T
M<m≤M U<k≤y/m
1
where
S (χ, t) =
X
X
Λ(m)bk χ(mk)(mk)−α−it ,
∆=
M<m≤M1 U<k≤xM −1
yα X X Λ(m)d(k)
.
T M<m≤M
| log(y/mk)|
−1
1
k≤yM
If we assume, as we may, that kyk = 21 , we have | log(y/mk)| ≥ y−1 . Hence, by the PNT and
Theorem 1.22,
X X
∆ x−1
Λ(m)d(k) log x.
M<m≤M1 k≤yM −1
Note also that
T
Z
−T
|α + it|−1 dt log x.
Substituting these bounds into (5.19), we find that, for some |t0 | ≤ T ,
X
X
Λ(m)bk χ(mk) |S (χ, t0 )| + 1 log x.
M<m≤M1 U<k≤y/m
Since S (χ, t0 ) is independent of y, combining this inequality and (5.18), we get
max |S 3 (y, χ)| (log x)2 |S (χ, t0)| + 1 .
y≤x
Thus,
Σ5 (log x)2
1 X∗
|S (χ, t0 )| + Q(log x)2 .
φ(r)
Q <r≤Q
χ mod r
X
(5.20)
0
We now observe that
X
Λ(m)2 m−2α M log x
and
M<m≤M1
X
U<k≤xM −1
d(k)2 k−2α xM −1 (log x)3 ;
the former bound follows from the PNT and the latter from Theorem 1.23. Hence, (5.17) yields
X r X∗
|S (χ, t0 )| (log x)2 x + xRU −1/2 + x1/2 R2 .
φ(r)χ mod r
r≤R
From this inequality and (5.20), we derive
−1/2
(log x) + x1/2 Q .
Σ5 (log x)4 xQ−1
0 + xU
80
(5.21)
5.2.4 Completion of the proof
Suppose that U = V ≤ x1/3 . Then we can write S 2 (y, χ) as
S 2 (y, χ) = S 20 (y, χ) + S 200(y, χ),
where S 20 is the portion of S 2 where m ≤ U and S 200 is the portion where U < m ≤ UV ≤ xU −1 . We
have
X X
0
|S 2 (y, χ)| ≤ (log x)
χ(k),
m≤U
U<mk≤y
S 20
so the contribution of
to Σ4 can be bounded similarly to Σ3 . Moreover, we can estimate the
00
contribution of S 2 to Σ4 similarly to Σ5 , and the resulting bound is slightly sharper, because in this
case we apply (5.17) with coefficients am and bk subject to |am | ≤ log m and |bk | ≤ 1. Altogether,
we conclude that
−1/2
1/2
Σ4 (log x)4 Q3/2 U + xQ−1
+
xU
+
x
Q
.
(5.22)
0
Combining (5.10), (5.12), (5.14)–(5.16), (5.21), and (5.22), we get
−1/2
Σ(x, Q) (log x)5 Q3/2 U + xQ−1
(log x) + x1/2 Q ,
0 + xU
where U ≤ x1/3 is a parameter at our disposal. Any choice of U subject to
(log x)2A+12 ≤ U ≤ (x/Q)1/2
then yields (5.9). This proves the theorem when Q ≤ x(log x)−4A−24 ; in the alternative case, (5.9) is
worse than the trivial bound for Σ(x, Q).
5.3 The Barban–Davenport–Halberstam theorem
Using the large sieve and reductions such as those leading to (5.10), we can also establish the
following result.
Theorem 5.3. Suppose that 2 ≤ Q ≤ x. Then, for any fixed A > 0,
X X x 2
x2 (log x)−A + Qx log x.
ψ(x; q, a) −
φ(q)
q≤Q 1≤a≤q
(5.23)
(a,q)=1
The first results of this form were obtained by Barban [3] and Davenport and Halberstam [15];
hence, the name of the theorem. Note that by (5.23), the error term in the prime number theorem
for arithmetic progressions is O (x/q)1/2+ , at least on average over all progressions to moduli
q ≤ Q. Except when the modulus q is very small, so strong a bound for an individual progression
does not follow even from GRH! Furthermore, (5.23) appears to be (essentially) the best result
within the reach of present methods. Indeed, a substantial improvement of the first term on the
right side of (5.23) would yield a subsequent improvement on Theorem 2 (see Exercise 3). While
81
such an improvement would represent a significant achievement in the field, it seems more likely
that it would occur independent of the above problem than as a consequence to it. As to the second
term in the bound (5.23), Montgomery [41, Ch. 17] has shown that when Q ≥ x(log x)−A ,
X X x 2
as x → ∞.
ψ(x; q, a) −
∼ Qx log x
φ(q)
q≤Q 1≤a≤q
(a,q)=1
Thus, the second term on the right side of (5.23) is needed when Q is large.
5.4 The three primes theorem
Our goal in this section is to establish the following result from additive prime number theory.
Theorem 5 (I. M. Vinogradov). For a positive integer n, define
X
R(n) =
(log p1 )(log p2 )(log p3 ),
p1 +p2 +p3 =n
where the summation is over all representations of n as the sum of three primes. Then, for any
given A > 0,
R(n) = 21 n2 S(n) + O n2 (log n)−A ,
(5.24)
where
S(n) =
Y
p|n
1 − (p − 1)−2
Y
p-n
1 + (p − 1)−3 .
(5.25)
In particular, every sufficiently large odd integer is the sum of three primes.
This theorem was first proved in 1923 by Hardy and Littlewood [21] under the assumption
of GRH. In 1937 Vinogradov [57] applied his method for estimating sums over primes to the
exponential sum f (α) below to give an unconditional proof of the three primes theorem.
5.4.1 The Hardy–Littlewood circle method
Using the orthogonality relation
Z
0
1
(
1 if m = 0,
e(αm) dα =
0 if m , 0,
(5.26)
we can express R(n) as a Fourier integral. Indeed, by (5.26),
Z 1
X
R(n) =
(log p1 )(log p2 )(log p3 )
e α(p1 + p2 + p3 − n) dα
p1 ,p2 ,p3 ≤n
=
Z 1X
0
p≤n
0
3
(log p)e(αp) e(−αn) dα.
82
(5.27)
This identity is the starting point of the application of the circle method: we will use it to derive
the asymptotic formula for R(n) from estimates for the exponential sum
X
f (α) =
(log p)e(αp).
(5.28)
p≤n
Our analysis is motivated by two observations:
• when α is near a rational number a/q with a small denominator, f (α) should be large and
should have certain asymptotic behavior, suggested by the behavior of f (a/q);
• otherwise, the numbers e(αp), p ≤ n, should be approximately uniformly distributed on the
unit circle, and hence, f (α) should be “small”.
Let B = B(A) be a positive number to be chosen later and set
P = (log n)B .
If a and q are integers, we define the major arc1
M(q, a) = a/q − P/(qn), a/q + P/(qn) .
(5.29)
(5.30)
The
−1integration
in (5.27) can be taken over any interval of unit length, and in particular, over
Pn , 1 + Pn−1 . We partition this interval into two subsets:
[ [
M(q, a)
and
m = Pn−1 , 1 + Pn−1 \ M,
(5.31)
M=
q≤P 1≤a≤q
(a,q)=1
called respectively the set of major arcs and the set of minor arcs. Then, from (5.27) and (5.31),
Z
Z R(n) =
+
f (α)3 e(−αn) dα.
(5.32)
M
m
In §5.4.2, we use the Siegel–Walfisz theorem to prove that
Z
f (α)3 e(−αn) dα = 21 n2 S(n) + O n2 P−1
(5.33)
M
for any choice of P. Then, in §5.4.3, we show that
Z
f (α)3e(−αn) dα n2 (log n)−A
(5.34)
m
for B ≥ 3A + 18. Clearly, the asymptotic formula (5.24) follows from (5.32)–(5.34).
1
This term may seem a little peculiar, considering that M(q, a) is in fact an interval. The explanation is that, in the
original version of the circle method, Hardy and Littlewood used power series and Cauchy’s integral formula instead
of exponential sums and (5.26) (see Vaughan [54, §1.2]). In that setting, the role of M(q, a) is played by a small
circular arc near the root of unity e(a/q); hence, the terminology.
83
5.4.2 The major arcs
It is easy to see that the major arcs are comprised of mutually disjoint intervals M(q, a). Thus,
Z
X X Z
3
f (α) e(−αn) dα =
f (a/q + β)3 e − (a/q + β)n dβ.
(5.35)
M
q≤P 1≤a≤q
(a,q)=1
M(q,0)
We now proceed to approximate f (a/q + β). We will prove the following result.
Lemma 5.4. Suppose that B > 0, 1 ≤ a ≤ q ≤ (log n)B , (a, q) = 1, |β| ≤ n−1 (log n)B . Suppose also
that f (α) is defined by (5.28) and define
Z n
v(β) =
e(βx) dx.
0
Then
f (a/q + β) = µ(q)φ(q)−1 v(β) + O n(log n)−B .
Proof. We split the summation in f (α) according to the residue of p modulo q. We get
X X
(log p)e (a/q + β)p
f (a/q + β) =
p≤n
p≡h (mod q)
1≤h≤q
=
X
e(ah/q)
X
e(ah/q)
X
(log p)e(βp) + O(q).
(5.36)
p≤n
p≡h (mod q)
1≤h≤q
(h,q)=1
When (h, q) = 1, we have
X
(log p)e(βp)
p≤n
p≡h (mod q)
1≤h≤q
=
X
X
(log p)e(βp) =
p≤n
p≡h (mod q)
=
√ n
Λ(m)e(βm) + O
m≤n
m≡h (mod q)
Z n
e(βx) dψ(x; q, h) + O
0
By the Siegel–Walfisz theorem in the form of (3.62),
√ n .
∆(x; q, h) = ψ(x; q, a) − x/φ(q) n(log n)−3B ,
for all x ≤ n. Hence,
Z n
Z n
e(βx) d∆(x; q, h) |∆(n; q, h)| + 1 + |β|
|∆(x; q, h)| dx
0
2
Z n
−3B
n(log n) + |β|
n(log n)−3B dx n(log n)−2B .
0
84
(5.37)
Combining this estimate and (5.37), we get
X
p≤n
p≡h (mod q)
Z
1
(log p)e(βp) =
φ(q)
0
n
e(βx) dx + O n(log n)−2B .
(5.38)
Since (see Exercise 4.3)
cq (a) =
X
e(ah/q) = µ(q),
1≤h≤q
(h,q)=1
the desired conclusion follows from (5.36) and (5.38).
By Lemma 5.4 with 3B in place of B,
f (a/q + β)3 = µ(q)φ(q)−3 v(β)3 + O n3 P−3 .
Since the measure of M is O P2 n−1 , inserting this approximation into the right side of (5.35), we
obtain
Z
X µ(q)cq (−n) Z
3
3
2 −1
f (α) e(−αn) dα =
v(β)
e(−βn)
dβ
+
O
n
P
.
(5.39)
φ(q)3
M
M(q,0)
q≤P
At this point, we extend the integration over β to the whole real line. Since
v(β) n(1 + n|β|)−1 ,
the error we incur from doing this is
Z ∞
X
−2
φ(q)
P/(qn)
q≤P
(5.40)
X q2
n3 dβ
2 −2
n
P
n2 P−1 .
3
2
(1 + nβ)
φ(q)
q≤P
The last step uses the result of Exercise 5, which at the same time implies that
X
X
µ(q)cq (−n)φ(q)−3 φ(q)−2 1.
q≤P
q≤P
Hence, we deduce from (5.39) that
Z
f (α)3 e(−αn) dα = S(n, P)J(n) + O n2 P−1 ,
M
where
S(n, X) =
X
−3
µ(q)cq (−n)φ(q) ,
J(n) =
q≤X
Z
∞
v(β)3 e(−βn) dβ.
−∞
By Fourier’s inversion formula, J(n) = 21 n2 , and by Exercise 5,
X
S(n, P) − S(n, ∞) φ(q)−2 P−1 .
q>P
85
(5.41)
It follows that
Z
M
f (α)3 e(−αn) dα = 21 n2 S(n, ∞) + O n2 P−1 .
Finally, we remark that S(n, ∞) equals the product S(n) defined in (5.25). Indeed, this follows
from Lemma 1.16, on noting that the function g(q) = µ(q)cq (−n)φ(q)−3 is multiplicative and

−3

if u = 1 and p - n,
(p − 1)
u
−2
g(p ) = −(p − 1)
if u = 1 and p | n,


0
if u ≥ 2.
Thus, (5.33) is established.
5.4.3 The minor arcs
We now turn to (5.34). The modulus of the left side does not exceed
Z
Z 1
3
| f (α)| dα ≤ sup | f (α)|
| f (α)|2 dα.
m
m
(5.42)
0
By Parseval’s identity and the PNT,
Z 1
X
| f (α)|2 dα =
(log p)2 n log n.
0
p≤n
Thus, (5.34) will follow from (5.42), if we show that
sup | f (α)| n(log n)−A−1 .
(5.43)
m
We note that the trivial estimate for f (α) is
X
f (α) (log p) n,
p≤n
so our goal is to save a power of log n over the trivial estimate for f (α). We can do this using the
following lemma, which provides such a saving under the assumption that α can be approximated
by a reduced fraction whose denominator q is “neither too small, nor too large.”
Lemma 5.5. Suppose that α, δ are real and a, q are integers satisfying
1 ≤ q ≤ n,
(a, q) = 1,
|α − a/q| ≤ δ.
Then
f (α) (log n)5 (1 + δn) nq−1/2 + n5/6 + n2/3 q1/3 .
86
This estimate for f (α) is not quite the best known, but it has the advantage that it can be deduced
quickly from the work in §5.2. The sharpest known bound for f (α) is
f (α) (log n)4 nq−1/2 + n4/5 + n1/2 q1/2 ;
(5.44)
this holds under the standard assumption that |α − a/q| ≤ q−2 . For the proof of (5.44) see Vaughan
[54, Theorem 3.1] or Exercise 7 after the chapter. It is also possible to apply more carefully the
ideas used in the proof of Lemma 5.5 to prove, again when |α − a/q| ≤ q−2 , that
f (α) (log n)4 nq−1/2 + n7/8 q−1/8 + n3/4 q1/8 + n1/2 q1/2 .
For the proof of this result, see Vaughan [52]; that paper is also where identity (5.6) first appeared
and contains a proof of the (so far) strongest version of the Bombieri–Vinogradov theorem.
Proof. We will derive the lemma from the inequality
√ X
q
|ψ(x, χ)| (log qx)5 xq−1/2 + x5/6 + x2/3 q1/3 .
φ(q) χ mod q
(5.45)
First, we note that the contribution from the principal character modulo q is
≤ xq1/2 φ(q)−1 xq−1/2 log log q,
by Exercise 5(a). We estimate the average over the non-principal characters similarly to the sum
Σ2 in the proof of Theorem 5.2. In place of (5.15), we have
√ X
q
|ψ(x, χ)| q−1/2 (log qx) Σ03 + Σ04 + Σ05 + qU ,
φ(q) χ mod q
χ,χ0
where Σ0j is similar to Σ j and 1 ≤ U ≤ x1/3 is a parameter at our disposal. We can estimate each Σ0j
analogously to the respective Σ j , the only difference being that instead of (5.17), we appeal to the
inequality
M K
M
1/2 X
1/2
K
X X
X
1/2
1/2 X
2
2
.
am bk χ(mk) M + q
K +q
|am |
|bk |
χ mod q
m=1 k=1
m=1
k=1
Altogether, we obtain
√ X
q
|ψ(x, χ)| (log qx)5 xq−1/2 + xU −1/2 + x1/2 q1/2 + qU .
φ(q) χ mod q
We now choose
U = min x1/3 , (x/q)2/3 ,
87
so that
xU −1/2 x5/6 + x2/3 q1/3
qU x2/3 q1/3 .
and
Since x1/2 q1/2 ≤ x2/3 q1/3 , (5.45) follows.
We now turn to the estimation of f (α). We have
X
f (α) =
Λ(m)e(αm) + O n1/2 .
(5.46)
m≤n
By partial summation,
X
m≤n
X
Λ(m)e(αm) 1 + n|α − a/q| Λ(m)e(am/q),
(5.47)
m≤x
for some x ≤ n. Similarly to (3.56) and (3.58), we obtain
X
X
Λ(m)e(am/q) =
e(ah/q)ψ(x; q, h) + O (log qx)2
m≤x
1≤h≤q
(h,q)=1
=
1 X X
e(ah/q)χ̄(h)ψ(x, χ) + O (log qx)2
φ(q) χ mod q 1≤h≤q
(h,q)=1
1 X
=
τ(χ̄, a)ψ(x, χ) + O (log qx)2 .
φ(q) χ mod q
Since (a, q) = 1, it follows from Lemmas 3.5 and 3.6 that |τ(χ̄, a)| ≤
√
q. Hence,
√ X
q
|ψ(x, χ)| + (log qx)2 ,
Λ(m)e(am/q) φ(q)
m≤x
χ mod q
X
(5.48)
and the desired conclusion follows from (5.45)–(5.48).
Before we can derive (5.43) from Lemma 5.5, we need to state a simple lemma known as
Dirichlet’s theorem on Diophantine approximation.
Lemma 5.6 (Dirichlet). Let α and Q be real and Q ≥ 1. There exist integers a and q such that
1 ≤ q ≤ Q,
(a, q) = 1,
|qα − a| < Q−1 .
Proof. See Vaughan [54, Lemma 2.1] or Exercises 8 and 9.
Proof of (5.43). Suppose that α ∈ m. By Lemma 5.6 with Q = nP−1 , there are integers a and q
such that
1 ≤ q ≤ nP−1 ,
(a, q) = 1,
|qα − a| < Pn−1 .
88
Since α < M, it is not possible to have q ≤ P, and so,
P ≤ q ≤ nP−1 ,
(a, q) = 1,
We now apply Lemma 5.5 (with δ = n−1 ) and obtain
|α − a/q| < n−1 .
f (α) (log n)5 nq−1/2 + n5/6 + n2/3 q1/3
(log n)5 nP−1/2 + n5/6 + nP−1/3 n(log n)5−B/3 .
Thus, (5.43) follows on choosing B ≥ 3A + 18.
5.5 Primes in short intervals
In this section we discuss the following result mentioned in the Introduction.
Theorem 6 (Huxley). Let > 0 be fixed. Then for x ≥ x0 () and x7/12+ ≤ y ≤ x,
ψ(x) − ψ(x − y) = y + O y(log x)−1 .
(5.49)
We deduce Huxley’s theorem from the following two results.
Theorem 5.7 (Korobov; I. M. Vinogradov). There is an absolute constant c1 > 0 such that
β ≥ 1 − c1 (log(|γ| + 3))−2/3 (log log(|γ| + 3))−1/3 .
Theorem 5.8 (Huxley). Given 0 ≤ σ ≤ 1 and T ≥ 2, define
N(σ, T ) = ρ = β + iγ : ζ(ρ) = 0, σ ≤ β ≤ 1, |γ| ≤ T .
There is an absolute constant c2 > 0 such that
N(σ, T ) T 2.4(1−σ) (log T )c2 .
Theorem 5.7 is the Vinogradov–Korobov zerofree region underlying the modern error term
(0.9) in the PNT. The reader will find its proof in Ivić [31], Karatsuba and Voronin [37], or Titchmarsh [50]. Theorem 5.8, whose proof forms the bulk of this section, is an example of a zerodensity theorem. By virtue of Corollary 2.20, we have the trivial bound
N(σ, T ) ≤ N(0, T ) T (log T ).
(5.50)
A zero-density theorem is an inequality of the form
N(σ, T ) T A(σ)(1−σ) (log T )c(σ) ,
(5.51)
where A(σ) and c(σ) are such that (5.51) represents an improvement over (5.50) for σ in some
subinterval of 1/2 ≤ σ ≤ 1 (when σ ≤ 1/2, (5.50) is best possible). Results in which A(σ) is a
constant are of particular interest for applications. Theorem 5.8 above was established by Huxley [27] and is the sharpest known result of this type. It should be compared with the conjectural
bound
N(σ, T ) T 2(1−σ) (log T )c3
(1/2 ≤ σ ≤ 1),
(5.52)
which is known as the Density Hypothesis and in many situations can be used as a substitute
for RH.
89
5.5.1 Proof of Theorem 6
By Theorem 2.22 with T = x5/12−/2 ,
ψ(x) − ψ(x − y) = y −
X xρ − (x − y)ρ
+ O x7/12+/2 (log x)2 .
ρ
| Im ρ|≤T
(5.53)
On writing ρ = β + iγ, we have
Z
ρ
x − (x − y)ρ x ρ−1 =
u du ≤ yxβ−1 ,
ρ
x−y
whence
X xρ − (x − y)ρ
X
y
xβ−1 .
ρ
| Im ρ|≤T
|γ|≤T
Clearly, (5.49) follows from (5.53), (5.54), and the inequality
X
xβ−1 (log x)−1 ,
(5.54)
(5.55)
|γ|≤T
which we now proceed to prove.
Let
δ(T ) = c1 (log T )−2/3 (log log T )−1/3 ,
where c1 is the constant from Theorem 5.7. By Theorem 5.7 and partial integration,
Z 1−δ(T )
X
β−1
x =−
xσ−1 dN(σ, T )
0
| Im ρ|≤T
−1
= x N(0, T ) + (log x)
Z
1−δ(T )
N(σ, T )xσ−1 dσ.
0
We use (5.50) to bound N(0, T ) and Theorem 5.8 to bound N(σ, T ) under the sign of the integral.
We find that
Z 1−δ(T )
X
1−σ
β−1
−1
c2 +1
x x T log T + (log x)
dσ
x−1 T 2.4
| Im ρ|≤T
0
x−1/2 + (log x)c2 +1 x−δ(T ) (log x)−1 ,
on noting that
x−δ(T ) exp − c4 (log x)1/4 (log x)−c2 −2 .
This establishes (5.55) and completes the proof of the theorem.
90
5.5.2 Huxley’s density theorem: zero detection
Without loss of generality we may assume that 7/12 ≤ σ ≤ 1 and T ≥ T 0 . We start from the
integral transform
Z 2+i∞
1
−x
Γ(s)x−s ds
(x > 0),
(5.56)
e =
2πi 2−i∞
which follows from (2.10) by Mellin inversion. We introduce parameters X and Y, which we will
specify later. We define
X
MX (s) =
µ(m)m−s
m≤X
and observe that, by Lemma 1.2,
F X (s) = ζ(s)MX (s) =
∞
X
an n−s = 1 +
X
an n−s ,
an =
n>X
n=1
X
k|n
k≤X
µ(k) d(n).
Applying (5.56) with x = n/Y and summing the resulting identities over n, we get
Z 2+i∞
X
1
−w −n/Y
−1/Y
F X (w + s)Γ(s)Y s ds
(Re(w) > 0).
e
+
an n e
=
2πi
2−i∞
n>X
(5.57)
Suppose that ρ = β + iγ is a zero of ζ(s) counted by N(σ, T ). We move the integral in (5.57) to
the line Re s = 21 − β. The only singularities of the integrand in the strip 12 − β < Re s < 2 are the
pole of ζ(s + w) at s = 1 − w and the pole of Γ(s) at s = 0. Furthermore, when w = ρ, ζ(ρ + s) has
a zero at s = 0 that cancels the pole of the gamma-function. Hence,
Z 1/2−β+i∞
X
1
−1/Y
−ρ −n/Y
1−ρ
e
+
an n e
= MX (1)Γ(1 − ρ)Y +
F X (ρ + s)Γ(s)Y s ds.
(5.58)
2πi 1/2−β−i∞
n>X
In order to simplify this identity, we now suppose that
T 0.01 ≤ X ≤ T 10
T 0.01 ≤ Y ≤ T 10 .
and
(5.59)
The terms with n ≥ Y(log T )2 contribute o(1) to the left side of (5.59). Also, by Corollary 2.8,
the first term on the rightof (5.59) is o(1) unless |γ| ≤ (log T )2 . Therefore, apart from the zeros
counted by N σ, (log T )2 , all zeros counted by N(σ, T ) fall in one of the following classes:
• Class I: zeros ρ with
• Class II: zeros ρ with
Z
X
an n e
X<n≤Y(log T )2
1/2−β+i∞
1/2−β−i∞
> 1/3;
−ρ −n/Y F X (s + ρ)Γ(s)Y ds > 1/3.
s
91
(5.60)
(5.61)
We subdivide the interval X < n ≤ Y(log T )2 into O(log T ) subintervals N < n ≤ N1 , N1 ≤ 2N, and
note that the number of zeros of class I does not exceed R1 (log T ), where R1 is the number of zeros
ρ with
X
−1
−ρ −n/Y (5.62)
a
n
e
n
(log T ) .
N<n≤N1
Furthermore, by Corollary 2.20,
R1 |R1 |(log T ),
where R1 is a set of zeros of ζ(s) that satisfy (5.62) and
| Im ρ1 − Im ρ2| ≥ 1
whenever ρ1 , ρ2 , ρi ∈ R1 .
(5.63)
Similarly, the number of class II zeros is bounded by |R2 |(log T ), where R2 is a set of zeros of ζ(s)
that satisfy (5.61) and (5.63). We conclude that
N(σ, T ) N σ, (log T )2 + |R1 |(log T )2 + |R2 |(log T )
(|R1 | + |R2 | + log T )(log T )2 .
(5.64)
5.5.3 Huxley’s density theorem: 1/2 ≤ σ ≤ 3/4
We first bound |R1 |. We write bn = an e−n/Y (log n) and denote by S1 the set of imaginary parts γ of
zeros ρ ∈ R1 . Then
2
Z ∞
X X
−u−iγ
2
b
n
du
|R1 | (log T )
n
(log T )2
2
XZ
(log T ) N
(log T )2 N −σ
2
∞ X
−u−iγ bn n
du
σ
γ∈S1
−σ
β
N<n≤N1
ρ∈R1
N<n≤N1
XZ
γ∈S1
Z
∞
σ
∞
σ
2
X
−u−iγ bn n
N du
u
N<n≤N1
2
X X
u
−u−iγ
du.
N
bn n
γ∈S1
N<n≤N1
An appeal to Lemma 4.12 with δ = 1 now gives
Z ∞
X
−σ
2
|R1 | N (N + T )(log T )
Nu
|bn |2 n−2u du
σ
N −2σ (N + T )(log T )
X
N<n≤2N
N<n≤2N
d(n)2 (log n)2 N 2−2σ + T N 1−2σ (log T )6 ,
by Theorem 1.23 with k = 2. Recalling that X ≤ N ≤ Y(log T )2 , we deduce that
|R1 | Y 2−2σ + T X 1−2σ (log T )8 .
92
(5.65)
We now turn to class II zeros. By (5.61),
Z ∞
4/3
X
2(1−2β)/3
F X (1/2 + i(t + γ))Γ(1/2 − β + it) dt
|R2 | Y
.
(5.66)
−∞
ρ∈R2
Since 7/12 ≤ β ≤ 1, Lemmas 2.4 and 2.5 and Corollary 2.8 yield
Γ(1/2 − β + it) e−|t| .
(5.67)
Thus, by (5.66) and Hölder’s inequality,
|R2 | Y
2(1−2σ)/3
∞
−∞
γ∈S2
Y 2(1−2σ)/3
XZ
XZ
∞
−∞
γ∈S2
2/3
Y 2(1−2σ)/3 Σ1/3
1 Σ2 ,
F X (1/2 + i(t + γ))e−|t| dt
4/3
F X (1/2 + i(t + γ))4/3 e−|t| dt
(5.68)
where S2 denotes the set of imaginary parts of the zeros in R2 ,
XZ ∞
ζ(1/2 + i(t + γ))4 e−|t| dt,
Σ1 =
γ∈S2
Σ2 =
−∞
∞
XZ
γ∈S2
−∞
We have
Σ2 =
∞ Z
X X
MX (1/2 + i(t + γ))2 e−|t| dt.
m+1/2
MX (1/2 + i(t + γ))2 e−|t| dt
γ∈S2 m=−∞ m−1/2
∞
X
X Z m+1/2
−|m|
e
m=−∞
∞
X
m=−∞
γ∈S2
−|m|
e
Z
m−1/2
m+T +1
m−T −1
MX (1/2 + i(t + γ))2 dt
MX (1/2 + iu)2 du,
(5.69)
the last inequality being a consequence of (5.63). Since Lemma 4.10 yields
Z m+T +1
X
MX (1/2 + iu)2 du (X + T )
n−1 (X + T )(log T ),
m−T −1
n≤X
we conclude that
Σ2 (X + T )(log T )
∞
X
m=−∞
e−|m| (X + T )(log T ).
For the estimation of Σ1 , we use the following result.
93
(5.70)
Theorem 5.9. Suppose that T ≥ 2. Then
Z T
1
ζ + it 4 dt T (log T )4 .
2
(5.71)
0
Proof. See Ivić [31, Ch. 5], Montgomery [41, Ch. 10], or Titchmarsh [50, §7.5 and §7.6].
By a variant of (5.69),
Σ1 ∞
X
−|m|
e
m=−∞
Z
U(m) −U(m)
ζ 1/2 + iu 4 du,
where U(m) = |m| + T + 1. Hence, we derive from (5.71) that
Σ1 ∞
X
m=−∞
e−|m| (T + |m|)(log(T + |m|))4 T (log T )4 .
(5.72)
Combining (5.68), (5.70), and (5.72), we obtain
|R2 | Y 2(1−2σ)/3 X 2/3 T 1/3 + T (log T )2 .
(5.73)
We now choose X = T , which is consistent with (5.59). Then, by (5.64), (5.65), and (5.73),
N(σ, T ) Y 2−2σ + T 2−2σ + Y 2(1−2σ)/3 T + 1 (log T )10 .
Finally, we put Y = T 3/(4−2σ) (note that Y ≥ T ) and obtain
N(σ, T ) T 3(1−σ)/(2−σ) (log T )10 .
In particular, we have
N(σ, T ) T 2.4(1−σ) (log T )10
whenever 1/2 ≤ σ ≤ 3/4.
(5.74)
5.5.4 The Halász–Montgomery method
We consider the Dirichlet polynomial
D(s) =
X
an n−s ,
N<n≤2N
where an are complex numbers. Suppose that s1 , . . . , sR , sr = σr + itr , are complex numbers such
that
T 0 ≤ t1 < t2 < · · · < tR ≤ T 0 + T,
tr+1 − tr ≥ 1,
α ≤ σr ≤ 1,
(5.75)
and
|D(sr )| ≥ V
for all r = 1, . . . , R.
94
(5.76)
We choose complex numbers b1 , . . . , bR such that |br | = 1 and |D(sr )| = br D(sr ). Then
X
X
X
X
X
|D(sr )| =
br
an n−sr =
an
br n−sr
1≤r≤R
1≤r≤R
N<n≤2N
N<n≤2N
1≤r≤R
2 1/2 X X
1/2
X
−sr 2
br n |an |
≤
N<n≤2N
=
X
1≤r≤R
X
N<n≤2N
br b̄q n
−sr − s̄q
N<n≤2N
N<n≤2N 1≤r,q≤R
≤
X
1≤r,q≤R
1/2 X
|an |
2
1/2
1/2
X
X −s −s̄ 1/2
2
r
q
|an |
n
|br b̄q |
N<n≤2N
N<n≤2N
1/2 1/2
X X
X
−σr −σq +i(tq −tr ) 2
=
.
|an |
n
1≤r,q≤R
N<n≤2N
N<n≤2N
By partial summation, for some N < M ≤ 2N,
X
X i(t −t ) X i(t −t ) −2α −σr −σq +i(tq −tr )
−σr −σq q r q r n
n
n
N
.
N N<n≤M
N<n≤2N
(5.77)
(5.78)
N<n≤M
We deal with the sum over n by means of the following exponential sum estimate.
Lemma 5.10. Suppose that N ≥ 2, X > 0, and f : [N, 2N] → R has two continuous derivatives
that satisfy the conditions
X | f 0 (x)| X
XN −1 | f 00(x)| XN −1
and
for all x ∈ [N, 2N]. Then, for any interval I ⊆ [N, 2N],
X
e( f (n)) (XN)1/2 + X −1 .
n∈I
1
When r , q, we apply the lemma with f (x) = 2π
(tq − tr ) log x. We have X = |tq − tr |N −1 , so
X
X
ni(tq −tr ) =
e( f (n)) |tq − tr |1/2 + N|tq − tr |−1 .
N<n≤M
N<n≤M
Because of (5.75), we can put this inequality in the form
X
ni(tq −tr ) |tq − tr |1/2 + N(|tq − tr | + 1)−1 ,
(5.79)
N<n≤M
in which it is valid even when r = q. Inserting (5.79) into the right side of (5.78), we find that
X
n−σr −σq +i(tq −tr ) N −2α T 1/2 + N 1−2α (|tq − tr | + 1)−1 .
N<n≤2N
95
Thus,
X X
−σ
−σ
+i(t
−t
)
1−2α
r
q
q r n
Σ3 + N −2α R2 T 1/2 ,
N
1≤r,q≤R
where
Σ3 =
(5.80)
N<n≤2N
X
1≤r,q≤R
(|tr − tq | + 1)−1 ≤
X X
1≤r≤R |m|≤T
(|m| + 1)−1 R(log T ).
(5.81)
Combining (5.75)–(5.77), (5.80), and (5.81), we deduce that
R2 V 2 RN(log T ) + R2 T 1/2 N −2α G,
G=
X
N<n≤2N
|an |2 .
(5.82)
Let T 1 = c4 N 4α V 4G−2 for some sufficiently small constant c4 > 0. When T ≤ T 1 , (5.82) yields
R GN 1−2α V −2 (log T ).
When T ≥ T 1 , we first partition the interval [T 0 , T 0 + T ] into O(T/T 1 + 1) subintervals of length at
most T 1 and then bound the number of tr ’s in each subinterval using (5.82). We find that
(5.83)
R T/T 1 + 1 GN 1−2α V −2 (log T ) GN 1−2α V −2 + G3 T N 1−6α V −6 (log T ).
5.5.5 Huxley’s density theorem: 3/4 ≤ σ ≤ 1
In §5.5.3, we estimated |R1 | and |R2 | using the large sieve and Theorem 5.9. In this section we
derive alternative bounds for |R1 | and |R2 | using the Halász–Montgomery large value method. To
bound |R1 | we apply the Halász–Montgomery method to the Dirichlet polynomial on the right side
of (5.62). By (5.83) with V = (log T )−1 and α = σ,
|R1 | GN 1−2σ + G3 T N 1−6σ (log T )c5 ,
where
G≤
Hence, when 2/3 ≤ σ ≤ 1,
X
N<n≤N1
|an |2 X
n≤2N
d(n)2 N(log N)3 .
|R1 | Y 2−2σ + T X 4−6σ (log T )c6 .
(5.84)
We now proceed with the estimation of |R2 |. By (5.61) and (5.67), a class II zero ρ = β + iγ
satisfies the inequality
Z 10 log T
F X (1/2 + i(t + γ)) dt Y σ−1/2 .
−10 log T
96
Let U be a parameter to be chosen later. We partition R2 into two subsets: the set R2,1 of zeros
ρ ∈ R2 such that
MX (1/2 + i(t + γ)) ≤ U −1 Y σ−1/2
for all t with |t| ≤ 10 log T ; and the set R2,2 of the remaining zeros. For zeros ρ ∈ R2,1, we have
Z 10 log T
ζ(1/2 + i(t + γ)) dt,
U
−10 log T
whence
4
U Thus,
Z
10 log T
−10 log T
ζ(1/2 + i(t + γ)) dt
−4
3
|R2,1 | U (log T )
−4
3
U (log T )
−4
5
U (log T )
4
XZ
ρ∈R2,2
Z
2T
Z
2T
−2T
−2T
3
(log T )
Z
10 log T
−10 log T
|ζ(1/2 + i(t + γ))|4 dt.
10 log T
−10 log T
|ζ(1/2 + i(t + γ))|4 dt
|ζ(1/2 + iu)|
4
X
ρ∈R2,1
|γ−u|≤10 log T
1 du
|ζ(1/2 + iu)|4 du.
Using Theorem 5.9, we obtain
|R2,1| T U −4 (log T )9 .
(5.85)
If ρ is a zero in R2,2 , there exists a real number tγ , |tγ − γ| ≤ 10 log T , such that
MX (1/2 + itγ ) ≥ U −1 Y σ−1/2 .
We partition the interval [1, X] into O(log X) subintervals [N, N1 ], N1 < 2N, some of which must
satisfy
X
−1/2−itγ −1 σ−1/2
µ(n)n
(log X)−1 .
(5.86)
U Y
N<n≤N1
Let S(N) denote the subset of R2,1 containing those zeros ρ for which (5.86) holds. Then
|R2,2 | |S(N)|(log T )
for some N, 1 ≤ N ≤ X. By (5.83) with α = 1/2 and V = U −1 Y σ−1/2 (log T )−1 ,
|S(N)| NV −2 + NT V −6 log T.
(5.87)
(5.88)
From (5.85), (5.87), and (5.88),
|R2 | T U −4 + XY 1−2σ U 2 + T XY 3−6σ U 6 (log T )c7 .
97
(5.89)
Upon choosing
U = X −1 Y −3(1−2σ)
(5.89) becomes
1/10
,
|R2 | T X 2/5 Y 6(1−2σ)/5 + X 4/5 Y 2(1−2σ)/5 (log T )c7 .
Combining this inequality with (5.64) and (5.84), we find that
N(σ, T ) Y 2−2σ + T X 4−6σ + T X 2/5 Y 6(1−2σ)/5 + X 4/5 Y 2(1−2σ)/5 (log T )c8 .
(5.90)
Under the assumption that
X 2 Y 4(2σ−1) ≤ T 5 ,
the third term on the right side of (5.90) dominates the fourth, and so
N(σ, T ) Y 2−2σ + T X 4−6σ + T X 2/5 Y 6(1−2σ)/5 (log T )c8 .
Setting
X = Y (2σ−1)/(5σ−3) ,
we derive
provided that
N(σ, T ) Y 2−2σ + T Y (4−6σ)(2σ−1)/(5σ−3) (log T )c8 ,
1
2
Y ≤ T 2 (5σ−3)/(2σ−1) .
(5.91)
(5.92)
Finally, we take
1
2 +σ−1)
Y = T 2 (5σ−3)/(σ
,
which satisfies (5.92) for 2/3 ≤ σ ≤ 1 and turns the bound (5.91) into
2 +σ−1)
N(σ, T ) T (5σ−3)(1−σ)/(σ
(log T )c8 .
In particular,
N(σ, T ) T 2.4(1−σ) (log T )c8
whenever 3/4 ≤ σ ≤ 1.
Together with (5.74), this completes the proof of Huxley’s theorem.
5.6 Primes in almost all short intervals
In this section, we use Huxley’s density theorem (Theorem 5.8) to prove the following result.
Theorem 7. Let E(X, δ) denote the set of real numbers x ∈ [X, 2X] such that
ψ(x) − ψ(x − δx) − δx ≥ δx(log x)−1 .
Suppose that > 0 and A > 0 are fixed, X ≥ X0 (, A), and X −5/6+ ≤ δ ≤ 1. Then
|E(X, δ)| A X(log X)−A ,
the left side representing the Lebesgue measure of E(X, δ).
98
(5.93)
Proof. We have
−2
2
|E(X, δ)| (δX) (log X)
so it suffices to show that
Z 2X
X
Z
2X
X
ψ(x) − ψ(x − δx) − δx2 dx,
ψ(x) − ψ(x − δx) − δx2 dx δ2 X 3 (log X)−A−2 .
(5.94)
By Theorem 2.22 with T = X 5/6−/2 ,
X xρ − (x − δx)ρ
+ O X 1/6+/2 (log X)2
ρ
| Im ρ|≤T
Z
X
ρ
1/6+2/3
=
x ω(ρ) + O X
,
ω(ρ) =
ψ(x) − ψ(x − δx) − δx =
1
uρ−1 du.
1−δ
| Im ρ|≤T
Hence,
Z
2X
X
ψ(x) − ψ(x − δx) − δx2 dx Z
Upon noting that |ω(ρ)| ≤ δ, we obtain
2
Z 2X X
X
ρ
dx =
x
ω(ρ)
X
| Im ρ|≤T
2X
X
2
X ρ
dx + δ2 X 3−/2 .
x
ω(ρ)
X
ω(ρ1 )ω(ρ2 )
| Im ρ1 |≤T | Im ρ2 |≤T
2
δ
We now appeal to the inequality
Z 2X
X
X
X Z
| Im ρ1 |≤T | Im ρ2 |≤T
xβ1 +β2 +i(γ1 −γ2 ) dx Z
2X
xρ1 +ρ̄2 dx
X
2X
ρ1 +ρ̄2
x
X
dx.
δ2
(5.96)
X β1 +β2 +1
,
|γ1 − γ2 | + 1
which follows by partial integration. Using this to bound the right side of (5.96), we get
2
Z 2X X
X X X β1 +β2 +1
ρ
dx δ2
x
ω(ρ)
|γ1 − γ2 | + 1
X
| Im ρ|≤T
|γ |≤T |γ |≤T
1
(5.95)
| Im ρ|≤T
2
X X
|γ1 |≤T |γ2 |≤T
X
X 2β1 +1
δ2 (log T )
X 2β+1 .
|γ1 − γ2 | + 1
|γ|≤T
99
(5.97)
We now write θ(T ) = (log T )−3/4 . By Theorems 5.7 and 5.8 and (5.50),
Z 1−θ(T )
X
2β+1
X
=−
X 2σ+1 dN(σ, T )
0
|γ|≤T
XN(0, T ) + (log X)
Z
1−θ(T )
X 2σ+1 N(σ, T ) dσ
0
c2 +1
XT (log T ) + (log X)
Z
1−θ(T )
X 2σ+1 T 2.4(1−σ) dσ
0
X 3−θ(T ) (log X)c2 +1 X 3 (log X)−A−3 .
(5.98)
The desired bound (5.94) follows from (5.95), (5.97), and (5.98).
5.7 The linear sieve
This section is a (very) brief introduction to sieve methods, without proofs and in the special case
of a “linear sieve”.
5.7.1 The fundamental problem of sieve theory
Let A be a finite integer sequence. We will be concerned with the existence of elements of A that
are primes or, more generally, almost primes Pr , that is, integers having at most r prime divisors,
counted according to multiplicity. We consider a set of prime numbers P and a real parameter
z ≥ 2 and define the sifting function
Y
S (A, P, z) = # a ∈ A : (a, P(z)) = 1 ,
P(z) =
p.
(5.99)
p<z
p∈P
In applications, the set P is usually taken to be the set of possible prime divisors of the elements
of A, so the sifting function (5.99) counts the elements of A free of prime divisors p < z.
For our first attempt at bounding S (A, P, z), we recall Lemma 1.2. It yields
X X
X
S (A, P, z) =
µ(d) =
µ(d)|Ad |,
(5.100)
a∈A d|(a,P(z))
d|P(z)
where
|Ad | = # a ∈ A : a ≡ 0 (mod d) .
To this end, we suppose that there exist a (large) quantity X and a multiplicative function ω(d)
such that |Ad | can be approximated by Xω(d)/d, and we write r(A, d) for the remainder in this
approximation:
ω(d)
|Ad | = X
+ r(A, d).
(5.101)
d
100
We expect r(A, d) to be ‘small’, at least in some average sense over d. Substituting (5.101) into
the right side of (5.100), we find that
S (A, P, z) = XV(z) + R(A, z),
where
V(z) =
X
µ(d)
d|P(z)
ω(d)
,
d
R(A, z) =
X
µ(d)r(A, d).
(5.102)
(5.103)
d|P(z)
We would like to believe that, under ‘ideal circumstances’, (5.103) is an asymptotic formula for
the sifting function S (A, P, z), XV(z) being the main term and R(A, z) the error term. However,
such expectations turn out to be unrealistic (see Exercise 10). Therefore, we need to adjust our
strategy.
Let D > 0 be a parameter to be chosen later in terms of X. Suppose that Λ+ (d) and Λ− (d) are
real-valued functions supported on the squarefree integers d such that
|Λ± (d)| ≤ 1
Furthermore, suppose that
X
d|n
Λ− (d) ≤
Λ± (d) = 0
and
X
d|n
µ(d) ≤
X
Λ+ (d)
d|n
for d ≥ D.
(5.104)
for all n ∈ A.
(5.105)
Using (5.100), (5.101), and the left inequality in (5.105), we obtain
X X
X
S (A, P, z) ≥
Λ− (d) =
Λ− (d)|Ad |
a∈A d|(a,P(z))
d|P(z)
ω(d)
+ r(A, d) ≥ XM− − R,
=
Λ (d) X
d
d|P(z)
X
−
where
M± =
X
d|P(z)
Λ± (d)
ω(d)
,
d
R=
X
d|P(z)
d<D
|r(A, d)|.
(5.106)
In a similar fashion, we can use the right inequality in (5.105) to estimate the sifting function from
above. That is, we have
XM− − R ≤ S (A, P, z) ≤ XM+ + R.
(5.107)
We are now in a position to overcome the difficulty caused by the “error term” in (5.100). The
sum R is similar to the error term R(A, z) defined in (5.101), but unlike R(A, z) we can use the
parameter D to control the number of terms in R. Thus, our general strategy will be to construct
functions Λ± (d) which satisfy (5.104) and (5.105) and for which the sums M± are of the same
order as the sum V(z) defined in (5.102). There are various constructions of such functions Λ± (d).
We will simply state one of the modern sieves in a form suitable for application in §5.8.
101
5.7.2 The Rosser–Iwaniec sieve
The basic form of this sieve method appeared for the first time in an unpublished manuscript by
Rosser, and the full-fledged version was developed independently by Iwaniec [32, 33]. Suppose
that the multiplicative function ω(d) in (5.101) satisfies the condition
−1 κ Y log w2
ω(p)
K
≤
1−
1+
(2 ≤ w1 < w2 ),
(5.108)
p
log
w
log
w
1
1
w ≤p<w
1
2
where κ > 0 is an absolute constant known as the sieve dimension and K > 0 is independent of
w1 and w2 . This inequality is usually interpreted as an average bound for the values taken by ω(p)
when p is prime, since it is consistent with the inequality ω(p) ≤ κ. In the applications we are
interested in, (5.108) holds with κ = 1, so we will state the Rosser–Iwaniec sieve in this special
case: this is the so-called the linear sieve.
Suppose that ω(p) satisfies (5.108) with κ = 1 and that
0 < ω(p) < p
when p ∈ P
and
ω(p) = 0 when p < P.
(5.109)
We put Λ± (1) = 1 and Λ± (d) = 0 if d is not squarefree. If d > 1 is squarefree and has prime
decomposition d = p1 · · · pr , p1 > p2 > · · · > pr , we define
(
(−1)r if p1 · · · p2l p32l+1 < D whenever 0 ≤ l ≤ (r − 1)/2,
Λ+ (d) =
(5.110)
0
otherwise,
(
(−1)r if p1 · · · p2l−1 p32l < D whenever 1 ≤ l ≤ r/2,
(5.111)
Λ− (d) =
0
otherwise.
It can be shown (see Greaves’ book [18] or Iwaniec’s original paper [33]) that these two functions
satisfy conditions (5.104) and (5.105). Furthermore, if the quantities M± are defined by (5.106)
with Λ± (d) given by (5.110) and (5.111), we have
for s ≥ 1,
(5.112)
V(z) ≤ M+ ≤ V(z) F(s) + O e−s (log D)−1/3
for s ≥ 2,
(5.113)
V(z) ≥ M− ≥ V(z) f (s) + O e−s (log D)−1/3
where s = log D/ log z and the functions f (s) and F(s) are the continuous solutions of the following
system of differential delay equations:
f (s) = 0
F(s) = 2eγ s−1
(s f (s))0 = F(s − 1)
(sF(s))0 = f (s − 1)
if 0 < s ≤ 2,
if 0 < s ≤ 3,
if s > 2,
if s > 3.
Here γ is Euler’s constant. The analysis of this system reveals that the function F(s) is strictly
decreasing for s > 0, that the function f (s) is strictly increasing for s > 2, and that
0 < f (s) < 1 < F(s)
102
for s > 2.
(5.114)
Furthermore, both functions are very close to 1 for large s: they satisfy
F(s), f (s) = 1 + O(s−s )
as s → ∞.
(5.115)
Substituting (5.112) and (5.113) into (5.107), we obtain
S (A, P, z) ≤ XV(z) F(s) + O (log D)−1/3 + R
S (A, P, z) ≥ XV(z) f (s) + O (log D)−1/3 − R
for s ≥ 1,
(5.116)
for s ≥ 2.
(5.117)
5.7.3 Two applications
Example 5.7.1. Suppose that 2 ≤ y ≤ x, where x is a (large) real number. We choose A to be the
sequence of integers n ∈ (x − y, x] and P to be the set of all primes. Then
h xi h x − yi y n xo n x − yo
X
−
= −
+
,
|Ad | =
1=
d
d
d
d
d
x−y<md≤x
so (5.101) holds with
X = y,
ω(d) = 1,
r(A, d) = −
and
and in (5.117), one has
XV(z) = y
Y
p<z
1 − p−1 y(log z)−1
n xo
and
d
+
n x − yo
d
,
R D.
Hence, combining (5.114) and (5.117), we obtain
S (A, P, z) y(log x)−1 ,
(5.118)
provided that
D ≤ y1−
and
z ≤ D1/2−
for some fixed > 0.
Choosing y = xθ , D = y1− , and z = D1/2− , we find that there are y(log x)−1 integers
n ∈ (x − xθ , x] that have no prime divisor smaller than xθ/2−2 . Since the numbers in question do
not exceed x, each of the elements of A counted on the left side of (5.118) has at most 2/θ prime
divisors. In particular, we are able to conclude that:
For sufficiently large x, the interval (x − x1/2 , x] contains a P4 -number.
Note that in this case we just miss to show the existence of P3 -numbers in (x − x1/2 , x]. If we
increase the length of the intervals just slightly, we obtain:
For δ > 0 and x ≥ x0 (δ), the interval (x − x1/2+δ , x] contains a P3 -number.
103
Example 5.7.2. Suppose that n is a (large) even integer and set
A= n− p : 2< p<n
and
P= p : p-n .
Then (5.101) is expected to hold with
X = Li n
and
ω(d) =
(
d/φ(d) if (d, n) = 1,
0
otherwise.
The main term in (5.117) is
XV(z) = (Li n)
Y
p<z
p-n
1 − (p − 1)−1 (Li n)(log z)−1 ,
(5.119)
and the error term R is bounded by
Li
n
.
max π(n; d, a) −
D+
(a,d)=1
φ(d)
d≤D
X
In particular, when D ≤ n1/2− , the Bombieri–Vinogradov theorem yields
R n(log n)−3 .
(5.120)
We now choose D = n0.49 and z = n2/9 , so that we have
s=
log D
> 2.2.
log z
Combining (5.114), (5.117), (5.119), and (5.120), we find that
S (A, P, z) n(log n)−2 .
(5.121)
That is, there are n(log n)−2 elements of A that have no prime divisors smaller than n2/9 . Since
the numbers in A do not exceed n, the elements of A counted on the left side of (5.121) have at
most four prime divisors each, that is, the left side of (5.121) counts solutions of n − p = P4 . We
conclude that:
Every sufficiently large even integer n can be represented as the sum of a prime and a
P4 -number.
The results of both examples can be strengthened significantly. Chen [9, 10] has proved the
following two theorems.
Theorem 8 (Chen). For sufficiently large x, the interval (x − x1/2 , x] always contains a P2 -number.
104
Theorem 9 (Chen). Every sufficiently large even integer n can be represented as the sum of a
prime and a P2 -number.
Obviously, Theorem 9 is the best possible result of its kind, short of a proof of the binary
Goldbach conjecture. Its proof is too involved to include in these lectures, but those interested can
find all the details in Halberstam and Richert [19, Ch. 11]. We present the proof of Theorem 8 in
§5.8. However, unlike Theorem 9, Theorem 8 can and has been improved on (several times). The
best result to date is due to Liu [39] and states that all intervals of the form (x − x0.436 , x], x ≥ x0 ,
contain a P2 -number.
5.7.4 The bilinear form of the error term in the linear sieve
Assume that A is an integer sequence such that (5.101) holds with a function ω subject to (5.108)
with κ = 1 and
X X ω(pν )
L
≤
(2 ≤ ω1 < ω2 ),
ν
p
log 3w1
w ≤p<w ν≥2
1
2
where L > 0 is independent of w1 , w2 . Inspired by Chen’s original proof of Theorem 8, Iwaniec [32]
obtained the following more flexible form of the linear Rosser–Iwaniec sieve.
Theorem 5.11 (Iwaniec). Suppose that 0 < < 1/3, M, N ≥ 2, D = MN. Then
S (A, P, z) ≤ XV(z) (F(s) + E(, D, K, L)) + R+ (M, N),
S (A, P, z) ≥ XV(z) ( f (s) − E(, D, K, L)) − R+ (M, N),
(5.122)
(5.123)
where s = log D/ log z, E(, D, K, L) + −8 eK+L (log D)−1/3 , and
XXX
R± (M, N) =
a±m, j b±n, j r(A, mn),
J = exp 8 −3 .
j≤J m≤M n≤N
m|P(z) n|P(z)
The coefficients a±m, j , b±n, j depend at most on , M, N (but not on A) and satisfy |a±m, j| ≤ 1, |b±n, j | ≤ 1.
The importance of this result is that it allows us to replace the error term R defined by (5.106)
with a bounded number of sums of the form
XX
am bn r(A, mn),
m≤M n≤N
where |am | ≤ 1, |bn | ≤ 1. In many applications, one can exploit the arithmetic properties of
the sequence A to estimate such double sums more effectively. To illustrate this, we return to
Example 5.7.1. Suppose that y = xθ , 2/5 < θ < 3/5. We will show (see Lemma 5.13 below) that
in this situation one can obtain a satisfactory bound for R±(M, N) under the hypotheses
M ≤ xθ−6 ,
MN 2 ≤ x(5θ−1)/2−10 .
Therefore, upon choosing 0 < < 0 (δ), M = xθ−6 , and N = x(3θ−1)/4−2 in Theorem 5.11, we can
replace the parameter D = xθ− in Example 5.7.1 by D = MN = x(7θ−1)/4−8 to obtain:
For δ > 0 and x ≥ x0 (δ), the interval (x − x3/7+δ , x] contains a P3 -number.
105
5.8 Almost primes in short intervals
In this section we establish Theorem 8. That result and its subsequent improvements rest on two
new ideas: more sophisticated sieve machinery and the Fourier analysis of the remainders r(A, d).
5.8.1 The remainders r(A, d)
In Example 5.7.1 we estimated the error term in (5.117) trivially. In this section, we use Fourier
analytic techniques to exploit the oscillation of the remainders r(A, mn) in (5.122) and (5.123).
Lemma 5.12. Suppose that > 0, x2 ≤ y ≤ x, 2 ≤ M < M1 ≤ 2M, 2 ≤ N < N1 ≤ 2N,
y ≤ MN ≤ x, and am , bn are complex numbers with |am | ≤ 1, |bn | ≤ 1. We define
h xi h x − yi y
r(x, y; d) =
−
− .
d
d
d
There exist a real number X ∈ [x/2, 2x] and (complex) coefficients a∗m , b∗n , with |a∗m | ≤ 1, |b∗n | ≤ 1,
such that
X X
X X X ∗ ∗
Xh −1 am bn e
am bn r(x, y; mn) y(MN) + yx− ,
mn
M<m≤M N<n≤N 1≤h≤H
M<m≤M N<n≤N
1
1
1
1
with H = MNy−1 x3 .
Proof. Let f be a C ∞ -function, supported in [x − y − yx−2 , x + yx−2 ] and such that
f (u) = 1
(x − y ≤ u ≤ x)
f ( j) (u) (yx−2 )− j
and
( j ≥ 0).
(5.124)
(See Exercise 12 for one possible construction of such a function.) Then, by Lemma 1.21,
X
X X X
X X
am bn −
am bn f (kmn)
M<m≤M1 N<n≤N1 x−y<kmn≤x
that is,
X
X
M<m≤M1 N<n≤N1
X
x−y−yx−2 <u≤x−y
am bn
d(u)2 +
M<m≤M1 N<n≤N1
X
x<u≤x+yx−2
k
d(u)2 yx− ,
h x i h x − y i
X X X
−
=
am bn f (kmn) + O(yx− ).
mn
mn
M<m≤M N<n≤N
k
1
(5.125)
1
Let gr (u) = f (ur). Applying the Poisson summation formula (see Zygmund [59, eq. (II.13.4)]) to
the sum over k, we obtain
X h X
X
X
−1
f (kmn) =
gmn (k) =
ĝmn (h) = (mn)
fˆ
,
mn
h
k
k
h
106
where fˆ is the Fourier transform of f . Hence, we can rewrite (5.125) as
h x i h x − y i
X X
X X X am bn h fˆ
am bn
−
=
+ O(yx− ).
mn
mn
mn
mn
M<m≤M N<n≤N
M<m≤M N<n≤N
h
1
1
1
(5.126)
1
The contribution to the right side from the terms with h = 0 is
fˆ(0)
X
M<m≤M1
X am bn
X X am bn
=y
+ O(yx− ),
mn
mn
N<n≤N
M<m≤M N<n≤N
1
1
1
so it follows from (5.126) that
X
X
am bn r(x, y; mn) =
M<m≤M1 N<n≤N1
X
M<m≤M1
X X am bn h + O(yx− ).
fˆ
mn
mn
N<n≤N h,0
(5.127)
1
We now proceed to estimate the tails of the series over h. Choose an integer r ≥ 3 + −1 . By
(5.124) and r-fold partial integration,
Z ∞
−r
fˆ(t) = (−2πit)
f (r)(u)e(−ut) du y(yx−2 |t|)−r
(r ≥ 0).
−∞
Thus, the contribution to the right side of (5.127) from terms with |h| > H is
−2 −r
X yx−2 |h| −r
yx H
yH
MN x(3−r) 1.
y
MN
MN
|h|>H
Therefore, (5.127) yields
X
X
M<m≤M1 N<n≤N1
am bn r(x, y; mn) Z
∞
−∞
X
X
M<m≤M1 N<n≤N1
X am bn −uh | f (u)| du + yx− .
e
mn
mn 0<|h|≤H
Recalling that f is supported on a subset of [x/2, 2x] of measure O(y), we conclude that
X X
X X X ∗ ∗
Xh −1 −
am bn r(x, y; mn) y(MN) am bn e
+ yx ,
mn
M<m≤M N<n≤N
M<m≤M N<n≤N 1≤h≤H
1
1
1
1
where |a∗m | ≤ 1, |b∗n | ≤ 1, and x/2 ≤ X ≤ 2x.
Lemma 5.13. Suppose that > 0, x2 ≤ y ≤ x, 2 ≤ M < M1 ≤ 2M, 2 ≤ N < N1 ≤ 2N,
y ≤ MN ≤ x, and am , bn are complex numbers with |am | ≤ 1, |bn | ≤ 1. Also, suppose that
M ≤ yx−6 ,
Then
MN ≤ y1/2 x1/2−3 ,
X
X
M<m≤M1 N<n≤N1
MN 2 ≤ y5/2 x−1/2−10 .
am bn r(x, y; mn) yx− ,
where r(x, y; mn) is the function defined in Lemma 5.12.
107
(5.128)
Proof. Let H = MNy−1 x3 . By Lemma 5.12, it suffices to show that
X X X
Xh
∗ ∗
S (H, M, N) =
am bn e
MN x− .
mn
M<m≤M N<n≤N 1≤h≤H
1
(5.129)
1
Here a∗m , b∗n , and X are as in Lemma 5.12. By Cauchy’s inequality,
2
X X X
Xh ∗
|S (H, M, N)| M
b
e
n
mn M<m≤2M N<n≤N1 1≤h≤H
X
X X
h
h
X
2
1
.
−
M
e
m
n
n
1
2
N<n ,n ≤2N 1≤h ,h ≤H M<m≤2M
2
1
2
1
(5.130)
2
We now group the quadruples (h1 , h2, n1 , n2 ) according to the value of the determinant ∆ = h1 n2 −
h2 n1 . When ∆ = 0, we bound the sum over m in (5.130) trivially by M. When ∆ , 0, we appeal to
Lemma 5.10 with f (m) = ∆X(n1 n2 m)−1 . We get
X
M<m≤2M
e
∆X
mn1 n2
|∆|X
MN 2
1/2
+
M2 N 2
.
|∆|X
Writing δ(k) for the number of quadruples with ∆ = k, we conclude that
1/2
M2 N 2
|k|X
.
+
|S (H, M, N)| δ(0)M +
δ(k)
MN 2
|k|X
0<|k|≤2HN
2
X
2
(5.131)
For |k| ≤ 2HN, we have
δ(k) ≤
X
X
N<n≤2N 1≤h≤H
d(hn + |k|) (HN)1+/2 ,
so (5.131) yields
(HN)− |S (H, M, N)|2 M 2 NH + X 1/2 M 1/2 N 3/2 H 5/2 + X −1 M 3 N 3 H
M 2 N 2 My−1 x3 + MN 2 y−5/2 x1/2+7.5 + M 2 N 2 y−1 x−1+3 .
Since (HN) ≤ x/2 , (5.129) follows from (5.132) and the hypotheses (5.128).
(5.132)
5.8.2 Proof of Theorem 8
Let y = x1/2 , z = xδ , P be the set of all primes, A the sequence of integers n ∈ (x − y, x]. We write
S (A, w) for S (A, P, w),
Y
Y
P(w) =
p,
V(w) =
1 − p−1 .
p<w
p<w
108
Our starting point is the sum
X
X log p
α−β
1−
Σ(α, β, δ) =
,
log x
x−y<n≤x
p|n
(n,P(z))=1
where α, β, δ are positive absolute constants to be chosen later. On the one hand, we have
X X Σ(α, β, δ) ≤
β+1−α
1 .
x−y<n≤x
(n,P(z))=1
p|n
Hence, if we assume that
3α ≥ β + 1,
(5.133)
only P2 -numbers n will contribute positive terms to Σ(α, β, δ). In particular, the theorem will follow
if we show that
Σ(α, β, δ) > 0
for some α, β, δ satisfying (5.133). On the other hand,
X
log p
S (A p , z)
α−β
Σ(α, β, δ) = S (A, z) −
log x
p≥z
X log p
≥ S (A, z) −
α−β
S (A p, z).
log
x
α/β
(5.134)
z≤p≤x
First, we proceed to obtain a lower bound for S (A, z). Put 0 = 10−6 . By Lemma 5.13 with
M ≤ x1/2−60 and N ≤ x1/8−20 , we have
X X
am bn r(A, mn) yx−0
(5.135)
M<m≤M1 N<n≤N1
for any choice of M1 ≤ 2M, N1 ≤ 2N, |am | ≤ 1, |bn | ≤ 1. We now appeal to Theorem 5.11 with
M = x1/2−60 , N = x1/8−20 , X = y, z = xδ . It yields
S (A, z) ≥ yV(z) f δ−1 (5/8 − 80 ) − c9 − O (log x)−1/3 .
Note that we have used (5.135) to estimate the remainder R− (M, N) in (5.123). Choosing =
(2000c9 )−1 , we deduce that, for x → ∞,
S (A, z) ≥ yV(z) f δ−1 (5/8 − 80 ) − 0.001 .
(5.136)
Next, we turn to the sum on the right side of (5.134). To this end we require that
α/β ≤ 1/2 − 70 .
109
(5.137)
Using a dyadic argument, we split the interval [z, xα/β ] into O(log x) subintervals [xu , Cxu ], C ≤ 2.
For p ∈ [xu , Cxu ], we apply Theorem 5.11 with M = Mu = x1/2−u−60 , N = x1/8−20 , X = X p = yp−1 ,
z = xδ :
5/8 − u − 80
−1/3
S (A p , z) ≤ X p V(z) F
+ c9 + O (log x)
+ R p (Mu , N),
(5.138)
δ
where
R p (Mu , N) =
XX X
am, j bn, j r(A p , mn).
j≤J m≤Mu n≤N
m|P(z) n|P(z)
Here, the coefficients am, j , bn, j independent of p and satisfy |am, j | ≤ 1, |bn, j | ≤ 1. When m and n are
divisors of P(z) and p ≥ z, we have r(A p , mn) = r(A, mnp). Hence, on writing k = pm, we get
X XX
log p
ck bn r(A, kn),
R p (Mu , N) α−β
log x
xu <p≤Cxu
k≤K n≤N
where K = 2x1/2−60 , |ck | ≤ 1, |bn | ≤ 1. Thus, by (5.135),
X log p
α−β
R p (Mu , N) yx−0 /2 .
log
x
xu <p≤Cxu
Furthermore, when xu ≤ p ≤ 2xu , we have
5/8 − u − 80
5/8 − 80 log p
F
−F
(log x)−1 .
−
δ
δ
log z
(5.139)
(5.140)
Combining (5.138)–(5.140), we find that
X log p
α−β
S (A p , z) ≤ yV(z) σ1 + σ2 c9 + O σ2 (log x)−1/3 ,
log x
α/β
z≤p≤x
where
log p
5/8 − 80 log p
σ1 =
α−β
F
,
−
log x
δ
log z
xδ ≤p≤xα/β
X 1
log p
σ2 =
α−β
.
p
log x
δ
α/β
X
1
p
x ≤p≤x
By Theorem 1.9, σ2 1, so choosing sufficiently small, we conclude that when x → ∞,
X log p
α−β
S (A p , z) ≤ yV(z)(σ1 + 0.001).
(5.141)
log x
α/β
z≤p≤x
110
Finally, we evaluate σ1 . Let g(u) = x−u (α − βu)F δ−1 (5/8 − 80 − u) and note that
g(u) x−u
g0 (u) x−u (log x).
and
Using Stieltjes integration by parts and the PNT, we obtain
Z α/β
σ1 =
g(u) d π(xu ) − π(xδ )
δ
Z α/β
π(xu ) − π(xδ ) g0 (u) du
=−
δ
Z α/β Z xu
dt
=−
dg(u) + O (log x)−1
xδ log t
δ
Z α/β
g(u)xu (log x)
−1
du
+
O
(log
x)
=
log(xu )
δ
Z α/β
=
u−1 (α − βu)F δ−1 (5/8 − 80 − u) du + O (log x)−1 .
δ
From the last calculation, (5.134), (5.136), and (5.141), we deduce that
Σ(α, β, δ) ≥ yV(z)(σ3 − 0.003),
where
−1
σ3 = f δ (5/8 − 80 ) −
Z
α/β
δ
u−1 (α − βu)F δ−1 (5/8 − 80 − u) du.
Hence, it remains to choose α, β, δ satisfying (5.133) and (5.137) and such that σ3 > 0.003. In
order to simplify the calculations, we choose
δ = 5/32 − 20 ,
α = 4/3,
β = 3,
although a slightly better choice would have been
δ = 5/32 − 20 ,
α = (β + 1)/3,
β = 2.1.
(5.142)
Then (5.133) and (5.137) hold and
Z
1 4/9 −1
σ3 = f (4) −
u (4 − 9u)F 4 − δ−1 u du
3 δ
Z
1 4/9 4 − 9u
γ ln 3
−
du
= 2e
4
3 δ u(4 − δ−1 u)
3
6.75δ
1
γ ln 3
= 2e
− ln
+ 3δ ln
≥ 0.11eγ > 0.11.
4
3
9δ − 1
9δ − 1
This completes the proof of Chen’s theorem.
111
Remark. It is not difficult to obtain some improvement on Theorem 8. Indeed, after changing the
choice of y to y = xθ , we alter our choices so that
δ = (7θ − 1)/16 − 20 ,
α/β ≤ θ − 70 .
(5.143)
Choosing first α = (β + 1)/3 and then β so that the second inequality in (5.143) is nearly exact (cf.
(5.142)), we derive a lower bound
Σ(α, β, δ) ≥ xθ V(z)(σ4 (θ) − 0.003),
where σ4 (θ) is a function of θ similar to σ3 above. Finally, we try to choose θ so that σ4 (θ) > 0.003.
Following this strategy, one easily finds that one can replace the interval (x − x1/2 , x] in Theorem 8
by (x − x0.46 , x]. As we mentioned at the end of §5.7.3, further improvements arise from the use of
sharper exponential sum estimates and/or more sophisticated versions of the sum Σ(α, β, δ) above.
Such matters, however, go beyond the scope of these lectures.
Exercises
1. Prove (5.11).
2. Prove Theorem 5.3.
3.
(a) Suppose that f (x)(log x)−1 → ∞ and
x −1
max ψ(x; q, a) −
x f (x) .
(a,q)=1 φ(q)
q≤Q
X
Show that the asymptotic formula
ψ(x; q, a) =
x
1 + o(1)
φ(q)
as x → ∞
(∗)
holds for all arithmetic progressions a mod q, with 1 ≤ q ≤ min Q, f (x)(log x)−1 and (a, q) = 1. In
particular, a version of the Bombieri–Vinogradov theorem with x exp −2(log x)δ , δ > 0, in place of the
term x(log x)−A on
the right side of (5.9) would establish (∗) for all arithmetic progressions with moduli
q ≤ exp (log x)δ , thus yielding an improvement on the Siegel–Walfisz theorem.
(b) Obtain a variant of the result of part (a) relating to the Barban–Davenport–Halberstam theorem.
4. Prove (5.40).
5. Prove that:
(a) φ(n) n(log log n)−1 for all n ≥ 10;
X n2
(b)
x;
φ(n)2
n≤x
X
(c)
φ(n)−2 x−1 .
n>x
6. Let J(n) be defined by (5.41). Prove that J(n) = 21 n2 .
112
7. The point of this exercise is to establish (5.44).
(a) Use Lemma 5.1 with U = V to decompose f (α) into type I sums of the form (5.3) with f (n) = e(αn)
and M U 2 and type II sums of the form (5.4) with the same f (n) and U M, N xU −1 .
(b) Let Σ1 be any of the type I sums arising from the decomposition in part (a). Prove that
X
Σ1 (log n)
min n/m, kαmk−1 .
m≤U 2
(c) Let Σ2 be any of the type II sums arising from the decomposition in part (a). Prove that
X X
min M, kα(u − v)k−1 ,
|Σ2 |2 M(log n)7
u≤nM−1 v≤nM−1
for some M with U ≤ M ≤ xU −1 .
(d) Suppose that M, N, α are real numbers with M, N ≥ 1, and that |α − a/q| ≤ q−2 with (a, q) = 1. Then
X
min MNm−1 , kαmk−1 MNq−1 + M + q (log 2MNq).
m≤M
H: See Vaughan [54, Lemma 2.2].
(e) Suppose that |α − a/q| ≤ q−2 . Using the results of parts (b)–(d), show that
Σ1 (log n)2 nq−1 + U 2 + q
and
Σ2 (log n)4 nq−1/2 + nU −1/2 + n1/2 U 1/2 + n1/2 q1/2 .
Noting that U = n2/5 is the choice that optimizes these bounds, deduce (5.44).
8. The purpose of this exercise is to establish Lemma 5.6. Let N = [Q] and consider the numbers
0, {α}, {2α}, . . . , {Nα}, 1.
(∗)
Show that some interval (k − 1)(N + 1)−1 , k(N + 1)−1 , 1 ≤ k ≤ N + 1, contains at least two of the numbers (∗).
From this, deduce Dirichlet’s theorem.
9. The purpose of this exercise is to give an alternative proof of Lemma 5.6.
(a) Show that Dirichlet’s theorem is equivalent to the inequality
δα = min knαk : 1 ≤ n ≤ N ≤ (N + 1)−1 .
(b) Suppose that 0 < δ ≤
1
2
and define the 1-periodic function
fδ (x) = max(δ − kxk, 0).
Prove that the nth Fourier coefficient of fδ is given by
(
δ2
fˆδ (n) =
(sin πδm)2/(πm)2
113
if m = 0,
if m , 0.
(∗)
(c) Suppose that δα > 0 and set δ = δα . Observe that
X
δ=
1−
|n|≤N
Deduce that
∞
X
δ=
|n|
fδ (nα).
N+1
fˆδ (m)KN (mα),
m=−∞
where KN (x) is the Fejér kernel
KN (x) =
X
1−
|n|≤N
2
sin π(N + 1)x
1
|n|
.
e(nx) =
N+1
N+1
sin πx
(d) Use the result of part (c) to prove (∗).
10. Let A be the set of integers n ≤ X, P the set of all primes, z = X 1/2 . Observe that with these choices, (5.102)
takes the form
Y
1 − p−1 + R(A, z).
S (A, P, z) = X
p<z
Hence, under the hypothesis
R(A, z) = o X(log X)−1
one obtains
π(X) ∼ X
Y
p<z
as X → ∞,
(∗)
e−γ X 2e−γ X
1 − p−1 ∼
=
,
log z
log X
which contradicts the PNT. Therefore, (∗) must be false.
11. Fill the details of Example 5.7.2.
12. The purpose of this exercise is to construct a C ∞ -function f with the properties required in the proof of
Lemma 5.12.
(a) Define the function
g(x) =
(
exp (x − 1)−1 − x−1
0
if 0 < x < 1,
otherwise.
Show that g ∈ C ∞ (R).
Rx
(b) Let G(x) = −∞ g(t) dt, where g is the function from part (a). Show that the function h(x) = G(x)/G(1)
is a non-decreasing C ∞ -function such that h(x) = 0 when x ≤ 0 and h(x) = 1 when x ≥ 1.
(c) Suppose that α < β and δ > 0. Let h(x) be the function from part (b) and define
f (x) = h((x − α)/δ + 1) − h((x − β)/δ).
Then f is a C ∞ -function, supported in [α − δ, β + δ] and such that
f (x) = 1 (α ≤ x ≤ β)
and
f ( j) (x) δ− j
( j ≥ 0).
13. Prove (5.140).
14. In the remark at the end of §5.8, we sketched the proof of the following result:
For sufficiently large x, the interval (x − x0.46 , x] always contains a P2 -number.
Fill the details of the proof.
114
Bibliography
[1] R. C. Baker and G. Harman, The difference between consecutive primes, Proc. London Math.
Soc. (3) 72 (1996), 261–280.
[2] R. C. Baker, G. Harman, and J. Pintz, The difference between consecutive primes II, Proc.
London Math. Soc. (3) 83 (2001), 532–562.
[3] M. B. Barban, The “large sieve” method and its applications to number theory, Uspehi Mat.
Nauk 21 (1966), 51–102, in Russian.
[4] R. P. Boas, Entire Functions, Academic Press, 1954.
[5] E. Bombieri, On the large sieve, Mathematika 12 (1965), 201–225.
[6] E. Bombieri, J. B. Friedlander, and H. Iwaniec, Primes in arithmetic progressions to large
moduli, Acta Math. 156 (1986), 203–251.
, Primes in arithmetic progressions to large moduli II, Math. Ann. 277 (1987), 361–
[7]
393.
[8]
, Primes in arithmetic progressions to large moduli III, J. Amer. Math. Soc. 2 (1989),
215–224.
[9] J. R. Chen, On the representation of large even integer as the sum of a prime and the product
of at most two primes, Sci. Sinica 16 (1973), 157–175.
[10]
, On the distribution of almost primes in an interval, Sci. Sinica 18 (1975), 611–627.
[11] N. G. Chudakov, On zeros of Dirichlet’s L-functions, Mat. Sb. (N.S.) 1 (1936), 591–602.
[12] H. Cramér, Some theorems concerning prime numbers, Arkiv Mat. Astronom. Fysik 15
(1920), 1–32.
[13]
, On the order of magnitude of the difference between consecutive primes, Acta Arith.
2 (1937), 23–46.
[14] H. Davenport, Multiplicative Number Theory, 3rd ed., Springer-Verlag, 2000.
115
[15] H. Davenport and H. Halberstam, Primes in arithmetic progressions, Michigan Math. J. 13
(1966), 485–489.
[16] P. Erdös, On the difference of consecutive primes, Quart. J. Math. Oxford 6 (1935), 124–128.
[17] S. W. Graham and G. A. Kolesnik, Van der Corput’s Method of Exponential Sums, Cambridge
University Press, 1991.
[18] G. Greaves, Sieves in Number Theory, Springer-Verlag, 2001.
[19] H. Halberstam and H.-E. Richert, Sieve Methods, Academic Press, 1974.
[20] G. H. Hardy, On Dirichlet’s divisor problem, Proc. London Math. Soc. (2) 15 (1916), 1–25.
[21] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio Numerorum’ III: On the expression of a number as a sum of primes, Acta Math. 44 (1923), 1–70.
[22] D. R. Heath-Brown, Prime numbers in short intervals and a generalized Vaughan identity,
Canad. J. Math. 34 (1982), 1365–1377.
[23]
, The number of primes in a short interval, J. Reine Angew. Math. 389 (1988), 22–63.
[24] D. R. Heath-Brown and H. Iwaniec, On the difference between consecutive primes, Invent.
Math. 55 (1979), 49–69.
[25] H. Heilbronn, Über den Primzahlsatz von Herrn Hoheisel, Math. Z. 36 (1933), 394–423.
[26] G. Hoheisel, Primzahlprobleme in der Analysis, Sitz. Preuss. Akad. Wiss. 2 (1930), 1–13.
[27] M. N. Huxley, On the difference between consecutive primes, Invent. Math. 15 (1972), 164–
170.
[28]
, Area, Lattice Points, and Exponential Sums, Oxford University Press, 1996.
[29]
, Exponential sums and lattice points III, Proc. London Math. Soc. (3) 87 (2003),
591–609.
[30] A. E. Ingham, On the difference between consecutive primes, Quart. J. Math. Oxford 8 (1937),
255–266.
[31] A. Ivić, The Riemann Zeta-Function, Wiley, 1985.
[32] H. Iwaniec, A new form of the error term in the linear sieve, Acta Arith. 37 (1980), 307–320.
[33]
, Rosser’s sieve, Acta Arith. 36 (1980), 171–202.
[34] H. Iwaniec and M. Jutila, Primes in short intervals, Ark. Mat. 17 (1979), 167–176.
[35] H. Iwaniec and J. Pintz, Primes in short intervals, Monatsh. Math. 98 (1984), 115–143.
116
[36] C. H. Jia, Almost all short intervals containing prime numbers, Acta Arith. 76 (1996), 21–84.
[37] A. A. Karatsuba and S. M. Voronin, The Riemann Zeta-Function, Walter de Gruyter, 1992.
[38] N. M. Korobov, Estimates of trigonometric sums and their applications, Uspehi Mat. Nauk
13 (1958), 185–192, in Russian.
[39] H. Q. Liu, Almost primes in short intervals, J. Number Theory 57 (1996), 303–322.
[40] S. T. Lou and Q. Yao, A Chebychev’s type of prime number theorem in a short interval II,
Hardy–Ramanujan J. 15 (1992), 1–33.
[41] H. L. Montgomery, Topics in Multiplicative Number Theory, Springer-Verlag, 1971.
[42] H. L. Montgomery and R. C. Vaughan, The large sieve, Mathematika 20 (1973), 119–134.
[43] A. Page, On the number of primes in an arithmetic progression, Proc. London Math. Soc. (2)
39 (1935), 116–141.
[44] J. Pintz, Very large gaps between consecutive primes, J. Number Theory 63 (1997), 286–301.
[45] G. Pólya, Über die Verteilung der quadratischen Reste und Nichtreste, Nachr. Akad. Wiss.
Göttingen Math.-Phys. (1918), 21–29.
[46] R. A. Rankin, The difference between consecutive prime numbers, J. London Math. Soc. 13
(1938), 242–247.
[47] G. F. B. Riemann, Über die Anzahl der Primzahlen unter einen gegebenen Grösse, Monatsber. Berliner Akad. (1859), 671–680.
[48] A. Selberg, On the normal density of primes in small intervals, and the difference between
consecutive primes, Arch. Math. Naturvid. 47 (1943), 87–105.
[49] C. L. Siegel, Über die Classenzahl quadratischer Körper, Acta Arith. 1 (1935), 83–86.
[50] E. C. Titchmarsh, The Theory of the Riemann Zeta-Function, 2nd ed., Oxford University
Press, 1986.
[51] J. D. Vaaler, Some extremal functions in Fourier analysis, Bull. Amer. Math. Soc. (N.S.) 12
(1985), 183–216.
[52] R. C. Vaughan, Mean value theorems in prime number theory, J. London Math. Soc. (2) 10
(1975), 153–162.
[53]
, Sommes trigonométrique sur les nombres premiers, C. R. Acad. Sci. Paris Sér. A
258 (1977), 981–983.
[54]
, The Hardy–Littlewood Method, 2nd ed., Cambridge University Press, 1997.
117
[55] A. I. Vinogradov, The density hypothesis for Dirichlet L-series, Izv. Akad. Nauk SSSR Ser.
Mat. 29 (1965), 903–934, in Russian.
[56] I. M. Vinogradov, Sur la distribution des résidus et non résidus de puissances, Perm Univ.
Fiz.-Mat. Zh. 1 (1918), 18–24.
[57]
, Representation of an odd number as the sum of three primes, Dokl. Akad. Nauk
SSSR 15 (1937), 291–294, in Russian.
[58]
, A new estimate for ζ(1 + it), Izv. Akad. Nauk SSSR Ser. Mat. 22 (1958), 161–164,
in Russian.
[59] A. Zygmund, Trigonometric Series, 3rd ed., Cambridge University Press, 2002.
118
Download