LIMIT THEOREMS FOR EMPIRICAL DENSITY OF GREATEST COMMON DIVISORS BEHZAD MEHRDAD AND LINGJIONG ZHU Abstract. The law of large numbers for the empirical density for the pairs of uniformly distributed integers with a given greatest common divisor is a classic result in number theory. In this paper, we study the large deviations of the empirical density. We will also obtain a sharp rate of convergence to the normal distribution for the central limit theorem. Some generalizations are provided. 1. Introduction Let X1 , . . . , Xn be the random variables uniformly distributed on {1, 2 . . . , n}. It is well known that 1 X 6 1gcd(Xi ,Xj )=` → 2 2 , ` ∈ N. (1.1) 2 n π ` 1≤i,j≤n Heuristically, observe that if gcd(Xi , Xj ) = `, then Xi , Xj ∈ C` := {`n : n ∈ N}, with probability `12 as n → ∞ and {gcd(Xi , Xj ) = `} = {Xi , Xj ∈ C` , gcd(Xi /`, Xj /`) = P∞ 2 1}. Therefore, we get the desired result by noticing that `=1 `12 = π6 . On the other hand, two independent uniformly chosen integers are coprime if and only if they do not have a common prime factor. For any prime number p, the probability that a uniformly random integer is divisible by p is p1 . Hence, we get an alternative formula, Y 1 X 6 1 1 (1.2) 1 → = 2, 1 − = gcd(Xi ,Xj )=1 n2 p2 ζ(2) π 1≤i,j≤n p∈P where ζ(·) is the Riemann zeta function and throughtout this paper P denotes the set of all the prime numbers in an a creasing order. The fact that P(gcd(Xi , Xj )) → π62 was first proved by Cesàro [2]. The identity relating the product over primes to ζ(2) in (1.2) is an example of an Euler product, and the evaluation of ζ(2) as π 2 /6 is the Basel problem, solved by Leonhard Euler in 1735. For a rigorous proof of (1.2), see e.g. Hardy and Wright [11]. For further details and properties of the distributions, moments and asymptotics for the greatest common divisors, we refer to Cesàro [3], [4], Cohen [5], Diaconis and Erdős [7] and Fernández and Fernández [9], [10]. Since the law of large numbers result is well-known, it is natural to study the fluctuations, i.e. central limit theorem and the probabilities of rare events, i.e. Date: 25 October 2013. Revised: 25 October 2013. 2000 Mathematics Subject Classification. 60F10, 60F05, 11A05. Key words and phrases. Greatest common divisors, coprime pairs, central limit theorems, large deviations, rare events. 1 2 BEHZAD MEHRDAD AND LINGJIONG ZHU large deviations. The central limit theorem was recently obtained in Fernández and Fernández [9] and we will provide the sharp rate of convergence to normal distribution. The large deviations result is the main contribution of this paper. For the readers who are interested in the probabilistic methods in number theory, we refer to the books by Elliott [8] and Tenenbaum [12]. The paper is organized in the following way. In Section 2, we state the main results, i.e. the central limit theorem and the convergence rate to the Gaussian distribution and the large deviation principle for the emirical density. The proofs for large deviation principle are given in Section 3, and the proofs for the central limit theorem are given in Section 4. 2. Main Results 2.1. Central Limit Theorem. In this section, we will show a central limit theorem and obtain the sharp rate of convergence to the normal distribution. The method we will use is based on a result by Baldi et al. [1] for Stein’s method for central limit theorems. Theorem 1. Let Z be a standard normal distribution with mean 0 and variance 1. ! P 2 6 C 1≤i,j≤n 1gcd(Xi ,Xj )=1 − n π 2 , Z ≤ 1/2 , (2.1) dT V 2σ 2 n3/2 n where C > 0 is a universal constant and Y 1 36 2 2 (2.2) σ := 1 − 2 + 3 − 4. p p π p∈P Similarly, we have the following result. Theorem 2. Let Z be a standard normal distribution with mean 0 and variance 1, and let ` ∈ N . ! P 2 6 C 1≤i,j≤n 1gcd(Xi ,Xj )=` − n `2 π 2 (2.3) dT V , Z ≤ 1/2 , 2σ 2 n3/2 n where C > 0 is a universal constant and 1 Y 1 2 36 2 (2.4) σ := 3 1 − 2 + 3 − 4 4. ` p p ` π p∈P 2.2. Large Deviation Principle. In this section, we are interested to study the following probability, X 1 (2.5) P 2 1gcd(Xi ,Xj )=` ' x , as n → ∞. n 1≤i,j≤n When x 6= show that, 1 6 `2 π 2 , this probability goes to zero as n → ∞. Indeed, later, we will (2.6) 1 P 2 n X 1gcd(Xi ,Xj )=` ' x ' e−nI` (x)+o(n) , ` ∈ N. 1≤i,j≤n In other words, this probability decays exponenailly fast as n → ∞. This pheonomenon is called large devitions in probability and the exponent I` (x) is the rate function. LIMIT THEOREMS FOR EMPIRICAL DENSITY OF GREATEST COMMON DIVISORS 3 Later in this section, we will specify I` (x) for any ` ∈ N. For the moment, let us concentrate on I1 (x), the case in which we consider the number of coprime pairs. Before we proceed, let us introduce the formal definition of large deviations. A sequence (Pn )n∈N of probability measures on a topological space X satisfies the large deviation principle with rate function I : X → R if I is non-negative, lower semicontinuous and for any measurable set A, we have 1 1 (2.7) − inf o I(x) ≤ lim inf log Pn (A) ≤ lim sup log Pn (A) ≤ − inf I(x). n→∞ n x∈A n→∞ n x∈A Here, Ao is the interior of A and A is its closure. We refer to Dembo and Zeitouni [6] or Varadhan [13] for general background of large deviations and the applications. αk β1 βm 1 α2 Let Xi = pα 1 p2 · · · pk q1 · · · qm , where pj , for 1 ≤ j ≤ k are the first k prime numbers and qh are distinct primes other than pj , for 1 ≤ h ≤ m. Define Xik as αk 1 αk the restriction of Xi to its first k primes, i.e. Xik := pα 1 p2 · · · pk . As a first step to understand the event that Xi and Xj are coprime, it is easier to understand the event that Xik and Xjk are coprime. For a given prime p ∈ {p1 , . . . , pk }, we can ask whether Xik and Xjk are divisible by p or not, for any 1 ≤ i, j ≤ n. This is analogous to selecting two subsets from {1, 2, . . . , k} and consider if they have empty intersection. To illustrate, let us consider the following example. Choose n subsets Aki , 1 ≤ i ≤ n, uniformly from {1, 2, . . . , k}. Then, Aki ∩Akj = ∅ if and only if h ∈ / Aki ∩ Akj , for any h ∈ {1, 2, . . . , k}, which occurs with probability 1 1 − 22 . Therefore, the following law of large numbers holds, k 1 1 X 3k 1 → 1 − (2.8) = k. k k Ai ∩Aj =∅ 2 2 n 2 4 1≤i,j≤n Let us denote the 2k subsets of {1, 2, . . . , k} by B1 , . . . , B2k and P k := {B1 , . . . , B2k }. Then, ZZ 1 X A (2.9) 1Aki ∩Akj =∅ = fk (x, y)LA n (dx)Ln (dy), n2 x∈P k ,y∈P k 1≤i,j≤n where LA n is the empirical measure on P k , i.e. n (2.10) LA n (x) = 1X δ k (x), n i=1 Ai and fk (x, y) on P k × P k is defined as (2.11) ( 0 fk (x, y) := 1 if x ∩ y 6= ∅, otherwise. The sets Aki are i.i.d. uniformly distributed on P k . Sanov’s theorem tells us that (2.12) −nI(µ)+o(n) P(LA , n ' µ) = e where µ = (µj )1≤j≤2k ∈ M(P k ), the space of all the probability measures on P k , and the rate function is given by k (2.13) I(µ) = 2 X j=1 log µ j µj . 2−k 4 BEHZAD MEHRDAD AND LINGJIONG ZHU By contraction principle, X 1 (2.14) P 2 1Aki ∩Akj =∅ ' x ' e−nI(x)+o(n) , n 1≤i,j≤n where k (2.15) I(x) = RR inf P k ×P k fk (x,y)µ(dx)µ(dy)=x 2 X log j=1 µ j µj . 2−k Now let us go back to our original problem, but for truncated integers. we are interested in, X 1 (2.16) P 2 1gcd(Xik ,Xjk )=1 ' x ' e−nI1 (x)+o(n) . n 1≤i,j≤n Therefore, let us introduce P k again, P k := {(α1 , . . . , α2k ) : αj ∈ {0, 1}} (2.17) Similar as before, we take Aki and Akj from P k . Unlike before, now the Aki are not taken uniformly from P k . Indeed, it follows a probability distribution η k = (ηi )1≤i≤2k , where ηi are taken from the set (2.18) ( ) 1−α1 αk 1−αk α1 1 1 1 1 1− ··· 1− , αj ∈ {0, 1}, j = 1, 2, . . . , k . p1 p1 pk pk Another look at fk in 2.11 reveals that, ( 0 if xi = yi = 1 for some 1 ≤ i ≤ 2k . (2.19) fk (x, y) = 1 otherwise. Hence, we have k (2.20) I(x) = RR inf P k ×P k fk (x,y)µ(dx)µ(dy)=x 2 X i=1 log µi ηi µi . There are two issues one needs to understand. • The first problem is how to let k → ∞. We need to understand the meaning of P k and the associated probability measure when k → ∞. Later, we will see that to rigorously tackle this problem, we consider the binary expansions of a real number on [0, 1] and will define the probability measures accordingly. • In an “ideal world”, the probability that Xi is divisible by any prime number p is precisely p1 . But in our definition, Xi is uniformly distributed on {1, 2, . . . , n} and therefore the probability that it is divisible by p is [n/p] n , where [x] is the integer part of x. Later, we will define the probability measure for the “ideal world” as P̃ compared with the original probability measure P. LIMIT THEOREMS FOR EMPIRICAL DENSITY OF GREATEST COMMON DIVISORS 5 Let S := (si )i∈N be a sequence of numbers on [0, 1]. We define the probability measure νkS on [0, 1], for k ∈ N, as follows. (2.21) νkS ([0, b]) := k i−1 X Y (1 − sj )(1 − si )bi , i=1 j=1 where b = 0.b1 b2 . . . is the binary expansion of b. In other words, if we draw a random variable U according to the measure νkS and consider the first k digits in the binary expansion of U , they are distributed as k Bernoulli random variables with parameters (si )ki=1 . It is easy to see that a measure ν S exists as a weak limit of νkS and let ν S be its weak limit. For example, if si = 21 , for i ∈ N, then ν S is simply the Lebesgue measure on [0, 1]. Let (pi )i∈N be the members of P in the increasing order. From now on, we work with νk and ν, for which the si is p1i ,or 1 (2.22) ν = ν P , where P := , and pi ∈ P. pi i∈N Let U1 , . . . , Un be random variables distributed with probability measure ν. Restrict each Ui to its first k digits in the binary expansion and define it as Uik . By our construction of ν and νk , we see that Uik are random variables distributed with probability measure νk . Let us see the connections amongst Xik , Aki and Uik . If we view Uik as a 0, 1 sequence of length k, there exists a bijection ψk to P k . Hence, ψk (Uik ) has the same distribution as Aki that is η k as in eq. 2.18. In addition, Xik is φk (Uik ), where the map φk : [0, 1] → N, for k ∈ N, is defined as (2.23) φk (a) := k Y pai i , i=1 where a = 0.a1 a2 a3 . . . is the binary expansion of a and pi is the ith smallest prime number. Moreover, if we define f : [0, 1]2 → {0, 1} (an extension of 2.19) as ( 0 if x and y share a common 1 in their binary expansions (2.24) f (x, y) = , 1 otherwise then, it is straightforward to see that (2.25) f (Uik , Ujk ) = 1gcd(Xik ,Xjk )=1 = 1Aki ∩Akj =∅ . We also extend ψk naturally to ψ. In that regard, define Ai := ψ(Ui ). So, (2.26) f (Ui , Uj ) = 1Ai ∩Aj =∅ . Since φk , ψk , and ψ are continuous maps, Ui ’s, Ai ’s and Xi ’s satisfy large deviations with the same rate if one of them satisfies large deviation. Moreover, the rate function should be an extension of (2.20), and this is the subject of our next and main theorem. P Theorem 3. The probability measures n12 1≤i,j≤n 1gcd(Xi ,Xj )=1 ∈ · satisfy a large deviation principle with rate function Z dµ (2.27) I1 (x) = RR inf log dµ, dν f (x,y)µ(dx)µ(dy)=x [0,1] [0,1]2 where ν and f are defined in 2.22 and 2.24, respectively. 6 BEHZAD MEHRDAD AND LINGJIONG ZHU We can also consider the following large deviation problem, X 1 (2.28) P 2 1gcd(Xi ,Xj )=` ' e−nI` (x)+o(n) . n 1≤i,j≤n Write βm ` = q1β1 q2β2 · · · qm , (2.29) where qi are distinct primes and βi are positive integers for 1 ≤ i ≤ m. For a fixed `, let p1 , . . . , pk be the smallest k primes distinct from q1 , . . . , qm . Any positive integer can be written as γm α1 k p1 · · · pα q1γ1 · · · qm k , (2.30) where γi and αj are non-negative integers. Any number on [0, 1] can be written as 0.γ1 γ2 · · · γm α1 α2 · · · αk · · · , (2.31) where γ1 , . . . , γm are obtained from ternary expansion and α1 , α2 , . . . are obtained from binary expansion. The interpretation is that if an integer is not divisible by qiβi , then γi = 0. If it is divisible by qiβi but not by qiβi +1 , then γi = 1. Finally, if it is divisible by qiβi +1 , then γi = 2. We also have αj = 0 if an integer is not divisible by pj and 1 otherwise. Restrict to the first m + k digits and define a probability measure νk that takes values α1 1−α1 αk 1−αk 1 1 1 1 (2.32) g(q1 ) · · · g(qm ) 1− ··· 1− , p1 pi pk pk where (2.33) g(qi ) = 1− 1 1 β qi i β − qi i βi1+1 qi if γi = 0 1 β +1 qi i if γi = 1 , 1 ≤ i ≤ m. if γi = 2 Let ν be the weak limit of νk . Similar to Theorem 3, we get the following result. P Theorem 4. The probability measures n12 1≤i,j≤n 1gcd(Xi ,Xj )=` ∈ · satisfy a large deviation principle with rate function Z dµ (2.34) I` (x) = RR inf dµ, log dν f (x,y)µ(dx)µ(dy)=x 2 [0,1] [0,1] where f (x, y) = 1 if 0 never occurs in the first m digits in the expansions of x and y; and x and y do not share a common 1 or 2 in their expansions. Otherwise, f (x, y) = 0. Remark 5. It is interesting to observe that π62 is also the density of square-free integers. That is because an integer is square-free if and only if it is not divisible by p2 for any prime number p. Therefore, we have the law of large numbers, i.e. n Y 1X 1 6 (2.35) 1Xi is square-free → 1 − 2 = 2. n i=1 p π p∈P LIMIT THEOREMS FOR EMPIRICAL DENSITY OF GREATEST COMMON DIVISORS The central limit theorem is standard, Pn i=1 1Xi is square-free − √ (2.36) n 6n π2 →N 36 6 0, 2 − 4 π π 7 . The large deviation principle also holds with rate function x 1−x (2.37) I(x) := x log + (1 − x) log . 6/π 2 1 − 6/π 2 Remark 6. One can also generalize the result to ask what it is the probability that if we uniformly randomly choose d numbers from {1, 2, . . . , n} their greatest common divisor is 1. It is not hard to see that X Y 1 1 1 (2.38) 1gcd(Xi1 ,...,Xid )=1 → 1− d = , as n → ∞. d n p ζ(d) 1≤i1 ,...,id ≤n p∈P 2 There are d n(n − 1) · · · (n − (2d − 2)) pairs (i1 , . . . , id ) and (j1 , . . . , jd ) so that |{i1 , . . . , id } ∩ {j1 , . . . , jd }| = 1. It is also easy to see that Y 1 2 (2.39) P (gcd(X1 , . . . , Xd ) = gcd(Xd , . . . , X2d−1 ) = 1) = 1 − d + 2d−1 . p p p∈P Therefore, we have the central limit theorem. X Y 1 1 1gcd(Xi1 ,...,Xid )=1 − nd (2.40) 1− d 2d−1 p d · n 2 1≤i1 ,...,id ≤n p∈P Y 2 Y 1 1 2 → N 0, 1 − d . 1 − d + 2d−1 − p p p p∈P p∈P We also have the large deviation principle for ( n1d ·) with the rate function P 1≤i1 ,...,id ≤n 1gcd(Xi1 ,...,Xid )=1 ∈ Z (2.41) I(x) = R ··· R inf [0,1]d f (x1 ,x2 ,...,xd )µ(dx1 )···µ(dxd )=x log [0,1] dµ dν dµ, where ν is the same as in Theorem 3 and (2.42) ( 1 if x1 , . . . , xd do not share a common 1 in their binary expansions f (x1 , . . . , xd ) = . 0 otherwise 3. Proofs of Large Deviation Principle In this section, we will prove a series of lemmas and theorems of superexponential estimates that are needed in the proof of Theorem 3 before going to the proof of Theorem 3. Let us give the definitions of Yp , S(k1 , k2 ) and P̃ that will be used repeatedly throughout this section. Definition 7. For any prime number p, we define (3.1) Yp := #{1 ≤ i ≤ n : Xi is divisible by p}. Definition 8. For any k1 , k2 ∈ N, let us define (3.2) S(k1 , k2 ) := {p ∈ P : k1 < p ≤ k2 }. 8 BEHZAD MEHRDAD AND LINGJIONG ZHU Definition 9. We define a probability measure P̃ under which Xi are i.i.d. and P̃(Xi is divisible by p) = p1 for p ∈ P, p ≤ n and the events {Xi divisible by p} and {Xi divisible by q} are independent for distinct p, q ∈ P, p, q ≤ n. Lemma 10. Let Y be a Binomial random variable distributed as B(α, n). For any λ ∈ R, let λ1 := eλ . If 2αλ21 < 1 and α < 21 , then, for sufficiently large n, h λ 2i 1 log 4(n + 1) (3.3) log Ẽ e n Y ≤ 4λα2 λ41 + . n n Proof. By the definition of Binomial distribution, n h λ 2i X λi2 n i (3.4) Ẽ e n Y = α (1 − α)n−i e n i i=0 λi2 n i ≤ (n + 1) max α (1 − α)n−i e n . 0≤i≤n i Using Stirling’s formula, for any n ∈ N, 1≤ √ (3.5) Therefore, we have 0 ≤ x ≤ 1. Hence, n i n! e ≤√ . 2πn(n/e)n 2π ≤ 4enH(i/n) , where H(x) := −x log x − (1 − x) log(1 − x), (3.6) h λ 2i 1 log Ẽ e n Y n ( 2 ) log 4(n + 1) i i i i ≤ + max H + log(α) + 1 − log(1 − α) + λ . 0≤i≤n n n n n n To find the maximum of (3.7) f (x) := H(x) + x log(α) + (1 − x) log(1 − α) + λx2 , it is sufficient to look at (3.8) 0 f (x) = log α 1−α The assumptions 2αλ21 < 1 and α < 1 2 − log x 1−x + 2λx. implies that α 2αλ21 λ21 ≤ 2αλ21 ≤ . 1−α 1 − 2αλ21 (3.9) Since logarithm is an increasing function, (3.8) and (3.9) imply that f 0 (x) < 0 for any x ≥ 2αλ21 . Therefore, the maximum of f is attained at some x ≤ 2αλ21 . x In addition, since log( 1−x ) is increasing in x, the maximum of (3.10) g(x) := H(x) + x log(α) + (1 − x) log(1 − α) is achieved at x = α, which is g(α) = 0. Hence, (3.11) max f (x) = 0≤x≤1 which concludes the proof. max 0≤x≤2αλ1 f (x) ≤ 0 + λx2 ≤ λ(2αλ21 )2 , LIMIT THEOREMS FOR EMPIRICAL DENSITY OF GREATEST COMMON DIVISORS 9 Theorem 11. For any k, n ∈ N sufficiently large and > 0, X 1 (3.12) log P̃ Yp2 > n2 ≤ − log(k) + 4. n 8 p∈S(k,n) Therefore, we have the following superexponential estimate, X 1 (3.13) lim sup lim sup log P̃ Yp2 > n2 = −∞. n→∞ n k→∞ p∈S(k,n) Proof. Note that Yp = #{1 ≤ i ≤ n : X̃i is divisible by p}. And whether X̃i is divisible by p is independent from X̃i being divisible by q for distinct primes p and q. In other words, Yp are independent for distinct primes p ∈ P. By Chebychev’s inequality, h λP i X 2 1 1 (3.14) log P̃ Yp2 > n2 ≤ −λ + log Ẽ e n p∈S(k,n) Yp n n p∈S(k,n) h λ 2i 1 X log Ẽ e n Yp . = −λ + n p∈S(k,n) We choose k ∈ N large enough so that λ1 = eλ < 2 2 2 2 p λ1 < k λ1 < 1. By Lemma 10, we have (3.15) 1 n X h λ 2i log Ẽ e n Yp ≤ p∈S(k,n) X p∈S(k,n) √ 2k. For k < p ≤ n, we have log(4(n + 1)) + 4λ n 2 1 λ41 . p Prime number theorem states that (3.16) lim x→∞ π(x) = 1, x/ log(x) where π(x) denotes the number of primes less than x. Therefore, |k < p ≤ n, p ∈ 2n P| ≤ log n for sufficiently large n. Together with (3.15), for sufficiently large n, we get h λP i X 2 1 2n log 4(n + 1) 1 (3.17) log Ẽ e n k<p≤n,p∈P Yp ≤ + 4λλ41 n log n n p2 k<p≤n,p∈P ≤3+ 4λλ41 X 1 `2 `>k 1 ≤ 3 + 4λλ41 . k Plugging (3.17) into (3.14), we get X 1 4λλ41 (3.18) log P̃ Yp2 > n2 ≤ −λ + 3 + n k k<p≤n,p∈P 4λ41 =3−λ − . k 10 BEHZAD MEHRDAD AND LINGJIONG ZHU 4λ4 We can choose λ = 14 log(k) − 3 so that k 1 < 2 and it does not violate with our √ earlier assumption that λ1 < 2k for large k. Hence, X 1 log(k) (3.19) log P̃ Yp2 > n2 ≤ 3 − + 3 n 8 k<p≤n,p∈P ≤4− log(k) , 8 which yields the desired result. Lemma 12. For k1 , k2 ∈ N sufficiently large, X log(k1 ) (3.20) P . Yp2 > n2 ≤ 4 log log k2 + 4 − 8 S(k1 ,k2 ) Proof. The idea of the proof is to change the P measure from P to P̃ and apply Lemma 10. First, observe that F (X1 , . . . , Xn ) = p∈S(k1 ,k2 ) Yp2 only depends on the events {Xi ∈ Ep1 ,...,p` }, where i, ` ∈ {1, 2, . . . , n} and {p1 , . . . , p` } ⊂ S(k1 , k2 ) and (3.21) Ep1 ,...,p` := {i ∈ {1, 2, . . . , n}|Prime(i) ∩ S(k1 , k2 ) = {p1 , . . . , p` }} , where Prime(x) := {q ∈ P : x is divisible by q}. We will show that the following uniform upper bound holds, (3.22) P(X1 ∈ Ep1 ,...,p` ) ≤ e4 log log k2 . P̃(X1 ∈ Ep1 ,...,p` ) Before we proceed, let us show that (3.22) and Theorem 11 implies (3.20). Since Xi ’s are independent and X̃i ’s are independent, P Xi ∈ Epi1 ,...,pi` , 1 ≤ i ≤ n n ≤ e4 log log k2 , (3.23) P̃ Xi ∈ Epi1 ,...,pi` , 1 ≤ i ≤ n where {pi1 , . . . , pi` } ⊂ S(k1 , k2 ) for 1 ≤ i ≤ n. Recall that F only depends on events {Xi ∈ Ep1 ,...,p` }, therefore, (3.24) 1 1 log P(F > n2 ) ≤ 4 log log k2 + log P̃(F > n2 ) n n log(k1 ) ≤ 4 log log k2 + 4 − , 8 where we used Theorem 11 at the last step. Now, let us prove (3.22). First, let us give an upper bound for the numerator, that is, h i n p ···p 1 1 1 ` (3.25) P (X1 ∈ Ep1 ,...,p` ) = #|Ep1 ,...,p` | ≤ ≤ , n n p1 · · · p` where [x] denotes the largest integer less or equal to x and we used the simple fact that [x] x ≤ 1 for any positive x. LIMIT THEOREMS FOR EMPIRICAL DENSITY OF GREATEST COMMON DIVISORS 11 As for the lower bound for the denominator, we have Y Y 1 1 P̃ (X1 ∈ Ep1 ,...,p` ) = (3.26) 1− q q q∈{p1 ,...,p` } q∈S(k1 ,k2 )\{p1 ,...,p` } Y Y 1 1 ≥ 1− q q q∈{p1 ,...,p` } ≥ Y q∈{p1 ,...,p` } q∈S(k1 ,k2 ) 1 −2 Pq∈S(k ,k ) q1 1 2 e , q where we used the inequality that 1 − x ≥ e−2x for x ≤ 21 . Notice that X 1 (3.27) lim − + log log n = M, n→∞ q q∈S(1,n) where M = 0.261497 . . . is the Meissel-Mertens constant. Therefore, for suffciently large k2 , X 1 X 1 ≤ ≤ 2 log log k2 . (3.28) q q q∈S(k1 ,k2 ) q∈S(1,k2 ) Combining (3.25), (3.26) and (3.28), we have proved the upper bound in (3.22). Lemma 13. Let pj , 1 ≤ j ≤ `, ` ∈ N be the primes such that S(k1 , k2 ) = {p1 , . . . , p` } and Y Y (3.29) m pj ≤ n < (m + 1) pj , 1≤j≤` 1≤j≤` where m ∈ N. Then, there exists a coupling of vectors of random variables Xi and X̃i for 1 ≤ i ≤ n, i.e. a measure µ with marginal distributions the same as Xi and X̃i such that n 2k X X 2 1 2 2 n 2 Ỹq ≥ n ≤ 2 (3.30) µ Yq − . m q∈S(k1 ,k2 ) q∈S(k1 ,k2 ) Proof. The main ingredient of the proof is the Chinese Remainder Theorem which states that the set of equations x ≡ a1 mod(p1 ) .. (3.31) . x ≡ a mod(p ) ` ` has a unique solution 1 ≤ x ≤ p1 · · · p` , where 0 ≤ ai < pi , i ∈ {1, 2, . . . , `}. Hence, for each sequence of ai ’s, the set of equations in (3.31) has exactly m solutions for 1 ≤ x ≤ mp1 · · · p` . We denote these solutions by Ri (a1 , . . . , a` ) for i ∈ {1, 2, . . . , m}. Given Xi uniformly distributed on {1, 2, . . . , n}, we define X̃i as follows. We generate Bernoulli random variables cj for 1 ≤ j ≤ `, with parameters 1 pj and independent of each other. Now, define ( pc11 · · · pc`` if Xi > mp1 · · · p` (3.32) X̃i = , pb11 · · · pb`` otherwise 12 BEHZAD MEHRDAD AND LINGJIONG ZHU where bj is 1 if Xi is divisible by pj and 0 otherwise for 1 ≤ j ≤ `. By the c definition, if we condition on Xi > mp1 · · · p` , X̃i is the multiplication of pjj and cj ’s are independent. Now, conditional on Xi ≤ mp1 · · · p` and let Prime(Xi ) = → − {p ∈ P : Xi is divisible by p}. Thus, for a vector b = (bj )`j=1 ∈ {0, 1}` , we have ` Y ∆ := µ X̃i = (3.33) b pjj |Xi ≤ mp1 · · · p` j=1 → − = µ Prime(Xi ) ∩ {p1 , . . . , p` } = S( b ) , → − where S( b ) := {pj |bj = 1, 1 ≤ j ≤ `}. But that is equivalent to (3.34) ∆= = #{Ri (a1 , . . . , a` )|aj = 0 if and only if bj = 0, 1 ≤ i ≤ m} mp1 · · · p` Q m bj 6=0 (pj − 1) mp1 · · · p` Y 1 Y 1 1− = . pj pj j:bj 6=0 j:bj =0 Therefore, we get (3.35) µ Xi = ` Y b Y pj j = j=1 j:bj =0 1 Y 1 1− . pj pj j:bj 6=0 Let us define ( 1 if {Prime(Xi ) ∩ S(k1 , k2 )} = 6 {Prime(X̃i ) ∩ S(k1 , k2 )} (3.36) g(Xi , X̃i ) := . 0 otherwise → − → − 1 since By the definition of the coupling of X and X̃ , we have P(g(Xi , X̃i ) = 1) ≤ m the event g(Xi , X̃i ) = 1 implies that Xi > mp1 · · · p` which occurs with probability (3.37) n − mp1 p2 · · · p` mp1 p2 · · · p` 1 1 ≤1− < . = n (m + 1)p1 · · · p` m+1 m Now, let us go back to prove the superexponential bound in (3.30). Observe that (3.38) f (X1 , . . . , Xn ) := X q∈S(k1 ,k2 ) Yq2 = X q∈S(k1 ,k2 ) Yq + X X q∈S(k1 ,k2 ) i6=j Hence, (3.39) X 2 2 Yq − Ỹq ≤ #{i|g(Xi , X̃i ) = 1}2k2 n. q∈S(k1 ,k2 ) 1q|gcd(Xi ,Xj ) . LIMIT THEOREMS FOR EMPIRICAL DENSITY OF GREATEST COMMON DIVISORS 13 That is because if we change one of Xi ’s, the function f (X1 , . . . , Xn ) changes by at most k2 (n + 1) ≤ 2k2 n. Therefore, X µ (3.40) Yq2 − Ỹq2 ≥ n2 q∈S(k1 ,k2 ) ≤ µ 2k2 n#{i|g(Xi , X̃i ) = 1} ≥ n2 = µ #{i|g(Xi , X̃i ) = 1} ≥ n . 2k2 Pn Notice that #{i|g(Xi , X̃i ) = 1} = i=1 1g(Xi ,X̃i )=1 is the sum of i.i.d. indica1 tor functions and µ(g(X1 , X̃1 ) = 1) ≤ m . Hence, by Chebychev’s inequality, by choosing θ = log m > 0, we have (3.41) µ #{i|g(Xi , X̃i ) = 1} ≥ n 2k2 h in ≤ E eθ1g(X1 ,X̃1 )=1 e−θn 2k2 n θ e + 1 e−θn 2k2 ≤ m ≤ 2n e−(log m)n 2k2 which yields the desired result. Theorem 14. For any > 0, we have the following superexponential estimates, X 1 Yq2 > n2 = −∞. (3.42) lim sup lim sup log P n n→∞ k→∞ q∈S(k,n) Proof. Let us write X (3.43) Yq2 = q∈S(k,n) X Yq2 + q∈S(k,M1 ) 120 X q∈S(M1 ,M2 ) Yq2 + X Yq2 , q∈S(M2 ,n) 120 where M1 := [log log n] and M2 := [log n] . By Lemma 12, for the second and third terms in (3.43), we have 2 X 1 n (3.44) log P Yq2 > n 3 q∈S(M1 ,M2 ) ≤ 4 log log M2 + 4 − log M1 8 3 120 120 = 4 log(log([log n] )) + 4 − log([log(log(n))] ) 24 120 = 4 log log + log log n + 4 − 5 log log log n 120 = 4 log log + 4 − log log log n, 14 BEHZAD MEHRDAD AND LINGJIONG ZHU and similarly, 1 log P n (3.45) X q∈S(M2 ,n) 2 n ≤ − log(log n) + 4. Yq2 > 3 In addition, for the first term in (3.43), by Lemma 13, we get X 2 X n 1 (3.46) Yq2 − Ỹq2 > ≤ log 2 − log µ log M0 , n 6 12M 1 q∈S(k,M1 ) q∈S(k,M1 ) where (3.47) n M0 := Q q∈S(k,M1 ) q n ≥ M1 M1 120 120 = exp log(n) − (log log n) log log log n . By Theorem 11, (3.48) 1 log P̃ n X p∈S(k,M1 ) 2 n ≤ − log(k) + 4. Yp2 ≥ 6 48 Combining (3.44), (3.45), (3.46), (3.47) and (3.48), we get the desired result. Finally, we are ready to prove Theorem 3. Proof of Theorem 3. Let Ln , Lkn be the empirical measures of Ui , Uik , i.e. n 1X δU (x), Ln (x) := n i=1 i (3.49) and n Lkn (x) := (3.50) 1X δ k (x). n i=1 Ui By Sanov’s theorem, see Dembo and Zeitouni [6], Ln satisfies a large deviation principle on M[0, 1], the space of probability measures on [0, 1], equipped with the weak topology and the rate function (R dµ log dν dµ if µ ν [0,1] (3.51) I(µ) = . +∞ otherwise For a ∈ [0, 1] and i ∈ N, we define (3.52) χi (a) = the ith digit in the binary expansion of a. We also redefine fk , f : [0, 1]2 → {0, 1} from 2.19 and 2.24, for k ∈ N, as follows (3.53) fk (x, y) = 1 − max χi (x)χi (y), 1≤i≤k and f (x, y) := 1 − max χi (x)χi (y). i∈N In other words, f is 1 if x and y do not share a common 1 at the same place in their binary expansions and f is 0 otherwise. Similar interpertation holds for fk . Clearly, fk ≥ f and limk→∞ fk (x, y) = f (x, y). Again, let ν be the probability LIMIT THEOREMS FOR EMPIRICAL DENSITY OF GREATEST COMMON DIVISORS 15 measure on [0, 1] such that for a random variable x with measure ν, χi (x) are i.i.d. Bernoulii random variables with parameters p1i , where pi is the ith smallest prime number. Let αk := {α ∈ [0, 1]|χi (α) = 0 for i > k} be the set of numbers on [0, 1] with k-digit binary expansion. We define (3.54) Let Fk (µ) := have Aα := {x ∈ [0, 1]|χi (α) = χi (x), 1 ≤ i ≤ k}. RR f (x, y)dµ(x)dµ(y) and F (µ) := [0,1]2 f (x, y)dµ(x)dµ(y). We [0,1]2 k RR (3.55) Fk (µ) = X fk (α, β)µ(Aα )µ(Aβ ). α,β∈Ak Hence the map µ 7→ Fk (µ) is continuous, i.e. for µn → µ in the weak topology, Fk (µn ) → Fk (µ). By contraction principle, Ln ◦ Fk−1 satisfies a large deviation principle with good rate function Z dµ (k) dµ. log (3.56) I (x) = RR inf dν f (x,y)dµ(x)dµ(y)=x [0,1] [0,1]2 k Moreover, in Theorem 11, we proved that ! ZZ 1 (fk − f )dLn (x)dLn (y) ≥ δ = −∞, (3.57) lim sup lim sup log P n→∞ n k→∞ [0,1]2 for any δ > 0. Thus the family {Ln ◦Fk−1 } are exponentially good approximation of {Ln ◦ F −1 }. Now, by Theorem 4.2.16 in Dembo and Zeitouni [6], Ln ◦ F −1 satisfies a weak large deviation principle with the rate function (3.58) I1 (x) = sup lim inf inf δ>0 k→∞ |w−x|<δ I (k) (w). Since the interval [0, 1] is compact, Ln ◦F −1 satisfies the full large deviation principle with good rate function I1 (x) as above and it is easy to check that Z dµ (3.59) I1 (x) = RR inf log dµ. dν f (x,y)µ(dx)µ(dy)=x [0,1] [0,1]2 Pn For any p ∈ P, let us recall that Yp = i=1 1p|Xi and for any k1 , k2 ∈ N, S(k1 , k2 ) = {p ∈ P : k1 < p ≤ k2 }. By Theorem 11, we have X 1 1 1gcd(Xik ,Xjk )=1 − 1gcd(Xi ,Xj )=1 ≥ (3.60) lim sup lim sup log P̃ 2 n n n→∞ k→∞ 1≤i,j≤n X X 1 1 1p|Xi ,p|Xj ≥ = lim sup lim sup log P̃ 2 n n n→∞ k→∞ 1≤i,j≤n p∈S(k,n) X 1 1 ≤ lim sup lim sup log P̃ 2 Yp2 ≥ n n→∞ n k→∞ p∈S(k,n) = −∞. 16 BEHZAD MEHRDAD AND LINGJIONG ZHU Next, notice that the difference between ( n12 P 1≤i,j≤n 1gcd(Xik ,Xjk )=1 ∈ ·) under P̃ and P̃ is superexponentially small by Lemma 13. Finally, by Theorem 14, (3.61) X 1 1 1gcd(Xik ,Xjk )=1 − 1gcd(Xi ,Xj )=1 ≥ = −∞. lim sup lim sup log P 2 n n→∞ n k→∞ 1≤i,j≤n This implies that (3.62) X 1 1 − inf o I1 (x) ≤ lim inf log P 2 1gcd(Xi ,Xj )=1 ∈ A n→∞ n x∈A n 1≤i,j≤n X 1 1 ≤ lim sup log P 2 (3.63) 1gcd(Xi ,Xj )=1 ∈ A ≤ − inf I1 (x). n n→∞ n x∈A 1≤i,j≤n 4. Proofs of Central Limit Theorem Proof of Theorem 1. Instead of summing over 1 ≤ i, j ≤ n, we only need to consider 1 ≤ i 6= j ≤ n. The reason is because if i = j, then gcd(XP i , Xi ) = 1 if and only n 1 therefore if Xi = 1 which occurs with probability n1 and i=1 1gcd(X 2n3/2 P Pi ,Xi )=1 is negligible in the limit as n → ∞. Moreover, 1≤i6=j≤n 1gcd(Xi ,Xj )=1 = 2 1≤i<j≤n 1gcd(Xi ,Xj )=1 and we can therefore concentrate on 1 ≤ i < j ≤ n. Let us define aij = 1gcd(Xi ,Xj )=1 for 1 ≤ i < j ≤ n. aij have the same distribution and let αn be the mean of a12 . Then, we have Y 1 (4.1) αn = E[a12 ] = P(gcd(X1 , X2 ) = 1) → 1− 2 , p p∈P P as n → ∞. Define ãij := aij − αn and W = (i,j)∈I ãij , where the sum is taken over the set I that is all the pairs of i, j ∈ [n] and i < j. Therefore, 2 X XX (4.2) σn2 := Var(W ) = E ãij = E[ãij ãk` ]. (i,j)∈I (i,j) (`,k) Note that if the intersection of {i, j} and {k, `} is empty, then ãij and ãk` are independent and E[ãij ãk` ] = E[ãij ]E[ãk` ] = 0. The remaining two cases are either {i, j} = {k, `} or |{i, j} ∩ {k, `}| = 1. For the former, we have (4.3) E[ãij ãij ] = E[a2ij ] − αn2 = E[aij ] − αn2 = αn − αn2 . For the latter, assuming without loss of generality that i = k and j 6= `, we get (4.4) E [ãij ãk` ] = E [aij ai` ] − αn2 = P (gcd(Xi , Xj ) = gcd(Xi , X` ) = 1) − αn2 = βn − αn2 , where βn := P (gcd(Xi , Xj ) = gcd(Xi , X` ) = 1). LIMIT THEOREMS FOR EMPIRICAL DENSITY OF GREATEST COMMON DIVISORS 17 Let p be a prime number and X̃1 , X̃2 , X̃3 be three i.i.d. integer valued random variables so that X̃1 is divisible by p with probability p1 . Then, by inclusionexclusion principle, Y 2 1 (4.5) P(gcd(X̃1 , X̃2 ) = gcd(X̃1 , X̃3 ) = 1) = 1− 2 + 3 . p p p∈P It is easy to see that Y 2 1 βn → 1− 2 + 3 , p p (4.6) p∈P as n → ∞. We have n2 pairs that {i, j} = {k, `} and 3×2× n3 pairs that |{i, j}∩{k, `}| = 1. (We pick three numbers from 1 to n. Then we pick one of them to be duplicated, say i. Finally, we have two pairs as (i, j)(i, k) and (i, k)(i, j)). Thus n n 2 2 (4.7) σn = (αn − αn ) + 3 · 2 · (βn − αn2 ), 2 3 and we have (4.8) σn2 → Y p∈P 1− 1 2 + 3 p2 p − 36 , π4 as n → ∞. Now, our goal is P to use the general theorem for random dependency graphs to prove that W = σ1n (i,j)∈I ãij converges to a standard normal random variable. We have a collection of dependent random variables (ãij )(i,j)∈I . We say ãij and ãk` are neighbors if they are dependent, i.e. {i, j} ∩ {k, `} = 6 ∅. Let N (i, j) = {neighbors of (i, j)} ∪ {(i, j)}. Hence, N (i, j) has D = 2n − 5 elements. In addition, let Z be a standard normal random variable. By a result of Baldi et al. [1], we have (4.9) dT V (W, Z) = sup |P(W ≤ t) − P(Z ≤ t)| t √ s X D2 X 2σn D3/2 ≤ 3 E|ãij |4 . E|ãij |3 + √ σn π σn2 (i,j)∈I (i,j)∈I Note that ãij is bounded by 1. Thus, using (4.7), we have s √ D2 n 2σn D3/2 n dT V (W, Z) ≤ 3 (4.10) + √ 2 σn 2 2 π σn (2n)3/2 n (2n)2 · n2 + 5 σn3 σn2 C ≤ 1/2 . n where C is a universal constant. ≤ Acknowledgements The authors are very grateful to Professor S. R. S. Varadhan for helpful discussions and generous suggestions. 18 BEHZAD MEHRDAD AND LINGJIONG ZHU References [1] Baldi, P., Rinott, Y. and C. Stein. (1989). A normal approximation for the number of local maxima of a random function on a graph. In Probability, Statistics, and Mathematics. Papers in Honor of Samuel Karlin (T. W. Anderson, K. B. Athreya and D. L. Iglehart, eds.) 59-81. Academic Press, Boston. [2] Cesàro, E. (1881). Démonstration élémentaire et généralisation de quelques thérèmes de M. Berger. Mathesis. 1, 99-102. [3] Cesàro, E. (1885). Étude moyenne du plus grand commun diviseur de deux nombres. Annali di Matematica Pura ed Applicata. 13, 235-250. [4] Cesàro, E. (1885). Sur le plus grand commun diviseur de plusieurs nombres. Annali di Matematica Pura ed Applicata. 13, 291-294. [5] Cohen, E. (1960). Arithmetical functions of a greatest common divisor. I. Proc. Amer. Math. Soc. 11, 164-171. [6] Dembo, A. and O. Zeitouni. Large Deviations Techniques and Applications, 2nd Edition, Springer, New York, 1998. [7] Diaconis, P. and P. Erdős. On the distribution of the greatest common divisor. Technical Report No. 12. Stanford University, 1977. Reprinted in A Festschrift for Herman Rubin, 5661. Lecture Notes, Monograph Series, Vol. 45, Institute of Mathematical Statistics, 2004. [8] Elliott, P. D. T. A. Probabilistic Number Theory, Volume I and II, Springer-Verlag, New York, 1980. [9] Fernández, J. L. and P. Fernández. Asymptotic normality and greatest common divisors. Preprint, 2013. [10] Fernández, J. L. and P. Fernández. On the probability distribution of gcd and lcm of r-tuples of integers. Preprint, 2013. [11] Hardy, G. H. and E. M. Wright. An Introduction to the Theory of Numbers, 6th Edition, Oxford University Press, 2008. [12] Tenenbaum, G. Introduction to Analytic and Probabilistic Number Theory, Cambridge Studies in Advanced Mathematics, Cambridge University Press, 1995. [13] S. R. S. Varadhan. Large Deviations and Applications, SIAM, Philadelphia, 1984. Courant Institute of Mathematical Sciences New York University 251 Mercer Street New York, NY-10012 United States of America E-mail address: mehrdad@cims.nyu.edu, ling@cims.nyu.edu