Electron. J. Probab. 17 (2012), no. 35, 1–22. ISSN: 1083-6489. DOI: 10.1214/EJP.v17-2026

An asymptotically Gaussian bound on the Rademacher tails∗

Iosif Pinelis†

Abstract

An explicit upper bound on the tail probabilities for the normalized Rademacher sums is given. This bound, which is best possible in a certain sense, is asymptotically equivalent to the corresponding tail probability of the standard normal distribution, thus affirming a longstanding conjecture by Efron. Applications to sums of general centered uniformly bounded independent random variables and to the Student test are presented.

Keywords: probability inequalities; large deviations; Rademacher random variables; sums of independent random variables; Student's test; self-normalized sums; Esscher–Cramér tilt transform; generalized moments; Tchebycheff–Markov systems.
AMS MSC 2010: Primary 60E15; Secondary 60F10; 62G10; 62G15; 60G50; 62G35.
Submitted to EJP on September 30, 2011, final version accepted on May 15, 2012.

1 Introduction, summary, and discussion

Let ε_1, . . . , ε_n be independent Rademacher random variables (r.v.'s), so that P(ε_i = 1) = P(ε_i = −1) = 1/2 for all i. Let a_1, . . . , a_n be any real numbers such that

a_1² + · · · + a_n² = 1. (1.1)

Let

S_n := a_1ε_1 + · · · + a_nε_n

be the corresponding normalized Rademacher sum. Let Z denote a standard normal r.v., with the density function ϕ, so that ϕ(x) = (1/√(2π)) e^{−x²/2} for all real x.

Upper bounds on the tail probabilities P(S_n ≥ x) have been of interest in combinatorics/optimization/operations research; see e.g. [32, 2, 16, 17, 3, 26] and bibliography therein. Other authors, including Bennett [4], Hoeffding [30], and Efron [22], were mainly interested in applications in statistics. The present paper too was motivated in part by statistical applications in [62].

∗Supported in part by NSF grant DMS-0805946 and NSA grant H98230-12-1-0237.
†Michigan Technological University, USA.
E-mail: ipinelis@mtu.edu

A particular case of a well-known result by Hoeffding [30] is the inequality

P(S_n ≥ x) ≤ e^{−x²/2} (1.2)

for all x ≥ 0. Obviously related to this is Khinchin's inequality – see e.g. survey [53]; for other developments, including more recent ones, see e.g. [43, 37, 52, 90]. Papers [65, 73] contain multidimensional analogues of an exact version of Khinchin's inequality, whereas [72] presents their extensions to multi-affine forms in ε_1, . . . , ε_n (also known as Rademacher chaoses) with values in a vector space. Latała [42] gave bounds on moments and tails of Gaussian chaoses; Berry–Esseen-type bounds for general chaoses were recently obtained by Mossel, O'Donnell, and Oleszkiewicz [49]. For other kinds of improvements/generalizations of the inequality (1.2) see the recent paper [1] and bibliography there.

While easy to state and prove, bound (1.2) is, as noted by Efron [22], "not sharp enough to be useful in practice". Exponential inequalities such as (1.2) are obtained by finding a suitable upper bound (say E(t)) on the exponential moments E e^{tS_n} and then minimizing the Markov bound e^{−tx} E(t) on P(S_n ≥ x) in t ≥ 0. The best exponential bound of this kind on the standard normal tail probability P(Z ≥ x) is inf_{t≥0} e^{−tx} E e^{tZ} = e^{−x²/2}, for any x ≥ 0. Thus, a factor of the order of magnitude of 1/x is "missing" in this bound, compared with the asymptotics P(Z ≥ x) ∼ ϕ(x)/x as x → ∞; cf. the result by Talagrand [84]. Now it should be clear that any exponential upper bound on the tail probabilities for sums of independent random variables must be missing the 1/x factor. The problem here is that the class of exponential moment functions is too small. Eaton [19] obtained the moment comparison E f(S_n) ≤ E f(Z) for a much richer class of moment functions f, which enabled him [20] to derive an upper bound on P(S_n ≥ x), which is asymptotic to c₃ P(Z ≥ x) as x → ∞, where

c₃ := 2e³/9 = 4.4634 . . . .
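The value of c₃ and the "missing" 1/x factor can be checked numerically; the following standard-library Python sketch is illustrative only and is not part of the paper's argument:

```python
import math

# Eaton's constant c3 = 2*e^3/9, quoted above as 4.4634...
c3 = 2 * math.exp(3) / 9

def phi(x):
    # Standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi_bar(x):
    # Standard normal tail P(Z >= x), via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

# P(Z >= x) ~ phi(x)/x as x -> infinity: the ratio below is close to 1,
# exhibiting the 1/x factor that purely exponential bounds must miss.
tail_ratio = Phi_bar(10.0) / (phi(10.0) / 10.0)
```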
Eaton further conjectured that P(S_n ≥ x) ≤ c₃ ϕ(x)/x for x > √2. The stronger form of this conjecture,

P(S_n ≥ x) ≤ c P(Z ≥ x) for all x ∈ R (1.3)

with c = c₃, was proved by Pinelis [65], along with a multidimensional extension, which generalized results of Eaton and Efron [18]. Various generalizations and improvements of inequality (1.3) as well as related results were given by Pinelis [66, 67, 70, 74, 76, 77, 79, 57] and Bentkus [6, 7, 9]. Clearly, as pointed out e.g. in [10], the constant c in (1.3) cannot be less than

c∗ := P((ε_1 + ε_2)/√2 ≥ √2) / P(Z ≥ √2) = 3.1786 . . . , (1.4)

which may be compared with c₃. Bobkov, Götze and Houdré (BGH) [11] gave a simple proof of (1.3) with a constant factor c ≈ 12.01. Their method was based on the Chapman–Kolmogorov identity for the Markov chain (S_n). Such an identity was used, e.g., in [68] concerning a conjecture by Graversen and Peškir [24] on max_{k≤n} |S_k|. Pinelis [78] showed that a modification of the BGH method can be used to obtain inequality (1.3) with a constant factor c ≈ 1.01 c∗ ≈ 3.22. Bentkus and Dzindzalieta [8] recently closed the gap by proving that c∗ is indeed the best possible constant factor c in (1.3); they used the Chapman–Kolmogorov identity together with the Berry–Esseen bound and a new extension of the Markov inequality. Bentkus and Dzindzalieta [8] also obtained the inequality

P(S_n ≥ x) ≤ 1/4 + (1/8)(1 − √(2 − 2/x²)) for x ∈ (1, √2], (1.5)

whereas Holzman and Kleitman [32] proved that P(S_n ≥ 1) ≤ 5/16. We should also like to mention another kind of result, due to Montgomery-Smith [48], who obtained an upper bound on ln P(S_n ≥ x) and a matching lower bound on ln P(S_n ≥ Cx) for some absolute constant C > 0; these bounds depend on x > 0 and on the sequence (a_1, . . . , a_n) and differ from each other by no more than an absolute constant factor; the constants were improved by Hitczenko and Kwapien [27].
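The value of c∗ in (1.4) can be verified numerically; a minimal standard-library Python sketch (illustrative only):

```python
import math

def Phi_bar(x):
    # Standard normal tail P(Z >= x)
    return 0.5 * math.erfc(x / math.sqrt(2))

# P((eps_1 + eps_2)/sqrt(2) >= sqrt(2)) = P(eps_1 = eps_2 = 1) = 1/4, so
# c* = (1/4) / P(Z >= sqrt(2)), quoted above as 3.1786...
c_star = 0.25 / Phi_bar(math.sqrt(2))
```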
As was pointed out by the referee, whereas the normal-tail-like bounds obtained in the present paper and its predecessors including [30, 20, 65, 78] will usually work better when the a_i's are fairly balanced, bounds such as the ones obtained in [48] can be advantageous otherwise, when the a_i's significantly differ in magnitude from one another. Indeed, the bounds given in [48] are expressed in terms of an interpolation norm of (a_1, . . . , a_n), which is equivalent (up to a universal constant factor) to an expression based on splitting the a_i's into two groups according to the absolute values of the a_i's. The result of [48] was extended to sums of general independent zero-mean r.v.'s in [29], and the latter work was also motivated in part by that of Latała [41]. The proof in [48] was in part based on an extension of the improvement of Hoffmann-Jørgensen's inequality [31] found by Klass and Nowicki [34]. More recent developments in this direction are given in [35, 36].

In the mentioned paper [22], Efron conjectured that there exists an upper bound on the tail probability P(S_n ≥ x) which behaves as the corresponding standard normal tail P(Z ≥ x), and he presented certain facts in favor of this conjecture. Efron's conjecture suggests that even the best possible constant factor c = c∗ = 3.17 . . . in (1.3) is excessive for large x; rather, for such x the ratio of a good bound on P(S_n ≥ x) to P(Z ≥ x) should be close to 1. Theorem 1.1 below provides such a bound, of simple and explicit form.

Another well-known conjecture, apparently due to Edelman [80, 21], is that

P(S_n ≥ x) ≤ sup_{n≥1} P((ε_1 + · · · + ε_n)/√n ≥ x) (1.6)

for all x ≥ 0; that is, the conjecture is that the supremum of P(S_n ≥ x) over all finite sequences (a_1, . . . , a_n) satisfying condition (1.1) is the same as that over all such (a_1, . . . , a_n) with equal a_i's; cf. the above discussion concerning the result by Montgomery-Smith [48] vs. normal-tail-like bounds.
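For small n the right-hand side of (1.6) can be computed exactly by enumerating the 2ⁿ sign patterns; an illustrative Python sketch (restricted to n ≤ 7, whereas the conjecture concerns the sup over all n ≥ 1):

```python
import math
from itertools import product

def equal_coeff_tail(n, x):
    # Exact P((eps_1 + ... + eps_n)/sqrt(n) >= x), by enumeration of signs
    count = sum(1 for signs in product((-1, 1), repeat=n)
                if sum(signs) / math.sqrt(n) >= x)
    return count / 2 ** n

# At x = 1 the sup over n <= 7 is attained at n = 1, where it equals 1/2.
sup_at_1 = max(equal_coeff_tail(n, 1.0) for n in range(1, 8))
```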
Conjecture (1.6) was recently disproved; see [92, 59]. Another two known and interesting conjectures are that P(S_n ≥ 1) ≤ 1/4 [32, 2, 26] and that P(S_n ≥ 1) ≥ 7/64 [13, 28, 51, 89].

The main result of the present paper is

Theorem 1.1. For all real x ≥ 0

P(S_n ≥ x) ≤ Q(x) := P(Z ≥ x) + Cϕ(x)/(9 + x²) < P(Z ≥ x)(1 + C/x), (1.7)

where

C := 5√(2πe) P(|Z| < 1) = 14.10 . . . . (1.8)

Remark 1.2. The constant factor C is the best possible in the sense that the first inequality in (1.7) turns into the equality when x = n = 1. It would be of interest to find the optimal value of C if the constant 9 in the denominator in (1.7) is replaced by a significantly smaller positive value, say c. Then it could be possible to replace the constant C by a smaller value. At that, the factor 1/(c + x²) would be decreasing faster than 1/(9 + x²), especially when x > 0 is not too large – since the "rate" |∂/∂x ln(1/(c + x²))| = 2/(c/x + x) is greater for smaller c > 0. However, such a quest appears to entail further significant technical complications. Also, it is an open (and apparently very difficult) problem whether the asymptotic rate of decrease of the "extra" term Cϕ(x)/(9 + x²) as x → ∞ is the best possible one. Such questions appear to be related to the open problems stated at the end of [59]. It is hoped that these matters will be addressed in subsequent studies.

Using e.g. part (II) of Proposition 3.1 (in Section 3 of this paper), it is easy to see that the ratio of the bound Q(x) in (1.7) to P(Z ≥ x) increases from ≈ 2.25 to ≈ 3.61 and then decreases to 1 as x increases from 0 to ≈ 2.46 to ∞, respectively.
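The constant (1.8) and the sharpness claim of Remark 1.2 (equality in (1.7) at x = n = 1, where P(S_1 ≥ 1) = 1/2) can be verified numerically; an illustrative standard-library Python sketch:

```python
import math

def phi(x):
    # Standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi_bar(x):
    # Standard normal tail P(Z >= x)
    return 0.5 * math.erfc(x / math.sqrt(2))

# C = 5*sqrt(2*pi*e)*P(|Z| < 1), as in (1.8); P(|Z| < 1) = 1 - 2*P(Z >= 1).
C = 5 * math.sqrt(2 * math.pi * math.e) * (1 - 2 * Phi_bar(1.0))

def Q(x):
    # The bound of Theorem 1.1, eq. (1.7)
    return Phi_bar(x) + C * phi(x) / (9 + x * x)

# Equality at x = n = 1: Q(1) should equal P(S_1 >= 1) = P(eps_1 = 1) = 1/2.
gap_at_1 = Q(1.0) - 0.5
```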
Figure 1 presents a graphical comparison of this ratio, Q(x)/P(Z ≥ x), with (i) the best possible constant factor c = c∗ ≈ 3.18 in (1.3); (ii) the level 1, which is asymptotic (as x → ∞) to the ratio of either one of the two bounds in (1.7) to P(Z ≥ x), and hence, by the central limit theorem, is also asymptotic to the ratio of the supremum of P(S_n ≥ x) (over all normalized Rademacher sums S_n) to P(Z ≥ x); (iii) the ratio of Hoeffding's bound e^{−x²/2} to P(Z ≥ x). In Figure 1, the graph of the latter ratio looks like a steep straight line (and asymptotically, for large x, is a straight line), most of which is outside the vertical range of the picture, thus showing how much the bounds c∗ P(Z ≥ x) and Q(x) improve the Hoeffding bound e^{−x²/2}.

Figure 1: Ratio Q(x)/P(Z ≥ x) (thick solid) compared with the ratio e^{−x²/2}/P(Z ≥ x) (solid, steeply upwards), as well as with the levels 1 (dashed) and c∗ ≈ 3.18 (dotted).

In view of the main result of Bentkus [5], one immediately obtains the following corollary of Theorem 1.1.

Corollary 1.3. Let X, X_1, . . . , X_n be independent identically distributed r.v.'s such that P(|X| ≤ 1) = 1 and E X = 0. Then

P((X_1 + · · · + X_n)/√n ≥ x) ≤ 2Q̂_n(x)

for all real x ≥ 0, where Q̂_n is the linear interpolation of the restriction of the function Q to the set (2/√n)(n/2 − ⌊n/2⌋ + ℤ).

Here we shall present just one more application of Theorem 1.1, to the self-normalized sums

V_n := (X_1 + · · · + X_n)/√(X_1² + · · · + X_n²),

where, following Efron [22], we assume that the X_i's satisfy the so-called orthant symmetry condition: the joint distribution of s_1X_1, . . . , s_nX_n is the same for any choice of signs s_1, . . . , s_n ∈ {1, −1}, so that, in particular, each X_i is symmetrically distributed. It suffices that the X_i's be independent and symmetrically (but not necessarily identically) distributed.
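The shape of the thick curve in Figure 1 can be reproduced pointwise; an illustrative Python check of the values ≈ 2.25 (near 0), ≈ 3.61 (near x ≈ 2.46), and the slow decrease toward 1 (the sample points are arbitrary):

```python
import math

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi_bar(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

# The constant of (1.8)
C = 5 * math.sqrt(2 * math.pi * math.e) * (1 - 2 * Phi_bar(1.0))

def ratio(x):
    # Q(x)/P(Z >= x), the thick solid curve in Figure 1
    return 1 + C * phi(x) / ((9 + x * x) * Phi_bar(x))

r_near_0 = ratio(1e-9)  # about 2.25
r_peak = ratio(2.46)    # about 3.61
r_far = ratio(35.0)     # still about 1.4: the decrease to 1 is slow, like 1 + C/x
```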
In particular, V_n = S_n if X_i = a_iε_i for all i. It was noted by Efron that (i) Student's statistic T_n is a monotonic function of the so-called self-normalized sum: T_n = √((n−1)/n) · V_n/√(1 − V_n²/n), and (ii) the orthant symmetry implies in general that the distribution of V_n is a mixture of the distributions of normalized Rademacher sums S_n. Thus, one obtains

Corollary 1.4. Theorem 1.1 holds with V_n in place of S_n.

Note that many of the most significant advances concerning self-normalized sums are rather recent; e.g., a necessary and sufficient condition for their asymptotic normality was obtained only in 1997 by Giné, Götze, and Mason [23].

It appears natural to compare the probability inequalities given in Theorem 1.1 with limit theorems for large deviation probabilities. Most of such theorems, referred to as large deviation principles (LDP's), deal with logarithmic asymptotics, that is, asymptotics of the logarithm of small probabilities; see e.g. [15]. As far as the logarithmic asymptotics is concerned, the mentioned bounds c∗P(Z ≥ x) and Q(x) and the Hoeffding bound e^{−x²/2} are all the same: ln(c∗P(Z ≥ x)) ∼ ln Q(x) ∼ ln e^{−x²/2} = −x²/2 as x → ∞; yet, as we have seen, at least the first two of these bounds are vastly different from the Hoeffding bound, especially from the perspective of statistical practice. Results on the so-called exact asymptotics for large deviations (that is, asymptotics for the small probabilities themselves, rather than for their logarithms) are much fewer; see e.g. [15, Theorem 3.7.4] and [54, Ch. VIII]. Note that the inequalities in (1.7) hold for all x ≥ 0, and, a priori, the summands a_iε_i do not have to be identically or nearly identically distributed; cf. conjecture (1.6).
In contrast, almost all limit theorems for large deviations in the literature – whether with exact or logarithmic asymptotics – hold only for x = O(√n), with n being the number of identically or quasi-identically distributed (usually independent or nearly independent) random summands; the few exceptions here include results of the papers [50, 63, 64, 69, 91] and references therein, where the restriction x = O(√n) is not imposed and x is allowed to be arbitrarily large.

In general, observe that a limit theorem is a statement on the existence of an inequality, not yet fully specified, as e.g. in "there exists some n_0 such that |x_n − x| < ε for all n > n_0"; as such, a limit theorem cannot provide a specific bound. Of course, being less specific, limit theorems are applicable to objects of much greater variety and complexity, and limit theorems usually provide valuable initial insight. Yet, it seems natural to suppose that the tendency, say in the studies of large deviation probabilities, will be to proceed from logarithmic asymptotics to asymptotics of the probabilities themselves and then on to exact inequalities. We appear to be largely at the beginning of this process, still struggling even with such comparatively simple objects as the Rademacher sums – the simplicity of which is only comparative, as the discussion around Figure 1 in [78] suggests. However, there have already been a number of big strides made in this direction. For instance, Boucheron, Bousquet, Lugosi, and Massart [12] obtained explicit bounds on moments of general functions of independent r.v.'s; their approach was based on a generalization of Ledoux's entropy method [44, 45], using at that a generalized tensorization inequality due to Latała and Oleszkiewicz [40]. Another, more recent example demonstrating the same tendency is the work by van de Geer [88].
Even more recently, Tropp [86] provided noncommutative generalizations of the Bennett, Bernstein, Chernoff, and Hoeffding bounds – even with explicit and optimal constants; as pointed out in [86], "[a]symptotic theory is less relevant in practice". Yet, as stated above, in the case of Rademacher sums and other related cases significantly more precise bounds can be obtained.

2 Proof of Theorem 1.1: outline

Let us begin the proof with several introductory remarks. There are many symbols used in the proof. Therefore, let us assume a localization principle for notations: any notations introduced in a section or in a proof of a lemma/sublemma supersede those introduced in preceding sections or proofs. For example, the meaning of the X_i's introduced later in this section differs from that in Section 1.

Without loss of generality (w.l.o.g.), assume that

0 ≤ a_1 ≤ . . . ≤ a_n =: a, (2.1)

so that a = max_i a_i. Introduce the numbers u_i := u_{i,x} := xa_i, whence for all x ≥ 0

0 ≤ u_1 ≤ . . . ≤ u_n = xa. (2.2)

The proof of Theorem 1.1 is to a large extent based on a careful analysis of the Esscher exponential tilt transform of the r.v. S_n. In introducing and using this transform, Esscher and then Cramér were motivated by applications in actuarial science. Closely related to the Esscher transform is the saddle-point approximation; for a recent development in this area, see [61]. The Esscher tilt has been used extensively in limit theorems for large deviation probabilities, but much less commonly concerning explicit probability inequalities – two rather different in character cases of the latter kind are represented by Raič [81] and Pinelis and Molzon [62]. One may also note that, in deriving LDP's, the exponential tilt is usually employed to get a lower bound on the probability; in contrast, in this paper the tilt is used to obtain the upper bound.
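To make the tilt concrete for a single summand a·ε: reweighting each outcome by e^{x·outcome} and normalizing by E e^{xaε} = ch(xa) gives the two-point tilted law used later in this section. A minimal illustrative Python sketch (the values of a and x are arbitrary):

```python
import math

# Esscher tilt of a single summand a*eps (eps = +/-1 with probability 1/2),
# with tilting parameter t = x: reweight by e^{x * value}, normalize by ch(x*a).
def tilted_law(a, x):
    u = x * a
    p_plus = 0.5 * math.exp(u) / math.cosh(u)    # tilted P(+a)
    p_minus = 0.5 * math.exp(-u) / math.cosh(u)  # tilted P(-a)
    return p_plus, p_minus

def tilted_mean(a, x):
    p_plus, p_minus = tilted_law(a, x)
    return a * (p_plus - p_minus)  # closed form: a * th(x*a)

a, x = 0.6, 1.5
p_plus, p_minus = tilted_law(a, x)
m = tilted_mean(a, x)  # the tilt shifts the mean toward the event {S_n >= x}
```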
One may also note that, whereas in [78, 8] the main difficulty was to deal with moderate values of x, in the present paper both the moderate and large values of x present significant problems; in a sense, here the consideration depends not just on the value of x itself but, to a greater extent, on the value of the product xa.

The main idea of the proof is to reduce the problem from that on the vector (a_1, . . . , a_n) of an unbounded dimension n to a set of low-dimensional extremal problems. The first step here is to use exponential tilting to obtain upper bounds on P(S_n ≥ x) in terms of sums of the form Σ_i g(u_i), which can then be represented as x² ∫ g̃ dν, where

g̃(u) := g(u)/u² (for u ≠ 0), ν := (1/x²) Σ_i u_i² δ_{u_i}, (2.3)

and δ_t denotes the Dirac probability measure at point t, so that ν is a probability measure on the interval [0, xa]. This step turns the initial finite-dimensional problem into an infinite-dimensional one, involving the measure ν. However, then the well-known Carathéodory principle allows one to reduce the dimension to (at most) k − 1, where k is the total number of the integrals (with respect to the measure ν) involved in the extremal problem in hand; see e.g. [58] for recent developments in this direction, and references therein.

The above ideas were carried out in the first version of this paper – see [55]. Later, I realized that the systems of integrands one has to deal with in the proof of Theorem 1.1 possess the so-called Tchebycheff and, even, Markov properties; therefore, one can reduce the dimension even further, to about k/2, which allows for more effective analyses. It should also be noted that the verification of the Markov property of a finite sequence of functions largely reduces to checking the positivity of several functions of only one variable.
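The representation behind (2.3) is elementary and can be spot-checked numerically; an illustrative Python sketch with arbitrary coefficients a_i (normalized as in (1.1)) and an example integrand g:

```python
import math

# Check that sum_i g(u_i) = x^2 * integral of g~ dnu, with g~(u) = g(u)/u^2
# and nu = (1/x^2) sum_i u_i^2 delta_{u_i}, as in (2.3).
a = [0.6, 0.48, 0.64]      # illustrative; 0.36 + 0.2304 + 0.4096 = 1
x = 1.3
u = [x * t for t in a]

def g(v):
    return math.log(math.cosh(v))  # an example integrand (cf. Section 4)

lhs = sum(g(v) for v in u)
# x^2 * integral of g~ dnu; the integral is a finite sum over the atoms u_i:
rhs = x * x * sum((v * v / (x * x)) * (g(v) / (v * v)) for v in u)
# nu is a probability measure because sum_i u_i^2 = x^2 * sum_i a_i^2 = x^2:
total_mass = sum(v * v for v in u) / (x * x)
```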
Major expositions of the theory of Tchebycheff–Markov systems and its applications are given in the monographs by Karlin and Studden [33] and Kreĭn and Nudelʹman [38]; closely related to this theory are certain results in real algebraic geometry, whereby polynomials are "certified" to be positive on a semialgebraic domain by means of an explicit representation, say in terms of sums of squares of polynomials; see e.g. [39, 47]. A brief review of the Tchebycheff and Markov systems of functions, which contains all the definitions and facts necessary for the applications in the present paper, is given in [60]. For the readers' convenience, we shall present here a condensed version of [60] – in Appendix A at the end of this paper.

Even after the just described reductions in dimension, the proof of Theorem 1.1 entails extensive (even if rather routine) calculations, especially symbolic ones. In this section, a number of lemmas will be stated, from which Theorem 1.1 will easily follow. Most of these lemmas will be proved in Section 3 – with the exception of Lemmas 2.3 and 2.7, whose proofs are more complicated and will each be presented in a separate section. Each of these two more complicated lemmas is based on a number of sublemmas – which are stated in the corresponding section and used there to prove the lemma. Each of these sublemmas (except for Sublemma 4.1) is a technical statement about one or several smooth functions of one real variable and is proved using the Mathematica implementation of the Tarski algorithm [85, 46, 14]; the proofs of these sublemmas can be found in [56]. It should be quite clear that all such calculations done with an aid of a computer are no less reliable or rigorous than similar, or even less involved, calculations done by hand.

*****

For all i = 1, . . . , n, let X_i := a_iε_i. Next, let X̃_1, . . . , X̃_n be any r.v.'s such that

E g(X̃_1, . . . , X̃_n) = E e^{xS_n} g(X_1, . . . , X_n) / E e^{xS_n} (2.4)

for all Borel-measurable functions g: Rⁿ → R. Equivalently, one may require condition (2.4) only for Borel-measurable indicator functions g; clearly, such r.v.'s X̃_i do exist. It is also clear that the r.v.'s X̃_i are independent. Moreover, for each i the distribution of X̃_i is (e^{u_i} δ_{a_i} + e^{−u_i} δ_{−a_i})/(e^{u_i} + e^{−u_i}).

Formula (2.4) presents the mentioned Esscher exponential tilt transform, with the tilting parameter (TP) the same as the x in (1.7); that is, we choose the TP to be the minimizer of e^{−tx} E e^{tZ} = e^{−tx+t²/2} in t ≥ 0 – rather than the minimizer of e^{−tx} E e^{tS_n}, which latter is usually taken as the TP in limit theorems for large deviations and can thus be expressed only via an implicit function. Our choice of the TP appears to simplify the proof greatly.

In terms of the tilted r.v.'s X̃_1, . . . , X̃_n, introduce now

m_x := Σ_i E X̃_i = (1/x) Σ_i u_i th u_i, s_x := √(Σ_i Var X̃_i) = (1/x) √(Σ_i u_i²/ch² u_i), (2.5)

L_x := (1/s_x³) Σ_i E |X̃_i − E X̃_i|³, (2.6)

where ch := cosh, sh := sinh, th := tanh, and arcch := arccosh assuming that arcch z ≥ 0 for all z ∈ [1, ∞); thus, for each z ∈ [1, ∞), arcch z is the unique solution y ≥ 0 to the equation ch y = z.

Let F̄_n and Φ̄ denote, respectively, the tail function of X̃_1 + · · · + X̃_n and the standard normal tail function, so that F̄_n(z) = P(X̃_1 + · · · + X̃_n ≥ z) and Φ̄(z) = P(Z ≥ z) for all real z. Also, let c_BE denote the least possible constant in the Berry–Esseen inequality

sup_{z∈R} |F̄_n(z) − Φ̄((z − m_x)/s_x)| ≤ c_BE L_x; (2.7)

by Shevtsova [83], c_BE ≤ 56/100; a slightly worse bound, c_BE ≤ 0.5606, is due to Tyurin [87].

Lemma 2.1. For all x > 0

P(S_n ≥ x) ≤ N(x) + 2c_BE B(x), (2.8)

where

N(x) := exp{Σ_i ln ch u_i + x²s_x²/2 − xm_x + ln Φ̄((x − m_x)/s_x + xs_x)}, (2.9)

B(x) := L_x exp{−x² + Σ_i ln ch u_i}. (2.10)

Lemma 2.1 carries out much of the first step in the proof of Theorem 1.1, as mentioned before: using exponential tilting to reduce the original problem, on the vector (a_1, . . . , a_n) of an unbounded dimension n, to one involving sums of the form Σ_i g(u_i) – recall here the expressions of m_x and s_x in (2.5) in terms of such sums. At this point, only the factor L_x in (2.10) remains to be bounded in terms of a sum of the form Σ_i g(u_i), which will be done later, in Sublemma 4.1.

Next, introduce the ratio

r(x) := ϕ(x)/(xΦ̄(x)), (2.11)

which is the inverse Mills ratio at x divided by x. By [71, Proposition 1.2], r is strictly and continuously decreasing from ∞ to 1 on the interval (0, ∞), so that there is a unique root x_{3/2} ∈ (0, ∞) of the equation r(x_{3/2}) = 3/2; at that, x_{3/2} = 1.03 . . . and

1 < r(x) ≤ 3/2 for x ≥ x_{3/2}. (2.12)

Introduce also

u∗ := 51/125 = 0.408 (2.13)

and

h(x) := Cϕ(x)/(9 + x²) (2.14)

(cf. (1.7)). The next two lemmas provide upper bounds on the terms N(x) and 2c_BE B(x) in (2.8) – for x large enough; also, in Lemma 2.3, u_n = max_i xa_i is assumed to be small enough.

Lemma 2.2. If x ≥ x_{3/2} then N(x) ≤ Φ̄(x).

Lemma 2.3. If x ≥ 13/10 and u_n ≤ u∗, then 2c_BE B(x) ≤ h(x).

The proofs of the above two lemmas are comparatively difficult, especially the latter one, of Lemma 2.3, which will take the entire Section 4. It is in these two proofs that we use methods of extremal problems (including special tools for Tchebycheff–Markov systems) for measures in a given moment set – to carry out the mentioned reduction from an infinite-dimensional problem to finite, in fact low, dimensions.

In contrast with Lemmas 2.2 and 2.3, the following lemma is easy and to be used just as a quick reference concerning the second inequality in (1.7).

Lemma 2.4. If x > 0 then h(x) < CΦ̄(x)/x.
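The formulas (2.5) can be spot-checked directly against the two-point laws of the X̃_i; an illustrative standard-library Python sketch (the coefficients are arbitrary, normalized as in (1.1)):

```python
import math

a = [0.8, 0.6]   # illustrative coefficients with a_1^2 + a_2^2 = 1
x = 2.0
u = [x * t for t in a]

# Formulas (2.5):
m_formula = sum(v * math.tanh(v) for v in u) / x
s2_formula = sum(v * v / math.cosh(v) ** 2 for v in u) / (x * x)

# Direct moments of the tilted two-point laws P(X~_i = +-a_i) = e^{+-u_i}/(2 ch u_i):
m_direct, s2_direct = 0.0, 0.0
for ai, ui in zip(a, u):
    p = 0.5 * math.exp(ui) / math.cosh(ui)
    mean_i = ai * p - ai * (1 - p)
    m_direct += mean_i
    s2_direct += p * (ai - mean_i) ** 2 + (1 - p) * (-ai - mean_i) ** 2
```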
Next is another easy lemma, which serves as the induction basis (with n = 1) in the proof of Theorem 1.1 below; it is also used in the proof of Lemma 2.8.

Lemma 2.5. If x > 0 then P(ε_1 ≥ x) ≤ Φ̄(x) + h(x).

Now we shall address a case not covered by Lemma 2.3: when u_n is not small enough (and x is still large enough). For this case we adopt an approach which is based on the Chapman–Kolmogorov identity (2.16) and similar to methods used e.g. in [68, Proof of Proposition 2], [11, Proof of Theorem 4.2], and [78, Proof of Theorem 2]. Clearly, this method is quite different from the combination of the methods of exponential tilting and solving extremal problems for moment sets used – for the small enough values of u_n – in the proofs of Lemmas 2.1–2.3.

Consider

U := U_{x,a} := (x − a)/√(1 − a²) and V := V_{x,a} := (x + a)/√(1 − a²),

with a as in (2.1). The following two lemmas provide information about the behavior of the two respective terms, Φ̄(x) and h(x) = Cϕ(x)/(9 + x²), in the bound Q(x) on P(S_n ≥ x) in (1.7). This information will be used to carry out the induction step in the proof of Theorem 1.1.

Lemma 2.6. If x ≥ √3 then (1/2)Φ̄(U) + (1/2)Φ̄(V) ≤ Φ̄(x).

Lemma 2.6 was proved in [11]; cf. also [78, Lemma 5].

Lemma 2.7. If x ≥ 15/10 and u_n ≥ u∗, then (1/2)h(U) + (1/2)h(V) ≤ h(x); recall here that, by (2.2), a = u_n/x.

Thus, Lemmas 2.6 and 2.7 taken together close the gap that was left open in Lemma 2.3 because of the restriction u_n ≤ u∗ there. Still, in Lemmas 2.2, 2.3, 2.6, and 2.7 the value of x was assumed to be large enough. This remaining gap is closed by

Lemma 2.8. For all x ∈ (0, √3]

P(S_n ≥ x) ≤ Φ̄(x) + h(x). (2.15)

A key point in the proof of Lemma 2.8 is using inequality (1.3) with c = 3.22, as provided by [78]; we could have used instead the main result of [8], with c = c∗ = 3.17 . . . , but c = 3.22 is enough for our purposes here.

Based on the above lemmas, we can now present

Proof of Theorem 1.1. By definition (2.14) and Lemma 2.4, it is enough to prove inequality (2.15) for all x > 0.
This can be done by induction on n. Indeed, for n = 1 this is Lemma 2.5. Assume now that n ≥ 2. In view of Lemma 2.8, it is enough to prove inequality (2.15) for all x > √3. At that, in view of Lemmas 2.1, 2.2, and 2.3, it is enough to consider the case u_n ≥ u∗. To do that, write

P(S_n ≥ x) = (1/2) P(S̃_{n−1} ≥ U) + (1/2) P(S̃_{n−1} ≥ V), (2.16)

where S̃_{n−1} := b_1ε_1 + · · · + b_{n−1}ε_{n−1}, with b_i := a_i/√(1 − a²). It remains to use the induction hypothesis together with Lemmas 2.6 and 2.7.

3 Proofs of Lemmas 2.1, 2.2, 2.4, 2.5, and 2.8

Proof of Lemma 2.1. Reading equation (2.4) with g(X_1, . . . , X_n) = e^{−xS_n} I{S_n ≥ x} right-to-left, recalling (2.7), and observing that E e^{xS_n} = Π_i ch u_i, one has

P(S_n ≥ x)/E e^{xS_n} = −∫_{[x,∞)} e^{−xy} dF̄_n(y) = ∫_x^∞ xe^{−xy} (F̄_n(x) − F̄_n(y)) dy ≤ N_1(x) + B_1(x),

where

N_1(x) := ∫_x^∞ xe^{−xy} [Φ̄((x − m_x)/s_x) − Φ̄((y − m_x)/s_x)] dy = ∫_x^∞ e^{−xy} ϕ((y − m_x)/s_x) dy/s_x = N(x)/E e^{xS_n}

and

B_1(x) := 2c_BE L_x ∫_x^∞ xe^{−xy} dy = 2c_BE L_x e^{−x²} = 2c_BE B(x)/E e^{xS_n}.

Thus, (2.8) follows.

Now and later in the paper, we need the following special l'Hospital-type rule for monotonicity.

Proposition 3.1. ([75, Propositions 4.1 and 4.3]) Let −∞ ≤ a < b ≤ ∞. Let f and g be differentiable functions defined on the interval (a, b). It is assumed that g and g′ do not take on the zero value and do not change their respective signs on (a, b).

(I) If f(a+) = g(a+) = 0 or f(b−) = g(b−) = 0, and if the ratio f′/g′ is strictly increasing/decreasing on (a, b), then (respectively) (f/g)′ is strictly positive/negative and hence the ratio f/g is strictly increasing/decreasing on (a, b).

(II) If f(b−) = g(b−) = 0 and if the ratio f′/g′ switches its monotonicity pattern at most once on (a, b) – only from increase to decrease, then the ratio f/g does so.

Proof of Lemma 2.2. Let us begin this proof by using the well-known fact that the tail function Φ̄ is log-concave. This fact is contained e.g.
in [25, 67]. Alternatively, it can be easily obtained using part (I) of Proposition 3.1, since (ln Φ̄)′ = −ϕ/Φ̄. So, one can write

ln Φ̄(y) ≤ ln Φ̄(x) + (ln Φ̄)′(x)(y − x) = ln Φ̄(x) − xr(x)(y − x),

with y = (x − m_x)/s_x + xs_x (cf. (2.9)) and r(x) defined by (2.11). Therefore and in view of (2.5),

(1/x²) ln(N(x)/Φ̄(x)) ≤ Ẽ(r, ν) := ∫_0^{xa} [e(u) + r·(1 − f(u)/s_x)] ν(du)

(recall (2.1)), where

e(u) := (ln ch u)/u² − (th u)/u + 1/(2 ch² u) and f(u) := 1 − (th u)/u + 1/ch² u

for u ≠ 0, e(0) := 0 and f(0) := 1, and r := r(x). Note that the probability measure ν on the interval [0, xa] defined by (2.3) satisfies the restriction

∫_0^{xa} b dν = s_x², where b(u) := 1/ch² u. (3.1)

Recalling now (2.12), we see that to prove Lemma 2.2 we only need to show that Ẽ(r, ν) ≤ 0 for all such probability measures ν and all r ∈ [1, 3/2]; in fact, since Ẽ(r, ν) is affine in r, it suffices to consider only r ∈ {1, 3/2}.

Using Proposition A.3 in Appendix A and the Mathematica command Reduce, one can verify that each of the two systems (1, −b, f − e) and (1, −b, f) is an M₊-system on any interval [c, d] ⊂ [0, ∞); as mentioned earlier, this verification reduces to checking the positivity of several (Wronskian) functions of only one variable; for the system (1, −b, f − e), this takes about 20 sec on a standard laptop, and about 1 sec for the system (1, −b, f). Since s_x ∈ (0, 1] and r ≥ 1, the integrand in the integral expression of Ẽ(r, ν) can be rewritten as g := r − (1/θ)(f − θe) with θ := s_x/r ∈ (0, 1], and so, (1, −b, −g) is an M₊-system on [0, xa], for any r ≥ 1 and any value of s_x. Hence, by Proposition A.4 in Appendix A, the minimum of ∫_0^{xa} (−g) dν, and thus the maximum of Ẽ(r, ν), over all the probability measures ν on [0, xa] satisfying the restriction ∫_0^{xa} b dν = s_x² is attained when the support of ν is a singleton subset (say {u}) of [0, xa].
For this u, one has s_x = 1/ch u, and it now suffices to show that

g(u) = e(u) + r·(1 − f(u) ch u) ≤ 0

for r ∈ {1, 3/2} and u ∈ [0, ∞); using again the Mathematica command Reduce, it takes about 2 sec to check this in each of the two cases, r = 1 and r = 3/2.

Proof of Lemma 2.4. Using part (I) of Proposition 3.1, one can see that the ratio xh(x)/Φ̄(x) is increasing in x > 0, from 0 to C. Now the result follows.

Proof of Lemma 2.5. Observe that the definition (1.8) of C is equivalent to the condition Φ̄(1) + h(1) = 1/2 (cf. Remark 1.2). Hence and because Φ̄ + h is decreasing on (0, ∞), one has P(ε_1 ≥ x) = 1/2 = Φ̄(1) + h(1) ≤ Φ̄(x) + h(x) for all x ∈ (0, 1]. For x > 1, one obviously has P(ε_1 ≥ x) = 0 < Φ̄(x) + h(x).

Proof of Lemma 2.8. By the symmetry, Chebyshev's inequality, and the main result of [78],

P(S_n ≥ x) ≤ (1/2) I{0 < x ≤ 1} + (1/(2x²)) I{1 < x ≤ 13/10} + 3.22 Φ̄(x) I{13/10 < x ≤ √3}

for all x ∈ (0, √3]. In particular, for all x ∈ (0, 1] one has P(S_n ≥ x) ≤ 1/2 = P(ε_1 ≥ x) ≤ Φ̄(x) + h(x), by Lemma 2.5.

Next, let us prove (2.15) for x ∈ (1, 13/10]. Write x²Φ̄(x) = Φ̄(x)/p(x), where p(x) := 1/x². Note that Φ̄(∞−) = p(∞−) = 0 and Φ̄′(x)/p′(x) = x³ϕ(x)/2, so that Φ̄′/p′ switches its monotonicity pattern exactly once on (0, ∞), from increase to decrease. Hence, by part (II) of Proposition 3.1, x²Φ̄(x) = Φ̄(x)/p(x) switches its monotonicity pattern at most once, and at that necessarily from increase to decrease, as x increases from 1 to 13/10. So, the minimum of x²Φ̄(x) over x ∈ [1, 13/10] is attained at one of the end points of the interval [1, 13/10]; in fact, the minimum is at x = 1. It is also easy to see that the minimum of x²h(x) over x ∈ [1, 13/10] is attained at x = 1 as well. Thus,

P(S_n ≥ x) ≤ 1/(2x²) = (Φ̄(x) + h(x))/(2x²(Φ̄(x) + h(x))) ≤ (Φ̄(x) + h(x))/(2(Φ̄(1) + h(1))) = Φ̄(x) + h(x)

for x ∈ (1, 13/10].

The case x ∈ (13/10, √3] is similar to the just considered case x ∈ (1, 13/10].
Here, using part (II) of Proposition 3.1 again, one can see that $h/\Phi$ switches, just once, from increase to decrease on $(0, \infty)$; in particular, $h/\Phi$ increases on $(\frac{13}{10}, \sqrt3\,]$, because $(h/\Phi)'(\sqrt3) = 0.29\ldots > 0$. So, to complete the proof of Lemma 2.8, it is enough to check that $3.22\, \Phi(\frac{13}{10}) \le \Phi(\frac{13}{10}) + h(\frac{13}{10})$, which is true.

4 Proof of Lemma 2.3

As was stated earlier, proofs of all sublemmas in this paper (except for Sublemma 4.1 below) can be found in [56]. We shall need the following tight upper bound on the Lyapunov ratio $L_x$, defined by (2.6):

Sublemma 4.1. One has
$$L_x \le \frac{1}{x^3} \sum_i u_i^3\, (1 + \operatorname{th}^2 u_i) \operatorname{ch} u_i.$$ (4.1)

The proof of Sublemma 4.1 will be given at the end of this section. By Sublemma 4.1 and the definition (2.10) of $B(x)$,
$$B(x) \le \tfrac{1}{x}\, e^{-x^2 + \tilde J},$$ (4.2)
where
$$\tilde J := \tilde J(x, \nu) := x^2 \int \ell \, d\nu + \ln \int k \, d\nu,$$
$$k(u) := u\, (1 + \operatorname{th}^2 u) \operatorname{ch} u, \qquad \ell(u) := \frac{\ln \operatorname{ch} u}{u^2} \ \text{ for } u \ne 0, \qquad \ell(0) := \tfrac12,$$
and $\nu$ is the probability measure on the interval $[0, u_*]$ defined by (2.3), so that $\nu$ satisfies the restriction (3.1). To obtain the upper bound $h(x)$ on $2 c_{\mathrm{BE}} B(x)$ as stated in Lemma 2.3, we shall maximize $\tilde J(x, \nu)$ over all such probability measures $\nu$. To do so, let us first maximize $\int k \, d\nu$ given the values of the integrals $\int 1 \, d\nu$ ($= 1$), $\int b \, d\nu$ ($= s_x^2$, as in (3.1)), and $\int \ell \, d\nu$. Noting that $(\ln \operatorname{ch})'' = \operatorname{th}' = \operatorname{sech}^2$ and applying (twice) the special l'Hospital-type rule for monotonicity given by part (I) of Proposition 3.1, one sees that
$$\ell' < 0 \ \text{ on } (0, \infty).$$ (4.3)

Sublemma 4.2. [56] The sequence $(g_0, g_1, g_2, g_3) := (1, -b, -\ell, k)$ is an $M_+$-system on $[0, u_*]$; here one may want to recall Definition A.2 in Appendix A.

A proof of Sublemma 4.2 can be found in [56], where it is based on the statement in [60] reproduced as Proposition A.3 in Appendix A here. So, by Proposition A.4 (with $n = 2$ and $m = 1$ there), it suffices to consider measures $\nu$ of the form $\nu = (1 - t)\delta_u + t\delta_{u_*}$ for some $t \in [0, 1]$ and $u \in [0, u_*]$.
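The claim (4.3), together with the normalization $\ell(0) = \frac12$, is easy to confirm numerically from the displayed formula $\ell(u) = (\ln \operatorname{ch} u)/u^2$; a quick illustrative check (not a proof):

```python
import math

def ell(u: float) -> float:
    """ell(u) = ln(cosh u)/u^2 for u != 0, extended by ell(0) = 1/2."""
    return 0.5 if u == 0.0 else math.log(math.cosh(u)) / (u * u)

# (4.3) says ell' < 0 on (0, inf): ell strictly decreases from ell(0+) = 1/2.
grid = [0.001 * 1.3 ** j for j in range(40)]
vals = [ell(u) for u in grid]
assert abs(vals[0] - 0.5) < 1e-6                    # ell(0+) = 1/2
assert all(a > b for a, b in zip(vals, vals[1:]))   # strictly decreasing
```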
For such $\nu$,
$$\tilde J(x, \nu) = J(t, u) := J_x(t, u) := x^2 \bigl[ (1 - t)\ell(u) + t \ell(u_*) \bigr] + \ln \bigl[ (1 - t)\, k(u) + t\, k(u_*) \bigr].$$
Thus, we need to maximize $J(t, u)$ over all $(t, u) \in [0, 1] \times [0, u_*]$; clearly, this maximum is attained. For all $(t, u) \in (0, 1) \times [0, u_*)$,
$$\frac{(1 - t)\, k(u) + t\, k(u_*)}{u_* - u} \, \frac{\partial J(t, u)}{\partial t} = \frac{k(u_*) - k(u) + \tau \bigl( \ell(u_*) - \ell(u) \bigr)}{u_* - u} = k'(w) + \tau \ell'(w),$$ (4.4)
$$\frac{(1 - t)\, k(u) + t\, k(u_*)}{1 - t} \, \frac{\partial J(t, u)}{\partial u} = k'(u) + \tau \ell'(u),$$ (4.5)
where $\tau := x^2 \bigl[ (1 - t)\, k(u) + t\, k(u_*) \bigr]$ and $w$ is some number such that $u < w < u_*$ (whose existence follows by the mean-value theorem). So, if the maximum of $J$ over the set $[0, 1] \times [0, u_*]$ is attained at some point $(t, u) \in (0, 1) \times (0, u_*)$, then at this point one has $\frac{\partial J}{\partial t} = 0 = \frac{\partial J}{\partial u}$, whence, by (4.4), (4.5), and (4.3), $\frac{k'(w)}{\ell'(w)} = -\tau = \frac{k'(u)}{\ell'(u)}$ while $u_* > w > u > 0$, which contradicts

Sublemma 4.3. [56] The function $\rho := \frac{k'}{\ell'}$ is strictly increasing on the interval $[0, u_*]$ (by continuity, we let $\rho(0) := \rho(0+) = -\infty$).

Also, no maximum of $J$ is attained at any point $(t, u) \in (0, 1) \times \{0\}$, because at any such point the right-hand side of (4.5) is $k'(0) + \tau \ell'(0) = 1 + \tau \cdot 0 > 0$, whereas the left-hand side of (4.5) must be $\le 0$. Thus, the maximum can be attained at some point $(t, u) \in [0, 1] \times [0, u_*]$ only if either $t \in \{0, 1\}$ or $u = u_*$. Therefore the maximizing measure $\nu$ must be concentrated at one point, say $u$, of the interval $[0, u_*]$. Together with (4.2), this shows that
$$B(x) \le \sup_{u \in [0, u_*]} \tfrac{1}{x}\, e^{-x^2 + J_0(x, u)},$$
where $J_0(x, u) := J_x(0, u) = x^2 \ell(u) + \ln k(u)$. So, Lemma 2.3 reduces now to the following statement:
$$\Lambda(x, u) := J_0(x, u) - \frac{x^2}{2} - \ln x + \ln(9 + x^2) - K \le 0$$ (4.6)
for all $(x, u) \in [\frac{13}{10}, \infty) \times [0, u_*]$, where
$$K := \ln \frac{C}{2 \sqrt{2\pi}\, c_{\mathrm{BE}}}.$$
Thus, one may want to maximize $\Lambda$ in $u \in [0, u_*]$.
Towards that end, observe that for all $u > 0$
$$\frac{1}{-\ell'(u)} \, \frac{\partial \Lambda}{\partial u} = \gamma(u) - x^2, \quad \text{where} \quad \gamma := -\frac{k'}{k \ell'} = (-\rho)\, \frac{1}{k};$$
so, the partial derivative of $\Lambda$ in $u > 0$ equals $\gamma(u) - x^2$ in sign. On the other hand, the function $\frac{1}{k}$ is positive and strictly decreasing and, in view of Sublemma 4.3, the function $(-\rho)$ is so as well (on the interval $[0, u_*]$). It follows that the function $\gamma$ too is positive and strictly decreasing on $(0, u_*]$; at that, $\gamma(0+) = \infty$. Introduce now
$$x_* := \sqrt{\gamma(u_*)} = 7.39\ldots.$$ (4.7)
By the mentioned properties of the function $\gamma$, for each $x \in (0, x_*]$ one has $\gamma(u) \ge x^2$ for all $u \in [0, u_*]$ and hence $\Lambda(x, u)$ increases in $u \in [0, u_*]$, so that $\Lambda(x, u) \le \Lambda(x, u_*)$ for all $u \in [0, u_*]$. Since the derivative of $\Lambda(x, u_*)$ in $x$ is a rather simple rational function, it is easy to see that $\Lambda(x, u_*) \le 0$ for all $x \ge \frac{13}{10}$. So, inequality (4.6) holds for all $(x, u) \in [\frac{13}{10}, x_*] \times [0, u_*]$.

It remains to prove (4.6) for each $x \in [x_*, \infty)$ (and all $u \in [0, u_*]$). For each such $x$, there is a unique $u_x \in [0, u_*]$ such that $\gamma(u) - x^2$ and hence $\frac{\partial \Lambda}{\partial u}$ are opposite to $u - u_x$ in sign, and so, $\Lambda(x, u) \le \Lambda(x, u_x)$ for all $u \in [0, u_*]$. Since, by (4.3), the function $\ell$ is strictly and continuously decreasing on $[0, \infty)$, there is a unique inverse function $\ell^{-1}\colon (0, \frac12] \to [0, \infty)$. Now introduce
$$\tilde J_0(x, \lambda) := J_0\bigl(x, \ell^{-1}(\lambda)\bigr) = x^2 \lambda + \ln \tilde k(\lambda),$$
where $\tilde k := k \circ \ell^{-1}$ and $\lambda \in [\ell(u_*), \ell(0)] = [\ell(u_*), \frac12]$. Next, observe that $(\ln \tilde k)' = -\gamma \circ \ell^{-1}$, which is decreasing on $[\ell(u_*), \frac12]$, because $\gamma$ and $\ell$ (and hence $\ell^{-1}$) are decreasing. It follows that the function $\ln \tilde k$ is concave on $[\ell(u_*), \frac12]$, and so, $\tilde J_0(x, \lambda)$ is concave in $\lambda \in [\ell(u_*), \frac12]$, for each real $x$. At this point, we need

Sublemma 4.4. [56] If $u \in (0, u_*]$ then $\gamma(u) > \dfrac{6}{u^2}$.
By (4.7) and Sublemma 4.4, if $u = \frac{\sqrt6}{x}$ and $x \ge x_*$, then $u \in (0, u_*]$ and $\gamma\bigl(\frac{\sqrt6}{x}\bigr) > \frac{6}{(\sqrt6/x)^2} = x^2 = \gamma(u_x)$, which in turn implies that $\frac{\sqrt6}{x} < u_x$, $\ell\bigl(\frac{\sqrt6}{x}\bigr) > \ell(u_x)$, and $(\ln \tilde k)'\bigl(\ell(\frac{\sqrt6}{x})\bigr) < (\ln \tilde k)'\bigl(\ell(u_x)\bigr) = -\gamma(u_x) = -x^2$ (since $\gamma$, $\ell$, and $(\ln \tilde k)'$ are decreasing); so, $\frac{\partial \tilde J_0}{\partial \lambda}\bigl(x, \ell(\frac{\sqrt6}{x})\bigr) < \frac{\partial \tilde J_0}{\partial \lambda}\bigl(x, \ell(u_x)\bigr) = 0$; therefore and by the concavity of $\tilde J_0(x, \lambda)$ in $\lambda$,
$$\tilde J_0(x, \lambda) \le \tilde J_0\bigl(x, \ell(\tfrac{\sqrt6}{x})\bigr) + \frac{\partial \tilde J_0}{\partial \lambda}\bigl(x, \ell(\tfrac{\sqrt6}{x})\bigr)\, \bigl(\lambda - \ell(\tfrac{\sqrt6}{x})\bigr) \le \hat J_0\bigl(x, \tfrac{\sqrt6}{x}\bigr)$$
for all $\lambda \in [\ell(u_*), \frac12]$, where
$$\hat J_0(x, u) := J_0(x, u) + \bigl(x^2 - \gamma(u)\bigr)\,\bigl(\ell(u_*) - \ell(u)\bigr).$$
Thus, in view of (4.6), Lemma 2.3 reduces to the inequality $\hat J_0\bigl(x, \frac{\sqrt6}{x}\bigr) - \frac{x^2}{2} - \ln x + \ln(9 + x^2) - K \le 0$ for all $x \ge x_*$, where we change the variable once again, from $x$ to $u$, by the formula $x = \frac{\sqrt6}{u}$. So, Lemma 2.3 reduces to

Sublemma 4.5. [56] For all $u \in (0, u_*]$
$$\tilde \Lambda(u) := \hat J_0\bigl(\tfrac{\sqrt6}{u}, u\bigr) - \frac{3}{u^2} - \ln \frac{\sqrt6}{u} + \ln\Bigl(9 + \frac{6}{u^2}\Bigr) \le K.$$

It remains, in this section, to present

Proof of Sublemma 4.1. Observe that $L_x = (x s_x)^{-3} \sum_i u_i^3\, (1 - \operatorname{th}^4 u_i)$. So, inequality (4.1) means exactly that
$$\sum_i u_i^3\, (1 - \operatorname{th}^4 u_i) - s_x^3 \sum_i u_i^3\, (1 + \operatorname{th}^2 u_i) \operatorname{ch} u_i = \sum_i u_i^2\, g(u_i) \le 0$$ (4.8)
for all $u_i$'s in the interval $[0, u_*]$ such that $\sum_i u_i^2 = x^2$ and $\sum_i \frac{u_i^2}{\operatorname{ch}^2 u_i} = x^2 s_x^2$, where
$$g(u) := u\, (1 - \operatorname{th}^4 u) - s_x^3\, u\, (1 + \operatorname{th}^2 u) \operatorname{ch} u = u \Bigl( 2 - \frac{1}{\operatorname{ch}^2 u} \Bigr) \Bigl( \frac{1}{\operatorname{ch}^2 u} - s_x^3 \operatorname{ch} u \Bigr).$$
Next, the object $\sum_i u_i^2\, g(u_i)$ in (4.8) with the restrictions $\sum_i u_i^2 = x^2$ and $\sum_i \frac{u_i^2}{\operatorname{ch}^2 u_i} = x^2 s_x^2$ can be rewritten as $x^2\, \mathsf E\, h(Y)$ given $\mathsf E\, Y = s_x^2$, where $h(\cdot) := h_a(\cdot)$ as in (4.9) below with $a = s_x^3$ and $Y$ is a r.v. with the distribution $\nu := \frac{1}{x^2} \sum_i u_i^2\, \delta_{v_i}$, with $v_i := \frac{1}{\operatorname{ch}^2 u_i}$; note that one always has $s_x \in (0, 1]$ and $\nu$ is indeed a probability measure due to the restriction $\sum_i u_i^2 = x^2$. So, by Subsublemma 4.6 below and Jensen's inequality, $x^{-2} \sum_i u_i^2\, g(u_i) = \mathsf E\, h(Y) \le h(\mathsf E\, Y) = h(s_x^2) = 0$, which proves the inequality in (4.8) and hence that in (4.1).

Subsublemma 4.6.
[56] For each $a \in [0, 1]$, the function
$$(0, 1] \ni v \mapsto h_a(v) := \operatorname{arcch}\Bigl(\frac{1}{\sqrt v}\Bigr)\, (2 - v)\, \Bigl(v - \frac{a}{\sqrt v}\Bigr)$$ (4.9)
is concave.

5 Proof of Lemma 2.7

This proof could be somewhat simplified using the mentioned result (1.5); however, let us present an independent proof here, which is not much more complicated. Let
$$\Delta := \Delta(x, u) := \frac{\sqrt{2\pi}}{C} \Bigl[ \tfrac12\, h\bigl(U_{x, u/x}\bigr) + \tfrac12\, h\bigl(V_{x, u/x}\bigr) - h(x) \Bigr].$$
We have to show that $\Delta(x, u) \le 0$ for all pairs $(x, u)$ in the set $P := \{(x, u) \in [\frac{15}{10}, \infty) \times [u_*, \infty) : u < x\}$; the condition $u < x$ here corresponds to the condition $a = a_n < 1$.

The idea of the proof of Lemma 2.7 is, essentially, to fix a value of $x$ and then differentiate $\Delta(x, u)$ twice with respect to certain functions of $u$ (which may be different for different fixed values of $x$) so that the sign of the resulting generalized second partial derivative of $\Delta(x, u)$ in $u$ be comparatively easy to determine. In other words, we establish a generalized convexity pattern for $\Delta(x, u)$ in $u$.

Toward this end, introduce first the set $\tilde P := \{(x, u) \in [\frac{15}{10}, \infty) \times [\frac{4}{10}, \infty) : u < x\}$, which is slightly larger than $P$; recall here (2.13). Then we shall consider the mentioned generalized first and second partial derivatives of $\Delta(x, u)$ in $u$:
$$\Delta_1 := \Delta_1(x, u) := F_1(x, u)\, \frac{\partial \Delta}{\partial u}$$ (5.1)
and
$$\Delta_2 := \Delta_2(x, u) := F_2(x, u)\, \frac{\partial \Delta_1}{\partial u},$$ (5.2)
where $F_1(x, u)$ and $F_2(x, u)$ are certain expressions, to be defined soon, such that
$$F_1(x, u)(u - 1) > 0 \ \text{ and } \ F_2(x, u) > 0 \quad \text{for all } (x, u) \in \tilde P \text{ with } u \ne 1.$$ (5.3)
Moreover, we shall show that $F_1(x, u)$ and $F_2(x, u)$ are such that the $\Delta_1$ and $\Delta_2$ as in (5.1) and (5.2) possess the following properties:
$$\Delta_2 > 0 \ \text{ on } \tilde P,$$ (5.4)
$$\Delta_1(x, x-) = -\tfrac12 < 0 \ \text{ for } x > 0,$$ (5.5)
and, furthermore, one has the following sublemmas, proved in [56]:

Sublemma 5.1. [56] $\Delta_1(x, \frac{4}{10}) > 0$ for all $x \in [\frac{15}{10}, \infty)$.

Sublemma 5.2. [56] $\Delta(x, u_*) < 0$ for all $x \in [\frac{15}{10}, \infty)$.
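Returning briefly to Subsublemma 4.6: the concavity of $h_a$ can be spot-checked numerically via second differences, with $\operatorname{arcch}$ the inverse hyperbolic cosine (an illustrative check only; the local name `h` here is the function $h_a$ of (4.9), not the bound $h(x)$ of the main results):

```python
import math

def h(a: float, v: float) -> float:
    """h_a(v) = arcch(1/sqrt v) (2 - v) (v - a/sqrt v) for v in (0, 1]."""
    return math.acosh(1.0 / math.sqrt(v)) * (2.0 - v) * (v - a / math.sqrt(v))

# Concavity on (0, 1]: all central second differences should be <= 0.
for a in [0.0, 0.3, 0.7, 1.0]:
    vs = [j / 50.0 for j in range(1, 51)]          # 0.02, 0.04, ..., 1.00
    ys = [h(a, v) for v in vs]
    for i in range(1, len(ys) - 1):
        assert ys[i - 1] - 2.0 * ys[i] + ys[i + 1] <= 1e-12
```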
It will then follow by (5.2), (5.3), and (5.4) that $\Delta_1(x, u)$ increases in $u \in [\frac{4}{10}, 1)$ and in $u \in (1, x)$ for each $x \in [\frac{15}{10}, \infty)$, whence, by (5.5), $\Delta_1(x, u) < 0$ for all $(x, u) \in \tilde P$ such that $u > 1$ and, by Sublemma 5.1, $\Delta_1(x, u) > 0$ for all $(x, u) \in \tilde P$ such that $u < 1$. Thus, for all points $(x, u) \in \tilde P$ with $u \ne 1$ one has $\Delta_1(x, u)(u - 1) < 0$ and hence, by (5.1) and (5.3), $\Delta(x, u)$ decreases in $u \in [\frac{4}{10}, x)$ for each $x \in [\frac{15}{10}, \infty)$. Using now Sublemma 5.2 and recalling that $u_* > \frac{4}{10}$, one concludes that $\Delta < 0$ on $P$, which yields Lemma 2.7.

It remains to present $F_1$ and $F_2$ such that (5.3), (5.4), and (5.5) hold indeed. Let
$$F_1(x, u) := \exp\Bigl\{ \frac{(u - x^2)^2}{2\, (x^2 - u^2)} \Bigr\}\, \frac{(x^2 - u^2)\, p_2(x, u)^2}{(u - 1)\, x^2\, (x^2 - u)\, p_1(x, u)} \quad \text{and}$$
$$F_2(x, u) := \exp\Bigl\{ \frac{2 u x^2}{x^2 - u^2} \Bigr\}\, \frac{(u - 1)^2 (x - u)^2 (u + x)^2}{x^2 - u^2}\, \frac{p_1(x, u)^2\, p_3(x, u)^3}{p_2(x, u)},$$
where
$$p_1(x, u) := x^2 (11 + x^2) - (10 u^2 + 2 u x^2),$$ (5.6)
$$p_2(x, u) := x^2 (9 + x^2) - (8 u^2 + 2 u x^2),$$ (5.7)
$$p_3(x, u) := x^2 (9 + x^2) - (8 u^2 - 2 u x^2).$$ (5.8)
Using e.g. the Mathematica command Reduce, one can see that on the set $\tilde P$ the polynomials $p_1$, $p_2$, and $p_3$ are positive. Note also that $u < x < x^2$ for all $(x, u) \in \tilde P$. So, (5.3) holds. Next, with definitions (5.1), (5.2), (5.6), and (5.7) in place, it turns out that $\Delta_2(x, u)$ is a polynomial in $(x, u)$ (of degree 24 in $x$, and 14 in $u$). Using Reduce again, one verifies (5.4). Finally, it is straightforward (even if somewhat tedious) to check (5.5).

A Tchebycheff–Markov systems

For a nonnegative integer $n$, let $g_0, \ldots, g_n$ be (real-valued) continuous functions on an interval $[a, b]$ for some $a$ and $b$ such that $-\infty < a < b < \infty$. Let $M$ denote the set of all (nonnegative) Borel measures on $[a, b]$. Take any point $c = (c_0, \ldots, c_n) \in \mathbb R^{n+1}$ such that
$$M_c := \Bigl\{ \mu \in M : \int_a^b g_i \, d\mu = c_i \ \text{ for all } i \in \overline{0, n} \Bigr\} \ne \emptyset;$$ (A.1)
here and in what follows, for any $m$ and $n$ in $\mathbb Z \cup \{\infty\}$ we let $\overline{m, n} := \{j \in \mathbb Z : m \le j \le n\}$.

Definition A.1.
The sequence $(g_0, \ldots, g_n)$ of functions is a $T$-system if the restrictions of these $n+1$ functions to any subset of $[a, b]$ of cardinality $n+1$ are linearly independent. If, for each $k \in \overline{0, n}$, the initial subsequence $(g_0, \ldots, g_k)$ of the sequence $(g_0, \ldots, g_n)$ is a $T$-system, then $(g_0, \ldots, g_n)$ is said to be an $M$-system (where $M$ refers to Markov).

Let $(g_0, \ldots, g_n)$ be a $T$-system on $[a, b]$. Let $\det\bigl(g_i(x_j)\bigr)_0^n$ denote the determinant of the matrix $\bigl(g_i(x_j) : i \in \overline{0, n},\ j \in \overline{0, n}\bigr)$. This determinant is continuous in $(x_0, \ldots, x_n)$ in the (convex) simplex (say $\Sigma$) defined by the inequalities $a \le x_0 < \cdots < x_n \le b$ and does not vanish anywhere on $\Sigma$. So, $\det\bigl(g_i(x_j)\bigr)_0^n$ is constant in sign on $\Sigma$.

Definition A.2. The sequence $(g_0, \ldots, g_n)$ is said to be a $T_+$-system on $[a, b]$ if $\det\bigl(g_i(x_j)\bigr)_0^n > 0$ for all $(x_0, \ldots, x_n) \in \Sigma$. If $(g_0, \ldots, g_k)$ is a $T_+$-system on $[a, b]$ for each $k \in \overline{0, n}$, then the sequence $(g_0, \ldots, g_n)$ is said to be an $M_+$-system on $[a, b]$.

In the case when the functions $g_0, \ldots, g_n$ are $n$ times differentiable at a point $x \in (a, b)$, consider also the Wronskians
$$W_0^k(x) := \det\bigl(g_i^{(j)}(x)\bigr)_0^k,$$
where $k \in \overline{0, n}$ and $g_i^{(j)}$ is the $j$th derivative of $g_i$, with $g_i^{(0)} := g_i$; in particular, $W_0^0(x) = g_0(x)$.

Proposition A.3. Suppose that the functions $g_0, \ldots, g_n$ are (still continuous on $[a, b]$ and) $n$ times differentiable on $(a, b)$. Then, for the sequence $(g_0, \ldots, g_n)$ to be an $M_+$-system on $[a, b]$, it is necessary that $W_0^k \ge 0$ on $(a, b)$ for all $k \in \overline{0, n}$, and it is sufficient that $g_0 > 0$ on $[a, b]$ and $W_0^k > 0$ on $(a, b)$ for all $k \in \overline{1, n}$. Thus, verifying the $M_+$-property largely reduces to checking the positivity of several functions of only one variable.
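For instance, for the initial segments $(1, -b)$ and $(1, -b, -\ell)$ of the system in Sublemma 4.2, the first two nontrivial Wronskians reduce to $W_0^1 = -b'$ and $W_0^2 = b'\ell'' - b''\ell'$. A finite-difference spot check (illustrative only, assuming the formulas $b(u) = 1/\operatorname{ch}^2 u$ and $\ell(u) = (\ln \operatorname{ch} u)/u^2$ from Sections 3 and 4) finds both strictly positive on a grid inside the interval:

```python
import math

def b(u):    return 1.0 / math.cosh(u) ** 2
def ell(u):  return math.log(math.cosh(u)) / (u * u)

def d1(f, u, h=1e-3):  # central first derivative
    return (f(u + h) - f(u - h)) / (2.0 * h)

def d2(f, u, h=1e-3):  # central second derivative
    return (f(u + h) - 2.0 * f(u) + f(u - h)) / (h * h)

# Wronskians of the initial segments (1, -b) and (1, -b, -ell):
#   W_0^1 = -b'   and   W_0^2 = b' ell'' - b'' ell'
for j in range(5, 41):
    u = j / 100.0
    assert -d1(b, u) > 0.0                                      # W_0^1 > 0
    assert d1(b, u) * d2(ell, u) - d2(b, u) * d1(ell, u) > 0.0  # W_0^2 > 0
```

The paper's own verification uses exact symbolic Wronskians via the Mathematica command Reduce; the grid check above is only a plausibility test.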
A special case of Proposition A.3 (with $n = 1$ and $g_0 = 1$) is the following well-known fact: if a function $g_1$ is continuous on $[a, b]$ and has a positive derivative on $(a, b)$, then $g_1$ is (strictly) increasing on $[a, b]$; vice versa, if $g_1$ is increasing on $[a, b]$, then the derivative of $g_1$ (if it exists) must be nonnegative on $(a, b)$. As in this special case, the proof of Proposition A.3 in general can be based on the mean-value theorem; cf. e.g. [33, Theorem 1.1 of Chapter XI], which states that the requirement that $W_0^k$ be strictly positive on the closed interval $[a, b]$ for all $k \in \overline{0, n}$ is equivalent to a condition somewhat stronger than being an $M_+$-system on $[a, b]$; in connection with this, one may also want to look at [38, Theorem IV.5.2]. Note that, in the applications to the proofs of Lemmas 2.2 and 2.3 of this paper, the relevant Wronskians vanish at the left endpoint of the interval.

The proof of Proposition A.3 can be obtained by induction on $n$ using the recursive formulas for the determinants $\det\bigl(g_i(x_j)\bigr)_0^n$ and $W_0^n$ as displayed right above [33, (5.5) in Chapter VIII] and in [33, (5.6) in Chapter VIII], where we use $g_i$ in place of $\psi_i$.

Proposition A.4. Suppose that $(g_0, \ldots, g_{n+1})$ is an $M_+$-system on $[a, b]$ or, more generally, each of the sequences $(g_0, \ldots, g_n)$ and $(g_0, \ldots, g_{n+1})$ is a $T_+$-system on $[a, b]$. Suppose also that condition (A.1) holds. Let $m := \lfloor \frac{n+1}{2} \rfloor$. Then one has the following.

(I) The maximum (respectively, the minimum) of $\int_a^b g_{n+1} \, d\mu$ over all $\mu \in M_c$ is attained at a unique measure $\mu_{\max}$ (respectively, $\mu_{\min}$) in $M_c$. Moreover, the measures $\mu_{\max}$ and $\mu_{\min}$ do not depend on the choice of $g_{n+1}$, as long as $g_{n+1}$ is such that $(g_0, \ldots, g_{n+1})$ is a $T_+$-system on $[a, b]$.
(II) There exist subsets $X_{\max}$ and $X_{\min}$ of $[a, b]$ such that $X_{\max} \supseteq \operatorname{supp} \mu_{\max}$, $X_{\min} \supseteq \operatorname{supp} \mu_{\min}$, and
(a) if $n = 2m$ then $\operatorname{card} X_{\max} = \operatorname{card} X_{\min} = m + 1$, $X_{\max} \ni b$, and $X_{\min} \ni a$;
(b) if $n = 2m - 1$ then $\operatorname{card} X_{\max} = m + 1$, $\operatorname{card} X_{\min} = m$, and $X_{\max} \supseteq \{a, b\}$.

To illustrate Proposition A.4, one may consider the simplest two special cases when the conditions of the proposition hold and its conclusion is obvious:
(i) $n = 0$, $g_0(x) \equiv 1$, $g_1$ is increasing on $[a, b]$, and $c_0 \ge 0$; then $\operatorname{supp} \mu_{\max} \subseteq \{b\}$ and $\operatorname{supp} \mu_{\min} \subseteq \{a\}$; in fact, $\mu_{\max} = c_0 \delta_b$ and $\mu_{\min} = c_0 \delta_a$; here and in what follows, $\delta_x$ denotes the Dirac probability measure at point $x$.
(ii) $n = 1$, $g_0(x) \equiv 1$, $g_1(x) \equiv x$, $g_2$ is strictly convex on $[a, b]$, $c_0 \ge 0$, and $c_1 \in [c_0 a, c_0 b]$; then $\operatorname{supp} \mu_{\max} \subseteq \{a, b\}$ and $\operatorname{card} \operatorname{supp} \mu_{\min} \le 1$; in fact, $\mu_{\max} = \frac{c_0 b - c_1}{b - a}\, \delta_a + \frac{c_1 - c_0 a}{b - a}\, \delta_b$, and $\mu_{\min} = c_0\, \delta_{c_1/c_0}$ if $c_0 > 0$ and $\mu_{\min} = 0$ if $c_0 = 0$.

These examples also show that the $T$-property of systems of functions can be considered as generalized monotonicity/convexity; see e.g. [82] and bibliography there.

Proof of Proposition A.4. Consider two cases, depending on whether $c$ is strictly or singularly positive; in equivalent geometric terms, this means, respectively, that $c$ belongs to the interior or the boundary of the smallest closed convex cone containing the subset $\{(g_0(x), \ldots, g_n(x)) : x \in [a, b]\}$ of $\mathbb R^{n+1}$ [38, Theorem IV.6.1]. In the first case, when $c$ is strictly positive, both statements of Proposition A.4 follow by [38, Theorem IV.1.1]; at that, one should let $X_{\max} = \operatorname{supp} \mu_{\max}$ and $X_{\min} = \operatorname{supp} \mu_{\min}$. (The condition that $c$ be strictly positive appears to be missing in the statement of the latter theorem; cf. [33, Theorem 1.1 of Chapter 1.1].)
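Special case (ii) above can be illustrated numerically: on $[a, b] = [0, 1]$ with $g_2 = \exp$ (strictly convex) and $c_0 = 1$, any probability measure with mean $c_1 = m$ integrates $g_2$ to a value between $\int g_2\, d\mu_{\min} = e^m$ and $\int g_2\, d\mu_{\max} = (1 - m)\,e^0 + m\,e^1$. A randomized sketch (the grid size and seed are arbitrary choices):

```python
import math
import random

random.seed(0)
for _ in range(200):
    # a random probability measure on five points of [0, 1]
    xs = [random.random() for _ in range(5)]
    ws = [random.random() for _ in range(5)]
    total = sum(ws)
    ws = [w / total for w in ws]
    m = sum(w * x for w, x in zip(ws, xs))               # its mean c_1
    val = sum(w * math.exp(x) for w, x in zip(ws, xs))   # integral of g_2 = exp
    lo = math.exp(m)                                     # against mu_min = delta_m
    hi = (1.0 - m) * math.exp(0.0) + m * math.exp(1.0)   # against mu_max on {0, 1}
    assert lo - 1e-12 <= val <= hi + 1e-12
```

The lower bound is Jensen's inequality and the upper bound is the chord bound for a convex function, which is exactly the generalized-convexity reading of the $T$-property mentioned above.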
In the other case, when $c$ is singularly positive, use [38, Theorem III.4.1], which states that in this case the set $M_c$ consists of a single measure (say $\mu_*$), and its support set $X_* := \operatorname{supp} \mu_*$ is of an index $\le n$; that is, $\ell_- + 2\ell + \ell_+ \le n$, where $\ell_-$, $\ell$, and $\ell_+$ stand for the cardinalities of the intersections of $X_*$ with the sets $\{a\}$, $(a, b)$, and $\{b\}$. It remains to show that this condition on the index of $X_*$ implies that there exist subsets $X_{\max}$ and $X_{\min}$ of $[a, b]$ satisfying the conditions (IIa) and (IIb) of Proposition A.4 and such that $X_{\max} \cap X_{\min} \supseteq X_*$.

If $n = 2m$ then $\operatorname{card}(X_* \cap (a, b)) = \ell \le \lfloor \frac{2m - \ell_- - \ell_+}{2} \rfloor \le \lfloor \frac{2m - \ell_-}{2} \rfloor = m - \ell_-$; so, $\operatorname{card}(X_* \cup \{b\}) \le \ell_- + (m - \ell_-) + 1 = m + 1$. Adding now to the set $X_* \cup \{b\}$ any $m + 1 - \operatorname{card}(X_* \cup \{b\})$ points of the complement of $X_* \cup \{b\}$ to $[a, b]$, one obtains a subset $X_{\max}$ of $[a, b]$ such that $X_{\max} \supseteq X_*$, $X_{\max} \ni b$, and $\operatorname{card} X_{\max} = m + 1$. Similarly, there exists a subset $X_{\min}$ of $[a, b]$ such that $X_{\min} \supseteq X_*$, $X_{\min} \ni a$, and $\operatorname{card} X_{\min} = m + 1$.

If $n = 2m - 1$ then $\operatorname{card}(X_* \cap (a, b)) = \ell \le \lfloor \frac{2m - 1 - \ell_- - \ell_+}{2} \rfloor \le m - 1$ and hence $\operatorname{card}(X_* \cup \{a, b\}) \le 1 + (m - 1) + 1 = m + 1$. So, there exists a subset $X_{\max}$ of $[a, b]$ such that $X_{\max} \supseteq X_*$, $X_{\max} \supseteq \{a, b\}$, and $\operatorname{card} X_{\max} = m + 1$. One also has $\operatorname{card} X_* = \ell_- + \ell + \ell_+ \le \lfloor \frac{2m - 1 + \ell_- + \ell_+}{2} \rfloor \le \lfloor \frac{2m + 1}{2} \rfloor = m$. So, there exists a subset $X_{\min}$ of $[a, b]$ such that $X_{\min} \supseteq X_*$ and $\operatorname{card} X_{\min} = m$.

References

[1] Sergei N. Antonov and Victor M. Kruglov, Sharpened versions of a Kolmogorov's inequality, Statist. Probab. Lett. 80 (2010), no. 3-4, 155–160. MR-2575440

[2] A. Ben-Tal, A. Nemirovski, and C. Roos, Robust solutions of uncertain quadratic and conic quadratic problems, SIAM J. Optim. 13 (2002), no. 2, 535–560 (electronic). MR-1951034

[3] Aharon Ben-Tal and Arkadi Nemirovski, On safe tractable approximations of chance-constrained linear matrix inequalities, Math. Oper. Res. 34 (2009), no. 1, 1–25.
MR-2542986 [4] George Bennett, Probability inequalities for the sum of independent random variables, J. Amer. Statist. Assoc. 57 (1962), no. 297, 33–45. [5] V. Bentkus, An inequality for large deviation probabilities of sums of bounded i.i.d. random variables, Liet. Mat. Rink. 41 (2001), no. 2, 144–153. MR-1851123 [6] , A remark on the inequalities of Bernstein, Prokhorov, Bennett, Hoeffding, and Talagrand, Liet. Mat. Rink. 42 (2002), no. 3, 332–342. MR-1947624 [7] , An inequality for tail probabilities of martingales with differences bounded from one side, J. Theoret. Probab. 16 (2003), no. 1, 161–173. MR-1956826 [8] V. Bentkus and D. Dzindzalieta, A tight Gaussian bound for weighted sums of Rademacher random variables (preprint). [9] Vidmantas Bentkus, On Hoeffding’s inequalities, Ann. Probab. 32 (2004), no. 2, 1650–1673. MR-2060313 [10] , On measure concentration for separately Lipschitz functions in product spaces, Israel J. Math. 158 (2007), 1–17. MR-2342455 [11] Sergey G. Bobkov, Friedrich Götze, and Christian Houdré, On Gaussian and Bernoulli covariance representations, Bernoulli 7 (2001), no. 3, 439–451. MR-1836739 [12] Stéphane Boucheron, Olivier Bousquet, Gábor Lugosi, and Pascal Massart, Moment inequalities for functions of independent random variables, Ann. Probab. 33 (2005), no. 2, 514–560. MR-2123200 [13] D. L. Burkholder, Independent sequences with the Stein property, Ann. Math. Statist. 39 (1968), 1282–1288. MR-0228045 [14] George E. Collins, Quantifier elimination for real closed fields by cylindrical algebraic decomposition, Quantifier elimination and cylindrical algebraic decomposition (Linz, 1993), Texts Monogr. Symbol. Comput., Springer, Vienna, 1998, pp. 85–121. MR-1634190 [15] Amir Dembo and Ofer Zeitouni, Large deviations techniques and applications, second ed., Applications of Mathematics (New York), vol. 38, Springer-Verlag, New York, 1998. MR1619036 [16] Kürşad Derinkuyu and Mustafa Ç. Pınar, On the S-procedure and some variants, Math. 
Methods Oper. Res. 64 (2006), no. 1, 55–77. MR-2264772

[17] Kürşad Derinkuyu, Mustafa Ç. Pınar, and Ahmet Camcı, An improved probability bound for the approximate S-Lemma, Oper. Res. Lett. 35 (2007), no. 6, 743–746. MR-2361043

[18] M. L. Eaton and Bradley Efron, Hotelling's $T^2$ test under symmetry conditions, J. Amer. Statist. Assoc. 65 (1970), 702–711. MR-0269021

[19] Morris L. Eaton, A note on symmetric Bernoulli random variables, Ann. Math. Statist. 41 (1970), 1223–1226. MR-0268930

[20] , A probability inequality for linear combinations of bounded random variables, Ann. Statist. 2 (1974), 609–613.

[21] D. Edelman, Private communication, 1994.

[22] Bradley Efron, Student's t-test under symmetry conditions, J. Amer. Statist. Assoc. 64 (1969), 1278–1302. MR-0251826

[23] Evarist Giné, Friedrich Götze, and David M. Mason, When is the Student t-statistic asymptotically standard normal?, Ann. Probab. 25 (1997), no. 3, 1514–1531. MR-1457629

[24] S. E. Graversen and G. Peškir, Extremal problems in the maximal inequalities of Khintchine, Math. Proc. Cambridge Philos. Soc. 123 (1998), no. 1, 169–177. MR-1474873

[25] Richard L. Hall, Marek Kanter, and Michael D. Perlman, Inequalities for the probability content of a rotated square and related convolutions, Ann. Probab. 8 (1980), no. 4, 802–813. MR-577317

[26] Jean-Baptiste Hiriart-Urruty, A new series of conjectures and open questions in optimization and matrix analysis, ESAIM Control Optim. Calc. Var. 15 (2009), no. 2, 454–470. MR-2513094

[27] Paweł Hitczenko and Stanisław Kwapień, On the Rademacher series, Probability in Banach spaces, 9 (Sandjberg, 1993), Progr. Probab., vol. 35, Birkhäuser Boston, Boston, MA, 1994, pp. 31–36. MR-1308508

[28] , On the Rademacher series, Probability in Banach spaces, 9 (Sandjberg, 1993), Progr. Probab., vol. 35, Birkhäuser Boston, Boston, MA, 1994, pp. 31–36.
MR-1308508

[29] Paweł Hitczenko and Stephen Montgomery-Smith, Measuring the magnitude of sums of independent random variables, Ann. Probab. 29 (2001), no. 1, 447–466. MR-1825159

[30] Wassily Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. 58 (1963), 13–30. MR-0144363

[31] Jørgen Hoffmann-Jørgensen, Sums of independent Banach space valued random variables, Studia Math. 52 (1974), 159–186. MR-0356155

[32] Ron Holzman and Daniel J. Kleitman, On the product of sign vectors and unit vectors, Combinatorica 12 (1992), no. 3, 303–316. MR-1195893

[33] Samuel Karlin and William J. Studden, Tchebycheff systems: With applications in analysis and statistics, Pure and Applied Mathematics, Vol. XV, Interscience Publishers John Wiley & Sons, New York-London-Sydney, 1966. MR-0204922

[34] Michael J. Klass and Krzysztof Nowicki, An improvement of Hoffmann-Jørgensen's inequality, Ann. Probab. 28 (2000), no. 2, 851–862. MR-1782275

[35] , Uniformly accurate quantile bounds via the truncated moment generating function: the symmetric case, Electron. J. Probab. 12 (2007), no. 47, 1276–1298 (electronic). MR-2346512

[36] , Uniformly accurate quantile bounds for sums of arbitrary independent random variables, J. Theoret. Probab. 23 (2010), no. 4, 1068–1091. MR-2735737

[37] H. König and S. Kwapień, Best Khintchine type inequalities for sums of independent, rotationally invariant random vectors, Positivity 5 (2001), no. 2, 115–152. MR-1825172

[38] M. G. Kreĭn and A. A. Nudel′man, The Markov moment problem and extremal problems, American Mathematical Society, Providence, R.I., 1977, Ideas and problems of P. L. Čebyšev and A. A. Markov and their further development, Translated from the Russian by D. Louvish, Translations of Mathematical Monographs, Vol. 50. MR-0458081

[39] Jean Bernard Lasserre, Moments, positive polynomials and their applications, Imperial College Press Optimization Series, vol. 1, Imperial College Press, London, 2010.
MR-2589247

[40] R. Latała and K. Oleszkiewicz, Between Sobolev and Poincaré, Geometric aspects of functional analysis, Lecture Notes in Math., vol. 1745, Springer, Berlin, 2000, pp. 147–168. MR-1796718

[41] Rafał Latała, Estimation of moments of sums of independent real random variables, Ann. Probab. 25 (1997), no. 3, 1502–1513. MR-1457628

[42] , Estimates of moments and tails of Gaussian chaoses, Ann. Probab. 34 (2006), no. 6, 2315–2331. MR-2294983

[43] Rafał Latała and Krzysztof Oleszkiewicz, On the best constant in the Khinchin-Kahane inequality, Studia Math. 109 (1994), no. 1, 101–104. MR-1267715

[44] Michel Ledoux, On Talagrand's deviation inequalities for product measures, ESAIM Probab. Statist. 1 (1995/97), 63–87 (electronic). MR-1399224

[45] , The concentration of measure phenomenon, Mathematical Surveys and Monographs, vol. 89, American Mathematical Society, Providence, RI, 2001. MR-1849347

[46] S. Łojasiewicz, Sur les ensembles semi-analytiques, Actes du Congrès International des Mathématiciens (Nice, 1970), Tome 2, Gauthier-Villars, Paris, 1971, pp. 237–241. MR-0425152

[47] Murray Marshall, Positive polynomials and sums of squares, Mathematical Surveys and Monographs, vol. 146, American Mathematical Society, Providence, RI, 2008. MR-2383959

[48] S. J. Montgomery-Smith, The distribution of Rademacher sums, Proc. Amer. Math. Soc. 109 (1990), no. 2, 517–522. MR-1013975

[49] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz, Noise stability of functions with low influences: invariance and optimality, Ann. of Math. (2) 171 (2010), no. 1, 295–341. MR-2630040

[50] A. V. Nagaev, Probabilities of large deviations of sums of independent random variables (Doctor of Science Thesis, Tashkent 1970).

[51] Krzysztof Oleszkiewicz, On the Stein property of Rademacher sequences, Probab. Math. Statist. 16 (1996), no. 1, 127–130.
MR-1407938

[52] , On a nonsymmetric version of the Khinchine-Kahane inequality, Stochastic inequalities and applications, Progr. Probab., vol. 56, Birkhäuser, Basel, 2003, pp. 157–168. MR-2073432

[53] G. Peshkir and A. N. Shiryaev, Khinchin inequalities and a martingale extension of the sphere of their action, Uspekhi Mat. Nauk 50 (1995), no. 5(305), 3–62. MR-1365047

[54] V. V. Petrov, Sums of independent random variables, Springer-Verlag, New York, 1975, Translated from the Russian by A. A. Brown, Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 82. MR-0388499

[55] I. Pinelis, An asymptotically Gaussian bound on the Rademacher tails, preprint, version 1, http://arxiv.org/pdf/1007.2137v1.pdf.

[56] , An asymptotically Gaussian bound on the Rademacher tails, preprint, version 3, http://arxiv.org/pdf/1007.2137v3.pdf.

[57] , On the Bennett-Hoeffding inequality (preprint), arXiv:0902.4058v1 [math.PR].

[58] , On the extreme points of moments sets, preprint, http://arxiv.org/find/all/1/au:+pinelis/0/1/0/all/0/1.

[59] , On the supremum of the tails of normalized sums of independent Rademacher random variables, preprint, http://arxiv.org/find/all/1/au:+pinelis/0/1/0/all/0/1.

[60] , Tchebycheff systems and extremal problems for generalized moments: a brief survey, preprint, http://arxiv.org/find/all/1/au:+pinelis/0/1/0/all/0/1.

[61] , Exponential deficiency of convolutions of densities, ESAIM: Probability and Statistics (2011), published online, DOI 10.1051/ps/2010010.

[62] I. Pinelis and R. Molzon, Berry-Esséen bounds for general nonlinear statistics, with applications to Pearson's and non-central Student's and Hotelling's (preprint), arXiv:0906.0177v1 [math.ST].

[63] I. F. Pinelis, A problem on large deviations in a space of trajectories, Theory Probab. Appl. 26 (1981), no. 1, 69–84.

[64] I. F.
Pinelis, Asymptotic equivalence of the probabilities of large deviations for sums and maximum of independent random variables, Limit theorems of probability theory, Trudy Inst. Mat., vol. 5, “Nauka” Sibirsk. Otdel., Novosibirsk, 1985, pp. 144–173, 176. MR-821760

[65] Iosif Pinelis, Extremal probabilistic problems and Hotelling's $T^2$ test under a symmetry condition, Ann. Statist. 22 (1994), no. 1, 357–368. MR-1272088

[66] , Optimal tail comparison based on comparison of moments, High dimensional probability (Oberwolfach, 1996), Progr. Probab., vol. 43, Birkhäuser, Basel, 1998, pp. 297–314. MR-1652335

[67] , Fractional sums and integrals of $r$-concave tails and applications to comparison probability inequalities, Advances in stochastic inequalities (Atlanta, GA, 1997), Contemp. Math., vol. 234, Amer. Math. Soc., Providence, RI, 1999, pp. 149–168. MR-1694770

[68] , On exact maximal Khinchine inequalities, High dimensional probability, II (Seattle, WA, 1999), Progr. Probab., vol. 47, Birkhäuser Boston, Boston, MA, 2000, pp. 49–63. MR-1857314

[69] , Exact asymptotics for large deviation probabilities, with applications, Modeling uncertainty, Internat. Ser. Oper. Res. Management Sci., vol. 46, Kluwer Acad. Publ., Boston, MA, 2002, pp. 57–93. MR-1893275

[70] , L'Hospital type rules for monotonicity: applications to probability inequalities for sums of bounded random variables, JIPAM. J. Inequal. Pure Appl. Math. 3 (2002), no. 1, Article 7, 9 pp. (electronic). MR-1888922

[71] , Monotonicity properties of the relative error of a Padé approximation for Mills' ratio, JIPAM. J. Inequal. Pure Appl. Math. 3 (2002), no. 2, Article 20, 8 pp. (electronic). MR-1906389

[72] , Spherically symmetric functions with a convex second derivative and applications to extremal probabilistic problems, Math. Inequal. Appl. 5 (2002), no. 1, 7–26.
MR-1880267 [73] , Dimensionality reduction in extremal problems for moments of linear combinations of vectors with random coefficients, Stochastic inequalities and applications, Progr. Probab., vol. 56, Birkhäuser, Basel, 2003, pp. 169–185. MR-2073433 [74] , Binomial upper bounds on generalized moments and tail probabilities of (super)martingales with differences bounded from above, High dimensional probability, IMS Lecture Notes Monogr. Ser., vol. 51, Inst. Math. Statist., Beachwood, OH, 2006, pp. 33–52. MR-2387759 [75] , On l’Hospital-type rules for monotonicity, JIPAM. J. Inequal. Pure Appl. Math. 7 (2006), no. 2, Article 40, 19 pp. (electronic). MR-2221321 [76] , On normal domination of (super)martingales, Electron. J. Probab. 11 (2006), no. 39, 1049–1070. MR-2268536 [77] , Exact inequalities for sums of asymmetric random variables, with applications, Probab. Theory Related Fields 139 (2007), no. 3-4, 605–635. MR-2322709 [78] , Toward the best constant factor for the Rademacher-Gaussian tail comparison, ESAIM Probab. Stat. 11 (2007), 412–426. MR-2339301 [79] , On inequalities for sums of bounded random variables, J. Math. Inequal. 2 (2008), no. 1, 1–7. MR-2453629 [80] S. Portnoy, Private communication, 1991. [81] Martin Raič, CLT-related large deviation bounds based on Stein’s method, Adv. in Appl. Probab. 39 (2007), no. 3, 731–752. MR-2357379 [82] Moshe Shaked and J. George Shanthikumar, Stochastic orders, Springer Series in Statistics, Springer, New York, 2007. MR-2265633 [83] I. G. Shevtsova, Refinement of estimates for the rate of convergence in Lyapunov’s theorem, Dokl. Akad. Nauk 435 (2010), no. 1, 26–28. MR-2790498 [84] Michel Talagrand, The missing factor in Hoeffding’s inequalities, Ann. Inst. H. Poincaré Probab. Statist. 31 (1995), no. 4, 689–702. MR-1355613 [85] Alfred Tarski, A Decision Method for Elementary Algebra and Geometry, RAND Corporation, Santa Monica, Calif., 1948. MR-0028796 [86] Joel A. 
Tropp, User-friendly tail bounds for sums of random matrices (preprint), arXiv:1004.4389v7 [math.PR].

[87] I. Tyurin, New estimates of the convergence rate in the Lyapunov theorem (preprint, arXiv:0912.0726v1 [math.PR]).

[88] Sara A. van de Geer, On non-asymptotic bounds for estimation in generalized linear models with highly correlated design, Asymptotics: particles, processes and inverse problems, IMS Lecture Notes Monogr. Ser., vol. 55, Inst. Math. Statist., Beachwood, OH, 2007, pp. 121–134. MR-2459935

[89] Mark Veraar, A note on optimal probability lower bounds for centered random variables, Colloq. Math. 113 (2008), no. 2, 231–240. MR-2425084

[90] , On Khintchine inequalities with a weight, Proc. Amer. Math. Soc. 138 (2010), no. 11, 4119–4121. MR-2679633

[91] Vladimir Vinogradov, Refined large deviation limit theorems, Pitman Research Notes in Mathematics Series, vol. 315, Longman Scientific & Technical, Harlow, 1994. MR-1312369

[92] A. V. Zhubr, On one extremal problem for $N$-cube, To appear, 2012.

Acknowledgments. I am pleased to thank the referee for a careful reading of the paper and useful suggestions about the exposition.