An asymptotically Gaussian bound on the Rademacher tails Iosif Pinelis ∗

Electron. J. Probab. 17 (2012), no. 35, 1–22.
ISSN: 1083-6489 DOI: 10.1214/EJP.v17-2026
Abstract
An explicit upper bound on the tail probabilities for the normalized Rademacher sums
is given. This bound, which is best possible in a certain sense, is asymptotically equivalent to the corresponding tail probability of the standard normal distribution, thus
affirming a longstanding conjecture by Efron. Applications to sums of general centered uniformly bounded independent random variables and to the Student test are
presented.
Keywords: probability inequalities; large deviations; Rademacher random variables; sums of
independent random variables; Student’s test; self-normalized sums; Esscher–Cramér tilt transform; generalized moments; Tchebycheff–Markov systems.
AMS MSC 2010: Primary 60E15, Secondary 60F10; 62G10; 62G15; 60G50; 62G35.
Submitted to EJP on September 30, 2011, final version accepted on May 15, 2012.
1
Introduction, summary, and discussion
Let ε1 , . . . , εn be independent Rademacher random variables (r.v.’s), so that P(εi =
1) = P(εi = −1) = 1/2 for all i. Let a1 , . . . , an be any real numbers such that

a1² + · · · + an² = 1.    (1.1)
Let
Sn := a1 ε1 + · · · + an εn
be the corresponding normalized Rademacher sum. Let Z denote a standard normal
r.v., with the density function ϕ, so that ϕ(x) = e^{−x²/2}/√(2π) for all real x.
Upper bounds on the tail probabilities P(Sn > x) have been of interest in combinatorics/optimization/operations research; see e.g. [32, 2, 16, 17, 3, 26] and bibliography
therein. Other authors, including Bennett [4], Hoeffding [30], and Efron [22], were
mainly interested in applications in statistics. The present paper too was motivated in
part by statistical applications in [62].
∗ Supported in part by NSF grant DMS-0805946 and NSA grant H98230-12-1-0237.
† Michigan Technological University, USA. E-mail: ipinelis@mtu.edu
Gaussian-Rademacher bound
A particular case of a well-known result by Hoeffding [30] is the inequality
P(Sn > x) ≤ e^{−x²/2}    (1.2)
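As a numerical sanity check, not part of the original argument, the following sketch estimates P(Sn > x) by Monte Carlo and confirms that the estimate stays below the Hoeffding bound e^{−x²/2}; the particular weights, seed, and trial count are illustrative assumptions.

```python
import math
import random

# Monte Carlo sanity check of Hoeffding's bound (1.2): P(S_n > x) <= exp(-x^2/2)
# for a normalized Rademacher sum; the weights below are an arbitrary example.
random.seed(0)
a = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.2, 0.2]
norm = math.sqrt(sum(t * t for t in a))
a = [t / norm for t in a]                  # enforce a_1^2 + ... + a_n^2 = 1

def tail(x, trials=100_000):
    """Empirical estimate of P(S_n > x)."""
    hits = 0
    for _ in range(trials):
        s = sum(t * random.choice((-1, 1)) for t in a)
        if s > x:
            hits += 1
    return hits / trials

for x in (0.5, 1.0, 1.5, 2.0):
    assert tail(x) <= math.exp(-x * x / 2)
```

The margins here are large: at x = 1 the bound is about 0.607 while the true tail is near 0.16, which is exactly the looseness discussed below.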
for all x > 0. Obviously related to this is Khinchin’s inequality — see e.g. survey [53];
for other developments, including more recent ones, see e.g. [43, 37, 52, 90]. Papers
[65, 73] contain multidimensional analogues of an exact version of Khinchin’s inequality,
whereas [72] presents their extensions to multi-affine forms in ε1 , . . . , εn (also known
as Rademacher chaoses) with values in a vector space. Latała [42] gave bounds on
moments and tails of Gaussian chaoses; Berry–Esseen-type bounds for general chaoses
were recently obtained by Mossel, O’Donnell, and Oleszkiewicz [49]. For other kinds
of improvements/generalizations of the inequality (1.2) see the recent paper [1] and
bibliography there.
While easy to state and prove, bound (1.2) is, as noted by Efron [22], “not sharp
enough to be useful in practice”. Exponential inequalities such as (1.2) are obtained by
finding a suitable upper bound (say E(t)) on the exponential moments E e^{tSn} and then
minimizing the Markov bound e^{−tx} E(t) on P(Sn > x) in t > 0. The best exponential
bound of this kind on the standard normal tail probability P(Z > x) is inf_{t>0} e^{−tx} E e^{tZ} =
e^{−x²/2}, for any x > 0. Thus, a factor of the order of magnitude of 1/x is “missing” in this
bound, compared with the asymptotics P(Z > x) ∼ ϕ(x)/x as x → ∞; cf. the result by
Talagrand [84]. Now it should be clear that any exponential upper bound on the tail
probabilities for sums of independent random variables must be missing the 1/x factor.
The problem here is that the class of exponential moment functions is too small.
Eaton [19] obtained the moment comparison E f (Sn ) 6 E f (Z) for a much richer class
of moment functions f , which enabled him [20] to derive an upper bound on P(Sn > x),
which is asymptotic to c3 P(Z > x) as x → ∞, where
c3 := 2e³/9 = 4.4634 . . . .
Eaton further conjectured that P(Sn > x) ≤ c3 ϕ(x)/x for x > √2. The stronger form of
this conjecture,

P(Sn > x) ≤ c P(Z > x)    (1.3)
for all x ∈ R with c = c3 was proved by Pinelis [65], along with a multidimensional
extension, which generalized results of Eaton and Efron [18]. Various generalizations
and improvements of inequality (1.3) as well as related results were given by Pinelis
[66, 67, 70, 74, 76, 77, 79, 57] and Bentkus [6, 7, 9].
Clearly, as pointed out e.g. in [10], the constant c in (1.3) cannot be less than
c∗ := P( (ε1 + ε2)/√2 ≥ √2 ) / P(Z ≥ √2) = 3.1786 . . . ,    (1.4)
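For the reader who wishes to reproduce the two constants c3 and c∗ numerically, here is a small check using only the standard library; the formulas are exactly those displayed above.

```python
import math
from statistics import NormalDist

# Numerical check of the constants around (1.3)-(1.4).
Z = NormalDist()                               # standard normal

c3 = 2 * math.e ** 3 / 9                       # Eaton's constant
# c* = P((eps1 + eps2)/sqrt(2) >= sqrt(2)) / P(Z >= sqrt(2)); the numerator
# is the probability that both signs equal +1, i.e. 1/4.
c_star = 0.25 / (1 - Z.cdf(math.sqrt(2)))

assert abs(c3 - 4.4634) < 1e-3
assert abs(c_star - 3.1786) < 1e-3
```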
which may be compared with c3 . Bobkov, Götze and Houdré (BGH) [11] gave a simple
proof of (1.3) with a constant factor c ≈ 12.01. Their method was based on the Chapman–Kolmogorov identity for the Markov chain (Sn ). Such an identity was used, e.g., in
[68] concerning a conjecture by Graversen and Peškir [24] on max_{k≤n} |Sk |. Pinelis [78]
showed that a modification of the BGH method can be used to obtain inequality (1.3)
with a constant factor c ≈ 1.01 c∗ ≈ 3.22. Bentkus and Dzindzalieta [8] recently closed
the gap by proving that c∗ is indeed the best possible constant factor c in (1.3); they
used the Chapman-Kolmogorov identity together with the Berry-Esseen bound and a
new extension of the Markov inequality. Bentkus and Dzindzalieta [8] also obtained the
inequality
P(Sn > x) ≤ 1/4 + (1/8)( 1 − √(2 − 2/x²) )    for x ∈ (1, √2 ],    (1.5)
EJP 17 (2012), paper 35.
ejp.ejpecp.org
Page 2/22
whereas Holzman and Kleitman [32] proved that P(Sn > 1) ≤ 5/16.
We should also like to mention another kind of result, due to Montgomery-Smith
[48], who obtained an upper bound on ln P(Sn > x) and a matching lower bound on
ln P(Sn > Cx) for some absolute constant C > 0; these bounds depend on x > 0 and
on the sequence (a1 , . . . , an ) and differ from each other by no more than an absolute
constant factor; the constants were improved by Hitczenko and Kwapien [27]. As was
pointed out by the referee, whereas the normal-tail-like bounds obtained in the present
paper and its predecessors including [30, 20, 65, 78] will usually work better when the
ai ’s are fairly balanced, bounds such as the ones obtained in [48] can be advantageous
otherwise, when the ai ’s significantly differ in magnitude from one another. Indeed,
the bounds given in [48] are expressed in terms of an interpolation norm of (a1 , . . . , an ),
which is equivalent (up to a universal constant factor) to an expression based on splitting the ai ’s into two groups according to the absolute values of the ai ’s. The result
of [48] was extended to sums of general independent zero-mean r.v.’s in [29], and the
latter work was also motivated in part by that of Latała [41]. The proof in [48] was in
part based on an extension of the improvement of Hoffmann-Jørgensen’s inequality [31]
found by Klass and Nowicki [34]. More recent developments in this direction are given
in [35, 36].
In the mentioned paper [22], Efron conjectured that there exists an upper bound on
the tail probability P(Sn > x) which behaves as the corresponding standard normal tail
P(Z > x), and he presented certain facts in favor of this conjecture. Efron’s conjecture
suggests that even the best possible constant factor c = c∗ = 3.17 . . . in (1.3) is excessive
for large x; rather, for such x the ratio of a good bound on P(Sn > x) to P(Z > x) should
be close to 1. Theorem 1.1 below provides such a bound, of simple and explicit form.
Another well-known conjecture, apparently due to Edelman [80, 21], is that
P(Sn > x) ≤ sup_{n≥1} P( (ε1 + · · · + εn)/√n > x )    (1.6)
for all x > 0; that is, the conjecture is that the supremum of P(Sn > x) over all finite sequences (a1 , . . . , an ) satisfying condition (1.1) is the same as that over all such
(a1 , . . . , an ) with equal ai ’s; cf. the above discussion concerning the result by Montgomery-Smith [48] vs. normal-tail-like bounds. Conjecture (1.6) was recently disproved; see
[92, 59].
Another two known and interesting conjectures are that P(Sn > 1) ≤ 1/4 [32, 2, 26]
and that P(Sn > 1) ≥ 7/64 [13, 28, 51, 89].
The main result of the present paper is
Theorem 1.1. For all real x ≥ 0

P(Sn > x) ≤ Q(x) := P(Z > x) + Cϕ(x)/(9 + x²) < P(Z > x) · (1 + C/x),    (1.7)

where

C := 5 √(2πe) P(|Z| < 1) = 14.10 . . . .    (1.8)
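The constant C and the bound Q of (1.7)–(1.8) can be evaluated directly; the snippet below also checks the identity Φ(1) + h(1) = 1/2 invoked in the proof of Lemma 2.5, with the standard-library NormalDist standing in for the normal density and tail.

```python
import math
from statistics import NormalDist

# The constant C of (1.8) and the bound Q of (1.7).
Z = NormalDist()
C = 5 * math.sqrt(2 * math.pi * math.e) * (Z.cdf(1) - Z.cdf(-1))

def Q(x):
    # Q(x) = P(Z > x) + C*phi(x)/(9 + x^2)
    return (1 - Z.cdf(x)) + C * Z.pdf(x) / (9 + x * x)

assert abs(C - 14.10) < 0.01
# definition (1.8) is equivalent to Phi_tail(1) + C*phi(1)/10 = 1/2
assert abs(Q(1) - 0.5) < 1e-9
```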
Remark 1.2. The constant factor C is the best possible in the sense that the first
inequality in (1.7) turns into the equality when x = n = 1. It would be of interest to
find the optimal value of C if the constant 9 in the denominator in (1.7) is replaced by
a significantly smaller positive value, say c. Then it could be possible to replace the
constant C by a smaller value. At that, the factor 1/(c + x²) would be decreasing faster than
1/(9 + x²), especially when x > 0 is not too large – since the “rate” (∂/∂x) ln(c + x²) = 2/(c/x + x) is
greater for smaller c > 0. However, such a quest appears to entail further significant
technical complications. Also, it is an open (and apparently very difficult) problem
whether the asymptotic rate of decrease of the “extra” term Cϕ(x)/(9 + x²) as x → ∞ is the best
possible one. Such questions appear to be related to the open problems stated at the
end of [59]. It is hoped that these matters will be addressed in subsequent studies.
Using e.g. part (II) of Proposition 3.1 (in Section 3 of this paper), it is easy to see
that the ratio of the bound Q(x) in (1.7) to P(Z > x) increases from ≈ 2.25 to ≈ 3.61
and then decreases to 1 as x increases from 0 to ≈ 2.46 to ∞, respectively. Figure 1
presents a graphical comparison of this ratio, Q(x)/ P(Z > x), with
(i) the best possible constant factor c = c∗ ≈ 3.18 in (1.3);
(ii) the level 1, which is asymptotic (as x → ∞) to the ratio of either one of the two
bounds in (1.7) to P(Z > x), and hence, by the central limit theorem, is also asymptotic to the ratio of the supremum of P(Sn > x) (over all normalized Rademacher
sums Sn ) to P(Z > x);
(iii) the ratio of Hoeffding’s bound e^{−x²/2} to P(Z > x).

In Figure 1, the graph of the latter ratio looks like a steep straight line (and asymptotically, for large x, is a straight line), most of which is outside the vertical range of
the picture, thus showing how much the bounds c∗ P(Z > x) and Q(x) improve the
Hoeffding bound e^{−x²/2}.
Figure 1: Ratio Q(x)/P(Z > x) (thick solid) compared with the ratio e^{−x²/2}/P(Z > x)
(solid, steeply upwards), as well as with the levels 1 (dashed) and c∗ ≈ 3.18 (dotted)
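The qualitative description of the ratio Q(x)/P(Z > x) given above (start near 2.25, peak near 3.61 around x ≈ 2.46, decay toward 1) can be reproduced numerically; the grid and tolerances below are illustrative choices.

```python
import math
from statistics import NormalDist

# The ratio Q(x)/P(Z > x) plotted in Figure 1.
Z = NormalDist()
C = 5 * math.sqrt(2 * math.pi * math.e) * (Z.cdf(1) - Z.cdf(-1))

def ratio(x):
    sf = 1 - Z.cdf(x)                       # normal tail P(Z > x)
    return 1 + C * Z.pdf(x) / ((9 + x * x) * sf)

assert abs(ratio(0.0) - 2.25) < 0.01        # value at x = 0
peak = max(ratio(i / 100) for i in range(1, 801))
assert abs(peak - 3.61) < 0.05              # maximum near x ~ 2.46
assert ratio(5.0) < peak                    # eventually decreasing toward 1
```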
In view of the main result of Bentkus [5], one immediately obtains the following
corollary of Theorem 1.1.
Corollary 1.3. Let X, X1 , . . . , Xn be independent identically distributed r.v.’s such that
P(|X| 6 1) = 1 and E X = 0. Then
P( (X1 + · · · + Xn)/√n > x ) ≤ 2 Q̂n (x)

for all real x ≥ 0, where Q̂n is the linear interpolation of the restriction of the function
Q to the set (2/√n)(n/2 − ⌊n/2⌋ + Z).
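A sketch, in the notation of Corollary 1.3, of the lattice (2/√n)(n/2 − ⌊n/2⌋ + Z) and of a linear interpolant of Q over it; the helper names (lattice, Q_hat) and the truncation of the lattice to finitely many points are illustrative assumptions, not part of the paper.

```python
import math
from statistics import NormalDist

Z = NormalDist()
C = 5 * math.sqrt(2 * math.pi * math.e) * (Z.cdf(1) - Z.cdf(-1))

def Q(x):
    return (1 - Z.cdf(x)) + C * Z.pdf(x) / (9 + x * x)

def lattice(n, kmin, kmax):
    # (2/sqrt(n)) * (n/2 - floor(n/2) + k): exactly the values taken by
    # (eps_1 + ... + eps_n)/sqrt(n)
    frac = n / 2 - math.floor(n / 2)        # 0 for even n, 1/2 for odd n
    return [2 / math.sqrt(n) * (frac + k) for k in range(kmin, kmax + 1)]

def Q_hat(n, x, kmin=-50, kmax=50):
    """Linear interpolation of Q between neighboring lattice points."""
    pts = lattice(n, kmin, kmax)
    for lo, hi in zip(pts, pts[1:]):
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            return (1 - t) * Q(lo) + t * Q(hi)
    raise ValueError("x outside tabulated range")

# for n = 5 the lattice contains the attainable values k/sqrt(5), k odd
pts = lattice(5, 0, 2)
assert abs(pts[0] - 1 / math.sqrt(5)) < 1e-12
assert Q_hat(5, 0.9) > 0
```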
Here we shall present just one more application of Theorem 1.1, to the self-normalized sums
Vn := (X1 + · · · + Xn) / √(X1² + · · · + Xn²),
where, following Efron [22], we assume that the Xi ’s satisfy the so-called orthant symmetry condition: the joint distribution of s1 X1 , . . . , sn Xn is the same for any choice of
signs s1 , . . . , sn ∈ {1, −1}, so that, in particular, each Xi is symmetrically distributed.
It suffices that the Xi ’s be independent and symmetrically (but not necessarily identically) distributed. In particular, Vn = Sn if Xi = ai εi for all i. It was noted by Efron
that (i) Student’s statistic Tn is a monotonic function of the so-called self-normalized
sum: Tn = √((n − 1)/n) · Vn /√(1 − Vn²/n), and (ii) the orthant symmetry implies in general that
the distribution of Vn is a mixture of the distributions of normalized Rademacher sums
Sn . Thus, one obtains
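The algebraic relation between Student's statistic Tn and the self-normalized sum Vn quoted above can be verified on arbitrary data; the random sample below is an illustrative assumption.

```python
import math
import random

# Check of the identity T_n = sqrt((n-1)/n) * V_n / sqrt(1 - V_n^2/n).
random.seed(1)
n = 10
x = [random.uniform(-1, 1) for _ in range(n)]

s1 = sum(x)
s2 = sum(t * t for t in x)
Vn = s1 / math.sqrt(s2)                     # self-normalized sum

mean = s1 / n
var = sum((t - mean) ** 2 for t in x) / (n - 1)   # sample variance
Tn = mean / math.sqrt(var / n)                    # Student's statistic

Tn_from_Vn = math.sqrt((n - 1) / n) * Vn / math.sqrt(1 - Vn * Vn / n)
assert abs(Tn - Tn_from_Vn) < 1e-12
```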
Corollary 1.4. Theorem 1.1 holds with Vn in place of Sn .
Note that many of the most significant advances concerning self-normalized sums
are rather recent; e.g., a necessary and sufficient condition for their asymptotic normality was obtained only in 1997 by Giné, Götze, and Mason [23].
It appears natural to compare the probability inequalities given in Theorem 1.1 with
limit theorems for large deviation probabilities. Most of such theorems, referred to as
large deviation principles (LDP’s), deal with logarithmic asymptotics, that is, asymptotics of the logarithm of small probabilities; see e.g. [15]. As far as the logarithmic
asymptotics is concerned, the mentioned bounds c∗ P(Z > x) and Q(x) and the Hoeffding bound e^{−x²/2} are all the same: ln( c∗ P(Z > x) ) ∼ ln Q(x) ∼ ln e^{−x²/2} = −x²/2 as
x → ∞; yet, as we have seen, at least the first two of these bounds are vastly different
from the Hoeffding bound, especially from the perspective of statistical practice. Results on the so-called exact asymptotics for large deviations (that is, asymptotics for the
small probabilities themselves, rather than for their logarithms) are much fewer; see
e.g. [15, Theorem 3.7.4] and [54, Ch. VIII]. Note that the inequalities in (1.7) hold for
all x > 0, and, a priori, the summands ai εi do not have to be identically or nearly identically distributed; cf. conjecture (1.6). In contrast, almost all limit theorems for large
deviations in the literature – whether with exact or logarithmic asymptotics – hold only
for x = O(√n ), with n being the number of identically or quasi-identically distributed
(usually independent or nearly independent) random summands; the few exceptions
here include results of the papers [50, 63, 64, 69, 91] and references therein, where the
restriction x = O(√n ) is not imposed and x is allowed to be arbitrarily large. In general, observe that a limit theorem is a statement on the existence of an inequality, not
yet fully specified, as e.g. in “there exists some n0 such that |xn − x| < ε for all n > n0 ”;
as such, a limit theorem cannot provide a specific bound. Of course, being less specific,
limit theorems are applicable to objects of much greater variety and complexity, and
limit theorems usually provide valuable initial insight. Yet, it seems natural to suppose
that the tendency, say in the studies of large deviation probabilities, will be to proceed
from logarithmic asymptotics to asymptotics of the probabilities themselves and then
on to exact inequalities. We appear to be largely at the beginning of this process, still
struggling even with such comparatively simple objects as the Rademacher sums – the
simplicity of which is only comparative, as the discussion around Figure 1 in [78] suggests. However, there have already been a number of big strides made in this direction.
For instance, Boucheron, Bousquet, Lugosi, and Massart [12] obtained explicit bounds
on moments of general functions of independent r.v.’s; their approach was based on
a generalization of Ledoux’s entropy method [44, 45], using at that a generalized tensorization inequality due to Latała and Oleszkiewicz [40]. Another, more recent example
demonstrating the same tendency is the work by van de Geer [88]. Even more recently,
Tropp [86] provided noncommutative generalizations of the Bennett, Bernstein, Chernoff, and Hoeffding bounds – even with explicit and optimal constants; as pointed out in
[86], “[a]symptotic theory is less relevant in practice”. Yet, as stated above, in the case
of Rademacher sums and other related cases significantly more precise bounds can be
obtained.
2
Proof of Theorem 1.1: outline
Let us begin the proof with several introductory remarks.
There are many symbols used in the proof. Therefore, let us assume a localization principle for notations: any notations introduced in a section or in a proof of a
lemma/sublemma supersede those introduced in preceding sections or proofs. For example, the meaning of the Xi ’s introduced later in this section differs from that in
Section 1.
Without loss of generality (w.l.o.g.), assume that
0 ≤ a1 ≤ . . . ≤ an =: a,
(2.1)
so that a = maxi ai . Introduce the numbers
ui := ui,x := xai ,
whence for all x > 0
0 ≤ u1 ≤ . . . ≤ un = xa.
(2.2)
The proof of Theorem 1.1 is to a large extent based on a careful analysis of the
Esscher exponential tilt transform of the r.v. Sn . In introducing and using this transform, Esscher and then Cramér were motivated by applications in actuarial science.
Closely related to the Esscher transform is the saddle-point approximation; for a recent development in this area, see [61]. The Esscher tilt has been used extensively in
limit theorems for large deviation probabilities, but much less commonly concerning
explicit probability inequalities – two rather different in character cases of the latter
kind are represented by Raič [81] and Pinelis and Molzon [62]. One may also note that,
in deriving LDP’s, the exponential tilt is usually employed to get a lower bound on the
probability; in contrast, in this paper the tilt is used to obtain the upper bound. One
may also note that, whereas in [78, 8] the main difficulty was to deal with moderate
values of x, in the present paper both the moderate and large values of x present
significant problems; in a sense, here the consideration depends not just on the value
of x itself but, to a greater extent, on the value of the product xa.
The main idea of the proof is to reduce the problem from that on the vector (a1 , . . . , an )
of an unbounded dimension n to a set of low-dimensional extremal problems. The
first step here is to use exponential tilting to obtain upper bounds on P(Sn > x) in
terms of sums of the form Σi g(ui ), which can then be represented as x² ∫ g̃ dν , where

g̃(u) := g(u)/u² (for u ≠ 0),    ν := (1/x²) Σi ui² δui ,    (2.3)
(2.3)
and δt denotes the Dirac probability measure at point t, so that ν is a probability measure on the interval [0, xa]. This step turns the initial finite-dimensional problem into
an infinite-dimensional one, involving the measure ν . However, then the well-known
Carathéodory principle allows one to reduce the dimension to (at most) k − 1, where k
is the total number of the integrals (with respect to the measure ν ) involved in the
extremal problem in hand; see e.g. [58] for recent developments in this direction, and
references therein. The above ideas were carried out in the first version of this paper
— see [55].
Later, I realized that the systems of integrands one has to deal with in the proof of
Theorem 1.1 possess the so-called Tchebycheff and, even, Markov properties; therefore,
one can reduce the dimension even further, to about k/2, which allows for more effective
analyses. It should also be noted that the verification of the Markov property of a finite
sequence of functions largely reduces to checking the positivity of several functions
of only one variable. Major expositions of the theory of Tchebycheff–Markov systems
and its applications are given in the monographs by Karlin and Studden [33] and Kreı̆n
and Nudel′man [38]; closely related to this theory are certain results in real algebraic
geometry, whereby polynomials are “certified” to be positive on a semialgebraic domain
by means of an explicit representation, say in terms of sums of squares of polynomials;
see e.g. [39, 47]. A brief review of the Tchebycheff and Markov systems of functions,
which contains all the definitions and facts necessary for the applications in the present
paper, is given in [60]. For the readers’ convenience, we shall present here a condensed
version of [60] — in Appendix A at the end of this paper.
Even after the just described reductions in dimension, the proof of Theorem 1.1
entails extensive (even if rather routine) calculations, especially symbolic ones.
In this section, a number of lemmas will be stated, from which Theorem 1.1 will
easily follow. Most of these lemmas will be proved in Section 3 – with the exception of
Lemmas 2.3 and 2.7, whose proofs are more complicated and will each be presented in
a separate section. Each of these two more complicated lemmas is based on a number of
sublemmas – which are stated in the corresponding section and used there to prove the
lemma. Each of these sublemmas (except for Sublemma 4.1) is a technical statement
about one or several smooth functions of one real variable and is proved using the
Mathematica implementation of the Tarski algorithm [85, 46, 14]; the proofs of these
sublemmas can be found in [56]. It should be quite clear that all such calculations done
with the aid of a computer are no less reliable or rigorous than similar, or even less
involved, calculations done by hand.
*****
For all i = 1, . . . , n, let
Xi := ai εi .
Next, let X̃1 , . . . , X̃n be any r.v.’s such that
E g(X̃1 , . . . , X̃n ) = E[ e^{xSn} g(X1 , . . . , Xn ) ] / E e^{xSn}    (2.4)

for all Borel-measurable functions g : Rn → R. Equivalently, one may require condition
(2.4) only for Borel-measurable indicator functions g ; clearly, such r.v.’s X̃i do exist. It
is also clear that the r.v.’s X̃i are independent. Moreover, for each i the distribution of
X̃i is (e^{ui} δai + e^{−ui} δ−ai )/(e^{ui} + e^{−ui} ).
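The tilt identity (2.4) and the displayed two-point law of X̃i can be checked by exact enumeration for a small n; the weights, the tilting parameter, and the particular indicator g below are illustrative.

```python
import math
from itertools import product

# Exact check of the tilt identity (2.4) for n = 2, with g an indicator:
# E g(X~) = E[e^{x S_n} g(X)] / E e^{x S_n}.
a = [0.6, 0.8]          # a_1^2 + a_2^2 = 1
x = 1.3                 # tilting parameter
u = [x * t for t in a]

def tilted_prob(i, s):  # s in {+1, -1}
    # X~_i = s*a_i with probability e^{s u_i}/(e^{u_i} + e^{-u_i})
    return math.exp(s * u[i]) / (2 * math.cosh(u[i]))

g = lambda x1, x2: 1.0 if x1 + x2 > 0 else 0.0      # an indicator function

lhs = sum(g(s1 * a[0], s2 * a[1]) * tilted_prob(0, s1) * tilted_prob(1, s2)
          for s1, s2 in product((1, -1), repeat=2))

num = sum(math.exp(x * (s1 * a[0] + s2 * a[1])) * g(s1 * a[0], s2 * a[1]) * 0.25
          for s1, s2 in product((1, -1), repeat=2))
den = sum(math.exp(x * (s1 * a[0] + s2 * a[1])) * 0.25
          for s1, s2 in product((1, -1), repeat=2))

assert abs(lhs - num / den) < 1e-12
```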
Formula (2.4) presents the mentioned Esscher exponential tilt transform, with the
tilting parameter (TP) the same as the x in (1.7); that is, we choose the TP to be the
minimizer of e^{−tx} E e^{tZ} = e^{−tx + t²/2} in t > 0 — rather than the minimizer of e^{−tx} E e^{tSn} ,
which latter is usually taken as the TP in limit theorems for large deviations and can
thus be expressed only via an implicit function. Our choice of the TP appears to simplify
the proof greatly.
In terms of the tilted r.v.’s X̃1 , . . . , X̃n , introduce now

mx := Σi E X̃i = (1/x) Σi ui th ui ,    sx := √( Σi Var X̃i ) = (1/x) √( Σi ui²/ch² ui ),    (2.5)

Lx := (1/sx³) Σi E |X̃i − E X̃i |³,    (2.6)
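The quantities mx, sx, and Lx of (2.5)–(2.6) are straightforward to compute once the two-point law of X̃i is known; the closed form E|X̃i − E X̃i|³ = ai³(1 + th² ui)/ch² ui used below follows by direct calculation for the tilted two-point law, and the weights are an illustrative assumption.

```python
import math

# Direct computation of m_x, s_x and the Lyapunov ratio L_x from (2.5)-(2.6).
a = [0.6, 0.8]
x = 1.3
u = [x * t for t in a]

m_x = sum(ui * math.tanh(ui) for ui in u) / x
s_x = math.sqrt(sum(ui ** 2 / math.cosh(ui) ** 2 for ui in u)) / x

def abs_central_third(i):
    """E |X~_i - E X~_i|^3 for the tilted two-point law at +-a_i."""
    ai, ui = a[i], u[i]
    p = math.exp(ui) / (2 * math.cosh(ui))       # P(X~_i = +a_i)
    mean = ai * math.tanh(ui)
    return p * abs(ai - mean) ** 3 + (1 - p) * abs(-ai - mean) ** 3

L_x = sum(abs_central_third(i) for i in range(2)) / s_x ** 3

# closed form: E|X~_i - E X~_i|^3 = a_i^3 (1 + th^2 u_i)/ch^2 u_i
closed = sum((t ** 3) * (1 + math.tanh(ui) ** 2) / math.cosh(ui) ** 2
             for t, ui in zip(a, u)) / s_x ** 3
assert abs(L_x - closed) < 1e-12
```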
where ch := cosh, sh := sinh, th := tanh, and arcch := arccosh assuming that arcch z > 0
for all z ∈ [1, ∞); thus, for each z ∈ [1, ∞), arcch z is the unique solution y > 0 to the
equation ch y = z . Let F̄n and Φ denote, respectively, the tail function of X̃1 + · · · + X̃n
and the standard normal tail function, so that

F̄n (z) = P(X̃1 + · · · + X̃n > z) and Φ(z) = P(Z > z)
for all real z . Also, let cBE denote the least possible constant in the Berry-Esseen inequality
sup_{z∈R} | F̄n (z) − Φ( (z − mx)/sx ) | ≤ cBE Lx ;    (2.7)

by Shevtsova [83], cBE ≤ 56/100; a slightly worse bound, cBE ≤ 0.5606, is due to Tyurin [87].
Lemma 2.1. For all x > 0

P(Sn > x) ≤ N (x) + 2cBE B(x),    (2.8)

where

N (x) := exp{ Σi ln ch ui + x²sx²/2 − x mx + ln Φ( (x − mx)/sx + x sx ) },    (2.9)

B(x) := Lx exp{ −x² + Σi ln ch ui }.    (2.10)
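A numerical illustration of Lemma 2.1, with cBE replaced by the admissible value 0.56 from (2.7) and equal, purely illustrative weights; the exact tail is computed by enumeration of all sign patterns.

```python
import math
from itertools import product
from statistics import NormalDist

# Illustration of Lemma 2.1: the exact tail P(S_n > x) is dominated by
# N(x) + 2*c_BE*B(x), using c_BE = 0.56.
Z = NormalDist()
a = [0.5, 0.5, 0.5, 0.5]
x = 1.5
u = [x * t for t in a]
c_be = 0.56

m_x = sum(ui * math.tanh(ui) for ui in u) / x
s_x = math.sqrt(sum(ui ** 2 / math.cosh(ui) ** 2 for ui in u)) / x
L_x = sum(t ** 3 * (1 + math.tanh(ui) ** 2) / math.cosh(ui) ** 2
          for t, ui in zip(a, u)) / s_x ** 3
log_E = sum(math.log(math.cosh(ui)) for ui in u)     # ln E e^{x S_n}

N = math.exp(log_E + x ** 2 * s_x ** 2 / 2 - x * m_x) \
    * (1 - Z.cdf((x - m_x) / s_x + x * s_x))          # N(x) of (2.9)
B = L_x * math.exp(-x ** 2 + log_E)                   # B(x) of (2.10)

tail = sum(1 for s in product((1, -1), repeat=4)
           if sum(si * ai for si, ai in zip(s, a)) > x) / 16
assert tail <= N + 2 * c_be * B
```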
Lemma 2.1 carries out much of the first step in the proof of Theorem 1.1, as mentioned before: using exponential tilting to reduce the original problem, on the vector
(a1 , . . . , an ) of an unbounded dimension n, to one involving sums of the form Σi g(ui ) —
recall here the expressions of mx and sx in (2.5) in terms of such sums. At this point,
only the factor Lx in (2.10) remains to be bounded in terms of a sum of the form Σi g(ui ),
which will be done later, in Sublemma 4.1.
Next, introduce the ratio

r(x) := ϕ(x)/(x Φ(x)),    (2.11)

which is the inverse Mills ratio at x divided by x. By [71, Proposition 1.2], r is strictly
and continuously decreasing from ∞ to 1 on the interval (0, ∞), so that there is a unique
root x3/2 ∈ (0, ∞) of the equation

r(x3/2 ) = 3/2;

at that,

x3/2 = 1.03 . . . and 1 < r(x) ≤ 3/2 for x ≥ x3/2 .    (2.12)

Introduce also

u∗ := 51/125 = 0.408    (2.13)

and

h(x) := Cϕ(x)/(9 + x²)    (2.14)

(cf. (1.7)).
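The root x3/2 of r(x) = 3/2 can be located by bisection, using the strict decrease of r stated above; the bracketing interval is an illustrative choice.

```python
import math
from statistics import NormalDist

# r(x) = phi(x)/(x * Phi_tail(x)) from (2.11); bisection for r(x) = 3/2.
Z = NormalDist()

def r(x):
    return Z.pdf(x) / (x * (1 - Z.cdf(x)))

lo, hi = 0.5, 2.0                  # r(0.5) > 3/2 > r(2.0)
for _ in range(60):                # r is strictly decreasing on (0, inf)
    mid = (lo + hi) / 2
    if r(mid) > 1.5:
        lo = mid
    else:
        hi = mid

assert abs(lo - 1.03) < 0.01       # x_{3/2} = 1.03... as in (2.12)
```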
The next two lemmas provide upper bounds on the terms N (x) and 2cBE B(x) in (2.8)
— for x large enough; also, in Lemma 2.3, un = maxi xai is assumed to be small enough.
Lemma 2.2. If x ≥ x3/2 then N (x) ≤ Φ(x).

Lemma 2.3. If x ≥ 13/10 and un ≤ u∗ , then 2cBE B(x) ≤ h(x).
The proofs of the above two lemmas are comparatively difficult, especially the latter
one, of Lemma 2.3, which will take entire Section 4. It is in these two proofs that
we use methods of extremal problems (including special tools for Tchebycheff–Markov
systems) for measures in a given moment set — to carry out the mentioned reduction
from an infinite-dimensional problem to finite, in fact low, dimensions.
In contrast with Lemmas 2.2 and 2.3, the following lemma is easy and to be used
just as a quick reference concerning the second inequality in (1.7).
Lemma 2.4. If x > 0 then h(x) < CΦ(x)/x.
Next is another easy lemma, which serves as the induction basis (with n = 1) in the
proof of Theorem 1.1 below; it is also used in the proof of Lemma 2.8.
Lemma 2.5. If x > 0 then P(ε1 > x) 6 Φ(x) + h(x).
Now we shall address a case not covered by Lemma 2.3: when un is not small enough
(and x is still large enough). For this case we adopt an approach, which is based on the
Chapman–Kolmogorov identity (2.16) and similar to methods used e.g. in [68, Proof of
Proposition 2], [11, Proof of Theorem 4.2], and [78, Proof of Theorem 2]. Clearly, this
method is quite different from the combination of the methods of exponential tilting and
solving extremal problems for moment sets used — for the small enough values of un —
in the proofs of Lemmas 2.1–2.3. Consider
U := Ux,a := (x − a)/√(1 − a²)    and    V := Vx,a := (x + a)/√(1 − a²),

with a as in (2.1). The following two lemmas provide information about behavior of the
two respective terms, Φ(x) and h(x) = Cϕ(x)/(9 + x²), in the bound Q(x) on P(Sn > x) in (1.7).
This information will be used to carry out the induction step in the proof of Theorem 1.1.
Lemma 2.6. If x ≥ √3 then (1/2) Φ(U ) + (1/2) Φ(V ) ≤ Φ(x).

Lemma 2.6 was proved in [11]; cf. also [78, Lemma 5].

Lemma 2.7. If x ≥ 15/10 and un ≥ u∗ , then (1/2) h(U ) + (1/2) h(V ) ≤ h(x); recall here that, by
(2.2), a = un /x.
Thus, Lemmas 2.6 and 2.7 taken together close the gap that was left open in Lemma 2.3
because of the restriction un ≤ u∗ there. Still, in Lemmas 2.2, 2.3, 2.6, and 2.7 the value
of x was assumed to be large enough. This remaining gap is closed by

Lemma 2.8. For all x ∈ (0, √3 ]

P(Sn > x) ≤ Φ(x) + h(x).    (2.15)
A key point in the proof of Lemma 2.8 is using inequality (1.3) with c = 3.22, as
provided by [78]; we could have used instead the main result of [8], with c = c∗ =
3.17 . . ., but c = 3.22 is enough for our purposes here.
Based on the above lemmas, we can now present
Proof of Theorem 1.1. By definition (2.14) and Lemma 2.4, it is enough to prove inequality (2.15) for all x > 0. This can be done by induction on n. Indeed, for n = 1 this
is Lemma 2.5. Assume now that n ≥ 2. In view of Lemma 2.8, it is enough to prove inequality (2.15) for all x > √3 . At that, in view of Lemmas 2.1, 2.2, and 2.3, it is enough
to consider the case un > u∗ . To do that, write

P(Sn > x) = (1/2) P(S̃n−1 > U ) + (1/2) P(S̃n−1 > V ),    (2.16)

where S̃n−1 := b1 ε1 + · · · + bn−1 εn−1 , with bi := ai /√(1 − a²). It remains to use the induction
hypothesis together with Lemmas 2.6 and 2.7.
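The Chapman–Kolmogorov identity (2.16), obtained by conditioning on εn, can be checked exactly for small n; the weights below are illustrative.

```python
import math
from itertools import product

# Check of identity (2.16): conditioning on eps_n gives
# P(S_n > x) = 1/2 P(S~_{n-1} > U) + 1/2 P(S~_{n-1} > V).
a = [0.3, 0.4, 0.5, math.sqrt(1 - 0.3**2 - 0.4**2 - 0.5**2)]  # a_n largest
x = 1.2
an = a[-1]
U = (x - an) / math.sqrt(1 - an ** 2)
V = (x + an) / math.sqrt(1 - an ** 2)
b = [t / math.sqrt(1 - an ** 2) for t in a[:-1]]              # b_i of (2.16)

def tail(weights, t):
    """Exact P(sum_i w_i eps_i > t) by enumeration of all sign patterns."""
    n = len(weights)
    return sum(1 for s in product((1, -1), repeat=n)
               if sum(si * wi for si, wi in zip(s, weights)) > t) / 2 ** n

lhs = tail(a, x)
rhs = 0.5 * tail(b, U) + 0.5 * tail(b, V)
assert abs(lhs - rhs) < 1e-12
```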
3
Proofs of Lemmas 2.1, 2.2, 2.4, 2.5, and 2.8
Proof of Lemma 2.1. Reading equation (2.4) with g(X1 , . . . , Xn ) = e^{−xSn} I{Sn > x}
right-to-left, recalling (2.7), and observing that E e^{xSn} = Πi ch ui , one has

P(Sn > x)/E e^{xSn} = −∫_{[x,∞)} e^{−xy} dF̄n (y) = ∫_x^∞ x e^{−xy} [ F̄n (x) − F̄n (y) ] dy ≤ N1 (x) + B1 (x),

where

N1 (x) := ∫_x^∞ x e^{−xy} [ Φ( (x − mx)/sx ) − Φ( (y − mx)/sx ) ] dy
       = ∫_x^∞ e^{−xy} ϕ( (y − mx)/sx ) dy/sx = N (x)/E e^{xSn}

and

B1 (x) := 2cBE Lx ∫_x^∞ x e^{−xy} dy = 2cBE Lx e^{−x²} = 2cBE B(x)/E e^{xSn} .

Thus, (2.8) follows.
Now and later in the paper, we need the following special l’Hospital-type rule for
monotonicity.
Proposition 3.1. ([75, Propositions 4.1 and 4.3]) Let −∞ ≤ a < b ≤ ∞. Let f and g
be differentiable functions defined on the interval (a, b). It is assumed that g and g′ do
not take on the zero value and do not change their respective signs on (a, b).
(I) If f (a+) = g(a+) = 0 or f (b−) = g(b−) = 0, and if the ratio f ′/g′ is strictly increasing/decreasing on (a, b), then (respectively) (f /g)′ is strictly positive/negative and
hence the ratio f /g is strictly increasing/decreasing on (a, b).
(II) If f (b−) = g(b−) = 0 and if the ratio f ′/g′ switches its monotonicity pattern at
most once on (a, b) — only from increase to decrease, then the ratio f /g does so.
Proof of Lemma 2.2. Let us begin this proof by using the well-known fact that the tail
function Φ is log-concave. This fact is contained e.g. in [25, 67]. Alternatively, it can be
easily obtained using part (I) of Proposition 3.1, since (ln Φ)′ = −ϕ/Φ. So, one can write

ln Φ(y) ≤ ln Φ(x) + (ln Φ)′ (x)(y − x) = ln Φ(x) − xr(x)(y − x),

with y = (x − mx)/sx + xsx (cf. (2.9)) and r(x) defined by (2.11). Therefore and in view of (2.5),
(1/x²) ln( N (x)/Φ(x) ) ≤ Ẽ(r, ν) := ∫_0^{xa} [ e(u) + r · (1 − f (u)/sx ) ] ν(du)

(recall (2.1)), where

e(u) := ln ch u/u² − th u/u + 1/(2 ch² u)    and    f (u) := 1 − th u/u + 1/ch² u

for u ≠ 0, e(0) := 0 and f (0) := 1, and r := r(x). Note that the probability measure ν on
the interval [0, xa] defined by (2.3) satisfies the restriction

∫_0^{xa} b dν = sx² ,    where    b(u) := 1/ch² u.    (3.1)

Recalling now (2.12), we see that to prove Lemma 2.2 we only need to show that
Ẽ(r, ν) ≤ 0 for all such probability measures ν and all r ∈ [1, 3/2]; in fact, since Ẽ(r, ν)
is affine in r , it suffices to consider only r ∈ {1, 3/2}.
Using Proposition A.3 in Appendix A and the Mathematica command Reduce, one
can verify that each of the two systems (1, −b, f − e) and (1, −b, f ) is an M+ -system on
any interval [c, d] ⊂ [0, ∞); as mentioned earlier, this verification reduces to checking the
positivity of several (Wronskian) functions of only one variable; for the system (1, −b, f −
e), this takes about 20 sec on a standard laptop, and about 1 sec for the system (1, −b, f ).
Since sx ∈ (0, 1] and r ≥ 1, the integrand in the integral expression of Ẽ(r, ν) can be
rewritten as g := r − (1/θ)(f − θe) with θ := sx /r ∈ (0, 1], and so, (1, −b, −g) is an M+ -system
on [0, xa], for any r ≥ 1 and any value of sx . Hence, by Proposition A.4 in Appendix A,
the minimum of ∫_0^{xa} (−g) dν , and thus the maximum of Ẽ(r, ν), over all the probability
measures ν on [0, xa] satisfying the restriction ∫_0^{xa} b dν = sx² is attained when the support
of ν is a singleton subset (say {u}) of [0, xa]. For this u, one has sx = 1/ch u, and it now
suffices to show that g(u) = e(u) + r · (1 − f (u) ch u) ≤ 0 for r ∈ {1, 3/2} and u ∈ [0, ∞);
using again the Mathematica command Reduce, it takes about 2 sec to check this in
each of the two cases, r = 1 and r = 3/2.
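The final one-variable inequality g(u) = e(u) + r·(1 − f(u) ch u) ≤ 0, verified in the paper by the Tarski algorithm, can at least be spot-checked on a grid; the grid and tolerance below are illustrative choices.

```python
import math

# Grid spot check of g(u) = e(u) + r*(1 - f(u)*ch u) <= 0 for r in {1, 3/2}.
def e(u):
    if u == 0:
        return 0.0
    return math.log(math.cosh(u)) / u ** 2 - math.tanh(u) / u \
        + 1 / (2 * math.cosh(u) ** 2)

def f(u):
    if u == 0:
        return 1.0
    return 1 - math.tanh(u) / u + 1 / math.cosh(u) ** 2

for r in (1.0, 1.5):
    for k in range(0, 2001):               # u from 0 to 20 in steps of 0.01
        u = k / 100
        assert e(u) + r * (1 - f(u) * math.cosh(u)) <= 1e-12
```

Note that at r = 3/2 the inequality is tight near u = 0 (g(u) behaves like a negative multiple of u⁴ there), which is why a small positive tolerance is used.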
Proof of Lemma 2.4. Using part (I) of Proposition 3.1, one can see that the ratio xh(x)/Φ(x)
is increasing in x > 0, from 0 to C . Now the result follows.
Proof of Lemma 2.5. Observe that the definition (1.8) of C is equivalent to the condition
Φ(1) + h(1) = 1/2 (cf. Remark 1.2). Hence and because Φ + h is decreasing on (0, ∞), one
has P(ε1 > x) = 1/2 = Φ(1) + h(1) ≤ Φ(x) + h(x) for all x ∈ (0, 1]. For x > 1, one obviously
has P(ε1 > x) = 0 < Φ(x) + h(x).
Proof of Lemma 2.8. By the symmetry, Chebyshev’s inequality, and the main result of
[78],
√
13
P(Sn > x) 6 12 I{0 < x 6 1} + 2x1 2 I{1 < x 6 13
}
+
3.22Φ(x)
I{
<
x
6
3}
10
10
√
for all x ∈ (0, 3 ]. In particular, for all x ∈ (0, 1] one has P(Sn > x) 6 12 = P(ε1 > x) 6
Φ(x) + h(x), by Lemma 2.5.
Next, let us prove (2.15) for x ∈ (1, 13/10]. Write x²Φ̄(x) = Φ̄(x)/p(x), where p(x) := 1/x². Note that Φ̄(∞−) = p(∞−) = 0 and Φ̄′(x)/p′(x) = x³ϕ(x)/2, so that Φ̄′/p′ switches its monotonicity pattern exactly once on (0, ∞), from increase to decrease. Hence, by part (II) of Proposition 3.1, x²Φ̄(x) = Φ̄(x)/p(x) switches its monotonicity pattern at most once, and at that necessarily from increase to decrease, as x increases from 1 to 13/10. So, the minimum of x²Φ̄(x) over x ∈ [1, 13/10] is attained at one of the end points of the interval [1, 13/10]; in fact, the minimum is at x = 1. It is also easy to see that the minimum of x²h(x) over x ∈ [1, 13/10] is attained at x = 1 as well. Thus,

P(Sn > x) ≤ 1/(2x²) = (Φ̄(x) + h(x)) / (2x²(Φ̄(x) + h(x))) ≤ (Φ̄(x) + h(x)) / (2(Φ̄(1) + h(1))) = Φ̄(x) + h(x)

for x ∈ (1, 13/10].
The case x ∈ (13/10, √3] is similar to the just considered case x ∈ (1, 13/10]. Here, using part (II) of Proposition 3.1 again, one can see that h/Φ̄ switches, just once, from increase to decrease on (0, ∞); in particular, h/Φ̄ increases on (13/10, √3], because (h/Φ̄)′(√3) = 0.29… > 0. So, to complete the proof of Lemma 2.8, it is enough to check that 3.22 Φ̄(13/10) ≤ Φ̄(13/10) + h(13/10), which is true.
EJP 17 (2012), paper 35.
ejp.ejpecp.org
Page 11/22
Gaussian-Rademacher bound
4 Proof of Lemma 2.3
As was stated earlier, proofs of all sublemmas in this paper (except for Sublemma 4.1
below) can be found in [56].
We shall need the following tight upper bound on the Lyapunov ratio Lx, defined by (2.6):

Sublemma 4.1. One has

Lx ≤ (1/x³) Σ_i u_i³ (1 + th² u_i) ch u_i.   (4.1)
The proof of Sublemma 4.1 will be given at the end of this section.
By Sublemma 4.1 and the definition (2.10) of B(x),

B(x) ≤ (1/x) e^{−x² + J̃},   (4.2)

where

J̃ := J̃(x, ν) := x² ∫ ℓ dν + ln ∫ k dν,   k(u) := u(1 + th² u) ch u,   ℓ(u) := (ln ch u)/u² for u ≠ 0,   ℓ(0) := 1/2,

and ν is the probability measure on the interval [0, u∗] defined by (2.3), so that ν satisfies the restriction (3.1). To obtain the upper bound h(x) on 2cBE B(x) as stated in Lemma 2.3, we shall maximize J̃(x, ν) over all such probability measures ν.
To do so, let us first maximize ∫ k dν given values of the integrals ∫ 1 dν (= 1), ∫ b dν (= s_x², as in (3.1)), and ∫ ℓ dν.
Noting that (ln ch)″ = th′ = sech² and applying (twice) the special l'Hospital-type rule for monotonicity given by part (I) of Proposition 3.1, one sees that

ℓ′ < 0 on (0, ∞).   (4.3)
Sublemma 4.2. [56] The sequence (g0, g1, g2, g3) := (1, −b, −ℓ, k) is an M+-system on [0, u∗]; here one may want to recall Definition A.2 in Appendix A.
A proof of Sublemma 4.2 can be found in [56], where it is based on the statement in
[60] reproduced as Proposition A.3 in Appendix A here.
So, by Proposition A.4 (with n = 2 and m = 1 there), it suffices to consider measures
ν of the form ν = (1 − t)δ_u + tδ_{u∗} for some t ∈ [0, 1] and u ∈ [0, u∗]. For such ν,

J̃(x, ν) = J(t, u) := Jx(t, u) := x² ((1 − t)ℓ(u) + tℓ(u∗)) + ln((1 − t)k(u) + tk(u∗)).
Thus, we need to maximize J(t, u) over all (t, u) ∈ [0, 1] × [0, u∗ ]; clearly, this maximum
is attained. For all (t, u) ∈ (0, 1) × [0, u∗ ),
((1 − t)k(u) + tk(u∗))/(u∗ − u) · ∂J(t, u)/∂t = (k(u∗) − k(u) + τ(ℓ(u∗) − ℓ(u)))/(u∗ − u) = k′(w) + τℓ′(w),   (4.4)

((1 − t)k(u) + tk(u∗))/(1 − t) · ∂J(t, u)/∂u = k′(u) + τℓ′(u),   (4.5)

where τ := x² ((1 − t)k(u) + tk(u∗)) and w is some number such that u < w < u∗ (whose existence follows by the mean-value theorem). So, if the maximum of J over the set [0, 1] × [0, u∗] is attained at some point (t, u) ∈ (0, 1) × (0, u∗), then at this point one has ∂J/∂t = 0 = ∂J/∂u, whence, by (4.4), (4.5), and (4.3), k′(w)/ℓ′(w) = −τ = k′(u)/ℓ′(u) while u∗ > w > u > 0, which contradicts
Sublemma 4.3. [56] The function ρ := k′/ℓ′ is strictly increasing on the interval [0, u∗] (by continuity, we let ρ(0) := ρ(0+) = −∞).
Also, no maximum of J is attained at any point (t, u) ∈ (0, 1) × {0}, because at any such point the right-hand side of (4.5) is k′(0) + τℓ′(0) = 1 + τ · 0 > 0, whereas the left-hand side of (4.5) must be ≤ 0. Thus, the maximum can be attained at some point (t, u) ∈ [0, 1] × [0, u∗] only if either t ∈ {0, 1} or u = u∗. Therefore the maximizing measure ν must be concentrated at one point, say u, of the interval [0, u∗]. Together with (4.2), this shows that

B(x) ≤ sup_{u ∈ [0, u∗]} (1/x) e^{−x² + J0(x, u)},

where

J0(x, u) := Jx(0, u) = x² ℓ(u) + ln k(u).
So, Lemma 2.3 reduces now to the following statement:
Λ(x, u) := J0(x, u) − x²/2 − ln x + ln(9 + x²) − K ≤ 0   (4.6)

for all (x, u) ∈ [13/10, ∞) × [0, u∗], where

K := ln (C/(2√(2π) cBE)).
Thus, one may want to maximize Λ in u ∈ [0, u∗]. Towards that end, observe that for all u > 0

(1/(−ℓ′(u))) ∂Λ/∂u = γ(u) − x²,

where

γ := −k′/(kℓ′) = −ρ · (1/k);

so, the partial derivative of Λ in u > 0 equals γ(u) − x² in sign. On the other hand, the function 1/k is positive and strictly decreasing and, in view of Sublemma 4.3, the function (−ρ) is so as well (on the interval [0, u∗]). It follows that the function γ too is positive and strictly decreasing on (0, u∗]; at that, γ(0+) = ∞. Introduce now

x∗ := √(γ(u∗)) = 7.39… .   (4.7)
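The properties of γ used here, positivity, strict decrease, and (in Sublemma 4.4 below) the lower bound 6/u², can be probed numerically from the explicit formulas for k and ℓ. The following sketch is a sanity check only, not part of the proof; the grid endpoint 0.4 is an assumed stand-in for u∗, whose exact value is fixed earlier in the paper.

```python
import math

# ell(u) = (ln ch u)/u^2 and k(u) = u (1 + th^2 u) ch u, as in Section 4.
# gamma = -k'/(k ell') is computed from the closed-form derivatives
#   k'(u)   = (1 + th^2 u) ch u + 2 u th u / ch u + u (1 + th^2 u) sh u,
#   ell'(u) = (u th u - 2 ln ch u) / u^3.
def gamma(u):
    th, ch, sh = math.tanh(u), math.cosh(u), math.sinh(u)
    k = u * (1 + th * th) * ch
    kp = (1 + th * th) * ch + 2 * u * th / ch + u * (1 + th * th) * sh
    ellp = (u * th - 2 * math.log(ch)) / u**3
    return -kp / (k * ellp)

us = [0.01 * j for j in range(1, 41)]                # grid in (0, 0.4]
vals = [gamma(u) for u in us]
assert all(v > 0 for v in vals)                      # gamma is positive
assert all(a > b for a, b in zip(vals, vals[1:]))    # strictly decreasing
assert all(gamma(u) > 6 / u**2 for u in us)          # bound of Sublemma 4.4
```

At u = 0.4 this gives gamma(0.4) ≈ 56, consistent with γ(u∗) = x∗² ≈ 54.6 and with u∗ being slightly above 4/10.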
By the mentioned properties of the function γ, for each x ∈ (0, x∗] one has γ(u) > x² for all u ∈ [0, u∗] and hence Λ(x, u) increases in u ∈ [0, u∗], so that Λ(x, u) ≤ Λ(x, u∗) for all u ∈ [0, u∗]. Since the derivative of Λ(x, u∗) in x is a rather simple rational function, it is easy to see that Λ(x, u∗) ≤ 0 for all x ≥ 13/10. So, inequality (4.6) holds for all (x, u) ∈ [13/10, x∗] × [0, u∗].
It remains to prove (4.6) for each x ∈ [x∗, ∞) (and all u ∈ [0, u∗]). For each such x, there is a unique u_x ∈ [0, u∗] such that γ(u) − x² and hence ∂Λ/∂u are opposite to u − u_x in sign, and so, Λ(x, u) ≤ Λ(x, u_x) for all u ∈ [0, u∗].
Since, by (4.3), the function ℓ is strictly and continuously decreasing on [0, ∞), there is a unique inverse function ℓ⁻¹ : (0, 1/2] → [0, ∞). Now introduce

J̃0(x, λ) := J0(x, ℓ⁻¹(λ)) = x²λ + ln k̃(λ),   where   k̃ := k ∘ ℓ⁻¹

and λ ∈ [ℓ(u∗), ℓ(0)] = [ℓ(u∗), 1/2]. Next, observe that (ln k̃)′ = −γ ∘ ℓ⁻¹, which is decreasing on [ℓ(u∗), 1/2], because γ and ℓ (and hence ℓ⁻¹) are decreasing. It follows that the function ln k̃ is concave on [ℓ(u∗), 1/2], and so, J̃0(x, λ) is concave in λ ∈ [ℓ(u∗), 1/2], for each real x.
At this point, we need
Sublemma 4.4. [56] If u ∈ (0, u∗] then γ(u) > 6/u².
By (4.7) and Sublemma 4.4, if u = √6/x and x ≥ x∗, then u ∈ (0, u∗] and γ(√6/x) > x² = γ(u_x), which in turn implies that √6/x < u_x, ℓ(√6/x) > ℓ(u_x), and (ln k̃)′(ℓ(√6/x)) < (ln k̃)′(ℓ(u_x)) = −γ(u_x) = −x² (since γ, ℓ, and (ln k̃)′ are decreasing); so, (∂J̃0/∂λ)(x, ℓ(√6/x)) < (∂J̃0/∂λ)(x, ℓ(u_x)) = 0; therefore and by the concavity of J̃0(x, λ) in λ,

J̃0(x, λ) ≤ J̃0(x, ℓ(√6/x)) + (∂J̃0/∂λ)(x, ℓ(√6/x)) (λ − ℓ(√6/x)) ≤ Ĵ0(x, √6/x)

for all λ ∈ [ℓ(u∗), 1/2], where

Ĵ0(x, u) := J0(x, u) + (x² − γ(u))(ℓ(u∗) − ℓ(u)).

Thus, in view of (4.6), Lemma 2.3 reduces to the inequality Ĵ0(x, √6/x) − x²/2 − ln x + ln(9 + x²) − K ≤ 0 for all x ≥ x∗, where we change the variable once again, from x to u, by the formula x = √6/u.
So, Lemma 2.3 reduces to

Sublemma 4.5. [56] For all u ∈ (0, u∗],

Λ̃(u) := Ĵ0(√6/u, u) − 3/u² − ln(√6/u) + ln(9 + 6/u²) ≤ K.
It remains, in this section, to present

Proof of Sublemma 4.1. Observe that Lx = (x s_x)⁻³ Σ_i u_i³(1 − th⁴ u_i). So, inequality (4.1) means exactly that

Σ_i u_i³(1 − th⁴ u_i) − s_x³ Σ_i u_i³(1 + th² u_i) ch u_i = Σ_i u_i² g(u_i) ≤ 0   (4.8)

for all u_i's in the interval [0, u∗] such that Σ_i u_i² = x² and Σ_i u_i²/ch² u_i = x² s_x², where

g(u) := u(1 − th⁴ u) − s_x³ u(1 + th² u) ch u = u (2 − 1/ch² u)(1/ch² u − s_x³ ch u).

Next, the object Σ_i u_i² g(u_i) in (4.8) with the restrictions Σ_i u_i² = x² and Σ_i u_i²/ch² u_i = x² s_x² can be rewritten as x² E h(Y) given E Y = s_x², where h(·) := h_a(·) as in (4.9) below with a = s_x³ and Y is a r.v. with the distribution ν := (1/x²) Σ_i u_i² δ_{v_i}, with v_i := 1/ch² u_i; note that one always has s_x ∈ (0, 1] and ν is indeed a probability measure due to the restriction Σ_i u_i² = x². So, by Subsublemma 4.6 below and Jensen's inequality, x⁻² Σ_i u_i² g(u_i) = E h(Y) ≤ h(E Y) = h(s_x²) = 0, which proves the inequality in (4.8) and hence that in (4.1).
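Inequality (4.8), equivalently (4.1) after multiplying through by x³ s_x³, can be sanity-checked on random configurations of the u_i; this is a numerical probe only, the actual proof being the Jensen argument above. The cap 0.4 on the u_i is an assumed stand-in for u∗.

```python
import math
import random

def sides_of_4_8(us):
    # x^2 = sum u_i^2 and s_x^2 = (sum u_i^2 / ch^2 u_i) / x^2, as in (3.1)
    x2 = sum(u * u for u in us)
    sx = math.sqrt(sum(u * u / math.cosh(u) ** 2 for u in us) / x2)
    lhs = sum(u**3 * (1 - math.tanh(u) ** 4) for u in us)
    rhs = sx**3 * sum(u**3 * (1 + math.tanh(u) ** 2) * math.cosh(u) for u in us)
    return lhs, rhs

random.seed(0)
for _ in range(1000):
    us = [random.uniform(1e-3, 0.4) for _ in range(random.randint(1, 10))]
    lhs, rhs = sides_of_4_8(us)
    assert lhs <= rhs + 1e-12    # (4.8); equality when all u_i coincide
```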
Subsublemma 4.6. [56] For each a ∈ [0, 1], the function

(0, 1] ∋ v ↦ h_a(v) := arcch(1/√v) (2 − v)(v − a/√v)   (4.9)

is concave.
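The concavity claimed in Subsublemma 4.6 (proved in [56]) can be spot-checked with second differences; again a numerical probe, not a replacement for the proof.

```python
import math

def h(a, v):
    # h_a(v) = arcch(1/sqrt(v)) (2 - v)(v - a/sqrt(v)), 0 < v <= 1, as in (4.9)
    return math.acosh(1 / math.sqrt(v)) * (2 - v) * (v - a / math.sqrt(v))

step = 0.01
for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    for j in range(2, 99):
        v = j * step
        # central second difference; <= 0 (up to rounding) for a concave function
        d2 = h(a, v - step) - 2 * h(a, v) + h(a, v + step)
        assert d2 <= 1e-9
```

Note also that h_a(a^{2/3}) = 0, which is the equality case h(s_x²) = 0 used in the proof above (with a = s_x³).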
5 Proof of Lemma 2.7
This proof could be somewhat simplified using the mentioned result (1.5); however, let us present an independent proof here, which is not much more complicated. Let

∆ := ∆(x, u) := (√(2π)/C) ( (1/2) h(U_{x,u/x}) + (1/2) h(V_{x,u/x}) − h(x) ).
We have to show that ∆(x, u) ≤ 0 for all pairs (x, u) in the set

P := {(x, u) ∈ [15/10, ∞) × [u∗, ∞) : u < x};

the condition u < x here corresponds to the condition a = a_n < 1.
The idea of the proof of Lemma 2.7 is, essentially, to fix a value of x and then differentiate ∆(x, u) twice with respect to certain functions of u (which may be different
for different fixed values of x) so that the sign of the resulting generalized second partial derivative of ∆(x, u) in u be comparatively easy to determine. In other words, we
establish a generalized convexity pattern for ∆(x, u) in u.
Toward this end, introduce first the set

P̃ := {(x, u) ∈ [15/10, ∞) × [4/10, ∞) : u < x},

which is slightly larger than P; recall here (2.13). Then we shall consider the mentioned generalized first and second partial derivatives of ∆(x, u) in u:

∆1 := ∆1(x, u) := F1(x, u) ∂∆/∂u   (5.1)

and

∆2 := ∆2(x, u) := F2(x, u) ∂∆1/∂u,   (5.2)

where F1(x, u) and F2(x, u) are certain expressions, to be defined soon, such that

F1(x, u)(u − 1) > 0 and F2(x, u) > 0 for all (x, u) ∈ P̃ with u ≠ 1.   (5.3)

Moreover, we shall show that F1(x, u) and F2(x, u) are such that the ∆1 and ∆2 as in (5.1) and (5.2) possess the following properties:

∆2 > 0 on P̃,   (5.4)

∆1(x, x−) = −1/2 < 0 for x > 0,   (5.5)
and, furthermore, one has the following sublemmas, proved in [56]:

Sublemma 5.1. [56] ∆1(x, 4/10) > 0 for all x ∈ [15/10, ∞)

and

Sublemma 5.2. [56] ∆(x, u∗) < 0 for all x ∈ [15/10, ∞).
4
It will then follow by (5.2), (5.3), and (5.4) that ∆1 (x, u) increases in u ∈ [ 10
, 1) and
15
in u ∈ (1, x) for each x ∈ [ 10
, ∞), whence, by (5.5), ∆1 (x, u) < 0 for all (x, u) ∈ P̃ such
that u > 1 and, by Sublemma 5.1, ∆1 (x, u) > 0 for all (x, u) ∈ P̃ such that u < 1. Thus,
for all points (x, u) ∈ P̃ with u 6= 1 one has ∆1 (x, u)(u − 1) < 0 and hence, by (5.1) and
4
15
(5.3), ∆(x, u) decreases in u ∈ [ 10
, x) for each x ∈ [ 10
, ∞). Using now Sublemma 5.2
4
and recalling that u∗ > 10 , one concludes that ∆ < 0 on P , which yields Lemma 2.7.
It remains to present F1 and F2 such that (5.3), (5.4), and (5.5) hold indeed. Let

F1(x, u) := exp{ (u − x²)²/(2(x² − u²)) } · (x² − u²) p2(x, u)² / ( (u − 1) x² (x² − u) p1(x, u) )   (5.6)

and

F2(x, u) := exp{ 2ux²/(x² − u²) } · (u − 1)² (x − u)² (u + x)² (x² − u²) p1(x, u)² p3(x, u)³ / p2(x, u),   (5.7)

where

p1(x, u) := x²(11 + x²) − (10u² + 2ux²),
p2(x, u) := x²(9 + x²) − (8u² + 2ux²),   (5.8)
p3(x, u) := x²(9 + x²) − (8u² − 2ux²).
Using e.g. the Mathematica command Reduce, one can see that on the set P̃ the polynomials p1 , p2 , and p3 are positive. Note also that u < x < x2 for all (x, u) ∈ P̃ . So, (5.3)
holds.
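For readers without Mathematica at hand, the positivity of p1, p2, p3 on P̃ can also be probed directly; this is a sketch with a grid of our own choosing, and Reduce's quantifier elimination remains the rigorous check.

```python
# p1, p2, p3 as in (5.8); a grid probe of their positivity on
# {x >= 1.5, 0.4 <= u < x}. In fact p1 and p2 are decreasing in u
# (their u-derivatives are -20u - 2x^2 and -16u - 2x^2), and letting
# u -> x gives the worst case x^2 (x - 1)^2 > 0; also p3 - p2 = 4 u x^2 > 0.
def p1(x, u): return x**2 * (11 + x**2) - (10 * u**2 + 2 * u * x**2)
def p2(x, u): return x**2 * (9 + x**2) - (8 * u**2 + 2 * u * x**2)
def p3(x, u): return x**2 * (9 + x**2) - (8 * u**2 - 2 * u * x**2)

for i in range(150, 1000):          # x from 1.50 to 9.99
    x = i / 100
    u = 0.4
    while u < x:
        assert p1(x, u) > 0 and p2(x, u) > 0 and p3(x, u) > 0
        u += 0.05
```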
Next, with definitions (5.1), (5.2), (5.6), and (5.7) in place, it turns out that ∆2 (x, u)
is a polynomial in (x, u) (of degree 24 in x, and 14 in u). Using Reduce again, one verifies
(5.4).
Finally, it is straightforward (even if somewhat tedious) to check (5.5).
A Tchebycheff–Markov systems
For a nonnegative integer n, let g0, …, gn be (real-valued) continuous functions on an interval [a, b] for some a and b such that −∞ < a < b < ∞. Let M denote the set of all (nonnegative) Borel measures on [a, b]. Take any point c = (c0, …, cn) ∈ R^{n+1} such that

Mc := { µ ∈ M : ∫_a^b g_i dµ = c_i for all i ∈ 0,n } ≠ ∅;   (A.1)

here and in what follows, for any m and n in Z ∪ {∞} we let m,n := {j ∈ Z : m ≤ j ≤ n}.
Definition A.1. The sequence (g0 , . . . , gn ) of functions is a T -system if the restrictions
of these n+1 functions to any subset of [a, b] of cardinality n+1 are linearly independent.
If, for each k ∈ 0, n, the initial subsequence (g0 , . . . , gk ) of the sequence (g0 , . . . , gn ) is a
T -system, then (g0 , . . . , gn ) is said to be an M -system (where M refers to Markov).
Let (g0, …, gn) be a T-system on [a, b]. Let det( g_i(x_j) )_0^n denote the determinant of the matrix ( g_i(x_j) : i ∈ 0,n, j ∈ 0,n ). This determinant is continuous in (x0, …, xn) in the (convex) simplex (say Σ) defined by the inequalities a ≤ x0 < · · · < xn ≤ b and does not vanish anywhere on Σ. So, det( g_i(x_j) )_0^n is constant in sign on Σ.
n
Definition A.2. The sequence (g0 , . . . , gn ) is said to be a T+ -system on [a, b] if det gi (xj ) 0 >
0 for all (x0 , . . . , xn ) ∈ Σ. If (g0 , . . . , gk ) is a T+ -system on [a, b] for each k ∈ 0, n, then the
sequence (g0 , . . . , gn ) is said to be an M+ -system on [a, b].
In the case when the functions g0, …, gn are n times differentiable at a point x ∈ (a, b), consider also the Wronskians

W0k(x) := det( g_i^{(j)}(x) )_0^k,

where k ∈ 0,n and g_i^{(j)} is the jth derivative of g_i, with g_i^{(0)} := g_i; in particular, W00(x) = g0(x).
Proposition A.3. Suppose that the functions g0, …, gn are (still continuous on [a, b] and) n times differentiable on (a, b). Then, for the sequence (g0, …, gn) to be an M+-system on [a, b], it is necessary that W0k > 0 on (a, b) for all k ∈ 0,n, and it is sufficient that g0 > 0 on [a, b] and W0k > 0 on (a, b) for all k ∈ 1,n.
Thus, verifying the M+ -property largely reduces to checking the positivity of several
functions of only one variable.
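As a toy illustration of this reduction (the sequence (1, x, x²) is our example, not one from the paper): its Wronskians are W00 = 1, W01 = 1, W02 = 2, all positive, so by Proposition A.3 it is an M+-system on any [a, b]. A short script can verify the determinant computations; the derivative table is hard-coded.

```python
# g = (1, x, x^2); gs[i][j] is the j-th derivative of g_i
gs = [
    [lambda x: 1.0, lambda x: 0.0, lambda x: 0.0],
    [lambda x: x,   lambda x: 1.0, lambda x: 0.0],
    [lambda x: x*x, lambda x: 2*x, lambda x: 2.0],
]

def det(m):                        # Laplace expansion; fine for tiny matrices
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j+1:] for row in m[1:]])
               for j in range(len(m)))

def W(k, x):                       # W_0k(x) = det( g_i^{(j)}(x) ), i, j in 0..k
    return det([[gs[i][j](x) for j in range(k + 1)] for i in range(k + 1)])

for x in (0.0, 0.5, 1.0, 2.0):
    assert (W(0, x), W(1, x), W(2, x)) == (1.0, 1.0, 2.0)
```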
A special case of Proposition A.3 (with n = 1 and g0 = 1) is the following well-known fact: if a function g1 is continuous on [a, b] and has a positive derivative on (a, b), then g1 is (strictly) increasing on [a, b]; vice versa, if g1 is increasing on [a, b], then the derivative of g1 (if it exists) must be nonnegative on (a, b).
As in this special case, the proof of Proposition A.3 in general can be based on the
mean-value theorem; cf. e.g. [33, Theorem 1.1 of Chapter XI], which states that the
requirement for W0k to be strictly positive on the closed interval [a, b] for all k ∈ 0, n
is equivalent to a condition somewhat stronger than being an M+ -system on [a, b]; in
connection with this, one may also want to look at [38, Theorem IV.5.2]. Note that,
in the applications to the proofs of Lemmas 2.2 and 2.3 of this paper, the relevant
Wronskians vanish at the left endpoint of the interval.
The proof of Proposition A.3 can be obtained by induction on n, using the recursive formulas for the determinants det( g_i(x_j) )_0^n and W0n as displayed right above [33, (5.5) in Chapter VIII] and in [33, (5.6) in Chapter VIII], where we use g_i in place of ψ_i.
Proposition A.4. Suppose that (g0, …, g_{n+1}) is an M+-system on [a, b] or, more generally, each of the sequences (g0, …, gn) and (g0, …, g_{n+1}) is a T+-system on [a, b]. Suppose also that condition (A.1) holds. Let m := ⌊(n + 1)/2⌋. Then one has the following.

(I) The maximum (respectively, the minimum) of ∫_a^b g_{n+1} dµ over all µ ∈ Mc is attained at a unique measure µmax (respectively, µmin) in Mc. Moreover, the measures µmax and µmin do not depend on the choice of g_{n+1}, as long as g_{n+1} is such that (g0, …, g_{n+1}) is a T+-system on [a, b].

(II) There exist subsets Xmax and Xmin of [a, b] such that Xmax ⊇ supp µmax, Xmin ⊇ supp µmin, and

(a) if n = 2m then card Xmax = card Xmin = m + 1, Xmax ∋ b, and Xmin ∋ a;

(b) if n = 2m − 1 then card Xmax = m + 1, card Xmin = m, and Xmax ⊇ {a, b}.
To illustrate Proposition A.4, one may consider the simplest two special cases when the conditions of the proposition hold and its conclusion is obvious:

(i) n = 0, g0(x) ≡ 1, g1 is increasing on [a, b], and c0 ≥ 0; then supp µmax ⊆ {b} and supp µmin ⊆ {a}; in fact, µmax = c0 δ_b and µmin = c0 δ_a; here and in what follows, δ_x denotes the Dirac probability measure at point x.

(ii) n = 1, g0(x) ≡ 1, g1(x) ≡ x, g2 is strictly convex on [a, b], c0 ≥ 0, and c1 ∈ [c0 a, c0 b]; then supp µmax ⊆ {a, b} and card supp µmin ≤ 1; in fact, µmax = ((c0 b − c1)/(b − a)) δ_a + ((c1 − c0 a)/(b − a)) δ_b, and µmin = c0 δ_{c1/c0} if c0 > 0 and µmin = 0 if c0 = 0.
These examples also show that the T -property of systems of functions can be considered as generalized monotonicity/convexity; see e.g. [82] and bibliography there.
Proof of Proposition A.4. Consider two cases, depending on whether c is strictly or singularly positive; in equivalent geometric terms, this means, respectively, that c belongs
to the interior or the boundary of the smallest closed convex cone containing the subset
{(g0 (x), . . . , gn (x)) : x ∈ [a, b]} of Rn+1 [38, Theorem IV.6.1].
In the first case, when c is strictly positive, both statements of Proposition A.4 follow
by [38, Theorem IV.1.1]; at that, one should let Xmax = supp µmax and Xmin = supp µmin .
(The condition that c be strictly positive appears to be missing in the statement of the
latter theorem; cf. [33, Theorem 1.1 of Chapter 1.1].)
In the other case, when c is singularly positive, use [38, Theorem III.4.1], which
states that in this case the set Mc consists of a single measure (say µ∗ ), and its support
set X∗ := supp µ∗ is of an index 6 n; that is, `− + 2` + `+ 6 n, where `− , `, and `+
stand for the cardinalities of the intersections of X∗ with the sets {a}, (a, b), and {b}. It
remains to show that this condition on the index of X∗ implies that there exist subsets
Xmax and Xmin of [a, b] satisfying the conditions (IIa) and (IIb) of Proposition A.4 and
such that Xmax ∩ Xmin ⊇ X∗ .
If n = 2m then card(X∗ ∩ (a, b)) = ℓ ≤ ⌊(2m − ℓ− − ℓ+)/2⌋ ≤ ⌊(2m − ℓ−)/2⌋ = m − ℓ−; so, card(X∗ ∪ {b}) ≤ ℓ− + (m − ℓ−) + 1 = m + 1. Adding now to the set X∗ ∪ {b} any m + 1 − card(X∗ ∪ {b})
points of the complement of X∗ ∪ {b} to [a, b], one obtains a subset Xmax of [a, b] such
that Xmax ⊇ X∗ , Xmax 3 b, and card Xmax = m + 1. Similarly, there exists a subset Xmin
of [a, b] such that Xmin ⊇ X∗ , Xmin 3 a, and card Xmin = m + 1.
If n = 2m − 1 then card(X∗ ∩ (a, b)) = ℓ ≤ ⌊(2m − 1 − ℓ− − ℓ+)/2⌋ ≤ m − 1 and hence card(X∗ ∪ {a, b}) ≤ 1 + (m − 1) + 1 = m + 1. So, there exists a subset Xmax of [a, b] such that Xmax ⊇ X∗, Xmax ⊇ {a, b}, and card Xmax = m + 1. One also has card X∗ = ℓ− + ℓ + ℓ+ ≤ ⌊(2m − 1 + ℓ− + ℓ+)/2⌋ ≤ ⌊(2m + 1)/2⌋ = m. So, there exists a subset Xmin of [a, b] such that Xmin ⊇ X∗ and card Xmin = m.
References
[1] Sergei N. Antonov and Victor M. Kruglov, Sharpened versions of a Kolmogorov’s inequality,
Statist. Probab. Lett. 80 (2010), no. 3-4, 155–160. MR-2575440
[2] A. Ben-Tal, A. Nemirovski, and C. Roos, Robust solutions of uncertain quadratic and conic
quadratic problems, SIAM J. Optim. 13 (2002), no. 2, 535–560 (electronic). MR-1951034
[3] Aharon Ben-Tal and Arkadi Nemirovski, On safe tractable approximations of chance-constrained linear matrix inequalities, Math. Oper. Res. 34 (2009), no. 1, 1–25. MR-2542986
[4] George Bennett, Probability inequalities for the sum of independent random variables, J.
Amer. Statist. Assoc. 57 (1962), no. 297, 33–45.
[5] V. Bentkus, An inequality for large deviation probabilities of sums of bounded i.i.d. random
variables, Liet. Mat. Rink. 41 (2001), no. 2, 144–153. MR-1851123
[6] V. Bentkus, A remark on the inequalities of Bernstein, Prokhorov, Bennett, Hoeffding, and Talagrand, Liet. Mat. Rink. 42 (2002), no. 3, 332–342. MR-1947624
[7] V. Bentkus, An inequality for tail probabilities of martingales with differences bounded from one side, J. Theoret. Probab. 16 (2003), no. 1, 161–173. MR-1956826
[8] V. Bentkus and D. Dzindzalieta, A tight Gaussian bound for weighted sums of Rademacher
random variables (preprint).
[9] Vidmantas Bentkus, On Hoeffding’s inequalities, Ann. Probab. 32 (2004), no. 2, 1650–1673.
MR-2060313
[10] Vidmantas Bentkus, On measure concentration for separately Lipschitz functions in product spaces, Israel J. Math. 158 (2007), 1–17. MR-2342455
[11] Sergey G. Bobkov, Friedrich Götze, and Christian Houdré, On Gaussian and Bernoulli covariance representations, Bernoulli 7 (2001), no. 3, 439–451. MR-1836739
[12] Stéphane Boucheron, Olivier Bousquet, Gábor Lugosi, and Pascal Massart, Moment inequalities for functions of independent random variables, Ann. Probab. 33 (2005), no. 2, 514–560.
MR-2123200
[13] D. L. Burkholder, Independent sequences with the Stein property, Ann. Math. Statist. 39
(1968), 1282–1288. MR-0228045
[14] George E. Collins, Quantifier elimination for real closed fields by cylindrical algebraic decomposition, Quantifier elimination and cylindrical algebraic decomposition (Linz, 1993),
Texts Monogr. Symbol. Comput., Springer, Vienna, 1998, pp. 85–121. MR-1634190
[15] Amir Dembo and Ofer Zeitouni, Large deviations techniques and applications, second ed., Applications of Mathematics (New York), vol. 38, Springer-Verlag, New York, 1998. MR-1619036
[16] Kürşad Derinkuyu and Mustafa Ç. Pınar, On the S-procedure and some variants, Math. Methods Oper. Res. 64 (2006), no. 1, 55–77. MR-2264772
[17] Kürşad Derinkuyu, Mustafa Ç. Pınar, and Ahmet Camcı, An improved probability bound for
the approximate S-Lemma, Oper. Res. Lett. 35 (2007), no. 6, 743–746. MR-2361043
[18] M. L. Eaton and Bradley Efron, Hotelling’s T 2 test under symmetry conditions, J. Amer.
Statist. Assoc. 65 (1970), 702–711. MR-0269021
[19] Morris L. Eaton, A note on symmetric Bernoulli random variables, Ann. Math. Statist. 41
(1970), 1223–1226. MR-0268930
[20] Morris L. Eaton, A probability inequality for linear combinations of bounded random variables, Ann. Statist. 2 (1974), 609–613.
[21] D. Edelman, Private communication, 1994.
[22] Bradley Efron, Student’s t-test under symmetry conditions, J. Amer. Statist. Assoc. 64
(1969), 1278–1302. MR-0251826
[23] Evarist Giné, Friedrich Götze, and David M. Mason, When is the Student t-statistic asymptotically standard normal?, Ann. Probab. 25 (1997), no. 3, 1514–1531. MR-1457629
[24] S. E. Graversen and G. Peškir, Extremal problems in the maximal inequalities of Khintchine,
Math. Proc. Cambridge Philos. Soc. 123 (1998), no. 1, 169–177. MR-1474873
[25] Richard L. Hall, Marek Kanter, and Michael D. Perlman, Inequalities for the probability
content of a rotated square and related convolutions, Ann. Probab. 8 (1980), no. 4, 802–
813. MR-577317
[26] Jean-Baptiste Hiriart-Urruty, A new series of conjectures and open questions in optimization and matrix analysis, ESAIM Control Optim. Calc. Var. 15 (2009), no. 2, 454–470. MR-2513094
[27] Paweł Hitczenko and Stanisław Kwapień, On the Rademacher series, Probability in Banach
spaces, 9 (Sandjberg, 1993), Progr. Probab., vol. 35, Birkhäuser Boston, Boston, MA, 1994,
pp. 31–36. MR-1308508
[28] Paweł Hitczenko and Stanisław Kwapień, On the Rademacher series, Probability in Banach spaces, 9 (Sandjberg, 1993), Progr. Probab., vol. 35, Birkhäuser Boston, Boston, MA, 1994, pp. 31–36. MR-1308508
[29] Paweł Hitczenko and Stephen Montgomery-Smith, Measuring the magnitude of sums of
independent random variables, Ann. Probab. 29 (2001), no. 1, 447–466. MR-1825159
[30] Wassily Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer.
Statist. Assoc. 58 (1963), 13–30. MR-0144363
[31] Jørgen Hoffmann-Jørgensen, Sums of independent Banach space valued random variables,
Studia Math. 52 (1974), 159–186. MR-0356155
[32] Ron Holzman and Daniel J. Kleitman, On the product of sign vectors and unit vectors, Combinatorica 12 (1992), no. 3, 303–316. MR-1195893
[33] Samuel Karlin and William J. Studden, Tchebycheff systems: With applications in analysis
and statistics, Pure and Applied Mathematics, Vol. XV, Interscience Publishers John Wiley &
Sons, New York-London-Sydney, 1966. MR-0204922
[34] Michael J. Klass and Krzysztof Nowicki, An improvement of Hoffmann-Jørgensen’s inequality, Ann. Probab. 28 (2000), no. 2, 851–862. MR-1782275
[35] Michael J. Klass and Krzysztof Nowicki, Uniformly accurate quantile bounds via the truncated moment generating function: the symmetric case, Electron. J. Probab. 12 (2007), no. 47, 1276–1298 (electronic). MR-2346512
[36] Michael J. Klass and Krzysztof Nowicki, Uniformly accurate quantile bounds for sums of arbitrary independent random variables, J. Theoret. Probab. 23 (2010), no. 4, 1068–1091. MR-2735737
[37] H. König and S. Kwapień, Best Khintchine type inequalities for sums of independent, rotationally invariant random vectors, Positivity 5 (2001), no. 2, 115–152. MR-1825172
[38] M. G. Kreĭn and A. A. Nudel′man, The Markov moment problem and extremal problems,
American Mathematical Society, Providence, R.I., 1977, Ideas and problems of P. L. Čebyšev
and A. A. Markov and their further development, Translated from the Russian by D. Louvish,
Translations of Mathematical Monographs, Vol. 50. MR-0458081
[39] Jean Bernard Lasserre, Moments, positive polynomials and their applications, Imperial College Press Optimization Series, vol. 1, Imperial College Press, London, 2010. MR-2589247
[40] R. Latała and K. Oleszkiewicz, Between Sobolev and Poincaré, Geometric aspects of functional analysis, Lecture Notes in Math., vol. 1745, Springer, Berlin, 2000, pp. 147–168.
MR-1796718
[41] Rafał Latała, Estimation of moments of sums of independent real random variables, Ann.
Probab. 25 (1997), no. 3, 1502–1513. MR-1457628
[42] Rafał Latała, Estimates of moments and tails of Gaussian chaoses, Ann. Probab. 34 (2006), no. 6, 2315–2331. MR-2294983
[43] Rafał Latała and Krzysztof Oleszkiewicz, On the best constant in the Khinchin-Kahane inequality, Studia Math. 109 (1994), no. 1, 101–104. MR-1267715
[44] Michel Ledoux, On Talagrand’s deviation inequalities for product measures, ESAIM Probab.
Statist. 1 (1995/97), 63–87 (electronic). MR-1399224
[45] Michel Ledoux, The concentration of measure phenomenon, Mathematical Surveys and Monographs, vol. 89, American Mathematical Society, Providence, RI, 2001. MR-1849347
[46] S. Łojasiewicz, Sur les ensembles semi-analytiques, Actes du Congrès International des Mathématiciens (Nice, 1970), Tome 2, Gauthier-Villars, Paris, 1971, pp. 237–241. MR-0425152
[47] Murray Marshall, Positive polynomials and sums of squares, Mathematical Surveys and
Monographs, vol. 146, American Mathematical Society, Providence, RI, 2008. MR-2383959
[48] S. J. Montgomery-Smith, The distribution of Rademacher sums, Proc. Amer. Math. Soc. 109
(1990), no. 2, 517–522. MR-1013975
[49] Elchanan Mossel, Ryan O’Donnell, and Krzysztof Oleszkiewicz, Noise stability of functions
with low influences: invariance and optimality, Ann. of Math. (2) 171 (2010), no. 1, 295–341.
MR-2630040
[50] A. V. Nagaev, Probabilities of large deviations of sums of independent random variables
(Doctor of Science Thesis, Tashkent 1970).
[51] Krzysztof Oleszkiewicz, On the Stein property of Rademacher sequences, Probab. Math.
Statist. 16 (1996), no. 1, 127–130. MR-1407938
[52] Krzysztof Oleszkiewicz, On a nonsymmetric version of the Khinchine-Kahane inequality, Stochastic inequalities and applications, Progr. Probab., vol. 56, Birkhäuser, Basel, 2003, pp. 157–168. MR-2073432
[53] G. Peshkir and A. N. Shiryaev, Khinchin inequalities and a martingale extension of the sphere
of their action, Uspekhi Mat. Nauk 50 (1995), no. 5(305), 3–62. MR-1365047
[54] V. V. Petrov, Sums of independent random variables, Springer-Verlag, New York, 1975, Translated from the Russian by A. A. Brown, Ergebnisse der Mathematik und ihrer Grenzgebiete,
Band 82. MR-0388499
[55] I. Pinelis, An asymptotically Gaussian bound on the Rademacher tails, preprint, version 1,
http://arxiv.org/pdf/1007.2137v1.pdf.
[56] I. Pinelis, An asymptotically Gaussian bound on the Rademacher tails, preprint, version 3, http://arxiv.org/pdf/1007.2137v3.pdf.
[57] I. Pinelis, On the Bennett-Hoeffding inequality (preprint), arXiv:0902.4058v1 [math.PR].
[58] I. Pinelis, On the extreme points of moments sets, preprint, http://arxiv.org/find/all/1/au:+pinelis/0/1/0/all/0/1.
[59] I. Pinelis, On the supremum of the tails of normalized sums of independent Rademacher random variables, preprint, http://arxiv.org/find/all/1/au:+pinelis/0/1/0/all/0/1.
[60] I. Pinelis, Tchebycheff systems and extremal problems for generalized moments: a brief survey, preprint, http://arxiv.org/find/all/1/au:+pinelis/0/1/0/all/0/1.
[61] I. Pinelis, Exponential deficiency of convolutions of densities, ESAIM: Probability and Statistics (2011), published online, DOI 10.1051/ps/2010010.
[62] I. Pinelis and R. Molzon, Berry-Esséen bounds for general nonlinear statistics, with applications to Pearson’s and non-central Student’s and Hotelling’s (preprint), arXiv:0906.0177v1
[math.ST].
[63] I. F. Pinelis, A problem on large deviations in a space of trajectories, Theory Probab. Appl.
26 (1981), no. 1, 69–84.
[64] I. F. Pinelis, Asymptotic equivalence of the probabilities of large deviations for sums and
maximum of independent random variables, Limit theorems of probability theory, Trudy
Inst. Mat., vol. 5, “Nauka” Sibirsk. Otdel., Novosibirsk, 1985, pp. 144–173, 176. MR-821760
[65] Iosif Pinelis, Extremal probabilistic problems and Hotelling’s T 2 test under a symmetry
condition, Ann. Statist. 22 (1994), no. 1, 357–368. MR-1272088
[66] Iosif Pinelis, Optimal tail comparison based on comparison of moments, High dimensional probability (Oberwolfach, 1996), Progr. Probab., vol. 43, Birkhäuser, Basel, 1998, pp. 297–314. MR-1652335
[67] Iosif Pinelis, Fractional sums and integrals of r-concave tails and applications to comparison probability inequalities, Advances in stochastic inequalities (Atlanta, GA, 1997), Contemp. Math., vol. 234, Amer. Math. Soc., Providence, RI, 1999, pp. 149–168. MR-1694770
[68] Iosif Pinelis, On exact maximal Khinchine inequalities, High dimensional probability, II (Seattle, WA, 1999), Progr. Probab., vol. 47, Birkhäuser Boston, Boston, MA, 2000, pp. 49–63. MR-1857314
[69] Iosif Pinelis, Exact asymptotics for large deviation probabilities, with applications, Modeling uncertainty, Internat. Ser. Oper. Res. Management Sci., vol. 46, Kluwer Acad. Publ., Boston, MA, 2002, pp. 57–93. MR-1893275
[70] Iosif Pinelis, L'Hospital type rules for monotonicity: applications to probability inequalities for sums of bounded random variables, JIPAM. J. Inequal. Pure Appl. Math. 3 (2002), no. 1, Article 7, 9 pp. (electronic). MR-1888922
[71] Iosif Pinelis, Monotonicity properties of the relative error of a Padé approximation for Mills' ratio, JIPAM. J. Inequal. Pure Appl. Math. 3 (2002), no. 2, Article 20, 8 pp. (electronic). MR-1906389
[72] Iosif Pinelis, Spherically symmetric functions with a convex second derivative and applications to extremal probabilistic problems, Math. Inequal. Appl. 5 (2002), no. 1, 7–26. MR-1880267
[73] Iosif Pinelis, Dimensionality reduction in extremal problems for moments of linear combinations of vectors with random coefficients, Stochastic inequalities and applications, Progr. Probab., vol. 56, Birkhäuser, Basel, 2003, pp. 169–185. MR-2073433
[74] Iosif Pinelis, Binomial upper bounds on generalized moments and tail probabilities of (super)martingales with differences bounded from above, High dimensional probability, IMS Lecture Notes Monogr. Ser., vol. 51, Inst. Math. Statist., Beachwood, OH, 2006, pp. 33–52. MR-2387759
[75] Iosif Pinelis, On l'Hospital-type rules for monotonicity, JIPAM. J. Inequal. Pure Appl. Math. 7 (2006), no. 2, Article 40, 19 pp. (electronic). MR-2221321
[76] Iosif Pinelis, On normal domination of (super)martingales, Electron. J. Probab. 11 (2006), no. 39, 1049–1070. MR-2268536
[77] Iosif Pinelis, Exact inequalities for sums of asymmetric random variables, with applications, Probab. Theory Related Fields 139 (2007), no. 3-4, 605–635. MR-2322709
[78] Iosif Pinelis, Toward the best constant factor for the Rademacher-Gaussian tail comparison, ESAIM Probab. Stat. 11 (2007), 412–426. MR-2339301
[79] Iosif Pinelis, On inequalities for sums of bounded random variables, J. Math. Inequal. 2 (2008), no. 1, 1–7. MR-2453629
[80] S. Portnoy, Private communication, 1991.
[81] Martin Raič, CLT-related large deviation bounds based on Stein’s method, Adv. in Appl. Probab. 39 (2007), no. 3, 731–752. MR-2357379
[82] Moshe Shaked and J. George Shanthikumar, Stochastic orders, Springer Series in Statistics, Springer, New York, 2007. MR-2265633
[83] I. G. Shevtsova, Refinement of estimates for the rate of convergence in Lyapunov’s theorem, Dokl. Akad. Nauk 435 (2010), no. 1, 26–28. MR-2790498
[84] Michel Talagrand, The missing factor in Hoeffding’s inequalities, Ann. Inst. H. Poincaré Probab. Statist. 31 (1995), no. 4, 689–702. MR-1355613
[85] Alfred Tarski, A Decision Method for Elementary Algebra and Geometry, RAND Corporation, Santa Monica, Calif., 1948. MR-0028796
[86] Joel A. Tropp, User-friendly tail bounds for sums of random matrices (preprint), arXiv:1004.4389v7 [math.PR].
[87] I. Tyurin, New estimates of the convergence rate in the Lyapunov theorem (preprint), arXiv:0912.0726v1 [math.PR].
EJP 17 (2012), paper 35.
ejp.ejpecp.org
Page 21/22
[88] Sara A. van de Geer, On non-asymptotic bounds for estimation in generalized linear models with highly correlated design, Asymptotics: particles, processes and inverse problems, IMS Lecture Notes Monogr. Ser., vol. 55, Inst. Math. Statist., Beachwood, OH, 2007, pp. 121–134. MR-2459935
[89] Mark Veraar, A note on optimal probability lower bounds for centered random variables, Colloq. Math. 113 (2008), no. 2, 231–240. MR-2425084
[90] Mark Veraar, On Khintchine inequalities with a weight, Proc. Amer. Math. Soc. 138 (2010), no. 11, 4119–4121. MR-2679633
[91] Vladimir Vinogradov, Refined large deviation limit theorems, Pitman Research Notes in Mathematics Series, vol. 315, Longman Scientific & Technical, Harlow, 1994. MR-1312369
[92] A. V. Zhubr, On one extremal problem for the N-cube, to appear, 2012.
Acknowledgments. I am pleased to thank the referee for a careful reading of the
paper and useful suggestions about the exposition.