Lecture 07 Hoeffding, Hoeffding-Chernoff, and Khinchine inequalities. 18.465 Let a

advertisement
Lecture 07
Hoeffding, Hoeffding-Chernoff, and Khinchine inequalities.
18.465
Let a1 , . . . , an ∈ R and let ε1 , . . . , εn be i.i.d. Rademacher random variables: P (εi = 1) = P (εi = −1) = 0.5.
Theorem 7.1. [Hoeffding] For t ≥ 0,
� n
�
�
�
�
t2
P
εi ai ≥ t ≤ exp − �n
.
2 i=1 a2i
i=1
Proof. Similarly to the proof of Bennett’s inequality (Lecture 5),
� n
�
� n
�
n
�
�
�
−λt
−λt
P
εi ai ≥ t ≤ e E exp λ
εi ai = e
E exp (λεi ai ) .
i=1
Using inequality
ex +e−x
2
i=1
x2 /2
≤e
i=1
(from Taylor expansion), we get
E exp (λεi ai ) =
2
λ2 a i
1 λai 1 −λai
e + e
≤e 2 .
2
2
Hence, we need to minimize the bound with respect to λ > 0:
� n
�
�
λ2
P
εi ai ≥ t ≤ e−λt e 2
Pn
i=1
a2i
.
i=1
Setting derivative to zero, we obtain the result.
Now we change variable: u =
2
2
t
Pn
i=1
a2i
. Then t =
⎛
P⎝
�
� �n
2u i=1 a2i .
n
�
�
⎞
�
n
� �
εi ai ≥ �2u
a2i ⎠ ≤ e−u
i=1
i=1
and
⎛
P⎝
n
�
i=1
Here
�n
2
i=1 ai
= Var(
�
⎞
�
n
� �
a2i ⎠ ≥ 1 − e−u .
εi ai ≤ �2u
i=1
�n
i=1 εi ai ).
Rademacher sums will play important role in future. Consider again the problem of estimating
1
n
�n
i=1
f (Xi )−
Ef . We will see that by the Symmetrization technique,
n
n
n
1�
1�
1�
f (Xi ) − Ef ∼
f (Xi ) −
f (Xi� ).
n i=1
n i=1
n i=1
In fact,
� n
�
� n
�
� n
�
n
�1 �
�
�1 �
�
�1 �
�
1�
�
�
�
�
�
�
E�
f (Xi ) − Ef � ≤ E �
f (Xi ) −
f (Xi� )� ≤ 2E �
f (Xi ) − Ef � .
�n
�
�n
�
�n
�
n
i=1
i=1
i=1
i=1
The second inequality above follows by adding and subtracting Ef :
� n
�
� n
�
� n
�
n
�1 �
�
�1 �
�
�1 �
�
1�
�
�
�
�
�
�
E�
f (Xi ) −
f (Xi� )� ≤ E �
f (Xi ) − Ef � + E �
f (Xi� ) − Ef �
�n
�
�n
�
�n
�
n
i=1
i=1
i=1
i=1
� n
�
�1 �
�
�
�
= 2E �
f (Xi ) − Ef �
�n
�
i=1
13
Lecture 07
Hoeffding, Hoeffding-Chernoff, and Khinchine inequalities.
18.465
while for the first inequality we use Jensen’s inequality:
�
�
�
�
n
n
n
�1 �
�
�1 �
�
1�
�
�
�
� �
f (Xi ) − Ef � = E �
f (Xi ) −
Ef (Xi )�
E�
�n
�
�n
�
n i=1
i=1
i=1
� n
�
n
�1 �
�
1�
�
� �
≤ EX EX � �
f (Xi ) −
Ef (Xi )� .
�n
�
n i=1
i=1
Note that
1
n
�n
i=1
f (Xi ) −
1
n
�n
i=1
Ef (Xi� ) is equal in distribution to
1
n
�n
i=1 εi (f (Xi )
− f (Xi� )).
We now prove Hoeffding-Chernoff Inequality:
Theorem 7.2. Assume 0 ≤ Xi ≤ 1 and µ = EX. Then
� n
�
1�
Xi − µ ≥ t ≤ e−nD(µ+t,µ)
P
n i=1
where the KL-divergence D(p, q) = p log
p
q
+ (1 − p) log
1−p
1−q .
Proof. Note that φ(x) = eλx is convex and so eλx = eλ(x·1+(1−x)·0) ≤ xeλ + (1 − x)eλ·0 = 1 − x + xeλ . Hence,
EeλX = 1 − EX + EXeλ = 1 − µ + µeλ .
Again, we minimize the following bound with respect to λ > 0:
� n
�
�
P
Xi ≥ n(µ + t)
≤ e−λn(µ+t) Eeλ Xi
P
i=1
�
�n
= e−λn(µ+t) EeλX
�
�n
≤ e−λn(µ+t) 1 − µ + µeλ
Take derivative w.r.t. λ:
−n(µ + t)e−λn(µ+t) (1 − µ + µeλ )n + n(1 − µ + µeλ )n−1 µeλ e−λn(µ+t) = 0
−(µ + t)(1 − µ + µeλ ) + µeλ = 0
eλ =
(1 − µ)(µ + t)
.
µ(1 − µ − t)
Substituting,
� n
�
��
�µ+t �
��n
�
µ(1 − µ − t)
(1 − µ)(µ + t)
P
Xi ≥ n(µ + t)
≤
1−µ+
(1 − µ)(µ + t)
1−µ−t
i=1
��
�
n
�µ+t �
�1−µ−t
µ
1−µ
=
µ+t
1−µ−t
�
�
��
µ+t
1−µ−t
= exp −n (µ + t) log
+ (1 − µ − t) log
,
µ
1−µ
14
Lecture 07
Hoeffding, Hoeffding-Chernoff, and Khinchine inequalities.
18.465
completing the proof. Moreover,
�
�
� n
�
n
1�
1�
P µ−
Xi ≥ t = P
Zi − µZ ≥ t ≤ e−nD(µz +t,µZ ) = e−nD(1−µX +t,1−µX )
n i=1
n i=1
where Zi = 1 − Xi (and thus µZ = 1 − µX ).
�
If 0 < µ ≤ 1/2,
D(1 − µ + t, 1 − µ) ≥
t2
.
2µ(1 − µ)
Hence, we get
�
n
1�
Xi ≥ t
P µ−
n i=1
�
nt2
≤ e− 2µ(1−µ) = e−u .
Solving for t,
�
n
1�
Xi ≥
P µ−
n i=1
�
2µ(1 − µ)u
n
�
≤ e−u .
�
If Xi ∈ {0, 1} are i.i.d. Bernoulli trials, then µ = EX = P (X = 1), Var(X) = µ(1−µ), and P µ −
1
n
�n
i=1
�
Xi ≥ t ≤
nt2
e−
2Var(X) .
The following inequality says that if we pick n reals a1 ,
· · · , an ∈ R and add them up each multiplied by a
�
�
2
|ai | .
random sign ±1, then the expected value of the sum should not be far off from
Theorem 7.3. [Khinchine inequality] Let a1 , · · · , an ∈ R, �i , · · · , �n be i.i.d. Rademacher random variables:
P(�i = 1) = P(�i = −1) = 0.5, and 0 < p < ∞. Then
Ap ·
� n
�
�1/2
2
|ai |
i=1
�p �1/p
� � n
� n
�1/2
��
�
�
�
�
2
≤ E�
ai �i �
≤ Bp ·
|ai |
�
�
i=1
i=1
for some constants Ap and Bp depending on p.
Proof. Let
�
2
|ai | = 1 without lossing generality. Then
��
�p
�
�
E�
ai �i �
�
∞
=
0
∞
�
=
0
∞
�
=
0
�p
�
���
�
�
P �
ai �i � ≥ sp dsp
�
�
���
�
�
P �
ai �i � ≥ s · psp−1 dsp
�
�
���
�
�
P �
ai �i � ≥ s · psp−1 dsp
∞
�
≤
2 exp(−
0
=
s2
) · psp−1 dsp , Hoeffding’s inequality
2
p
(Bp ) , when p ≥ 2.
15
Lecture 07
Hoeffding, Hoeffding-Chernoff, and Khinchine inequalities.
18.465
When 0 < p < 2,
��
�p
�
�
E�
ai �i �
��
�2
�
�
≤ E�
ai �i �
��
� 23 p+(2− 23 p)
�
�
= E�
ai �i �
≤
�p � 23 � ��
�6−2p � 13
� ��
�
�
�
�
E�
ai �i �
E�
ai �i �
, Holder’s inequality
2− 23 p
≤ (B6−2p )
�p � 23
� ��
�
�
· E�
ai �i �
.
�
p
6−2p
Thus E | ai �i | ≤ (B6−2p )
, completing the proof.
�
16
Download