Lecture 07 Hoeffding, Hoeffding-Chernoff, and Khinchine inequalities. 18.465 Let a1 , . . . , an ∈ R and let ε1 , . . . , εn be i.i.d. Rademacher random variables: P (εi = 1) = P (εi = −1) = 0.5. Theorem 7.1. [Hoeffding] For t ≥ 0, � n � � � � t2 P εi ai ≥ t ≤ exp − �n . 2 i=1 a2i i=1 Proof. Similarly to the proof of Bennett’s inequality (Lecture 5), � n � � n � n � � � −λt −λt P εi ai ≥ t ≤ e E exp λ εi ai = e E exp (λεi ai ) . i=1 Using inequality ex +e−x 2 i=1 x2 /2 ≤e i=1 (from Taylor expansion), we get E exp (λεi ai ) = 2 λ2 a i 1 λai 1 −λai e + e ≤e 2 . 2 2 Hence, we need to minimize the bound with respect to λ > 0: � n � � λ2 P εi ai ≥ t ≤ e−λt e 2 Pn i=1 a2i . i=1 Setting derivative to zero, we obtain the result. Now we change variable: u = 2 2 t Pn i=1 a2i . Then t = ⎛ P⎝ � � �n 2u i=1 a2i . n � � ⎞ � n � � εi ai ≥ �2u a2i ⎠ ≤ e−u i=1 i=1 and ⎛ P⎝ n � i=1 Here �n 2 i=1 ai = Var( � ⎞ � n � � a2i ⎠ ≥ 1 − e−u . εi ai ≤ �2u i=1 �n i=1 εi ai ). Rademacher sums will play important role in future. Consider again the problem of estimating 1 n �n i=1 f (Xi )− Ef . We will see that by the Symmetrization technique, n n n 1� 1� 1� f (Xi ) − Ef ∼ f (Xi ) − f (Xi� ). n i=1 n i=1 n i=1 In fact, � n � � n � � n � n �1 � � �1 � � �1 � � 1� � � � � � � E� f (Xi ) − Ef � ≤ E � f (Xi ) − f (Xi� )� ≤ 2E � f (Xi ) − Ef � . �n � �n � �n � n i=1 i=1 i=1 i=1 The second inequality above follows by adding and subtracting Ef : � n � � n � � n � n �1 � � �1 � � �1 � � 1� � � � � � � E� f (Xi ) − f (Xi� )� ≤ E � f (Xi ) − Ef � + E � f (Xi� ) − Ef � �n � �n � �n � n i=1 i=1 i=1 i=1 � n � �1 � � � � = 2E � f (Xi ) − Ef � �n � i=1 13 Lecture 07 Hoeffding, Hoeffding-Chernoff, and Khinchine inequalities. 18.465 while for the first inequality we use Jensen’s inequality: � � � � n n n �1 � � �1 � � 1� � � � � � f (Xi ) − Ef � = E � f (Xi ) − Ef (Xi )� E� �n � �n � n i=1 i=1 i=1 � n � n �1 � � 1� � � � ≤ EX EX � � f (Xi ) − Ef (Xi )� . �n � n i=1 i=1 Note that 1 n �n i=1 f (Xi ) − 1 n �n i=1 Ef (Xi� ) is equal in distribution to 1 n �n i=1 εi (f (Xi ) − f (Xi� )). We now prove Hoeffding-Chernoff Inequality: Theorem 7.2. Assume 0 ≤ Xi ≤ 1 and µ = EX. Then � n � 1� Xi − µ ≥ t ≤ e−nD(µ+t,µ) P n i=1 where the KL-divergence D(p, q) = p log p q + (1 − p) log 1−p 1−q . Proof. Note that φ(x) = eλx is convex and so eλx = eλ(x·1+(1−x)·0) ≤ xeλ + (1 − x)eλ·0 = 1 − x + xeλ . Hence, EeλX = 1 − EX + EXeλ = 1 − µ + µeλ . Again, we minimize the following bound with respect to λ > 0: � n � � P Xi ≥ n(µ + t) ≤ e−λn(µ+t) Eeλ Xi P i=1 � �n = e−λn(µ+t) EeλX � �n ≤ e−λn(µ+t) 1 − µ + µeλ Take derivative w.r.t. λ: −n(µ + t)e−λn(µ+t) (1 − µ + µeλ )n + n(1 − µ + µeλ )n−1 µeλ e−λn(µ+t) = 0 −(µ + t)(1 − µ + µeλ ) + µeλ = 0 eλ = (1 − µ)(µ + t) . µ(1 − µ − t) Substituting, � n � �� �µ+t � ��n � µ(1 − µ − t) (1 − µ)(µ + t) P Xi ≥ n(µ + t) ≤ 1−µ+ (1 − µ)(µ + t) 1−µ−t i=1 �� � n �µ+t � �1−µ−t µ 1−µ = µ+t 1−µ−t � � �� µ+t 1−µ−t = exp −n (µ + t) log + (1 − µ − t) log , µ 1−µ 14 Lecture 07 Hoeffding, Hoeffding-Chernoff, and Khinchine inequalities. 18.465 completing the proof. Moreover, � � � n � n 1� 1� P µ− Xi ≥ t = P Zi − µZ ≥ t ≤ e−nD(µz +t,µZ ) = e−nD(1−µX +t,1−µX ) n i=1 n i=1 where Zi = 1 − Xi (and thus µZ = 1 − µX ). � If 0 < µ ≤ 1/2, D(1 − µ + t, 1 − µ) ≥ t2 . 2µ(1 − µ) Hence, we get � n 1� Xi ≥ t P µ− n i=1 � nt2 ≤ e− 2µ(1−µ) = e−u . Solving for t, � n 1� Xi ≥ P µ− n i=1 � 2µ(1 − µ)u n � ≤ e−u . � If Xi ∈ {0, 1} are i.i.d. Bernoulli trials, then µ = EX = P (X = 1), Var(X) = µ(1−µ), and P µ − 1 n �n i=1 � Xi ≥ t ≤ nt2 e− 2Var(X) . The following inequality says that if we pick n reals a1 , · · · , an ∈ R and add them up each multiplied by a � � 2 |ai | . random sign ±1, then the expected value of the sum should not be far off from Theorem 7.3. [Khinchine inequality] Let a1 , · · · , an ∈ R, �i , · · · , �n be i.i.d. Rademacher random variables: P(�i = 1) = P(�i = −1) = 0.5, and 0 < p < ∞. Then Ap · � n � �1/2 2 |ai | i=1 �p �1/p � � n � n �1/2 �� � � � � 2 ≤ E� ai �i � ≤ Bp · |ai | � � i=1 i=1 for some constants Ap and Bp depending on p. Proof. Let � 2 |ai | = 1 without lossing generality. Then �� �p � � E� ai �i � � ∞ = 0 ∞ � = 0 ∞ � = 0 �p � ��� � � P � ai �i � ≥ sp dsp � � ��� � � P � ai �i � ≥ s · psp−1 dsp � � ��� � � P � ai �i � ≥ s · psp−1 dsp ∞ � ≤ 2 exp(− 0 = s2 ) · psp−1 dsp , Hoeffding’s inequality 2 p (Bp ) , when p ≥ 2. 15 Lecture 07 Hoeffding, Hoeffding-Chernoff, and Khinchine inequalities. 18.465 When 0 < p < 2, �� �p � � E� ai �i � �� �2 � � ≤ E� ai �i � �� � 23 p+(2− 23 p) � � = E� ai �i � ≤ �p � 23 � �� �6−2p � 13 � �� � � � � E� ai �i � E� ai �i � , Holder’s inequality 2− 23 p ≤ (B6−2p ) �p � 23 � �� � � · E� ai �i � . � p 6−2p Thus E | ai �i | ≤ (B6−2p ) , completing the proof. � 16