Lecture 05 One dimensional concentration inequalities. Bennett’s inequality. For a fixed f ∈ F, if we observe 1 n �n i=1 18.465 I (f (Xi ) �= Yi ) is small, can we say that P (f (X) �= Y ) is small? By the Law of Large Numbers, n 1� � Y ) = P (f (X) �= Y ) . I (f (Xi ) �= Yi ) → EI(f (X) = n i=1 The Central Limit Theorem says � √ � 1 �n n n i=1 I (f (Xi ) �= Yi ) − EI(f (X) �= Y ) √ → N (0, 1). VarI Thus, n k 1� � Y)∼ √ . I (f (Xi ) �= Yi ) − EI(f (X) = n i=1 n Let Z1 , · · · , Zn ∈ R be i.i.d. random variables. We’re interested in bounds on 1 n � Zi − EZ. (1) Jensen’s inequality: If φ is a convex function, then φ(EZ) ≤ Eφ(X). (2) Chebyshev’s inequality: If Z ≥ 0, then P (Z ≥ t) ≤ EZ t . Proof: EZ = EZI(Z < t) + EZI(Z ≥ t) ≥ EZI(Z ≥ t) ≥ EtI(Z ≥ t) = tP (Z ≥ t) . (3) Markov’s inequality: Let Z be a signed r.v. Then for any λ > 0 � � EeλZ P (Z ≥ t) = P eλZ ≥ eλt ≤ λt e and therefore P (Z ≥ t) ≤ inf e−λt EeλZ . λ>0 Theorem 5.1. [Bennett] Assume EZ = 0, EZ 2 = σ 2 , |Z| < M = const, Z1 , · · · , Zn independent copies of Z, and t ≥ 0. Then P � n � � Zi ≥ t i=1 � � �� nσ 2 tM ≤ exp − 2 φ , nσ 2 M where φ(x) = (1 + x) log(1 + x) − x. Proof. Since Zi are i.i.d., P � n � i=1 � Zi ≥ t ≤ e−λt Eeλ Pn i=1 Zi = e−λt n � � �n EeλZi = e−λt EeλZ . i=1 9 Lecture 05 One dimensional concentration inequalities. Bennett’s inequality. 18.465 Expanding, λZ Ee = E ∞ � (λZ)k k! k=0 = 1+ = ∞ � λk k=2 2 k! ∞ � λk k=0 EZ k k! EZ 2 Z k−2 ≤ 1 + ∞ � λk k=2 ∞ � k! M k−2 σ 2 � σ λ M σ 2 � λM = 1 + e − 1 − λM 2 2 M k! M k=2 � 2 � � σ � λM ≤ exp e − 1 − λM M2 = 1+ k k where the last inequality follows because 1 + x ≤ ex . Combining the results, P � n � � Zi ≥ t −λt ≤ e i=1 = � exp � nσ 2 � λM e − 1 − λM M2 � � � � nσ 2 � λM exp −λt + 2 e − 1 − λM M Now, minimize the above bound with respect to λ. Taking derivative w.r.t. λ and setting it to zero: � nσ 2 � M eλM − M = 0 2 M tM +1 eλM = nσ 2 � � 1 tM λ= log 1 + . M nσ 2 −t + The bound becomes � n � � � � � � ��� � tM t tM nσ 2 tM P Zi ≥ t ≤ exp − log 1 + + 2 + 1 − log 1 + nσ 2 M nσ 2 nσ 2 M i=1 � 2� � � � ��� nσ tM tM tM tM = exp − log 1 + − log 1 + 2 2 2 2 M nσ nσ nσ 2 nσ � 2� � � � ��� nσ tM tM tM = exp − 1+ log 1 + M 2 nσ 2 nσ 2 nσ 2 � � �� tM nσ 2 = exp − 2 φ M nσ 2 � 10