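Lecture 10. Symmetrization. Pessimistic VC inequality. 18.465

We are interested in bounding

$$P\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - P(C)\right| \ge t\right).$$

In Lecture 7 we hinted at symmetrization as a way to deal with the unknown $P(C)$.

Lemma 10.1. [Symmetrization] Let $X_1',\ldots,X_n'$ be an independent copy of the sample $X_1,\ldots,X_n$. If $t \ge \sqrt{2/n}$, then

$$P\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - P(C)\right| \ge t\right) \le 2\,P\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - \frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C)\right| \ge t/2\right).$$

Proof. Suppose the event

$$\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - P(C)\right| \ge t$$

occurs. Let $X = (X_1,\ldots,X_n) \in \left\{\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - P(C)\right| \ge t\right\}$. Then

$$\exists\, C_X \text{ such that } \left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C_X) - P(C_X)\right| \ge t.$$

For a fixed $C$,

$$
\begin{aligned}
P_{X'}\left(\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C) - P(C)\right| \ge t/2\right)
&= P\left(\left(\frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C) - P(C)\right)^{2} \ge t^2/4\right) \\
&\le \frac{4\,E\left(\frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C) - P(C)\right)^{2}}{t^2} \quad \text{(by Chebyshev's inequality)} \\
&= \frac{4}{n^2t^2}\sum_{i,j} E\,\bigl(I(X_i'\in C) - P(C)\bigr)\bigl(I(X_j'\in C) - P(C)\bigr) \\
&= \frac{4}{n^2t^2}\sum_{i=1}^{n} E\,\bigl(I(X_i'\in C) - P(C)\bigr)^{2} \\
&= \frac{4nP(C)\,(1-P(C))}{n^2t^2} \le \frac{1}{nt^2} \le \frac{1}{2},
\end{aligned}
$$

since we chose $t \ge \sqrt{2/n}$. The cross terms with $i \ne j$ vanish by independence, and $P(C)(1-P(C)) \le 1/4$. So,

$$P_{X'}\left(\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C_X) - P(C_X)\right| \le t/2 \,\middle|\, \exists\, C_X\right) \ge 1/2$$

if $t \ge \sqrt{2/n}$. Assume that the event

$$\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C_X) - P(C_X)\right| \le t/2$$

occurs. Recall that

$$\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C_X) - P(C_X)\right| \ge t.$$

Hence, by the triangle inequality, it must be that

$$\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C_X) - \frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C_X)\right| \ge t/2.$$

We conclude

$$
\begin{aligned}
\frac{1}{2}
&\le P_{X'}\left(\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C_X) - P(C_X)\right| \le t/2 \,\middle|\, \exists\, C_X\right) \\
&\le P_{X'}\left(\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C_X) - \frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C_X)\right| \ge t/2 \,\middle|\, \exists\, C_X\right) \\
&\le P_{X'}\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - \frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C)\right| \ge t/2 \,\middle|\, \exists\, C_X\right).
\end{aligned}
$$

Since indicators are $\{0,1\}$-valued,

$$
\begin{aligned}
\frac{1}{2}\, I\Biggl(\underbrace{\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - P(C)\right| \ge t}_{\exists\, C_X}\Biggr)
&\le P_{X'}\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - \frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C)\right| \ge t/2 \,\middle|\, \exists\, C_X\right)\cdot I(\exists\, C_X) \\
&\le P_{X'}\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - \frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C)\right| \ge t/2\right),
\end{aligned}
$$

where $P_{X'}$ denotes probability over $X'$ with $X$ held fixed. Now, take expectation with respect to the $X_i$'s to obtain

$$P_X\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - P(C)\right| \ge t\right) \le 2\cdot P_{X,X'}\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - \frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C)\right| \ge t/2\right). \qquad \square$$

Theorem 10.1. If $VC(\mathcal{C}) = V$, then

$$P\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - P(C)\right| \ge t\right) \le 4\left(\frac{2en}{V}\right)^{V} e^{-\frac{nt^2}{8}}.$$

Proof. By Lemma 10.1, it is enough to bound

$$
\begin{aligned}
&2P\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} I(X_i\in C) - \frac{1}{n}\sum_{i=1}^{n} I(X_i'\in C)\right| \ge t/2\right) \\
&\quad= 2P\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i\bigl(I(X_i\in C) - I(X_i'\in C)\bigr)\right| \ge t/2\right) \\
&\quad= 2E_{X,X'}\, P_{\varepsilon}\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i\bigl(I(X_i\in C) - I(X_i'\in C)\bigr)\right| \ge t/2\right).
\end{aligned}
$$

The first equality holds because $X_i$ and $X_i'$ are i.i.d., so switching their names (i.e. introducing random signs $\varepsilon_i$ with $P(\varepsilon_i = \pm 1) = 1/2$) does not have any effect. In the last line, it is important to see that the probability is taken with respect to the $\varepsilon_i$'s, while the $X_i$'s and $X_i'$'s are fixed.

By Sauer's lemma,

$$\Delta_{2n}(\mathcal{C}, X_1,\ldots,X_n, X_1',\ldots,X_n') \le \left(\frac{2en}{V}\right)^{V}.$$

In other words, any set $C\in\mathcal{C}$ is equivalent to one of $C_1,\ldots,C_N$ on the data, where $N \le \left(\frac{2en}{V}\right)^{V}$. Hence,

$$
\begin{aligned}
&2E_{X,X'}\, P_{\varepsilon}\left(\sup_{C\in\mathcal{C}}\left|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i\bigl(I(X_i\in C) - I(X_i'\in C)\bigr)\right| \ge t/2\right) \\
&\quad= 2E_{X,X'}\, P_{\varepsilon}\left(\sup_{1\le k\le N}\left|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i\bigl(I(X_i\in C_k) - I(X_i'\in C_k)\bigr)\right| \ge t/2\right) \\
&\quad= 2E_{X,X'}\, P_{\varepsilon}\left(\bigcup_{k=1}^{N}\left\{\left|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i\bigl(I(X_i\in C_k) - I(X_i'\in C_k)\bigr)\right| \ge t/2\right\}\right) \\
&\quad\le 2E\sum_{k=1}^{N} P_{\varepsilon}\left(\left|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i\bigl(I(X_i\in C_k) - I(X_i'\in C_k)\bigr)\right| \ge t/2\right) \quad \text{(union bound)} \\
&\quad\le 2E\sum_{k=1}^{N} 2\exp\left(-\frac{n^2t^2}{8\sum_{i=1}^{n}\bigl(I(X_i\in C_k) - I(X_i'\in C_k)\bigr)^{2}}\right) \quad \text{(Hoeffding's inequality)} \\
&\quad\le 2E\sum_{k=1}^{N} 2\exp\left(-\frac{n^2t^2}{8n}\right) \le 2\left(\frac{2en}{V}\right)^{V} 2e^{-\frac{nt^2}{8}}. \qquad \square
\end{aligned}
$$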
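To get a feel for how pessimistic the bound of Theorem 10.1 is, the following sketch compares it with a Monte Carlo estimate of the left-hand side in one simple case. It is an illustration under assumptions that are not part of the lecture: the class is taken to be the half-lines $(-\infty, a]$ on the real line (VC dimension $V = 1$), the data are Uniform(0,1) so that $P(C) = a$ for $a \in [0,1]$, and the function names `vc_bound`, `sup_deviation_halflines` and `empirical_tail` are made up for this example.

```python
import math
import random


def vc_bound(n, t, V):
    """Right-hand side of Theorem 10.1: 4 * (2en/V)^V * exp(-n t^2 / 8)."""
    return 4.0 * (2.0 * math.e * n / V) ** V * math.exp(-n * t * t / 8.0)


def sup_deviation_halflines(sample):
    """sup over a of |F_n(a) - a| for Uniform(0,1) data.

    Assumed class: half-lines (-inf, a]. The supremum is approached at the
    order statistics, either at the point itself or just to its left.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def empirical_tail(n, t, trials, seed=0):
    """Monte Carlo estimate of P(sup_C |empirical freq - P(C)| >= t)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if sup_deviation_halflines(sample) >= t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, V = 2000, 1
    for t in (0.05, 0.1, 0.25):
        # The symmetrization lemma needs t >= sqrt(2/n).
        assert t >= math.sqrt(2.0 / n)
        print(t, empirical_tail(n, t, trials=200), vc_bound(n, t, V))
```

For moderate $t$ the bound can exceed 1 and is then vacuous, while the simulated tail is already small. The factor $(2en/V)^V$ from the union bound is the main source of this slack.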