Lecture 22. Bounds on the generalization error of voting classifiers (18.465).

Theorem 22.1. With probability at least $1 - e^{-t}$, for any $T \ge 1$ and any $f = \sum_{i=1}^{T} \lambda_i h_i$,
$$
P(yf(x) \le 0) \;\le\; \inf_{\delta \in (0,1)} \Bigl( \varepsilon + \sqrt{P_n(yf(x) \le \delta) + \varepsilon^2} \Bigr)^2,
$$
where
$$
\varepsilon = \varepsilon(\delta) = K \Biggl( \sqrt{\frac{V \min\bigl(T, (\log n)/\delta^2\bigr) \log \frac{n}{\delta}}{n}} + \sqrt{\frac{t}{n}} \Biggr).
$$
Here we used the notation $P_n(C) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \in C)$.

Remark. The bound has the form
$$
P(yf(x) \le 0) \;\le\; \inf_{\delta \in (0,1)} K \Biggl( \underbrace{P(yf(x) \le \delta)}_{\text{inc. with } \delta} + \underbrace{\sqrt{\frac{V \min\bigl(T, (\log n)/\delta^2\bigr) \log \frac{n}{\delta}}{n}}}_{\text{dec. with } \delta} + \sqrt{\frac{t}{n}} \Biggr),
$$
so the choice of the margin $\delta$ trades off a margin-error term that increases with $\delta$ against a complexity term that decreases with $\delta$.

Proof. Let $f = \sum_{i=1}^{T} \lambda_i h_i$ and $g = \frac{1}{k} \sum_{j=1}^{k} Y_j$, where the $Y_j$ are i.i.d. with $P(Y_j = h_i) = \lambda_i$ and $P(Y_j = 0) = 1 - \sum_{i=1}^{T} \lambda_i$, as in Lecture 17. Then $E Y_j(x) = f(x)$. We have
$$
P(yf(x) \le 0) = P(yf(x) \le 0,\, yg(x) \le \delta) + P(yf(x) \le 0,\, yg(x) > \delta) \le P(yg(x) \le \delta) + P(yg(x) > \delta \mid yf(x) \le 0),
$$
and
$$
P\bigl(yg(x) > \delta \mid yf(x) \le 0\bigr) = E_x P_Y \Biggl( \frac{y}{k} \sum_{j=1}^{k} Y_j(x) > \delta \;\Bigm|\; y\, E_Y Y_j(x) \le 0 \Biggr).
$$
Shift the $Y_j$'s to $[0,1]$ by defining $Y_j' = \frac{y Y_j + 1}{2}$. Then
$$
P(yg(x) > \delta \mid yf(x) \le 0) = E_x P_Y \Biggl( \frac{1}{k} \sum_{j=1}^{k} Y_j' \ge \frac{1}{2} + \frac{\delta}{2} \;\Bigm|\; E Y_j' \le \frac{1}{2} \Biggr) \le E_x P_Y \Biggl( \frac{1}{k} \sum_{j=1}^{k} Y_j' \ge E Y_1' + \frac{\delta}{2} \;\Bigm|\; E Y_j' \le \frac{1}{2} \Biggr)
$$
$$
\le \text{(by Hoeffding's inequality)}\;\; E_x\, e^{-k D(E Y_1' + \frac{\delta}{2},\, E Y_1')} \le E_x\, e^{-k\delta^2/2} = e^{-k\delta^2/2},
$$
because $D(p, q) \ge 2(p - q)^2$ (KL divergence for binomial variables, Homework 1) and, hence,
$$
D\Bigl( E Y_1' + \frac{\delta}{2},\; E Y_1' \Bigr) \ge 2 \Bigl( \frac{\delta}{2} \Bigr)^2 = \frac{\delta^2}{2}.
$$
We therefore obtain
$$
(22.1) \qquad P(yf(x) \le 0) \le P(yg(x) \le \delta) + e^{-k\delta^2/2},
$$
and the second term in the bound will be chosen to be equal to $1/n$. Similarly, one can show
$$
P_n(yg(x) \le 2\delta) \le P_n(yf(x) \le 3\delta) + e^{-k\delta^2/2}.
$$
Choose $k$ such that $e^{-k\delta^2/2} = 1/n$, i.e. $k = \frac{2}{\delta^2} \log n$.

Now define $\varphi_\delta$ as the piecewise-linear function (shown as a figure in the original notes) that equals $1$ up to $\delta$, decreases linearly from $1$ to $0$ on $[\delta, 2\delta]$, and equals $0$ beyond $2\delta$:
$$
\varphi_\delta(s) = \begin{cases} 1, & s \le \delta, \\ 2 - s/\delta, & \delta < s < 2\delta, \\ 0, & s \ge 2\delta. \end{cases}
$$
Observe that
$$
(22.2) \qquad I(s \le \delta) \le \varphi_\delta(s) \le I(s \le 2\delta).
$$
By the result of Lecture 21, with probability at least $1 - e^{-t}$, for all $k$, $\delta$ and any $g \in \mathcal{F}_k = \operatorname{conv}_k(H)$,
$$
\Phi\Bigl( E\varphi_\delta,\, \tfrac{1}{n}\textstyle\sum_i \varphi_\delta \Bigr) = \frac{E \varphi_\delta(yg(x)) - \frac{1}{n} \sum_{i=1}^{n} \varphi_\delta(y_i g(x_i))}{\sqrt{E \varphi_\delta(yg(x))}} \le K \sqrt{\frac{V k \log \frac{n}{\delta}}{n}} + \sqrt{\frac{t}{n}} = \frac{\varepsilon}{2}.
$$
With $k = \frac{2}{\delta^2}\log n$ this matches $\varepsilon(\delta)$ from the theorem up to constants absorbed into $K$; the $\min(T, (\log n)/\delta^2)$ in the theorem appears because for $T \le k$ one can take $g = f$ and apply the same bound with $k$ replaced by $T$.

Note that $\Phi(x, y) = \frac{x - y}{\sqrt{x}}$ is increasing in $x$ and decreasing in $y$. By inequalities (22.1) and (22.2),
$$
E \varphi_\delta(yg(x)) \ge P(yg(x) \le \delta) \ge P(yf(x) \le 0) - \frac{1}{n}
$$
and
$$
\frac{1}{n} \sum_{i=1}^{n} \varphi_\delta(y_i g(x_i)) \le P_n(yg(x) \le 2\delta) \le P_n(yf(x) \le 3\delta) + \frac{1}{n}.
$$
By decreasing $x$ and increasing $y$ in $\Phi(x, y)$, we decrease $\Phi(x, y)$. Hence,
$$
\Phi\Bigl( \underbrace{P(yf(x) \le 0) - \tfrac{1}{n}}_{x},\; \underbrace{P_n(yf(x) \le 3\delta) + \tfrac{1}{n}}_{y} \Bigr) \le K \sqrt{\frac{V k \log \frac{n}{\delta}}{n}} + \sqrt{\frac{t}{n}},
$$
where $k = \frac{2}{\delta^2} \log n$.

If $\frac{x - y}{\sqrt{x}} \le \varepsilon$, we have
$$
x \le \Biggl( \frac{\varepsilon}{2} + \sqrt{\Bigl( \frac{\varepsilon}{2} \Bigr)^2 + y} \Biggr)^2.
$$
So,
$$
P(yf(x) \le 0) - \frac{1}{n} \le \Biggl( \frac{\varepsilon}{2} + \sqrt{\Bigl( \frac{\varepsilon}{2} \Bigr)^2 + P_n(yf(x) \le 3\delta) + \frac{1}{n}} \Biggr)^2.
$$
Renaming $3\delta$ as $\delta$ and absorbing the additive $1/n$ terms and numerical constants into $K$ yields the statement of the theorem. $\square$
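The last step uses a quadratic-root bound that the notes state without proof; here is a short derivation, added for completeness. Since $\sqrt{x} > 0$, the hypothesis $\frac{x-y}{\sqrt{x}} \le \varepsilon$ is equivalent to $x - \varepsilon\sqrt{x} - y \le 0$. Viewing the left-hand side as a quadratic in $u = \sqrt{x} \ge 0$, the inequality $u^2 - \varepsilon u - y \le 0$ forces $u$ to lie below the larger root:
$$
\sqrt{x} \;\le\; \frac{\varepsilon + \sqrt{\varepsilon^2 + 4y}}{2} \;=\; \frac{\varepsilon}{2} + \sqrt{\Bigl(\frac{\varepsilon}{2}\Bigr)^2 + y},
$$
and squaring both sides gives $x \le \bigl( \frac{\varepsilon}{2} + \sqrt{(\varepsilon/2)^2 + y} \bigr)^2$.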
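To see how the bound of Theorem 22.1 behaves numerically, the following minimal Python sketch evaluates its right-hand side on a grid of $\delta$. This is an illustration, not part of the notes: the constant $K$, the VC dimension $V$ of the base class $H$, the confidence parameter $t$, and the sample margins $y_i f(x_i)$ are all assumed values.

```python
# Minimal sketch: evaluate the right-hand side of Theorem 22.1 over a grid
# of delta in (0, 1). K, V, t, and the margins are illustrative assumptions.
import numpy as np

def margin_bound(margins, T, V=10, K=1.0, t=1.0):
    """Return inf over delta of (eps + sqrt(P_n(yf <= delta) + eps^2))^2."""
    n = len(margins)
    best = np.inf
    for d in np.linspace(0.01, 0.99, 99):
        p_n = np.mean(margins <= d)            # empirical margin error P_n(yf(x) <= delta)
        eff = min(T, np.log(n) / d**2)         # min(T, (log n)/delta^2)
        eps = K * (np.sqrt(V * eff * np.log(n / d) / n) + np.sqrt(t / n))
        best = min(best, (eps + np.sqrt(p_n + eps**2))**2)
    return best

# Usage with hypothetical margins of a T = 100 term voting classifier.
rng = np.random.default_rng(0)
margins = np.clip(rng.normal(loc=0.4, scale=0.3, size=2000), -1.0, 1.0)
print(f"bound on P(yf(x) <= 0): {margin_bound(margins, T=100):.4f}")
```

The two terms inside $\varepsilon$ reproduce the trade-off noted in the Remark: a small $\delta$ drives $P_n(yf(x) \le \delta)$ down but inflates the complexity term through $\min(T, (\log n)/\delta^2)$.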
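The heart of the proof is that a random vote $g$ over only $k = \frac{2}{\delta^2}\log n$ sampled terms rarely exceeds $f$ by more than the margin $\delta$. The sketch below checks the conditional tail in (22.1) by simulation at a single point $x$ with $yf(x) \le 0$; the base-classifier values $h_i(x)$ and the weights $\lambda_i$ are hypothetical.

```python
# Monte Carlo sketch of the sparsification step behind (22.1): at a point x
# with y*f(x) <= 0, sample g = (1/k) sum_j Y_j with P(Y_j = h_i) = lambda_i
# and compare the frequency of {y*g(x) > delta} to exp(-k*delta^2/2) = 1/n.
import numpy as np

rng = np.random.default_rng(0)

y = 1.0
h_x = np.array([+1.0, -1.0, -1.0, +1.0])   # hypothetical values h_i(x)
lam = np.array([0.2, 0.3, 0.3, 0.2])       # weights lambda_i (sum to 1, so P(Y_j = 0) = 0)
f_x = float(np.dot(lam, h_x))              # f(x) = E Y_j(x) = -0.2 <= 0

n, delta = 1000, 0.3
k = int(np.ceil(2 / delta**2 * np.log(n)))  # k = (2/delta^2) log n, so e^{-k delta^2/2} <= 1/n

trials = 20_000
draws = rng.choice(h_x, size=(trials, k), p=lam)   # each row: Y_1(x), ..., Y_k(x)
yg = y * draws.mean(axis=1)                        # y * g(x) for each sampled g

print(f"P(y g(x) > delta | y f(x) <= 0) ~= {np.mean(yg > delta):.5f}")
print(f"e^(-k delta^2/2) = {np.exp(-k * delta**2 / 2):.5f}   (1/n = {1/n:.5f})")
```

Since Hoeffding's inequality is loose at this scale, the simulated frequency typically sits far below $1/n$; the proof only needs the bound to hold, not to be tight.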