Lecture 30. Generalization bounds for kernel methods. 18.465

Let $\mathcal{X} \subset \mathbb{R}^d$ be a compact subset. Assume $x_1, \ldots, x_n$ are i.i.d., and $y_1, \ldots, y_n \in \{\pm 1\}$ for classification or $y_1, \ldots, y_n \in [-1, 1]$ for regression. Assume we have a kernel
$$K(x, y) = \sum_{i=1}^{\infty} \lambda_i \phi_i(x) \phi_i(y), \qquad \lambda_i > 0.$$
Consider the map
$$x \in \mathcal{X} \;\mapsto\; \phi(x) = \bigl( \sqrt{\lambda_1}\,\phi_1(x), \ldots, \sqrt{\lambda_k}\,\phi_k(x), \ldots \bigr) = \bigl( \sqrt{\lambda_k}\,\phi_k(x) \bigr)_{k \ge 1} \in \mathcal{H},$$
where $\mathcal{H}$ is a Hilbert space. Consider the scalar product in $\mathcal{H}$: $(u, v)_{\mathcal{H}} = \sum_{i=1}^{\infty} u_i v_i$, with norm $\|u\|_{\mathcal{H}} = \sqrt{(u, u)_{\mathcal{H}}}$. For $x, y \in \mathcal{X}$,
$$(\phi(x), \phi(y))_{\mathcal{H}} = \sum_{i=1}^{\infty} \lambda_i \phi_i(x) \phi_i(y) = K(x, y).$$
The function $\phi$ is called the feature map.

Family of classifiers:
$$\mathcal{F}_{\mathcal{H}} = \bigl\{ (w, z)_{\mathcal{H}} : \|w\|_{\mathcal{H}} \le 1 \bigr\}, \qquad \mathcal{F} = \bigl\{ (w, \phi(x))_{\mathcal{H}} : \|w\|_{\mathcal{H}} \le 1 \bigr\} \ni f : \mathcal{X} \to \mathbb{R}.$$

Algorithms:

(1) SVMs:
$$f(x) = \sum_{i=1}^{n} \alpha_i K(x_i, x) = \Bigl( \underbrace{\sum_{i=1}^{n} \alpha_i \phi(x_i)}_{w},\; \phi(x) \Bigr)_{\mathcal{H}}.$$
Here, instead of taking an arbitrary $w$, we take $w$ only as a linear combination of the images of the data points.

We have a choice of loss function $L$:
- $L(y, f(x)) = I(yf(x) \le 0)$ — classification
- $L(y, f(x)) = (y - f(x))^2$ — regression

(2) Square-loss regularization.

Assume an algorithm outputs a classifier from $\mathcal{F}$ (or $\mathcal{F}_{\mathcal{H}}$), $f(x) = (w, \phi(x))_{\mathcal{H}}$. Then, as in Lecture 20,
$$P(yf(x) \le 0) \le \mathbb{E}\varphi_\delta(yf(x)) = \frac{1}{n}\sum_{i=1}^{n} \varphi_\delta(y_i f(x_i)) + \Bigl( \mathbb{E}\varphi_\delta(yf(x)) - \frac{1}{n}\sum_{i=1}^{n} \varphi_\delta(y_i f(x_i)) \Bigr)$$
$$\le \frac{1}{n}\sum_{i=1}^{n} I(y_i f(x_i) \le \delta) + \sup_{f \in \mathcal{F}} \Bigl( \mathbb{E}\varphi_\delta(yf(x)) - \frac{1}{n}\sum_{i=1}^{n} \varphi_\delta(y_i f(x_i)) \Bigr).$$
By McDiarmid's inequality, with probability at least $1 - e^{-t}$,
$$\sup_{f \in \mathcal{F}} \Bigl( \mathbb{E}\varphi_\delta(yf(x)) - \frac{1}{n}\sum_{i=1}^{n} \varphi_\delta(y_i f(x_i)) \Bigr) \le \mathbb{E}\sup_{f \in \mathcal{F}} \Bigl( \mathbb{E}\varphi_\delta(yf(x)) - \frac{1}{n}\sum_{i=1}^{n} \varphi_\delta(y_i f(x_i)) \Bigr) + \sqrt{\frac{2t}{n}}.$$
Using the symmetrization technique,
$$\mathbb{E}\sup_{f \in \mathcal{F}} \Bigl( \mathbb{E}\bigl(\varphi_\delta(yf(x)) - 1\bigr) - \frac{1}{n}\sum_{i=1}^{n} \bigl(\varphi_\delta(y_i f(x_i)) - 1\bigr) \Bigr) \le 2\,\mathbb{E}\sup_{f \in \mathcal{F}} \Bigl| \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i \bigl(\varphi_\delta(y_i f(x_i)) - 1\bigr) \Bigr|.$$
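The identity $(\phi(x), \phi(y))_{\mathcal{H}} = K(x, y)$ can be checked numerically for a kernel whose feature map is finite and explicit. A minimal sketch (not from the lecture; the choice of kernel is an illustrative assumption) using the degree-2 homogeneous polynomial kernel $K(x, y) = (x \cdot y)^2$ on $\mathbb{R}^2$, whose feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$:

```python
import numpy as np

# Degree-2 polynomial kernel on R^2 (illustrative example, not the
# lecture's kernel): K(x, y) = (x . y)^2.
def K(x, y):
    return np.dot(x, y) ** 2

# Its explicit finite-dimensional feature map:
#   phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

# Check (phi(x), phi(y))_H = K(x, y) on random points.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)
assert np.isclose(np.dot(phi(x), phi(y)), K(x, y))
```

The same check works for any Mercer kernel with a known (truncated) expansion; here the sum in $K(x, y) = \sum_i \lambda_i \phi_i(x)\phi_i(y)$ is simply finite.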
Since $\delta \cdot (\varphi_\delta - 1)$ is a contraction (it is $1$-Lipschitz and vanishes at the origin, since $\varphi_\delta(0) = 1$), the comparison inequality gives
$$2\,\mathbb{E}\sup_{f \in \mathcal{F}} \Bigl| \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i \bigl(\varphi_\delta(y_i f(x_i)) - 1\bigr) \Bigr| \le \frac{2}{\delta} \cdot 2\,\mathbb{E}\sup_{f \in \mathcal{F}} \Bigl| \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i y_i f(x_i) \Bigr|$$
$$= \frac{4}{\delta}\,\mathbb{E}\sup_{f \in \mathcal{F}} \Bigl| \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i f(x_i) \Bigr| = \frac{4}{\delta}\,\mathbb{E}\sup_{\|w\| \le 1} \Bigl| \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i (w, \phi(x_i))_{\mathcal{H}} \Bigr|$$
(since $\varepsilon_i y_i$ has the same distribution as $\varepsilon_i$)
$$= \frac{4}{\delta n}\,\mathbb{E}\sup_{\|w\| \le 1} \Bigl| \Bigl( w, \sum_{i=1}^{n} \varepsilon_i \phi(x_i) \Bigr)_{\mathcal{H}} \Bigr| = \frac{4}{\delta n}\,\mathbb{E}\Bigl\| \sum_{i=1}^{n} \varepsilon_i \phi(x_i) \Bigr\|_{\mathcal{H}}$$
$$= \frac{4}{\delta n}\,\mathbb{E}\sqrt{\Bigl( \sum_{i=1}^{n} \varepsilon_i \phi(x_i), \sum_{i=1}^{n} \varepsilon_i \phi(x_i) \Bigr)_{\mathcal{H}}} = \frac{4}{\delta n}\,\mathbb{E}\sqrt{\sum_{i,j} \varepsilon_i \varepsilon_j \bigl(\phi(x_i), \phi(x_j)\bigr)_{\mathcal{H}}}$$
$$= \frac{4}{\delta n}\,\mathbb{E}\sqrt{\sum_{i,j} \varepsilon_i \varepsilon_j K(x_i, x_j)} \le \frac{4}{\delta n}\sqrt{\mathbb{E}\sum_{i,j} \varepsilon_i \varepsilon_j K(x_i, x_j)} \quad \text{(Jensen's inequality)}$$
$$= \frac{4}{\delta n}\sqrt{\sum_{i=1}^{n} \mathbb{E}K(x_i, x_i)} = \frac{4}{\delta}\sqrt{\frac{\mathbb{E}K(x_1, x_1)}{n}},$$
where the cross terms vanish because $\mathbb{E}\varepsilon_i \varepsilon_j = 0$ for $i \ne j$.

Putting everything together, with probability at least $1 - e^{-t}$,
$$P(yf(x) \le 0) \le \frac{1}{n}\sum_{i=1}^{n} I(y_i f(x_i) \le \delta) + \frac{4}{\delta}\sqrt{\frac{\mathbb{E}K(x_1, x_1)}{n}} + \sqrt{\frac{2t}{n}}.$$

Before the contraction step, we could have used the martingale (McDiarmid) method again to condition on the data and keep only the expectation $\mathbb{E}_\varepsilon$ over the Rademacher variables. Then $\mathbb{E}K(x_1, x_1)$ in the above bound becomes $\frac{1}{n}\sum_{i=1}^{n} K(x_i, x_i)$.
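The chain of equalities above reduces the Rademacher average of the kernel class to $\frac{1}{n}\mathbb{E}\sqrt{\sum_{i,j} \varepsilon_i \varepsilon_j K(x_i, x_j)}$, which the Gram matrix lets us compute without ever forming $\phi$. A numerical sketch (the kernel, dimensions, and sample sizes are illustrative assumptions, not from the lecture) of the Jensen step
$$\mathbb{E}_\varepsilon\, \frac{1}{n}\Bigl\| \sum_{i=1}^{n} \varepsilon_i \phi(x_i) \Bigr\|_{\mathcal{H}} \le \frac{1}{n}\sqrt{\sum_{i=1}^{n} K(x_i, x_i)}:$$

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.standard_normal((n, d))

# Gram matrix G_ij = K(x_i, x_j) for the degree-2 polynomial kernel
# (an illustrative choice): K(x, y) = (x . y)^2.
G = (X @ X.T) ** 2

# Monte Carlo estimate of E_eps (1/n) |sum_i eps_i phi(x_i)|_H, using
# |sum_i eps_i phi(x_i)|_H^2 = eps^T G eps (the kernel trick).
eps = rng.choice([-1.0, 1.0], size=(10_000, n))
quad = np.einsum("ki,ij,kj->k", eps, G, eps)   # eps_k^T G eps_k per draw
lhs = np.mean(np.sqrt(np.maximum(quad, 0.0))) / n

# The bound after Jensen: (1/n) sqrt(sum_i K(x_i, x_i)) = sqrt(tr G)/n.
rhs = np.sqrt(np.trace(G)) / n
assert lhs <= rhs
```

The gap between the two sides is exactly the slack in Jensen's inequality $\mathbb{E}\sqrt{Q} \le \sqrt{\mathbb{E}Q}$; the right-hand side is the data-dependent quantity $\frac{1}{n}\sqrt{\sum_i K(x_i, x_i)}$ mentioned in the final remark.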