Lecture 24 Bounds in terms of sparsity (example). 18.465

Lecture 24 Bounds in terms of sparsity (example). 18.465 In this lecture, we give another example of margin-sparsity bound involved with mixture-of-experts type of models. Let H be a set of functions hi : X → [−1, +1] with finite VC dimension. Let C1 , · · · , Cm m be partitions of H into m clusters H = i=1 Ci . The elements in the convex hull convH takes the form �T � � � � f = i=1 λi hi = c∈{C1 ,··· ,Cm } αc h∈c λch · h, where T � m, i λi = 1, αc = h∈c λh , and λch = λh /αc for h ∈ c. We can approximate f by g as follows. For each cluster c, let {Ykc }k=1,··· ,N be random variables � � c c such that ∀h ∈ c ⊂ H, we have P(Ykc = h) = λch . Then EYkc = h∈c λh · h. Let Zk = c αc Yk and � � � � � N N g = c αc N1 k=1 Ykc = N1 k=1 Zk . Then EZk = Eg = f . We define σc2 = var(Zk ) = c αc2 var(Ykc ), where � 2 var(Ykc ) = �Ykc − EYkc �2 = h∈c λch (h − EYhc ) . (If we define {Yk }k=1,··· ,N be random variables such that �N ∀h ∈ H, P(Yk = h) = λh , and define g = N1 k=1 Yk , we might get much larger var(Yk )). . C2 . . ... . C1 h .. C m Recall that a classifier takes the form y = sign(f (x)) and a classification error corresponds to yf (x) < 0. We can bound the error by P(yf (x) < 0) ≤ P(yg ≤ δ) + P(σc2 > r) + P(yg > δ|yf (x) ≤ 0, σc2 < r). (24.1) The third term on the right side of inequality 24.1 can be bounded in the following way, � P(yg > δ|yf (x) ≤ 0, σc2 < r) = ≤ ≤ ≤ ≤ set ≤ (24.2) As a result, ∀N ≥ 4·r δ2 N 1 � P (yZk − EyZk ) > δ − yf (x)|yf (x) ≤ 0, σc2 < r N k=1 � � N 1 � 2 P (yZk − EyZk ) > δ|yf (x) ≤ 0, σc < r N k=1 � � N 2 δ2 exp − ,Bernstein’s inequality 2N σc2 + 23 N δ · 2 � � 2 2 �� N δ N 2 δ2 exp − min , 4N σc2 83 N δ � � N δ2 exp − , for r small enough 4r � 1 . n log n, inequality 24.2 is satisfied. To bound the first term on the right side of inequality 24.1, we note that EY1 ,··· ,YN P(yg ≤ δ) ≤ EY1 ,··· ,YN Eφδ (yg) and En φδ (yg) ≤ Pn (yg ≤ 2δ) for some φδ : 61 Lecture 24 Bounds in terms of sparsity (example). 18.465 ϕδ (s) 1 δ 2δ Any realization of g = �Nm k=1 s Zk , where Nm depends on the number of clusters (C1 , · · · , Cm ), is a linear combination of h ∈ H, and g ∈ convNm H. According to lemma 20.2, (Eφδ (yg) − En φδ (yg)) / �� Eφδ (yg) ≤ K � n V Nm log /n + u/n δ � with probability at least 1 − e−u . Using a technique developed earlier in this course, and taking the union bound over all m, δ, we get, with probability at least 1 − Ke−u , � P(yg ≤ δ) ≤ K inf m,δ Pn (yg ≤ 2δ) + (Since EPn (yg ≤ 2δ) ≤ EPn (yf (x) ≤ 3δ) + EPn (σc2 ≥ r) + V · Nm n u log + n δ n 1 n � . with appropriate choice of N , based on the same reasoning as inequality 24.1, we can also control Pn (yg ≤ 2δ) by Pn (yf ≤ 3δ) and Pn (σc2 ≥ r) probabilistically). To bound the second term on the right side of inequality 24.1, we approximate σc2 by � �2 �N (1) (2) (1) (2) 2 σN = N1 k=1 12 Zk − Zk where Zk and Zk are independent copies of Zk . We have 2 EY (1,2) σN = 1,··· ,N �2 1 � (1) (2) Zk − Zk 1,··· ,N 2 varY (1,2) σc2 � �2 1 (1) (2) var Zk − Zk 4 �4 1 � (1) (2) E Zk − Zk ≤ 4 � � � �2 (1) (2) (1) (2) −1 ≤ Zk , Zk ≤ 1 ,and Zk − Zk ≤4 = � �2 (1) (2) ≤ E Zk − Zk = 2 varY (1,2) σN 1,··· ,N 2σc2 ≤ 2 · σc2 . 62 Lecture 24 Bounds in terms of sparsity (example). 18.465 We start with 2 2 PY1,··· ,N (σc2 ≥ 4r) ≤ PY (1,2) (σN ≥ 3r) + PY (1,2) (σc2 ≥ 4r|σN ≤ 3r) 1,··· ,N 1,··· ,N 2 ≤ EY (1,2) φr σN ≥ 3r + � � 1,··· ,N 1 n with appropriate choice of N , following the same line of reasoning as in inequality 24.1. We note that 2 2 2 2 PY1 ,··· ,YN (σN ≥ 3r) ≤ EY1 ,··· ,YN φr (σN ), and En φδ (σN ) ≤ Pn (σN ≥ 2r) for some φδ . ϕδ (s) 1 δ 2δ 3δ 4δ Since 2 σN s � � N � 2 1 � � � (1) (2) (1) (2) ∈{ αc hk,c − hk,c : hk,c , hk,c ∈ H} ⊂ convNm {hi · hj : hi , hj ∈ H}, 2N c k=1 and log D({hi · hj : hi , hj ∈ H}, �) ≤ KV log hj : hi , hj ∈ H}, �) ≤ KV · Nm · log 2 � by the assumption of our problem, we have log D(convNm {hi · 2 � by the VC inequality, and �� n 2 2 2 ) ≤ K Eφr (σN ) − En φr (σN ) / Eφr (σN V · Nm log /n + u/n r with probability at least 1 − e−u . Using a technique developed earlier in this course, and taking the union bound over all m, δ, r, with probability at least 1 − Ke−u , � � 1 V · Nm n u 2 P(σc2 ≥ 4r) ≤ K inf Pn (σN ≥ 2r) + + log + . m,δ,r n n δ n As a result, with probability at least 1 − Ke−u , we have � � V · min(rm /δ 2 , Nm ) n u 2 P(yf (x) ≤ 0) ≤ K · inf Pn (yg ≤ 2 · δ) + Pn (σN ≥ r) + log log n + r,δ,m n δ n for all f ∈ convH. 63

Lecture 24 Bounds in terms of sparsity (example). 18.465

Related documents

Products

Support

Lecture 24 Bounds in terms of sparsity (example). 18.465

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib