# Lecture 31 Optimistic VC inequality for random classes of sets. 18.465

```Lecture 31
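As in the previous lecture, let $\mathcal{F} = \{ \langle w, \phi(x) \rangle_{\mathcal{H}} : \|w\| \le 1 \}$, where $\phi(x) = (\sqrt{\lambda_i}\, \phi_i(x))_{i \ge 1}$ and $\mathcal{X} \subset \mathbb{R}^d$.
Define $d(f, g) = \|f - g\|_\infty = \sup_{x \in \mathcal{X}} |f(x) - g(x)|$.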
The following theorem appears in Cucker & Smale:
Theorem 31.1. For all $h \ge d$,
$$\log \mathcal{N}(\mathcal{F}, \varepsilon, d) \le \left( \frac{C_h}{\varepsilon} \right)^{2d/h},$$
where $C_h$ is a constant.
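Note that for any $x_1, \ldots, x_n$,
$$d_x(f, g) = \left( \frac{1}{n} \sum_{i=1}^n (f(x_i) - g(x_i))^2 \right)^{1/2} \le d(f, g) = \sup_x |f(x) - g(x)|,$$
so $d(f, g) \le \varepsilon$ implies $d_x(f, g) \le \varepsilon$.
Hence,
$$\mathcal{N}(\mathcal{F}, \varepsilon, d_x) \le \mathcal{N}(\mathcal{F}, \varepsilon, d).$$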
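Assume the loss function is $L(y, f(x)) = (y - f(x))^2$. The loss class is defined as
$$L(y, \mathcal{F}) = \{ (y - f(x))^2 : f \in \mathcal{F} \}.$$
Suppose $|y - f(x)| \le M$ for all $f \in \mathcal{F}$. Then, whenever $|f(x) - g(x)| \le \varepsilon / (2M)$,
$$|(y - f(x))^2 - (y - g(x))^2| \le 2M |f(x) - g(x)| \le \varepsilon.$$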
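So,
$$\mathcal{N}(L(y, \mathcal{F}), \varepsilon, d_x) \le \mathcal{N}\left( \mathcal{F}, \frac{\varepsilon}{2M}, d_x \right)$$
and
$$\log \mathcal{N}(L(y, \mathcal{F}), \varepsilon, d_x) \le \left( \frac{2M C_h}{\varepsilon} \right)^{2d/h} = \left( \frac{2M C_h}{\varepsilon} \right)^{\alpha}, \qquad \alpha = \frac{2d}{h} < 2$$
(see Homework 2, problem 4).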
Now we would like to use the specific form of the SVM solution, $f(x) = \sum_{i=1}^n \alpha_i K(x_i, x)$; that is, $f$ belongs to a random subclass that depends on the sample. We now prove a VC inequality for a random collection of sets.
Consider a random collection of sets $\mathcal{C}(x_1, \ldots, x_n) = \{ C : C \subseteq \mathcal{X} \}$. Assume that $\mathcal{C}(x_1, \ldots, x_n)$ satisfies:
(1) $\mathcal{C}(x_1, \ldots, x_n) \subseteq \mathcal{C}(x_1, \ldots, x_n, x_{n+1})$;
(2) $\mathcal{C}(\pi(x_1, \ldots, x_n)) = \mathcal{C}(x_1, \ldots, x_n)$ for any permutation $\pi$.
Let
$$\Delta_{\mathcal{C}}(x_1, \ldots, x_n) = \mathrm{card} \{ C \cap \{x_1, \ldots, x_n\} : C \in \mathcal{C} \}$$
and
$$G(n) = \mathbb{E} \Delta_{\mathcal{C}(x_1, \ldots, x_n)}(x_1, \ldots, x_n).$$
Theorem 31.2.
$$\mathbb{P}\left( \sup_{C \in \mathcal{C}(x_1, \ldots, x_n)} \frac{P(C) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C)}{\sqrt{P(C)}} \ge t \right) \le 4 G(2n) e^{-nt^2/4}.$$
Consider the event
$$A_x = \left\{ x = (x_1, \ldots, x_n) : \sup_{C \in \mathcal{C}(x_1, \ldots, x_n)} \frac{P(C) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C)}{\sqrt{P(C)}} \ge t \right\}.$$
So, there exists $C_x \in \mathcal{C}(x_1, \ldots, x_n)$ such that
$$\frac{P(C_x) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C_x)}{\sqrt{P(C_x)}} \ge t.$$
For $x'_1, \ldots, x'_n$, an independent copy of $x$,
$$\mathbb{P}_{x'}\left( P(C_x) \le \frac{1}{n} \sum_{i=1}^n I(x'_i \in C_x) \right) \ge \frac{1}{4}$$
if $P(C_x) \ge \frac{1}{n}$ (which we can assume without loss of generality).
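Together,
$$P(C_x) \le \frac{1}{n} \sum_{i=1}^n I(x'_i \in C_x)$$
and
$$\frac{P(C_x) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C_x)}{\sqrt{P(C_x)}} \ge t$$
imply
$$\frac{\frac{1}{n} \sum_{i=1}^n I(x'_i \in C_x) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C_x)}{\sqrt{\frac{1}{2n} \sum_{i=1}^n \left( I(x'_i \in C_x) + I(x_i \in C_x) \right)}} \ge t.$$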
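Indeed,
$$\begin{aligned}
0 < t &\le \frac{P(C_x) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C_x)}{\sqrt{P(C_x)}} \\
&\le \frac{P(C_x) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C_x)}{\sqrt{\frac{1}{2} \left( P(C_x) + \frac{1}{n} \sum_{i=1}^n I(x_i \in C_x) \right)}} \\
&\le \frac{\frac{1}{n} \sum_{i=1}^n I(x'_i \in C_x) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C_x)}{\sqrt{\frac{1}{2} \left( \frac{1}{n} \sum_{i=1}^n I(x'_i \in C_x) + \frac{1}{n} \sum_{i=1}^n I(x_i \in C_x) \right)}}.
\end{aligned}$$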
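Hence, multiplying by an indicator,
$$\begin{aligned}
\frac{1}{4} \cdot I(x \in A_x) &\le \mathbb{P}_{x'}\left( P(C_x) \le \frac{1}{n} \sum_{i=1}^n I(x'_i \in C_x) \right) \cdot I(x \in A_x) \\
&\le \mathbb{P}_{x'}\left( \frac{\frac{1}{n} \sum_{i=1}^n I(x'_i \in C_x) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C_x)}{\sqrt{\frac{1}{2} \left( \frac{1}{n} \sum_{i=1}^n I(x'_i \in C_x) + \frac{1}{n} \sum_{i=1}^n I(x_i \in C_x) \right)}} \ge t \right) \\
&\le \mathbb{P}_{x'}\left( \sup_{C \in \mathcal{C}(x_1, \ldots, x_n)} \frac{\frac{1}{n} \sum_{i=1}^n I(x'_i \in C) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C)}{\sqrt{\frac{1}{2} \left( \frac{1}{n} \sum_{i=1}^n I(x'_i \in C) + \frac{1}{n} \sum_{i=1}^n I(x_i \in C) \right)}} \ge t \right).
\end{aligned}$$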
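Taking expectation with respect to $x$ on both sides,
$$\begin{aligned}
&\mathbb{P}\left( \sup_{C \in \mathcal{C}(x_1, \ldots, x_n)} \frac{P(C) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C)}{\sqrt{P(C)}} \ge t \right) \\
&\quad \le 4\, \mathbb{P}\left( \sup_{C \in \mathcal{C}(x_1, \ldots, x_n)} \frac{\frac{1}{n} \sum_{i=1}^n I(x'_i \in C) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C)}{\sqrt{\frac{1}{2} \left( \frac{1}{n} \sum_{i=1}^n I(x'_i \in C) + \frac{1}{n} \sum_{i=1}^n I(x_i \in C) \right)}} \ge t \right) \\
&\quad \le 4\, \mathbb{P}\left( \sup_{C \in \mathcal{C}(x_1, \ldots, x_n, x'_1, \ldots, x'_n)} \frac{\frac{1}{n} \sum_{i=1}^n I(x'_i \in C) - \frac{1}{n} \sum_{i=1}^n I(x_i \in C)}{\sqrt{\frac{1}{2} \left( \frac{1}{n} \sum_{i=1}^n I(x'_i \in C) + \frac{1}{n} \sum_{i=1}^n I(x_i \in C) \right)}} \ge t \right) \\
&\quad = 4\, \mathbb{P}\left( \sup_{C \in \mathcal{C}(x_1, \ldots, x_n, x'_1, \ldots, x'_n)} \frac{\frac{1}{n} \sum_{i=1}^n \varepsilon_i \left( I(x'_i \in C) - I(x_i \in C) \right)}{\sqrt{\frac{1}{2} \left( \frac{1}{n} \sum_{i=1}^n I(x'_i \in C) + \frac{1}{n} \sum_{i=1}^n I(x_i \in C) \right)}} \ge t \right) \\
&\quad = 4\, \mathbb{E}\, \mathbb{P}_{\varepsilon}\left( \sup_{C \in \mathcal{C}(x_1, \ldots, x_n, x'_1, \ldots, x'_n)} \frac{\frac{1}{n} \sum_{i=1}^n \varepsilon_i \left( I(x'_i \in C) - I(x_i \in C) \right)}{\sqrt{\frac{1}{2} \left( \frac{1}{n} \sum_{i=1}^n I(x'_i \in C) + \frac{1}{n} \sum_{i=1}^n I(x_i \in C) \right)}} \ge t \right).
\end{aligned}$$
The second inequality uses property (1), which gives $\mathcal{C}(x_1, \ldots, x_n) \subseteq \mathcal{C}(x_1, \ldots, x_n, x'_1, \ldots, x'_n)$. The first equality uses property (2): swapping $x_i$ and $x'_i$ leaves the collection unchanged, so i.i.d. random signs $\varepsilon_i = \pm 1$ can be inserted without changing the probability.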
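By Hoeffding's inequality, applied conditionally on $x, x'$ to each of the at most $\Delta_{\mathcal{C}(x_1, \ldots, x_n, x'_1, \ldots, x'_n)}(x_1, \ldots, x_n, x'_1, \ldots, x'_n)$ distinct restrictions of sets $C$ to the double sample, together with a union bound,
$$\begin{aligned}
&4\, \mathbb{E}\, \mathbb{P}_{\varepsilon}\left( \sup_{C \in \mathcal{C}(x_1, \ldots, x_n, x'_1, \ldots, x'_n)} \frac{\frac{1}{n} \sum_{i=1}^n \varepsilon_i \left( I(x'_i \in C) - I(x_i \in C) \right)}{\sqrt{\frac{1}{2} \left( \frac{1}{n} \sum_{i=1}^n I(x'_i \in C) + \frac{1}{n} \sum_{i=1}^n I(x_i \in C) \right)}} \ge t \right) \\
&\quad \le 4\, \mathbb{E} \left[ \Delta_{\mathcal{C}(x_1, \ldots, x_n, x'_1, \ldots, x'_n)}(x_1, \ldots, x_n, x'_1, \ldots, x'_n) \cdot \max_{C} \exp\left( - \frac{t^2}{2 \sum_{i=1}^n \left( \frac{\frac{1}{n} \left( I(x'_i \in C) - I(x_i \in C) \right)}{\sqrt{\frac{1}{2n} \sum_{j=1}^n \left( I(x'_j \in C) + I(x_j \in C) \right)}} \right)^2} \right) \right] \\
&\quad \le 4\, \mathbb{E} \Delta_{\mathcal{C}(x_1, \ldots, x_n, x'_1, \ldots, x'_n)}(x_1, \ldots, x_n, x'_1, \ldots, x'_n) \cdot e^{-nt^2/4} \\
&\quad = 4 G(2n) e^{-nt^2/4}.
\end{aligned}$$
The second inequality holds because $\left( I(x'_i \in C) - I(x_i \in C) \right)^2 \le I(x'_i \in C) + I(x_i \in C)$, so the sum of squares in the exponent is at most $2/n$.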
```
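Two deterministic facts carry the proof above: the implication used in the "Together ... imply" step, and the bound $2/n$ on the sum of squared coefficients in the Hoeffding exponent. The following is a minimal numerical sanity check of both, not part of the lecture. The function names and the sampling scheme are illustrative assumptions; here `p` stands for $P(C_x)$, `nu` for $\frac{1}{n}\sum_i I(x_i \in C_x)$ and `nu_prime` for $\frac{1}{n}\sum_i I(x'_i \in C_x)$.

```python
import math
import random


def normalized_gap(p, nu):
    """One-sample normalized deviation (p - nu) / sqrt(p)."""
    return (p - nu) / math.sqrt(p)


def symmetrized_gap(nu_prime, nu):
    """Two-sample normalized deviation (nu' - nu) / sqrt((nu' + nu) / 2)."""
    return (nu_prime - nu) / math.sqrt((nu_prime + nu) / 2)


def count_violations(trials=100_000, seed=0):
    """Count random triples with nu < p <= nu' where the symmetrized gap
    falls below the one-sample gap (the proof says this never happens)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        p = rng.uniform(1e-3, 1.0)
        nu = 0.999 * rng.uniform(0.0, p)   # keeps nu strictly below p, so t > 0
        nu_prime = rng.uniform(p, 1.0)     # enforces p <= nu'
        if symmetrized_gap(nu_prime, nu) < normalized_gap(p, nu) - 1e-12:
            violations += 1
    return violations


def hoeffding_sum_of_squares(ind, ind_prime):
    """Sum over i of a_i^2, where
    a_i = (1/n)(I'_i - I_i) / sqrt((1/(2n)) * sum_j (I'_j + I_j)).
    The proof bounds this quantity by 2/n."""
    n = len(ind)
    total = sum(a + b for a, b in zip(ind, ind_prime))
    if total == 0:
        return 0.0  # all indicators vanish; the numerator is zero as well
    numerator = sum(((b - a) / n) ** 2 for a, b in zip(ind, ind_prime))
    return numerator / (total / (2 * n))


if __name__ == "__main__":
    print("violations:", count_violations())
    rng = random.Random(1)
    n = 50
    ind = [rng.randint(0, 1) for _ in range(n)]
    ind_prime = [rng.randint(0, 1) for _ in range(n)]
    print("sum of squares:", hoeffding_sum_of_squares(ind, ind_prime), "bound:", 2 / n)
```

A random search of this kind cannot prove the inequalities; it only guards against sign or normalization slips when the displayed formulas are transcribed.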