Lecture 11 Optimistic VC inequality. 18.465

advertisement
Lecture 11
Optimistic VC inequality.
18.465
Last time we proved the Pessimistic VC inequality:
�
�
�
�
�V
n
�1 �
�
nt2
2en
�
�
I(Xi ∈ C) − P (C)� ≥ t ≤ 4
P sup �
e− 8 ,
�
V
C � n i=1
�
which can be rewritten with
�
t=
8
n
�
�
2en
log 4 + V log
+u
V
as
�
� � �
��
n
�1 �
�
8
2en
�
�
P sup �
I(Xi ∈ C) − P (C)� ≤
log 4 + V log
+u
≥ 1 − e−u .
�
n
V
C � n i=1
�
n
Hence, the rate is V log
. In this lecture we will prove Optimistic VC inequality, which will improve on
n
�
this rate when P (C) is small.
As before, we have pairs (Xi , Yi ), Yi = ±1. These examples are labeled according to some unknown C0 such
that Y = 1 if X = C0 and Y = 0 if X ∈
/ C0 .
Let C = {C : C ⊆ X }, a set of classifiers. C makes a mistake if
X ∈ C \ C0 ∪ C0 \ C = C�C0 .
Similarly to last lecture, we can derive bounds on
�
�
n
�1 �
�
�
�
sup �
I(Xi ∈ C�C0 ) − P (C�C0 )� ,
�
�
n i=1
C
where P (C �C0 ) is the generalization error.
Let C � = {C�C0 : C ∈ C}. One can prove that V C(C � ) ≤ V C(C) and �n (C � , X1 , . . . , Xn ) ≤ �n (C, X1 , . . . , Xn ).
By Hoeffding-Chernoff, if P (C) ≤ 12 ,
�
n
1�
P P (C) −
I(Xi ∈ C) ≤
n i=1
�
2P (C) t
n
�
≥ 1 − e−t .
Theorem 11.1. [Optimistic VC inequality]
�
�
�n
�
�V
P (C) − n1 i=1 I(Xi ∈ C)
nt2
2en
�
P sup
≥t ≤4
e− 4 .
V
P (C)
C
Proof. Let C be fixed. Then
�
�
n
1�
1
�
P(Xi� )
I(Xi ∈ C) ≥ P (C) ≥
n i=1
4
�
�n
n
whenever P (C) ≥ n1 . Indeed, P (C) ≥ n1 since i=1 I(Xi� ∈ C) ≥ nP (C) ≥ 1. Otherwise P ( i=1 I(Xi� ∈ C) = 0) =
�n
n
�
/ C) = (1 − P (C)) can be as close to 0 as we want.
i=1 P (Xi ∈
Similarly to the proof of the previous lecture, let
�
�
�n
P (C) − n1 i=1 I(Xi ∈ C)
�
(Xi ) ∈ sup
≥t .
P (C)
C
24
Lecture 11
Optimistic VC inequality.
Hence, there exists CX such that
18.465
�n
I(Xi ∈ C)
� i=1
≥ t.
P (C)
P (C) −
1
n
P (CX ) −
1
n
Exercise 1. Show that if
and
�n
I(Xi ∈ CX )
� i=1
≥t
P (CX )
n
1�
I(Xi� ∈ CX ) ≥ P (CX ) ,
n i=1
then
1
n
�n
i=1
I(Xi� ∈ CX ) −
1
n
�n
n
Hint: use the fact that φ(s) =
s−a
√
s
I(Xi ∈ CX )
t
≥√ .
�
2
i=1 I(Xi ∈ CX )
i=1
� �
n
1
�n
I(Xi ∈ CX ) + n1
√
= s − √ss is increasing in s.
i=1
From the above exercise it follows that
�
� n
�
�
1
1�
�
�
≤ P(X � )
I(Xi ∈ CX ) ≥ P (CX ) �∃CX
�
4
n i=1
⎛
⎞
�
�n
�n
�
1
1
�
t �
i=1 I(Xi ∈ CX ) − n
i=1 I(Xi ∈ CX )
≤ P(X � ) ⎝ �n �
≥ √ �∃CX ⎠
�n
n
1
1
�
2�
i=1 I(Xi ∈ CX ) + n
i=1 I(Xi ∈ CX )
n
Since indicator is 0, 1-valued,
⎛
⎞
�n
⎜
⎟
⎟
⎜
P (C) − n1 i=1 I(Xi ∈ C)
1 ⎜
�
I sup
≥ t⎟
⎟
4 ⎜
P
(C)
C
⎠
⎝
�
��
�
∃CX
⎛
⎞
�
�
∈ CX ) −
t �
i=1 I(Xi ∈ CX )
≤ P(X � ) ⎝ � �
≥ √ �∃CX ⎠ · I (∃CX )
�n
n
1
1
�
2�
i=1 I(Xi ∈ CX ) + n
i=1 I(Xi ∈ CX )
n
⎛
⎞
�n
�n
1
1
�
I(X
∈
C)
−
I(X
∈
C)
t
i
i
i=1
i=1
n
≤ P(X � ) ⎝sup �n �
≥ √ ⎠.
�n
n
1
1
2
C
I(Xi ∈ C) +
I(X � ∈ C)
1
n
�n
�
i=1 I(Xi
n
i=1
1
n
�n
n
i=1
i
Hence,
�
�
�n
P (CX ) − n1 i=1 I(Xi ∈ CX )
1
�
P sup
≥t
4
P (CX )
C
⎛
⎞
�n
�n
1
1
�
I(X
∈
C)
−
I(X
∈
C)
t
i
i
i=1
i=1
n
≤ P ⎝sup �n �
≥√ ⎠
�n
n
1
1
� ∈ C)
2
C
I(X
∈
C)
+
I(X
i
i
i=1
i=1
n
n
⎛
⎞
�n
1
�
εi (I(Xi ∈ C) − I(Xi ∈ C))
t
≥ √ ⎠.
= EPε ⎝sup � n� i=1
�n
n
1
1
�
2
C
i=1 I(Xi ∈ C) + n
i=1 I(Xi ∈ C)
n
25
Lecture 11
Optimistic VC inequality.
18.465
There exist C1 , . . . , CN , with N ≤ �2n (C, X1 , . . . , Xn , X1� , . . . , Xn� ). Therefore,
⎛
⎞
�n
1
�
ε
(I(X
∈
C)
−
I(X
∈
C))
t
i
i
i
EPε ⎝sup � n� i=1
≥√ ⎠
�n
n
1
1
� ∈ C)
2
C
I(X
∈
C)
+
I(X
i
i
i=1
i=1
n
n
⎧
⎫⎞
⎛
�n
⎬
1
�
⎨
εi (I(Xi ∈ Ck ) − I(Xi ∈ Ck ))
t ⎠
� n i=1
=EPε ⎝
≥√
⎩ 1 �n I(X ∈ C ) + 1 �n I(X � ∈ C )
2⎭
k≤N
i
i=1
n
⎛
k
n
i=1
i
k
⎞
�
ε
(I(X
∈
C
)
−
I
(X
∈
C
))
t
i
k
i
k
i
≤E
Pε ⎝ � � i=1
≥√ ⎠
�n
n
1
1
�
2
k=1
i=1 I(Xi ∈ Ck ) + n
i=1 I(Xi ∈ Ck )
n
�
⎛
⎞
� n
N
n
n
�
�
�
�
�
t
1
1
1
εi (I(Xi� ∈ Ck ) − I(Xi ∈ Ck )) ≥ √ �
I(Xi ∈ Ck ) +
I(Xi� ∈ Ck )⎠
=E
Pε ⎝
n i=1
n i=1
2 n i=1
N
�
1
n
�n
k=1
The last expression can be upper-bounded by Hoeffding’s inequality as follows:
�
⎛
⎞
� n
N
n
�
�
�
�
1
t
1
E
Pε ⎝
εi (I(Xi� ∈ Ck ) − I(Xi ∈ Ck )) ≥ √ �
(I(Xi ∈ Ck ) + I(Xi� ∈ Ck ))⎠
n i=1
2 n i=1
k=1
≤E
N
�
�
exp −
k=1
t2 n1
�n
2 n12 2
i=1
(I(Xi ∈ Ck ) + I(Xi� ∈ Ck ))
�
�
2
(I(Xi� ∈ Ck ) − I(Xi ∈ Ck ))
since upper sum in the exponent is bigger than the lower sum (compare term-by-term)
≤E
N
�
k=1
�
2
− nt4
e
≤
2en
V
�V
e−
nt2
4
.
�
26
Download