Lecture 10 Symmetrization. Pessimistic VC inequality. 18.465

advertisement
Lecture 10
Symmetrization. Pessimistic VC inequality.
18.465
We are interested in bounding
�
�
�
n
�1 �
�
�
�
I(Xi ∈ C) − P (C)� ≥ t
P sup �
�
C∈C � n i=1
�
In Lecture 7 we hinted at Symmetrization as a way to deal with the unknown P (C).
�
Lemma 10.1. [Symmetrization] If t ≥ n2 , then
�
�
�
�
�
�
�
�
n
n
n
�
�1 �
�
�1 �
1�
�
�
�
�
�
P sup �
I(Xi ∈ C) − P (C)� ≥ t ≤ 2P sup �
I(Xi ∈ C) −
I(Xi ∈ C)� ≥ t/2 .
�
�
n i=1
C∈C � n i=1
C∈C � n i=1
Proof. Suppose the event
�
�
n
�1 �
�
�
�
sup �
I(Xi ∈ C) − P (C)� ≥ t
�
C∈C � n i=1
� 1 �n
�
occurs. Let X = (X1 , . . . , Xn ) ∈ {supC∈C � n i=1 I(Xi ∈ C) − P (C)� ≥ t}. Then
�
�
n
�1 �
�
�
�
I(Xi ∈ CX ) − P (CX )� ≥ t.
∃CX such that �
�n
�
i=1
For a fixed C,
⎛�
⎞
�
�2
�� n
�
n
�1 �
�
�
1
�
�
PX � �
I(Xi� ∈ C) − P (C)� ≥ t/2 = P ⎝
I(Xi� ∈ C) − P (C)
≥ t2 /4⎠
�n
�
n
i=1
i=1
≤ (by Chebyshev’s Ineq)
=
=
since we chose t ≥
�
4
n 2 t2
n
�
i=1
4 �
n 2 t2
4E
� 1 �n
n
i=1
�2
I(Xi� ∈ C) − P (C)
t2
E(I(Xi� ∈ C) − P (C))(I(Xj� ∈ C) − P (C))
i,j
E(I(Xi� ∈ C) − P (C))2 =
4nP (C) (1 − P (C)
1
1
≤ 2 ≤
2
2
n t
2
nt
2
n.
So,
�
�
�� n
�
�1 �
�
�
�
�
�
�
PX � �
I(Xi ∈ CX ) − P (CX )� ≤ t/2�∃CX ≥ 1/2
�n
�
�
i=1
if t ≥
�
2/n. Assume that the event
� n
�
�1 �
�
�
�
I(Xi� ∈ CX ) − P (CX )� ≤ t/2
�
�n
�
i=1
occurs. Recall that
� n
�
�1 �
�
�
�
I(Xi ∈ CX ) − P (CX )� ≥ t.
�
�n
�
i=1
Hence, it must be that
� n
�
n
�1 �
�
1�
�
�
�
I(Xi ∈ CX ) −
I(Xi ∈ CX )� ≥ t/2.
�
�n
�
n
i=1
i=1
21
Lecture 10
Symmetrization. Pessimistic VC inequality.
18.465
We conclude
1
2
�
�
�� n
�
�1 �
�
�
�
�
�
�
≤ PX � �
I(Xi ∈ CX ) − P (CX )� ≤ t/2�∃CX
�n
�
�
i=1
�
�
�� n
�
n
�1 �
�
�
1�
�
�
�
�
≤ PX � �
I(Xi ∈ CX ) −
I(Xi ∈ CX )� ≥ t/2�∃CX
�n
�
�
n i=1
i=1
� n
�
�
�
�
n
�1 �
�
�
1�
�
�
�
�
PX � sup �
I(Xi ∈ C) −
I(Xi ∈ C)� ≥ t/2�∃CX .
�
�
n i=1
C∈C � n i=1
Since indicators are 0, 1-valued,
⎛
⎞
� n
�
⎜
⎟
�1 �
�
⎟
1 ⎜
�
�
⎜
I ⎜ sup �
I(Xi ∈ C) − P (C)� ≥ t⎟
⎟
�
2 ⎝C∈C � n i=1
⎠
��
�
�
∃CX
� n
�
�
�
n
�1 �
�
�
1�
�
�
�
I(Xi ∈ C) −
I(Xi� ∈ C)� ≥ t/2�∃CX · I (∃CX )
sup �
�
�
n i=1
C∈C � n i=1
� n
�
�
�
n
�1 �
�
1�
�
�
≤ PX,X � sup �
I(Xi ∈ C) −
I(Xi� ∈ C)� ≥ t/2 .
�
n i=1
C∈C � n i=1
�
≤ PX �
Now, take expectation with respect to Xi ’s to obtain
�
�
�
�
n
�1 �
�
�
�
PX sup �
I(Xi ∈ C) − P (C)� ≥ t
�
C∈C � n i=1
�
�
�
�
n
n
�1 �
�
1�
�
�
�
≤ 2 · PX,X � sup �
I(Xi ∈ C) −
I(Xi ∈ C)� ≥ t/2 .
�
n
C∈C � n
i=1
i=1
�
Theorem 10.1. If V C (C ) = V , then
�
�
�
�
�
�V
n
�1 �
�
nt2
2en
�
�
I(Xi ∈ C) − P (C)� ≥ t ≤ 4
P sup �
e− 8 .
�
V
C∈C � n i=1
Proof.
�
�
�
n
n
�1 �
�
1�
�
�
2P sup �
I(Xi ∈ C) −
I(Xi� ∈ C)� ≥ t/2
�
n i=1
C∈C � n i=1
�
�
�
�
n
�1 �
�
�
�
= 2P sup �
εi (I(Xi ∈ C) − I(Xi� ∈ C))� ≥ t/2
�
C∈C � n i=1
�
�
�
�
n
�1 �
�
�
�
�
= 2EX,X � Pε sup �
εi (I(Xi ∈ C) − I(Xi ∈ C))� ≥ t/2 .
�
C∈C � n
�
i=1
22
Lecture 10
Symmetrization. Pessimistic VC inequality.
18.465
The first equality is due to the fact that Xi and Xi� are i.i.d., and so switching their names (i.e. introducing
random signs εi , P (εi = ±1) = 1/2) does not have any effect. In the last line, it’s important to see that the
probability is taken with respect to εi ’s, while Xi and Xi� ’s are fixed.
By Sauer’s lemma,
�2n (C, X1 , . . . , Xn , X1� , . . . , Xn� )
�
≤
2en
V
�V
.
In other words, any class will be equivalent to one of C1 , . . . , CN on the data, where N ≤
�
�
�
�
n
�1 �
�
�
�
εi (I(Xi ∈ C) − I(Xi� ∈ C))� ≥ t/2
2EX,X � Pε sup �
�
C∈C � n i=1
�
�
�
�
n
�1 �
�
�
�
= 2EX,X � Pε
sup �
εi (I(Xi ∈ Ck ) − I(Xi� ∈ Ck ))� ≥ t/2
�
1≤k≤N � n i=1
�
�N � n
�
�
�� 1 �
�
= 2EX,X � Pε
εi (I(Xi ∈ Ck ) − I(Xi� ∈ Ck ))� ≥ t/2
�
�n
�
k=1
≤ 2E
N
�
k=1
� 2en �V
V
. Hence,
i=1
union bound
�
�� n
�
�1 �
�
�
�
�
εi (I(Xi ∈ Ck ) − I(Xi ∈ Ck ))� ≥ t/2
Pε �
�n
�
i=1
≤ 2E
N
�
�
Hoeffding’s inequality
2 2
−n t
�
2 exp − �n
2
8 i=1 (I(Xi ∈ C) − I(Xi� ∈ C))
�
�
�
�V
N
�
nt2
2en
−n2 t2
≤ 2E
2 exp −
≤2
2e− 8 .
8n
V
k=1
k=1
�
23
Download