Lecture 17 Covering numbers of the convex hull. 18.465

advertisement
Lecture 17
Covering numbers of the convex hull.
18.465
Consider the classification setting, i.e. Y = {−1, +1}. Denote the set of weak classifiers
H = {h : X �→ [−1, +1]}
and assume H is a VC-subgraph. Hence, D(H, ε, dx ) ≤ K · V log 2/ε. A voting algorithm outputs
f=
T
�
λi hi , where hi ∈ H,
i=1
T
�
λi ≤ 1, λi > 0.
i=1
Let
F = conv H =
� T
�
λi hi , hi ∈ H,
i=1
T
�
�
λi ≤ 1, λi ≥ 0, T ≥ 1 .
i=1
Then sign(f (x)) is the prediction of the label y. Let
� d
�
T
�
�
Fd = convd H =
λi hi , hi ∈ H,
λi ≤ 1, λi ≥ 0 .
i=1
i=1
Theorem 17.1. For any x = (x1 , . . . , xn ), if
log D(H, ε, dx ) ≤ KV log 2/ε
then
log D(convd H, ε, dx ) ≤ KV d log 2/ε.
Proof. Let h1 , . . . , hD be ε-packing of H with respect to dx , D = D(H, ε, dx ).
Note that dx is a norm.
�
dx (f, g) =
If f =
�d
i=1
n
1�
(f (xi ) − g(xi ))2
n i=1
�1/2
= �f − g�x .
�d
λi hi , for all hi we can find hki such that d(hi , hki ) ≤ ε. Let f � = i=1 λi hki . Then
� d
�
d
��
�
�
�
�
�
ki �
d(f, f ) = �f − f �x = �
λi (hi − h )� ≤
λi �hi − hki �x ≤ ε.
�
�
i=1
x
i=1
Define
FD,d =
� d
�
1
D
λi hi , hi ∈ {h , . . . , h },
i=1
d
�
�
λi ≤ 1, λi ≥ 0 .
i=1
Hence, we can approximate any f ∈ Fd by f � ∈ FD,d within ε.
�d
Now, let f = i=1 λi hi ∈ FD,d and consider the following construction. We will choose Y1 (x), . . . , Yk (x)
from h1 , . . . , hd according to λ1 , . . . , λd :
P (Yj (x) = hi (x)) = λi and P (Yj (x) = 0) = 1 −
d
�
λi .
i=1
Note that with this construction
EYj (x) =
d
�
λi hi (x) = f (x).
i=1
42
Lecture 17
Covering numbers of the convex hull.
18.465
Furthermore,
�
�2
⎞2
⎛
� �
�
n
k
�
�
�1 k
�
1
⎝1
E�
Yj − f �
Yj (xi ) − f (xi )⎠
�k
� = En
k j=1
� j=1
�
i=1
x
⎞2
⎛
n
k
�
�
1
1
=
E⎝
(Yj (xi ) − EYj (xi ))⎠
n i=1
k j=1
=
n
k
1� 1 �
E(Yj (xi ) − EYj (xi ))2
n i=1 k 2 j=1
≤
4
k
because |Yj (xi ) − EYj (xi )| ≤ 2. Choose k = 4/ε2 . Then
�
�2
⎛
⎞2
� k
�
k
�
�1 �
�
⎝1
E�
Yj − f �
Yj , f ⎠ ≤ ε2 .
�k
� = Edx k
� j=1
�
j=1
x
So, there exists a deterministic combination
1
k
�k
j=1 Yj
such that dx ( k1
�k
j=1
Yj , f ) ≤ ε.
Define
�
FD,d
=
⎧
k
⎨1 �
⎩k
Yj : k = 4/ε2 , Yj ∈ {h1 , . . . , hd } ⊆ {h1 , . . . , hD }
⎭
j=1
Hence, we can approximate any f =
⎫
⎬
�d
i=1 λi hi
�
∈ FD,d , hi ∈ {h1 , . . . , hD }, by f � ∈ FD,d
within ε.
�
Let us now bound the cardinality of FD,d
. To calculate the number of ways to choose k functions out of
h1 , . . . , hd , assume each of hi is chosen kd times such that k = k1 + . . . + kd . We can formulate the problem
as finding the number of strings of the form
. . . 0� 1 �00 ��
. . . 0
. . . 0
�00 ��
� 1 . . . 1 �00 ��
�.
k1
k2
kd
43
Lecture 17
Covering numbers of the convex hull.
18.465
In this string, there are d − 1 ”1”s and k ”0”s, and total length is k + d − 1. The number of such strings is
�k+d−1�
. Hence,
k
� � �
�
D
k+d
�
×
card FD,d ≤
d
k
DD−d Dd (k + d)k+d
− d)D−d
k k dd
�
�d �
�D−d �
�k
D(k + d)
D
k+d
=
d2
D−d
k
�
�d �
�D−d �
�k
D(k + d)
d
d
=
1
+
1
+
d2
D−d
k
≤
dd (D
using inequality 1 + x ≤ ex
�
�d
D(k + d)e2
≤
d2
where k = 4/ε2 and D = D(F, ε, dx ).
�
Therefore, we can approximate any f ∈ Fd by f �� ∈ FD,d within ε and f �� ∈ FD,d by f � ∈ FD,d
within ε.
�
Hence, we can approximate any f ∈ Fd by f � ∈ FD,d
within 2ε. Moreover,
e2 D(k + d)
log N (Fd = convd H, 2ε, dx ) ≤ d log
d2
�
�
k+d
= d 2 + log D + log 2
d
�
�
��
2
4
≤ d 2 + KV log + log 1 + 2
ε
ε
2
≤ KV d log
ε
since
k+d
d2
≤ 1 + k and d ≥ 1, V ≥ 1.
�
44
Download