Lecture 21 Bounds on the generalization error of voting classifiers. 18.465

advertisement
Lecture 21
Bounds on the generalization error of voting classifiers.
18.465
We continue to prove the lemma from Lecture 20:
�d
Lemma 21.1. Let Fd = convd H = { i=1 λi hi , hi ∈ H} and fix δ ∈ (0, 1]. Then
�
��
� ��
dV log nδ
t
Eϕδ − ϕ¯δ
≥ 1 − e−t .
≤K
+
P ∀f ∈ Fd , √
n
n
Eϕδ
Proof. We showed that
log D(ϕδ (yFd ) , ε/2, dx,y ) ≤ KdV log
2
.
εδ
By the result of Lecture 16,
n
1�
k
Eϕδ (yf (x)) −
ϕδ (yi f (xi )) ≤ √
n i=1
n
�
√
Eϕδ
�
log
1/2
D(ϕδ (yFd (x)) , ε)dε +
0
tEϕδ
n
with probability at least 1 − e−t . We have
� √Eϕδ
� √Eϕδ �
k
k
2
1/2
√
log D(ϕδ (yFd (x)) , ε)dε ≤ √
dV log dε
εδ
n 0
n 0
�
� √
k 2 δ Eϕδ /2 √
1
=√
dV log dx
x
nδ 0
�
k 2√
δ�
2
≤√
dV 2
Eϕδ log √
2
nδ
δ Eϕδ
= x, ε = 2x
. Without loss of generality, assume Eϕδ ≥ 1/n.
� δ
Otherwise, we’re doing better than in Lemma: √EE ≤ logn n ⇒ E ≤ logn n . Hence,
�
√
� √Eϕδ
k
dV Eϕδ
2 n
1/2
√
log D(ϕδ (yFd (x)) , ε)dε ≤ K
log
n
δ
n 0
�
dV Eϕδ
n
≤K
log
n
δ
where we have made a change of variables
2
εδ
So, with probability at least 1 − e−t ,
n
1�
Eϕδ (yf (x)) −
ϕδ (yi f (xi )) ≤ K
n i=1
�
dV Eϕδ (yf (x))
n
log +
n
δ
�
tEϕδ (yf (x))
n
which concludes the proof.
�
The above lemma gives a result for a fixed d ≥ 1 and δ ∈ (0, 1]. To obtain a uniform result, it’s enough
to consider δ ∈ Δ = {2−k , k ≥ 1} and d ∈ {1, 2, . . .}. For a fixed δ and d, use the Lemma above with tδ,d
defined by e−tδ,d = e−t d26πδ2 . Then
�
P ∀f ∈ Fd , . . . +
and
⎛
P⎝
�
d,δ
∀f ∈ Fd , . . . +
�
tδ,d
n
�
�
≥ 1 − e−tδ,d = 1 − e−t
6δ
d2 π 2
�⎞
�
tδ,d ⎠
6δ
≥1−
e−t 2 2 = 1 − e−t .
n
d π
δ,d
53
Lecture 21
Bounds on the generalization error of voting classifiers.
Since tδ,d = t + log
18.465
d2 π 2
6δ ,
�
⎞
⎛�
2 2
t + log d 6δπ
dV log nδ
Eϕδ − ϕ¯δ
⎠
∀f ∈ Fd , √
+
≤K⎝
n
n
Eϕδ
��
�
� �
dV log nδ
d2 π 2
t
≤K
+ log
+
n
n
6δ
��
� �
n
dV
log
t
δ
≤ K�
+
n
n
�
2 2
dV log n
δ
since log d 6δπ , the penalty for union-bound, is much smaller than
.
n
Recall the bound on the misclassification error
n
1 �
P (yf (x) ≤ 0) ≤
I(yi f (xi ) ≤ δ) +
n i=1
If
�
�
n
1�
Eϕδ (yf (x)) −
ϕδ (yi f (xi )) .
n i=1
�n
Eϕδ − n1 i=1 ϕδ
√
≤ ε,
Eϕδ
then
n
Eϕδ − ε
�
Eϕδ −
1�
ϕδ ≤ 0.
n i=1
Hence,
�
�
�� �
n
�
ε
ε 2 1�
Eϕδ ≤ + �
+
ϕδ
2
2
n i=1
Eϕδ ≤ 2
� ε �2
2
n
+2
1�
ϕδ .
n i=1
The bound becomes
⎛
⎞
n
⎜ 1 �
dV
n t⎟
⎜
⎟
P (yf (x) ≤ 0) ≤ K ⎜
I(yi f (xi ) ≤ δ) +
log + ⎟
n
δ
n
⎝ n i=1
⎠
� �� �
(*)
where K is a rough constant.
(*) not satisfactory because in boosting the bound should get better when the number of functions grows.
We prove a better bound in the next lecture.
54
Download