Lecture 22
Bounds on the generalization error of voting classifiers.
Theorem 22.1. With probability at least $1 - e^{-t}$, for any $T \ge 1$ and any $f = \sum_{i=1}^T \lambda_i h_i$,
\[
P(yf(x) \le 0) \le \inf_{\delta \in (0,1)} \left( \varepsilon + \sqrt{P_n(yf(x) \le \delta) + \varepsilon^2} \right)^2,
\]
where
\[
\varepsilon = \varepsilon(\delta) = K\left( \sqrt{\frac{V \min\bigl(T, (\log n)/\delta^2\bigr) \log\frac{n}{\delta}}{n}} + \sqrt{\frac{t}{n}} \right).
\]
Here we used the notation $P_n(C) = \frac{1}{n}\sum_{i=1}^n I(x_i \in C)$.

Remark: In a simpler (and slightly weaker) form, the bound reads
\[
P(yf(x) \le 0) \le \inf_{\delta \in (0,1)} K\Biggl( \underbrace{P_n(yf(x) \le \delta)}_{\text{inc. with } \delta} + \underbrace{\sqrt{\frac{V \min\bigl(T, (\log n)/\delta^2\bigr) \log\frac{n}{\delta}}{n}}}_{\text{dec. with } \delta} + \frac{t}{n} \Biggr),
\]
so the infimum over $\delta$ trades the empirical margin term off against the complexity term.
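To see this trade-off in $\delta$ concretely, here is a minimal numeric sketch (entirely illustrative: the margin sample and the constants $K = V = t = 1$ are made up, not from the lecture) that evaluates both terms of the remark's bound on a grid of $\delta$ and takes the infimum:

    import numpy as np

    rng = np.random.default_rng(0)
    n, T = 1000, 50
    K, V, t = 1.0, 1.0, 1.0                     # hypothetical constants for illustration
    margins = rng.normal(0.2, 0.3, size=n)      # made-up values of y_i * f(x_i)

    deltas = np.linspace(0.01, 0.99, 99)
    empirical = np.array([(margins <= d).mean() for d in deltas])   # inc. with delta
    complexity = np.sqrt(V * np.minimum(T, np.log(n) / deltas**2)
                         * np.log(n / deltas) / n)                  # dec. with delta
    bound = K * (empirical + complexity + t / n)
    print("best delta:", deltas[bound.argmin()], "bound:", bound.min())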
Proof. Let $f = \sum_{i=1}^T \lambda_i h_i$ and $g = \frac{1}{k}\sum_{j=1}^k Y_j$, where the $Y_j$ are i.i.d. with
\[
P(Y_j = h_i) = \lambda_i \quad \text{and} \quad P(Y_j = 0) = 1 - \sum_{i=1}^T \lambda_i,
\]
as in Lecture 17. Then $E Y_j(x) = f(x)$.
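As a concrete illustration of this randomized approximation of $f$ (a sketch only: the decision stumps and weights below are my own choices, not from the lecture), one can sample $g$ and check that it concentrates around $f$:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical {-1,+1}-valued base classifiers h_i (decision stumps on R).
    hs = [lambda x, c=c: np.where(x > c, 1.0, -1.0) for c in (-0.5, 0.0, 0.5)]
    lambdas = np.array([0.2, 0.3, 0.4])   # sum <= 1; remaining mass goes to Y_j = 0

    def sample_g(k):
        """Draw Y_1, ..., Y_k i.i.d. with P(Y_j = h_i) = lambda_i, else Y_j = 0."""
        idx = rng.choice(len(hs) + 1, size=k,
                         p=np.append(lambdas, 1 - lambdas.sum()))
        return lambda x: np.mean([hs[i](x) if i < len(hs) else np.zeros_like(x)
                                  for i in idx], axis=0)

    x = np.linspace(-1.0, 1.0, 5)
    f = sum(lam * h(x) for lam, h in zip(lambdas, hs))  # f(x) = sum_i lambda_i h_i(x)
    g = sample_g(k=2000)
    print(np.max(np.abs(g(x) - f)))       # small, since E Y_j(x) = f(x)

The proof needs only that the terms of $g$ are i.i.d. with mean $f(x)$, so $g$ serves as a sparse $k$-term surrogate for $f$.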
\[
P(yf(x) \le 0) = P(yf(x) \le 0,\, yg(x) \le \delta) + P(yf(x) \le 0,\, yg(x) > \delta)
\le P(yg(x) \le \delta) + P(yg(x) > \delta \mid yf(x) \le 0),
\]
where
\[
P\bigl(yg(x) > \delta \bigm| yf(x) \le 0\bigr) = E_x P_Y\Bigl( y\,\frac{1}{k}\sum_{j=1}^k Y_j(x) > \delta \Bigm| y\,E_Y Y_j(x) \le 0 \Bigr).
\]
Shift the $Y_j$'s to $[0,1]$ by defining $Y_j' = \frac{yY_j + 1}{2}$.
Then, since $E Y_j' = \frac{yf(x) + 1}{2} \le \frac{1}{2}$ precisely when $yf(x) \le 0$,
\begin{align*}
P(yg(x) > \delta \mid yf(x) \le 0)
&= E_x P_Y\Bigl( \frac{1}{k}\sum_{j=1}^k Y_j' \ge \frac{1}{2} + \frac{\delta}{2} \Bigm| E Y_j' \le \frac{1}{2} \Bigr) \\
&\le E_x P_Y\Bigl( \frac{1}{k}\sum_{j=1}^k Y_j' \ge E Y_1' + \frac{\delta}{2} \Bigm| E Y_j' \le \frac{1}{2} \Bigr) \\
&\le \text{(by Hoeffding's ineq.)}\ E_x\, e^{-k D\left(E Y_1' + \frac{\delta}{2},\, E Y_1'\right)}
\le E_x\, e^{-k\delta^2/2} = e^{-k\delta^2/2},
\end{align*}
because $D(p, q) \ge 2(p - q)^2$ (KL divergence for binomial variables, Homework 1) and, hence,
\[
D\Bigl(E Y_1' + \frac{\delta}{2},\, E Y_1'\Bigr) \ge 2\Bigl(\frac{\delta}{2}\Bigr)^2 = \delta^2/2.
\]
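The bound $D(p,q) \ge 2(p-q)^2$ is a Pinsker-type inequality for Bernoulli distributions; as a quick numeric sanity check (an illustration, not part of the proof):

    import numpy as np

    def kl(p, q):
        """KL divergence D(p, q) between Bernoulli(p) and Bernoulli(q)."""
        p = np.clip(p, 1e-12, 1 - 1e-12)
        q = np.clip(q, 1e-12, 1 - 1e-12)
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    ps, qs = np.meshgrid(np.linspace(0.01, 0.99, 99), np.linspace(0.01, 0.99, 99))
    assert np.all(kl(ps, qs) >= 2 * (ps - qs) ** 2)   # D(p, q) >= 2 (p - q)^2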
We therefore obtain
(22.1)
\[
P(yf(x) \le 0) \le P(yg(x) \le \delta) + e^{-k\delta^2/2},
\]
and the second term in the bound will be chosen to be equal to $1/n$.
Similarly, we can show
\[
P_n(yg(x) \le 2\delta) \le P_n(yf(x) \le 3\delta) + e^{-k\delta^2/2}.
\]
Choose $k$ such that $e^{-k\delta^2/2} = 1/n$, i.e. $k = \frac{2}{\delta^2}\log n$.
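For a sense of scale (an illustrative computation, not from the notes): with $n = 10^4$ and $\delta = 0.1$, this choice gives $g$ on the order of two thousand terms:

    import math

    n, delta = 10_000, 0.1
    k = 2 * math.log(n) / delta**2        # k = (2/delta^2) log n
    print(k)                              # ~1842.1 terms in g
    print(math.exp(-k * delta**2 / 2))    # exactly 1/n = 1e-4 by construction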
Now define $\varphi_\delta$ to be the piecewise-linear function with $\varphi_\delta(s) = 1$ for $s \le \delta$, $\varphi_\delta(s) = 0$ for $s \ge 2\delta$, and linear interpolation on $[\delta, 2\delta]$.
[Figure: graph of $\varphi_\delta(s)$, equal to 1 up to $s = \delta$ and decreasing to 0 at $s = 2\delta$.]
Observe that
(22.2)
\[
I(s \le \delta) \le \varphi_\delta(s) \le I(s \le 2\delta).
\]
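A minimal sketch of this clipped margin indicator (my own implementation, written to match the definition above) together with a numeric check of the sandwich (22.2):

    import numpy as np

    def phi(s, delta):
        """Piecewise-linear phi_delta: 1 for s <= delta, 0 for s >= 2*delta."""
        return np.clip(2.0 - s / delta, 0.0, 1.0)

    delta = 0.1
    s = np.linspace(-0.5, 0.5, 1001)
    lower = (s <= delta).astype(float)        # I(s <= delta)
    upper = (s <= 2 * delta).astype(float)    # I(s <= 2*delta)
    assert np.all(lower <= phi(s, delta)) and np.all(phi(s, delta) <= upper)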
By the result of Lecture 21, with probability at least $1 - e^{-t}$, for all $k$, $\delta$ and any $g \in \mathcal{F}_k = \mathrm{conv}_k(\mathcal{H})$,
\[
\Phi\Bigl( E\varphi_\delta,\ \frac{1}{n}\sum_{i=1}^n \varphi_\delta \Bigr)
= \frac{E\varphi_\delta(yg(x)) - \frac{1}{n}\sum_{i=1}^n \varphi_\delta(y_i g(x_i))}{\sqrt{E\varphi_\delta(yg(x))}}
\le K\Biggl( \sqrt{\frac{Vk\log\frac{n}{\delta}}{n}} + \sqrt{\frac{t}{n}} \Biggr) = \varepsilon/2.
\]
Note that $\Phi(x, y) = \frac{x - y}{\sqrt{x}}$ is increasing with $x$ and decreasing with $y$ (indeed, $\partial\Phi/\partial x = (x + y)/(2x^{3/2}) > 0$ and $\partial\Phi/\partial y = -1/\sqrt{x} < 0$).
By inequalities (22.1) and (22.2),
\[
E\varphi_\delta(yg(x)) \ge P(yg(x) \le \delta) \ge P(yf(x) \le 0) - \frac{1}{n}
\]
and
\[
\frac{1}{n}\sum_{i=1}^n \varphi_\delta(y_i g(x_i)) \le P_n(yg(x) \le 2\delta) \le P_n(yf(x) \le 3\delta) + \frac{1}{n}.
\]
By decreasing $x$ and increasing $y$ in $\Phi(x, y)$, we decrease $\Phi(x, y)$. Hence,
\[
\Phi\Bigl( \underbrace{P(yf(x) \le 0) - \tfrac{1}{n}}_{x},\ \underbrace{P_n(yf(x) \le 3\delta) + \tfrac{1}{n}}_{y} \Bigr)
\le K\Biggl( \sqrt{\frac{Vk\log\frac{n}{\delta}}{n}} + \sqrt{\frac{t}{n}} \Biggr),
\]
where $k = \frac{2}{\delta^2}\log n$.
If $\frac{x - y}{\sqrt{x}} \le \varepsilon$, then solving the quadratic inequality $u^2 - \varepsilon u - y \le 0$ for $u = \sqrt{x}$ gives
\[
x \le \left( \frac{\varepsilon}{2} + \sqrt{\Bigl(\frac{\varepsilon}{2}\Bigr)^2 + y} \right)^2.
\]
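A quick numerical sanity check of this algebraic step (illustrative only): when $\varepsilon$ is taken so that $(x-y)/\sqrt{x} = \varepsilon$ exactly, the bound holds with equality.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.01, 1.0, size=10_000)
    y = rng.uniform(0.0, 1.0, size=10_000)
    eps = (x - y) / np.sqrt(x)                       # tightest eps for each pair
    bound = (eps / 2 + np.sqrt((eps / 2) ** 2 + y)) ** 2
    assert np.allclose(x, bound)                     # equality case of the lemma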
So,
\[
P(yf(x) \le 0) - \frac{1}{n} \le \left( \frac{\varepsilon}{2} + \sqrt{\Bigl(\frac{\varepsilon}{2}\Bigr)^2 + P_n(yf(x) \le 3\delta) + \frac{1}{n}} \right)^2.
\]
Since $1/n \le \varepsilon^2$ for a suitable constant $K$, the $1/n$ terms can be absorbed into $\varepsilon^2$, and replacing $\delta$ by $\delta/3$ (adjusting $K$ once more) gives the statement of the theorem. $\square$