# Lecture 23 Bounds in terms of sparsity. 18.465 �

```Lecture 23
Bounds in terms of sparsity.
�T
Let f =
i=1
λi hi , where λ1 ≥ λ2 ≥ . . . ≥ λT ≥ 0. Rewrite f as
f=
d
�
�T
i=d+1
T
�
λi h i +
i=1
where γ(d) =
18.465
λ i hi =
d
�
λi hi + γ(d)
i=1
i=d+1
T
�
λ�i hi
i=d+1
λi and λ�i = λi /γ(d).
Consider the following random approximation of f ,
g=
d
�
λi hi + γ(d)
i=1
k
1�
Yj
k j=1
where, as in the previous lectures,
P (Yj = hi ) = λ�i , i = d + 1, . . . , T
for any j = 1, . . . , k. Recall that EYj =
�T
i=d+1
λ�i hi .
Then
P (yf (x) ≤ 0) = P (yf (x) ≤ 0, yg(x) ≤ δ) + P (yf (x) ≤ 0, yg(x) &gt; δ)
�
� �
��
≤ P (yg(x) ≤ δ) + E PY yf (x) ≤ 0, yg(x) ≥ δ � (x, y)
Furthermore,
�
�
�
�
�
�
PY yf (x) ≤ 0, yg(x) ≥ δ � (x, y) ≤ PY yg(x) − yf (x) &gt; δ � (x, y)
⎛
⎛
⎞
⎞
k
�
�
1
= PY ⎝γ(d)y ⎝
Yj (x) − EY1 ⎠ ≥ δ � (x, y)⎠ .
k j=1
By renaming Yj� =
⎛
yYj +1
2
∈ [0, 1] and applying Hoeﬀding’s inequality, we get
⎛
⎞
⎞
⎛
⎞
k
k
�
�
�
�
1
1
δ �
PY ⎝γ(d)y ⎝
Yj (x) − EY ⎠ ≥ δ � (x, y)⎠ = PY ⎝
Y � (x) − EY1� ≥
(x, y)⎠
k j=1
k j=1 j
2γ(d)
2
kδ
− 2γ(d)
2
≤e
.
Hence,
2
P (yf (x) ≤ 0) ≤ P (yg(x) ≤ δ) + e
2
kδ
− 2γ(d
)2
If we set e
=
1
n,
then k =
2γ 2 (d)
δ2
− 2γkδ
2 (d)
.
log n.
We have
g=
d
�
i=1
d+k =d+
2γ 2 (d)
δ2
λi hi + γ(d)
k
1�
Yj ∈ convd+k H,
k j=1
log n.
58
Lecture 23
Bounds in terms of sparsity.
18.465
Deﬁne the eﬀective dimension of f as
�
e(f, δ) = min
0≤d≤T
�
2γ 2 (d)
d+
log n .
δ2
Recall from the previous lectures that
Pn (yg(x) ≤ 2δ) ≤ Pn (yf (x) ≤ 3δ) +
1
.
n
Hence, we have the following margin-sparsity bound
Theorem 23.1. For λ1 ≥ . . . λT ≥ 0, we deﬁne γ(d, f ) =
P (yf (x) ≤ 0) ≤ inf
�
ε+
�
δ∈(0,1)
�T
i=d+1
λi . Then with probability at least 1 − e−t ,
Pn (yf (x) ≤ δ) + ε2
�2
where
��
ε=K
V &middot; e(f, δ)
n
log +
n
δ
� �
t
n
Example 23.1. Consider the zero-error case. Deﬁne
δ ∗ = sup{δ &gt; 0, Pn (yf (x) ≤ δ) = 0}.
Hence, Pn (yf (x) ≤ δ ∗ ) = 0 for conﬁdence δ ∗ . Then
n
t
V &middot; e(f, δ ∗ )
log ∗ +
n
δ
n
�
�
V log n
n
t
log ∗ +
≤K
(δ ∗ )2 n
δ
n
P (yf (x) ≤ 0) ≤ 4ε2 = K
because e(f, δ) ≤
2
δ2
�
�
log n always.
Consider the polynomial weight decay: λi ≤ Ki−α , for some α &gt; 1. Then
γ(d) =
T
�
i=d+1
λi ≤ K
T
�
i−α ≤ K
�
∞
x−α dx = K
d
i=d+1
1
Kα
= α−1
α−1
(α − 1)d
d
Then
�
2γ 2 (d)
e(f, δ) = min d +
log n
d
δ2
�
�
Kα�
≤ min d + 2 2(α−1)
log n
d
δ d
�
Taking derivative with respect to d and setting it to zero,
1−
Kα log n
=0
δ 2 d2α−1
we get
d = Kα &middot;
log1/(2α−1) n
log n
≤ K 2/(2α−1) .
2/(2α−1)
δ
δ
59
Lecture 23
Bounds in terms of sparsity.
18.465
Hence,
e(f, δ) ≤ K
log n
δ 2/(2α−1)
Plugging in,
�
P (yf (x) ≤ 0) ≤ K
V log n
n
t
log ∗ +
δ
n
n(δ ∗ )2/(2α−1)
�
.
As α → ∞, the bound behaves like
V log n
n
log ∗ .
n
δ
60
```