Theorem 1 (1-NN Convergence). As n → ∞, the 1-NN error is upper-bounded
by twice the Bayes Optimal classifier error.
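Proof. Let $\{x_1, \dots, x_n\} \overset{\text{i.i.d.}}{\sim} P_{\text{data}}(x)$.
Then with probability 1,
$$\lim_{n \to \infty} \operatorname*{arg\,min}_{x \in \{x_1, \dots, x_n\}} \operatorname{dist}(x_{\text{test}}, x) = x_{\text{test}}.$$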
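Therefore, writing $x_{\text{nn}}$ for the nearest neighbor of $x_{\text{test}}$ among the training points,
$$
\begin{aligned}
\lim_{n \to \infty} \text{error}_{\text{1-NN}}
&= \lim_{n \to \infty} \sum_{y \in \mathcal{Y}} P(y \mid x_{\text{nn}})\,\bigl(1 - P(y \mid x_{\text{test}})\bigr) \\
&= \sum_{y \in \mathcal{Y}} P(y \mid x_{\text{test}})\,\bigl(1 - P(y \mid x_{\text{test}})\bigr) \\
&= P(y^* \mid x_{\text{test}})\,\bigl(1 - P(y^* \mid x_{\text{test}})\bigr) + \sum_{y \neq y^*} P(y \mid x_{\text{test}})\,\bigl(1 - P(y \mid x_{\text{test}})\bigr) \\
&\le \bigl(1 - P(y^* \mid x_{\text{test}})\bigr) + \sum_{y \neq y^*} P(y \mid x_{\text{test}}) \\
&= \bigl(1 - P(y^* \mid x_{\text{test}})\bigr) + \bigl(1 - P(y^* \mid x_{\text{test}})\bigr) \\
&= 2\,\bigl(1 - P(y^* \mid x_{\text{test}})\bigr) \\
&= 2\,\epsilon_{h^*},
\end{aligned}
$$
where $y^* = \operatorname*{arg\,max}_{y \in \mathcal{Y}} P(y \mid x_{\text{test}})$, $h^*$ is the Bayes optimal classifier, and $\epsilon_{h^*} = 1 - P(y^* \mid x_{\text{test}})$ is its error at $x_{\text{test}}$. The inequality uses $P(y^* \mid x_{\text{test}}) \le 1$ in the first term and $1 - P(y \mid x_{\text{test}}) \le 1$ in each term of the sum; the following equality holds because the class probabilities sum to one.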
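The pointwise bound in Theorem 1 can be checked numerically. The sketch below is only an illustration, not part of the proof: it draws random class-posterior vectors (the helper `random_posterior` and the choice of 2 to 10 classes are assumptions made for this example) and verifies that the limiting 1-NN error, the sum over y of P(y | x)(1 − P(y | x)), never exceeds twice the Bayes error 1 − max_y P(y | x).

```python
import random


def asymptotic_1nn_error(posterior):
    """Limiting 1-NN error at a point: sum_y P(y|x) * (1 - P(y|x))."""
    return sum(p * (1.0 - p) for p in posterior)


def bayes_error(posterior):
    """Bayes optimal error at a point: 1 - max_y P(y|x)."""
    return 1.0 - max(posterior)


def random_posterior(num_classes, rng):
    """Hypothetical helper: a random probability vector over num_classes labels."""
    weights = [rng.random() + 1e-12 for _ in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)  # fixed seed so the run is repeatable
posteriors = [random_posterior(rng.randint(2, 10), rng) for _ in range(10_000)]

# Small tolerance guards against floating-point rounding only.
bound_holds = all(
    asymptotic_1nn_error(p) <= 2.0 * bayes_error(p) + 1e-12 for p in posteriors
)
print("bound holds on all samples:", bound_holds)

# A single hand-checkable case with three classes.
example = [0.7, 0.2, 0.1]
print(asymptotic_1nn_error(example), 2.0 * bayes_error(example))
```

For the posterior (0.7, 0.2, 0.1), the limiting 1-NN error is 0.21 + 0.16 + 0.09 = 0.46, while twice the Bayes error is 2 × 0.3 = 0.6, so the bound holds with room to spare.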
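Theorem 2 (Curse of Dimensionality). Let $x_1, \dots, x_n \overset{\text{i.i.d.}}{\sim} U[0, 1]^d$. The expected side length $s$ of the smallest hypercube containing the $k$ nearest neighbors of a test point satisfies
$$s^d \approx \frac{k}{n} \quad \Rightarrow \quad s \approx \left(\frac{k}{n}\right)^{1/d},$$
since a cube of volume $s^d$ is expected to contain about $n\,s^d$ of the $n$ uniformly distributed points.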
For example, if n = 1000 and k = 10, then
| d    | s     |
|------|-------|
| 2    | 0.1   |
| 10   | 0.63  |
| 100  | 0.955 |
| 1000 | 0.995 |
As d → ∞, s → 1, meaning that nearly the entire space is required to capture
even a few neighbors.
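The tabulated values follow directly from s ≈ (k/n)^(1/d). The short sketch below evaluates that formula for n = 1000 and k = 10, the setting for which the formula gives s = 0.1 at d = 2; the function name `neighborhood_size` is a hypothetical label chosen for this example.

```python
def neighborhood_size(k, n, d):
    """Approximate extent s of the region holding k of n uniform points in [0, 1]^d."""
    return (k / n) ** (1.0 / d)


n, k = 1000, 10  # assumed setting: k/n = 0.01
table = {d: neighborhood_size(k, n, d) for d in (2, 10, 100, 1000)}

for d, s in table.items():
    print(f"d = {d:>4}   s = {s:.3f}")
```

Because k/n is fixed, the only thing that changes across rows is the exponent 1/d, which drives s toward 1 as d grows.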