VC Dimension

VC Dimension – definition
and impossibility result
Lecturer: Yishay Mansour
Eran Nir and Ido Trivizki
VC Dimension – Lecture
Overview






PAC Model – Review
VC dimension – motivation
Definitions
Some examples of geometric concepts
Sample size lower bounds
More examples
The PAC Model - Review

A fixed, unknown distribution D from which the
examples are chosen independently.
The target concept is a computable function $c_t$.

$\mathrm{error}(h) = \Pr_D[c_t(x) \ne h(x)]$

Our goal – finding h such that: $\Pr[\mathrm{error}(h) \le \epsilon] \ge 1 - \delta$
$\epsilon$: accuracy parameter; $\delta$: confidence parameter.
An algorithm A learns a family of concepts C if for
any $c_t \in C$ and any distribution D, A outputs a
function h such that, with probability at least $1 - \delta$,
$\mathrm{error}(h) = \Pr_D[c_t(x) \ne h(x)] \le \epsilon$.
VC Dimension - Motivation


Question: How many examples does a learning
algorithm need?
For PAC and a finite concept class C we proved:
$m \ge \frac{1}{\epsilon} \ln \frac{|C|}{\delta}$

We would like to be able to handle infinite concept
classes – the VC dimension will provide us with a
substitute for $\ln |C|$ for infinite concept classes.
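As a rough illustration of the finite-class bound, the sketch below computes the smallest integer m that satisfies it. The function name and the example numbers (|C| = 1024, accuracy 0.1, confidence 0.05) are assumptions chosen for illustration, not values from the lecture.

```python
import math

def finite_class_sample_size(num_concepts, epsilon, delta):
    """Smallest integer m with m >= (1/epsilon) * ln(num_concepts / delta)."""
    if num_concepts < 1 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need num_concepts >= 1 and epsilon, delta in (0, 1)")
    return math.ceil(math.log(num_concepts / delta) / epsilon)

# Hypothetical example: 2**10 concepts, epsilon = 0.1, delta = 0.05.
example_m = finite_class_sample_size(1024, 0.1, 0.05)
```

The bound grows only logarithmically in |C|, which is why a substitute for ln |C| is what is needed once C is infinite.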
VC Dimension - Definitions


Given a concept class C defined over the
instance space X, let $S \subseteq X$.
The projection of C on S is the set of all
functions that C induces on S:

$\Pi_C(S) = \{c \cap S \mid c \in C\}$

$|\Pi_C(S)| \le 2^m$ (where $|S| = m$)
A concept class C shatters S if
$|\Pi_C(S)| = 2^{|S|}$
In other words: a class shatters a set if every
possible 0/1 labeling of the set is induced by some concept in the class.
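The definitions of projection and shattering can be checked mechanically for a small finite class. The following is a minimal sketch; the toy class of integer thresholds and the helper names are assumptions made for this example.

```python
def projection(concepts, S):
    """Set of distinct labelings that the concepts induce on the points in S."""
    return {tuple(c(x) for x in S) for c in concepts}

def shatters(concepts, S):
    """True if every one of the 2**|S| labelings of S is induced by some concept."""
    return len(projection(concepts, S)) == 2 ** len(S)

# Hypothetical toy class: c_t(x) = 1 iff x >= t, for t in {0, 1, 2, 3, 4}.
thresholds = [lambda x, t=t: int(x >= t) for t in range(5)]

one_point = shatters(thresholds, (1,))       # both labels of the point 1 occur
two_points = shatters(thresholds, (1, 2))    # the labeling (1, 0) never occurs
```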
VC Dimension – Definitions
Cont.
VCdim (Vapnik-Chervonenkis dimension) of C:
- The maximum size of a set shattered by C:
$\mathrm{VCdim}(C) = \max\{d : \exists S,\ |S| = d,\ \Pi_C(S) = \{0,1\}^d\}$
- If a maximum value doesn't exist then
$\mathrm{VCdim}(C) = \infty$
- For a finite class C: $\mathrm{VCdim}(C) \le \log_2 |C|$
VC Dimension – Examples

In order to show that the VCdim of a class is d
we have to show:

- $\mathrm{VCdim} \ge d$: find some shattered set of size d.
- $\mathrm{VCdim} < d + 1$: show that no set of size d+1 is
shattered.
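For a finite class over a finite domain, the two-step recipe can be carried out by brute force. The sketch below does this for a toy class of integer intervals; the class, the domain and the function names are assumptions for illustration only.

```python
from itertools import combinations
from math import log2

def vc_dimension(concepts, domain):
    """Largest d such that some d-subset of `domain` is shattered (brute force)."""
    best = 0
    for d in range(1, len(domain) + 1):
        if any(len({tuple(c(x) for x in S) for c in concepts}) == 2 ** d
               for S in combinations(domain, d)):
            best = d      # lower bound: found a shattered set of size d
        else:
            break         # upper bound: no set of size d is shattered
    return best

# Hypothetical toy class: indicators of intervals [a, b] on {0,...,4}, plus the empty concept.
domain = range(5)
intervals = [lambda x, a=a, b=b: int(a <= x <= b) for a in domain for b in domain if a <= b]
intervals.append(lambda x: 0)

interval_vcdim = vc_dimension(intervals, domain)
```

Stopping at the first d with no shattered set is safe because every subset of a shattered set is itself shattered.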
VC Dimension – Examples:
Half Lines (C1)

The concepts are $c_\theta$ for $\theta \in [0,1]$, $X = [0,1]$, where:
$c_\theta(x) = 0$ if $x < \theta$, and $c_\theta(x) = 1$ if $x \ge \theta$.
VC Dimension – Examples:
Half Lines (C1) Cont.
Claim: $\mathrm{VCdim}(C_1) = 1$

$\mathrm{VCdim}(C_1) \ge 1$: $c_{1/4}(1/2) = 1$ and $c_{3/4}(1/2) = 0$,
thus $|\Pi_C(\{1/2\})| = 2$.

$\mathrm{VCdim}(C_1) < 2$: for any set of size 2 there is an
assignment which is not in the concept class: for
$S = \{x, y\}$, $x < y$, the assignment which lets x be 1
and y be 0 is impossible, since a threshold that labels x as 1
also labels every larger point as 1.
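The half-line argument can be illustrated numerically. This sketch assumes the convention that a point is labeled 1 when it is at or above the threshold, and it only tries a finite grid of thresholds, so it illustrates the claim rather than proving it.

```python
def half_line(theta):
    """Concept c_theta: label 1 iff x >= theta (assumed convention)."""
    return lambda x: int(x >= theta)

# A finite grid of thresholds in [0, 1]; the real class has a continuum of them.
THETAS = [i / 100 for i in range(101)]

def labelings(points):
    """All labelings of `points` produced by the thresholds on the grid."""
    return {tuple(half_line(t)(x) for x in points) for t in THETAS}

single = labelings([0.5])        # both labels appear, so {1/2} is shattered
pair = labelings([0.3, 0.7])     # (1, 0) is missing: x = 1 and y = 0 is impossible
```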
VC Dimension – Examples:
Linear halfspaces (C2)

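The concepts are $c_w$, where for $w = (\omega_1, \omega_2, \theta)$ and $x \in \mathbb{R}^2$
we let $c_w(x) = 1 \iff \omega_1 x_1 + \omega_2 x_2 \ge \theta$. Each $c_w$ corresponds to a line in the
plane: points above or on the line are positive,
and points below the line are negative.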
VC Dimension – Examples:
Linear halfspaces (C2) Cont.
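Claim: $\mathrm{VCdim}(C_2) = 3$
- $\mathrm{VCdim}(C_2) \ge 3$: Any three points that are not
collinear can be shattered.
- $\mathrm{VCdim}(C_2) < 4$: No set of four points can be
shattered: either one point lies in the convex hull of the other three (label it 0 and the others 1), or the four points form a convex quadrilateral (label one pair of opposite corners 1 and the other pair 0).
Generally: Half spaces in $\mathbb{R}^d$ have VCdim of $d + 1$.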
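One way to see the lower bound concretely is to search for a separating line for each labeling of three non-collinear points. The sketch below uses a perceptron-style search, which is a technique swapped in for illustration and not part of the lecture; the point sets and function names are assumptions. A failed search within the epoch limit is not by itself a proof that no line exists.

```python
from itertools import product

def find_halfspace(points, labels, max_epochs=1000):
    """Perceptron search for (w1, w2, theta) with label 1 iff w1*x1 + w2*x2 >= theta.
    Returns None if no strictly separating line is found within max_epochs."""
    w1 = w2 = b = 0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), label in zip(points, labels):
            sign = 1 if label == 1 else -1
            if sign * (w1 * x1 + w2 * x2 + b) <= 0:   # wrong side or on the line
                w1, w2, b = w1 + sign * x1, w2 + sign * x2, b + sign
                mistakes += 1
        if mistakes == 0:
            return (w1, w2, -b)
    return None

def classify(w, x):
    w1, w2, theta = w
    return int(w1 * x[0] + w2 * x[1] >= theta)

def shattered_by_search(points):
    """True if the search realizes every labeling of `points`."""
    for labels in product((0, 1), repeat=len(points)):
        w = find_halfspace(points, labels)
        if w is None or tuple(classify(w, p) for p in points) != labels:
            return False
    return True

triangle = [(0, 0), (1, 0), (0, 1)]            # three non-collinear points
square = [(0, 0), (1, 0), (0, 1), (1, 1)]      # four points in convex position
```

For the square, the labeling that puts opposite corners in the same class is the one that no line realizes, so the search returns None for it.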
VC Dimension – Examples: Axis-aligned rectangles in the plane (C3)

Positive examples are points inside the
rectangle, and negative examples are points
outside the rectangle.
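VC Dimension – Examples: Axis-aligned rectangles in the plane (C3) Cont.
Claim: $\mathrm{VCdim}(C_3) = 4$
- $\mathrm{VCdim}(C_3) \ge 4$: a set of four points arranged in a diamond shape
(one leftmost, one rightmost, one topmost and one bottommost point) can be shattered.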
VC Dimension – Examples: Axis-aligned rectangles in the plane (C3) Cont.
Claim: $\mathrm{VCdim}(C_3) = 4$
- $\mathrm{VCdim}(C_3) < 5$: Given a set of five points in the
plane, there must be some point that is neither
the extreme left, right, top nor bottom point of
the five. If we label this non-extremal point
negative and the remaining four extremal
points positive, no rectangle can satisfy the
assignment, since any rectangle that contains the four
extremal points also contains the fifth.
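Both directions of the rectangle claim can be checked on concrete point sets. The sketch below relies on the observation that a labeling is realizable exactly when the bounding box of the positive points contains no negative point; the diamond coordinates and helper names are assumptions for illustration.

```python
from itertools import product

def rectangle_realizes(points, labels):
    """True if some axis-aligned rectangle contains exactly the points labeled 1."""
    positives = [p for p, lab in zip(points, labels) if lab == 1]
    if not positives:
        return True   # a rectangle placed away from all the points
    x_lo, x_hi = min(x for x, _ in positives), max(x for x, _ in positives)
    y_lo, y_hi = min(y for _, y in positives), max(y for _, y in positives)
    # The bounding box is the smallest rectangle containing all positives.
    return not any(x_lo <= x <= x_hi and y_lo <= y <= y_hi
                   for (x, y), lab in zip(points, labels) if lab == 0)

def shattered(points):
    return all(rectangle_realizes(points, labels)
               for labels in product((0, 1), repeat=len(points)))

diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # hypothetical four-point diamond
five = diamond + [(0, 0)]                      # adds a non-extremal point
```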
VC Dimension – Examples:
A finite union of intervals (C4)

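For any set of points and any labeling, we can cover exactly the positive
points by choosing the intervals small enough to exclude the negative points, so
every finite set is shattered and $\mathrm{VCdim}(C_4) = \infty$.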
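The covering argument can be sketched directly: give every positive point its own interval, narrow enough to miss all other points. The radius rule (one third of the smallest gap) and the function names are assumptions of this sketch, and the points are assumed to be distinct.

```python
from itertools import product

def covering_intervals(points, labels):
    """One small closed interval around each positive point (points assumed distinct)."""
    xs = sorted(points)
    gap = min((b - a for a, b in zip(xs, xs[1:])), default=1.0)
    r = gap / 3   # small enough that an interval never reaches a neighbouring point
    return [(x - r, x + r) for x, lab in zip(points, labels) if lab == 1]

def classify(intervals, x):
    """Label 1 iff x lies in the union of the intervals."""
    return int(any(lo <= x <= hi for lo, hi in intervals))

def shattered_by_unions(points):
    """True if every labeling of `points` is realized by a finite union of intervals."""
    return all(
        tuple(classify(covering_intervals(points, labels), x) for x in points) == labels
        for labels in product((0, 1), repeat=len(points))
    )

line_points = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical sample on the real line
```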
VC Dimension – Examples:
Convex Polygons on the plane (C5)



Points inside the convex polygon are positive
and outside are negative.
There is no bound on the number of edges.
Claim: $\mathrm{VCdim}(C_5) = \infty$
VC Dimension – Examples:
Convex Polygons on the plane (C5)
Proof: $\mathrm{VCdim}(C_5) = \infty$
- For every labeling of d points on the circle
perimeter, there exists $c_t \in C_5$ that is consistent
with the labeling.
- This $c_t$ is the polygon whose vertices are the positive
examples; it includes all the positive examples and none of the negative.
Thus the set of points is shattered.
- This holds for every d, and so
$\mathrm{VCdim}(C_5) = \infty$
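The circle construction can be tested for small d. In this sketch the points are placed at equal angles on the unit circle, and the polygon for a labeling is the list of positive points in angular order; the placement, the tolerance and the helper names are assumptions. Polygons with fewer than three vertices are treated as degenerate (a point or a segment).

```python
import math
from itertools import product

def circle_points(d):
    """d points at equal angles on the unit circle, in counter-clockwise order."""
    return [(math.cos(2 * math.pi * k / d), math.sin(2 * math.pi * k / d)) for k in range(d)]

def in_convex_polygon(p, verts, tol=1e-9):
    """Membership of p in the convex hull of verts (counter-clockwise, convex position)."""
    if not verts:
        return False
    if len(verts) == 1:
        return math.dist(p, verts[0]) <= tol
    if len(verts) == 2:   # degenerate polygon: a segment
        (ax, ay), (bx, by) = verts
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        dot = (p[0] - ax) * (bx - ax) + (p[1] - ay) * (by - ay)
        return abs(cross) <= tol and -tol <= dot <= (bx - ax) ** 2 + (by - ay) ** 2 + tol
    for (ax, ay), (bx, by) in zip(verts, verts[1:] + verts[:1]):
        # p must not lie strictly to the right of any counter-clockwise edge
        if (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) < -tol:
            return False
    return True

def shattered_on_circle(d):
    """True if polygons on the positive points realize every labeling of the d points."""
    pts = circle_points(d)
    for labels in product((0, 1), repeat=d):
        polygon = [p for p, lab in zip(pts, labels) if lab == 1]
        if any(in_convex_polygon(p, polygon) != bool(lab) for p, lab in zip(pts, labels)):
            return False
    return True
```

The check is exponential in d, so it is only meant for small values such as d up to 8.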
Sample Size Lower Bounds

Goal: we want to show that for a concept class
with a finite VCdim d there is a function m of
$\epsilon$, $\delta$ and d such that if we sample fewer than
$m(\epsilon, \delta, d)$ points, any PAC learning algorithm
would fail.

Theorem: If a concept class C has VCdim d+1
then:
$m(\epsilon, \delta, d) \ge \frac{d}{16\epsilon} = \Omega\left(\frac{d}{\epsilon}\right)$
Sample Size Lower Bounds Proof



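For contradiction: let $T = \{z_0, z_1, \ldots, z_d\}$ such that C
shatters T (possible because $\mathrm{VCdim}(C) = d + 1$).
Let D(x) be
$$D(x) = \begin{cases} 1 - 8\epsilon & x = z_0 \\ 8\epsilon / d & x = z_i,\ 1 \le i \le d \\ 0 & \text{otherwise} \end{cases}$$
Choose $c_t(x)$ randomly so that
$$c_t(x) = \begin{cases} 1 & x = z_0 \\ 0 \text{ or } 1 \text{ (each with probability 0.5)} & x = z_i,\ 1 \le i \le d \\ 0 & \text{otherwise} \end{cases}$$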
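The hard distribution and the random target can be written down explicitly. This sketch represents the points of T by their indices 0..d and uses exact fractions so the total mass can be checked; it assumes each rare point gets mass 8ε/d (so that the masses sum to 1) and that ε is at most 1/8. The function names are illustrative.

```python
import random
from fractions import Fraction

def hard_distribution(d, epsilon):
    """Masses of z_0, z_1, ..., z_d: z_0 gets 1 - 8*eps, each other z_i gets 8*eps/d."""
    eps = Fraction(epsilon)
    if d < 1 or not 0 < 8 * eps <= 1:
        raise ValueError("need d >= 1 and 0 < epsilon <= 1/8")
    return [1 - 8 * eps] + [8 * eps / d] * d

def random_target(d, rng):
    """Labels of z_0, ..., z_d: z_0 is labeled 1, every other z_i gets a fair coin flip."""
    return [1] + [rng.randint(0, 1) for _ in range(d)]

# Hypothetical parameters: d = 10, epsilon = 1/100.
D = hard_distribution(10, Fraction(1, 100))
target = random_target(10, random.Random(0))
```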

Sample Size Lower Bounds –
Proof Cont.

Restricted to T, $c_t$ agrees with some concept in C, because C shatters T.
Claim: if we sample fewer than $d/2$ points out of
$\{z_1, \ldots, z_d\}$ then the error is at least $2\epsilon$.

Proof: Let RARE be $\{z_1, \ldots, z_d\}$.

Sample size: the expected number of points we
sample from RARE is at most $m \cdot 8\epsilon \le d/2$ (for $m \le d/(16\epsilon)$).
Error: $\Pr[\mathrm{ERROR}] = \Pr[\mathrm{RARE}] \cdot \Pr[\mathrm{UNSEEN} \mid \mathrm{RARE}] \cdot \frac{1}{2} \ge 8\epsilon \cdot \frac{1}{2} \cdot \frac{1}{2} = 2\epsilon$
This implies that with probability of at least 0.5
we sample at most $d/2$ points of RARE and
thus have error of at least $2\epsilon$.
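As a complementary sanity check (a calculation in expectation, not the argument on the slide), one can compute the expected error contributed by RARE points that never appear in the sample. The sketch assumes each rare point has mass 8ε/d and that the learner can do no better than a coin flip on an unseen rare point; the parameter values are hypothetical.

```python
def expected_rare_samples(m, epsilon):
    """Expected number of the m samples that fall in RARE (total mass 8*epsilon)."""
    return m * 8 * epsilon

def expected_unseen_error(m, d, epsilon):
    """Expected error from rare points missed by all m independent draws.

    Each rare point has mass p = 8*epsilon/d, is missed with probability
    (1 - p)**m, and is then mislabeled with probability 1/2."""
    p = 8 * epsilon / d
    return d * p * (1 - p) ** m * 0.5

# Hypothetical parameters with m = d / (16 * epsilon): d = 100, epsilon = 0.05, m = 125.
rare_hits = expected_rare_samples(125, 0.05)
error_bound = expected_unseen_error(125, 100, 0.05)
```

With m = d/(16ε), Bernoulli's inequality gives (1 - p)^m ≥ 1 - mp = 1/2, so this expected error is at least 2ε, in line with the claim above.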
VC Dimension – Examples:
Parity (C6)


Let $X = \{0,1\}^n$. The concept class is $\chi_S(x) = \bigoplus_{i \in S} x_i$
where $S \subseteq \{1, \ldots, n\}$.
Claim: $\mathrm{VCdim}(C_6) = n$

$\mathrm{VCdim}(C_6) \ge n$: Let $e_i = 0 \ldots 010 \ldots 0$ (a single 1, in position i). For any bit
assignment $b_1, \ldots, b_n$ for the vectors $e_1, \ldots, e_n$ we choose
the set $S = \{i \mid b_i = 1\}$. We get:
$\chi_S(e_j) = 1$ if $j \in S$, and $\chi_S(e_j) = 0$ if $j \notin S$,
and so $\{e_1, \ldots, e_n\}$ is shattered.
$\mathrm{VCdim}(C_6) \le n$: There are $2^n$ parity functions, thus
$\mathrm{VCdim}(C_6) \le \log_2 2^n = n$
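The lower-bound construction for parity is easy to verify by enumeration. The sketch below uses 0-based indices (an assumption; the slide counts from 1) and checks that choosing S as the set of positions labeled 1 realizes every labeling of the unit vectors.

```python
from itertools import product

def parity(S, x):
    """XOR of the bits of x at the positions in S."""
    return sum(x[i] for i in S) % 2

def unit_vector(i, n):
    """Vector with a single 1 in position i (0-based)."""
    return tuple(int(j == i) for j in range(n))

def unit_vectors_shattered(n):
    """True if, for every labeling b, the set S = {i : b_i = 1} realizes b."""
    units = [unit_vector(i, n) for i in range(n)]
    for b in product((0, 1), repeat=n):
        S = {i for i in range(n) if b[i] == 1}
        if tuple(parity(S, e) for e in units) != b:
            return False
    return True
```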
VC Dimension – Examples:
OR of n literals (C7)


Let $X = \{0,1\}^n$ and $S, S' \subseteq \{1, \ldots, n\}$. The concept class is
$c_{S,S'}(x) = \left(\bigvee_{i \in S} x_i\right) \vee \left(\bigvee_{i \in S'} \bar{x}_i\right)$
Claim: $\mathrm{VCdim}(C_7) = n$

$\mathrm{VCdim}(C_7) \ge n$: use n unit vectors (see prev.
proof).

$\mathrm{VCdim}(C_7) \le n$:

Use ELIM algorithm to show $\mathrm{VCdim}(C_7) < n + 1$.
Show the (n+1)-th vector cannot be assigned 1, thus
no set of (n+1) vectors can be shattered.
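For very small n, both bounds for ORs of literals can be explored by brute force over all pairs of index sets. This is a sketch with 0-based indices and illustrative names; it enumerates every concept, so it is only practical for tiny n, and it is not the ELIM-based argument mentioned above.

```python
from itertools import combinations, product

def or_of_literals(pos, neg, x):
    """1 iff some x_i with i in pos equals 1, or some x_i with i in neg equals 0."""
    return int(any(x[i] == 1 for i in pos) or any(x[i] == 0 for i in neg))

def all_concepts(n):
    """Every pair (pos, neg) of index subsets of {0,...,n-1}."""
    subsets = [tuple(i for i in range(n) if mask >> i & 1) for mask in range(2 ** n)]
    return [(pos, neg) for pos in subsets for neg in subsets]

def is_shattered(points, n):
    seen = {tuple(or_of_literals(pos, neg, x) for x in points) for pos, neg in all_concepts(n)}
    return len(seen) == 2 ** len(points)

def vc_dimension(n):
    """Brute-force VC dimension of ORs of literals over {0,1}^n."""
    X = list(product((0, 1), repeat=n))
    d = 0
    while d < len(X) and any(is_shattered(S, n) for S in combinations(X, d + 1)):
        d += 1
    return d

units3 = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # the three unit vectors for n = 3
```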
Radon Theorem


Definitions:
- Convex Set: A is convex if for every $x, y \in A$
the line segment connecting x and y is in A.
- Convex Hull: The Convex Hull of S is the
smallest convex set which contains all the
points of S. We denote it as conv(S).
Theorem (Radon):
- Let E be a set of d+2 points in $\mathbb{R}^d$. There is a
subset S of E such that $\mathrm{conv}(S) \cap \mathrm{conv}(E \setminus S) \ne \emptyset$.
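For d = 2 the theorem can be made concrete: four points in the plane always split into two groups whose convex hulls meet. The sketch below finds such a split using barycentric coordinates, under the assumption that the first three points are not collinear; the function name and the example points are illustrative.

```python
from fractions import Fraction

def radon_partition_plane(p1, p2, p3, p4):
    """Radon partition of four planar points, assuming p1, p2, p3 are not collinear.
    Returns (S, rest, common), where common lies in conv(S) and in conv(rest)."""
    def area2(a, b, c):   # twice the signed area of triangle abc
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

    total = Fraction(area2(p1, p2, p3))
    if total == 0:
        raise ValueError("p1, p2, p3 must not be collinear")
    # Barycentric coordinates: p4 = l1*p1 + l2*p2 + l3*p3 with l1 + l2 + l3 = 1.
    lam = [area2(p4, p2, p3) / total, area2(p1, p4, p3) / total, area2(p1, p2, p4) / total]
    pts = [p1, p2, p3]
    S = [p for p, l in zip(pts, lam) if l > 0]
    rest = [p for p, l in zip(pts, lam) if l <= 0] + [p4]
    weight = sum(l for l in lam if l > 0)   # at least 1, because the l's sum to 1
    common = tuple(sum(l * p[k] for p, l in zip(pts, lam) if l > 0) / weight for k in (0, 1))
    return S, rest, common

inside = radon_partition_plane((0, 0), (4, 0), (0, 4), (1, 1))   # p4 inside the triangle
convex = radon_partition_plane((0, 0), (2, 0), (0, 2), (2, 2))   # four corners of a square
```

In the first example the fourth point itself is the common point; in the second the two diagonals of the square cross at the common point.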
VC Dimension – Examples:
Hyper-Planes (C8)


The concept class assigns 1 to a point if it’s
above or on a corresponding hyper-plane, 0
otherwise.
Claim: $\mathrm{VCdim}(C_8) = n + 1$ (for points in $\mathbb{R}^n$)

$\mathrm{VCdim}(C_8) \ge n + 1$: use the n unit vectors and the zero
vector to form a set of n+1 points that can be shattered.
$\mathrm{VCdim}(C_8) < n + 2$: use Radon Theorem (next page)
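The lower bound can be made explicit: for any labeling of the zero vector and the n unit vectors, a suitable weight vector and threshold can be written down directly. The particular choice below (weights of plus or minus 1, threshold of plus or minus 1/2) is one possible construction, assumed for this sketch rather than taken from the lecture.

```python
from itertools import product

def hyperplane_for(labels):
    """labels[0] is the label of the zero vector, labels[i] the label of e_i.
    Returns (w, theta) such that x is labeled 1 iff w . x >= theta."""
    theta = -0.5 if labels[0] == 1 else 0.5
    w = [1.0 if b == 1 else -1.0 for b in labels[1:]]
    return w, theta

def classify(w, theta, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)

def zero_and_units_shattered(n):
    """True if the construction realizes every labeling of {0, e_1, ..., e_n}."""
    points = [tuple(0 for _ in range(n))]
    points += [tuple(int(j == i) for j in range(n)) for i in range(n)]
    for labels in product((0, 1), repeat=n + 1):
        w, theta = hyperplane_for(labels)
        if tuple(classify(w, theta, x) for x in points) != labels:
            return False
    return True
```

The zero vector gets value 0, which is compared with a threshold of -1/2 or +1/2, and each unit vector gets value +1 or -1, which lands on the required side of either threshold.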
VC Dimension – Examples:
Hyper-Planes (C8) Cont.


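Assume a set E of n+2 points in $\mathbb{R}^n$ can be shattered.
Use Radon Theorem to find $S \subseteq E$ such that
$\mathrm{conv}(S) \cap \mathrm{conv}(E \setminus S) \ne \emptyset$
Assume there is a separating hyper-plane that
classifies points in S as ‘1’, points not in S as ‘0’.
Then all of conv(S) is classified as ‘1’ and all of conv(E \ S) as ‘0’, so there is
no way to classify points in $\mathrm{conv}(S) \cap \mathrm{conv}(E \setminus S)$: a contradiction.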