Instance-Based Learning - University of Birmingham

Instance Based Learning
Ata Kaban
The University of Birmingham
Today we learn:
• K-Nearest Neighbours
• Case-based reasoning
• Lazy and eager learning
Instance-based learning
• One way of solving tasks of approximating discrete or real-valued target functions
• Have training examples: (x_n, f(x_n)), n = 1,…,N
• Key idea:
– just store the training examples
– when a test example is given, find the closest matches
• 1-Nearest neighbour:
Given a query instance x_q,
– first locate the nearest training example x_n
– then f(x_q) := f(x_n)

• K-Nearest neighbour:
Given a query instance x_q,
– first locate the k nearest training examples
– if the target function is discrete-valued, take a vote among its k nearest neighbours;
– else if the target function is real-valued, take the mean of the f values of its k nearest neighbours:
$$ \hat{f}(x_q) := \frac{1}{k} \sum_{i=1}^{k} f(x_i) $$
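As an illustration, here is a minimal Python sketch of both prediction rules (function and variable names are hypothetical; distances are Euclidean, as defined on the next slide):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, discrete=True):
    """k-NN prediction: majority vote for a discrete-valued target,
    mean of the neighbours' f values for a real-valued target."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # distance to every stored example
    nearest = np.argsort(dists)[:k]                    # indices of the k nearest neighbours
    if discrete:
        return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote
    return y_train[nearest].mean()                     # mean of the f values
```

With k = 1 this reduces to the 1-nearest-neighbour rule.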
The distance between examples
• We need a measure of distance in order to know which examples are the neighbours
• Assume that we have T attributes for the learning problem. Then one example point x has elements x_t ∈ ℝ, t = 1,…,T.
• The distance between two points x_i, x_j is often defined as the Euclidean distance:
$$ d(x_i, x_j) = \sqrt{\sum_{t=1}^{T} (x_{ti} - x_{tj})^2} $$
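As a quick sanity check of the formula, a small worked example with T = 3 attributes and made-up values:

$$ d\big((1,2,3),\,(4,6,3)\big) = \sqrt{(1-4)^2 + (2-6)^2 + (3-3)^2} = \sqrt{9 + 16 + 0} = 5 $$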
Voronoi Diagram
Characteristics of Instance-Based Learning
• An instance-based learner is a lazy learner and does all the work when the test example is presented. This is in contrast to so-called eager learners, which build a parameterised, compact model of the target.
• It produces a local approximation to the target function (different for each test instance).
When to consider Nearest Neighbour algorithms?
• Instances map to points in ℝ^n
• Not more than, say, 20 attributes per instance
• Lots of training data
• Advantages:
– Training is very fast
– Can learn complex target functions
– Don't lose information
• Disadvantages:
– ? (we will see them shortly…)
[Figure: seven example pictures labelled one to seven, and a query picture labelled "Eight ?"]
Training data

Number  Lines  Line types  Rectangles  Colours  Mondrian?
1       6      1           10          4        No
2       4      2           8           5        No
3       5      2           7           4        Yes
4       5      1           8           4        Yes
5       5      1           10          5        No
6       6      1           8           6        Yes
7       7      1           14          5        No

Test instance

Number  Lines  Line types  Rectangles  Colours  Mondrian?
8       7      2           9           4        ?
Keep data in normalised form
One way to normalise an attribute value x_t to x_t' is:

$$ x_t' = \frac{x_t - \bar{x}_t}{\sigma_t} $$

where $\bar{x}_t$ is the mean of the t-th attribute and $\sigma_t$ is the standard deviation of the t-th attribute.
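A minimal NumPy sketch of this normalisation (hypothetical names; it uses the population standard deviation, which matches the per-attribute values in the table on the next slide, and the statistics are estimated from the training data only):

```python
import numpy as np

def normalise(X_train, X_test):
    """Z-score normalisation: subtract each attribute's training mean and
    divide by its training standard deviation."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)   # population standard deviation (ddof=0)
    return (X_train - mean) / std, (X_test - mean) / std
```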
Normalised training data

Number  Lines   Line types  Rectangles  Colours  Mondrian?
1        0.632  -0.632       0.327      -1.021   No
2       -1.581   1.581      -0.588       0.408   No
3       -0.474   1.581      -1.046      -1.021   Yes
4       -0.474  -0.632      -0.588      -1.021   Yes
5       -0.474  -0.632       0.327       0.408   No
6        0.632  -0.632      -0.588       1.837   Yes
7        1.739  -0.632       2.157       0.408   No

Test instance

Number  Lines   Line types  Rectangles  Colours  Mondrian?
8        1.739   1.581      -0.131      -1.021   ?
Distances of test instance from training data

Example  Distance of test from example  Mondrian?
1        2.517                          No
2        3.644                          No
3        2.395                          Yes
4        3.164                          Yes
5        3.472                          No
6        3.808                          Yes
7        3.490                          No

Classification
1-NN  Yes
3-NN  Yes
5-NN  No
7-NN  No
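For completeness, a short sketch that reproduces these distances and votes from the normalised data above (variable names are hypothetical):

```python
import numpy as np
from collections import Counter

# normalised training data: (Lines, Line types, Rectangles, Colours) and labels
X = np.array([[ 0.632, -0.632,  0.327, -1.021],
              [-1.581,  1.581, -0.588,  0.408],
              [-0.474,  1.581, -1.046, -1.021],
              [-0.474, -0.632, -0.588, -1.021],
              [-0.474, -0.632,  0.327,  0.408],
              [ 0.632, -0.632, -0.588,  1.837],
              [ 1.739, -0.632,  2.157,  0.408]])
y = np.array(["No", "No", "Yes", "Yes", "No", "Yes", "No"])
x_query = np.array([1.739, 1.581, -0.131, -1.021])   # normalised test instance 8

dists = np.linalg.norm(X - x_query, axis=1)  # approx. [2.52, 3.64, 2.40, 3.16, 3.47, 3.81, 3.49]
order = np.argsort(dists)
for k in (1, 3, 5, 7):
    print(k, Counter(y[order[:k]]).most_common(1)[0][0])   # Yes, Yes, No, No
```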
What if the target function is real-valued?
• The k-nearest neighbour algorithm would then calculate the mean of the f values of the k nearest neighbours
Variant of kNN: Distance-Weighted kNN
• We might want to weight nearer neighbours more heavily:
$$ \hat{f}(x_q) := \frac{\sum_{i=1}^{k} w_i\, f(x_i)}{\sum_{i=1}^{k} w_i} \qquad \text{where } w_i = \frac{1}{d(x_q, x_i)^2} $$
• Then it makes sense to use all training examples instead of just k (Shepard's method)
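A minimal sketch of the distance-weighted rule for a real-valued target (hypothetical names; a small epsilon guards against division by zero when the query coincides with a training point):

```python
import numpy as np

def weighted_knn_regress(X_train, y_train, x_query, k=5, eps=1e-12):
    """Distance-weighted k-NN regression: weight each of the k nearest
    neighbours by w_i = 1 / d(x_q, x_i)^2 and return the weighted mean."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] ** 2 + eps)     # w_i = 1 / d(x_q, x_i)^2
    return np.sum(w * y_train[nearest]) / np.sum(w)
```

Setting k to the number of training examples uses all of them, as in Shepard's method.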
Difficulties with k-nearest neighbour algorithms
• Have to calculate the distance of the test case from all training cases
• There may be irrelevant attributes amongst the attributes – curse of dimensionality
Case-based reasoning (CBR)
• CBR is an advanced form of instance-based learning applied to more complex instance objects
• Objects may include complex structural descriptions of cases & adaptation rules
• CBR cannot use Euclidean distance measures
• Must define distance measures for those complex objects instead (e.g. semantic nets)
• CBR tries to model human problem-solving
– uses past experience (cases) to solve new problems
– retains solutions to new problems
• CBR is an ongoing area of machine learning research with many applications
Applications of CBR
• Design
– landscape, building, mechanical, conceptual design of aircraft sub-systems
• Planning
– repair schedules
• Diagnosis
– medical
• Adversarial reasoning
– legal
CBR process
[Flow diagram: a New Case is compared against the Case Base (Retrieve matching → Matched Cases → Closest Case). If no adaptation is needed, the closest case's solution is suggested directly; otherwise the case is Reused and Revised using Knowledge and Adaptation rules before a solution is suggested. The solved case can then be Retained (Learned) back into the Case Base.]
CBR example: Property pricing

Case  Location code  Bedrooms  Recep rooms  Type      Floors  Condition  Price (£)
1     8              2         1            terraced  1       poor       20,500
2     8              2         2            terraced  1       fair       25,000
3     5              1         2            semi      2       good       48,000
4     5              1         2            terraced  2       good       41,000

Test instance

Case  Location code  Bedrooms  Recep rooms  Type  Floors  Condition  Price (£)
5     7              2         2            semi  1       poor       ???
How rules are generated
• There is no unique way of doing it. Here is one possibility:
• Examine cases and look for ones that are almost identical
  – case 1 and case 2
    • R1: If recep-rooms changes from 2 to 1 then reduce price by £5,000
  – case 3 and case 4
    • R2: If Type changes from semi to terraced then reduce price by £7,000
Matching
• Comparing the test instance with each stored case by counting matching attributes:
– matches(5,1) = 3
– matches(5,2) = 3
– matches(5,3) = 2
– matches(5,4) = 1
• Estimated price of case 5 is £25,000 (the price of case 2, a closest-matching case)
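A minimal sketch of this attribute-matching step, using the property table above (data layout and names are hypothetical):

```python
# each stored case: (location code, bedrooms, recep rooms, type, floors, condition, price)
cases = {
    1: (8, 2, 1, "terraced", 1, "poor", 20500),
    2: (8, 2, 2, "terraced", 1, "fair", 25000),
    3: (5, 1, 2, "semi",     2, "good", 48000),
    4: (5, 1, 2, "terraced", 2, "good", 41000),
}
query = (7, 2, 2, "semi", 1, "poor")   # case 5, price unknown

def matches(query, case):
    """Count the attributes on which the query agrees with a stored case."""
    return sum(q == c for q, c in zip(query, case[:-1]))

scores = {n: matches(query, c) for n, c in cases.items()}   # {1: 3, 2: 3, 3: 2, 4: 1}
# cases 1 and 2 tie on 3 matches; the slides take case 2's price (25,000)
# as the starting estimate, which is then adjusted by the adaptation rules
```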
Adapting
• Reverse rule 2
– if Type changes from terraced to semi then increase price by £7,000
• Apply reversed rule 2
– new estimate of the price of property 5 is £32,000
Learning
• So far we have a new case and an estimated price
– nothing is added yet to the case base
• If later we find the house sold for £35,000, then the case would be added
– could add a new rule
  • if location changes from 8 to 7 then increase price by £3,000
Problems with CBR
• How should cases be represented?
• How should cases be indexed for fast retrieval?
• How can good adaptation heuristics be developed?
• When should old cases be removed?
Advantages
• A local approximation is found for each test case
• Knowledge is in a form understandable to human beings
• Fast to train
Summary
• K-Nearest Neighbours
• Case-based reasoning
• Lazy and eager learning
Lazy and Eager Learning
• Lazy: wait for the query before generalising
– k-Nearest Neighbour, Case-based reasoning
• Eager: generalise before seeing the query
– Radial Basis Function Networks, ID3, …
• Does it matter?
– An eager learner must create a global approximation
– A lazy learner can create many local approximations