Instance-Based Learning
Ata Kaban
The University of Birmingham

Today we learn:
– K-nearest neighbours
– Case-based reasoning
– Lazy and eager learning

Instance-based learning
One way of solving tasks of approximating discrete or real-valued target functions.
We have training examples $(x_n, f(x_n))$, $n = 1, \dots, N$.
Key idea:
– just store the training examples
– when a test example is given, find the closest matches

1-Nearest neighbour:
Given a query instance $x_q$,
• first locate the nearest training example $x_n$
• then $f(x_q) := f(x_n)$

K-Nearest neighbour:
Given a query instance $x_q$,
• first locate the k nearest training examples
• if the target function is discrete-valued, take a vote among the k nearest neighbours; if it is real-valued, take the mean of their f values:
$\hat{f}(x_q) := \frac{1}{k} \sum_{i=1}^{k} f(x_i)$
(A code sketch of both cases appears after the worked example below.)

The distance between examples
We need a measure of distance in order to know which examples are the neighbours.
Assume that we have T attributes for the learning problem. Then one example point x has elements $x_t$, $t = 1, \dots, T$.
The distance between two points $x_i$ and $x_j$ is often defined as the Euclidean distance:
$d(x_i, x_j) = \sqrt{\sum_{t=1}^{T} (x_{ti} - x_{tj})^2}$

Voronoi diagram
[Figure: Voronoi diagram of the training points. 1-NN partitions the input space into cells, one per training example; a query receives the label of the example whose cell it falls into.]

Characteristics of instance-based learning
An instance-based learner is a lazy learner: it does all the work when the test example is presented. This is opposed to so-called eager learners, which build a parameterised, compact model of the target.
It produces a local approximation to the target function (different for each test instance).

When to consider nearest-neighbour algorithms?
– Instances map to points in $\mathbb{R}^n$
– Not more than, say, 20 attributes per instance
– Lots of training data
Advantages:
– Training is very fast
– Can learn complex target functions
– Don't lose information
Disadvantages:
– ? (we will see them shortly…)

[Figure: seven example paintings labelled "one" to "seven", plus an eighth query painting marked "?". Is the eighth a Mondrian?]

Training data

Number  Lines  Line types  Rectangles  Colours  Mondrian?
1       6      1           10          4        No
2       4      2            8          5        No
3       5      2            7          4        Yes
4       5      1            8          4        Yes
5       5      1           10          5        No
6       6      1            8          6        Yes
7       7      1           14          5        No

Test instance

Number  Lines  Line types  Rectangles  Colours  Mondrian?
8       7      2            9          4        ?

Keep data in normalised form
One way to normalise an attribute value $x_t$ to $x_t'$ is
$x_t' = \frac{x_t - \bar{x}_t}{\sigma_t}$
where $\bar{x}_t$ is the mean and $\sigma_t$ the standard deviation of the t-th attribute.

Normalised training data

Number  Lines   Line types  Rectangles  Colours  Mondrian?
1        0.632  -0.632       0.327      -1.021   No
2       -1.581   1.581      -0.588       0.408   No
3       -0.474   1.581      -1.046      -1.021   Yes
4       -0.474  -0.632      -0.588      -1.021   Yes
5       -0.474  -0.632       0.327       0.408   No
6        0.632  -0.632      -0.588       1.837   Yes
7        1.739  -0.632       2.157       0.408   No

Test instance

Number  Lines   Line types  Rectangles  Colours  Mondrian?
8        1.739   1.581      -0.131      -1.021   ?

Distances of the test instance from the training data

Example  Distance from test  Mondrian?
1        2.517               No
2        3.644               No
3        2.395               Yes
4        3.164               Yes
5        3.472               No
6        3.808               Yes
7        3.490               No

Classification: 1-NN → Yes, 3-NN → Yes, 5-NN → No, 7-NN → No.
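To make the algorithm concrete, here is a minimal sketch in Python/NumPy. It is not part of the lecture: the function name knn_predict, its signature, and the array-based data layout are illustrative assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, classify=True):
    """Predict the target at x_query from its k nearest training
    examples under Euclidean distance: majority vote for a discrete
    target, mean of the f values for a real-valued one."""
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]               # the k closest examples
    if classify:
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]          # majority vote
    return y_train[nearest].mean()                # mean of the f values
```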
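And here is the Mondrian worked example end-to-end, again as a sketch rather than lecture code. It assumes z-score normalisation with the population standard deviation (NumPy's default, ddof=0), which is what reproduces the normalised table above, and it scales the test instance with the training-set statistics; the printed distances should match the distance table up to rounding.

```python
import numpy as np

# Training data from the tables above: Lines, Line types, Rectangles, Colours.
X = np.array([[6, 1, 10, 4],
              [4, 2,  8, 5],
              [5, 2,  7, 4],
              [5, 1,  8, 4],
              [5, 1, 10, 5],
              [6, 1,  8, 6],
              [7, 1, 14, 5]], dtype=float)
y = np.array(["No", "No", "Yes", "Yes", "No", "Yes", "No"])
x_test = np.array([7, 2, 9, 4], dtype=float)

# Normalise with training-set mean and (population) standard deviation.
mean, std = X.mean(axis=0), X.std(axis=0)
Xn = (X - mean) / std
xq = (x_test - mean) / std

dists = np.sqrt(((Xn - xq) ** 2).sum(axis=1))
print(np.round(dists, 3))        # should match the distance table above

order = np.argsort(dists)        # neighbours from nearest to farthest
for k in (1, 3, 5, 7):
    votes = list(y[order[:k]])
    print(f"{k}-NN:", max(set(votes), key=votes.count))
```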
What if the target function is real-valued?
The k-nearest neighbour algorithm would then just output the mean of the f values of the k nearest neighbours.

Variant of kNN: distance-weighted kNN
We might want to weight nearer neighbours more heavily:
$\hat{f}(x_q) := \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$, where $w_i = \frac{1}{d(x_q, x_i)^2}$
Then it makes sense to use all training examples instead of just k (Shepard's method). (A code sketch appears at the end of these notes.)

Difficulties with k-nearest neighbour algorithms
– Have to calculate the distance of the test case from all training cases
– There may be irrelevant attributes amongst the attributes (curse of dimensionality)

Case-based reasoning (CBR)
CBR is an advanced form of instance-based learning, applied to more complex instance objects.
Objects may include complex structural descriptions of cases and adaptation rules.

CBR cannot use Euclidean distance measures; distance measures must be defined for those complex objects instead (e.g. on semantic nets).
CBR tries to model human problem-solving:
– it uses past experience (cases) to solve new problems
– it retains solutions to new problems
CBR is an ongoing area of machine learning research with many applications.

Applications of CBR
– Design: landscape, building, mechanical, conceptual design of aircraft sub-systems
– Planning: repair schedules
– Diagnosis: medical
– Adversarial reasoning: legal

CBR process
[Flow diagram: a new case is matched against the case base to retrieve the closest case; if adaptation is needed, knowledge and adaptation rules are used to reuse and revise it; the suggested solution is returned, and the solved case is retained (learned) in the case base.]

CBR example: property pricing

Case  Location code  Bedrooms  Recep rooms  Type      Floors  Condition  Price £
1     8              2         1            terraced  1       poor       20,500
2     8              2         2            terraced  1       fair       25,000
3     5              1         2            semi      2       good       48,000
4     5              1         2            terraced  2       good       41,000

Test instance

Case  Location code  Bedrooms  Recep rooms  Type  Floors  Condition  Price £
5     7              2         2            semi  1       poor       ???

How rules are generated
There is no unique way of doing it. Here is one possibility: examine cases and look for pairs that are almost identical.
– cases 1 and 2:
• R1: if recep-rooms changes from 2 to 1, then reduce the price by £5,000
– cases 3 and 4:
• R2: if type changes from semi to terraced, then reduce the price by £7,000

Matching
Compare the test instance with each stored case, counting matching attributes:
– matches(5,1) = 3
– matches(5,2) = 3
– matches(5,3) = 2
– matches(5,4) = 1
The estimated price of case 5 is £25,000, the price of case 2 (one of the two best-matching cases). (A code sketch of this matching and adaptation appears at the end of these notes.)

Adapting
Reverse rule R2:
– if type changes from terraced to semi, then increase the price by £7,000
Apply the reversed rule R2:
– the new estimate for the price of property 5 is £32,000

Learning
So far we have a new case and an estimated price; nothing has been added to the case base yet. If we later find that the house sold for £35,000, the case would be added, and we could also add a new rule:
• if location changes from 8 to 7, increase the price by £3,000

Problems with CBR
– How should cases be represented?
– How should cases be indexed for fast retrieval?
– How can good adaptation heuristics be developed?
– When should old cases be removed?

Advantages
– A local approximation is found for each test case
– Knowledge is in a form understandable to human beings
– Fast to train

Summary
– K-nearest neighbours
– Case-based reasoning
– Lazy and eager learning

Lazy and eager learning
Lazy: wait for the query before generalising
– k-nearest neighbour, case-based reasoning
Eager: generalise before seeing the query
– radial basis function networks, ID3, …
Does it matter?
– An eager learner must create a global approximation
– A lazy learner can create many local approximations
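Finally, the two sketches promised above. First, distance-weighted kNN for a real-valued target: a minimal sketch assuming NumPy arrays, where the function name weighted_knn and the eps guard against zero distances are my additions (for a discrete target one would instead sum the weights per class and pick the heaviest class).

```python
import numpy as np

def weighted_knn(X_train, y_train, x_query, k=None, eps=1e-12):
    """Distance-weighted k-NN for a real-valued target, with weights
    w_i = 1 / d(x_query, x_i)^2.  Passing k=None lets every training
    example contribute, i.e. Shepard's method."""
    d = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    y = y_train
    if k is not None:
        idx = np.argsort(d)[:k]          # keep only the k nearest
        d, y = d[idx], y[idx]
    w = 1.0 / (d ** 2 + eps)             # eps guards against a zero distance
    return float((w * y).sum() / w.sum())
```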
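Second, the property-pricing CBR example: a sketch of the attribute-matching count and the rule-based adaptation, under the assumption that the attribute names and the matches helper are my own naming rather than the lecture's. Cases 1 and 2 tie on three matches; following the lecture, case 2 is kept as the closest case and reversed rule R2 is then applied.

```python
# Stored cases from the property-pricing table above.
cases = {
    1: dict(location=8, bedrooms=2, recep=1, type="terraced",
            floors=1, condition="poor", price=20500),
    2: dict(location=8, bedrooms=2, recep=2, type="terraced",
            floors=1, condition="fair", price=25000),
    3: dict(location=5, bedrooms=1, recep=2, type="semi",
            floors=2, condition="good", price=48000),
    4: dict(location=5, bedrooms=1, recep=2, type="terraced",
            floors=2, condition="good", price=41000),
}
query = dict(location=7, bedrooms=2, recep=2, type="semi",
             floors=1, condition="poor")

def matches(query, case):
    """Count the attributes on which the query agrees with a stored case."""
    return sum(case[a] == v for a, v in query.items())

for cid, case in cases.items():
    print(cid, matches(query, case))     # prints 3, 3, 2, 1

# Retrieve: cases 1 and 2 tie; the lecture keeps case 2, so the first
# estimate is its price, 25,000.
best = cases[2]
estimate = best["price"]

# Adapt: the query is a 'semi' but case 2 is 'terraced', so apply rule
# R2 in reverse (terraced -> semi: increase the price by 7,000).
if query["type"] == "semi" and best["type"] == "terraced":
    estimate += 7000
print(estimate)                          # 32000
```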