Machine Learning
Instance-Based Learning & Case-Based Reasoning
Exercise Solutions

Worked example
• In the diagram below, the figures next to the + and - signs are the values taken by a real-valued target function. Calculate the value predicted for the target function at the query instance xq by the 5-Nearest Neighbour learning algorithm.

Worked example (contd)
• The k-NN regression estimate is the mean of the target values of the k nearest neighbours:

  f̂_k(xq) = (1/k) Σ_{i=1}^{k} f(xi),   where x1, …, xk are the k nearest neighbours of xq

• The training values in the diagram are +1.5, -0.5, -0.6, -0.7, -0.9, +1.0, +1.9, -0.3 and +1.2. The five neighbours nearest to xq have values +1.0, +1.2, -0.6, -0.7 and -0.3, so

  f̂_5(xq) = (1.0 + 1.2 - 0.6 - 0.7 - 0.3) / 5 = 0.6 / 5 = 0.12

Exercise 1
• Assume a Boolean target function and a two-dimensional instance space (shown below). Determine how the k-Nearest Neighbour learning algorithm would classify the new instance xq for k = 1, 3, 5. The + and - signs in the instance space denote positive and negative examples respectively.

Exercise 1: Solution

  Distance from xq:  1.00  1.35  1.40  1.60  1.90  2.00  2.20  2.40  2.80
  Classification:     +     -     -     +     +     +     +     -     +

  1-NN: +
  3-NN: -
  5-NN: -
  7-NN: -

Exercise 1 (contd)
• How do the efficiency and accuracy of k-Nearest Neighbour search change as k increases?
  – If there are sufficiently many examples, the accuracy should increase.
  – The time to compute the prediction also increases, so in that sense the method becomes less efficient.

Exercise 2 (exam question from previous years)
(a) Some machine learning algorithms are described as eager, others as lazy. Choose an example of each type of algorithm and explain in what sense one is eager and the other is lazy.
Answer: k-Nearest Neighbour and Case-Based Reasoning are lazy learning methods because they defer computation until presented with a new example to classify. By contrast, decision trees, neural networks and Bayesian classifiers are eager methods because they build a model during the training phase; they then need to do little work when presented with new examples to classify.
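The worked example and Exercise 1 can be sketched in a few lines of Python. The slide only shows a diagram, so the 2-D coordinates below are hypothetical placements; the nine target values are the ones given next to the + and - signs, arranged so that the five values from the worked example are the nearest to the query at the origin.

```python
import math
from collections import Counter

def k_nearest(query, examples, k):
    """Return the k (point, value) pairs closest to query (Euclidean distance)."""
    return sorted(examples, key=lambda e: math.dist(query, e[0]))[:k]

def knn_regress(query, examples, k):
    """k-NN regression: mean of the target values of the k nearest neighbours."""
    return sum(v for _, v in k_nearest(query, examples, k)) / k

def knn_classify(query, examples, k):
    """k-NN classification (Exercise 1): majority vote over the k nearest labels."""
    votes = Counter(label for _, label in k_nearest(query, examples, k))
    return votes.most_common(1)[0][0]

# Hypothetical coordinates; target values taken from the worked example.
examples = [((1, 0),  1.0), ((0, 1),  1.2), ((1, 1), -0.6),
            ((0, 2), -0.7), ((2, 0), -0.3), ((3, 0),  1.5),
            ((0, 3), -0.5), ((3, 3), -0.9), ((4, 0),  1.9)]

print(round(knn_regress((0, 0), examples, 5), 2))  # 0.12, as in the worked example
```

Note that the same `k_nearest` helper serves both the regression (averaging) and the classification (voting) variants of the algorithm.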
(b) Describe the essential differences between k-nearest neighbour learning and Case-Based Reasoning.
Answer: k-NN uses the Euclidean distance measure to find examples that are close to a test case, so it works with numerical data. CBR can deal with a wide variety of data types for which the Euclidean distance is not defined; CBR therefore needs to define its own measure of closeness for non-numerical objects.

(c) Describe in words the R4 model of Case-Based Reasoning.
Answer: The R4 model of CBR consists of the following four main stages:
  – Retrieve: match cases in the case-base to the incoming test case.
  – Re-use: if a sufficiently close match is found, re-use the solution stored in the case-base.
  – Revise: if not, apply the adaptation rules to revise a stored solution so that it fits the incoming case.
  – Retain: if the outcome corresponding to the incoming case later becomes known, add the case to the case-base (but only if it is not identical to an existing case, or to an existing case after adaptation).

(d) How do Case-Based Reasoning systems learn?
Answer: Learning occurs by
  – retaining new cases and their outcomes;
  – adding or modifying adaptation rules.

(e) Some researchers in the field of machine learning argue that Case-Based Reasoning is closer to human thinking than some other forms of machine learning. Give a real-world example that supports this view.
Answer: A doctor diagnosing a patient by matching the patient's symptoms to those of another patient whose diagnosis is known. Humans often reason by matching new instances to previously experienced (or reported) situations, and adapting those situations where necessary.
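The non-numerical closeness measure from part (b) and the doctor example from part (e) can be combined into a small retrieval sketch. This is a hedged illustration, not a standard API: the function names (`case_similarity`, `retrieve`), the medical attributes and the attribute range are all invented for the example. Numeric attributes are compared by range-normalised distance, symbolic attributes by exact match, which is one common way CBR systems define similarity over mixed data types.

```python
def case_similarity(case, query, ranges, weights=None):
    """Weighted mean of per-attribute similarities, each in [0, 1]."""
    total = weight_sum = 0.0
    for attr, q_val in query.items():
        w = (weights or {}).get(attr, 1.0)
        if attr in ranges:                       # numeric attribute
            lo, hi = ranges[attr]
            sim = 1.0 - abs(case[attr] - q_val) / (hi - lo)
        else:                                    # symbolic attribute
            sim = 1.0 if case[attr] == q_val else 0.0
        total += w * sim
        weight_sum += w
    return total / weight_sum

def retrieve(query, case_base, ranges):
    """Retrieve stage of R4: return the most similar stored case."""
    return max(case_base, key=lambda c: case_similarity(c, query, ranges))

# Illustrative case-base for the doctor example in part (e).
case_base = [
    {"temp": 39.5, "symptom": "cough", "diagnosis": "flu"},
    {"temp": 36.8, "symptom": "rash",  "diagnosis": "allergy"},
]
query  = {"temp": 39.0, "symptom": "cough"}
ranges = {"temp": (35.0, 42.0)}   # assumed attribute range used for normalisation

print(retrieve(query, case_base, ranges)["diagnosis"])  # flu
```

The optional `weights` argument reflects the fact that, unlike plain k-NN, a CBR system often treats some attributes as more diagnostic than others.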