Machine Learning

Instance Based Learning &
Case Based Reasoning
Exercise Solutions
Worked example
• In the diagram below the figures next to the + and - signs
refer to the values taken by a real-valued target function.
Calculate the value predicted for the target function at
the query instance xq by the 5-Nearest Neighbour
Learning Algorithm.
Worked example (contd)
$$\tilde{f}_k(x_q) = \frac{1}{k} \sum_{i=1}^{k} f(x_i)$$
where the $x_i$ are the k nearest neighbours of $x_q$
[Diagram: the query instance xq among training points with target values +1.5, -0.5, -0.6, -0.7, -0.9, +1.0, +1.9, -0.3 and +1.2]
$$\tilde{f}_k(x_q) = \frac{1}{5}\{1.0 + 1.2 - 0.6 - 0.7 - 0.3\} = 0.12$$
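The averaging step can be checked with a short Python sketch. The target values are the ones from the diagram, but the slide gives no coordinates, so the distances below are hypothetical placeholders chosen only so that the five values used in the solution rank as the nearest. The function name `knn_regress` is also just an illustrative choice.

```python
def knn_regress(neighbours, k):
    """Average the target values of the k nearest neighbours.

    `neighbours` is a list of (distance_to_query, target_value) pairs.
    """
    if not 1 <= k <= len(neighbours):
        raise ValueError("k must be between 1 and the number of examples")
    # Sort by distance to the query and keep the k closest examples.
    nearest = sorted(neighbours, key=lambda pair: pair[0])[:k]
    return sum(value for _, value in nearest) / k


# Target values come from the diagram; the distances are HYPOTHETICAL,
# picked so that +1.0, +1.2, -0.6, -0.7 and -0.3 are the five nearest.
examples = [
    (1.0, 1.0), (1.1, 1.2), (1.2, -0.6), (1.3, -0.7), (1.4, -0.3),
    (2.0, 1.5), (2.1, -0.5), (2.2, -0.9), (2.3, 1.9),
]

prediction = knn_regress(examples, k=5)
print(f"{prediction:.2f}")  # prints 0.12
```

Changing `k` changes which values enter the average, which is why the choice of k matters for the prediction.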
Exercise 1
• Assume a Boolean target function and a two-dimensional
instance space (shown below).
Determine how the k-Nearest Neighbour
Learning algorithm would classify the new
instance xq for k = 1,3,5. The + and – signs in
the instance space refer to positive and negative
examples respectively.
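Exercise 1: Solution
Distances from the query instance, in increasing order:
1.00, 1.35, 1.40, 1.60, 1.90, 2.00, 2.20, 2.40, 2.80
[Diagram: instance space with the query instance xq among positive (+) and negative (-) training examples]
k      Classification
1-NN   +
3-NN   -
5-NN   -
7-NN   -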
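The majority vote behind these answers can be sketched in Python. The distances are the ones listed in the solution. The slide does not say which label belongs to which distance, so the label ordering below is a hypothetical assignment, chosen only because it reproduces the stated answers for k = 1, 3, 5 and 7.

```python
from collections import Counter


def knn_classify(neighbours, k):
    """Return the majority label among the k nearest neighbours.

    `neighbours` is a list of (distance_to_query, label) pairs.
    """
    nearest = sorted(neighbours, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Distances are taken from the solution above.
distances = [1.00, 1.35, 1.40, 1.60, 1.90, 2.00, 2.20, 2.40, 2.80]
# HYPOTHETICAL labels: one ordering consistent with the stated results.
labels = ["+", "-", "-", "-", "+", "-", "+", "+", "+"]
examples = list(zip(distances, labels))

results = {k: knn_classify(examples, k) for k in (1, 3, 5, 7)}
print(results)  # prints {1: '+', 3: '-', 5: '-', 7: '-'}
```

Using odd values of k with two classes avoids tied votes, so no tie-breaking rule is needed in this sketch.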
Exercise 1 (contd)
• How do the efficiency and accuracy of
k-Nearest Neighbour search change
as k increases?
– If there are sufficient numbers of
examples, the accuracy should increase.
– The time to calculate the prediction will
also increase, so in that sense the search
becomes less efficient.
Exercise 2 (exam Q from previous years)
(a) Some machine learning algorithms are described as
eager, others as lazy. Choose an example of each
type of algorithm and explain in what sense one is
eager and the other is lazy.
Answer:
k-nearest neighbour and Case-based Reasoning are lazy
learning methods because they do computations only
when presented with a new example to classify. By
contrast, decision trees, neural nets and Bayesian
classification are eager methods because they build
up a model in the training phase. They need to do
little work when presented with new examples to be
classified.
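The contrast can be made concrete with a minimal Python sketch. The lazy learner is a 1-nearest-neighbour classifier. The eager learner here is a nearest-centroid classifier, which is not one of the methods named in the answer but a deliberately simple stand-in for a method that builds a model at training time. The class names and the toy data are illustrative assumptions.

```python
import math
from collections import defaultdict


class LazyNearestNeighbour:
    """Lazy: training only stores the data; all work happens at query time."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        # Distances to every stored example are computed per query.
        i = min(range(len(self.X)), key=lambda j: math.dist(self.X[j], x))
        return self.y[i]


class EagerCentroidClassifier:
    """Eager: training builds a compact model (one centroid per class)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        self.centroids = {
            label: tuple(sum(coord) / len(points) for coord in zip(*points))
            for label, points in groups.items()
        }
        return self

    def predict(self, x):
        # Only the stored centroids are consulted, not the training data.
        return min(self.centroids, key=lambda c: math.dist(self.centroids[c], x))


# Toy data, invented for illustration.
X = [(0, 0), (0, 1), (4, 4), (5, 4)]
y = ["-", "-", "+", "+"]
query = (4, 5)

lazy_answer = LazyNearestNeighbour().fit(X, y).predict(query)
eager_answer = EagerCentroidClassifier().fit(X, y).predict(query)
print(lazy_answer, eager_answer)  # prints + +
```

The lazy learner's `predict` cost grows with the number of stored examples, while the eager learner's `predict` cost depends only on the number of classes.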
(b) Describe the essential differences between k-nearest
neighbour learning and Case-based Reasoning.
Answer: k-NN uses the Euclidean distance
measure to find examples that are close to a test
case. Thus it works with numerical data. CBR
can deal with a wide variety of data types, and
for these the Euclidean distance is not defined.
Thus CBR needs to define a measure of
closeness for non-numerical objects.
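One possible closeness measure for cases with mixed attribute types is sketched below in Python. This is an illustrative assumption, not a measure prescribed by the answer: symbolic attributes score 1 for an exact match and 0 otherwise, numeric attributes score by scaled difference, and the scores are averaged. The function name, the attribute names and the `numeric_ranges` parameter are all hypothetical.

```python
def case_similarity(case_a, case_b, numeric_ranges):
    """Return a similarity in [0, 1], averaged over the shared attributes.

    `numeric_ranges` maps each numeric attribute name to the width of its
    value range, used to scale differences into [0, 1].
    """
    shared = case_a.keys() & case_b.keys()
    if not shared:
        return 0.0
    total = 0.0
    for name in shared:
        a, b = case_a[name], case_b[name]
        if name in numeric_ranges:
            # Numeric attribute: closer values give a score nearer to 1.
            total += 1.0 - min(abs(a - b) / numeric_ranges[name], 1.0)
        else:
            # Symbolic attribute: exact match or nothing.
            total += 1.0 if a == b else 0.0
    return total / len(shared)


# Hypothetical cases with Boolean, symbolic and numeric attributes.
stored = {"fever": True, "cough": "dry", "age": 40}
incoming = {"fever": True, "cough": "dry", "age": 50}

score = case_similarity(stored, incoming, numeric_ranges={"age": 100})
print(f"{score:.3f}")  # prints 0.967
```

In a real CBR system the per-attribute scores would usually be weighted by domain knowledge rather than averaged equally.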
(c) Describe in words the R4 model of Case-based
Reasoning.
Answer: The R4 model of CBR is based on the
following four main stages:
– Retrieve: Match cases in the case-base to the
incoming test case.
– Re-use: If a perfect match occurs, we
can re-use the solution stored in the case-base.
– Revise: If not, we can apply the adaptation rules
to adapt a stored case so that a match with the
incoming case is obtained.
– Retain: If the associated outcome corresponding
to the incoming case is later known, the case is
added to the case-base (but only if the outcome
case isn't identical to an existing case or an
existing case after adaptation).
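The four stages can be sketched as one pass through a small Python function. This is a minimal illustration under stated assumptions: cases are dictionaries with `problem` and `solution` keys, and the `similarity` and `adapt` callables, the function name `cbr_cycle` and the paint-estimation data are all hypothetical.

```python
def cbr_cycle(case_base, problem, similarity, adapt, outcome=None):
    """Run one Retrieve / Re-use / Revise / Retain pass and return a solution."""
    # Retrieve: find the stored case most similar to the incoming problem.
    best = max(case_base, key=lambda case: similarity(case["problem"], problem))

    if best["problem"] == problem:
        # Re-use: a perfect match, so take the stored solution as is.
        solution = best["solution"]
    else:
        # Revise: adapt the retrieved case to fit the incoming problem.
        solution = adapt(best, problem)

    # Retain: store the case once its real outcome is known, unless an
    # existing case (as stored or after adaptation) already gives it.
    if outcome is not None and outcome != solution:
        new_case = {"problem": problem, "solution": outcome}
        if new_case not in case_base:
            case_base.append(new_case)
    return solution


# HYPOTHETICAL domain: wall area in square metres -> litres of paint used.
case_base = [
    {"problem": 10, "solution": 2.0},
    {"problem": 20, "solution": 4.0},
]


def closeness(a, b):
    return -abs(a - b)


def scale(case, problem):
    return case["solution"] * problem / case["problem"]


exact = cbr_cycle(case_base, 10, closeness, scale)
adapted = cbr_cycle(case_base, 12, closeness, scale, outcome=2.5)
print(exact, round(adapted, 2), len(case_base))  # prints 2.0 2.4 3
```

The second call retains a new case because the observed outcome (2.5) differs from what adaptation produced (2.4), so the case-base grows from two cases to three.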
(d) How do Case-based Reasoning systems
learn?
Answer: Learning occurs by
– retaining new cases/outcomes
– adding or modifying adaptation rules
(e) Some researchers in the field of machine learning argue
that Case-based Reasoning is closer to human thinking
than some other forms of machine learning. Give
a real-world example that supports this view.
Answer: For example, a doctor diagnosing a patient by
matching the patient's symptoms to those of another
patient whose diagnosis was known. Humans often
reason by matching new instances to previously
experienced (or told) situations, and they often
adapt those situations to fit the new one.