Supplemental PowerPoints for Case-Based Reasoning

Case-Based Reasoning
(Not covered in book)










Introduction, or what is a case?
CBR and Learning
CBR Cycle
Instance-Based Common Simplification
Measuring Distance
Weighted Features
Case Forgetting?
Plusses / Minuses
Software – aiaiCBR / WEKA
Summary
1
Introduction, or what is a case?

Many researchers have found that representing expert knowledge
as “cases” instead of “rules” is more natural and robust

What a “case” is depends a fair amount on the task / domain
– Should include relevant “features” to enable determining the applicability of the case to the task
– Includes a solution to the problem
– Most commonly stored in a record/structure format, with features and solution being attributes, and each case a single record
When solving a problem, find the most similar previous case(s), and use them to suggest a solution to the new problem – Case-Based Reasoning (CBR)
2
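To make the record/structure idea above concrete, here is a minimal sketch in Python; the helpdesk-style attribute names and solutions are hypothetical, not from the slides.

```python
from dataclasses import dataclass

@dataclass
class Case:
    """One stored experience: descriptive features plus the solution that worked."""
    features: dict   # e.g. {"symptom": "no dial tone", "modem": "X100"}  (hypothetical attributes)
    solution: str    # e.g. "replace the line filter"                     (hypothetical)

# A case-base is simply a collection of such records.
case_base = [
    Case({"symptom": "no dial tone", "modem": "X100"}, "replace the line filter"),
    Case({"symptom": "slow connection", "modem": "X200"}, "update the firmware"),
]
```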
CBR & Learning

Possibly the simplest form of machine learning
– Training cases are merely stored (kind of like “rote
learning”)
– Has been called “lazy learning” – no work is done
until an answer is needed

May include storing newly solved problems –
adding to the knowledge-base (case-base)
3
Case-Based Reasoning Cycle






At the highest level of generality, a general CBR cycle may be
described by the following four processes:
1. RETRIEVE the most similar case or cases
2. REUSE the information and knowledge in that case to solve the
problem
3. REVISE the proposed solution
4. RETAIN the parts of this experience likely to be useful for
future problem solving
A new problem is solved by retrieving one or more previously
experienced cases, reusing the case in one way or another,
revising the solution based on reusing a previous case, and
retaining the new experience by incorporating it into the existing
knowledge-base (case-base).
4
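A minimal sketch of the four-step cycle, assuming the Case record sketched after slide 2; retrieve() uses a caller-supplied distance function (nearest-neighbor matching, as discussed on the following slides), reuse() simply copies the retrieved solution, and revise() is an identity step standing in for adaptation or human review.

```python
def retrieve(case_base, new_problem, distance):
    """RETRIEVE the stored case most similar to the new problem."""
    return min(case_base, key=lambda case: distance(case.features, new_problem))

def reuse(retrieved_case, new_problem):
    """REUSE: in the simplest (instance-based) setting, copy the old solution unchanged."""
    return retrieved_case.solution

def revise(proposed_solution, new_problem):
    """REVISE the proposed solution (identity here; often a manual verification/adaptation step)."""
    return proposed_solution

def retain(case_base, new_problem, confirmed_solution):
    """RETAIN the new experience by adding it to the case-base."""
    case_base.append(Case(new_problem, confirmed_solution))

def solve(case_base, new_problem, distance):
    best = retrieve(case_base, new_problem, distance)
    confirmed = revise(reuse(best, new_problem), new_problem)
    retain(case_base, new_problem, confirmed)
    return confirmed
```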
Simplifications are Common

Instance-Based
– Case = instance
– Generally does matching, but no adaptation
– Matching usually done using “nearest neighbor”
» Each new instance to be solved is compared to all training instances, with “distance” or “similarity” calculated for each attribute of each instance
– CBR tools frequently just do this (a minimal retrieval sketch follows below)
5
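A sketch of the “lazy” matching just described: nothing is computed at training time, and each query is compared against every stored instance, with a distance calculated for each attribute and then combined. Instances are plain dicts here, and attribute_distance is a placeholder for the per-attribute measures covered on the next few slides.

```python
def instance_distance(stored, query, attribute_distance):
    """Combine per-attribute distances between one stored instance and the query."""
    return sum(attribute_distance(stored[attr], query[attr]) for attr in query)

def nearest(training_instances, query, attribute_distance):
    """Lazy nearest-neighbor matching: scan the whole training set for each query."""
    return min(training_instances,
               key=lambda inst: instance_distance(inst, query, attribute_distance))
```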
Common Applications
Helpdesk
Diagnosis

6
Real World


Some examples of CBR at work from the Sixteenth Innovative Applications of Artificial
Intelligence Conference (IAAI-04):
Deployed Application Papers:
– Tenth Anniversary of the Plastics Color Formulation Tool. By William Cheetham. "Since 1994 GE
Plastics has employed a case-based reasoning tool that determines color formulas which match
requested colors. This tool, called FormTool, has saved GE millions of dollars in productivity and
material (i.e. colorant) costs. The technology developed in FormTool has been used to create an online color selection tool for our customers called ColorXpress Select. A customer innovation center
has been developed around the FormTool software."
– The General Motors Variation-Reduction Adviser: Deployment Issues for an AI Application. By
Alexander P. Morgan, John A. Cafeo, Kurt Godden, Ronald M. Lesperance, Andrea M. Simon,
Deborah L. McGuinness, and James L. Benedict. "The General Motors Variation-Reduction Adviser
is a knowledge system built on case-based reasoning principles that is currently in use in a dozen
General Motors Assembly Centers. This paper reviews the overall characteristics of the system and
then focuses on various AI elements critical to support its deployment to a production system. A key
AI enabler is ontology-guided search using domain-specific ontologies."

Emerging Application Papers
– CaBMA: Case-Based Project Management Assistant. By Ke Xu and Hector Muñoz Avila. "We are
going to present an implementation of an AI system, CaBMA, built on top of a commercial project
management tool, MS Project. Project management is a business process for successfully delivering
one-of-a kind products and services under real-world time and resource constraints. CaBMA (for:
Case-Based Project Management Assistant) provides the following functionalities: (1) It captures
cases from project plans. (2) It reuses captured cases to refine project plans and generate project
plans from the scratch. (3) It maintains consistency of pieces of a project plan obtained by case
reuse. (4) It refines the case base to cope with inconsistencies resulting from capturing cases over a
period of time. CaBMA adds a knowledge layer on top of MS Project to assist the user with his
project management tasks."
7
Real World

Applying Case-Based Reasoning to Manufacturing. By David Hinkle and
Christopher Toomey. AI Magazine 16(1): 65-73 (Spring 1995). "CLAVIER is a
case-based reasoning (CBR) system that assists in determining efficient loads
of composite material parts to be cured in an autoclave. CLAVIER's central
purpose is to find the most appropriate groupings and configurations of parts
(or loads) to maximize autoclave throughput yet ensure that parts are properly
cured. CLAVIER uses CBR to match a list of parts that need to be cured against
a library of previously successful loads and suggest the most appropriate next
load. CLAVIER also uses a heuristic scheduler to generate a sequence of loads that
best meets production goals and satisfies operational constraints. The system is
being used daily on the shop floor and has virtually eliminated the production
of low-quality parts that must be scrapped, saving thousands of dollars each
month. As one of the first fielded CBR systems, CLAVIER demonstrates that
CBR is a practical technology that can be used successfully in domains where
more traditional approaches are difficult to apply."
8
Nearest Neighbor
[Figure: a 2-D scatter of training instances from three classes (x, y, and z) with an unlabeled test instance T; T is classified according to its nearest stored instance(s).]
9
Measuring Distance / Similarity
Distance / Similarity are opposites – it doesn’t matter which you measure
Distances for each attribute are calculated, then must be combined
Combination of distances – commonly via “city block” or “Euclidean” (“as the crow flies”)
– <go back one slide to illustrate>
Higher power(s) increase the influence of large differences
10
Example Distance Metrics
Attributes      A    B    C    Sum
Test            5    5    5
Train 1         6    4    9
Train 2         7    3    7
Train 3         5    5   10
City Block 1    1    1    4     6
City Block 2    2    2    2     6
City Block 3    0    0    5     5
Euclidean 1     1    1   16    18
Euclidean 2     4    4    4    12
Euclidean 3     0    0   25    25
11
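The table can be checked with a few lines of code, assuming the test instance is (5, 5, 5) and the training instances are read off the Test/Train rows. The City Block rows are per-attribute absolute differences and their sum; the Euclidean rows show squared differences and their sum (the Euclidean distance itself would be the square root of that sum).

```python
test = (5, 5, 5)
trains = {"Train 1": (6, 4, 9), "Train 2": (7, 3, 7), "Train 3": (5, 5, 10)}

for name, train in trains.items():
    abs_diffs = [abs(t - x) for t, x in zip(test, train)]
    sq_diffs = [d * d for d in abs_diffs]
    print(name, "city block", abs_diffs, "sum", sum(abs_diffs),
          "| squared", sq_diffs, "sum", sum(sq_diffs))

# Prints city-block sums 6, 6, 5 and squared sums 18, 12, 25, matching the table.
```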
Kinds of attributes



Binary/boolean – two valued; e.g. Resident Student?
Nominal/categorical/enumerated/discrete – multiple valued, unordered; e.g. Major
Ordinal – ordered, but no sense of distance between values
– e.g. Fr, So, Jr, Sr, Grad
– e.g. Household Income: 1 – < 15K, 2 – 15-20K, 3 – 20-25K, 4 – 25-30K, 5 – 30-40K, 6 – 40-50K, 7 – > 50K
Interval – ordered, distance is measurable; e.g. birth year
Ratio – an actual measurement with a defined zero point, so that we can say one value is double, triple, or half another; e.g. GPA
CBR can work with all kinds of attributes (unlike some other learning methods)
12
More Similarity/Distance

Nominal attributes are frequently considered all or nothing – a complete match or no match at all
– Match → similarity = highest possible value, or distance = 0
– No match → similarity = 0, or distance = highest possible value
Nominals that are actually ordered (ordinals) ought to be treated differently (e.g. partial matches)
Normalization is necessary for numeric attributes (interval, ratio) – as discussed on the next slide
13
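A sketch of the per-attribute rules above: all-or-nothing for nominals, absolute difference for numerics (assumed already normalized to 0-1 as on the next slide), and, as one possible partial-match treatment for ordinals, the difference in rank divided by the number of steps.

```python
def nominal_distance(a, b):
    """All or nothing: 0 for an exact match, otherwise the maximum distance (1)."""
    return 0.0 if a == b else 1.0

def numeric_distance(a, b):
    """Assumes the attribute has already been normalized to the 0-1 range."""
    return abs(a - b)

def ordinal_distance(a, b, ordered_levels):
    """Partial matches for ordinals, e.g. Fr/So/Jr/Sr/Grad: distance in rank, scaled to 0-1."""
    ranks = {level: i for i, level in enumerate(ordered_levels)}
    return abs(ranks[a] - ranks[b]) / (len(ordered_levels) - 1)

print(nominal_distance("CS", "Math"))                                  # 1.0
print(ordinal_distance("So", "Sr", ["Fr", "So", "Jr", "Sr", "Grad"]))  # 2/4 = 0.5
```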
Normalization
CBR (as with some other schemes, such as neural networks) requires all numeric attributes to be on a similar scale – thus normalize or standardize (a different sense of “normalization” than in databases)
One normalization approach:
Norm val = (val – min value for attribute) / (max value for attribute – min value)
One standardization approach:
Stand val = (val – mean) / SD
14
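The two formulas above as code; the minimum, maximum, mean, and standard deviation are taken from the attribute’s values in the training data.

```python
from statistics import mean, stdev

def min_max_normalize(value, attribute_values):
    """Norm val = (val - min) / (max - min): maps the attribute onto the 0-1 range."""
    lo, hi = min(attribute_values), max(attribute_values)
    return (value - lo) / (hi - lo) if hi != lo else 0.0

def standardize(value, attribute_values):
    """Stand val = (val - mean) / SD."""
    return (value - mean(attribute_values)) / stdev(attribute_values)

gpas = [2.0, 3.0, 3.5, 4.0]
print(min_max_normalize(3.5, gpas))  # 0.75
print(standardize(3.5, gpas))        # about 0.44
```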
Missing Values
Frequently treated as maximum distance to ANY other value
For numerics, the maximum distance depends on what value you are comparing to
– E.g. if values range from 0-1 and comparing a missing value to
.9, maximal possible distance is .9
– If comparing a missing value to .3, maximal possible distance
is .7
– If comparing missing value to .5, maximal possible distance is
.5
15
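A sketch of the rule above for a single numeric attribute normalized to 0-1, where None stands for a missing value; a missing value is treated as maximally far from whatever it is compared to (the both-missing convention is an assumption, not stated on the slide).

```python
def missing_aware_distance(a, b):
    """Distance on a 0-1 attribute where None means the value is missing."""
    if a is None and b is None:
        return 1.0                       # both unknown: assume maximal distance
    if a is None or b is None:
        known = b if a is None else a
        return max(known, 1.0 - known)   # farthest possible value from the known one
    return abs(a - b)

# Matches the slide's examples: vs 0.9 -> 0.9, vs 0.3 -> 0.7, vs 0.5 -> 0.5
print(missing_aware_distance(None, 0.9),
      missing_aware_distance(None, 0.3),
      missing_aware_distance(None, 0.5))
```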
Dealing with Noise
Noise is anything that makes a task harder
– real (acoustic) noise makes listening/hearing harder
– noise on a data transmission line makes communication more difficult
– noise in learning means incorrect values for attributes (including the class), or an unrepresentative instance
In instance-based learning, one approach to dealing with noise is to use a greater number of neighbors, so the prediction is not led astray by a single incorrect or unusual example
16
K-nearest neighbor
Can combine “opinions” by having the K nearest
neighbors vote for the prediction to make
Or, a more sophisticated weighted k-vote
– An instance’s vote is weighted by how close it is to the test instance – a closer neighbor is weighted more than a farther one
17
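A sketch of the plain and weighted k-vote; neighbors is assumed to be the list of (distance, class label) pairs for the k closest stored instances, and weight is one of the schemes compared on the next slide.

```python
from collections import defaultdict

def knn_vote(neighbors, weight=lambda dist: 1.0):
    """neighbors: [(distance, class_label), ...] for the k nearest instances.
    The default weight gives an unweighted vote; use lambda d: 1 - d or lambda d: 1 / d
    to let closer neighbors count for more."""
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += weight(dist)
    return max(votes, key=votes.get)

neighbors = [(0.1, "y"), (0.4, "z"), (0.5, "z")]
print(knn_vote(neighbors))                          # unweighted: "z" wins 2 votes to 1
print(knn_vote(neighbors, weight=lambda d: 1 / d))  # 1/dist: the very close "y" outweighs both z's
```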
Effect of Distance Weighting Scheme
Dist            .1    .2    .3    .4    .5    .6    .7    .8    .9
Vote 1 – dist   .9    .8    .7    .6    .5    .4    .3    .2    .1
Vote 1 / dist   10    5     3.3   2.5   2     1.7   1.4   1.2   1.1

1 – dist is smoother
1 / dist gives a lot more credit to instances that are very close
18
Let’s try AICBR
Do Zoo – single run, and all run
Do njcrimenominal
19
K-nearest, Numeric Prediction
Average prediction of k-nearest
OR weighted average of k-nearest, based on distance
20
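The numeric version as a sketch: either the plain average of the k nearest targets, or a distance-weighted average (1/dist weights here, with a small epsilon as an assumed guard against division by zero).

```python
def knn_numeric_predict(neighbors, weighted=True):
    """neighbors: [(distance, numeric_target), ...] for the k nearest instances."""
    if not weighted:
        return sum(target for _, target in neighbors) / len(neighbors)
    weights = [1.0 / (dist + 1e-9) for dist, _ in neighbors]
    weighted_sum = sum(w * target for w, (_, target) in zip(weights, neighbors))
    return weighted_sum / sum(weights)

neighbors = [(0.1, 10.0), (0.3, 20.0), (0.6, 30.0)]
print(knn_numeric_predict(neighbors, weighted=False))  # plain average: 20.0
print(knn_numeric_predict(neighbors))                  # weighted average, pulled toward 10.0
```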
Weighted Similarity / Distance
The distance/similarity function should weight different attributes differently – a key task is determining those weights
Weight learning could make up for a lack of normalization, but that pushes the weight-learning algorithm unnecessarily
– Plus, it obscures the meaning of the weights if you want to inspect them
The next slide sketches a general wrapper approach
Other approaches focus on “feature selection” – attributes are selected to be in or out
21
Learning weights
Divide the training data into training and validation (a sort of pre-test) data
Until it is time to stop:
– Loop through the validation data
» Predict, and note success or failure
» Compare the validation instance to the training instance(s) used to predict
» Attributes that led to a correct prediction have their weights increased
» Attributes that led to an incorrect prediction have their weights decreased
» Re-normalize weights to avoid any chance of overflow
(a rough code sketch follows below)
22
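A rough sketch of the wrapper loop above, under several assumptions the slide leaves open: a fixed step size, a fixed number of passes as the stopping rule, sum-to-one renormalization, credit/blame assigned to the attributes that matched the retrieved neighbor, and a hypothetical predict(train, features, weights) helper that returns the retrieved neighbor’s features together with its predicted label.

```python
def learn_weights(train, validation, predict, passes=10, step=0.1):
    """train / validation: lists of (features_dict, label) pairs."""
    attrs = list(train[0][0].keys())
    weights = {a: 1.0 / len(attrs) for a in attrs}          # start with equal weights
    for _ in range(passes):                                  # "until time to stop"
        for features, label in validation:
            neighbor_features, prediction = predict(train, features, weights)
            correct = prediction == label
            for a in attrs:
                if features[a] == neighbor_features[a]:      # attribute supported the match
                    weights[a] = max(0.0, weights[a] + (step if correct else -step))
            total = sum(weights.values()) or 1.0             # re-normalize to avoid overflow
            weights = {a: w / total for a, w in weights.items()}
    return weights
```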
Learning re: Instances

May not need to save all instances
– Very normal instances may not all need to be saved
– One strategy – classify during training, and only keep
instances that are misclassified
» Problem – will accumulate noisy or idiosyncratic examples
– More sophisticated – keep records for how often examples lead
to correct and incorrect predictions and discard those that have
poor performance
– An in-between strategy – weight instances based on their previous success or failure (an approach I am experimenting with)
– Some approaches actually do some generalization
23
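A sketch of the “only keep misclassified instances” strategy from the list above; classify(case_base, features) is an assumed stand-in for the nearest-neighbor prediction of earlier slides.

```python
def build_edited_case_base(training_stream, classify):
    """Store an instance only if the current case-base would get it wrong."""
    case_base = []
    for features, label in training_stream:
        if not case_base or classify(case_base, features) != label:
            case_base.append((features, label))   # the case-base needed this example
        # "very normal" instances that are already classified correctly are discarded
    return case_base
```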
Nearest Neighbor Plusses & Minuses
+ Can be used for both Classification and Continuous prediction
+ Input Variables can be independent or highly correlated – no assumptions made
+ Cases can sometimes be drawn from existing DBs
+ Use of Stored Examples for Prediction is not that inefficient (and is easily parallelizable)
+ Performance tends to be competitive
+/- Explanatory Understandability
- Danger of “Overfitting”
24
Possible Improvements over Basic Instance-Based Reasoning
Better Matching (e.g. using background knowledge or generalization)
Adaptation (e.g. using numerical or background knowledge, or even previous cases)
Learning to Improve Matching (e.g. advanced weight learning (weights based on categories), weights on cases, knowledge-based indexing, or failure-avoidance (censors))
Memory organization for prediction efficiency
25
Sources re: Full CBR
http://www.iiia.csic.es/People/enric/AICom.html
http://www.cs.indiana.edu/~leake/papers/p-9601_dir.html/paper.html
http://www.ai-cbr.org/classroom/cbr-review.html
26
CBR with AIAICBR
Experiment with threshold on Basketball (discretized answer)
Japanbank
27
Perhaps we’ll do CBR with Weka Too
28
End CBR
29