Case-Based Reasoning
(Not covered in book)

• Introduction, or what is a case?
• CBR and Learning
• CBR Cycle
• Instance-Based Common Simplification
• Measuring Distance
• Weighted Features
• Case Forgetting?
• Plusses / Minuses
• Software – aiaiCBR / WEKA
• Summary

Introduction, or what is a case?

• Many researchers have found that representing expert knowledge as “cases” instead of “rules” is more natural and robust
• What a “case” is depends a fair amount on the task / domain
  – Should include relevant “features” to enable determining applicability of the case to the task
  – Includes a solution to the problem
  – Most commonly stored in a record/structure format, with features and solutions being attributes, each case a single record
• When solving a problem, find the most similar previous case(s), and use them to suggest a solution to the new problem – Case-Based Reasoning (CBR)

CBR & Learning

• Possibly the simplest form of machine learning
  – Training cases are merely stored (kind of like “rote learning”)
  – Has been called “lazy learning” – no work is done until an answer is needed
• May include storing newly solved problems – adding to the knowledge-base (case-base)

Case-Based Reasoning Cycle

At the highest level of generality, a CBR cycle may be described by the following four processes:
1. RETRIEVE the most similar case or cases
2. REUSE the information and knowledge in that case to solve the problem
3. REVISE the proposed solution
4. RETAIN the parts of this experience likely to be useful for future problem solving

A new problem is solved by retrieving one or more previously experienced cases, reusing the case in one way or another, revising the solution based on reusing a previous case, and retaining the new experience by incorporating it into the existing knowledge-base (case-base).
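The four-step cycle above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular CBR tool works: the `Case` and `CaseBase` names, the attribute names, and the stored solutions are all hypothetical, retrieval uses a plain sum of absolute differences, and the REVISE step is left to a caller-supplied function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Case:
    """One case as a single record: feature attributes plus a stored solution."""
    features: Dict[str, float]
    solution: str


@dataclass
class CaseBase:
    cases: List[Case] = field(default_factory=list)

    def retrieve(self, problem: Dict[str, float]) -> Case:
        """RETRIEVE: the stored case with the smallest summed absolute difference."""
        def distance(case: Case) -> float:
            return sum(abs(case.features[name] - value)
                       for name, value in problem.items())
        return min(self.cases, key=distance)

    def solve(self, problem: Dict[str, float],
              revise: Callable[[Dict[str, float], str], str]) -> str:
        retrieved = self.retrieve(problem)              # 1. RETRIEVE
        proposed = retrieved.solution                   # 2. REUSE (copied, no adaptation)
        final = revise(problem, proposed)               # 3. REVISE (caller decides how)
        self.cases.append(Case(dict(problem), final))   # 4. RETAIN
        return final


# Hypothetical help-desk cases; names and values are made up for illustration.
case_base = CaseBase([
    Case({"cpu_load": 0.9, "free_disk_gb": 40.0}, "restart service"),
    Case({"cpu_load": 0.2, "free_disk_gb": 1.0}, "free disk space"),
])

answer = case_base.solve(
    {"cpu_load": 0.3, "free_disk_gb": 2.0},
    revise=lambda problem, proposed: proposed,  # accept the proposal unchanged
)
print(answer)                 # free disk space
print(len(case_base.cases))   # 3 (the newly solved problem was retained)
```

Because the solved problem is appended to the case-base, later queries can retrieve it, which is the sense in which storing cases counts as learning.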
Simplifications are Common

• Instance-Based
  – Case = instance – generally matches, but does not adapt
  – Match usually done using “nearest neighbor”
    » Each new instance to be solved is compared to all training instances, with “distance” or “similarity” calculated for each attribute for each instance
  – CBR tools frequently just do this

Common Applications

• Helpdesk
• Diagnosis

Real World

Some examples of CBR at work from the Sixteenth Innovative Applications of Artificial Intelligence Conference (IAAI-04):

Deployed Application Papers:
  – Tenth Anniversary of the Plastics Color Formulation Tool. By William Cheetham. "Since 1994 GE Plastics has employed a case-based reasoning tool that determines color formulas which match requested colors. This tool, called FormTool, has saved GE millions of dollars in productivity and material (i.e. colorant) costs. The technology developed in FormTool has been used to create an online color selection tool for our customers called ColorXpress Select. A customer innovation center has been developed around the FormTool software."
  – The General Motors Variation-Reduction Adviser: Deployment Issues for an AI Application. By Alexander P. Morgan, John A. Cafeo, Kurt Godden, Ronald M. Lesperance, Andrea M. Simon, Deborah L. McGuinness, and James L. Benedict. "The General Motors Variation-Reduction Adviser is a knowledge system built on case-based reasoning principles that is currently in use in a dozen General Motors Assembly Centers. This paper reviews the overall characteristics of the system and then focuses on various AI elements critical to support its deployment to a production system. A key AI enabler is ontology-guided search using domain-specific ontologies."

Emerging Application Papers:
  – CaBMA: Case-Based Project Management Assistant. By Ke Xu and Hector Muñoz Avila. "We are going to present an implementation of an AI system, CaBMA, built on top of a commercial project management tool, MS Project.
Project management is a business process for successfully delivering one-of-a kind products and services under real-world time and resource constraints. CaBMA (for: Case-Based Project Management Assistant) provides the following functionalities: (1) It captures cases from project plans. (2) It reuses captured cases to refine project plans and generate project plans from the scratch. (3) It maintains consistency of pieces of a project plan obtained by case reuse. (4) It refines the case base to cope with inconsistencies resulting from capturing cases over a period of time. CaBMA adds a knowledge layer on top of MS Project to assist the user with his project management tasks."

Real World

• Applying Case-Based Reasoning to Manufacturing. By David Hinkle and Christopher Toomey. AI Magazine 16(1): 65-73 (Spring 1995). "CLAVIER is a case-based reasoning (CBR) system that assists in determining efficient loads of composite material parts to be cured in an autoclave. CLAVIER's central purpose is to find the most appropriate groupings and configurations of parts (or loads) to maximize autoclave throughput yet ensure that parts are properly cured. CLAVIER uses CBR to match a list of parts that need to be cured against a library of previously successful loads and suggest the most appropriate next load. CLAVIER also uses a heuristic scheduler to generate a sequence of loads that best meets production goals and satisfies operational constraints. The system is being used daily on the shop floor and has virtually eliminated the production of low-quality parts that must be scrapped, saving thousands of dollars each month. As one of the first fielded CBR systems, CLAVIER demonstrates that CBR is a practical technology that can be used successfully in domains where more traditional approaches are difficult to apply."
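The instance-based simplification described earlier (compare each new instance to all training instances, computing a distance per attribute and combining them) can be sketched as below. This is a minimal sketch under stated assumptions: the function names and the attribute values are illustrative, all attributes are numeric, and the two ways of combining per-attribute differences shown are the common “city block” and “euclidean” choices.

```python
import math
from typing import Callable, Dict, List, Tuple

Instance = List[float]


def city_block(a: Instance, b: Instance) -> float:
    """Sum of absolute per-attribute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Instance, b: Instance) -> float:
    """Square root of the sum of squared per-attribute differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(test: Instance, training: Dict[str, Instance],
            metric: Callable[[Instance, Instance], float]) -> Tuple[str, float]:
    """Compare the test instance to every training instance; return the closest."""
    name = min(training, key=lambda n: metric(test, training[n]))
    return name, metric(test, training[name])


# Illustrative values for three attributes A, B, C.
training = {
    "train1": [6.0, 4.0, 9.0],
    "train2": [7.0, 3.0, 7.0],
    "train3": [5.0, 5.0, 10.0],
}
test = [5.0, 5.0, 5.0]

cb_name, cb_dist = nearest(test, training, city_block)
eu_name, eu_dist = nearest(test, training, euclidean)
print(cb_name)  # train3
print(eu_name)  # train2
```

The two metrics disagree here: squaring penalizes the single large difference in train3 more heavily, so the euclidean metric prefers train2, whose differences are spread evenly across attributes.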
Nearest Neighbor

[Figure: scatter plot of instances labeled x, y, and z, with a point labeled T among them]

Measuring Distance / Similarity

• Distance and similarity are opposites – it doesn’t matter which you measure
• Distances for each attribute are calculated, then must be combined
• Combination of distances – commonly via “city block” or “euclidean” (“crow flies”)
  – <go back one slide to illustrate>
• Higher power(s) increase the influence of large differences

Example Distance Metrics

Attributes      A    B    C    Sum
Test            5    5    5
Train 1         6    4    9
Train 2         7    3    7
Train 3         5    5   10
City Block 1    1    1    4     6
City Block 2    2    2    2     6
City Block 3    0    0    5     5
Euclidean 1     1    1   16    18
Euclidean 2     4    4    4    12
Euclidean 3     0    0   25    25

(The Euclidean rows list squared differences; the distance itself is the square root of the Sum.)

Kinds of attributes

• Binary/boolean – two valued; e.g. Resident Student?
• Nominal/categorical/enumerated/discrete – multiple valued, unordered; e.g. Major
• Ordinal – ordered, but no sense of distance between values
  – e.g. Fr, So, Jr, Sr; Grad
  – e.g. Household Income: 1 – < 15K, 2 – 15-20K, 3 – 20-25K, 4 – 25-30K, 5 – 30-40K, 6 – 40-50K, 7 – > 50K
• Interval – ordered, distance is measurable; e.g. birth year
• Ratio – an actual measurement with a defined zero point, such that we could say that one value is double another, or triple, or ½; e.g. GPA
• CBR can work with all kinds of attributes (unlike some other learning methods)

More Similarity/Distance

• Nominal attributes are frequently considered all or nothing – a complete match or no match at all
  – Match: similarity = highest possible value, or distance = 0
  – Not match: similarity = 0, or distance = highest possible value
• Nominals that are actually ordered (ordinals) ought to be treated differently (e.g.
partial matches)
• Normalization is necessary for numeric attributes (interval, ratio) – as discussed on next slide

Normalization

• CBR (as with some other schemes, such as neural networks) requires all numeric attributes to be on a similar scale – thus normalize or standardize (a different sense of the term than DB normalization)
• One normalization approach:
  Norm val = (val – minimum value for attribute) / (max value for attribute – min val)
• One standardization approach:
  Stand val = (val – mean) / SD

Missing Values

• Frequently treated as maximum distance to ANY other value
• For numerics, the maximum distance depends on what value it is being compared to
  – E.g. if values range from 0-1 and comparing a missing value to .9, maximal possible distance is .9
  – If comparing a missing value to .3, maximal possible distance is .7
  – If comparing a missing value to .5, maximal possible distance is .5

Dealing with Noise

• Noise is something that makes a task harder
  – e.g. real noise makes listening/hearing harder
  – noise on data transmission makes communication more difficult
  – noise in learning is incorrect values for attributes, including class, or could be an un-representative instance
• In instance-based learning, an approach to dealing with noise is to use a greater number of neighbors, so we are not led astray by an incorrect or weird example

K-nearest neighbor

• Can combine “opinions” by having the K nearest neighbors vote for the prediction to make
• Or, more sophisticated: weighted k-vote
  – An instance’s vote is weighted by how close it is to the test instance – the closest neighbor is weighted more than a further neighbor

Effect of Distance Weighting Scheme

Dist             .1    .2    .3    .4    .5    .6    .7    .8    .9
Vote 1 – dist    .9    .8    .7    .6    .5    .4    .3    .2    .1
Vote 1 / dist    10    5     3.3   2.5   2     1.7   1.4   1.2   1.1

• 1 – dist is smoother
• 1 / dist gives a lot more credit to instances that are very close

Let’s try AICBR

• Do Zoo – single run, and all run
• Do njcrimenominal

K-nearest, Numeric Prediction

• Average prediction of k-nearest
• OR weighted
average of k-nearest based on distance

Weighted Similarity / Distance

• Distance/Similarity function should weight different attributes differently – key task is determining those weights
• Weight learning could make up for lack of normalization, but that is pushing the weight learning algorithm unnecessarily
  – Plus, it obscures the meaning of the weights when you look at them
• Next slide sketches a general wrapper approach
• Other approaches focus on “Feature Selection” – attributes selected to be in or out

Learning weights

• Divide training data into training and validation (a sort of pre-test) data
• Until time to stop
  – Loop through validation data
    » Predict, and see success / or not
    » Compare validation instance to training instances used to predict
    » Attributes that lead to correct prediction have weights increased
    » Attributes that lead to incorrect prediction have weights decreased
    » Re-normalize weights to avoid chance of overflow

Learning re: Instances

• May not need to save all instances
  – Very normal instances may not all need to be saved
  – One strategy – classify during training, and only keep instances that are misclassified
    » Problem – will accumulate noisy or idiosyncratic examples
  – More sophisticated – keep records of how often examples lead to correct and incorrect predictions, and discard those that have poor performance
  – An in-between strategy – weight instances based on their previous success or failure (I’m experimenting with this)
  – Some approaches actually do some generalization

Nearest Neighbor Plusses & Minuses

+ Can be used for both Classification and Continuous prediction
+ Input Variables can be independent or highly correlated – no assumptions made
+ Cases can sometimes be drawn from existing DBs
+ Use of Stored Examples for Prediction is not that inefficient (and easily parallelizable)
+ Performance tends to be competitive
+/- Explanatory Understandability
- Danger of “Overfitting”

Possible Improvements over Basic Instance-Based Reasoning

• Better
Matching (e.g. using background knowledge or generalization)
• Adaptation (e.g. using numerical or background knowledge, or even previous cases)
• Learning to Improve Matching (e.g. advanced weight learning (weights based on categories), weights on cases, knowledge-based indexing, or failure-avoidance (censors))
• Memory organization for prediction efficiency

Sources re: Full CBR

• http://www.iiia.csic.es/People/enric/AICom.html
• http://www.cs.indiana.edu/~leake/papers/p-9601_dir.html/paper.html
• http://www.ai-cbr.org/classroom/cbr-review.html

CBR with AIAICBR

• Experiment with threshold on Basketball (discretized answer), Japanbank

Perhaps we’ll do CBR with Weka Too

End CBR
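A supplementary sketch for the Normalization and K-nearest neighbor slides: min-max normalization of numeric attributes, followed by a distance-weighted vote using the two schemes compared in the slides (1 – dist and 1 / dist). The function names, the `scheme` strings, and the neighbor distances are hypothetical, and the sketch assumes distances have already been scaled into the 0-1 range.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Rescale each attribute column to 0-1 via (val - min) / (max - min)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column has max == min; map it to 0.0 to avoid dividing by zero.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def weighted_vote(neighbors: List[Tuple[float, str]], k: int, scheme: str) -> str:
    """Let the k closest (distance, label) pairs vote, weighted by closeness."""
    votes: Dict[str, float] = defaultdict(float)
    for dist, label in sorted(neighbors)[:k]:
        if scheme == "one_minus":
            weight = 1.0 - dist               # smoother weighting
        else:
            weight = 1.0 / max(dist, 1e-9)    # guard against a zero distance
        votes[label] += weight
    return max(votes, key=lambda label: votes[label])


normalized = min_max_normalize([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
print(normalized)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]

# Made-up distances: one very close "y" and two moderately close "z" neighbors.
neighbors = [(0.1, "y"), (0.4, "z"), (0.5, "z")]
print(weighted_vote(neighbors, 3, "one_minus"))  # z
print(weighted_vote(neighbors, 3, "inverse"))    # y
```

With 1 – dist the two moderately close z neighbors outvote the single very close y neighbor; with 1 / dist the very close neighbor dominates, which matches the observation that 1 / dist gives much more credit to instances that are very close.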