Chapter 13: Data Mining
(c) 2000 Dr. Ralph Bergmann and Prof. Dr. Michael M. Richter, Universität Kaiserslautern

Recommended References

• This lecture assumes some knowledge of learning systems. We recommend:
  – P. Langley: Elements of Machine Learning. Morgan Kaufmann, 1996.
  – T. M. Mitchell: Machine Learning. McGraw-Hill, 1997.
  – R. Bergmann: Slides on "Lernende Systeme" (Learning Systems), wwwagr.informatik.uni-kl.de/~bergmann; also: M. M. Richter: Lernende Systeme, lecture notes, Kaiserslautern.
  – R. Bergmann & S. Stahl (1998): Similarity Measures for Object-Oriented Case Representations. Proceedings of the European Workshop on Case-Based Reasoning, EWCBR'98.
• Data mining references:
  – P. Adriaans, D. Zantinge: Data Mining. Addison-Wesley, 1996.
  – Th. Reinartz: Focusing Solutions for Data Mining. Springer Lecture Notes in AI 1623, 1998.
  – S. M. Weiss, N. Indurkhya: Predictive Data Mining. Morgan Kaufmann, 1997.

Data Mining, Learning and Performance (1)

• The ultimate goal is the optimal performance of some process P.
• What "optimal" means is determined by the user's utility.
• Achieving optimal performance requires certain knowledge. This knowledge may be implicit in the available data and has to be made usable, i.e. it has to be learned.
• For learning one needs to know:
  – What precisely are the goals?
  – How can the achievement of the goals be measured?
  – How should one react if the goals are not achieved?

Data Mining, Learning and Performance (2)

• The performance of the process P is tested in experiments which generate certain data D. These data are the input to some evaluation function F.
• Two views have to be brought together (diagram on the slide):
  – the user's view on the performance of P
  – the formal evaluation function F for P.
• Coincidence of the user's view on the performance and the result of the evaluation is wanted.
• Often the coincidence can only be approximated.

Data Mining, Learning and Performance (3)

The slide shows the improvement loop:
• Process P and knowledge K → experiment → generated data D → evaluation result.
• Data mining: analyze the data and the evaluation result → analysis result.
• Learning: use the analysis result for an update → improved process P' and knowledge K'.

KDD: Knowledge Discovery in Databases

• Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad).
• Data mining is often used as a synonym for KDD, but it is sometimes restricted to one crucial step in KDD:
  – the step of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data.

KDD Phases

Business Understanding → Data Understanding → Data Preparation → Data Exploration → Data Mining → Evaluation → Deployment

Requirement Analysis for KDD Processes

The requirement analysis (diagram on the slide) takes into account:
• data characteristics: volume, quality, representation
• application properties: domain characteristics, system context characteristics, application requirements, application goals.
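
Sketch: The Improvement Loop in Code

To make the loop from "Data Mining, Learning and Performance (3)" concrete, here is a minimal, self-contained Python sketch. The toy process, the evaluation function, and the update rule are all invented stand-ins, not part of the lecture's material.

    # Toy "process P": quality degrades with distance from an unknown optimum.
    def run_experiment(params):
        return [(p, -(p - 3.0) ** 2) for p in params]       # generated data D

    def evaluate(data):                                     # evaluation function F
        return max(score for _, score in data)

    def mine(data):                                         # data mining step
        return max(data, key=lambda d: d[1])[0]             # best parameter found

    params = [0.0, 1.0, 2.0]                                # initial knowledge K
    for _ in range(5):
        data = run_experiment(params)
        if evaluate(data) > -0.01:                          # utility goal reached
            break
        best = mine(data)
        params = [best - 0.5, best, best + 0.5]             # update: P' and K'
    print(params)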
Data Mining and the Pre-Sales Process

• The purpose of data mining for the pre-sales process is to gain knowledge which allows the supplier to attract more customers from the intended target groups.
• The knowledge obtained can concern:
  – the market in general
  – the market with respect to certain products
  – the behavior of certain customer classes:
    • Marketing campaign management: How do customers react to marketing actions?
    • Basket analysis: What do customers typically buy?
  – individual customers and their behavior.
• The general data mining strategy of a company is captured in its strategic model, which in turn is influenced by feedback from the results obtained.

Data Mining and the Sales Process

• The purpose of data mining for the sales process is to gain knowledge which allows the supplier to improve the quality of his processes in such a way that customers who have contacted the supplier
  – are guided efficiently through the sales process
  – make a positive decision for the sale.
• This includes:
  – offering the products appropriately
  – offering adequate alternatives
  – guiding efficiently through the dialogue.
• This influences the diagnostic and the action model.

Data Mining in the After-Sales Process

• The purpose of data mining for the after-sales process is to gain knowledge which allows dealing with customer questions and complaints more efficiently.
• The goals are to:
  – improve the recognition of the reasons for calls
  – avoid repeated calls
  – arrive at solutions efficiently.
• Useful knowledge is mainly contained in experiences; therefore the collection of experiences is central.
• Experiences are best stored as cases in a CBR system.

The Starting Point: Data (1)

• Data have a certain quality:
  – the correctness and completeness problem.
• It is essential to address the problem of data quality: if you feed garbage into the system, you will get garbage out!
  – Wrong data lead to insights with incorrect consequences.
  – Incomplete data lead to insights that are too general to be useful.

The Starting Point: Data (2)

• Data may be noisy.
• Incorrect data:
  – wrong values for the attributes
  – incorrect classification
  – duplicate data.
• Incomplete data:
  – missing values for some attributes
  – missing attributes
  – missing objects.
• Unusable data:
  – free text is difficult to cope with
  – terminology is not understood
  – data are not suitable for the intended goals.

The Starting Point: Data (3)

• This is a knowledge management task: quality management!
• Data sampling:
  – Define the goals.
  – Quality is more important than quantity.
  – Make use of existing information sources to ensure the completeness of the base.
  – Create your own sources.
  – Data have to arrive in time: data which are too old are not useful (the updating problem).
• See Chapter 15. (A sketch of typical quality checks follows below.)
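
Sketch: Typical Data-Quality Checks

A minimal pandas sketch of the quality problems just listed: duplicates, missing values, and wrong values. The table, the attribute names, and the domain constraint are hypothetical.

    import pandas as pd

    orders = pd.DataFrame({
        "customer": ["C1", "C1", "C2", None],        # None: missing value
        "product":  ["book", "book", "pc", "pc"],
        "price":    [19.9, 19.9, -1.0, 799.0],       # -1.0: wrong value
    })

    print(orders.duplicated().sum())                 # duplicate data
    print(orders.isna().sum())                       # missing values per attribute
    print((orders["price"] <= 0).sum())              # violated domain constraint

    # Garbage in, garbage out: remove what cannot be repaired.
    clean = orders.drop_duplicates()
    clean = clean[clean["price"] > 0].dropna()
    print(clean)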
Data for What Knowledge?

• The way data are obtained depends on the type of knowledge one is interested in.
• We distinguish three main types:
  – Knowledge about some market. This will influence the strategic, the diagnostic, and the action model of the supplier.
  – Knowledge about individual customers. It is used to treat the customer individually, e.g. by making special offers.
  – Knowledge about technical objects: their quality, how to explain how to operate them, etc.
• Connected with the type of knowledge are different
  – goals of the supplier
  – data sources.

Data Warehouse

• Idea: store knowledge like physical objects.
• This allows access, delivery, and manipulation as for physical objects.
• A data warehouse provides:
  – access to knowledge for immediate use
  – knowledge for improving the quality.
• The data warehouse is managed by the knowledge manager.

From Data to Knowledge (1)

The slide shows the classical pyramid from data to wisdom, with understanding and connectivity increasing towards the top:
• Data: facts. ("What?")
• Information: description, definition, perspective; understanding relations. ("What, when, where, who?")
• Knowledge: strategy, practice, method; understanding models, rules, patterns. ("How, why?")
• Wisdom: insight, morals, implications; understanding principles.

From Data to Knowledge (2)

• Data are raw products.
• Pieces of information are semi-finished products.
• Knowledge and wisdom are high-quality products (increasing abstraction).
• But: when using knowledge, access to the actual data and information is necessary. How can this be done?

From Data to Knowledge (3)

It is a knowledge management task to provide, for each application of knowledge, the needed actual data:

task to perform → knowledge applied → data needed

From Data to Knowledge (4)

• Only explicit knowledge can be used directly.
• Explicit knowledge is directly formulated:
  – prescriptions, rules, norms
  – suggestions, ways to behave
  – general laws, exceptions
  – hierarchical relations
  – properties, constraints
  – ...

From Data to Knowledge (5)

• Implicit knowledge cannot be used directly.
• Implicit knowledge is:
  – contained in data and information
  – often hidden and difficult to discover
  – not directly applicable
  – tacit ("silent") knowledge.

From Data to Knowledge (6)

Examples of implicit knowledge:
• Sales statistics contain implicit knowledge about customer preferences.
• Databases about accidents contain implicit knowledge about dangerous situations.
• Test data contain implicit knowledge about quality.

From Data to Knowledge (7)

• Data and pieces of information have to be correct (or exact tolerances have to be given).
• Knowledge does not have to be totally correct in order to be useful:
  – probabilities, evidences
  – heuristics
  – rules of thumb
  – vague statements ("this is not reliable", "the weather there is not nice in November")
  – fuzzy statements.
• A correct statement in a complex situation may even be useless because it is too complicated.
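
Sketch: Making Imperfect Knowledge Machine-Usable

One way to act on the previous slide is to attach an explicit confidence to each rule of thumb. This is a hypothetical illustration, not a construct from the lecture.

    from dataclasses import dataclass

    @dataclass
    class HeuristicRule:
        condition: str      # e.g. "month == 11"
        statement: str      # a possibly vague statement
        confidence: float   # probability / evidence in [0, 1]

    rules = [
        HeuristicRule("month == 11", "the weather there is not nice", 0.8),
        HeuristicRule("buys_books_monthly", "a book offer will succeed", 0.6),
    ]

    # Use only the sufficiently reliable rules of thumb.
    usable = [r for r in rules if r.confidence >= 0.7]
    print([r.statement for r in usable])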
Wisdom

• Wisdom usually refers to a very advanced type of knowledge.
• It refers to the understanding of basic background principles.
• Only in the exact sciences can it be expressed in precise terms.
• Wisdom is of relevance for the strategic model (which is mainly informal).

Make Knowledge Explicit (1)

• General properties of products need to be represented differently in different situations:
• Vacations in Tirol are
  – nice and warm (for persons from Alaska)
  – nice and cool (for persons from Brazil).
• A car
  – is good and speedy on small and hilly roads (Germany)
  – is comfortable (USA).

Make Knowledge Explicit (2)

• Use the properties of a product in order to
  – guarantee the satisfaction of different safety regulations
  – satisfy different types of demands
  – respect different types of sensitivities.
• Describe these properties in different ways.
• For such purposes one has to extract the specific views from the overall knowledge.

Reliability of Knowledge (1)

The slide shows a schema in which darkness indicates reliability and the horizontal axis the extension of knowledge; roughly, reliability decreases as the extension of knowledge grows:
• obtained by direct retrieval
• obtained by logical deduction
• obtained by approximative reasoning
• obtained by CBR
• obtained by learning and data mining.
This assumes that the underlying data and information bases are reliable.

Reliability of Knowledge (2)

• This schema is only a rough and general indication.
• The success in applications depends heavily on, e.g.:
  – the correctness, amount, and typicality of the data
  – the adequate choice of the specific method and the precision with which it is applied
  – the number of experiments carried out
  – the testing of the results.
• Therefore the success depends on the invested effort.
• There is again the utility question: the costs of obtaining knowledge versus the gain from applying it.

Sources of Data

• General analysis, public domain:
  – accessible to everyone, but often widely distributed and hard to collect.
• General analysis, performed by the company itself or some paid institution:
  – expensive, but can be tailored to the needs of the company.
• History of customers:
  – requires customers who buy regularly
  – has to be updated regularly.
• Internal analysis of customer behavior:
  – reactions to changes of prices, dialogue strategies, etc.
• Cases:
  – collected experiences, failure statistics, etc.

History of Customers

• Knowledge about the behavior of individual customers should in general not be obtained by asking personal questions but rather automatically.
• One possibility is to do this at the cashier when the customer pays by customer card or credit card. For e-commerce, orders placed directly over the net can be recorded.
• There may be certain restrictions by law.
• The history can contain, among other things:
  – the main products ordered and their quantities
  – the times or events when they were ordered (weekend, holidays, time of the year, ...).
• The history should contain (if possible) information about the customer (for the description of customer classes):
  – age, sex, profession, place of residence, ... (see the sketch below).
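
Sketch: A Customer History Record

A minimal sketch of such a history as a data structure; all field names and the aggregation are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class Purchase:
        product: str
        quantity: int
        event: str                 # e.g. "weekend", "holidays"

    @dataclass
    class CustomerHistory:
        age: int
        profession: str
        residence: str
        purchases: list = field(default_factory=list)

        def main_products(self, top=3):
            counts = {}
            for p in self.purchases:
                counts[p.product] = counts.get(p.product, 0) + p.quantity
            return sorted(counts, key=counts.get, reverse=True)[:top]

    h = CustomerHistory(35, "architect", "Kaiserslautern",
                        [Purchase("CAD software", 2, "weekday"),
                         Purchase("graphics card", 1, "holidays")])
    print(h.main_products())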
Cases (1)

• In the after-sales process, histories have to be recorded if they are available; they are the material for the cases.
• Often there are not enough cases available to cover all or most of the relevant problem situations.
• In this situation artificial cases can be created, which is done by varying the relevant parameters.
• Both collecting and creating cases require some a priori understanding of the tasks to be performed.
• To build a CBR system one has to define the four knowledge containers: vocabulary, case base, similarity measure, and solution transformation.

Cases (2)

• There are commercial systems like CBR-Works which support the collection and representation of cases (see also Chapters 3 and 12).
• A general methodology for developing CBR systems for applications in the help desk area is described in:
  – R. Bergmann, S. Breen, M. Göker, M. Manago, S. Wess: Developing Industrial Case-Based Reasoning Applications: The INRECA Methodology. Springer Lecture Notes in AI 1612, 1999.

From Data to Information Using Knowledge

The slide shows an example of how raw data become valuable information by using knowledge:
• Raw data:
  – Customer: Company X, architects. Question: will the PC component Matrox G100 be suitable?
  – Company X: 1x PC Dual-Pentium XL437, sold 4/97; 2x ML 649 (P233/124/9.6), sold 5/97; software: high-end CAD & 3D visualization, TCP/IP networking, ...
  – G100: entry-level graphics card, AGP slot necessary, very good price/performance ratio, limited 3D power, ...
• Valuable information obtained by using knowledge:
  – "The G100 is of little use for Company X because the architects use high-end 3D graphics software. The G100 is an entry-level graphics card, and additionally it needs an AGP slot, which is not built into the current hardware configuration of the PCs."

Three Main Phases

• Measurement: collects numerical data about the intended utility.
• Evaluation: extracts statements about the utility from the data (excellent, good, sufficient, improved, insufficient, ...).
• Sensitivity analysis: extracts the influence factors responsible for the result of the evaluation.
• The learning and data mining tools can
  – use the results of all three phases
  – improve these phases.

Measurement

• The utility is often only informal and implicit in the head of the user.
• The measurement problem is
  – to map it onto quantitative magnitudes
  – to define procedures which measure these quantities.
• The measurement procedures are often difficult to define and expensive.
• The parameters of the procedures have to be named precisely so that a procedure can be applied repeatedly (as, e.g., in the exact sciences); the sketch below illustrates this.
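
Sketch: A Repeatable Measurement Procedure

A minimal sketch of a measurement procedure with precisely named parameters; the parameter names and the toy process are assumptions for illustration.

    import random
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class MeasurementProcedure:
        sample_size: int    # every parameter is named precisely, ...
        repetitions: int    # ... so a run can be repeated exactly
        random_seed: int

        def run(self, process):
            random.seed(self.random_seed)
            data = []
            for _ in range(self.repetitions):
                sample = [process() for _ in range(self.sample_size)]
                data.append(sum(sample) / self.sample_size)   # measured quantity
            return data

    proc = MeasurementProcedure(sample_size=100, repetitions=5, random_seed=42)
    print(proc.run(lambda: random.gauss(0.7, 0.1)))   # toy process P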
Evaluation

• The evaluation of the measured data has to close the gap between the data and the utility of the user:
  – The evaluation predicate should (at least ideally) coincide with the predicate which the user assigns to the performance (see also the relation between similarity and utility in Chapter 6).
• The evaluation should contain a statement about its reliability, e.g.:
  – tolerances for errors
  – error probabilities
  – confidence intervals.
• The reliability depends heavily on the input data (volume, representativeness, correctness, noise, etc.).

Sensitivity Analysis

• This is the most difficult and the most important phase.
• The evaluation is given as a function Ev(d1, ..., dn), where the di are data obtained by the measurement.
• The data di are, on the other hand, an indirect consequence of parameters pj which can be directly influenced by the person who designs the process (or product, etc.) that is evaluated:
  – Ev(d1, ..., dn) = Influence(p1, ..., pm),
  – where the function Influence is in general unknown.
• We call a parameter pi an (important) influence factor if small variations of pi result in large variations of the function Influence(..., pi, ...).
• The determination of influence factors is the basis for learning improvements of the object under consideration.

QMCB: Quantitative Models of Consumer Behavior

• Goal: the calculation and prediction of meaningful market diagnostics on the basis of data.
• A possible approach: the integration of statistical methods and models as well as econometric models into a knowledge-based system.
• Tasks:
  – descriptive (a posteriori) analysis of data
  – model-based simulation of future buying behavior.
• These special types of task require special data representations for useful evaluations.

Different Types of Forecasts

• The types vary with respect to the knowledge they contain and the usefulness of the prognosis. From the QMCB one should be able to compute directly (examples):
  – the market share of a product
  – product purchase probability, expectation, and variance
  – brand purchase probability, expectation, and variance
  – heterogeneity in purchase rates.
• Indirect consequences:
  – relative product attraction
  – relative brand attraction
  – etc.

Example System: KVASS (1)

• KVASS (KaufVerhaltensAnalyse- und SimulationsSystem, i.e. a purchasing behavior analysis and simulation system) is an example of a model- and knowledge-based data analysis system.
  – Reference: R. Decker: Knowledge Based Selection and Application of Quantitative Models of Consumer Behavior. In: Information Systems and Data Analysis (eds. H. H. Bock, W. Lenski, M. M. Richter), Springer, 1994, pp. 405-414.
• Basic idea: model data with a predefined set of descriptors. These are essentially attributes with their domains, e.g.:
  – estimation method: {undefined, least squares, ..., moments}
  – type of recording: {undefined, diary, ..., interview}.

Example System: KVASS (2)

• Classes of descriptors are:
  – essential aspects for a general description (type of recording, market share, etc.)
  – temporal aspects (periods for data collection, etc.)
  – information on the models used for computation (e.g. the estimation method)
  – technical descriptors for the interpretation of the representation (e.g. ordinal, nominal, etc.).
• Combinations of descriptors allow complex situations to be represented; this can be translated into more understandable relational representations (see Chapter 4). A sketch of such descriptors follows below.
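
Sketch: Descriptors as Attributes with Domains

A minimal sketch of the descriptor idea; the first two domains follow the slide, while the validation machinery is a hypothetical illustration, not KVASS itself.

    DESCRIPTORS = {
        "estimation_method": {"undefined", "least squares", "moments"},
        "type_of_recording": {"undefined", "diary", "interview"},
        "scale": {"nominal", "ordinal"},     # a technical descriptor
    }

    def validate(description):
        for attribute, value in description.items():
            if value not in DESCRIPTORS[attribute]:
                raise ValueError(f"{value!r} not in the domain of {attribute}")
        return description

    model = validate({"estimation_method": "least squares",
                      "type_of_recording": "diary",
                      "scale": "ordinal"})
    print(model)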
Example System: KVASS (3)

• The system essentially describes a measurement procedure, i.e. the first phase.
• Its purpose is not to evaluate the success of a product or process of the company.
• The correctness condition is that the results provided by the analysis of the system coincide with reality.
• The results of the system are, on the other hand, important for the sensitivity analysis concerning the success or failure of the processes or products designed by the company.

Causal Analysis (1)

• Causal analysis is a kind of sensitivity analysis. Task: make causal relations explicit.
• Suppose the Xi are activities and the Yi are sales results. Notation:
  – Xi → Yi (+): positive influence
  – Xi → Yi (−): negative influence
  – no arrow: neutral.
• Initial situation: a suspected model of the influences.
• Then either an experiment (variation of the Xi and measurement of the Yi) or an analysis across several companies.
• Data analysis: e.g. by analyzing the covariance structure.
• Result: a revised and refined model.

Causal Analysis (2)

• Example (artificially created):
  – X1: effort in catalogues
  – X2: effort in dynamic forms
  – X3: effort in recording and applying customer histories
  – Y1: return from book sales
  – Y2: return from sales of high-tech products.
• The slide shows an initial model based on qualitative knowledge: a diagram with only positive influence arrows from the Xi to Y1 and Y2.

Causal Analysis (3)

• Revised model (diagram on the slide, matching the equations below): X1 → Y1 (+), X3 → Y1 (−), X1 → Y2 (+), X2 → Y2 (+), X3 → Y2 (+).
• A possibility for arriving at a refined quantitative model is to assume a linear model (which may be justified by some knowledge). This leads to the linear equations
  Y1 = a11·X1 − a13·X3
  Y2 = a21·X1 + a22·X2 + a23·X3
• The solutions for the coefficients aik will determine a quantitative model.

Quality Management: Internal Analysis

• As a first step, the goals of the analysis have to be defined:
  – Where are the weak points?
  – What has to be improved or optimized?
  – Where are improvements possible?
• This is part of the requirements analysis.
• Further steps include:
  – identifying groups of objects with similar quality characteristics
  – identifying the properties of these groups
  – describing these groups
  – drawing conclusions for quality improvements.

Example: Quality Analysis for Dialogues (1)

• Classification of dialogues (evaluation by the user):
  – successfully finished
  – quit because no adequate product was available
  – quit for unknown reasons: this is the failure class.
• Measurement:
  – It has to collect the data which arise during the dialogue.
  – These data may not be recorded during an ordinary dialogue, e.g.:
    • Which questions raised by the customer dealt with a certain property type of the product?
    • Which actions were performed by customers from a certain customer class?
  – The quality of the measured data has to be considered. (A sketch of such a measurement follows below.)
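
Sketch: Measuring a Dialogue

A minimal sketch of collecting candidate influence data from a dialogue log; the log format and the feature names are assumptions for illustration.

    dialogue_log = [
        {"speaker": "customer", "text": "What is the price?", "understood": True},
        {"speaker": "system",   "text": "The price is ...",   "understood": True},
        {"speaker": "customer", "text": "What is AGP?",       "understood": False},
    ]

    features = {
        "length": len(dialogue_log),
        "not_understood_terms": sum(1 for e in dialogue_log
                                    if not e["understood"]),
        "customer_questions": sum(1 for e in dialogue_log
                                  if e["speaker"] == "customer"
                                  and e["text"].endswith("?")),
    }
    print(features)   # e.g. {'length': 3, 'not_understood_terms': 1, ...}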
Quality Analysis for Dialogues (2)

• The evaluation is simple because it is the same as that of the user.
• The sensitivity analysis has two phases here:
  – (1) Describe the evaluation result in terms of measured quantities and determine the influence factors of this description.
  – (2) Describe the evaluation result in terms of the factors which define the dialogue.
• The first phase already involves a learning step:
  – The classification of the dialogue in terms of measured quantities has to be learned. This classification approximates the real classes obtained from the evaluation.

Quality Analysis for Dialogues (3)

• The analysis of the first phase is based on the dialogue situations and additionally measured data.
• Typical candidates for interesting data for classifying the types of situations are:
  – the length of the dialogue
  – terms that were not understood
  – customer questions (how often? which typical ones?)
  – etc.
• The selection of these candidates depends on a hypothesis for a preliminary dependency model. The data mining and learning methods are used to refine and correct this model.

Quality Analysis for Dialogues (4)

• The result allows a prognosis of the dialogue class from the occurrence of the dialogue situations which are important influence factors (but here in terms of measured data!), in particular a description of failure situations, i.e. situations which lead with high probability to a failure dialogue.
• The description of the failure situations is refined in order to
  – discover dependencies between influence factors
  – in particular, obtain definitions of the earliest failure situations in dialogues, i.e. the earliest situations in the dialogue which will lead to a failure.
• The earliest failure situations give rise to the second phase of the sensitivity analysis.

Quality Analysis for Dialogues (5)

• Second phase: analysis of the reasons for reaching the earliest failure situations, mainly:
  – Which elements of the strategy are responsible?
  – Are there weak points in the knowledge base (e.g. wrong prices for products)?
• These reasons can be influenced directly when the dialogue is designed.
• Consequences of the analysis (learned results):
  – an improved knowledge base
  – possible changes of the strategy
  – possible disadvantages of changes.
• Final recommendation: update.

Discussion

• The dialogue and the situations can be given in a (possibly object-oriented) attribute-value representation. Some virtual attributes (like the length of the dialogue) can be useful; they contain valuable knowledge.
• One way to proceed is to use cluster analysis techniques and machine learning algorithms (e.g. CN2, C4.5) for learning the classification; see the sketch below.
• Another way is to consider the database as a case base and to start with an initial similarity measure which is improved during the development of a CBR system for the classification and the improvement suggestions.
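
Sketch: Learning the Dialogue Classification

A minimal sketch of the classification-learning step. CN2 and C4.5 themselves are not used here; scikit-learn's CART decision tree serves as a readily available stand-in, and the feature values and labels are invented toy data.

    from sklearn.tree import DecisionTreeClassifier

    # One row per dialogue: [length, not_understood_terms, customer_questions]
    X = [[5, 0, 1], [30, 4, 6], [8, 1, 2], [25, 5, 1], [6, 0, 0], [28, 3, 5]]
    y = ["success", "failure", "success", "failure", "success", "failure"]

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(tree.predict([[27, 4, 3]]))   # prognosis of the dialogue class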
Learning Informal Concepts

• Many concepts in e-commerce, in particular in connection with CRM and customer classes, have an informal character for which no direct formal equivalent exists.
• Computer support requires a formal notion which approximates the informal concept as well as possible.
• Such formal versions have to be learned, and the learning process requires data mining activities which are again based on studies of customers and their behavior.
• It has to be taken into account that informal concepts are usually not stable over time.

The Correctness Problem

• The correctness problem for the statement that two expressions are logically equivalent reduces to a formal proof.
• But how does one "prove" that an informal and a formal concept are equivalent?
  – Formal systems do not have access to informal notions.
  – Humans usually have difficulties comparing both types of notions, because this refers to a broad scope of intended uses.
• What is required is a kind of Turing test which decides whether a human who uses the informal version and a machine which uses the formal version refer to the same concept.
• The ordering principle: the test does not deal with the concept itself but with partial orderings related to the concept.

The Ordering Principle and a Turing Test (1)

• Suppose there is a partial ordering "<" associated with the concept C. The partial ordering then again has two versions: a formal one and an informal one.
• The Turing test refers to these two versions of "<": the formal version of C versus the informal human version of C (diagram on the slide).
• The goal: when variations of the arguments of "<" are presented, the human says "up" if and only if the formal system says "up".

The Ordering Principle and a Turing Test (2)

• Example (diagram on the slide): the concept to grasp is "typical lion". The human orders drawings as "better" or "worse" by an aesthetic property; the formal version uses an ordering based on the quotient length/height.
• The partial ordering approximates the concept C in the sense that the semantics of y < z is: z is more typical for C than y is.

The Ordering Principle and a Turing Test (3)

• Advantages of the ordering principle:
  – The equivalence of formal and informal concepts can be effectively validated by Turing tests, i.e. by experiments.
  – If several orderings are involved, this can be done for all of them.
  – The search for a formal counterpart of an informal concept can be performed in an approximative way, and partial validation is possible.
• The formal partial ordering is what has to be learned.
• The learning process is an approximation process whose aim is to pass the Turing test sufficiently well.

The Learning Scenario

• (1) The informal concept C on a set U is regarded as a fuzzy set for which a set of prototypes P ⊆ U is known.
• (2) There is an informal relation rx(y, z) stating "y is more similar to x than z is".
• The object to be learned is a similarity measure sim: U × P → [0, 1].
• Turing test: the relation induced by the formal similarity measure (y is more similar to x than z iff sim(y, x) > sim(z, x)) and rx agree.
• We decompose the approach into two basic steps:
  – a first step to obtain a suitable representation language: concept learning
  – a second step for learning the similarity measure: subsymbolic learning.
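
Sketch: A Similarity Measure and Its Turing Test

A minimal sketch of the scenario just described, with a toy one-dimensional universe; the measure, the simulated human judgement, and all data are assumptions for illustration.

    def sim(y, x):
        return 1.0 / (1.0 + abs(y - x))         # toy measure into [0, 1]

    def r_x(x, y, z):
        # Stand-in for the human's informal judgement "y is more
        # similar to x than z"; simulated here by a hidden intuition.
        return abs(y - x) < abs(z - x)

    prototypes = [3.0]                          # P, the known prototypes of C
    universe = [0.0, 1.0, 2.0, 4.0, 5.0]        # U

    agreements, trials = 0, 0
    for x in prototypes:
        for y in universe:
            for z in universe:
                if y == z:
                    continue
                trials += 1
                formal = sim(y, x) > sim(z, x)  # relation induced by sim
                if formal == r_x(x, y, z):      # the Turing test comparison
                    agreements += 1
    print(agreements / trials)                  # 1.0 = perfect agreement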
Learning of Weights

• Learning similarities is an example of subsymbolic learning and often reduces to learning weights. We distinguish:
  – global weights: sim(q, c) = Σ_{i=1..n} wi · simi(qi, ci)
  – prototype-specific weights wi,c (a relevance matrix): sim(q, c) = Σ_{i=1..n} wi,c · simi(qi, ci).
• A change of the weights is a change of the relevance of the features.
• The error function is determined by the Turing test.
• The learning procedures can be supervised or unsupervised.

Learning of Weights with/without Feedback

• Many algorithms for both learning types are known.
• Learning without feedback, for retrieval/reuse:
  – Use the distribution of the cases in the case base to determine the relevance of the attributes. (The slide shows a scatter plot of positive and negative cases over attributes A1 and A2 in which the classes separate along A1; hence A1 is more important than A2.)
• Learning with feedback:
  – A correct or incorrect choice of cases / classification result leads to a change of the weights.

Learning of Weights without Feedback

• Determination of class-specific weights:
  – Binary coding of the attributes by
    • discretizing real-valued attributes
    • transforming each symbolic attribute into n binary attributes.
  – Suppose
    • wik is the weight of attribute i for class k
    • class(c) is the class (solution) of case c
    • ci is attribute i of case c.
  – Put wik = P(class(c) = k | ci): the conditional probability that the class of a case is k under the condition that attribute i is present.
  – For the estimation of the probabilities, use samples from the case base.

Learning of Weights with Feedback

• A correct or incorrect classification leads to a correction of the weights: wik := wik + Δwik.
• There are several ways to adapt the weights.
• Approach of Salzberg (1991) for binary attributes (a sketch follows after the summary):
  – Positive feedback (i.e. correct classification):
    • the weights of attributes with the same values increase
    • the weights of attributes with different values decrease.
  – Negative feedback (i.e. wrong classification):
    • the weights of attributes with the same values decrease
    • the weights of attributes with different values increase.
  – The increment Δwik remains constant.

Summary

• Relations between data mining and KDD.
• Relations between data mining, learning, and performance.
• The way from data to knowledge.
• Making knowledge explicit.
• Collecting cases and building a CBR system.
• Examples:
  – quantitative models of consumer behavior (external analysis)
  – causal analysis (external analysis)
  – quality analysis for dialogues (internal analysis).
• The learning of informal concepts can be reduced to the learning of similarity measures.
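
Sketch: Salzberg-Style Weight Update

As announced on the "Learning of Weights with Feedback" slide, a minimal closing sketch of the constant-increment update for binary attributes; the attribute names, the data, the increment value, and the non-negativity clamp are invented for illustration.

    DELTA = 0.1   # the constant increment Δw

    def update_weights(weights, query, case, correct):
        new = dict(weights)
        for i in weights:
            same = (query[i] == case[i])
            if correct:                          # positive feedback
                new[i] += DELTA if same else -DELTA
            else:                                # negative feedback
                new[i] += -DELTA if same else DELTA
            new[i] = max(new[i], 0.0)            # keep weights non-negative
        return new

    w = {"a1": 0.5, "a2": 0.5}
    q = {"a1": 1, "a2": 0}                       # query
    c = {"a1": 1, "a2": 1}                       # retrieved case
    print(update_weights(w, q, c, correct=True))    # a1 up, a2 down
    print(update_weights(w, q, c, correct=False))   # a1 down, a2 up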