Chapter 13
Data Mining
Recommended References
• This lecture assumes some knowledge of learning systems. We recommend:
  – P. Langley: Elements of Machine Learning. Morgan Kaufmann, 1996.
  – T. M. Mitchell: Machine Learning. McGraw-Hill, 1997.
  – R. Bergmann: Slides on “Lernende Systeme” (Learning Systems), wwwagr.informatik.uni-kl.de/~bergmann; also: M. M. Richter: Lernende Systeme, lecture notes, Kaiserslautern.
  – R. Bergmann & S. Stahl (1998): Similarity Measures for Object-Oriented Case Representations. Proceedings of the European Workshop on Case-Based Reasoning, EWCBR'98.
• Data Mining references:
  – P. Adriaans, D. Zantinge: Data Mining. Addison-Wesley, 1996.
  – Th. Reinartz: Focusing Solutions for Data Mining. Springer Lecture Notes in AI 1623, 1998.
  – S. M. Weiss, N. Indurkhya: Predictive Data Mining. Morgan Kaufmann, 1997.
Data Mining, Learning and Performance (1)
• The ultimate goal is optimal performance of some process P.
• What “optimal” means is given by the user's utility.
• Achieving optimal performance requires certain knowledge. This knowledge may be implicit in the available data and has to be made usable, i.e. it has to be learned.
• For learning one needs to know:
  – What precisely are the goals?
  – How are the achievements of the goals measured?
  – How does one react if the goals are not achieved?
Data Mining, Learning and Performance (2)
• The performance of the process P is tested in experiments which generate certain data D. These data are the input to some evaluation function F.
• [Diagram: the user's view of the performance of P is compared (?) with the formal evaluation function F for P.]
• Coincidence of the user's view of the performance and the result of the evaluation is wanted.
• Often the coincidence can only be approximated.
Data Mining, Learning and Performance (3)
[Diagram: the learning cycle. The process P with knowledge K is run in an experiment, which generates data D and an evaluation result. Data mining analyzes the data and the evaluation result; the analysis result drives learning, which updates P and K into an improved process P’ and improved knowledge K’.]
KDD: Knowledge Discovery in Data Bases
• Knowledge Discovery in Data Bases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad).
• Data Mining is often used as a synonym for KDD, but it is sometimes restricted to a crucial step in KDD:
• the step of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data.
KDD Phases
[Diagram: the phases of the KDD process.]
Business Understanding → Data Understanding → Data Preparation → Data Exploration → Data Mining → Evaluation → Deployment
Requirement Analysis for KDD Processes
[Diagram: the requirement analysis takes as input the data characteristics (volume, quality, representation) and the application properties (domain characteristics, system context characteristics, application goals) and produces the application requirements.]
Data Mining and the Pre-Sales Process
• The purpose of data mining for the pre-sales process is to obtain knowledge which allows the supplier to attract more customers from the intended target groups.
• The knowledge obtained can concern
  – the market in general
  – the market with respect to certain products
  – the behavior of certain customer classes:
    • Marketing Campaign Management: How do customers react to marketing actions?
    • Basket Analysis: What do customers typically buy? (a minimal sketch follows below)
  – individual customers and their behavior.
• The general data mining strategy of a company is the strategic model, which in turn is influenced by feedback from the results obtained.
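For the basket-analysis item above, here is a minimal sketch of the underlying counting (support and confidence of pair rules). The transaction format, thresholds, and helper names are illustrative assumptions, not from the slides:

```python
from itertools import combinations
from collections import Counter

def pair_rules(transactions, min_support=0.1, min_confidence=0.5):
    """Count item pairs and derive simple 'X -> Y' association rules."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (x, y), c in pair_counts.items():
        support = c / n                      # fraction of baskets with both items
        if support < min_support:
            continue
        for a, b in ((x, y), (y, x)):
            confidence = c / item_counts[a]  # P(b in basket | a in basket)
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return rules

baskets = [["bread", "butter"], ["bread", "butter", "milk"], ["milk"]]
for a, b, s, c in pair_rules(baskets):
    print(f"{a} -> {b}  support={s:.2f} confidence={c:.2f}")
```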
Data Mining and the Sales Process
• The purpose of data mining for the sales process is to obtain knowledge which allows the supplier to improve the quality of his processes in such a way that customers who have contacted the supplier
  – are guided efficiently through the sales process
  – make a positive decision for the sale.
• This includes
  – offering the products appropriately
  – offering adequate alternatives
  – guiding efficiently through the dialogue.
• This influences the diagnostic and the action model.
Data Mining in the After Sales Process
• The purpose of data mining for the after-sales process is to obtain knowledge which allows dealing with customer questions and complaints more efficiently.
• The goals are to
  – improve the recognition of reasons for calls
  – avoid repeated calls
  – arrive at solutions efficiently.
• Useful knowledge is mainly contained in experiences; therefore the collection of experiences is central.
• Experiences are best stored as cases in CBR.
The Starting Point: Data (1)
• Data have a certain quality
– Correctness and completeness problem
• It is essential to address the problem of data
quality: if you feed garbage into the system,
you will get garbage out !
– the insights obtained from the data lead to
incorrect consequences (wrong data)
– the insights are too general to be useful
(incomplete data)
Starting Point: Data (2)
• Data may be noisy
• Incorrect data
– wrong values for the attributes
– incorrect classification
– duplicate data
• Incomplete data
– missing values for some attributes
– missing attributes
– missing objects
• Data not usable
– free text difficult to cope with
– terminology not understood
– not suitable for the intended goals
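As an aside, several of these data problems can be surfaced with a few standard pandas calls; the file and column names here are hypothetical:

```python
import pandas as pd

df = pd.read_csv("orders.csv")    # hypothetical data set
print(df.isna().sum())            # missing values per attribute
print(df.duplicated().sum())      # duplicate records
print(df["price"].describe())     # implausible ranges hint at noisy values
```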
Starting Point: Data (3)
• Knowledge management task:
• Quality management !
• Data sampling
– Define the goals
– Quality is more important than quantity
– Make use of existing information sources to
ensure completeness of the base
– Create your own sources
– Data have to come in time: Data which are too old
are not useful (updating problem)
• See chapter 15.
Data for What Knowledge?
• The way data are obtained depends on the type of knowledge one is interested in.
• We distinguish three main types:
  – Knowledge about some market. This will influence the strategic, the diagnostic, and the action model of the supplier.
  – Knowledge about individual customers. It is used to treat the customer individually, e.g. by making special offers.
  – Knowledge about technical objects: their quality, how to operate them, etc.
• With the type of knowledge, different
  – goals of the supplier
  – data sources
  are connected.
Data Warehouse
Idea: Store knowledge like physical objects.
Allows: access, delivery, and manipulation as for physical objects.
Data Warehouse:
• access to knowledge for immediate use
• makes knowledge available for improving the quality.
The data warehouse is managed by the knowledge manager.
From Data to Knowledge (1)
[Diagram: the data–information–knowledge–wisdom pyramid; understanding increases from bottom to top.]
• Data: facts. (“What?”)
• Information: description, definition, perspective; understanding relations (association, relationships, connectivity). (“What, when, where, who?”)
• Knowledge: strategy, practice, method; understanding models, rules, patterns. (“How, why?” Implications.)
• Wisdom: insight, morals; understanding principles.
From Data to Knowledge (2)
• Data are raw products.
• Pieces of information are semi-finished products.
• Knowledge and wisdom are high-quality products.
• Each step upwards is an abstraction.
But: when using knowledge, access to current data and information is necessary. How to do this?
From Data to Knowledge (3)
It is a knowledge management task to provide, for each application of knowledge, the current data that are needed:
Task to perform → Knowledge applied → Data needed
From Data to Knowledge (4)
• Only explicit knowledge can be used directly.
• Explicit knowledge is directly formulated:
  – prescriptions, rules, norms
  – suggestions, ways to behave
  – general laws, exceptions
  – hierarchical relations
  – properties, constraints
  – ...
From Data to Knowledge (5)
• Implicit knowledge cannot be used directly.
• Implicit knowledge is
  – contained in data and information
  – often hidden and difficult to discover
  – not directly applicable
  – tacit (“silent”) knowledge.
From Data to Knowledge (6)
Examples of implicit knowledge:
• Sales statistics contain implicit knowledge about customer preferences.
• Data bases about accidents contain implicit knowledge about dangerous situations.
• Test data contain implicit knowledge about quality.
From Data to Knowledge (7)
• Data and pieces of information have to be correct (or exact tolerances have to be given).
• Knowledge does not have to be totally correct in order to be useful:
  – probabilities, evidences
  – heuristics
  – rules of thumb
  – vague statements (“this is not reliable”, “the weather there is not nice in November”)
  – fuzzy statements.
• A correct statement about a complex situation may even be useless because it is too complicated.
Wisdom
• Wisdom is usually referred to as a very advanced type of knowledge.
• It refers to the understanding of basic background principles.
• Only in the exact sciences can it be expressed in precise terms.
• Wisdom is of relevance for the strategic model (which is mainly informal).
Make Knowledge Explicit (1)
• General properties of products need to be represented differently in different situations:
• Vacations in Tirol are
  – nice and warm (for persons from Alaska)
  – nice and cool (for persons from Brazil).
• A car
  – is good and speedy on small and hilly roads (Germany)
  – is comfortable (USA).
Make Knowledge Explicit (2)
• Use the properties of a product in order to
– guarantee the satisfaction of different safety
regulations
– satisfy different types of demands
– respect different types of sensitivities
• Describe these properties in different ways
• For such purposes one has to extract the specific
views from the overall knowledge
Reliability of Knowledge (1)
[Diagram: knowledge sources plotted against the extension of knowledge they provide; darkness indicates reliability. Knowledge obtained by direct retrieval, by logical deduction, by approximative reasoning, by CBR, and by learning and data mining extends the knowledge increasingly far, at decreasing reliability. This assumes that the underlying data and information bases are reliable.]
Reliability of Knowledge (2)
• This schema is only a rough and general indication.
• The success in applications depends heavily on, e.g.,
  – the correctness, amount, and typicality of the data
  – the adequate choice of the specific method and the precision with which it is applied
  – the number of experiments carried out
  – the testing of the results.
• Therefore the success depends on the invested effort.
• There is again the utility question: the costs of obtaining knowledge versus the gain from applying it.
Sources of Data
• General analyses, public domain
  – accessible to everyone, but often widely distributed and hard to collect
• General analyses performed by the company itself or by some paid institution
  – expensive, but can be tailored to the needs of the company
• History of customers
  – requires customers who buy regularly
  – has to be updated regularly
• Internal analysis of customer behavior
  – reaction to changes in
    • prices
    • dialogue strategies, etc.
• Cases
  – collected experiences, failure statistics, etc.
History of Customers
• Knowledge about the behavior of individual customers should in general not be obtained by asking personal questions but rather automatically.
• One possibility is to collect it at the cashier when the customer pays with a customer card or credit card. In e-commerce this can be done when the customer orders directly over the net.
• There may be certain restrictions by law.
• The history can contain, among other things,
  – the main products ordered and their quantities
  – the times or events when they were ordered (weekend, holidays, time of the year, ...).
• The history should contain (if possible) information about the customer (for the description of customer classes):
  – age, sex, profession, place of residence, ...
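A minimal sketch of what such a history record could look like (the field names are illustrative assumptions; Python 3.10+):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Purchase:
    product: str
    quantity: int
    when: date                       # allows deriving weekend/holiday/seasonal features

@dataclass
class CustomerHistory:
    customer_id: str
    age: int | None = None           # demographic fields may be missing
    profession: str | None = None    # (collection may be restricted by law)
    location: str | None = None
    purchases: list[Purchase] = field(default_factory=list)
```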
Cases (1)
• In the after-sales process, histories have to be recorded if they are available; they are the material for the cases.
• Often there are not enough cases available to cover all or most of the relevant problem situations.
• In this situation artificial cases can be created, which is done by varying relevant parameters.
• Both collecting and creating cases require some a priori understanding of the tasks to be performed.
• To build a CBR system one has to define the four containers: vocabulary, case base, similarity measure, and solution transformation.
Cases (2)
• There are commercial systems like CBR-Works which support the collection and representation of cases (see also chapters 3 and 12).
• A general methodology for developing CBR systems for applications in the help desk area is described in
  – R. Bergmann, S. Breen, M. Göker, M. Manago, S. Wess: Developing Industrial Case-Based Reasoning Applications: The INRECA Methodology. Springer Lecture Notes in AI 1612, 1999.
From Data to Information Using Knowledge
[Example: raw data become valuable information by using knowledge.]
Query: Customer: Company X, Architects. PC component: Matrox G100?
Raw data:
• Company X: 1x PC Dual-Pentium XL437, sold 4/97; 2x ML 649 (P233/124/9,6), sold 5/97; SW: high-end, CAD & 3D visualization, TCP/IP networking, ...
• G100: entry-level graphics card, AGP slot necessary, very good price/power relation, limited 3D power, ...
Valuable information, obtained by using knowledge:
“The G100 is of little use for Company X because the architects use high-end 3D graphics software. The G100 is an entry-level graphics card, and additionally it needs an AGP slot, which is not built into the current HW configuration of the PCs.”
Three Main Phases
• Measurement: collects numerical data about the intended utility.
• Evaluation: extracts statements about the utility from the data (excellent, good, sufficient, improved, insufficient, ...).
• Sensitivity Analysis: extracts the influence factors responsible for the result of the evaluation.
• The learning and data mining tools can
  – use the results of all three phases
  – improve these phases.
Measurement
• The utility is often only informal and implicit in the head of the user.
• The measurement problem is
  – to map it onto quantitative magnitudes
  – to define procedures which measure these quantities.
• The measurement procedures are often difficult to define and expensive.
• The parameters in the procedures have to be named precisely, so that the procedure can be applied repeatedly (as, e.g., in the exact sciences).
Evaluation
• The evaluation of the measured data has to close the gap between the data and the utility of the user:
  – the evaluation predicate should (at least ideally) coincide with the predicate the user assigns to the performance (see also the relation between similarity and utility in chapter 6).
• The evaluation should contain a statement about its reliability, e.g.
  – tolerances for errors
  – error probabilities
  – confidence intervals.
• The reliability depends heavily on the input data (volume, representativeness, correctness, noise, etc.).
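As one concrete form of such a reliability statement, a minimal sketch of a normal-approximation confidence interval for an estimated error probability (the numbers in the example are made up):

```python
import math

def error_rate_ci(errors, n, z=1.96):
    """95% normal-approximation confidence interval for an error probability."""
    p = errors / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

print(error_rate_ci(12, 200))  # e.g. 12 misclassifications in 200 trials
```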
Sensitivity Analysis
• This is the most difficult and the most important phase.
• The evaluation is given as a function Ev(d1, ..., dn), where the di are data obtained by the measurement.
• The data di are, on the other hand, an indirect consequence of parameters pj which can be directly influenced by the person who designs the process (or product, etc.) that is evaluated:
  – Ev(d1, ..., dn) = Influence(p1, ..., pm),
  – where the function Influence is in general unknown.
• We call a parameter pj an (important) influence factor if small variations of pj result in large variations of Influence(..., pj, ...).
• The determination of influence factors is the basis for learning improvements of the object under consideration.
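Since Influence is in general unknown, influence factors are often identified empirically by perturbing one parameter at a time and observing the change in the evaluation. A minimal sketch, assuming an evaluate(params) function that wraps the experiment plus the evaluation Ev, and nonzero parameter values (the example evaluation is made up):

```python
def influence_factors(evaluate, params, eps=0.01):
    """Rank parameters by local sensitivity |dEv/dp| via finite differences."""
    base = evaluate(params)
    scores = {}
    for name, value in params.items():       # assumes value != 0
        perturbed = dict(params)
        perturbed[name] = value * (1 + eps)   # vary one parameter slightly
        scores[name] = abs(evaluate(perturbed) - base) / (abs(value) * eps)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical example: the evaluation is dominated by the 'price' parameter
ev = lambda p: 10 * p["price"] + 0.1 * p["ads"]
print(influence_factors(ev, {"price": 5.0, "ads": 100.0}))
```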
QMCB: Quantitative Models of Consumer Behavior
• Goal: the calculation and prediction of meaningful market diagnostics on the basis of data.
• A possible approach: integration of statistical methods and models as well as econometric models into a knowledge-based system.
• Tasks:
  – descriptive (a posteriori) analysis of data
  – model-based simulation of future buying behavior.
• The special types of task require special data representations for useful evaluations.
Different Types of Forecasts
• The types vary with respect to the knowledge they contain and the usefulness of the prognosis. From the QMCB one should be able to compute directly (examples):
  – market share of a product
  – product purchase probability, expectation, and variance
  – brand purchase probability, expectation, and variance
  – heterogeneity in purchase rates.
• Indirect consequences:
  – relative product attraction
  – relative brand attraction
  – etc.
Example System: KVASS (1)
• KVASS (KaufVerhaltensAnalyse und SimulationsSystem: buying-behavior analysis and simulation system) is an example of a model- and knowledge-based data analysis system.
  – Reference: R. Decker: Knowledge Based Selection and Application of Quantitative Models of Consumer Behavior. In: Information Systems and Data Analysis (eds. H. H. Bock, W. Lenski, M. M. Richter), Springer Verlag 1994, pp. 405-414.
• Basic idea: model the data with a predefined set of descriptors. These are essentially attributes with their domains, e.g.
  – estimation method: {undefined, least squares, ..., moments}
  – type of recording: {undefined, diary, ..., interview}
Example System: KVASS (2)
• Classes of descriptors are:
  – essential aspects for a general description (type of recording, market share, etc.)
  – temporal aspects (periods of data collection, etc.)
  – information on the models used for computation (e.g. estimation method)
  – technical descriptors for the interpretation of the representation (e.g. ordinal, nominal, etc.).
• Combinations of descriptors allow representing complex situations; these can be translated into more understandable relational representations (see chapter 4).
Example System: KVASS (3)
• The system describes essentially a measurement procedure, i.e. the first phase.
• The purpose is not to make an evaluation of the success of a product or process of the company.
• The correctness condition is that the results provided by the analysis of the system coincide with reality.
• The results of the system are, on the other hand, important for the sensitivity analysis concerning the success or failure of processes or products designed by the company.
Causal Analysis (1)
• Causal analysis is a kind of sensitivity analysis. Task: make causal relations explicit.
• Suppose the Xi are activities and the Yj are sales results. Notation:
  – Xi →+ Yj: positive influence
  – Xi →- Yj: negative influence
  – no arrow: neutral.
• Initial situation: a suspected model for the influences.
• Then either an experiment (variation of the Xi and measurement of the Yj) or an analysis across several companies.
• Data analysis: e.g. by analyzing the covariance structure.
• Result: a revised and refined model.
Causal Analysis (2)
• Example (artificially created):
  – X1: effort in catalogues
  – X2: effort in dynamic forms
  – X3: effort in recording and applying customer histories
  – Y1: return from book sales
  – Y2: return from high-tech product sales
• Initial model based on qualitative knowledge:
  [Diagram: a suspected influence model in which X1 and X3 positively influence Y1, and X2 and X3 positively influence Y2.]
Causal Analysis (3)
Revised model:
[Diagram: X1 positively influences Y1 and Y2; X2 positively influences Y2; X3 negatively influences Y1 and positively influences Y2.]
A possibility for arriving at a refined quantitative model is to assume a linear model (which may be justified by some knowledge). This leads to the linear equations

  Y1 = a11·X1 - a13·X3
  Y2 = a21·X1 + a22·X2 + a23·X3

The solutions for the coefficients aik determine a quantitative model.
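A minimal sketch of estimating such coefficients by least squares from experiment data (the numbers are invented for illustration):

```python
import numpy as np

# Rows: experiments; columns: efforts X1, X2, X3 (hypothetical measurements)
X = np.array([[1.0, 0.5, 0.2],
              [2.0, 0.5, 0.8],
              [1.5, 1.0, 0.4],
              [0.5, 1.5, 1.0]])
Y1 = np.array([3.8, 6.4, 5.6, 0.0])   # observed book sales returns

# Y1 = a11*X1 - a13*X3: regress on the X1 and X3 columns only
A = X[:, [0, 2]]
coef, *_ = np.linalg.lstsq(A, Y1, rcond=None)
a11, minus_a13 = coef
print(f"a11={a11:.2f}, a13={-minus_a13:.2f}")
```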
Quality Management: Internal Analysis
• As the first step, the goals of the analysis have to be defined:
  – Where are the weak points?
  – What has to be improved or optimized?
  – Where are improvements possible?
• This is part of the requirements analysis.
• Further steps include
  – identifying groups of objects with similar quality characteristics
  – identifying properties of these groups
  – describing these groups
  – drawing conclusions for quality improvements.
Example: Quality Analysis for Dialogues (1)
• Classification of dialogues (evaluation by the user):
  – successfully finished
  – quit because no adequate product was available
  – quit for unknown reasons: this is the failure class.
• Measurement:
  – has to collect data which arise during the dialogue
  – these data may not be recorded during an ordinary dialogue, e.g.
    • which questions raised by the customer dealt with a certain property type of the product
    • which actions were performed by customers from a certain customer class
  – the quality of the measured data has to be considered.
Quality Analysis for Dialogues (2)
• The evaluation is simple because it is the same as the one of the user.
• The sensitivity analysis has two phases here:
  – (1) Describe the evaluation result in terms of measured quantities and determine the influence factors of this description.
  – (2) Describe the evaluation result in terms of factors which define the dialogue.
• The first phase already involves a learning step:
  – The classification of the dialogue in terms of measured quantities has to be learned. This classification approximates the real classes obtained from the evaluation.
Quality Analysis for Dialogues (3)
• The analysis of the first phase is based on the dialogue situations and additionally measured data.
• Typical candidates for interesting data for classifying types of situations are
  – the length of the dialogue
  – terms that were not understood
  – customer questions (How often? Typical ones?)
  – etc.
• The selection of these candidates depends on a hypothesis for a preliminary dependency model. The data mining and learning methods are used to refine and correct this model.
Quality Analysis for Dialogues (4)
• The result allows a prognosis of the dialogue class from the occurrence of dialogue situations which are important influence factors (but here in terms of measured data!), in particular a description of failure situations, i.e. situations which lead with high probability to a failure dialogue.
• The description of the failure situations is refined in order to
  – discover dependencies between influence factors
  – in particular, obtain definitions of earliest failure situations in dialogues, i.e. the earliest situations in the dialogue which will lead to a failure.
• The earliest failure situations give rise to the second phase of the sensitivity analysis.
Quality Analysis for Dialogues (5)
• Second phase: analysis of the reasons for reaching earliest failure situations, mainly:
  – Which elements of the strategy are responsible?
  – Weak points of the knowledge base (e.g. wrong prices for products)?
• These reasons can be directly influenced when the dialogue is designed.
• Consequences of the analysis (learned results):
  – improved knowledge base
  – possible changes of the strategy
  – possible disadvantages of changes.
• Final recommendation: update.
Discussion
• The dialogue and the situations can be given in a (possibly object-oriented) attribute-value representation. Some virtual attributes (like the length of the dialogue) can be useful; they contain valuable knowledge.
• One way to proceed is to use cluster analysis techniques and machine learning algorithms (e.g. CN2, C4.5) for learning the classification, as sketched below.
• Another way is to consider the data base as a case base and start with an initial similarity measure which is improved during the development of a CBR system for the classification and the improvement suggestions.
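As an illustration of the first way, a small decision-tree learner on attribute-value dialogue records; scikit-learn is used here as a stand-in for C4.5-style algorithms, and the features and data are invented:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [dialogue length, # unanswered questions, # not-understood terms]
X = [[12, 0, 0], [35, 4, 2], [8, 1, 0], [40, 6, 3], [15, 0, 1]]
y = ["success", "failure", "success", "failure", "success"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["length", "open_questions", "unknown_terms"]))
print(tree.predict([[30, 5, 2]]))   # prognosis for a new dialogue
```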
Learning Informal Concepts
• Many concepts in e-commerce, in particular in connection with CRM and customer classes, are of an informal character for which no direct formal equivalent exists.
• Computer support requires a formal notion which approximates the informal concept as well as possible.
• Such formal versions have to be learned, and the learning process requires data mining activities which are again based on studies of customers and their behavior.
• It has to be taken into account that informal concepts are usually not stable over time.
The Correctness Problem
• The correctness problem for the statement that two formal expressions are logically equivalent reduces to a formal proof.
• How can one “prove” that an informal and a formal concept are equivalent?
  – Formal systems do not have access to informal notions.
  – Humans usually have difficulties comparing both types of notions because this refers to a broad scope of intended uses.
• Required is a kind of Turing test which decides whether a human who uses the informal version and a machine which uses the formal version refer to the same concept.
• The ordering principle is that the test does not deal with the concept itself but with partial orderings related to the concept.
The Ordering Principle and a Turing Test (1)
Suppose there is a partial ordering “<” associated with the concept C. The partial ordering then again has two versions: formal and informal. The Turing test refers to these two versions of “<”:
[Diagram: the formal version of C and the informal human version of C are compared against the goal.]
The goal is that, when variations of the arguments of < are presented, the human says “up” if and only if the formal system says “up”.
The Ordering Principle and a Turing Test (2)
Concept to grasp: a typical lion.
[Diagram: a sequence of lion drawings, each “better” than the previous one. The human orders them by an aesthetic property; the formal version uses as ordering the quotient length/height.]
The partial ordering approximates the concept C in the sense that the semantics of y < z is: z is more typical for C than y is.
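The formal side of this example is just a comparison of quotients; a toy sketch (representing a lion as a dict of measurements is an assumption):

```python
def more_typical_lion(y, z):
    """Formal version of y < z: z is more typical for 'lion' than y.

    Illustrative assumption: typicality grows with the length/height quotient.
    """
    quotient = lambda lion: lion["length"] / lion["height"]
    return quotient(y) < quotient(z)
```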
The Ordering Principle and a Turing Test (3)
• Advantages of the ordering principle:
  – The equivalence of formal and informal concepts can be effectively validated by Turing tests, i.e. by experiments.
  – If several orderings are involved, this can be done for all of them.
  – The search for a formal counterpart of an informal concept can be performed in an approximative way, and partial validation is possible.
• The formal partial ordering is what has to be learned.
• The learning process is an approximation process whose aim is to pass the Turing test sufficiently well.
The Learning Scenario
• (1) The informal concept C on a set U is regarded as a fuzzy set for which a set of prototypes P ⊆ U is known.
• (2) An informal relation rx(y, z) states “y is more similar to x than z is”.
• The object to be learned is a similarity measure sim: U × P → [0, 1].
• Turing test: the relation ≤x induced by the formal similarity measure and the relation rx agree.
• We decompose the approach into two basic steps:
  – a first step to get a suitable representation language: concept learning
  – a second step for learning the similarity measure: subsymbolic learning.
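The Turing test can then be run as an agreement check between the sim-induced ordering and recorded human judgments rx(y, z); a minimal sketch with an assumed data format:

```python
def turing_agreement(sim, human_judgments):
    """Fraction of triples where sim orders (y, z) w.r.t. x as the human does.

    human_judgments: iterable of (x, y, z, human_says_y_closer) tuples.
    """
    hits = total = 0
    for x, y, z, y_closer in human_judgments:
        total += 1
        if (sim(y, x) > sim(z, x)) == y_closer:
            hits += 1
    return hits / total if total else 0.0
```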
Learning of Weights
• Learning similarities is an example of subsymbolic learning and often reduces to learning weights. We distinguish:
  – global weights:
    sim(q, c) = Σi=1..n wi · simi(qi, ci)
  – prototype-specific weights (wi,c: relevance matrix):
    sim(q, c) = Σi=1..n wi,c · simi(qi, ci)
• Change of weights: Change of relevance of features.
• Error function determined by Turing test.
• Learning procedures can be supervised or unsupervised.
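A direct transcription of the two variants (the local measures simi and the weights are supplied by the caller; all names are illustrative):

```python
def global_sim(q, c, w, local_sims):
    """sim(q, c) = sum_i w_i * sim_i(q_i, c_i) with global weights w."""
    return sum(wi * s(qi, ci) for wi, s, qi, ci in zip(w, local_sims, q, c))

def prototype_sim(q, c, W, c_index, local_sims):
    """Prototype-specific weights: row c_index of the relevance matrix W."""
    return global_sim(q, c, W[c_index], local_sims)
```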
Learning of Weights with/without Feedback
• Many algorithms for both learning types are known.
• Learning without feedback for retrieval/reuse:
  – Use the distribution of cases in the case base in order to determine the relevance of the attributes.
  [Diagram: positive and negative cases plotted over attributes A1 and A2; the classes separate along A1, so A1 is more important than A2.]
• Learning with feedback:
  – a correct or incorrect choice of cases / classification
  – the result leads to a change of the weights.
Learning of Weights without Feedback
• Determination of class-specific weights:
  – Binary coding of the attributes by
    • discretizing real-valued attributes
    • transforming each symbolic attribute into n binary attributes.
  – Suppose
    • wik is the weight for attribute i and class k
    • class(c) is the class (solution) in case c
    • ci is attribute i in case c.
  – Put wik = P(class(c) = k | ci), the conditional probability that the class of a case is k under the condition that attribute i is present.
  – The probabilities are estimated from samples of the case base.
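A minimal sketch of this estimation on binary-coded cases, assuming each case is given as (attribute vector, class):

```python
from collections import defaultdict

def class_specific_weights(cases):
    """Estimate w_ik = P(class = k | attribute i present) from the case base."""
    present = defaultdict(int)   # i -> #cases with attribute i set
    joint = defaultdict(int)     # (i, k) -> #cases with attribute i set and class k
    for attrs, k in cases:
        for i, v in enumerate(attrs):
            if v:
                present[i] += 1
                joint[(i, k)] += 1
    return {(i, k): joint[(i, k)] / present[i] for (i, k) in joint}

cases = [([1, 0], "A"), ([1, 1], "A"), ([0, 1], "B")]
print(class_specific_weights(cases))   # e.g. weight for attr 0, class A = 1.0
```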
Learning of Weights with Feedback
• A correct or incorrect classification leads to a correction of the weights:
  wik := wik + Δwik
• There are several ways to adapt the weights.
• Approach of Salzberg (1991) for binary attributes:
  – Feedback positive (i.e. correct classification):
    • the weight of attributes with the same values increases
    • the weight of attributes with different values decreases.
  – Feedback negative (i.e. wrong classification):
    • the weight of attributes with the same values decreases
    • the weight of attributes with different values increases.
  – The increment Δwik remains constant.
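A minimal sketch of this update rule for binary attributes (the fixed increment delta and the function signature are illustrative):

```python
def salzberg_update(weights, query, retrieved, correct, delta=0.1):
    """Adjust attribute weights after classifying `query` with case `retrieved`."""
    for i, (qv, cv) in enumerate(zip(query, retrieved)):
        same = (qv == cv)
        if correct:
            weights[i] += delta if same else -delta   # reinforce matching features
        else:
            weights[i] += -delta if same else delta   # penalize misleading matches
    return weights
```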
Summary
• Relations between data mining and KDD.
• Relations between data mining, learning, and performance.
• The way from data to knowledge.
• Making knowledge explicit.
• Collecting cases and building a CBR system.
• Examples:
  – quantitative models of consumer behavior (external analysis)
  – causal analysis (external analysis)
  – quality analysis for dialogues (internal analysis).
• Learning informal concepts can be reduced to learning similarity measures.