When to use Data Mining

Introduction
• An important question that should be answered before you
commence any data mining project is whether data mining
techniques are, in fact, necessary.
• In determining this it is important to understand what level
of sophistication of data mining is required. For instance,
do you just need a few standardized printed reports or do
you need interactive ROI analysis or OLAP analysis to see
what your data looks like?
• Or do you need true data mining techniques that build
predictive models to search through your database for
useful patterns?
The Data Mining Process
What all Data Mining techniques
have in common
• Each Data Mining algorithm has the following in
common:
– Model Structure. The structure that defines the model
(Is it a tree, a neural network, or a nearest-neighbor model?)
– Search. How does the algorithm modify the model over
time as more data becomes available?
– Validation. When does the algorithm terminate because
it has created a valid model?
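The three components above can be made concrete with a very small sketch. This is an illustration only, not a method from the slides: the `Stump` class, the `fit_stump` helper, and the toy data are all hypothetical, and the "model" is just a single threshold on one numeric predictor. Validation is read here as a check on held-out data, which is one common interpretation.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    # Model structure: one threshold on a single numeric predictor
    # (hypothetical, chosen for brevity).
    threshold: float

    def predict(self, x: float) -> int:
        return 1 if x >= self.threshold else 0


def accuracy(model, rows):
    """Fraction of (x, y) rows the model classifies correctly."""
    return sum(1 for x, y in rows if model.predict(x) == y) / len(rows)


def fit_stump(train, valid):
    # Search: try every observed predictor value as a threshold and
    # keep the first one with the highest training accuracy.
    candidates = sorted({x for x, _ in train})
    best = max((Stump(t) for t in candidates),
               key=lambda m: accuracy(m, train))
    # Validation: score the chosen model on data not used in the search.
    return best, accuracy(best, valid)


# Toy data (assumed): predictor value and a 0/1 target.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
valid = [(2.5, 0), (6.5, 1), (9.0, 1), (0.5, 0)]

model, valid_acc = fit_stump(train, valid)
print(model, valid_acc)
```

A tree or neural network differs in structure and in how the search proceeds, but the same three questions apply.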
Data Mining in the Business Process
• When data mining is used for non-exploratory
purposes, or whenever supervised learning
techniques are used, the customer reaction
provides a fairly well-defined target column within
the database, which relates to the business process.
The target must have the following attributes for
data mining to be successful:
– The target has value
– The target is actionable
– The effect of action can be captured
Avoiding some big mistakes in Data Mining
• The technology-centered view of the data
mining process emphasizes getting the
model right, with the assumption that the
predictive product has been well-defined
and that the data that has been captured to
date is well understood.
• This is not always the case.
Three measures for Data Mining Tools
• Accuracy. The data mining tool must produce a model that
is as accurate as possible.
• Explanation. The data mining tool needs to be able to
‘explain’ how the model works to the end user in a clear
way.
• Integration. The data mining tool must integrate with the
current business process and with the flow of data and
information in the company.
• When these three requirements are well met, the data
mining tools will produce highly profitable models that are
likely to remain stable over long periods of time.
Embedded Data Mining for business
How to measure Accuracy,
Explanation, and Integration
• Measuring Accuracy:
– Accuracy
– Error rate
– Error rate at rejection
– Mean squared error
– Lift
– Profit/ROI
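The accuracy measures listed above can be sketched in a few lines. The slides give no formulas, so the definitions below are common textbook readings and should be treated as assumptions: "error rate at rejection" is taken as the error rate on records the model does not reject as too uncertain, and lift as the response rate in the top-scored fraction divided by the overall response rate. All function names, parameters, and the toy data are hypothetical.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true, y_pred):
    return 1.0 - accuracy(y_true, y_pred)


def error_rate_at_rejection(y_true, scores, threshold=0.5, reject_band=0.2):
    """Error rate on records whose score is at least `reject_band`
    away from the threshold, plus the fraction of records rejected."""
    kept = [(t, s) for t, s in zip(y_true, scores)
            if abs(s - threshold) >= reject_band]
    rejected = 1.0 - len(kept) / len(y_true)
    if not kept:
        return 0.0, rejected
    errors = sum(1 for t, s in kept if (1 if s >= threshold else 0) != t)
    return errors / len(kept), rejected


def mean_squared_error(y_true, scores):
    return sum((t - s) ** 2 for t, s in zip(y_true, scores)) / len(y_true)


def lift(y_true, scores, fraction=0.2):
    """Response rate in the top-scored fraction over the base rate."""
    n_top = max(1, round(len(y_true) * fraction))
    ranked = sorted(zip(scores, y_true), reverse=True)
    top_rate = sum(t for _, t in ranked[:n_top]) / n_top
    return top_rate / (sum(y_true) / len(y_true))


def profit_and_roi(y_true, y_pred, value_per_response, cost_per_contact):
    """Profit and ROI if every predicted responder is contacted."""
    cost = sum(y_pred) * cost_per_contact
    revenue = value_per_response * sum(
        1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    profit = revenue - cost
    return profit, (profit / cost if cost else 0.0)


# Toy data (assumed): actual responses and model scores in [0, 1].
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.8, 0.6, 0.1, 0.4, 0.25, 0.05, 0.55, 0.75]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), error_rate(y_true, y_pred))
print(error_rate_at_rejection(y_true, scores))
print(mean_squared_error(y_true, scores), lift(y_true, scores))
print(profit_and_roi(y_true, y_pred, value_per_response=50, cost_per_contact=5))
```

Profit/ROI is the only measure here that needs business inputs (value per response, cost per contact), which is why it is often the hardest to compute and the most relevant to the business.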
• Measuring Explanation:
– Automated rule generation
– OLAP integration
– Model validation
• Measuring Integration:
– Proprietary data extracts
– Metadata
– Predictor preprocessing
– Predictor/prediction types
– Dirty data
– Missing values
– Scalability
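Three of the items above (dirty data, missing values, and predictor preprocessing) lend themselves to a short sketch of what a tool has to handle before modeling. The choices below are assumptions for illustration, not recommendations from the slides: out-of-range values are treated as missing, missing values are filled with the median, and the predictor is rescaled to the range 0 to 1. The function name `prepare_predictor` and the sample ages are hypothetical.

```python
from statistics import median


def prepare_predictor(raw_values, low=0, high=120):
    # Dirty data: treat values outside a plausible range as missing.
    cleaned = [v if v is not None and low <= v <= high else None
               for v in raw_values]

    # Missing values: fill with the median of the valid values.
    valid = [v for v in cleaned if v is not None]
    if not valid:
        raise ValueError("no valid values to impute from")
    fill = median(valid)
    imputed = [fill if v is None else v for v in cleaned]

    # Predictor preprocessing: min-max scaling to the range [0, 1].
    lo, hi = min(imputed), max(imputed)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in imputed]


# Toy customer ages (assumed): one missing, two implausible.
raw_ages = [34, None, 51, -1, 27, 430, 45]
scaled_ages = prepare_predictor(raw_ages)
print(scaled_ages)
```

A tool that performs steps like these internally, rather than requiring a separate extract and clean-up pass, scores better on integration.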
What the Future holds for
Embedded Data Mining
• Once the data mining process becomes easy enough to use
and is seamlessly integrated into business processes and the
general data and information flow around the enterprise,
there will be new applications and synergies that will make
data mining an even more critical requirement for any fully
functioning data warehouse:
– Use data mining to improve the multidimensional database
– Use data mining to improve the data warehouse structure
– Multidimensional databases and summary data will enhance data
mining performance. In general, the more data available, the
better a data mining technique performs