Course Name: Business Intelligence
Year: 2009
Knowledge Discovery and Data Mining
19th Meeting
Source of this Material
(2).
Loshin, David (2003). Business Intelligence:
The Savvy Manager’s Guide. Chapter 14
Bina Nusantara University
The Business Case
Data mining fills a niche in the BI arena where the data consumer is not
necessarily sure what to look for. The kinds of knowledge that are
discovered may be directed toward a specific goal or not, but the methods of
data mining are driven by finding patterns in the data that reflect more
meaningful bits of knowledge. It would be unwise to engage in building a BI
program and ignore the promise of a data mining component.
Data Mining and The Data Warehouse
Knowledge discovery is a process that requires a lot of data, and that data
needs to be in a reliable state before it can be subjected to the data mining
process.
The accumulation of enterprise data within a data warehouse that has been
properly validated, cleaned, and integrated provides the best source of data
that can be subjected to knowledge discovery.
Not only is the warehouse likely to incorporate the breadth of data needed for
this component of the BI process, it probably contains the historical data
needed. Because a lot of data mining relies on using one set of data for
training a process that can be tested on another set of data, having the
historical information available for testing and evaluating hypotheses makes the
warehouse even more valuable.
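The train-on-one-set, test-on-another idea can be sketched as follows. This is only a minimal illustration: the records, the field names (`year`, `spend`, `churned`), and the helper functions are hypothetical and do not come from the source material, and the "model" is a deliberately simple spend threshold.

```python
def split_by_year(records, cutoff_year):
    """Split historical records into an older training set and a newer test set."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test

def fit_threshold(train):
    """Toy model: threshold midway between mean churner and non-churner spend."""
    churn = [r["spend"] for r in train if r["churned"]]
    stay = [r["spend"] for r in train if not r["churned"]]
    return (sum(churn) / len(churn) + sum(stay) / len(stay)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where 'spend below threshold' matches the churn flag."""
    hits = sum((r["spend"] < threshold) == r["churned"] for r in rows)
    return hits / len(rows)

# Hypothetical warehouse extract (invented values, for illustration only).
history = [
    {"year": 2006, "spend": 120, "churned": False},
    {"year": 2007, "spend": 40, "churned": True},
    {"year": 2008, "spend": 95, "churned": False},
    {"year": 2009, "spend": 30, "churned": True},
]

train, test = split_by_year(history, cutoff_year=2008)
threshold = fit_threshold(train)      # learned from the older data only
score = accuracy(threshold, test)     # evaluated on data the model never saw
```

Splitting by time rather than at random is one reasonable choice here, since it mirrors how a model built on past data would be applied to later data.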
The Virtuous Cycle
As Berry and Linoff state in their book Data Mining Techniques, the process of
mining data can be described as a virtuous cycle. The virtue is based on the
continuous improvement of a business process that is driven by the discovery
of actionable knowledge and taking the actions prescribed by these
discoveries.
• Identify The Business Problem
One of the more difficult tasks is identifying the business problem that needs to be
solved. Very often, other aspects of the BI program can feed into this process. Other
kinds of business problems are actually part of the general business cycle.
• Mine The Data For Actionable Information
Depending on the problem, there are a number of different data mining techniques
that can be used to look for actionable knowledge. But no matter what techniques are
used, the process is to assemble the right set of information, prepare that information
for mining, apply the algorithms, and analyze the results to find some knowledge that
is actionable.
The Virtuous Cycle (cont…)
• Take The Action
The next logical step is to take the actions suggested by the discoveries during the
data mining process. Keep track of which actions were taken, because that leads into
the next stage.
• Measure Results
The importance of measuring the results of the actions taken is that it refines the
process of addressing the original business problem. The goal here is to look at what
the expected response was to the specific actions and to determine the quality of
each action.
Directed Versus Undirected Knowledge Discovery
There are two different approaches to knowledge discovery. The first is when
we already have the problem we want to solve and are applying the data
mining methods to discover the relationship between the variables under
scrutiny in terms of the other available variables. This is called directed
knowledge discovery, as opposed to undirected knowledge discovery.
Undirected knowledge discovery is the process of using data mining techniques
to find interesting patterns within a data set as a way to highlight some
potentially interesting issue. This approach is more likely to be used to
recognize behavior or relationships, whereas directed knowledge discovery is
used primarily to explain or describe those relationships once they have been
found.
Six Basic Tasks of Data Mining
• Classification
A frequent data mining task is classification, which involves examining the attributes
of a particular object and assigning it to a defined class. Classification can be used to
divide a customer base into best, mediocre, and low value customers.
• Estimation
Estimation is a process of assigning some continuously valued numeric value to an
object. Estimation can be used as part of the classification process (such as using an
estimation model to guess a person’s annual salary as part of a market segmentation
process). One benefit of estimation is that, because a value is being assigned to some
continuous variable, the resulting assignments can be ranked by score.
• Prediction
The subtle difference between prediction and the previous two tasks is that prediction
is the attempt to classify objects according to some expected future behavior.
Classification and estimation can be used for the purposes of prediction by using
historical data, where the classification is already known, to build a model (this is
called training). That model can then be applied to new data to predict future
behavior.
Six Basic Tasks of Data Mining (cont…)
• Affinity Grouping
Affinity grouping is a process of evaluating relationships or associations between data
elements that demonstrate some kind of affinity between objects.
• Clustering
Clustering is the task of taking a large collection of objects and dividing them into
smaller groups of objects that exhibit some similarity. The difference between
clustering and classification is that during the clustering task, the classes are not
defined beforehand. Clustering can be used in concert with other data mining tasks
as a way of identifying a business problem area to be further explored.
• Description
The last of the six tasks is description, which is the process of trying to characterize
what has been discovered or trying to explain the results of the data mining process.
Being able to describe a behavior or a business rule is another step toward an
effective intelligence program that can identify knowledge, articulate it, and then
evaluate actions that can be taken.
Data Mining Techniques
Although there are a number of techniques used for data mining, this section
enumerates some techniques that are frequently used as well as some
examples of how each technique is used.
• Market Basket Analysis
Market basket analysis is the process of clustering objects to look for groups of
objects that frequently appear together. Market basket analysis is a good way to look
for items that appear together or a set of discrete events that take place in a
particular sequence.
• Memory-Based Reasoning
Memory-based reasoning (MBR) is a process of using one data set to create a model
from which predictions or assumptions can be made about newly introduced objects.
There are two basic components to an MBR method. The first is the similarity
(sometimes called distance) function, which measures how similar the members of
any pair of objects are to each other. The second is the combination function, which is
used to combine the results from the set of neighbors to arrive at a decision.
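The two MBR components can be sketched as a small nearest-neighbor classifier. This is an illustrative sketch, not the method from the source: Euclidean distance as the similarity function, majority vote as the combination function, the function names, and the sample data are all assumptions.

```python
import math
from collections import Counter

def distance(a, b):
    """Similarity (distance) function: Euclidean distance between feature tuples."""
    return math.dist(a, b)

def combine(labels):
    """Combination function: majority vote over the neighbors' labels."""
    return Counter(labels).most_common(1)[0][0]

def mbr_classify(known, new_point, k=3):
    """Label new_point using its k nearest neighbors in the known data set."""
    neighbors = sorted(known, key=lambda item: distance(item[0], new_point))[:k]
    return combine(label for _, label in neighbors)

# Hypothetical known objects: (features, label). Values are invented.
known = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
]

result = mbr_classify(known, (1.1, 1.0), k=3)
```

Other choices are equally plausible, for example a distance-weighted vote as the combination function, or averaging neighbor values when the target is numeric.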
Data Mining Techniques (cont…)
• Cluster Detection
There are two approaches to clustering. The first approach is to assume that a
certain number of clusters are already embedded in the data; the goal is to break the
data up into that number of clusters. In the other approach, called agglomerative
clustering, instead of assuming the existence of any specific predetermined number
of clusters, every item starts out in its own cluster, and an iterative process attempts
to merge clusters, again through a process of computing similarity.
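The agglomerative approach can be sketched on one-dimensional values as follows. The sketch makes several assumptions that the source does not specify: similarity between clusters is the smallest gap between any two members (single linkage), merging stops once the closest pair is farther apart than a hypothetical `max_gap` parameter, and the function names are invented.

```python
def cluster_distance(c1, c2):
    """Single-linkage distance: smallest gap between any two members."""
    return min(abs(a - b) for a in c1 for b in c2)

def agglomerate(values, max_gap):
    """Repeatedly merge the closest pair of clusters while it is within max_gap."""
    clusters = [[v] for v in values]  # every item starts in its own cluster
    while len(clusters) > 1:
        # Compute the distance between every pair of current clusters.
        pairs = [
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        gap, i, j = min(pairs)
        if gap > max_gap:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

groups = agglomerate([1, 2, 10, 11, 12, 30], max_gap=2)
# groups is [[1, 2], [10, 11, 12], [30]]
```

Recomputing all pairwise distances on every pass keeps the sketch short but is slow for large data sets; practical implementations usually cache or update distances incrementally.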
• Link Analysis
Link analysis is the process of looking for and establishing links between objects
within a data set as well as characterizing the weight associated with any link
between two objects. Link analysis is useful for analytical applications that rely on
graph theory for drawing conclusions. Another analytical area for which link analysis
is useful is process optimization.
• Rule Induction
Part of the knowledge discovery process is the identification of business rules that
are embedded within data. The methods associated with rule induction are used for
this discovery process. One approach to rule discovery is the use of decision trees.
Data Mining Techniques (cont…)
Another approach to rule induction is the discovery of association rules. Association
rules specify a relation between attributes that appears more frequently than
expected if the attributes were independent.
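One common way to quantify "more frequently than expected if the attributes were independent" is the ratio of the observed co-occurrence rate to the rate independence would predict, often called lift. The sketch below assumes that measure, which the source does not name; the function and the basket data are hypothetical.

```python
def lift(transactions, a, b):
    """Observed co-occurrence rate of a and b divided by the rate expected
    under independence. Values above 1 suggest a positive association.
    Assumes both items occur at least once (otherwise this divides by zero)."""
    n = len(transactions)
    p_a = sum(a in t for t in transactions) / n
    p_b = sum(b in t for t in transactions) / n
    p_ab = sum(a in t and b in t for t in transactions) / n
    return p_ab / (p_a * p_b)

# Hypothetical market baskets (invented for illustration).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

score = lift(baskets, "bread", "butter")   # above 1: appear together often
weak = lift(baskets, "bread", "milk")      # below 1: appear together rarely
```

In this invented data, bread and butter each appear in three of five baskets and always together, so their lift is 0.6 / (0.6 × 0.6), roughly 1.67.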
• Neural Networks
A neural network is an attempt to represent the model of a human brain as a
collection of individual neurons connected within a network. A neural network
essentially captures a set of statistical operations embodied as the application of a
weighted combination function applied to all inputs to a neuron to compute a single
output value that is then propagated to other neurons within the network.
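The weighted combination performed by a single neuron, and the propagation of its output to another neuron, can be sketched as follows. The logistic activation, the `bias` parameter, and all weights and inputs are assumptions chosen for illustration; the source does not specify them.

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """Weighted combination of all inputs, squashed to a single output value."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # logistic activation (an assumption)

# Two hidden neurons receive the same inputs with different (invented) weights.
inputs = [0.5, 0.2]
hidden = [
    neuron_output(inputs, [1.0, -1.0]),
    neuron_output(inputs, [0.3, 0.8]),
]

# Their outputs are propagated as the inputs of a downstream neuron.
output = neuron_output(hidden, [1.5, -0.5])
```

Training a network amounts to adjusting the weights so that outputs match known examples; that step is omitted here.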
Management Issues
Knowledge discovery and data mining are very valuable components of the BI
program. To maintain the high value of the knowledge discovery process, keep
the following management issues in mind.
• Buy Versus Build
Determine what kinds of data mining techniques are most appropriate for the
business problems that arise within your organization, then buy the tools that
support those techniques and hire experienced engineers to work with those tools.
• Data Preparation
One issue that can destroy the effectiveness of any data mining activity is using data
that has not been properly prepared for the task at hand.
• Understanding The Results
Some of the techniques described in this chapter are better suited for understanding
results than others. Remember: To successfully draw conclusions from the results of
data mining, you should have a good understanding of the data.
Management Issues (cont…)
• Managing Business Client Expectations
Remember that data mining is an exploratory process and that sometimes what we
discover during an exploration is that there is nothing to discover. Although data mining can be
a powerful value-adding technology, it does not always provide the expected magic
bullet solution to all the problems.
• Remember The Virtuous Cycle
The data mining and knowledge discovery process is a virtuous cycle, and the
process will not have as much value if you do not identify actions to take, actually
take those actions, and then measure the results. Determining which techniques
provide the best insight into a business problem and figuring out the best ways to
exploit discovered knowledge are the critical components to data mining success.
End of Slide