data mining - Faculty Web Hosting

This report is an introduction to what has come to be known as data mining and knowledge discovery in databases.
Its contents are presented from a database perspective, with emphasis on basic data mining concepts and
techniques for uncovering interesting data patterns hidden in large data sets. In this report, I will show how
data mining is part of the natural evolution of database technology, why data mining is important, and how it is
defined. I will also introduce the general architecture of a data mining system and give insight into the kinds
of data on which mining can be performed, the types of patterns that can be found, and how to tell which
patterns represent useful knowledge.
2. Table of Contents
Objectives and Motivation
a. What is Data Mining?
b. What are the steps to Data Mining?
c. Why is Data Mining important?
d. Architecture of a typical Data Mining System
e. Data Mining tasks and functionalities
f. Classification of Data Mining Systems
Since the 1960s, database and information technology has been evolving systematically from primitive file
processing systems to sophisticated and powerful database systems. Research and development in database
systems since the 1970s has progressed from hierarchical and network database systems to relational database
systems, data modeling tools, and indexing and data organization techniques. Since the mid-1980s, database
technology has been characterized by the adoption of relational technology and an upsurge of research and
development on new and powerful database systems. These employ advanced data models such as
extended-relational, object-oriented, object-relational, and deductive models. Data can be stored in many
different types of databases. One database architecture is the data warehouse, which includes the analytical
technique OLAP (On-Line Analytical Processing) for viewing information from different angles. Although OLAP
supports analysis and decision making, additional tools are required for in-depth analysis, such as data
classification, clustering, and the characterization of data changes over time. Data mining tools perform data
analysis and uncover important data patterns, contributing to business strategies, knowledge bases, and
scientific and medical research.
Objectives and Motivation
The abundance of data, coupled with the need for powerful data analysis tools, has been described as a
data-rich but information-poor situation. The fast-growing, tremendous amount of data, collected and stored in
large and numerous databases, has far exceeded our human ability to comprehend it without powerful tools. As a
result, data collected in large databases become “data tombs”, data archives that are seldom visited.
Consequently, important decisions are often based not on the information-rich data stored in databases but
rather on a decision maker’s intuition, simply because the decision maker does not have the tools to extract the
valuable knowledge embedded in the vast amounts of data. The widening gap between data and information calls
for a systematic development of data mining tools that will turn data tombs into “golden nuggets” of knowledge.
What is Data Mining?
Simply stated, data mining refers to extracting or mining knowledge from large amounts of data. In a broad view,
we can say that data mining is the process of discovering interesting knowledge from large amounts of data
stored in databases, data warehouses, or other information repositories.
Many other terms carry a similar meaning to data mining, such as knowledge mining from databases,
knowledge extraction, pattern analysis, data archaeology, and data dredging. Many people also treat data mining
as a synonym for knowledge discovery in databases, or KDD.
What are the steps to Data Mining?
The following figure shows the steps involved in the process of data mining.
Data cleaning: - During this step noisy and inconsistent data are removed.
Data integration: - During this step multiple data sources are combined.
Data selection: - During this step data relevant to the analysis task are retrieved from the database.
Data transformation: - During this step data are transformed into forms appropriate for mining by
performing summary operations.
Data mining: - This is the essential step, in which intelligent methods are applied in order to extract
data patterns.
Pattern evaluation: - This step identifies the truly interesting patterns representing knowledge,
based on some interestingness measures.
Knowledge presentation: - During this step visualization and knowledge representation techniques are
used to present the mined knowledge to the user.
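The steps above can be sketched as a small pipeline. This is only an illustrative sketch, not a real data mining system: the two store data sets, the function names, and the 0.5 support threshold are all assumptions made for the example, and a simple normalization stands in for the summary operations of the transformation step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources; None marks a missing (noisy) value.
store_a = [
    {"customer": "c1", "items": ["bread", "milk"], "amount": 12.0},
    {"customer": "c2", "items": ["bread", "butter"], "amount": None},
]
store_b = [
    {"customer": "c3", "items": ["milk", "bread", "butter"], "amount": 20.5},
    {"customer": "c4", "items": ["milk", "bread"], "amount": 9.0},
]

def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one list of records."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the field relevant to the analysis task."""
    return [r["items"] for r in records]

def transform(baskets):
    """Data transformation: normalize each basket to a sorted tuple of unique items."""
    return [tuple(sorted(set(b))) for b in baskets]

def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    supports = {pair: c / n_baskets for pair, c in pair_counts.items()}
    return {pair: s for pair, s in supports.items() if s >= min_support}

def present(patterns):
    """Knowledge presentation: print each pattern in a readable form."""
    for pair, s in sorted(patterns.items()):
        print(f"{pair[0]} and {pair[1]} are bought together in {s:.0%} of baskets")

def run_pipeline(*sources):
    """Run the steps in the order listed in the text."""
    records = integrate(*(clean(s) for s in sources))
    baskets = transform(select(records))
    return evaluate(mine(baskets), len(baskets))

patterns = run_pipeline(store_a, store_b)
present(patterns)
```

Each function corresponds to one step, so a step can be replaced (for example, a different interestingness measure in `evaluate`) without touching the others.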
c. Why is Data Mining important?
Data has always been around, but for a long time it existed only on paper and, in many cases, in people’s
minds, so organizing it remained a problem. With computers and databases, we started storing data in
computerized files and databases, and we now have large quantities of computerized data. The data may reside in
files, relational databases, multimedia databases, and even on the World Wide Web. Businesses are realizing
that the data they have been collecting for the past 15-20 years can give them an immense competitive
edge. Thanks to the client-server paradigm, data warehousing technology, and the immense desktop computing
power now available, it has become very easy for an end user to look at stored data from all sorts of
perspectives and extract valuable business intelligence.
Data mining is being used to perform market segmentation in order to launch new products and services, as well
as to match existing products and services to customers’ needs. In the banking, health care, and insurance
industries, data mining is being used to detect fraudulent behavior by tracking spending and claims patterns.
Data mining has become a reality and is currently a favorite tool of decision makers because it can uncover
valuable hidden business intelligence in historical data.
Architecture of a typical Data Mining System
The architecture of a typical data mining system has the following major components.
• Database, data warehouse, or other information repository: - This is one or a set of databases, data
warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration
techniques may be performed on the data.
• Database or data warehouse server: - The database or data warehouse server is responsible for fetching
the relevant data, based on the user’s data mining request.
• Knowledge base: - This is the domain knowledge that is used to guide the search or evaluate the
interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize
attribute values into different levels of abstraction.
• Data mining engine: - This is essential to the data mining system and consists of a set of functional
modules for tasks such as characterization, association, classification, cluster analysis, and evolution
and deviation analysis.
• Pattern evaluation module: - This module typically employs interestingness measures and interacts
with the data mining modules to focus the search towards interesting patterns.
• Graphical user interface: - This module communicates between the user and the data mining system, allowing
the user to interact with the system by specifying a data mining query or task.
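To make the role of the knowledge base more concrete, the following sketch shows how a concept hierarchy can organize attribute values into levels of abstraction. The hierarchy contents, the `generalize` and `summarize` helpers, and the sample city list are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy stored in the knowledge base:
# each value maps to its parent at the next level of abstraction
# (city -> province -> country).
CONCEPT_HIERARCHY = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "British Columbia": "Canada",
    "Ontario": "Canada",
}

def generalize(value, levels=1):
    """Climb up to `levels` steps in the hierarchy; values at the top stay unchanged."""
    for _ in range(levels):
        value = CONCEPT_HIERARCHY.get(value, value)
    return value

def summarize(values, levels):
    """Count attribute values after generalizing them by `levels` steps."""
    return Counter(generalize(v, levels) for v in values)

cities = ["Vancouver", "Victoria", "Toronto", "Vancouver"]
by_province = summarize(cities, levels=1)
by_country = summarize(cities, levels=2)
print(by_province)
print(by_country)
```

The same data can then be examined at the city, province, or country level, which is how a concept hierarchy lets the mining engine search for patterns at different levels of abstraction.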
Data Mining tasks and functionalities
Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks. We can
classify data mining tasks into two categories: descriptive and predictive. Descriptive mining tasks characterize
the general properties of the data in the database. Predictive mining tasks perform inference on the current
data in order to make predictions. Data mining functionalities and the kinds of patterns they discover are
described below.
• Characterization and Discrimination: - Data characterization is a summary of the general
characteristics or features of a target class of data. The data corresponding to the user-specified class are
typically collected by a database query. Data discrimination is a comparison of the general features of
target class objects with the general features of objects from one or a set of contrasting classes.
• Association Analysis: - Association analysis is the discovery of association rules showing
attribute-value conditions that occur frequently together in a given set of data. This kind of analysis is
widely used for market basket or transaction data analysis.
• Classification and Prediction: - Classification is the process of finding a set of models (or functions)
that describe and distinguish data classes, for the purpose of being able to use the model to predict the class
of objects whose class label is unknown. Prediction refers to estimating missing or unavailable numerical
data values.
• Cluster Analysis: - Cluster analysis refers to analyzing the data objects without consulting a known
class label. Objects are grouped so that those within a cluster are similar to one another and dissimilar to
objects in other clusters.
• Outlier Analysis: - Outlier analysis refers to the analysis of data objects in databases that do not comply
with the general behavior of the data. The analysis of outlier data is referred to as outlier mining and is
useful in fraud detection.
• Evolution Analysis: - Data evolution analysis describes and models regularities or trends for objects
whose behavior changes over time. This includes time-series data analysis, sequence or periodicity
pattern matching, and similarity-based data analysis.
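As an example of one of these functionalities, association analysis is commonly quantified with two measures, support and confidence. The sketch below computes both for a single rule; the transaction data and the function names are assumptions made for illustration, and a real system would search over many candidate rules rather than evaluate one by hand.

```python
# Hypothetical market basket data: each transaction is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies, so it carries no confidence
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base

# Rule under test: bread => milk
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"bread => milk  support={rule_support:.2f}  confidence={rule_confidence:.2f}")
```

Here bread and milk appear together in 2 of the 4 transactions (support 0.50), and 2 of the 3 transactions containing bread also contain milk (confidence about 0.67). A rule is usually reported only if both values exceed user-specified thresholds.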
Classification of Data Mining Systems
Data mining is an interdisciplinary field, the confluence of a set of disciplines, as shown in the figure below.
[Figure: Data Mining as a confluence of multiple disciplines]
Data mining systems can be categorized according to various criteria, as follows.
• Classification according to the kinds of databases mined: - A data mining system can be classified
according to the kinds of databases mined. For instance, we may have a relational, transactional,
object-oriented, object-relational, spatial, time-series, multimedia, or data warehouse
mining system.
• Classification according to the kinds of knowledge mined: - A data mining system can be classified
according to the kinds of knowledge it mines, that is, based on data mining functionalities such as
characterization, discrimination, association, classification, clustering, outlier analysis, and evolution
analysis.
• Classification according to the kinds of techniques used: - A data mining system can be classified
according to the data mining techniques used, for example, database-oriented or data warehouse-oriented
techniques, machine learning, statistics, visualization, pattern recognition, neural networks,
and so on.
• Classification according to the applications adapted: - A data mining system can be classified
according to the applications it is adapted to. For example, there could be data mining systems specifically
for finance, telecommunications, DNA, stock markets, e-mail, and so on.
Data mining is used to extract knowledge from large amounts of historical data. It has become a reality and is
used to make critical business decisions. There are many advantages to using this method. In this report, I
have only introduced data mining. There is still much more detail to data mining that I will cover in my final
report. Also, over the past decade, numerous prototype data mining tools have emerged from universities and
research laboratories all over the world, and some of these tools are now commercially available.