DATA MINING

1. Abstract

This report is an introduction to what has come to be known as data mining and knowledge discovery in databases. The material is presented from a database perspective, with emphasis on basic data mining concepts and techniques for uncovering interesting data patterns hidden in large data sets. In this paper, I will show how data mining is part of the natural evolution of database technology, why data mining is important, and how it is defined. I will also introduce the general architecture of a data mining system and give insight into the kinds of data on which mining can be performed, the types of patterns that can be found, and how to tell which patterns represent useful knowledge.

2. Table of Contents

Introduction
Objectives and Motivation
Outline
a. What is Data Mining?
b. What are the steps to Data Mining?
c. Why Data Mining is important?
d. Architecture of a typical Data Mining System
e. Data Mining task and functionalities
f. Classification of Data Mining System
Conclusion

3. Introduction

Since the 1960s, database and information technology has been evolving systematically from primitive file processing systems to sophisticated and powerful database systems. Research and development in database systems since the 1970s has progressed from hierarchical and network database systems to relational database systems, data modeling tools, and indexing and data organization techniques. Since the mid-1980s, database technology has been characterized by the adoption of relational technology and an upsurge of research and development on new and powerful database systems.
These employ advanced data models such as extended-relational, object-oriented, object-relational and deductive models. Data can be stored in many different types of databases. One database architecture is the data warehouse, which includes the analytical technique OLAP (On-Line Analytical Processing) to view information from different angles. Although OLAP supports analysis and decision making, additional tools are required for in-depth analysis, such as data classification, clustering and the characterization of data changes over time. Data mining tools perform data analysis and uncover important data patterns, contributing to business strategies, knowledge bases, and scientific and medical research.

4. Objectives and Motivation

The abundance of data, coupled with the need for powerful data analysis tools, has been described as a data rich but information poor situation. The fast-growing, tremendous amount of data, collected and stored in large and numerous databases, has far exceeded our human ability to comprehend it without powerful tools. As a result, data collected in large databases become “data tombs”: data archives that are seldom visited. Consequently, important decisions are often made based not on the information-rich data stored in databases but rather on a decision maker’s intuition, simply because the decision maker doesn’t have the tools to extract the valuable knowledge embedded in the vast amount of data. The widening gap between data and information calls for a systematic development of data mining tools that will turn data tombs into “golden nuggets” of knowledge.

5. Outline

a. What is Data Mining?

Simply stated, data mining refers to extracting or mining knowledge from large amounts of data. In a broad view, data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses or other information repositories.
There are many other terms carrying a similar meaning to data mining, such as knowledge mining from databases, knowledge extraction, pattern analysis, data archeology and data dredging. Many people also treat data mining as a synonym for knowledge discovery in databases, or KDD.

b. What are the steps to Data Mining?

The following figure shows the steps involved in the process of data mining.

Data cleaning: - During this step noisy and inconsistent data are removed.
Data integration: - During this step multiple data sources are combined.
Data selection: - During this step data relevant to the analysis task are retrieved from the database.
Data transformation: - During this step data are transformed into forms appropriate for mining by performing summary operations.
Data mining: - This is the essential step, where intelligent methods are applied in order to extract data patterns.
Pattern evaluation: - This step is required to identify the truly interesting patterns representing knowledge, based on some interestingness measures.
Knowledge presentation: - During this step visualization and knowledge representation techniques are used to present the mined knowledge to the user.

c. Why Data Mining is important?

Data has always been around, but for a long time it existed only on paper and, in many cases, in the minds of people, so organizing data was a problem. Then, with computers and databases, we started storing data in computerized files and databases. Now we have large quantities of computerized data. The data could be in files, relational databases, multimedia databases and even on the World Wide Web. Businesses are realizing that the data they have been collecting for the past 15-20 years can give them an immense competitive edge.
Due to the client-server paradigm, data warehousing technology and the immense desktop computing power currently available, it has become very easy for an end user to look at stored data from all sorts of perspectives and extract valuable business intelligence. Data mining is being used to perform market segmentation to launch new products and services, as well as to match existing products and services to customers’ needs. In the banking, health care and insurance industries, data mining is being used to detect fraudulent behavior by tracking spending and claims patterns. Data mining has become a reality, and it is currently a favorite tool of decision makers because it can provide valuable hidden business intelligence from historical data.

d. Architecture of a typical Data Mining System

The architecture of a typical data mining system has the following major components.

Database, data warehouse, or other information repository: - This is one or a set of databases, data warehouses, spreadsheets or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.
Database or data warehouse server: - The database or data warehouse server is responsible for fetching the relevant data, based on the user’s data mining request.
Knowledge base: - This is the domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, which organize attribute values into different levels of abstraction.
Data mining engine: - This is essential to the data mining system and consists of a set of functional modules for tasks such as characterization, association, classification, cluster analysis, and evolution and deviation analysis.
Pattern evaluation module: - This module typically employs interestingness measures and interacts with the data mining modules to focus the search towards interesting patterns.
Graphical user interface: - This module communicates between the user and the data mining system, allowing the user to interact with the system by specifying a data mining query or task.

[Figure: architecture of a typical data mining system. Components shown: graphical user interface, pattern evaluation, data mining engine, knowledge base, database or data warehouse server, data cleaning, data integration, filtering, database, data warehouse.]

e. Data Mining task and functionalities

Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks. Data mining tasks can be classified into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in the database. Predictive mining tasks perform inference on the current data in order to make predictions. Data mining functionalities and the kinds of patterns they discover are described below.

Characterization and Discrimination: - Data characterization is a summary of the general characteristics or features of a target class of data. The data corresponding to the user-specified class are typically collected by a database query. Data discrimination is a comparison of the general features of target class objects with the general features of objects from one or a set of contrasting classes.
Association Analysis: - Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. This kind of analysis is widely used for market basket or transaction data analysis.
Classification and Prediction: - Classification is the process of finding a set of models (or functions) that describe and distinguish data classes, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. Prediction refers to predicting missing or unavailable numerical data values.
Cluster Analysis: - Cluster analysis refers to analyzing data objects without consulting a known class label.
Outlier Analysis: - Outlier analysis refers to the analysis of data objects in databases that do not comply with the general behavior of the data. The analysis of outlier data is referred to as outlier mining and is useful in fraud detection.
Evolution Analysis: - Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time. This includes time-series data analysis, sequence or periodicity pattern matching and similarity-based data analysis.

f. Classification of Data Mining System

Data mining is an interdisciplinary field, the confluence of a set of disciplines as shown in the figure below.

[Figure: data mining as a confluence of disciplines: database technology, information science, visualization, statistics, machine learning and other disciplines.]

Data mining systems can be categorized according to various criteria, as follows.

Classification according to kinds of database mined: - A data mining system can be classified according to the kinds of database mined. For instance, we may have a relational, transactional, object-oriented, object-relational, spatial, time-series, multimedia or data warehouse mining system.
Classification according to the kinds of knowledge mined: - A data mining system can be classified according to the kinds of knowledge it mines, that is, based on data mining functionalities such as characterization, discrimination, association, classification, clustering, outlier analysis and evolution analysis.
Classification according to the kinds of techniques used: - A data mining system can be classified according to the data mining techniques used, for example database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks and so on.
Classification according to the applications adapted: - A data mining system can be classified according to the applications it is adapted to.
For example, there could be data mining systems specifically for finance, telecommunications, DNA, stock markets, e-mail and so on.

6. Conclusion

Data mining is used to extract knowledge from large amounts of historical data. Data mining has become a reality and is used to make critical business decisions. There are many advantages to using this method. In this report, I have only introduced data mining. There is a lot more detail to data mining that I will cover in my final report. Also, over the past decade numerous prototype data mining tools have emerged from universities and research laboratories all over the world, and some of these tools are now commercially available.
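
As a concrete illustration of the association analysis described under "Data Mining task and functionalities", here is a minimal Python sketch that computes two common interestingness measures for association rules, support and confidence, on a tiny set of market-basket transactions. The transactions, the function names (support, confidence, pair_rules) and the thresholds are all assumptions made up for this example; they do not come from any particular data mining system. The sketch only considers rules between pairs of single items and checks every pair by brute force, which would not scale to a real database.

```python
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by
    the support of the antecedent alone."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for every single-item rule
    lhs -> rhs that meets both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue  # the pair is not frequent enough to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


if __name__ == "__main__":
    # The thresholds 0.5 and 0.6 are arbitrary choices for this sketch.
    for lhs, rhs, s, c in pair_rules(TRANSACTIONS, 0.5, 0.6):
        print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

The two thresholds play the role of the interestingness measures mentioned under pattern evaluation: raising them leaves fewer but stronger rules, while lowering them admits rules that hold in only a few transactions.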