Course : M0264/Database Management
Year : 2008

Database Management
Session 8

Objectives
• Introduction to Data Warehousing
• Introduction to OLAP
• Introduction to Data Mining

Bina Nusantara

Introduction to Data Warehousing
• To begin a data warehouse project, we need to find answers to questions such as:
– Which user requirements are most important, and which data should be considered first?
– Should the project be scaled down into something more manageable?
– Should the infrastructure for a scaled-down project be capable of ultimately delivering a full-scale, enterprise-wide data warehouse?
• For many enterprises, the way to avoid the complexities associated with designing a data warehouse is to start by building one or more data marts.
• Data marts allow designers to build something that is far simpler and more achievable for a specific group of users.
• Few designers are willing to commit to an enterprise-wide design that must meet all user requirements at one time.
• Despite the interim solution of building data marts, the goal remains the same: the ultimate creation of a data warehouse that supports the requirements of the enterprise.
• The requirements collection and analysis stage of a data warehouse project involves interviewing appropriate members of staff (such as marketing, finance, and sales users) to identify a prioritized set of requirements that the data warehouse must meet.
• At the same time, interviews are conducted with the members of staff responsible for operational systems to identify which data sources can provide clean, valid, and consistent data that will remain supported over the next few years.
• Interviews provide the necessary information for the top-down view (user requirements) and the bottom-up view (which data sources are available) of the data warehouse.
• The database component of a data warehouse is described using a technique called dimensionality modeling.

Database Design Methodology for Data Warehouses
• The 'Nine-Step Methodology' includes the following steps:
– Choosing the process
– Choosing the grain
– Identifying and conforming the dimensions
– Choosing the facts
– Storing pre-calculations in the fact table
– Rounding out the dimension tables
– Choosing the duration of the database
– Tracking slowly changing dimensions
– Deciding the query priorities and the query modes

Introduction to OLAP
• The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data (Codd, 1993).
• Describes a technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for the purposes of advanced analysis.
• Enables users to gain a deeper understanding and knowledge of various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data.
• Allows users to view corporate data in a way that better models the true dimensionality of the enterprise.
• Can easily answer 'who?' and 'what?' questions; however, it is the ability to answer 'what if?' and 'why?' questions that distinguishes OLAP from general-purpose query tools.
• Types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.
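
The slicing and dicing mentioned above can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration and does not come from the slides: the fact rows, the three dimensions (region, product, quarter), the sales measure, and the function names `slice_cube`, `dice_cube`, and `roll_up`. A real OLAP server would answer such queries over pre-aggregated multi-dimensional storage rather than scanning a list.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales_amount).
# The first three fields are dimensions; the last is the measure.
FACTS = [
    ("East", "Laptop", "Q1", 120.0),
    ("East", "Phone", "Q1", 80.0),
    ("West", "Laptop", "Q1", 100.0),
    ("West", "Phone", "Q2", 60.0),
    ("East", "Laptop", "Q2", 150.0),
]
DIMENSIONS = ("region", "product", "quarter")


def slice_cube(facts, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


def dice_cube(facts, **allowed):
    """Dice: keep rows whose value on each named dimension is in the allowed set."""
    def keep(row):
        return all(row[DIMENSIONS.index(d)] in vals for d, vals in allowed.items())
    return [row for row in facts if keep(row)]


def roll_up(facts, *dims):
    """Roll up: total the sales measure, grouped by the chosen dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[DIMENSIONS.index(d)] for d in dims)
        totals[key] += row[-1]
    return dict(totals)


if __name__ == "__main__":
    q1 = slice_cube(FACTS, "quarter", "Q1")
    print(roll_up(q1, "region"))      # Q1 sales per region
    print(roll_up(FACTS, "product"))  # total sales per product
```

Slicing fixes a single dimension value, dicing restricts several dimensions at once, and rolling up aggregates the measure to a coarser level; combining them gives the basic navigation that the more complex analyses build on.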

Introduction to Data Mining
• The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions (Simoudis, 1996).
• Involves the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.
• Reveals information that is hidden and unexpected, as there is little value in finding patterns and relationships that are already intuitive.
• Patterns and relationships are identified by examining the underlying rules and features in the data.
• Tends to work from the data up, and the most accurate results normally require large volumes of data to deliver reliable conclusions.
• Starts by developing an optimal representation of the structure of sample data, during which time knowledge is acquired; this knowledge is then extended to larger sets of data.
• Data mining can provide huge paybacks for companies that have made a significant investment in data warehousing.
• A relatively new technology, but already used in a number of industries.

Data Mining Operations
• Four main operations include:
– Predictive modeling
– Database segmentation
– Link analysis
– Deviation detection
• There are recognized associations between the applications and the corresponding operations.
– e.g. Direct marketing strategies use database segmentation.
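
Of the four operations listed above, deviation detection is the simplest to sketch. The example below is one possible approach and is not taken from the slides: it flags values whose z-score (distance from the mean in units of standard deviation) exceeds a threshold. The function name `find_deviations`, the threshold of 2.0, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def find_deviations(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


if __name__ == "__main__":
    # Hypothetical daily transaction totals with one unusual entry.
    daily_totals = [100, 102, 98, 101, 99, 100, 500]
    print(find_deviations(daily_totals))
```

With so few values, a single extreme entry inflates the standard deviation and limits how large any z-score can get, which is one reason reliable results normally need large volumes of data.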