Data Warehousing

Instructor
Mr H.S. Mluba
Email: mluba.h@gmail.com

Course Description
This course covers the core concepts of the data warehouse, a store of historical data. Data warehouses are created for analytical purposes (including the use of data mining and knowledge discovery tools) and for storage.

Course Objectives
A. To enable students to understand the basic concepts of data warehouse theory, design and implementation;
B. To understand the various applications of data warehouses in comparison with transactional databases;
C. To enable students to understand how to use a data warehouse for knowledge extraction.

Data Warehousing COURSE CONTENTS
• Introduction to data warehouse
• Trends in data warehousing
• Data warehouse: The building blocks
• Data pre-processing: Introduction
• Data pre-processing: Data cleaning
• Data pre-processing: Data transformation
• Data pre-processing: Dimensionality reduction
• Data warehouse environment: Structure of the data warehouse
• Data warehouse environment: Granularity
• Data warehouse environment: Structuring data in data warehouse
• Data Extraction, Transformation and Loading: Introduction
• Data Extraction, Transformation and Loading: Transformation
• Data Extraction, Transformation and Loading: Data loading
• Online Analytical Processing (OLAP)
• Multidimensional OLAP (MOLAP)
• Relational OLAP (ROLAP)
• Dimensional modeling
• Data cube and OLAP technology in multidimensional database
• Efficient processing of OLAP queries
• Quality factors of data warehouse and its evaluation
• Supporting decision making
• Data quality management in data warehouse
• Introduction to web warehousing
• OLAP and the web, building a web-enabled data warehouse
• Data warehouse deployment: Deployment activities
• Data warehouse deployment: Security issues, backup and recovery
• Introduction to data mining
• Selected concepts on using data warehouse for knowledge discovery (Mining of association rules)
• Selected concepts on using data warehouse for knowledge discovery (Classification: Introduction)
• Selected concepts on using data warehouse for knowledge discovery (Classification: Bayesian classification and rule-based classification)
• Selected concepts on using data warehouse for knowledge discovery (Classification: Lazy learner, prediction, accuracy and error measure)

Reference/Text Books:
1. Inmon W. H. Building the Data Warehouse. Wiley & Sons, 2005.
2. Ponniah P. Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. John Wiley & Sons, 2001.
3. Han J., Kamber M. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2001.
4. Kimball R., Ross M. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition). John Wiley & Sons, 2002. ISBN: 0-471-20024-7.
5. Humphries, Hawkins, Dy. Data Warehousing: Architecture and Implementation (First Edition). Prentice Hall PTR.

Prerequisite
Introduction to Database Systems

Data Warehousing
MODULE 1 - INTRODUCTION

Module 1.
Introduction to Data Warehouse
• Evolution of data warehouse
• Concept of data warehouse
• Goals of data warehouse
• Data warehouse application

Evolution of Data Warehouse
• Since the 1970s, organizations have gained competitive advantage through automation of business processes to offer more efficient and cost-effective services to customers.
• This resulted in the accumulation of growing amounts of data in operational databases.
• Organizations now focus on ways to use operational data to support decision-making, as a means of gaining competitive advantage.
• However, operational systems were never designed to support such business activities.

What is a Data Warehouse?
• “A copy of transaction data, specifically structured for query and analysis” - Ralph Kimball
• “A data warehouse is a simple, complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context” - IBM

What is a Data Warehouse?
• Defined in many different ways, but not rigorously.
  – A decision support database that is maintained separately from the organization’s operational database.
  – Supports information processing by providing a solid platform of consolidated, historical data for analysis.
• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” - W. H. Inmon

What is a Data Warehouse?
• Data warehousing: the process of constructing and using data warehouses.

Data Warehouse - Subject-Oriented
• Organized around major subjects, such as customer, product, sales.
• Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
• Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

Data Warehouse - Subject-Oriented (cont…)
• Data is categorised and stored in the DW by type rather than by application.
• Operational systems (manufacturing, accounting, order entry): operational data is organised by specific processes or tasks.
• Data warehouse (customer, vendor, product): warehoused data is organised by subject area and draws from data residing in many operational systems.

Data Warehouse - Integrated
• Constructed by integrating multiple, heterogeneous data sources
  – relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
  – Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources (e.g., hotel price: currency, tax, breakfast covered, etc.).
  – When data is moved to the warehouse, it is converted.
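
The hotel price example can be made concrete with a minimal sketch of the conversion step. Everything named here is an assumption for illustration only: the `SourcePrice` record, the exchange rates, the tax rate, the breakfast price, and the chosen warehouse convention (USD, tax included, breakfast excluded) do not come from the course material.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not real rates or prices.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}   # hypothetical exchange rates
TAX_RATE = 0.20                            # hypothetical tax rate
BREAKFAST_USD = 10.0                       # hypothetical tax-inclusive breakfast price


@dataclass
class SourcePrice:
    """A hotel price as recorded by one operational source (hypothetical layout)."""
    hotel: str
    amount: float
    currency: str
    tax_included: bool
    breakfast_included: bool


def to_warehouse_price(p: SourcePrice) -> float:
    """Convert a source price to one convention: USD, tax included, breakfast excluded."""
    usd = p.amount * RATES_TO_USD[p.currency]   # unify the currency
    if not p.tax_included:
        usd *= 1 + TAX_RATE                     # make every price tax-inclusive
    if p.breakfast_included:
        usd -= BREAKFAST_USD                    # strip the breakfast component
    return round(usd, 2)


# Two sources that record "the same kind of price" in inconsistent ways.
sources = [
    SourcePrice("Hotel A", 100.0, "USD", tax_included=True, breakfast_included=False),
    SourcePrice("Hotel B", 100.0, "EUR", tax_included=False, breakfast_included=True),
]
warehouse_rows = [(p.hotel, to_warehouse_price(p)) for p in sources]
```

Once every source is converted to the same convention at load time, prices in the warehouse can be compared directly without per-query adjustments.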
Data Warehouse - Integrated (cont…)
Example: Banking Institution
[Figure: in the operational environment, the savings, current accounts and personal loans applications each have their own database, so customer data is stored in several databases. In the data warehouse, a single database is organised around subject = customer, with no application flavour. Figure labels: built separately, built over time; integrated from start, built at same time.]

Data Warehouse - Time Variant
• The time horizon for the data warehouse is significantly longer than that of operational systems.
  – Operational database: current value data
  – Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years)

Data Warehouse - Time Variant
• Every key structure in the data warehouse
  – contains an element of time, explicitly or implicitly
  – but the key of operational data may or may not contain a “time element”

Data Warehouse - Nonvolatile
• A physically separate store of data transformed from the operational environment

Data Warehouse - Non-Volatile (cont…)
[Figure: operational applications insert, read, update and delete data in the operational databases; data is loaded into the data warehouse, which end users access read-only.]

Data Warehouse - Nonvolatile
• Operational update of data does not occur in the data warehouse environment.
  – Does not require transaction processing, recovery, and concurrency control mechanisms
  – Requires only two operations in data accessing: initial loading of data and access of data

Why a Separate Data Warehouse?
• High performance for both systems
  – DBMS: tuned for OLTP (access methods, indexing, concurrency control, recovery)
  – Warehouse: tuned for OLAP (complex OLAP queries, multidimensional view, consolidation)

Why a Separate Data Warehouse?
• Different functions and different data:
  – Missing data: decision support requires historical data, which operational DBs do not typically maintain.
  – Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources.
  – Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled.

The Traditional Approach
• Query-driven (lazy, on-demand)
[Figure: clients query an integration system (with metadata), which reaches each source through a wrapper.]

Disadvantages of Query-Driven Approach
• Delay in query processing
  – Slow or unavailable information sources
  – Complex filtering and integration
• Inefficient and potentially expensive for frequent queries
• Competes with local processing at sources
• Hasn’t caught on in industry

The Warehousing Approach
[Figure: an extractor/monitor at each source feeds an integration system (with metadata), which populates the data warehouse.]

The Warehousing Approach
• Information integrated in advance
• Stored in warehouse for direct querying and analysis

Advantages of Warehousing Approach
• High query performance
  – But not necessarily most current information
• Doesn’t interfere with local processing at sources
  – Complex queries at warehouse
  – OLTP at information sources
• Information copied at warehouse
  – Can modify, annotate, summarize, restructure, etc.
  – Can store historical information
  – Security, no auditing

Difference between Data Warehouse and Operational Database
Operational database (OLTP) | Data warehouse (OLAP)
It involves day-to-day processing. | It involves historical processing of information.
OLTP systems are used by clerks, DBAs, or database professionals. | OLAP systems are used by knowledge workers such as executives, managers, and analysts.
It is used to run the business. | It is used to analyze the business.
It focuses on data in. | It focuses on information out.
It is based on the Entity Relationship Model. | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema.
It is application oriented. | It is subject oriented.
It contains current data. | It contains historical data.
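
The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `sales` table, its columns, and the sample rows are hypothetical, and a single in-memory table stands in for both kinds of system.

```python
import sqlite3

# Hypothetical in-memory table standing in for both workloads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "order_id INTEGER PRIMARY KEY, year INTEGER, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 2022, "loan", 100.0),
        (2, 2022, "loan", 50.0),
        (3, 2022, "savings", 20.0),
        (4, 2023, "loan", 70.0),
    ],
)


def oltp_lookup(order_id):
    """OLTP-style access: fetch one detailed record by its key."""
    return conn.execute(
        "SELECT product, amount FROM sales WHERE order_id = ?", (order_id,)
    ).fetchone()


def olap_summary():
    """OLAP-style access: scan many records and summarize by year and product."""
    return conn.execute(
        "SELECT year, product, SUM(amount) FROM sales "
        "GROUP BY year, product ORDER BY year, product"
    ).fetchall()
```

Here `oltp_lookup(3)` touches a single row, while `olap_summary()` reads every row and returns consolidated totals per year and product.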
Difference between Data Warehouse and Operational Database
Operational database (OLTP) | Data warehouse (OLAP)
It provides primitive and highly detailed data. | It provides summarized and consolidated data.
It provides a detailed and flat relational view of data. | It provides a summarized and multidimensional view of data.
The number of users is in thousands. | The number of users is in hundreds.
The number of records accessed is in tens. | The number of records accessed is in millions.
The database size is from 100 MB to 100 GB. | The database size is from 100 GB to 100 TB.

Goals of Data Warehouse
• Serving as the foundation for improved decision making.
  – It must have the right data in it to support decision making.
  – Decisions are the true output of a data warehouse: they are made from the facts it provides.

When is Data Warehouse Not Appropriate
• When the operational systems are not ready.
  – The data warehouse is populated with information primarily from the operational systems of the enterprise. A good indicator of operational system readiness is the amount of IT effort focused on operational systems.
  – A number of telltale signs indicate a lack of readiness:
    • Many new operational systems are planned for development or are in the process of being deployed.
    • Many of the operational systems are legacy applications that require much firefighting.
    • Many of the operational systems require major enhancements and must be overhauled.

When is Data Warehouse Not Appropriate
• When the need is operational integration.
  – Despite its ability to provide integrated data for decisional information needs, a data warehouse does not in any way contribute to meeting the operational information needs of the enterprise. Data warehouses do not integrate data quickly enough or often enough for operational management purposes.
  – If the enterprise needs operational integration, then the typical data warehouse deployment is insufficient.

Data Warehouse Application
• The successful implementation of data warehousing technologies creates new possibilities for enterprises. Applications that previously were not feasible due to the lack of integrated data are now possible.
• Different types of enterprises implement data warehouses and deploy different types of applications on them. Warehousing applications can be grouped into the following types and tasks.

Types of Warehousing Application
• Sales and marketing
  – Performance trend analysis: Since a data warehouse is designed to store historical data, it is an ideal technology for analyzing performance trends within an organization.
  – Cross-selling: By obtaining a clearer picture of customers and the services they use, the enterprise can identify opportunities for cross-selling additional products and services to existing customers.
  – Customer profiling and target marketing: Internal enterprise data can be integrated with census and demographic data to analyze and derive customer profiles.
  – Promotions and product bundling: The data warehouse allows enterprises to analyze their customers' purchasing histories as an input to promotions and product bundling.

Types of Warehousing Application
• Financial analysis and management
  – Risk analysis and management: Integrated warehouse data allow enterprises to analyze their risk exposure.
  – Profitability analysis: If operating costs and revenues are tracked or allocated at a sufficiently detailed level in operational systems, a data warehouse can be used for profitability analysis.
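
As a rough illustration of profitability analysis, the sketch below rolls detailed revenue and cost records up to one profit figure per product. The record layout `(product, revenue, allocated_cost)`, the sample values, and the `profitability` function are assumptions made for this example, not part of the course material.

```python
from collections import defaultdict

# Hypothetical detail records loaded from operational systems:
# (product, revenue, allocated_cost)
records = [
    ("loan", 100.0, 60.0),
    ("loan", 50.0, 30.0),
    ("savings", 40.0, 50.0),
]


def profitability(rows):
    """Aggregate revenue and cost per product, then derive profit and margin."""
    totals = defaultdict(lambda: [0.0, 0.0])   # product -> [revenue, cost]
    for product, revenue, cost in rows:
        totals[product][0] += revenue
        totals[product][1] += cost
    report = {}
    for product, (revenue, cost) in totals.items():
        profit = revenue - cost
        margin = profit / revenue if revenue else 0.0   # avoid division by zero
        report[product] = {"profit": profit, "margin": margin}
    return report


report = profitability(records)
```

The analysis only works because each record carries both revenue and an allocated cost at the same level of detail, which is the condition the slide states.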

Types of Warehousing Application
• Customer care and services
  – Customer relationship management: Warehouse data can also be used as the basis for managing the enterprise's relationships with its many customers. Customers will be far from pleased if different groups in the same enterprise ask them for the same information more than once. Customers appreciate enterprises that never forget special instructions, preferences, or requests.