09. Data Warehouse (DW) & On-line Analytic Processing (OLAP)
Rev: Feb, 2013
Euiho (David) Suh, Ph.D.
POSTECH Strategic Management of Information and Technology Laboratory (POSMIT: http://posmit.postech.ac.kr)
Dept. of Industrial & Management Engineering, POSTECH

Contents
1. Data Warehouse
   1) Introduction of Data Warehouse
   2) Concepts for Data Warehouse
   3) Difficulties and Trends
2. On-line Analytic Processing (OLAP)
   1) Introduction of OLAP
   2) Concepts for OLAP
3. Case Study

1. Data Warehouse
1) Introduction of Data Warehouse

Definition of Data Warehouse
■ Data Warehouse
– A data warehouse is an integrated, non-volatile, time-variant collection of data in support of management’s decisions
– Stores static data that has been extracted from other databases in an organization
– Central source of data that has been cleaned, transformed, and cataloged
– Data is used for data mining, analytical processing, analysis, research, decision support
[Figure: scattered information is cleaned, stored in the data warehouse, then queried and distributed to end users; labels: Customer, Cost, Bond, Sales, HR, Finance]

Data Warehouse Architecture
■ Data Warehouse architecture
[Figure: Data Warehouse architecture. Source data (external file, OLTP system, RDB, backup file) feeds the Data Warehouse (enterprise server) and Data Marts (workgroup server) via SQL; these serve query/reporting tools, OLAP tools (slice/dice, MDB), EIS/DSS applications, data mining applications, and web browsers. Building the Data Warehouse covers infrastructure, data integration, and administration; use of the Data Warehouse covers application development, data access, and use.]
■ Technical architecture for a data warehousing system
– Components: Data Acquisition, Data Manager, Information Directory, Design, Data Delivery, Middleware, Management, Data Access
– Data: source data, external data, warehouse data, warehouse metadata, external metadata
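The definition above describes a warehouse as static data extracted from other databases, then cleaned, transformed, and loaded into a central store. A minimal sketch of that flow follows, assuming two hypothetical source systems held in in-memory SQLite databases. The names `make_source`, `fact_amounts`, and all columns are illustrative assumptions, not part of the slides.

```python
import sqlite3
from datetime import date


def make_source(rows):
    """Create a hypothetical operational (source) database with raw rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE records (customer TEXT, amount TEXT)")
    db.executemany("INSERT INTO records VALUES (?, ?)", rows)
    return db


def extract(db):
    """Extract: read every raw row from a source system."""
    return db.execute("SELECT customer, amount FROM records").fetchall()


def clean(rows):
    """Clean/transform: normalize names and drop rows with invalid amounts."""
    cleaned = []
    for customer, amount in rows:
        try:
            value = float(amount)
        except (TypeError, ValueError):
            continue  # skip rows whose amount cannot be parsed
        cleaned.append((customer.strip().title(), value))
    return cleaned


def load(warehouse, source_name, rows, load_date):
    """Load: append only (non-volatile) and stamp each row (time variant)."""
    warehouse.executemany(
        "INSERT INTO fact_amounts (source, customer, amount, load_date) "
        "VALUES (?, ?, ?, ?)",
        [(source_name, c, a, load_date) for c, a in rows],
    )


# Hypothetical scattered sources; "n/a" is a deliberately dirty value.
sales_db = make_source([(" alice ", "100.0"), ("BOB", "n/a")])
hr_db = make_source([("carol", "50")])

# The integrated warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_amounts "
    "(source TEXT, customer TEXT, amount REAL, load_date TEXT)"
)

load_date = date(2013, 2, 1).isoformat()
for name, db in [("sales", sales_db), ("hr", hr_db)]:
    load(warehouse, name, clean(extract(db)), load_date)
warehouse.commit()

loaded = warehouse.execute(
    "SELECT source, customer, amount FROM fact_amounts ORDER BY source"
).fetchall()
print(loaded)
```

The sketch only appends rows and stamps each with a load date, which is one simple way to reflect the non-volatile and time-variant properties. A production pipeline would add validation, lineage tracking, and scheduling.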
2) Concepts for Data Warehouse

Introduction of Database
■ Definition of database
– Integrated collection of logically related data elements
■ Common Database Structures (Types)
– Hierarchical
• Early DBMS structure
• Records arranged in tree-like structure
• Relationships are one-to-many
– Network
• Used in some mainframe DBMS packages
• Many-to-many relationships
– Relational
• Most widely used structure
• Data elements are stored in tables
• Row represents a record; column is a field
• Can relate data in one file with data in another, if both files share a common data element
– Multidimensional
• Variation of relational model
• Uses multidimensional structures to organize data
• Data elements are viewed as being in cubes
• Popular for analytical databases that support Online Analytical Processing (OLAP)
– Object-Oriented
• Stores data together with the appropriate methods for accessing it, i.e. encapsulation
• Information is represented in the form of objects as used in object-oriented programming
[Figures: Relational Structure; Object-Oriented Structure]

Metadata and Data Marts
■ Metadata
– Data about data (similar to a catalog card in a library)
– Defines the data in the data warehouse
– Makes it easier and faster to find data in the data warehouse
■ Data Marts
– Collection of databases
– Compared with a Data Warehouse, data marts are usually smaller and focus on a particular subject or department
– Data marts are subsets of a larger Data Warehouse
■ Data Warehouse vs. Data Mart
– Data in Data Warehouse
• The data needs to be gathered from all the relevant transactional systems that produce it, cleansed and validated, and made available from a system-of-record that ensures the referential integrity of the data
– Data in Data Mart
• The data needs to be presented in a structure that is intuitive to the users and facilitates their ability to query the data that is relevant to their needs
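The distinction above (a warehouse as the validated system of record, a data mart as a smaller subject-focused subset with an intuitive structure) can be sketched with SQLite. This is an illustration under stated assumptions: the table `dw_transactions`, the view `sales_mart`, the `metadata` dictionary, and the `describe` helper are hypothetical names. Modeling the mart as a view is one simple choice; real data marts are often separate physical databases.

```python
import sqlite3

# Hypothetical warehouse table using terse, system-oriented column names.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE dw_transactions "
    "(txn_id INTEGER PRIMARY KEY, dept TEXT, cust_cd TEXT, amt REAL)"
)
warehouse.executemany(
    "INSERT INTO dw_transactions (dept, cust_cd, amt) VALUES (?, ?, ?)",
    [
        ("sales", "C001", 120.0),
        ("sales", "C002", 80.0),
        ("finance", "C001", 300.0),
    ],
)

# Data mart: a subset for one department, with user-friendly column names.
warehouse.execute(
    """
    CREATE VIEW sales_mart AS
    SELECT cust_cd AS customer, SUM(amt) AS total_sales
    FROM dw_transactions
    WHERE dept = 'sales'
    GROUP BY cust_cd
    """
)

# Metadata: "data about data" that tells users what each mart column means.
metadata = {
    "sales_mart.customer": "Customer code, from dw_transactions.cust_cd",
    "sales_mart.total_sales": "Sum of dw_transactions.amt where dept = 'sales'",
}


def describe(column):
    """Look up the catalog entry for a column (hypothetical helper)."""
    return metadata.get(column, "no catalog entry")


mart_rows = warehouse.execute(
    "SELECT customer, total_sales FROM sales_mart ORDER BY customer"
).fetchall()
print(mart_rows)
print(describe("sales_mart.total_sales"))
```

The finance row stays in the warehouse but is not visible through the sales mart, which mirrors the idea that a mart exposes only the data relevant to its users.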
Information Flow
■ Data Warehouse built on top of DB
[Figure: information flow from the DB to the Data Warehouse; labels: Data Marts, Finance, Management Reporting, Accounting, Sales, Marketing]

Data Warehouse Components
[Figure: Data Warehouse Components]

Applications and Data Marts
[Figure: Applications and Data Marts]

3) Difficulties and Trends

Difficulties in implementing DW
■ Complete Alignment
– Make sure you have full involvement and buy-in from those who represent your users, the consumers of your data warehouse.
■ Iterative & Frequent Update
– Consider all aspects of the process: researching your data sources, capturing and transmitting that data to the data warehouse, transforming and loading it into the data warehouse, and accounting for its lineage.
■ Risk
– Make sure you develop a proper risk management plan.

Future Trends
■ Enterprise Data Warehouse
– "The enterprise data warehouse, whether a single store or integrated data marts across a variety of platforms, yields a view of the operation previously unattainable" (Don Hatcher, SAS)
■ Real-time
– Organizations move to more real-time data transformation and seek to better leverage common metadata across applications (Allan Houpt, CA)
■ Capacity
– "The future of data warehousing is all about ever larger data warehouses - in fact I just read about a U.S. Government effort to create petabyte repositories" (Roman Bukary, SAP Director of Market Strategy)

2. OLAP
1) Introduction of OLAP

Definition of OLAP
■ OLAP (On-Line Analytical Processing)
– The dynamic enterprise analysis required to create, manipulate, animate and synthesize information from Enterprise Data Models
  * Providing OLAP: An IT Mandate, E.F. Codd (1993)
– FASMI (Fast Analysis of Shared Multidimensional Information)
• This definition was first used in early 1995, and has not needed revision since
  * Pendse & Creeth (1995)
• The five keywords: FAST, ANALYSIS, SHARED, MULTIDIMENSIONAL, INFORMATION

OLAP Architecture
[Figure: OLAP Architecture]

2) Concepts for OLAP

From OLTP to OLAP
■ Data used in OLAP
– Sales data of June? (OLTP)
– Multi-dimensional data (having many features) (OLAP)
■ Direct Access: EUC Environment
[Figure: Information Source -> Information Broker -> Information Consumer]
■ From What to Why
– OLTP: Storing primitive data, supporting routine business operation (What)
– OLAP: Storing cumulative data, supporting business goal (Why)

OLTP vs. OLAP
■ OLTP vs. OLAP
Item           | OLTP                           | OLAP
Definition     | On-Line Transaction Processing | On-Line Analytical Processing
Objective      | Operational                    | Analytical
Focus          | Daily repetitious work         | Decision support in organization
Developer      | Computer expert                | End-user
User           | Simple operator                | Special analyst
Storing        | Current value                  | Summarized and consolidated data
Use            | Repetitive                     | Unstructured
Response       | Immediate                      | Delayed
Data           | Updated                        | Summarized
Update         | Field                          | Recomputation
Amount of Data | Small                          | Much
Data Structure | Complex                        | Simple
Database       | RDB                            | MDB
Data period    | Past, Current                  | Past, Current, Future
Query type     | Regular                        | Irregular, Analytical

Enterprise IT Architecture
[Figure: OLTP/OLAP Enterprise IT Architecture]

Data Warehouse vs. OLAP Server
■ Data Warehouse vs. OLAP Server
Item               | Data Warehouse                  | OLAP Server
Objective          | Ready for all kinds of retrieval | Specialized retrieval
Characteristics    | Data storage                    | Computation engine
Query Type         | Read only                       | Read/Write
Response           | Flexible                        | Consistent, rapid
Content            | Historical, present             | Historical, present, future
Data Structure     | Plain                           | Multi-dimensional
Amount of Data     | Huge, much detail               | Much, detail
Development period | A few months to years           | A few weeks to months
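The OLTP vs. OLAP comparison above contrasts field-level updates of current values with summarized, analytical queries. The sketch below illustrates that contrast on a single hypothetical `orders` table in SQLite; the table, columns, and sample values are assumptions made for illustration. It runs the analytical queries as SQL over a relational table and does not use a multidimensional database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "region TEXT, month TEXT, product TEXT, amount REAL)"
)
db.executemany(
    "INSERT INTO orders (region, month, product, amount) VALUES (?, ?, ?, ?)",
    [
        ("East", "2013-05", "bond", 100.0),
        ("East", "2013-06", "bond", 150.0),
        ("West", "2013-06", "bond", 200.0),
        ("West", "2013-06", "fund", 50.0),
    ],
)

# OLTP-style work: update one field of one record, then read its current value.
db.execute("UPDATE orders SET amount = 175.0 WHERE order_id = 2")
one_order = db.execute(
    "SELECT amount FROM orders WHERE order_id = 2"
).fetchone()

# OLAP-style work: summarize across two dimensions (region and month).
summary = db.execute(
    "SELECT region, month, SUM(amount) FROM orders "
    "GROUP BY region, month ORDER BY region, month"
).fetchall()

# A "slice": fix one dimension (month = June) and summarize over another.
june_by_region = db.execute(
    "SELECT region, SUM(amount) FROM orders "
    "WHERE month = '2013-06' GROUP BY region ORDER BY region"
).fetchall()

print(one_order)
print(summary)
print(june_by_region)
```

The single-row update returns immediately and touches one field, while the summaries are recomputed over many rows, which matches the "Field" versus "Recomputation" row of the comparison.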
Two types of OLAP
■ MOLAP
[Figure: Clients -> Query -> MDBMS (MD Processing) -> Respond -> Clients]
■ ROLAP
[Figure: Clients -> Query -> MD Processing -> SQL -> RDBMS -> Respond -> Clients]

From RDB to MDB
■ Basic Data Structure of MDB & RDB
– RDB (OLTP, Data Warehouse): Table, Field, Record, Row, Column
– MDB (OLAP): Cube, Dimension, Hierarchy
■ RDB as OLAP Server
– Cannot handle and represent multi-dimensional relationships well
– Cannot summarize data well
■ MDB as OLAP Server
– Gives many managerial viewpoints
– EUC
– Supports analysis functionality

Reference
■ Euiho Suh, “EIS_DSS_OLAP_DW (PPT Slide)”, POSMIT Lab. (POSTECH Strategic Management of Information and Technology Laboratory)
■ O’Brien & Marakas, “Introduction to Information Systems”, Sixteenth Edition, McGraw-Hill, Chapter 5