Chapter 12
Databases for Online Analytical Processing
Class 09: Chapter 12

OLTP vs OLAP
• Operational Database: a database designed to support the day-to-day transactions of an organization
• Data Warehouse: historical data is periodically trimmed from the operational database and moved to a database specifically designed for analysis
  – Term coined by Bill Inmon in the early 1980s
  – Significant contributions by Ralph Kimball and others

OLTP vs OLAP
• Online Transaction Processing (OLTP):
  – High transaction volume
  – Each transaction uses relatively little data
  – Day-to-day activities; current data
• Online Analytical Processing (OLAP):
  – Relatively few transactions
  – Each transaction uses large amounts of data
  – Historical data; analysis and decision-making

Data Warehouses
• Data Warehouse: a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process
  – Subject-oriented: organized around major subjects of the enterprise
  – Integrated: combined from multiple operational sources
  – Time-variant: only accurate across a known time period
  – Non-volatile: not updated in real time; new data added periodically (as often as needed)

Benefits of Data Warehousing
• Potential high returns on investment
• Competitive advantage
• Increased productivity of corporate decision-makers

Challenges of Data Warehousing
• Underestimation of resources for data loading
• Hidden integrity problems in source data
• Required data not captured
• Ever-increasing end-user demands
• Consolidating data from disparate sources
• High demand for resources
• Data ownership
• Difficulty in determining requirements
• “Big Bang” projects (complex, large scope)

DW DBMS Requirements
• Load performance
• Load processing
• Data quality management
• Query performance
• Terabyte/Petabyte scalability
• Networked or Cloud data warehouse
• Warehouse administration
• Integrated dimensional analysis
• Advanced query and analytics capability

Data Warehouse Metadata
• Primary purpose is to show the pathway back to where the data began
• However, it has other functions that relate to data transformation, loading, DW management and query generation
• Major integration issue is how to synchronize the various types of metadata across multiple products:
  – Passing metadata from tool to tool
  – Using a metadata repository

Administration and Management Tools
• Monitoring data loading
• Data quality and integrity checks
• Managing and updating metadata
• Monitoring database performance
• Auditing the data warehouse
• Replicating, subsetting and distributing data
• Maintaining efficient data storage management
• Archiving and backing up data
• Purging data
• Implementing recovery after failure
• Security management

[Slide graphic: Comparison of OLTP Systems and Data Warehouses]

[Slide graphic: DW Architecture: Summary Tables]

[Slide graphic: Star Schema Architecture]

Star Schema Variants
• Snowflake Schema: a variant of the star schema where each dimension can have its own dimensions
• Starflake (Hybrid) Schema: a hybrid structure that contains a mixture of (denormalized) star and (normalized) snowflake schemas

Multi-Dimensional OLAP
• Uses multi-dimensional structures to store data and relationships between data
• Best visualized as cubes of data, with cubes within cubes
• Each side of the cube is a dimension
• Support for Analytical Operations:
  – Consolidation (aggregation of data)
  – Drill-down (reverse of aggregation)
  – Slicing and dicing (pivoting): look at data from different viewpoints

Data Marts
• Data Mart: a subset of a data warehouse that supports the requirements of a particular department or business function.
  – Limited scope
  – Not intended for operational reporting
  – Much less information than a data warehouse

Reasons for Creating a Data Mart
• Data tailored to department or function
• Lower cost than a full DW
• Lower risk project than a full DW
• Limited (usually one) end-user analysis tool
• Database placed physically near the department, reducing network delays

Data Mart Issues
• Functionality
• Size
• Load performance
• User access to multiple data marts
• Administration
• Expansion and growth (may require reloads)

Data Mart Approaches
• Build enterprise DW to populate data marts
  – Data marts won’t be done if DW project stalls
• Build several data marts and integrate later
  – Generally lower risk
  – Data marts may produce inconsistent results
  – Overall cost may be higher due to integration
• Build DW and data marts simultaneously
  – Practically guarantees a never-ending project from hell

Designing Data Warehouses
• Must understand how data will be used
• Star Schema: logical structure that has a fact table in the center surrounded by dimension tables (reference data)
  – Must identify core transactions in business

Factors Influencing Fact Table Design
• Required time period
• Statistical samples vs. detailed data
• Columns to omit
• Column size reduction
• Intelligent vs. dumb keys
• Optimal approach to account for time
• Partitioning of fact table

Designing Dimension Tables
• Identify shared (conformed) dimensions
• Star schema vs. snowflake
• Collapse vs. split hierarchies

[Slide graphic: Typical MOLAP Architecture]

SQL Extensions
• Augment SQL with operations appropriate to data analysis and decision-support applications, such as:
  – Ranking
  – Moving averages
  – Comparisons (e.g. time period over time period)
  – Market share
  – Statistical functions (correlation, regression, etc.)
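
Several of the operations listed above (ranking, moving averages, period-over-period comparison, market share) can be expressed with SQL window functions, which is one common form these extensions take. The sketch below is illustrative only. It uses Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or later, where window functions are available. The monthly_sales table and its figures are invented for the example.

```python
import sqlite3

# Hypothetical summary table; names and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly_sales (region TEXT, month TEXT, revenue REAL);
INSERT INTO monthly_sales VALUES
  ('East', '2024-01', 100), ('East', '2024-02', 120), ('East', '2024-03', 90),
  ('West', '2024-01', 200), ('West', '2024-02', 180), ('West', '2024-03', 220);
""")

query = """
SELECT region, month, revenue,
       -- Ranking: position of each region within its month, by revenue
       RANK() OVER (PARTITION BY month ORDER BY revenue DESC) AS rank_in_month,
       -- Moving average: current month and the one before it, per region
       AVG(revenue) OVER (PARTITION BY region ORDER BY month
                          ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS moving_avg_2,
       -- Comparison: change against the previous month (NULL for the first month)
       revenue - LAG(revenue) OVER (PARTITION BY region ORDER BY month) AS change_vs_prev,
       -- Market share: region revenue as a fraction of the month's total
       revenue / SUM(revenue) OVER (PARTITION BY month) AS market_share
FROM monthly_sales
ORDER BY region, month
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

Each window function computes its value over a partition of related rows without collapsing them, so the detail rows and the analytical measures appear side by side in one result set.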

Data Mining
• Data Mining: the process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions
  – Tends to work from the data up
  – Normally requires large data volumes for accurate results

Data Mining Techniques
• Predictive Modeling
  – Classification: put records in predetermined classes
  – Value prediction: regression
• Database Segmentation
  – Demographic clustering
  – Neural clustering
• Link Analysis
  – Association discovery
  – Sequential pattern discovery
  – Similar time sequence discovery
• Deviation Detection: identify outliers from the norm
  – Statistics
  – Visualization

Data Integration Methods
• ETL: Extract, Transform and Load
  – Periodic (scheduled) bulk process
  – Good for loading/refreshing data warehouses and data marts
  – Commercial packages (e.g. IBM [Ascential] DataStage) or custom-developed
  – Common transformations are summarization, categorization, recoding
  – The target for the data is a centralized database

[Slide graphic: ETL]

Data Integration Methods
• EAI: Enterprise Application Integration
  – Framework for integrating data among disparate applications
  – Usually accomplished with push technology that is event-driven
  – Message queues are a common implementation method
  – The target for the data is an application

[Slide graphic: EAI]

Data Integration Methods
• EII: Enterprise Information Integration
  – Real-time integration of disparate data sources
  – As queries are run, data is gathered from the various sources to satisfy the request
  – The target for the data is a person

[Slide graphic: EII]

Data Integration Methods
• ODS: Operational Data Store
  – Similar to a Data Warehouse, but operational instead of historic
  – Assembles “one version of the truth” from multiple disparate data sources
  – Often loaded using ETL-like processes, but can be a clearinghouse for updates as well
  – Often used as the source for OLAP databases

[Slide graphic: ODS]

Next
• Assignment 3 Walkthrough
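
As a recap of the ETL method from the Data Integration Methods slides, the three stages and the common transformations named there (recoding, categorization, summarization) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any commercial package works: the source rows, the region code table, the 100-unit size threshold and the sales_summary table are all hypothetical, and an in-memory SQLite database stands in for the centralized target.

```python
import sqlite3
from collections import defaultdict

# Extract: hypothetical rows pulled from an operational (OLTP) source.
source_rows = [
    {"order_id": 1, "region": "E", "amount": 40.0},
    {"order_id": 2, "region": "E", "amount": 250.0},
    {"order_id": 3, "region": "W", "amount": 75.0},
    {"order_id": 4, "region": "W", "amount": 500.0},
    {"order_id": 5, "region": "E", "amount": 60.0},
]

REGION_NAMES = {"E": "East", "W": "West"}  # recoding table (assumed)


def categorize(amount):
    """Categorization: bucket an order by size (threshold is illustrative)."""
    return "large" if amount >= 100 else "small"


def transform(rows):
    """Recode region codes, categorize each order, then summarize per group."""
    totals = defaultdict(lambda: [0, 0.0])  # (region, size) -> [count, revenue]
    for row in rows:
        key = (REGION_NAMES[row["region"]], categorize(row["amount"]))
        totals[key][0] += 1
        totals[key][1] += row["amount"]
    return [(region, size, count, revenue)
            for (region, size), (count, revenue) in sorted(totals.items())]


def load(conn, summary):
    """Load: write the summarized rows into the central target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_summary "
        "(region TEXT, order_size TEXT, order_count INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales_summary VALUES (?, ?, ?, ?)", summary)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for the warehouse database
load(warehouse, transform(source_rows))
loaded = warehouse.execute(
    "SELECT * FROM sales_summary ORDER BY region, order_size"
).fetchall()
print(loaded)
```

In a scheduled setting the same extract, transform and load steps would run as a periodic bulk job against the real source and target systems.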