DATA WAREHOUSE AND INTEGRATION
The Nature of Data & The Data Warehouse
Lecturer: Nguyễn Văn Hồ, M.A. (honv@uel.edu.vn)

Agenda
• Introduction
• Definition of Data Warehouse, Business Intelligence, and ETL
• Data Warehouse vs Operational System
• Evolving to modern Data Warehousing
• Q&A

Why care about data?
Data helps us make decisions; a decision made without considering data is simply a guess.
Image source: http://blog.popcornmetrics.com/content/images/2015/03/blind-text2-min.png

Insights from data
[Chart: eight levels of analytics, plotted by degree of intelligence against competitive advantage]
Traditional Analytics (Descriptive/Analysis): what is happening (75% of usage)
1. Standard Report: What just happened?
2. Ad-hoc Reports: How many, how often, who and where?
3. Query Drilldown: What exactly is the problem?
4. Alerts: What actions are needed?
Advanced Analytics (Predictive) (25% of usage)
5. Statistical Analysis: Why is this happening?
6. Forecasting: What if these trends continue?
7. Predictive Modeling: What will happen next?
8. Optimization: What's the best that can happen?
Source: Eight levels of analytics/SAS

Trusted Information
CONSUMER BANKING DIVISION
• Card center
• ATM
• Telesales Center
• Personal credit center
• Sales Management department
• E-Banking department
• Market research sub-department
The problems are:
- Gathering all the reports takes about 10 to 15 days.
- The reports contain conflicts and mistakes.
- The reports come in many formats (PDF, Word, TIFF, Excel, …).
- Some reports highlight the good points and hide the negative points.
- etc.
Image source: http://www.moneywalks.com/wp-content/uploads/2008/10/credit-report.jpg

Data is everywhere, information nowhere
• I can't find the data I need: data is scattered over many versions with subtle differences
• I can't understand the data I found: data is not well documented
• I can't use the data I found: data needs to be transformed from one form to another

What are the users saying…
• Data should be integrated across the enterprise
• Summary data has real value to the organization
• Historical data holds the key to understanding data over time
• What-if capabilities are required

How can I answer these needs?
• Is data warehousing the solution?
• Can I improve my business using data warehousing? Yes. How?

What is a Data Warehouse?
• It is a central location where consolidated data from multiple locations is stored
• It is not loaded every time new data is generated
• The business determines the schedule on which the data warehouse is loaded: daily, monthly, once a quarter, etc.
[Diagram: Source 1, Source 2, …, Source n -> Data Warehouse -> User 1, User 2, …, User n]

What is a Data Warehouse?
"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (Bill Inmon)
[Diagram: the data warehouse and its four properties: subject-oriented, integrated, non-volatile, time-variant]

Subject-oriented
Data is categorized and stored by business subject rather than by application.
[Diagram: OLTP applications (Equity Plans, Shares, Insurance, Savings, Loans) -> data warehouse subject (Customer financial information)]

Integrated
Data on a given subject is defined and stored once.
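As a minimal sketch of the "integrated" property, assume three hypothetical OLTP applications (savings, current account, loans) that each keep their own copy of a customer's details under different field names. The integration step maps them onto one warehouse definition so that each customer is stored once. The record layouts, the sample values, and the `integrate` helper are illustrative assumptions, not part of the lecture.

```python
# Hypothetical extracts from three OLTP applications; each names the
# customer fields differently (all layouts are assumptions for illustration).
savings = [{"cust_id": 1, "cust_name": "An Tran", "balance": 500.0}]
current_account = [{"customer_no": 1, "name": "An Tran", "balance": 120.0}]
loans = [
    {"cid": 1, "full_name": "An Tran", "outstanding": 300.0},
    {"cid": 2, "full_name": "Binh Le", "outstanding": 50.0},
]

# Per-source mapping onto the single warehouse definition of a customer.
FIELD_MAPS = {
    "savings": {"id": "cust_id", "name": "cust_name"},
    "current_account": {"id": "customer_no", "name": "name"},
    "loans": {"id": "cid", "name": "full_name"},
}


def integrate(sources):
    """Return one record per customer id, listing the applications
    in which that customer appeared."""
    customers = {}
    for source_name, rows in sources.items():
        fields = FIELD_MAPS[source_name]
        for row in rows:
            cust_id = row[fields["id"]]
            # The first source that mentions a customer defines the record;
            # later sources only add to the list of products.
            record = customers.setdefault(
                cust_id,
                {"customer_id": cust_id, "name": row[fields["name"]], "products": []},
            )
            record["products"].append(source_name)
    return customers


customers = integrate(
    {"savings": savings, "current_account": current_account, "loans": loans}
)
for record in customers.values():
    print(record)
```

The sketch keeps the first name it sees for each customer; resolving conflicting values between applications would need explicit rules agreed with the business.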
[Diagram: OLTP applications (Savings, Current Account, Loans) -> Data Warehouse]

Time-variant
Data is stored as a series of snapshots, each representing a period of time.
TIME      DATA
Jan-2016  January
Feb-2016  February
Mar-2016  March

Non-Volatile
Typically, data in the data warehouse is not updated or deleted.
• Operational: Insert, Update, Delete, Read
• Warehouse: Load, Read

Changing data
[Diagram: operational databases -> warehouse database, with a first-time load followed by periodic refreshes]

[Figure: The Inmon Warehouse]

What is a Data Warehouse?
"Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model." (Ralph Kimball)

[Figure: The Kimball Warehouse]

Getting started with Choices
Kimball
- Will start with data marts
- Focus on quick delivery to users
Inmon
- Will focus on the enterprise
- Organizational focus
How to choose?

What is Business Intelligence (BI)?
BI is a set of tools and techniques that enable analysis of information in order to improve business decisions.
Image source: http://www.vedamsoft.com/images/bi.jpg

Complete Spectrum of BI Technologies
[Chart: complexity (low to high) against business value (low to high); levels listed from lowest to highest]
• REPORTING: What happened? (Queries, Reporting & Search tools)
• ANALYSIS: Why did it happen? (Cube, Visualization Utilities)
• MONITORING: What's happening now? (Dashboard, Scorecard)
• PREDICTION: What may happen? (Predictive Analytics)

Data warehouse and BI Landscape
Image source: http://4.bp.blogspot.com/

What is ETL?
ETL is the process of gathering data from the production systems, cleansing it, validating it, and moving it into the data warehouse. This process can be considered part of the data warehouse infrastructure.
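The process described above (gather, cleanse, validate, move into the warehouse) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, a small list of rows stands in for a production system, and the `customer` table, the sample rows, and the cleansing and validation rules are all hypothetical.

```python
import sqlite3

# Hypothetical rows gathered from a production system (assumed sample data).
production_rows = [
    {"id": 1, "name": "  an tran ", "balance": "500.0"},
    {"id": 2, "name": "Binh Le", "balance": "not-a-number"},
    {"id": 3, "name": "", "balance": "75.5"},
]


def cleanse(row):
    """Trim whitespace and normalise the capitalisation of the name."""
    return {**row, "name": row["name"].strip().title()}


def is_valid(row):
    """Accept a row only if it has a name and a numeric balance."""
    if not row["name"]:
        return False
    try:
        float(row["balance"])
    except ValueError:
        return False
    return True


def run_etl(rows, conn):
    """Cleanse and validate the rows, load the valid ones into the
    warehouse table, and return the ids of the rejected rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT, balance REAL)"
    )
    rejected = []
    for row in map(cleanse, rows):
        if is_valid(row):
            conn.execute(
                "INSERT INTO customer (id, name, balance) VALUES (?, ?, ?)",
                (row["id"], row["name"], float(row["balance"])),
            )
        else:
            rejected.append(row["id"])
    conn.commit()
    return rejected


warehouse = sqlite3.connect(":memory:")
rejected_ids = run_etl(production_rows, warehouse)
loaded = warehouse.execute("SELECT id, name, balance FROM customer").fetchall()
print("loaded:", loaded)
print("rejected ids:", rejected_ids)
```

Rejected rows are only collected by id here; in practice they would usually be written somewhere for review rather than silently dropped.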
ETL stands for
• Data Extraction: get data from multiple, heterogeneous, and external sources
• Data Transformation: convert data from legacy or host format to warehouse format
• Data Loading: sort, summarize, consolidate, compute views, check integrity, and build indexes and partitions

What is ETL?
• Determine the data you need and the data you don't need in your target
  [Diagram: OLTP -> OLAP, with only the data of interest selected from the source]
• Determine where (the source) this data is going to come from
• Determine the data extraction methods, cleansing rules, and transformation rules
  Example: FirstName char(20) and LastName char(20) are joined into Name varchar(50)
• Load the dimension data and the fact data

Data Warehouses vs Operational Systems
• Goals
• Structure
• Size
• Performance optimization
• Technologies used