Business Intelligence
Data Warehousing Concepts
Inmon

Agenda
• Who is Bill Inmon?
• Data Modelling
• Practical – Building & Populating a Database
• Practical Business Questions
• Benefits & Challenges

Who is Bill Inmon?
• Forefather of Data Warehousing
• Believes that Data Warehousing should act across the enterprise. This is known as the “Top-Down” approach.
• Believes that the Data Warehouse cannot be a “Big-Bang” exercise; rather, the Data Warehouse is built iteratively.

Top-Down Approach
• Looks at the organisation as a whole
• Data sources are collected and loaded into one central repository (the Data Warehouse)
• Data Marts are loaded from the central repository (the Data Warehouse)

Data Sources → ETL → Data Warehouse → ETL → Data Marts

DATA MART: A subsection of the Data Warehouse tailored to a specific business need or process.

The Data Warehouse should be built iteratively.

Day 1
• Review existing operational systems (functional applications for a department)

Day 2
• Beginnings of the Data Warehouse (DW)
• 1st subject area

Day 3
• More subjects are created

As the days pass…

Day 4
• Departmental-level processing starts to take place, and departmental data becomes more widely used. Begin incorporating other departments.

Day n
• On Day n the Data Warehouse is fully developed, and all departments are using the Warehouse data.

Data Modelling

The design passes through three stages: Conceptual, Logical, Physical.

Conceptual Data Model
• 1st draft of the design
• High-level scope
• Main business entities and the connections between them
• Entity Relationship Diagram (ERD)
• Entities in the example: Product, Time, Order, Customer

Logical Data Model
• 2nd stage of the design
• More detailed representation of data
• Uses business language
• Adds attributes to the entities
• Adds primary and foreign keys
• Independent of technology
• Also known as a ‘Mid-Level Diagram’

Entities and attributes in the example:
Product Name: Product Name ID (PK), Product Name
Product Category: Product Category ID (PK), Product Name ID (FK), Product Category
Product Price: Unit Price ID (PK), Unit Price
Time: Date ID (PK), Date
Product: Product ID (PK), Product Category ID (FK), Unit Price ID (FK)
Order: Order ID (PK), Date ID (FK), Customer ID (FK), Product ID (FK), Sales Amount
Customer: Customer ID (PK), Name ID (FK), Location ID (FK)
Location: Location ID (PK), Location
Customer Name: Name ID (PK), Name

Physical Data Model
• Final stage of the design process
• Fully normalised
• Data types added
• Many-to-many relationships resolved
• Can include partitions and indexes

Tables, columns and data types in the example:

Table             Column                     Data type
Product Name      Product Name ID (PK)       INT
                  Product Name               nvarchar(50)
Product Category  Product Category ID (PK)   INT
                  Product Name ID (FK)       INT
                  Product Category           nvarchar(50)
Product Price     Unit Price ID (PK)         INT
                  Unit Price                 Float
Time              Date ID (PK)               INT
                  Date                       DATE
Product           Product ID (PK)            INT
                  Product Category ID (FK)   INT
                  Unit Price ID (FK)         INT
Order             Order ID (PK)              INT
                  Date ID (FK)               INT
                  Customer ID (FK)           INT
                  Product ID (FK)            INT
                  Sales Amount               Float
Customer          Customer ID (PK)           INT
                  Name ID (FK)               INT
                  Location ID (FK)           INT
Location          Location ID (PK)           INT
                  Location                   nvarchar(50)
Customer Name     Name ID (PK)               INT
                  Name                       nvarchar(50)

Different Data Types – Microsoft SQL

Numbers
• int = integer
• decimal/numeric(precision, scale) = exact numeric, e.g. Total (9,2) has a maximum value of 9999999.99
• float = approximate numeric

Time
• date = date
• datetime = date and time
• time = time

Strings
• varchar(n) = variable-length character, n up to a maximum of 8000
• char(n) = fixed-length character, n up to a maximum of 8000
• nvarchar(n) = Unicode variable-length character, n up to a maximum of 4000
• nchar(n) = Unicode fixed-length character, n up to a maximum of 4000

Normal Forms

Data Model Exercises
• Exercise 1 (30 minutes): Create a Conceptual Data Model based on your data
• Exercise 2 (30 minutes): Create a Logical Data Model (Mid-Level Diagram) based on your data
• Exercise 3 (20 minutes): Create a Physical Data Model based on your data

Practical – Building & Populating a Database

Creating a Database in Microsoft SQL

Creating a Database
    CREATE DATABASE database_name

Creating a Table
    CREATE TABLE table_name (
        column_1 data_type,
        column_2 data_type,
        column_n data_type
    )

Creating a Primary Key
    column_name data_type PRIMARY KEY

Creating a Foreign Key
    column_name data_type FOREIGN KEY REFERENCES table_name(column_name)

Database Creation Exercise
• Exercise (2 hours): Create your database in Microsoft SQL Server
HINT: Table creation must start with the tables that have no dependencies (foreign keys) and end with the tables that have the most dependencies.

Preparing Data Exercise
Before importing your data into your new database, you will need to prepare it.
• Exercise (4 hours): Use MS Excel to prepare your data for import
1. Create a new worksheet to represent each table.
2. In Excel, you can use tools like Pivot Tables, and functions like ‘VLOOKUP’, to aid population.
3. Ensure you create a surrogate (primary) key for each table.

Importing Exercise
• Exercise (1.5 hours): Use the SQL Server Import & Export Data tool to import your data into your database.
HINT: As with database creation, you must import data starting with the tables that have no dependencies and finish with the tables that have the most dependencies.

Practical Business Questions

Business Questions Exercise
Question 1 - What was the total profit made for each year? (By Ship Date)
Question 2 - What was the total shipping cost for each country and year?
Question 3 - What was the total profit for each product category in 2014? (Shipping Year)
Question 4 - Which country made the highest monthly profit, and what year was it in? (By Ship Date)
Question 5 - What was the total profit and shipping cost for Office Supplies for each month, year and product subcategory? (Include Product Category in your output)
Question 6 - Which product subcategory within Office Supplies made the highest profit in each month and year?
HINT - Use the answer to Question 5 as a basis for this question.
Question 7 - Which country made the highest total profit over a 5 month period?

Benefits & Challenges

Benefits
• Enterprise-wide view of data
• Easy to maintain
• Architected environment
• Optimised for performance

Challenges
• High cost
• Long initial setup time
• Specialist skills required

Questions / Comments

What We Have Covered
• Who is Bill Inmon?
• Data Modelling
• Practical – Building & Populating a Database
• Practical Business Questions
• Benefits & Challenges
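
Worked sketch: creating the example tables in dependency order

The hint in the Database Creation Exercise (start with tables that have no foreign keys, finish with the most dependent ones) can be sketched in T-SQL against the example physical data model. This is an illustration under stated assumptions, not a model answer: the database name InmonDemo and the space-free identifiers are hypothetical choices, and the name columns are assumed to be NVARCHAR(50). The closing query is also only illustrative, because the example model holds Sales Amount rather than the profit and shipping cost used in the business questions.

```sql
-- Hypothetical database name; GO is the batch separator used by SSMS and sqlcmd.
CREATE DATABASE InmonDemo;
GO
USE InmonDemo;
GO

-- Step 1: tables with no foreign keys.
CREATE TABLE ProductName (
    ProductNameID INT PRIMARY KEY,
    ProductName   NVARCHAR(50)      -- assumed text type, like the other name columns
);

CREATE TABLE ProductPrice (
    UnitPriceID INT PRIMARY KEY,
    UnitPrice   FLOAT
);

-- Brackets avoid clashes with the TIME and DATE type names.
CREATE TABLE [Time] (
    DateID INT PRIMARY KEY,
    [Date] DATE
);

CREATE TABLE Location (
    LocationID INT PRIMARY KEY,
    Location   NVARCHAR(50)
);

CREATE TABLE CustomerName (
    NameID INT PRIMARY KEY,
    Name   NVARCHAR(50)
);

-- Step 2: tables that reference only step 1 tables.
CREATE TABLE ProductCategory (
    ProductCategoryID INT PRIMARY KEY,
    ProductNameID     INT FOREIGN KEY REFERENCES ProductName(ProductNameID),
    ProductCategory   NVARCHAR(50)
);

CREATE TABLE Customer (
    CustomerID INT PRIMARY KEY,
    NameID     INT FOREIGN KEY REFERENCES CustomerName(NameID),
    LocationID INT FOREIGN KEY REFERENCES Location(LocationID)
);

-- Step 3: Product depends on ProductCategory and ProductPrice.
CREATE TABLE Product (
    ProductID         INT PRIMARY KEY,
    ProductCategoryID INT FOREIGN KEY REFERENCES ProductCategory(ProductCategoryID),
    UnitPriceID       INT FOREIGN KEY REFERENCES ProductPrice(UnitPriceID)
);

-- Step 4: Order has the most dependencies, so it is created last.
-- ORDER is a reserved word, hence the brackets.
CREATE TABLE [Order] (
    OrderID     INT PRIMARY KEY,
    DateID      INT FOREIGN KEY REFERENCES [Time](DateID),
    CustomerID  INT FOREIGN KEY REFERENCES Customer(CustomerID),
    ProductID   INT FOREIGN KEY REFERENCES Product(ProductID),
    SalesAmount FLOAT
);
GO

-- Illustrative query only: total sales amount per year.
SELECT YEAR(t.[Date])     AS OrderYear,
       SUM(o.SalesAmount) AS TotalSales
FROM [Order] AS o
JOIN [Time]  AS t ON t.DateID = o.DateID
GROUP BY YEAR(t.[Date])
ORDER BY OrderYear;
```

The same ordering applies when importing data: load the step 1 tables first and the Order table last, so that every foreign key value already exists in the table it references.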