Uploaded by chuuta

Data Warehousing Concepts - Inmon

Business Intelligence
Data Warehousing Concepts
Inmon
Agenda
Who is Bill Inmon?
Data Modelling
Practical – Building & Populating a Database
Practical Business Questions
Benefits & Challenges
Who is Bill Inmon?
• Forefather of Data Warehousing
• Believes that Data Warehousing should act across the Enterprise; this is known as the “Top-Down” approach
• Believes that the Data Warehouse cannot be a “Big-Bang” exercise; rather, the Data Warehouse is built iteratively
Top Down Approach
• Looks at the organisation as a whole
• Data Sources are collected and loaded into one central repository (the Data Warehouse)
• Data Marts are loaded from the central repository (the Data Warehouse)
Diagram: Data Sources → ETL → Data Warehouse → ETL → Data Marts
DATA MART: A subset of the Data Warehouse tailored to a specific business need or process.
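The top-down flow described above (sources loaded into one central warehouse, marts loaded from the warehouse) can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module as a stand-in for a real warehouse platform; the source systems, table names, and rows are all hypothetical.

```python
import sqlite3

# Hypothetical rows from two operational source systems (illustrative data only).
sales_system = [("2014-01-05", "UK", 120.0), ("2014-01-06", "France", 80.0)]
web_system = [("2014-01-05", "UK", 30.0)]

conn = sqlite3.connect(":memory:")

# Step 1: ETL from every data source into the single central warehouse table.
conn.execute("CREATE TABLE warehouse_sales (sale_date TEXT, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)", sales_system + web_system
)

# Step 2: ETL from the warehouse into a data mart for one business need
# (here, a hypothetical team that only wants daily UK totals).
conn.execute(
    """
    CREATE TABLE mart_uk_daily_sales AS
    SELECT sale_date, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE country = 'UK'
    GROUP BY sale_date
    """
)

mart_rows = conn.execute(
    "SELECT sale_date, total_amount FROM mart_uk_daily_sales"
).fetchall()
print(mart_rows)
```

The mart never reads the source systems directly; it is always derived from the warehouse, which is the point of the top-down approach.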
The Data Warehouse should be built iteratively.
Day 1
• Review Existing Operational Systems (Functional Applications for a department)
Day 2
• Beginnings of Data Warehouse (DW)
• 1st Subject Area
Day 3
• More Subjects are created
As the days pass…
Day 4
• Departmental level processing starts to take place, and departmental data becomes more widely used.
• Begin incorporating other departments.
Day n
• On Day n the Data Warehouse is fully developed, and all departments are using the Warehouse data.
Data Modelling
• Conceptual
• Logical
• Physical
Conceptual Data Model
• 1st Draft of the Design
• High Level Scope
• Main Business Entities & the Connections Between Them
• Entity Relationship Diagram (ERD)
Entities in the diagram: Product, Time, Order, Customer
Logical Data Model
• 2nd Stage of the Design
• More Detailed Representation of Data
• Uses Business Language
• Adds Attributes to the Entities
• Adds Primary and Foreign Keys
• Independent of Technology
• Also Known as a ‘Mid-Level Diagram’
Entities and attributes:
Product Name: Product Name ID (PK), Product Name
Product Category: Product Category ID (PK), Product Name ID (FK), Product Category
Product Price: Unit Price ID (PK), Unit Price
Time: Date ID (PK), Date
Product: Product ID (PK), Product Category ID (FK), Unit Price ID (FK)
Order: Order ID (PK), Date ID (FK), Customer ID (FK), Product ID (FK), Sales Amount
Customer: Customer ID (PK), Name ID (FK), Location ID (FK)
Location: Location ID (PK), Location
Customer Name: Name ID (PK), Name
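Because a logical model is independent of technology, it can be written down as plain data and checked before any database exists. The sketch below records each entity's primary key and foreign keys as they appear on this slide and confirms that every foreign key points at an entity that owns a key of the same name. The dictionary layout and the function name are assumptions made for illustration.

```python
# Logical model from the slide: entity -> primary key and foreign keys.
# Each foreign key maps to the entity expected to own that key.
logical_model = {
    "Product Name": {"pk": "Product Name ID", "fks": {}},
    "Product Category": {
        "pk": "Product Category ID",
        "fks": {"Product Name ID": "Product Name"},
    },
    "Product Price": {"pk": "Unit Price ID", "fks": {}},
    "Time": {"pk": "Date ID", "fks": {}},
    "Product": {
        "pk": "Product ID",
        "fks": {
            "Product Category ID": "Product Category",
            "Unit Price ID": "Product Price",
        },
    },
    "Order": {
        "pk": "Order ID",
        "fks": {"Date ID": "Time", "Customer ID": "Customer", "Product ID": "Product"},
    },
    "Customer": {
        "pk": "Customer ID",
        "fks": {"Name ID": "Customer Name", "Location ID": "Location"},
    },
    "Location": {"pk": "Location ID", "fks": {}},
    "Customer Name": {"pk": "Name ID", "fks": {}},
}


def unresolved_foreign_keys(model):
    """Return (entity, foreign key) pairs whose target entity is missing
    or whose target primary key has a different name."""
    problems = []
    for entity, definition in model.items():
        for fk_column, target in definition["fks"].items():
            if target not in model or model[target]["pk"] != fk_column:
                problems.append((entity, fk_column))
    return problems


problems = unresolved_foreign_keys(logical_model)
print(problems)  # an empty list means every foreign key resolves
```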
Physical Data Model
• Final Stage of the Design Process
• Fully Normalised
• Data Types Added
• Many-to-Many Relationships Resolved
• Can Include Partitions and Indexes
Tables and columns:
Product Name
  Product Name ID (PK) INT
  Product Name nvarchar(50)
Product Category
  Product Category ID (PK) INT
  Product Name ID (FK) INT
  Product Category nvarchar(50)
Product Price
  Unit Price ID (PK) INT
  Unit Price Float
Time
  Date ID (PK) INT
  Date DATE
Product
  Product ID (PK) INT
  Product Category ID (FK) INT
  Unit Price ID (FK) INT
Order
  Order ID (PK) INT
  Date ID (FK) INT
  Customer ID (FK) INT
  Product ID (FK) INT
  Sales Amount Float
Customer
  Customer ID (PK) INT
  Name ID (FK) INT
  Location ID (FK) INT
Location
  Location ID (PK) INT
  Location nvarchar(50)
Customer Name
  Name ID (PK) INT
  Name nvarchar(50)
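As a rough illustration of how the physical model turns into table definitions, the sketch below creates the four product-side tables. It makes several assumptions: Python's built-in sqlite3 stands in for SQL Server (SQLite accepts the type names but does not enforce them the way SQL Server does), the column names are written without spaces, and the Product Name column is typed nvarchar(50) to match the other name columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Tables are created parents-first so every REFERENCES target already exists.
conn.executescript(
    """
    CREATE TABLE ProductName (
        ProductNameID INT PRIMARY KEY,
        ProductName   nvarchar(50)
    );
    CREATE TABLE ProductCategory (
        ProductCategoryID INT PRIMARY KEY,
        ProductNameID     INT REFERENCES ProductName(ProductNameID),
        ProductCategory   nvarchar(50)
    );
    CREATE TABLE ProductPrice (
        UnitPriceID INT PRIMARY KEY,
        UnitPrice   FLOAT
    );
    CREATE TABLE Product (
        ProductID         INT PRIMARY KEY,
        ProductCategoryID INT REFERENCES ProductCategory(ProductCategoryID),
        UnitPriceID       INT REFERENCES ProductPrice(UnitPriceID)
    );
    """
)

# PRAGMA table_info returns (cid, name, type, notnull, default, pk) per column.
product_columns = [
    (row[1], row[2], row[5]) for row in conn.execute("PRAGMA table_info(Product)")
]
print(product_columns)
```

The remaining tables (Time, Location, Customer Name, Customer, Order) would follow the same pattern. Note that Order is a reserved word, so in SQL Server that table name would need square brackets: [Order].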
Different Data Types – Microsoft SQL Server
Numbers
int = Integer
decimal / numeric(precision, scale) = Exact Numeric, e.g. a Total declared as decimal(9,2) has a maximum value of 9999999.99
float = Approximate Numeric
Time
date = Date
datetime = Date and Time
time = Time
Strings
varchar(n) = Variable Length Character, n up to a maximum of 8000
char(n) = Fixed Length Character, n up to a maximum of 8000
nvarchar(n) = Unicode Variable Length Character, n up to a maximum of 4000
nchar(n) = Unicode Fixed Length Character, n up to a maximum of 4000
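The difference between an exact numeric (decimal) and an approximate numeric (float) is easy to see outside the database. The sketch below uses Python's float and its decimal module as stand-ins for the SQL Server types, and also derives the largest value that fits a precision of 9 and a scale of 2.

```python
from decimal import Decimal

# Approximate numeric: binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)

# Exact numeric: decimal arithmetic keeps the digits that were entered.
decimal_sum = Decimal("0.10") + Decimal("0.20")
print(decimal_sum == Decimal("0.30"))

# Largest value for precision 9, scale 2:
# seven digits before the decimal point and two after it.
precision, scale = 9, 2
max_value = Decimal(10) ** (precision - scale) - Decimal(10) ** -scale
print(max_value)
```

This is why exact types are usually preferred for monetary columns, while float suits measurements where small rounding differences are acceptable.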
Normal Forms
Data Model Exercises
Exercise 1 (30 Minutes): Create a Conceptual Data Model Based on Your Data
Exercise 2 (30 Minutes): Create a Logical Data Model (Mid-Level Diagram) Based on Your Data
Exercise 3 (20 Minutes): Create a Physical Data Model Based on Your Data
Practical – Building & Populating a Database
Creating a Database in Microsoft SQL Server
Creating a Database
CREATE DATABASE DatabaseName;
Creating a Table
CREATE TABLE TableName
(
    Column1 data_type,
    Column2 data_type,
    ColumnN data_type
);
Creating a Primary Key
ColumnName data_type PRIMARY KEY
Creating a Foreign Key
ColumnName data_type FOREIGN KEY REFERENCES TableName(ColumnName)
NOTE: Names that contain spaces must be wrapped in square brackets, e.g. [Table Name].
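To show what the PRIMARY KEY and FOREIGN KEY declarations actually do, the sketch below creates a parent and a child table and then tries two invalid inserts. It assumes Python's built-in sqlite3 as a stand-in for SQL Server. Two differences are worth noting: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and its inline column syntax is plain REFERENCES (which SQL Server also accepts) rather than FOREIGN KEY REFERENCES. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default

conn.execute(
    "CREATE TABLE Location (LocationID INT PRIMARY KEY, Location nvarchar(50))"
)
conn.execute(
    "CREATE TABLE Customer ("
    "CustomerID INT PRIMARY KEY, "
    "LocationID INT REFERENCES Location(LocationID))"
)

conn.execute("INSERT INTO Location VALUES (1, 'London')")
conn.execute("INSERT INTO Customer VALUES (10, 1)")  # valid: Location 1 exists

# A customer pointing at a location that does not exist breaks the foreign key.
try:
    conn.execute("INSERT INTO Customer VALUES (11, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# A second location with the same ID breaks the primary key.
try:
    conn.execute("INSERT INTO Location VALUES (1, 'Paris')")
    pk_rejected = False
except sqlite3.IntegrityError:
    pk_rejected = True

customer_count = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(fk_rejected, pk_rejected, customer_count)
```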
Database Creation Exercise
Exercise (2 Hours)
Create your Database in Microsoft SQL Server
HINT: Create the tables in dependency order. Start with the tables that have no dependencies (foreign keys) and end with the tables that have the most dependencies.
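The hint above amounts to a topological sort of the tables by their foreign keys. As a sketch, assuming Python 3.9 or later (for the standard graphlib module) and the nine tables of the example model in this deck, a valid creation order can be computed like this:

```python
from graphlib import TopologicalSorter

# Each table maps to the set of tables its foreign keys reference.
depends_on = {
    "Product Name": set(),
    "Product Category": {"Product Name"},
    "Product Price": set(),
    "Time": set(),
    "Location": set(),
    "Customer Name": set(),
    "Product": {"Product Category", "Product Price"},
    "Customer": {"Customer Name", "Location"},
    "Order": {"Time", "Customer", "Product"},
}

# static_order() yields every table only after all of its dependencies.
creation_order = list(TopologicalSorter(depends_on).static_order())
print(creation_order)
```

Several orders are valid, so the exact list may vary, but Order always comes last because it depends, directly or indirectly, on every other table. The same order applies when importing data.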
Preparing Data Exercise
Before importing your data into your new database, you will need to prepare it.
Exercise (4 Hours)
Use MS Excel to Prepare Your Data for Import
1. Create a New Worksheet to Represent Each Table.
2. Use Excel Tools Like Pivot Tables and Functions Like ‘VLOOKUP’ to Help Populate Each Worksheet.
3. Ensure You Create a Surrogate (Primary) Key for Each Table.
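The exercise itself is done in Excel, but the logic of the three steps can be sketched in Python for anyone who wants to check their result: collect the distinct values for each table, give each one a surrogate key, then replace the original values with those keys (the job VLOOKUP does in a worksheet). The sample rows and the helper name are hypothetical.

```python
# Hypothetical flat rows, as they might appear in a single source worksheet.
flat_rows = [
    {"customer": "Ann", "location": "Leeds", "sales": 120.0},
    {"customer": "Bob", "location": "York", "sales": 75.5},
    {"customer": "Ann", "location": "Leeds", "sales": 40.0},
]


def build_key_table(values):
    """Assign a surrogate key (1, 2, 3, ...) to each distinct value,
    in the order the values are first seen."""
    lookup = {}
    for value in values:
        if value not in lookup:
            lookup[value] = len(lookup) + 1
    return lookup


# One lookup per table that holds descriptive values.
location_ids = build_key_table(row["location"] for row in flat_rows)
name_ids = build_key_table(row["customer"] for row in flat_rows)

# One "worksheet" per table: (surrogate key, value).
location_table = [(key, value) for value, key in location_ids.items()]
customer_name_table = [(key, value) for value, key in name_ids.items()]

# The Customer table stores only keys, looked up the way VLOOKUP would.
customer_pairs = build_key_table(
    (name_ids[row["customer"]], location_ids[row["location"]]) for row in flat_rows
)
customer_table = [
    (customer_id, name_id, location_id)
    for (name_id, location_id), customer_id in customer_pairs.items()
]
print(location_table, customer_name_table, customer_table)
```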
Importing Exercise
Exercise (1.5 Hours)
Use the SQL Server Import and Export Data tool to Import Your Data into Your Database.
HINT: As with database creation, import the data in dependency order. Start with the tables that have no dependencies and finish with the tables that have the most dependencies.
Practical Business Questions
Business Questions
Exercise
Question 1 - What was the Total Profit made for each Year? (By Ship Date)
Question 2 - What was the Total Shipping Cost for each Country and Year?
Question 3 - What was the Total Profit for each Product Category in 2014? (By Shipping Year)
Question 4 - Which Country made the Highest Monthly Profit, and what Year was it in? (By Ship Date)
Question 5 - What was the Total Profit and Shipping Cost for Office Supplies for each Month and Year, and Product Subcategory? (Include Product Category in your output)
Question 6 - Which Product Subcategory within Office Supplies made the Highest Profit in each Month and Year?
HINT - Use the answer to Question 5 as a basis for this question
Question 7 - Which Country made the Highest Total Profit over a 5 Month Period?
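As a starting point for Question 1, the general shape of the query is a SUM grouped by the year of the ship date. The sketch below is an illustration only: it assumes Python's built-in sqlite3 as a stand-in for SQL Server, a hypothetical single Orders table with ShipDate and Profit columns, and made-up rows. Your own tables and joins will differ. SQLite extracts the year with strftime('%Y', ...); in SQL Server the equivalent is YEAR(ShipDate).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, for illustration only.
conn.execute(
    "CREATE TABLE Orders (OrderID INT PRIMARY KEY, ShipDate DATE, Profit FLOAT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, "2013-03-01", 10.0), (2, "2013-07-15", 5.5), (3, "2014-01-20", 20.0)],
)

# Total profit per ship year: one output row per year.
profit_by_year = conn.execute(
    """
    SELECT strftime('%Y', ShipDate) AS ShipYear, SUM(Profit) AS TotalProfit
    FROM Orders
    GROUP BY ShipYear
    ORDER BY ShipYear
    """
).fetchall()
print(profit_by_year)
```

The later questions build on the same pattern by adding more columns to the GROUP BY (country, month, product category) and by filtering with WHERE.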
Benefits & Challenges
Benefits
• Enterprise-wide view of data
• Easy to maintain
• Architected environment
• Optimised for performance
Challenges
• High cost
• Long initial setup time
• Specialist skills required
Questions / Comments
What We Have Covered
Who is Bill Inmon?
Data Modelling
Practical – Building & Populating a Database
Practical Business Questions
Benefits & Challenges