Building and Implementing Integrated Data Models

Nancy Wills, Director, Access, Query and Data Mgmt
Ralph Hollinshead, Manager, Solutions Data Integration
Copyright © 2004, SAS Institute Inc. All rights reserved.
Overview
Part One: Building an Integrated Data Model
Part Two: Deploying and Scaling the Data Architecture
SAS® Banking Intelligence Solutions Framework
[Diagram: Banking Intelligence Architecture]
Solutions: Cross-Sell and Up-Sell, Customer Retention, Marketing Automation, Strategic Performance Management, Credit Scoring, Credit Risk, New Solutions
▪ Integrated, extendable architecture
▪ Focused on business issues
▪ Based on experience
Independent Solutions
[Diagram: Enterprise Source Systems -> Extract and Cleanse Files -> Solution Data Marts -> Solutions]
Solutions:
▪ SAS® Credit Risk Management
▪ SAS® Cross-Sell and Up-Sell for Banking
▪ SAS® Customer Retention for Banking
▪ SAS® Credit Scoring for Banking
Integrated Data Model: Not All Customers Are the Same
▪ Customer A: No Data Warehouse
  • Interested in Multiple SAS Solutions
▪ Customer B: With Data Warehouse
  • Averse to Data Replication
▪ Customer C: With Data Warehouse
  • No Data Marts Allowed – Active Data Warehousing Approach
Customer A: Full SAS Data Architecture
[Diagram: Enterprise Source Systems -> Extract and Cleanse Files -> SAS Banking Detail Data Store -> Solution Data Marts -> Solutions]
Solutions:
▪ SAS® Credit Risk Management
▪ SAS® Cross-Sell and Up-Sell for Banking
▪ SAS® Customer Retention for Banking
▪ SAS® Credit Scoring for Banking
Flexible Options to Meet Customer Needs!
Customer B: Partial SAS Data Architecture
[Diagram: Enterprise Source Systems -> Extract and Cleanse Files -> Customer Enterprise Data Warehouse -> Solution Data Marts -> Solutions]
Solutions:
▪ SAS® Credit Risk Management
▪ SAS® Cross-Sell and Up-Sell for Banking
▪ SAS® Customer Retention for Banking
▪ SAS® Credit Scoring for Banking
Flexible Options to Meet Customer Needs!
Customer C: Customer Data Architecture
[Diagram: Enterprise Source Systems -> Extract and Cleanse Files -> Customer Enterprise Data Warehouse -> Information Maps -> Solutions]
Solutions:
▪ SAS® Marketing Automation
Scorecard for Data Architecture Approach

Data Management Issue                             Score
------------------------------------------------  -------
Sensitivity to Data Replication                   0 to -5
Sensitivity to H/W processor and storage budget   0 to -5
Existing warehouse quality                        0 to -5
Implementation time constraints                   0 to -5
Intentions to implement >1 SAS solution           0 to +5
Historical data requirements                      0 to +5

Score  Decision
-----  --------
-25    No DDS. Marts only if absolutely necessary. Information maps may be appropriate.
0      Use DDS to persist current extract from source systems. Marts hold multiple extracts up to full history.
+25    Implement full warehouse, persist history in DDS and as much as wanted in the marts.
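The scorecard arithmetic can be sketched in a few lines of Python. This is only an illustration: the short issue names are hypothetical, and because the slide gives three anchor scores but no cutoffs between them, the sketch assumes the decision whose anchor is closest to the total applies. With the six ranges as listed, totals fall between -20 and +10.

```python
# Allowed range per issue, transcribed from the scorecard.
# The dictionary keys are hypothetical short names, not SAS terminology.
RANGES = {
    "replication_sensitivity": (-5, 0),
    "hw_budget_sensitivity": (-5, 0),
    "existing_warehouse_quality": (-5, 0),
    "time_constraints": (-5, 0),
    "multiple_sas_solutions": (0, 5),
    "historical_data_needs": (0, 5),
}

# Anchor scores and decisions, transcribed from the scorecard.
DECISIONS = {
    -25: "No DDS. Marts only if absolutely necessary. "
         "Information maps may be appropriate.",
    0: "Use DDS to persist current extract from source systems. "
       "Marts hold multiple extracts up to full history.",
    25: "Implement full warehouse, persist history in DDS and "
        "as much as wanted in the marts.",
}


def total_score(scores):
    """Validate each issue score against its range and return the sum."""
    for name, (low, high) in RANGES.items():
        value = scores[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
    return sum(scores[name] for name in RANGES)


def nearest_decision(total):
    """ASSUMPTION: pick the anchor closest to the total (no cutoffs on the slide)."""
    anchor = min(DECISIONS, key=lambda a: abs(a - total))
    return anchor, DECISIONS[anchor]


# A site that is highly sensitive on every negative issue and has no
# multi-solution or history requirements.
example = {
    "replication_sensitivity": -5,
    "hw_budget_sensitivity": -5,
    "existing_warehouse_quality": -5,
    "time_constraints": -5,
    "multiple_sas_solutions": 0,
    "historical_data_needs": 0,
}
example_total = total_score(example)
example_anchor, example_text = nearest_decision(example_total)
```

Under the nearest-anchor assumption, the example site lands on the "No DDS" decision, which matches the Customer C profile described earlier.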
Techniques for Data Model Integration
▪ Detail Data Store
  • Varying Industries
  • General Standards
  • Warehousing Techniques
▪ Data Marts
  • Approach Compared to DDS
Integrating Models at the Industry Level
[Diagram: industry-specific subject areas around a common set of entities]
Common: Customer, Supplier, Employee, GL, Account, Product, etc.
Banking: Accounts, Account Transactions, etc.
Insurance: Premiums, Claims, Benefits, etc.
Telco: Subscriptions, Equipment, Networks, Calls, etc.
Detail Data Store: Standards Needed for Integration
▪ Data Types / Lengths / Classifier Codes
▪ Naming Conventions
▪ Standards for Data Structures
  • Hierarchies
  • Subtypes
  • Reference Data
Data Administration Standards

Domain                 Data Type  Width  Applicable Class Codes  Comment/Example
---------------------  ---------  -----  ----------------------  ---------------
Identifier             Varchar    32     ID                      Typically the identifier from the source system.
Small Code             Varchar    3      CD                      Short length codes such as ADDRESS_TYPE_CD.
Medium Code            Varchar    10     CD                      Medium length codes such as EXCHANGE_SYMBOL_CD.
Large Code             Varchar    20     CD                      Long length codes such as POSTAL_CD.
Standard Count Code    Numeric    6      CNT                     Standard counts such as AUTHORIZED_USERS_CNT.
Name                   Varchar    40     NM                      Proper name. For example, LAST_NM, FIRST_NM, etc.
Short Length Text      Varchar    20     TXT                     Short freeform text.
Medium Length Text     Varchar    100    TXT, DESC               Longer freeform text and descriptions associated with code tables.
Indicator Field        Character  1      FLG                     Binary indicator flag (Y or N).
Surrogate Key          Numeric    10     RK, SK                  Generated surrogate keys.
Currency Amount        Numeric    18,5   AMT                     Standard currency amount.
Rates and Percentages  Numeric    9,4    PCT, RT                 For example, exchange rates.
DateTime               Date              DT, DTTM                Accommodate dates as well as date/time.
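One way to put these standards to work is to check column names against the class-code suffixes. The sketch below transcribes the table into a Python dictionary and returns the candidate domains for a column name. The function name and the idea of an automated check are assumptions added for illustration; the slides do not describe such a tool.

```python
# Class-code suffix -> list of (domain, data type, width) taken from the table.
# A suffix such as CD maps to several domains, so each value is a list.
STANDARDS = {
    "ID": [("Identifier", "Varchar", "32")],
    "CD": [
        ("Small Code", "Varchar", "3"),
        ("Medium Code", "Varchar", "10"),
        ("Large Code", "Varchar", "20"),
    ],
    "CNT": [("Standard Count Code", "Numeric", "6")],
    "NM": [("Name", "Varchar", "40")],
    "TXT": [
        ("Short Length Text", "Varchar", "20"),
        ("Medium Length Text", "Varchar", "100"),
    ],
    "DESC": [("Medium Length Text", "Varchar", "100")],
    "FLG": [("Indicator Field", "Character", "1")],
    "RK": [("Surrogate Key", "Numeric", "10")],
    "SK": [("Surrogate Key", "Numeric", "10")],
    "AMT": [("Currency Amount", "Numeric", "18,5")],
    "PCT": [("Rates and Percentages", "Numeric", "9,4")],
    "RT": [("Rates and Percentages", "Numeric", "9,4")],
    "DT": [("DateTime", "Date", None)],    # the table lists no width for dates
    "DTTM": [("DateTime", "Date", None)],
}


def candidate_domains(column_name):
    """Return the domains allowed for a column, based on its final suffix.

    Hypothetical helper: an empty list means the name does not end in a
    recognized class code.
    """
    suffix = column_name.rsplit("_", 1)[-1].upper()
    return STANDARDS.get(suffix, [])
```

For example, `candidate_domains("BALANCE_AMT")` yields the single Currency Amount entry, while `candidate_domains("ADDRESS_TYPE_CD")` yields all three code domains, leaving the choice of width to the modeler.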
Detail Data Store: Data Warehousing Standards
Surrogate Keys, Point-in-Time, and Rapidly Changing Data

CUSTOMER
CUSTOMER_RK  VALID_FROM_DT  VALID_TO_DT  ACCOUNT_RK  MARITAL_STATUS_CD  FIRST_NM  LAST_NM
100          01JAN1999      29FEB2000    201         S                  John      Smith
100          01MAR2000      31DEC4747    201         M                  John      Smith

FINANCIAL_ACCOUNT
ACCOUNT_RK  VALID_FROM_DT  VALID_TO_DT  CUSTOMER_RK  FINANCIAL_ACCOUNT_TYPE_CD  OPEN_DT
201         01JAN1999      31DEC4747    100          SAVINGS                    01JAN2000

FINANCIAL_ACCOUNT_CHNG
ACCOUNT_RK  VALID_FROM_DT  VALID_TO_DT  BALANCE_AMT  CURRENCY_CD
201         01JAN1999      31JAN1999    2500.75      USD
201         01FEB1999      28FEB1999    4300.25      USD
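The point-in-time convention in these tables (a surrogate key plus VALID_FROM_DT and VALID_TO_DT, with 31DEC4747 marking the open-ended current row) can be illustrated with a small Python sketch. The helper names `as_of` and `apply_change` are hypothetical, and the sketch assumes that a change closes the current row on the day before the new row takes effect, as the CUSTOMER sample rows suggest.

```python
from datetime import date, timedelta

OPEN_END = date(4747, 12, 31)  # open-ended VALID_TO_DT used in the sample rows

# The two CUSTOMER rows from the slide (name columns omitted for brevity).
customer = [
    {"CUSTOMER_RK": 100, "VALID_FROM_DT": date(1999, 1, 1),
     "VALID_TO_DT": date(2000, 2, 29), "MARITAL_STATUS_CD": "S"},
    {"CUSTOMER_RK": 100, "VALID_FROM_DT": date(2000, 3, 1),
     "VALID_TO_DT": OPEN_END, "MARITAL_STATUS_CD": "M"},
]


def as_of(rows, key_col, key, when):
    """Return the version of a record that was valid on the given date."""
    for row in rows:
        if row[key_col] == key and row["VALID_FROM_DT"] <= when <= row["VALID_TO_DT"]:
            return row
    return None


def apply_change(rows, key_col, key, effective, **changes):
    """Close the current row and append a new version (assumed behavior)."""
    current = as_of(rows, key_col, key, OPEN_END)
    if current is None:
        raise KeyError(f"no current row for {key_col}={key}")
    new_row = {**current, **changes,
               "VALID_FROM_DT": effective, "VALID_TO_DT": OPEN_END}
    current["VALID_TO_DT"] = effective - timedelta(days=1)
    rows.append(new_row)
    return new_row


status_1999 = as_of(customer, "CUSTOMER_RK", 100, date(1999, 6, 15))["MARITAL_STATUS_CD"]
status_2000 = as_of(customer, "CUSTOMER_RK", 100, date(2000, 3, 1))["MARITAL_STATUS_CD"]
```

Keeping rapidly changing attributes such as BALANCE_AMT in a separate table (FINANCIAL_ACCOUNT_CHNG) means that a balance update adds a row there without creating a new version of the wider FINANCIAL_ACCOUNT row.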
Conformed Dimensions
Tools: Extending Models
[Diagram: entities CUSTOMER, SUPPLIER, COMPETITORS, INTERNAL_ORG, EXTERNAL_ORG, INTERNAL_ORG_ASSOC, INTERNAL_ORG_ASSOC_TYPE]
Change Analysis Tool
Deploying the Integrated Data Architecture
Option A: Full SAS Data Architecture
[Diagram: Enterprise Source Systems -> Extract and Cleanse Files -> SAS Banking Detail Data Store -> Solution Data Marts -> Solutions]
Solutions:
▪ SAS® Credit Risk Management
▪ SAS® Cross-Sell and Up-Sell for Banking
▪ SAS® Customer Retention for Banking
▪ SAS® Credit Scoring for Banking
Flexible Options to Meet Customer Needs!
Populate DDS and Data Mart
[Diagram: Source Data (Excel, SAS, SAP, Oracle, PeopleSoft) -> Flat File -> Data Warehouse (DDS) -> Banking Data Mart]
Step 1 - Extract, cleanse and transform from source data into flat file
Step 2 - ETL processing to load data warehouse
  • data validation
  • key creation
  • slowly changing dimensions
Step 3 - Transform into data mart model
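Two of the Step 2 tasks, data validation and key creation, can be sketched as follows. This is a minimal Python illustration, not the SAS ETL Studio implementation; the function names, the CUSTOMER_ID column, and the rule that a row needs a non-empty source identifier are all assumptions.

```python
def validate_rows(rows, required_cols):
    """Split rows into (good, rejected) based on required columns being filled."""
    good, rejected = [], []
    for row in rows:
        complete = all(row.get(col) not in (None, "") for col in required_cols)
        (good if complete else rejected).append(row)
    return good, rejected


def assign_surrogate_keys(rows, id_col, key_col, key_map):
    """Attach a surrogate key to each row, reusing keys for known source ids.

    key_map persists the source-id -> surrogate-key mapping between loads,
    so the same source identifier always receives the same key.
    """
    next_key = max(key_map.values(), default=0) + 1
    for row in rows:
        source_id = row[id_col]
        if source_id not in key_map:
            key_map[source_id] = next_key
            next_key += 1
        row[key_col] = key_map[source_id]
    return rows


# Hypothetical flat-file rows produced by Step 1.
extract = [
    {"CUSTOMER_ID": "C-001", "LAST_NM": "Smith"},
    {"CUSTOMER_ID": "", "LAST_NM": "Unknown"},
    {"CUSTOMER_ID": "C-002", "LAST_NM": "Jones"},
    {"CUSTOMER_ID": "C-001", "LAST_NM": "Smith"},
]
known_keys = {"C-001": 100}  # mapping carried over from an earlier load

good_rows, rejected_rows = validate_rows(extract, ["CUSTOMER_ID"])
loaded = assign_surrogate_keys(good_rows, "CUSTOMER_ID", "CUSTOMER_RK", known_keys)
```

Rows that pass validation and carry a surrogate key are then ready for the slowly changing dimension step, which decides whether each row opens a new version in the DDS.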
Deployment Focus
Scalability and Performance
▪ ETL flows
▪ Physical data model
Deployment: What Did We Do?
▪ Create and Generate Data
▪ Deploy Hardware and Software
▪ Populate DDS
▪ Populate Data Mart
▪ Analyze ETL Flows
▪ Analyze DDS Model
▪ Change Management
It All Starts with Data
▪ Bought and Built Data Generators
▪ Built Simulated Data
▪ Applied Business Rules
▪ Scaled: 5 GB -> 50 GB -> 500 GB -> 1 TB
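A simulated-data generator along these lines might look like the Python sketch below. Everything specific in it is an assumption made for illustration: the column subset, the CHECKING account type, the balance range, and the business rule that a savings balance is never negative. The slides do not describe the actual generators or rules.

```python
import random
from datetime import date, timedelta


def generate_accounts(count, seed=0):
    """Generate reproducible simulated account rows.

    A fixed seed makes each scaled run repeatable, which helps when
    comparing ETL timings across data volumes.
    """
    rng = random.Random(seed)
    first_open = date(1999, 1, 1)
    rows = []
    for account_rk in range(1, count + 1):
        account_type = rng.choice(["SAVINGS", "CHECKING"])  # CHECKING is hypothetical
        balance = round(rng.uniform(-500.0, 10000.0), 2)
        # Hypothetical business rule: savings accounts cannot be overdrawn.
        if account_type == "SAVINGS" and balance < 0:
            balance = 0.0
        rows.append({
            "ACCOUNT_RK": account_rk,
            "FINANCIAL_ACCOUNT_TYPE_CD": account_type,
            "OPEN_DT": first_open + timedelta(days=rng.randrange(0, 1825)),
            "BALANCE_AMT": balance,
            "CURRENCY_CD": "USD",
        })
    return rows


small = generate_accounts(1_000, seed=42)
```

Scaling up is then a matter of raising `count` (or running several seeds in parallel) until the generated files reach the target volume.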
Deploy Hardware and Software
▪ Choose Software Components
  • SAS for the DDS or Data Warehouse
  • Databases for the DDS or Data Warehouse
  • SAS for the Data Marts
▪ Install and Configure SAS Software
▪ Configure Hardware
▪ Design for Progressively Larger Deployment Growth
Windows Server
Dell PowerEdge 1600SC
▪ Windows 2003
▪ Dual hyper-threaded 2.8 GHz processors
▪ 4 GB RAM
▪ 4 internal IDE drives
▪ 60 GB C drive
▪ 275 GB D drive
▪ Single I/O channel
5 GB -> 50 GB of data
AIX UNIX Servers
IBM P630 eServer
▪ AIX 5.3
▪ 4 processors
▪ 4 I/O channels
▪ 8 GB RAM
▪ 4x72 GB disks
▪ 14-drive SCSI storage array
▪ 50 GB -> 500 GB of data
IBM P670 eServer
▪ AIX 5.3
▪ 16 processors
▪ 8 x 1-gig fiber I/O channels
▪ Dynamic logical partitioning
▪ 2 TB disks
▪ 500 GB -> 1 TB of data
Populate DDS and Data Mart
▪ Ran ETL Flows
  • Registered in SAS Metadata Repository
  • Loaded Data into Tables
  • Used Slowly Changing Dimension Load Process
▪ Analyzed ETL Flows
Example of SAS ETL Studio Flow Analysis
Change Management
▪ Loaded New Release of DDS in TST Repository
▪ Compared PRD Repository to TST Repository
▪ Ran Batch Reports to Examine Differences
▪ Ran Impact Analysis on Column and Table
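The comparison between the PRD and TST repositories amounts to a structural diff of two model versions. The Python sketch below shows the idea on plain dictionaries; it does not use the SAS Metadata Repository or its tooling, and the representation of a model as table -> {column: type} is an assumption made for the example.

```python
def diff_models(prd, tst):
    """Report table- and column-level differences between two model versions.

    Each model is a dict of table name -> {column name: type string}.
    """
    report = {
        "added_tables": sorted(set(tst) - set(prd)),
        "dropped_tables": sorted(set(prd) - set(tst)),
        "changed_tables": {},
    }
    for table in sorted(set(prd) & set(tst)):
        old_cols, new_cols = prd[table], tst[table]
        changes = {
            "added": sorted(set(new_cols) - set(old_cols)),
            "dropped": sorted(set(old_cols) - set(new_cols)),
            "retyped": sorted(
                col for col in set(old_cols) & set(new_cols)
                if old_cols[col] != new_cols[col]
            ),
        }
        if any(changes.values()):
            report["changed_tables"][table] = changes
    return report


# Hypothetical before/after versions of a small part of the model.
prd_model = {
    "CUSTOMER": {"CUSTOMER_RK": "Numeric(10)", "LAST_NM": "Varchar(40)"},
    "SUPPLIER": {"SUPPLIER_RK": "Numeric(10)"},
}
tst_model = {
    "CUSTOMER": {"CUSTOMER_RK": "Numeric(10)", "LAST_NM": "Varchar(60)",
                 "MARITAL_STATUS_CD": "Varchar(3)"},
    "COMPETITORS": {"COMPETITOR_RK": "Numeric(10)"},
}
model_diff = diff_models(prd_model, tst_model)
```

Each entry in the report is a starting point for impact analysis: a dropped or retyped column is a candidate for breaking the ETL flows and marts that read it.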
What Did We Find?
▪ Specific Techniques that Work Best
▪ Recommendations
Tremendous Performance Gains!
Specific Techniques Examples: ETL Flows
▪ Parallel ETL flows
▪ SAS coding techniques to use
▪ Use hash tables instead of lookups
▪ Make sure the I/O buffer size is tuned
▪ Drop constraints
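The hash-table technique replaces a per-row search or a sort-and-merge with a single in-memory table keyed on the join column. In SAS this presumably refers to the DATA step hash object; the sketch below shows the same idea with a Python dictionary and is not SAS code. The helper names and sample rows are hypothetical.

```python
def build_lookup(rows, key_col):
    """Load the smaller table into a hash table once (one pass, no sort)."""
    return {row[key_col]: row for row in rows}


def join_with_lookup(detail_rows, lookup, key_col, take_cols):
    """Enrich each detail row with columns from the lookup table.

    Each probe is a constant-time hash lookup on average, so the large
    table is read once in its existing order.
    """
    joined = []
    for row in detail_rows:
        match = lookup.get(row[key_col])
        if match is None:
            continue  # unmatched rows are skipped in this sketch
        joined.append({**row, **{col: match[col] for col in take_cols}})
    return joined


accounts = [
    {"ACCOUNT_RK": 201, "FINANCIAL_ACCOUNT_TYPE_CD": "SAVINGS"},
]
balances = [
    {"ACCOUNT_RK": 201, "BALANCE_AMT": 2500.75},
    {"ACCOUNT_RK": 201, "BALANCE_AMT": 4300.25},
    {"ACCOUNT_RK": 999, "BALANCE_AMT": 10.00},  # no matching account
]

account_lookup = build_lookup(accounts, "ACCOUNT_RK")
enriched = join_with_lookup(balances, account_lookup, "ACCOUNT_RK",
                            ["FINANCIAL_ACCOUNT_TYPE_CD"])
```

The approach pays off when the lookup table fits in memory; otherwise a sorted merge or an indexed join may still be the better choice.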
Specific Techniques Examples: DDS Model
▪ Indexes: when and when not to add
▪ Denormalized some tables
▪ Separate tables for data with high-volume changes
▪ Partition data by usage (date ranges)
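Partitioning by date range lets a query or load touch only the slices it needs. The Python sketch below partitions rows by calendar month and then reads a date range by scanning only the overlapping partitions. Monthly granularity and the helper names are assumptions; the slides do not say how the DDS tables were actually partitioned.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(rows, date_col):
    """Group rows into partitions keyed by (year, month) of the given column."""
    partitions = defaultdict(list)
    for row in rows:
        stamp = row[date_col]
        partitions[(stamp.year, stamp.month)].append(row)
    return dict(partitions)


def rows_in_range(partitions, start, end, date_col):
    """Return rows dated start..end, reading only partitions that overlap."""
    first, last = (start.year, start.month), (end.year, end.month)
    selected = []
    for month_key in sorted(partitions):
        if first <= month_key <= last:
            selected.extend(
                row for row in partitions[month_key]
                if start <= row[date_col] <= end
            )
    return selected


# Balance rows modeled on FINANCIAL_ACCOUNT_CHNG; the March row is made up.
balance_rows = [
    {"ACCOUNT_RK": 201, "VALID_FROM_DT": date(1999, 1, 1), "BALANCE_AMT": 2500.75},
    {"ACCOUNT_RK": 201, "VALID_FROM_DT": date(1999, 2, 1), "BALANCE_AMT": 4300.25},
    {"ACCOUNT_RK": 201, "VALID_FROM_DT": date(1999, 3, 1), "BALANCE_AMT": 4100.00},
]
by_month = partition_by_month(balance_rows, "VALID_FROM_DT")
february = rows_in_range(by_month, date(1999, 2, 1), date(1999, 2, 28), "VALID_FROM_DT")
```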
Recommendations
▪ Debugging techniques
▪ Sorting and memory usage
▪ Joins
▪ Understand disk requirements
▪ I/O optimization
▪ Compression and performance
Above All
▪ Write ETL
▪ Test, Tune
▪ Test, Tune
▪ Test, Tune!!!!
Summary and Conclusions
▪ Data integration is key
▪ Different customers need different approaches
▪ Change management is vital
▪ Performance tuning is vital
▪ Technology is evolving
Questions?