Building and Implementing Integrated Data Models
Nancy Wills, Director, Access, Query and Data Mgmt
Ralph Hollinshead, Manager, Solutions Data Integration
Copyright © 2004, SAS Institute Inc. All rights reserved.

Overview
• Part One: Building an Integrated Data Model
• Part Two: Deploying and Scaling the Data Architecture

SAS® Banking Intelligence Solutions Framework
• New solutions: Cross-Sell, Up-Sell, Customer Retention, Marketing Automation, Strategic Performance Management, Credit Scoring, Credit Risk
• Banking Intelligence Architecture: an integrated, extendable architecture, focused on business issues and based on experience

Independent Solutions
Diagram: Enterprise Source Systems -> Extract and Cleanse Files -> Solution Data Marts -> Solutions (SAS® Credit Risk Management, SAS® Cross-Sell and Up-Sell for Banking, SAS® Customer Retention for Banking, SAS® Credit Scoring for Banking)

Integrated Data Model: Not All Customers Are the Same
• Customer A: No data warehouse; interested in multiple SAS solutions
• Customer B: Has a data warehouse; averse to data replication issues
• Customer C: Has a data warehouse; no data marts allowed (active data warehousing approach)

Customer A: Full SAS Data Architecture
Diagram: Enterprise Source Systems -> Extract and Cleanse Files -> SAS Banking Detail Data Store -> Solution Data Marts -> Solutions (SAS® Credit Risk Management, SAS® Cross-Sell and Up-Sell for Banking, SAS® Customer Retention for Banking, SAS® Credit Scoring for Banking)
Flexible options to meet customer needs!
Customer B: Partial SAS Data Architecture
Diagram: Enterprise Source Systems -> Extract and Cleanse Files -> Customer Enterprise Data Warehouse -> Solution Data Marts -> Solutions (SAS® Credit Risk Management, SAS® Cross-Sell and Up-Sell for Banking, SAS® Customer Retention for Banking, SAS® Credit Scoring for Banking)

Customer C: Customer Data Architecture
Diagram: Enterprise Source Systems -> Extract and Cleanse Files -> Customer Enterprise Data Warehouse -> Information Maps -> Solutions (SAS® Marketing Automation)

Scorecard for Data Architecture Approach
Data Management Issue | Score
Sensitivity to data replication | 0 to -5
Sensitivity to H/W processor and storage budget | 0 to -5
Existing warehouse quality | 0 to -5
Implementation time constraints | 0 to -5
Intentions to implement >1 SAS solution | 0 to +5
Historical data requirements | 0 to +5

Score | Decision
-25 | No DDS. Marts only if absolutely necessary. Information maps may be appropriate.
0 | Use DDS to persist the current extract from source systems. Marts hold multiple extracts, up to full history.
+25 | Implement a full warehouse; persist history in the DDS and as much as wanted in the marts.

Techniques for Data Model Integration
• Detail Data Store: varying industries; general standards; warehousing techniques
• Data Marts: approach compared to DDS

Integrating Models at the Industry Level
• Shared core: Customer, Supplier, Employee, GL Account, Product, etc.
• Banking: Accounts, Account Transactions, etc.
• Insurance: Premiums, Claims, Benefits, etc.
• Telco: Subscriptions, Equipment, Networks, Calls, etc.
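The scorecard can be applied mechanically. The Python sketch below is a hedged illustration, not part of the SAS methodology: it assumes each issue is rated 0 to 5 by the project team, that the four issues marked with a minus sign subtract from the total and the two marked with a plus sign add to it, and that the decision is taken from whichever anchor score (-25, 0, +25) lies closest to the total. The issue keys, `score_architecture`, and the closest-anchor rule are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical keys for the six scorecard issues; each is rated 0-5.
# Issues marked "-" on the slide subtract from the total, "+" issues add to it.
COST_ISSUES = (
    "data_replication_sensitivity",
    "hw_budget_sensitivity",
    "existing_warehouse_quality",
    "time_constraints",
)
BENEFIT_ISSUES = (
    "multiple_sas_solutions",
    "historical_data_requirements",
)

# Anchor scores and the decisions the slide attaches to them.
DECISIONS = {
    -25: "No DDS. Marts only if absolutely necessary; information maps may be appropriate.",
    0: "Use DDS to persist the current extract; marts hold multiple extracts up to full history.",
    25: "Implement full warehouse; persist history in DDS and as much as wanted in the marts.",
}


def score_architecture(ratings: Dict[str, int]) -> Tuple[int, str]:
    """Return the signed total and the decision at the closest anchor score."""
    known = set(COST_ISSUES) | set(BENEFIT_ISSUES)
    for name, value in ratings.items():
        if name not in known:
            raise KeyError(f"unknown scorecard issue: {name}")
        if not 0 <= value <= 5:
            raise ValueError(f"{name} must be rated 0-5, got {value}")
    # Unrated issues default to 0, i.e. no influence on the decision.
    total = sum(ratings.get(k, 0) for k in BENEFIT_ISSUES) - sum(
        ratings.get(k, 0) for k in COST_ISSUES
    )
    closest = min(DECISIONS, key=lambda anchor: abs(anchor - total))
    return total, DECISIONS[closest]
```

For example, a team rating replication sensitivity 5, budget sensitivity 4, existing warehouse quality 5, and time constraints 3 (with no benefit ratings) totals -17, which lies closest to the -25 anchor: no DDS.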
Detail Data Store: Standards Needed for Integration
• Data types / lengths / classifier codes
• Naming conventions
• Standards for data structures: hierarchies, subtypes, reference data

Data Administration Standards
Domain | Data Type | Width | Applicable Class Codes | Comment/Example
Identifier | Varchar | 32 | ID | Typically the identifier from the source system.
Small Code | Varchar | 3 | CD | Short codes such as ADDRESS_TYPE_CD.
Medium Code | Varchar | 10 | CD | Medium-length codes such as EXCHANGE_SYMBOL_CD.
Large Code | Varchar | 20 | CD | Long codes such as POSTAL_CD.
Standard Count Code | Numeric | 6 | CNT | Standard counts such as AUTHORIZED_USERS_CNT.
Name | Varchar | 40 | NM | Proper names, for example LAST_NM, FIRST_NM.
Short Length Text | Varchar | 20 | TXT | Short free-form text.
Medium Length Text | Varchar | 100 | TXT, DESC | Longer free-form text and descriptions associated with code tables.
Indicator Field | Character | 1 | FLG | Binary indicator flag (Y or N).
Surrogate Key | Numeric | 10 | RK, SK | Generated surrogate keys.
Currency Amount | Numeric | 18,5 | AMT | Standard currency amount.
Rates and Percentages | Numeric | 9,4 | PCT, RT | For example, exchange rates.
DateTime | Date | - | DT, DTTM | Accommodates dates as well as date/time values.

Detail Data Store: Data Warehousing Standards
Surrogate Keys, Point-in-Time, and Rapidly Changing Data

CUSTOMER
CUSTOMER_RK | VALID_FROM_DT | VALID_TO_DT | ACCOUNT_RK | MARITAL_STATUS_CD | FIRST_NM | LAST_NM
100 | 01JAN1999 | 29FEB2000 | 201 | S | John | Smith
100 | 01MAR2000 | 31DEC4747 | 201 | M | John | Smith

FINANCIAL_ACCOUNT
ACCOUNT_RK | VALID_FROM_DT | VALID_TO_DT | CUSTOMER_RK | FINANCIAL_ACCOUNT_TYPE_CD | OPEN_DT
201 | 01JAN1999 | 31DEC4747 | 100 | SAVINGS | 01JAN2000

FINANCIAL_ACCOUNT_CHNG
ACCOUNT_RK | VALID_FROM_DT | VALID_TO_DT | BALANCE_AMT | CURRENCY_CD
201 | 01JAN1999 | 31JAN1999 | 2500.75 | USD
201 | 01FEB1999 | 28FEB1999 | 4300.25 | USD

Conformed Dimensions
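The CUSTOMER rows above show how point-in-time history works: the surrogate key (CUSTOMER_RK 100) stays constant, each version carries a VALID_FROM_DT/VALID_TO_DT interval, and the current version is left open with the far-future date 31DEC4747. A minimal Python sketch of that type 2 (slowly changing dimension) pattern follows. `CustomerVersion`, `as_of`, and `apply_change` are hypothetical names, only one attribute is versioned, and a real load would run in SAS or the database rather than in memory.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import List, Optional

# Far-future end date marking the current version, as on the slide (31DEC4747).
HIGH_DATE = date(4747, 12, 31)


@dataclass(frozen=True)
class CustomerVersion:
    """One row of a hypothetical CUSTOMER history table (subset of columns)."""
    customer_rk: int
    valid_from_dt: date
    valid_to_dt: date
    marital_status_cd: str


def as_of(history: List[CustomerVersion], customer_rk: int,
          when: date) -> Optional[CustomerVersion]:
    """Point-in-time lookup: the version whose [valid_from_dt, valid_to_dt] contains `when`."""
    for row in history:
        if row.customer_rk == customer_rk and row.valid_from_dt <= when <= row.valid_to_dt:
            return row
    return None


def apply_change(history: List[CustomerVersion], customer_rk: int,
                 effective_dt: date, marital_status_cd: str) -> List[CustomerVersion]:
    """Type 2 change: end-date the current version on the day before `effective_dt`
    and append a new open-ended version. The surrogate key stays the same."""
    current = as_of(history, customer_rk, HIGH_DATE)
    if current is not None and current.marital_status_cd == marital_status_cd:
        return list(history)  # attribute unchanged: no new version needed
    updated = [
        replace(row, valid_to_dt=effective_dt - timedelta(days=1)) if row is current else row
        for row in history
    ]
    updated.append(CustomerVersion(customer_rk, effective_dt, HIGH_DATE, marital_status_cd))
    return updated
```

Starting from a single row valid from 01JAN1999 and applying a marital-status change effective 01MAR2000 reproduces the slide: the first version now ends on 29FEB2000 and the new version runs to 31DEC4747.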
Tools: Extending Models
Diagram entities: CUSTOMER, SUPPLIER, COMPETITORS, INTERNAL_ORG, EXTERNAL_ORG, INTERNAL_ORG_ASSOC, INTERNAL_ORG_ASSOC_TYPE

Change Analysis Tool

Deploying the Integrated Data Architecture

Option A: Full SAS Data Architecture
Same architecture as Customer A: Enterprise Source Systems -> Extract and Cleanse Files -> SAS Banking Detail Data Store -> Solution Data Marts -> SAS banking solutions. Flexible options to meet customer needs!

Populate DDS and Data Mart
Diagram: Source data (Excel, SAS, SAP, Oracle, PeopleSoft) -> Flat File -> Data Warehouse (DDS) -> Banking Data Mart
• Step 1: Extract, cleanse, and transform source data into a flat file.
• Step 2: ETL processing to load the data warehouse: data validation, key creation, slowly changing dimensions.
• Step 3: Transform into the data mart model.

Deployment Focus: Scalability and Performance
• ETL flows
• Physical data model

Deployment: What Did We Do?
• Create and generate data
• Deploy hardware and software
• Populate DDS
• Populate data mart
• Analyze ETL flows
• Analyze DDS model
• Change management

It All Starts with Data
• Bought and built data generators
• Built simulated data
• Applied business rules
• Scaled: 5 GB -> 50 GB -> 500 GB -> 1 TB
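Step 2 of the DDS load lists key creation alongside data validation. The sketch below shows one common way to generate surrogate keys, assuming a persistent cross-reference from each (source system, source identifier) pair to an integer RK. The column names CUSTOMER_ID and SOURCE_SYSTEM, the function name, and the single validation rule are hypothetical; the slides do not say how the SAS load process stores its key map.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical cross-reference: (source system, source identifier) -> surrogate key.
KeyMap = Dict[Tuple[str, str], int]


def assign_surrogate_keys(
    rows: Iterable[dict], key_map: KeyMap, next_rk: int
) -> Tuple[List[dict], List[dict], int]:
    """Attach CUSTOMER_RK to each extracted row.

    Known natural keys reuse their surrogate key, so repeated loads stay stable;
    unseen keys get the next integer and are recorded in `key_map` (updated in place).
    Rows without a source identifier fail validation and are returned separately.
    Returns (loaded rows, rejected rows, next unused surrogate key).
    """
    loaded: List[dict] = []
    rejected: List[dict] = []
    for row in rows:
        source_id = str(row.get("CUSTOMER_ID") or "").strip()
        if not source_id:
            rejected.append(row)  # data validation: no identifier, cannot key the row
            continue
        natural_key = (row.get("SOURCE_SYSTEM", ""), source_id)
        if natural_key not in key_map:
            key_map[natural_key] = next_rk
            next_rk += 1
        loaded.append({**row, "CUSTOMER_RK": key_map[natural_key]})
    return loaded, rejected, next_rk
```

Keeping the map between loads gives a customer the same RK across extracts, and including the source system in the natural key keeps identical identifiers from different systems apart.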
Deploy Hardware and Software
• Choose software components: SAS for the DDS or data warehouse; databases for the DDS or data warehouse; SAS for the data marts
• Install and configure SAS software
• Configure hardware
• Design for progressively larger deployments (growth)

Windows Server
• Dell PowerEdge 1600SC, Windows Server 2003
• Dual hyper-threaded 2.8 GHz processors, 4 GB RAM
• 4 internal IDE drives: 60 GB C: drive, 275 GB D: drive
• Single I/O channel
• 5 GB -> 50 GB of data

AIX UNIX Servers
• IBM P630 eServer: AIX 5.3; 4 processors; 4 I/O channels; 8 GB RAM; 4 x 72 GB disks; 14-drive SCSI storage array; 50 GB -> 500 GB of data
• IBM P670 eServer: AIX 5.3; 16 processors; 8 x 1 Gb fiber I/O channels; dynamic logical partitioning; 2 TB disks; 500 GB -> 1 TB of data

Populate DDS and Data Mart
Ran ETL flows:
• Registered in the SAS Metadata Repository
• Loaded data into tables
• Used the slowly changing dimension load process

Analyze ETL Flows
Example of SAS ETL Studio flow analysis

Change Management
• Loaded the new release of the DDS into the TST repository
• Compared the PRD repository to the TST repository
• Ran batch reports to examine differences
• Ran impact analysis on columns and tables

What Did We Find?
• Specific techniques that work best
• Recommendations
• Tremendous performance gains!

Specific Techniques: ETL Flows
• Parallel ETL flows
• SAS coding techniques to use:
  • Use a hash table instead of a lookup
  • Make sure the I/O buffer size is tuned
  • Drop constraints
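In SAS, the hash table recommendation most likely refers to the DATA step hash object. The Python sketch below illustrates the same idea with a dict (itself a hash table): load the smaller lookup table into memory once, then stream the large table and probe once per row, so neither input has to be sorted first. It is a conceptual illustration, not SAS code; `build_lookup`, `enrich`, and the sample columns are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Iterator, List


def build_lookup(dimension_rows: Iterable[dict], key: str) -> Dict[Hashable, dict]:
    """Load the smaller lookup table into memory once, keyed on `key`.
    If a key repeats, the last row wins."""
    return {row[key]: row for row in dimension_rows}


def enrich(fact_rows: Iterable[dict], lookup: Dict[Hashable, dict],
           key: str, fields: List[str]) -> Iterator[dict]:
    """Stream the large table and probe the hash table once per row.
    Neither input needs sorting; unmatched rows are kept with None values
    (left-join semantics)."""
    for row in fact_rows:
        match = lookup.get(row[key])
        extra = {f: (match[f] if match is not None else None) for f in fields}
        yield {**row, **extra}
```

The trade-off is memory: the lookup table must fit in RAM, which is why the approach suits dimension-sized lookups against large fact or change tables.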
Specific Techniques: DDS Model
• Indexes: when and when not to add them
• Denormalized some tables
• Separate tables for data with high-volume changes
• Partition data by usage (date ranges)

Recommendations
• Debugging techniques
• Sorting and memory usage
• Joins
• Understand disk requirements
• I/O optimization
• Compression and performance

Above All
Write ETL. Test, tune. Test, tune. Test, tune!

Summary and Conclusions
• Data integration is key
• Different approaches for different customers
• Change management is vital
• Performance tuning is vital
• Technology is evolving

Questions?