Data Strategy in Practice
Sid Adelman & Associates
sidadelman@aol.com
818.783.9634

Data Strategy
  Module 1 – Introduction to Data Strategy
  Module 2 – Data Quality
  Module 3 – Metadata
  Module 4 – Organization, Roles & Responsibilities
  Module 5 – Security & Privacy
  Module 6 – Business Intelligence
  Module 7 – Information Integration
  Module 8 – Software/Products
  Module 9 – Performance & Measurement

Copyright Sid Adelman, 2007

Module 1 – Introduction to Data Strategy
  Components of a data strategy
  Why have a data strategy
  Do these problems exist in your organization?
  Gain control
  Support the IT strategy
  Data in the Dark Ages
  Enlightened data strategy
  Critical success factors
  How to implement a data strategy
  Best Practices

Components of a Data Strategy
  RDBMS - Relational Database Management System
  Data Quality
  Metadata
  Performance
  Data Distribution
  Organization
  Data Ownership

Components of a Data Strategy (continued)
  Security and Privacy
  Total Cost of Ownership
  Subject area databases
  Data modeling
  Data sharing
  Business Intelligence
  Information integration

Components of a Data Strategy (continued)
  Legacy/operational data
  Standards
  Data migration
  Application packages
  Software/products
  Personal/departmental databases

Components of a Data Strategy (continued)
  Categorization of data
  Communicating and selling the data strategy
  Measurement

Why Have a Data Strategy
  Capitalize on the data asset
  Support the IT Strategy
  Gain control

Do these problems exist in your organization?
  Uncontrolled redundant data
  Data not easily accessible by the user
  Lack of knowledge of available data
  Poor data quality
  Each new application designs, builds and populates its own database
  Inconsistent reports

Do these problems exist in your organization? (continued)
  Private databases
  No central metadata repository
  Management unclear on the importance of data
  No responsibility for data
  Data standards nonexistent, not understood or not followed

Gain Control
  Consistent security implementation
  Understand, define and assign ownership
  Understand, define and assign stewardship
  Minimize redundancy
  Inventory data
  Develop consistent terminology

Support the IT Strategy
  Provide departments, projects and personnel with guidelines for storing and accessing data
  Minimize the number of RDBMSs
  Establish, disseminate and maintain standards for shared data resources
  Deliver a high level of service
    – Performance
    – Availability
    – Response time
    – Responsiveness to user requests

Data in the Dark Ages
  Data is kept locked by each application or department
  Users do not trust the data
  Data is not well understood either by users or by IT
  Data is difficult to access
  Senior Management does not understand the value of data

Enlightened Organization
  Data is shared
  Users trust the accuracy of the data
  Data is inventoried and terminology is clear
  Data is easily accessed by IT and by the users
  Senior Management views data as an asset that is critical to the organization and to decision making

Critical Success Factors
  Data Strategy supports IT plans
  Quality data
  Support of legacy data
  Support of development efforts
  Infrastructure
    – Organization
    – Skills
    – Tools
  Achieve short-term successes

How to Implement a Data Strategy
  Data environment assessment
  Establish a target data environment
  Develop an implementation plan
  Sell Data Strategy within the organization
  Evaluate progress and justify your existence
  Revisit the plan

Best Practices
  Don’t get into the details too soon
  Don’t be seen as a theorist -- your actions must be pragmatic
  Don’t lead with long-term deliverables
  Don’t commit more than you can deliver
  Avoid unproven technology

Module 1 Workshop
  Assessment of Existing Organization

Module 2 – Data Quality
  Management Support
  Evaluation/Diagnosis
  Timeliness
  ETL Validation
  Prioritization - Which Data to Clean First
  Cost of Cleansing
  Responsibility for Data Quality

Management Support
  Management awareness of importance of data quality
  Cost justification of data quality initiative
  Ongoing commitment
  Finding a business management sponsor

Evaluation/Diagnosis
  Which source data is most correct
  Valid values (domains)
  Business rules
  Data types (e.g., hex, packed decimal)
  Completeness
  Inappropriate defaults
  Fields used for multiple purposes
  Accuracy
  Quality of historical data

Data Timeliness
  Currency of data, e.g., last Friday
  Frequency of update, e.g., daily, weekly, monthly, quarterly
  User awareness – how will the users know?

ETL Validation
  Validation of ETL process
  Tie-outs
    – Number of records
    – Dollar matching
    – Quantitative matching
  Automatic versus manual checking
  Referential integrity?
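
The tie-outs and the referential integrity check on the ETL Validation slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the course material: the function names (`tie_out`, `orphaned_keys`), the field names, and the sample rows are all hypothetical, and rows are assumed to be plain dictionaries already read from the source and target systems.

```python
from decimal import Decimal


def tie_out(source_rows, target_rows, amount_field, quantity_field):
    """Compare record counts, dollar totals, and quantity totals after a load."""
    checks = {
        # Tie-out 1: number of records
        "record_count": (len(source_rows), len(target_rows)),
        # Tie-out 2: dollar matching (Decimal avoids float rounding on money)
        "dollar_total": (
            sum(Decimal(str(r[amount_field])) for r in source_rows),
            sum(Decimal(str(r[amount_field])) for r in target_rows),
        ),
        # Tie-out 3: quantitative matching
        "quantity_total": (
            sum(r[quantity_field] for r in source_rows),
            sum(r[quantity_field] for r in target_rows),
        ),
    }
    # Report only the checks whose source and target values disagree.
    return {name: pair for name, pair in checks.items() if pair[0] != pair[1]}


def orphaned_keys(child_rows, parent_rows, foreign_key, parent_key):
    """Referential integrity: foreign key values that have no parent row."""
    parent_ids = {p[parent_key] for p in parent_rows}
    return sorted({c[foreign_key] for c in child_rows} - parent_ids)


# Hypothetical sample data: three source orders, two known customers.
source = [
    {"order_id": 1, "customer_id": "C1", "amount": "19.99", "qty": 2},
    {"order_id": 2, "customer_id": "C2", "amount": "5.01", "qty": 1},
    {"order_id": 3, "customer_id": "C9", "amount": "10.00", "qty": 4},
]
target = source[:2]  # simulate a load that dropped order 3
customers = [{"customer_id": "C1"}, {"customer_id": "C2"}]

mismatches = tie_out(source, target, "amount", "qty")
orphans = orphaned_keys(source, customers, "customer_id", "customer_id")
print(mismatches)
print(orphans)
```

An empty result from `tie_out` means all three tie-outs agree. Running checks like these on every load is one way to favor automatic over manual checking.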

Triage - Prioritization
  Which data to clean
  Justification for cleansing
  Ease of cleansing
  Possibility of cleansing
  Political support for cleansing

Cost of Cleansing
  Automatic versus manual
    – Tools to perform automatic cleansing
    – Effort to support use of tools
  Use of defaults
  Knowledge/experience of those performing manual cleansing

Responsibility for Data Quality
  “It’s not enough to say that data quality is everyone’s responsibility.”
  Data Quality Administrator
  Ongoing commitment
  Data ownership responsibility
  Operational versus data warehouse responsibility

Data Quality – Best Practices
  Inventory the quality of your data
  Sell the importance of data quality to management
  Assign data quality responsibility
  Triage the cleansing process

Module 2 Workshop
  Data Quality

Module 3 – Metadata
  Management Support
  Metadata as the Keystone
  Which Metadata to Capture
  Responsibility for Capture
  Responsibility for Maintenance
  Business Metadata
  Technical Metadata
  How will Metadata be Used
  Data Inventory

Metadata – Management Support
  IT and the Business
  Management understanding of the importance of metadata
  Impact on project schedules
  Long term benefit of metadata
  Importance for operational and data warehouse

Metadata as the Keystone
  Single version of the truth
  It’s the inventory of information
  Tears down dysfunctional information fiefdoms
  Opportunities to reduce redundancy
  Opportunities for integration

Which Metadata to Capture
  Don’t boil the ocean
  What metadata is valuable
  Ease and cost of capture
  Political issues relating to capture

Responsibility for Capturing Metadata
  Incentive for capturing
  Management direction
  Automatic and manual

Responsibility for Metadata Maintenance
  Where does Metadata Repository maintenance report?
  Why is maintenance important?
  Long-term commitment

Business Metadata
  Business definitions
  Source of data
  How data was derived (algorithms)
  Lineage (data genealogy)
  Timeliness
  Security
  Ownership
  Quality

Technical Metadata
  Field name
  Database
  Data type
  Source
  Length

How Will Metadata be Captured
  Data modeling tools
  ETL tool
  Access and analysis tool
  Metadata Repository tool
  Data dictionary
  Copybooks
  Home grown application

How Will Metadata be Used
  Business
    – Understanding the data
    – Understanding the meaning of results
    – Avoiding incorrect conclusions
  IT
    – Research
    – Impact analysis
    – Tool interchange

Inventory
  Where is the data?
  How and where is it used?
  Quality of data
  Redundancy
  Ownership
  Documentation

Metadata – Best Practices
  Determine which metadata to capture and use
  Determine how the tools will capture and use metadata
  Sell management on the importance
  Assign metadata responsibility

Module 3 Workshop
  Metadata

Module 4 – Organization: Data-related Roles & Responsibilities
  Database Administrator
  Data Administrator
  Data Quality Administrator
  Security
  Architect
  Data ownership

Database Administrator
  Database design
  Backup and recovery
  Reorganization
  Monitoring
  Tuning
  Index creation

Data Administrator
  Data modeling
  Source data evaluation
  Enterprise data integration
  Data quality analysis
  Metadata responsibility

Data Quality Administrator
  Uncovering data quality problems
  Communicating data quality problems
  ETL verification
  Responsibility for some cleansing

Security
  Responsibility for who can do what to the data
    – Data access
    – Data create/update/delete
  Working with those administering the tools that have security capabilities

Architect
  Knowing what the enterprise needs
  Evaluating technical options
  Developing an appropriate architecture
  Selling the architecture

Data Ownership
  Creation
  Access
  Determine requirements for performance
  Determine requirements for availability
  Determine historical requirements

Creation
  Data Entry process
    – Training
    – Incentives for quality
  Quality of data
  Data edits

Access
  Need to know
  Opt in/Opt out
  Level of granularity
  By department
  By role
  External access by people outside the organization

Performance Requirements
  Response time
  What is excellent response time worth?
  Timeliness

Availability Requirements
  How many hours and days does the system need to be available?
  What is the availability requirement during scheduled hours?

Historical Requirements
  How far back to keep the data
  How detailed does old data need to be?
  Impact of code changes and organizational changes over time

Organization – Best Practices
  Establish the appropriate organization for your enterprise
  Enumerate roles and responsibilities
  Gain concurrence for roles and responsibilities
    – Management
    – Those performing the functions

Module 4 Workshop
  Organization

Module 5 – Security & Privacy
  Categorization for security
  Responsibility for determining
  Mechanism for establishing procedures
  Security audit
  Regulatory issues
  Data sharing

Categorization for Security/Privacy
  Does all data have the same security/privacy requirements?
  Who determines security/privacy requirements of data?
  What are the regulatory requirements for security and privacy?
  Does your organization have a Security Office?
  What authority do they have?

Responsibility
  Security Office
  Internal auditors?
  Data Owners
  Responsibility for administering
  Testing security and privacy

Mechanism for Establishing Procedures
  Security requirements
    – Internal
    – Regulatory
  Tools that implement security
  Communicating security requirements to those who implement

Security Audit
  Validating procedures
  Validating training
  Testing and probing
  Recommending mitigation
  Frequency of audits

Regulatory Issues
  Health Care – HIPAA
  Finance
  Brokerage – SEC
  Insurance
  Media – FCC

Data Sharing
  Inhibitors
  Motivation/incentives to share
  Management directives on sharing

Inhibitors
  Power
  Fear of others
  Fear of boss micromanaging

Motivation/incentives to share
  Are there any?

Management Direction on Sharing
  Direction to share must come from the CEO
    – Need to know
    – Reason for withholding access must be documented
    – Access only given when directed

Security & Privacy – Best Practices
  Raise the consciousness of security and privacy requirements
  Connect with your Security Office
  Determine security capabilities of tools
  Assign responsibilities
  Test and validate

Module 5 Workshop
  Security & Privacy

Module 6 – Business Intelligence
  Goals and Objectives
  Architecture
  Data Mining
  Tools
  Methodology

Goals and Objectives
  Why have a data warehouse?
  Have goals and objectives been identified?
  Have they been communicated?
  Are they measured post-implementation?

Architecture
  Platform
  Tools/products
  How the data flows

Data Mining
  Discovery versus hypothesis testing
  Different tools
  Different people mining the data

Tools
  RDBMS
  Data Modeling
  ETL
  Access and Analysis
  Data quality (Cleansing)
  Measurement

Methodology
  Spiral versus waterfall
  Phasing more appropriate
  Tasks more difficult to estimate

Business Intelligence – Best Practices
  Set goals and objectives
  Set expectations early and often
  Establish cost justification
  Find a terrific sponsor

Module 6 Workshop
  Business Intelligence

Module 7 – Information Integration
  Integrating business data
  Data redundancy
  Different RDBMSs and their impact
  Data migration

Integrating Business Data
  Understanding the customer
  ERPs
  Supply chain

Data Redundancy
  Goal to reduce data redundancy?
  Inconsistent data
  Single version of the truth
  Cost of data redundancy

Different RDBMSs & Their Impact
  More interface programs
  Less depth in DBA pool
  More product expense
  Integration problems
  Less optimizer capability

Data Migration
  Should data be dropped?
  Should data be converted?
  Should data be integrated/consolidated?

Should Data be Dropped?
  Is it even being used?
  What’s the cost of maintaining this data?
  Could another database be used in its place?
  Any political issues?
  Any regulatory issues?

Should Data be Migrated?
  Can we consolidate RDBMSs?
  What is the cost of migration?
  What is the impact on other systems?

Should Data be Integrated/Consolidated?
  Why do we want to integrate/consolidate?
  Costs of integration/consolidation
  Savings of integration/consolidation
  Political issues
  Regulatory issues

Information Integration – Best Practices
  Determine information integration benefits and costs
  Sell information integration to management
  Establish and execute priorities

Module 7 Workshop
  Information Integration

Module 8 – Software/Products
  RDBMS
  Tools/utilities
  Organization standards for products
  Criteria for selection
  Responsibility for Selection
  Single vendor/best of breed
  Deals/Negotiation
  Relationship with vendors
  Application packages

RDBMS
  Which RDBMS is the standard
  Relation to platform
  What applications is it being used for

RDBMS Choices
  IBM (DB2, IMS, Informix)
  Microsoft (SQL Server)
  Oracle
  Sybase
  Teradata

Why standardize the RDBMS?
  Minimize the number of RDBMSs
  Less training required
  More leverage on RDBMS vendor
  Flexible assignments
  Fewer interface problems
  Fewer interface programs

Relation to platform
  RDBMS performance impacted by platform
  Platform may dictate (or strongly recommend) RDBMS choice
  Which decision comes first?

What application is RDBMS being used for
  Operational/OLTP
  Data Warehouse/Business Intelligence

Tools/Utilities
  Platform dependent
  RDBMS dependent
  Expensive
  33% on the shelf
  Lots of product duplication
  Necessary?

Organization Standards for Products
  Who sets standards?
  Are the standards known?
  Are they standards or guidelines?
  Who can give dispensation?

Criteria for Selection
  Need
  Cost
  Vendor
    – Support
    – Reputation
    – Financial stability

Responsibility for Selection
  Technical evaluators
  Strategic architect
  Management

Single Vendor vs Best of Breed
  Single vendor
    – Possibly a better relationship
    – Leverage
    – Not always the best products
    – Products should all work together
  Best-of-breed
    – Need to integrate yourself
    – Finger pointing when problems
    – Potential incompatibilities

Deals/Negotiations
  Have someone else negotiate
  Don’t let vendor know you have chosen them before you negotiate
  www.dobetterdeals.com (Joe Auer – ComputerWorld)

Relationship with Vendors
  Partnerships
  Money
  Issues
  Support
  Conferences
  Being a reference

Databases Required by the Application Packages
  Packages do not support all RDBMSs
  Packages do not support all RDBMSs equally well
  Does preferred RDBMS violate organization standard
  Are support personnel (DBAs) available?

Impact of Package
  Machine Requirements
  Performance
  Availability

Software – Best Practices
  Determine real requirements
  Establish software standards
  Make use of existing software whenever possible
  Talk to organizations who are using the products

Module 8 Workshop
  Software/Products

Module 9 – Performance and Measurement
  Categorization for performance
  Capacity Planning
  Monitoring/Measuring
  Service Level Agreements
  Tuning
  Roles and Responsibilities
  Reporting performance

Categorization for Performance
  How good does response time need to be?
  How does it differ from application to application?
  What is the cost-benefit of excellent response time?
  Were performance considerations included in the architecture?

Categorization for Availability
  Scheduled hours (24 X 7, 18 X 6,…)
  Availability during scheduled hours
  How does it differ from system to system?
  Is excellent availability cost justified?
  Was availability included in the architecture?

Capacity Planning
  Database size
  Number of users
  Number of transactions
  Number of queries/reports
  Time and day of usage
  Complexity of transactions/queries/reports
  Proactive response to capacity increase

Monitoring/Measuring
  Response time
  Resource utilization (CPU, disk access, network)
  Who is using the system
  When is the system being used
  Chargebacks

Service Level Agreements
  Response time
  Availability
    – Scheduled hours (hours/day, days/week)
    – Availability during scheduled hours
  Timeliness of data
  Response to problems
  Response to new requests

Tuning
  Awareness of problems – measurement tools and responsibilities
  Tuning capability of platform, RDBMS, tools
  Responsibility for tuning

Roles and Responsibilities
  DBA - RDBMS
  Application performance
  Systems programmer – operating system
  System Architect
  Capacity Planner
  Performance testing

Reporting performance
  IT
    – Who needs to take action
    – Who needs to see reports/alerts
  Business
    – Matching project agreements
    – Expectations

Measurement Tools
  Performance
  Usage
  Resource utilization
  Network

Measurement Usage
  What do you do with the performance measurement information?

Reporting to Management
  High level (not detailed)
  Problems, aberrations
  Frequency
  Form (tables, charts, graphs)

Service Level Agreements (continued)
  Response time
  Availability
  Who establishes agreements?
  What’s realistic?
  Incentives to meet SLAs

Performance & Measurement – Best Practices
  Determine what is advantageous to measure
  Assign responsibilities
  Designate tools for measurement
  Report metrics to management

Module 9 Workshop
  Performance & Measurement

Overall Data Strategy Best Practices
  Don’t get into the details too soon
  Don’t be seen as a theorist -- your actions must be pragmatic
  Don’t lead with long-term deliverables
  Don’t commit more than you can deliver
  Avoid unproven technology

How to Implement a Data Strategy
  Conduct a data environment assessment
  Establish a target data environment
  Develop an implementation plan
  Sell Data Strategy within the organization
  Evaluate progress and justify your existence
  Revisit the plan

Summary
  Pitch the importance of a data strategy to your CIO and CTO
  Ask to either lead the effort or to be a permanent member of the team