Strategy for Data Governance
Replace with your name & organization • Las Vegas • February 18, 2008
© Copyright 2012 Your organization

Outline
Benefits of a data governance strategy
Components of a data governance strategy
Organization, roles and responsibilities
Impact of a data governance strategy on BI and IT
How to implement a data governance strategy program

Why you need a data governance strategy
CEO: “I would like an accounting of the company’s financial assets.”
CFO: “Uhh … let me see. I think we still have enough money in our bank accounts to cover payroll this month, and uhh … I’m not sure if there are any outstanding accounts receivable … Uhh and – hmm … let me think …”

Why you need a data governance strategy
CEO: “I would like an accounting of the company’s information assets.”
CIO: “Uhh … let me see. I don’t really have an inventory of all the data, and I’m not sure what data is in which database, or how much of that data is redundant and inconsistent. I also can’t vouch for the quality of the data … Uhh and – hmm … let me think …”

Do these problems exist in your organization?
Replace with your problems

Do these problems exist in your organization?
Room for more problems and issues

Motivations for Data Governance
SEC audits and risk of losing investors
Risk of fines and incarceration due to inaccurate regulatory reporting
Risk of losing customers due to poor data quality
Loss of productivity due to excessive and uncontrolled redundancy
Suboptimal business performance

Technology Solutions
Enterprise Resource Planning (ERP)
Data Warehousing (DW & BI)
Customer Relationship Management (CRM)
Supply Chain Management (SCM)
…

Data Warehousing
DW Promises | DW Reality
Data integration | Stove-pipe data marts and departmental data warehouses
No more uncontrolled data redundancy | Continued redundancy, sometimes even increased data redundancy
Consistency of data content | Data is still inconsistent among data marts and data warehouses (no central staging area, no reconciliation totals)
Improved data quality | Little improvement to data quality
Historical enterprise data | Historical data is limited to departmental views
Unlimited ad-hoc reporting | Limited ad-hoc reporting (too complicated, missing relationships, poor performance)
Reliable trend analysis reporting | Inconsistent trend analysis reports among data marts
Business intelligence capabilities | BI capabilities compromised by inconsistent and unreliable key performance indicators (KPI)

Customer Relationship Management
CRM Promises | CRM Reality
Data integration | More stove-pipe systems
Non-redundant customer data | Continued redundancy, more departmental views, purchased packages not integrated
Data quality | Dirty customer data continues
Increased customer satisfaction | Decreased customer satisfaction because of poor-quality customer data
Product pricing customization | Wrong pricing because of departmental views, still not cross-organizational
Knowledge of customer wallet share | Privacy issues and dirty data led to government regulations
The Lesson?
You cannot keep doing what you have always done and expect the results to be different. Not even with new technology.
“That wouldn’t be logical” (Spock, Star Trek)

Data Governance Defined … Consultants
“The execution and enforcement of authority over the management of data assets and the performance of data functions” (Robert Seiner)
“The process by which you manage the quality, consistency, usability, security, and availability of your organization’s data” (Jane Griffin)
“A process and structure for formally managing information as a resource. Ensures the appropriate people representing business processes, data, and technology are involved in the decisions that affect them; includes an escalation and decision path for identifying and resolving issues, implementing changes, and communicating resulting actions” (Danette McGilvray)

Data Governance Defined … Clients
“A framework of accountabilities and processes for making decisions and monitoring the execution of data management.” (BMO)
“Resolving data issues using a horizontal perspective of the organization and focusing on the major ‘pain points’ for our business areas.” (Sallie Mae)
“Unites people, process, and technology to change the way data assets are acquired, managed, maintained, transformed into information, shared across the company as common knowledge, and consistently leveraged by the business to improve profitability.” (Wachovia)

Data Governance Defined … Vendors
“The orchestration of people, process, and technology to enable the leveraging of data as an enterprise asset. It includes policies, procedures, organization, roles, and responsibilities, with associated communication and training required to design, develop, and provide ongoing support for the effort.” (SAP)
“An organization-wide commitment to data quality, with data stewardship recognized as an essential business role.” (DataFlux)

Data Governance Defined … Other
The execution of authority over the management of data
Data quality, including conformance to valid values, uniqueness, non-redundancy, completeness, accuracy, understandability, timeliness, and referential integrity
Metadata creation and maintenance: information about data, both technical and business
Master data management (MDM)
Data integration
Data categorization for performance, availability, and security

Outline
Benefits of a data governance strategy
Components of a data governance strategy
Organization, roles and responsibilities
Impact of a data governance strategy on BI and IT
How to implement a data governance strategy program

Components of a DG strategy
Data standardization
Data integration
Data modeling
Data quality
Metadata management
Security and privacy
Performance and measurement
DBMS and product selection
Business intelligence

Data standardization
Formal data definitions
Business data naming standards
Class words lexicon
Technical data naming standards
Common words lexicon
Data domain standards

Our Situation with Standardization
Insert your standardization status

Formal Data Definitions
A data definition must reflect the real-world meaning
A data definition explains the content and meaning of the unique data element
A data definition must be complete enough to ensure a thorough understanding of the data element
Example: Well Depth Feet
– Bad definition: “The depth of the well in feet”
– Good definition: “The total depth of the well in feet from the surface of the surrounding ground to the deepest point dug or drilled regardless of the depth of the well casing.”
Data definitions are short and precise (one paragraph) and (optionally) may contain examples
Data definitions should never contain information about the source or use of the data elements
Source: The DW Challenge by Michael Brackett

Data Naming Standards - Business
The name of an attribute should be derived from its definition
Attribute names are always fully spelled out
Attribute names should have 3 components:
– Prime word
– Qualifiers (modifiers)
– Class word
Example: “Checking Account Monthly Average Balance”
Attribute names should be fully qualified
Attribute names should always end with an approved class word
Use only class words from an approved class words lexicon
Attribute name components should be business terms, not technical terms

Class Words Lexicon
Approved and Published Business Data Domains
Amount . . . Dec 9,2
Balance . . . Dec 13,2
Code . . . Char 1-5
Count . . . Small Int
Indicator . . . Char 1
Name . . . Char 15-40
Number . . . Integer
Percent . . . Dec 5,2
Date . . . Date
Description . . . Varchar
Identifier . . . Integer
Quantity . . . Small Int
Rate . . . Dec 6,4
Text . . . Varchar 250

Data Naming Standards - Technical
The name of a column is composed of abbreviated attribute name components
Use only abbreviations from an approved common words lexicon (abbreviations list)
Column name components should always be abbreviated if an approved abbreviation exists, whether the column name is too long or not
Example: “CHKG_ACCT_MTHLY_AVG_BAL”
When a column name is too long, eliminate qualifiers starting with the least significant qualifier, then the second least significant, and so on
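
The business and technical naming rules above lend themselves to a mechanical check. The following is a minimal sketch in Python, not part of the original deck: the function name `column_name` and the two dictionaries are hypothetical, with entries copied from the class words and common words lexicons shown in this presentation. It assumes single-word name components, so the multi-word entry "Certificate of Deposit" is left out, and it does not implement the rule for shortening names that are too long.

```python
# Hypothetical sketch: derive a technical column name from a business
# attribute name, using lexicon entries copied from the slides.

# Approved class words (every attribute name must end with one of these).
CLASS_WORDS = {
    "Amount", "Balance", "Code", "Count", "Date", "Description",
    "Identifier", "Indicator", "Name", "Number", "Percent",
    "Quantity", "Rate", "Text",
}

# Approved abbreviations (single-word entries only; this is an assumption
# made to keep the sketch short).
COMMON_WORDS = {
    "Account": "ACCT", "Amount": "AMT", "Average": "AVG", "Balance": "BAL",
    "Checking": "CHKG", "Code": "CDE", "Count": "CNT", "Date": "DTE",
    "Description": "DESC", "Identifier": "ID", "Indicator": "IND",
    "Monthly": "MTHLY", "Name": "NM", "Number": "NBR", "Percent": "PCT",
    "Quantity": "QTY", "Rate": "RTE", "Savings": "SVG", "Text": "TXT",
}


def column_name(attribute_name: str) -> str:
    """Abbreviate each component of a fully spelled-out attribute name."""
    words = attribute_name.split()
    if not words or words[-1] not in CLASS_WORDS:
        raise ValueError(
            f"{attribute_name!r} does not end with an approved class word"
        )
    # Abbreviate whenever an approved abbreviation exists; otherwise keep
    # the word, upper-cased.
    return "_".join(COMMON_WORDS.get(word, word.upper()) for word in words)


print(column_name("Checking Account Monthly Average Balance"))
```

For the deck's own example attribute, the sketch produces the column name given on the slide, CHKG_ACCT_MTHLY_AVG_BAL. A name such as "Checking Account Monthly Average" is rejected because it does not end with a class word.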
Common Words Lexicon
Approved and Published Abbreviations List
Account . . . ACCT
Amount . . . AMT
Average . . . AVG
Balance . . . BAL
Checking . . . CHKG
Certificate of Deposit . . . CD
Code . . . CDE
Count . . . CNT
Date . . . DTE
Description . . . DESC
Identifier . . . ID
Indicator . . . IND
Monthly . . . MTHLY
Name . . . NM
Number . . . NBR
Percent . . . PCT
Quantity . . . QTY
Rate . . . RTE
Savings . . . SVG
Text . . . TXT

Data Domain Standards
Every attribute (data element) must be atomic
Every attribute must be unique (no synonyms, no homonyms)
Every attribute identifies or describes only one business object (entity) in the real world
Every attribute must have business metadata (name, definition, business rules, owner, source, etc.)
Every attribute must have a predefined data domain
Data domains must be based on EDM data quality rules
Business metadata and data domains are defined and maintained by business people

Data Standardization – Best Practices
Provide training in data administration principles
Create formal data definitions
Create fully qualified business data names
Apply the data domain standards
Create and use class words and common words lexicons
Publish the data standards

Standardization – What we need to do
Enter your proposed actions

Data Integration
Look for potential duplicate entities by examining:
– Entity definitions
– Semantic intent
– Entity content
Ensure that each entity has one unique business identifier
Put one fact (attribute) in one place (entity) using the normalization rules
Look for potential duplicate attributes by examining:
– Attribute definitions
– Semantic intent
– Domains
Capture real-world business actions between entities as data relationships (not reporting patterns)

Single Version of The Truth
[Diagram: entity-relationship model based on normalization rules. Entities shown include Customer, Existing Customer, Potential Customer, Account, Payment, Order, Product, Product Category, Part, Supplier, Shipment, Warehouse, Salesperson, Salaried Salesperson, Commissioned Salesperson, Org Unit, and Org Structure.]

Unstructured data
Storage and administration
– Enterprise content management systems (ECMS)
– Check-in and check-out functionality
– Retention and archiving
– Backup and recovery
– Secure objects
Content reusability
Search and delivery
Combining structured and unstructured data

Data Integration – Best Practices
Determine data integration benefits and costs
Create an inventory of all your data
Use logical data modeling and normalization rules to find and remove synonyms and homonyms
Use a metadata repository to document the names and definitions of your business data
Don’t forget to integrate unstructured data with structured data

Data Integration – Our Status
Focus on the important data such as customer, supplier, agents, inventory, parts, loans, or whatever it is that runs your business. Include examples of where you are integrated and where not.
Data Integration – This is what we need to do
Enter your integration actions

Data modeling
Logical Data Model
– Business view of data
– Process Independent
– Project-specific model
– Business model
Enterprise Data Model
– Business view of data
– Process Independent
– Enterprise-wide model
– Enterprise information architecture
Physical Data Model
– Database model
– Database view of data
– Process Dependent
– Database-specific model

Data Modeling – Our Situation

Logical Data Model
Captures what an organization is and what it does in terms of:
– Business objects (entities)
– Business data (attributes)
– Business activities (relationships)
– Business rules (metadata)
– Business policies (metadata)
Not tailored for:
– Query or reporting pattern or tool
– Access or storage requirements
– Performance

Process Independence
Access path independent
Program independent
Query / report independent
Database independent
Tool independent (OLAP)
Language independent
Platform independent

Purpose of Logical Data Modeling
Facilitate data integration
Facilitate business analysis
Facilitate communication among business people
Improve productivity through reusability
Focus on data ownership as opposed to system ownership
Bring data quality problems to the surface
Separate process logic from data
Serve as the baseline data architecture for database design

Enterprise Data Model
“Single Version of the Truth”
Integrated 360° business view!
Supported by common data definitions, domains, and business rules.
[Diagram: enterprise entity-relationship model. Entities shown include Customer, Existing Customer, Potential Customer, Account, Payment, Order, Product, Product Category, Part, Supplier, Shipment, Warehouse, Salesperson, Salaried Salesperson, Commissioned Salesperson, Org Unit, and Org Structure.]

Physical Data Model
Database design based on physical attributes:
– Access patterns
– Size of tables
– Number of business users
– Location of business users
– Platform (Processor, DBMS)
– OLAP tools
Tailored for:
– Query or reporting pattern or tool
– Access and storage requirements
– Performance

Process Dependent
Access path dependent
Program dependent
Query / report dependent
Database dependent
Tool dependent (OLAP)
Language dependent
Platform dependent

Purpose of Physical Data Modeling
Facilitate database design
Focus on performance
Architect database structures:
– Tables
– Columns
– Primary keys
– Foreign keys
– Referential integrity rules

Data Modeling – Best Practices
Always create a logical business data model – do not just focus on database modeling
Sell the importance of creating an enterprise information architecture (enterprise data model) to management
Assign data modeling responsibilities (the enterprise data model should not be created by database designers)
Create a process to link the physical data models to the enterprise data model

Data Modeling – This is what we need to do
Enter your proposed data modeling actions

Data quality
At what level of DQ maturity is your organization?
[Figure: data quality maturity scale (based on CMM) with five levels: 1 Uncertainty, 2 Awakening, 3 Enlightenment, 4 Wisdom, 5 Certainty, running from short term to long term. Activities shown along the scale: program “abends”; discovery by accident; data profiling; data cleansing; limited data analysis; correcting source data and programs; addressing root causes; enterprise-wide DQ methods & techniques; proactive prevention; continuous process improvements; optimization.]

Data quality costs
Direct Costs of Non-Quality Information: Marketing Campaign
© Larry English, Improving DW and BI Quality
Each total is the per-instance cost × number of instances × number per year, with time costed at the $60/hour loaded rate (for example, 2.4 min = $2.40, × 167,141 × 1 = $401,138).
Cost item | Per instance | Number of instances | Number per year | Total cost per year
Time ($60/hour loaded rate):
Creating redundant occurrence | 2.4 min | 167,141 | 1 | $401,138
Researching correct address | 10 min | 5,000/mo | 12 | $600,000
Correcting address errors | 0.3 min | 6,000/mo | 12 | $21,600
Handling complaints from customers | 5.5 min | 974/yr | 1 | $5,357
Mail preparation | 0.1 min | 393,273 | 4 | $157,309
Materials, Facilities, Equipment:
Marketing brochure | $1.96 | 393,273 | 4 | $3,083,260
Postage | $0.52 | 393,273 | 4 | $818,008
Warehouse storage | $0.01 | 393,273 | 4 | $15,731
Shipping equipment and maintenance | $5,000/yr | 36% | 1 | $1,800
Computing resources:
CPU transactions | $0.02/trans | 393,273 | 4 | $31,462
Data storage | $0.001/mo | 393,273 | 12 | $4,719
Data backup | $0.005/mo | 393,273 | 12 | $23,596
Total Annual Costs | | | | $5,163,980

Data quality costs
Information Development Cost Analysis
© Larry English, Improving DW and BI Quality
Category | Number | Relative weight factor* | Average unit dev/maint cost | Total dev/maint expenses** | % of budget
Infrastructure basis:
Enterprise architected DBs | 200 | 0.75 | $15,000 | $3,000,000 |
Enterprise reusable create/update programs+ | 300 | 1.50 | $30,000 | $9,000,000 |
Total infrastructure expenses | | | | $12,000,000 | 24%
Value basis:
Total retrieve equivalent pgms+ | 300 | 1.00 | $20,000 | $6,000,000 |
Total value-adding expenses | | | | $6,000,000 | 12%
Cost-adding basis:
Redundant create/update pgms | 500 | 1.50 | $30,000 | $15,000,000 |
Interface/extract programs | 400 | 1.00 | $20,000 | $8,000,000 |
Redundant database files | 600 | 0.75 | $15,000 | $9,000,000 |
Total cost-adding expenses | 1,500 | | | $32,000,000 | 64%
Lifetime total** (portfolio total number) | 3,800 | | | $50,000,000 | 100%
* Determine relative effort to develop average unit of each category using effort to develop a retrieve program as “1.00”
+ For programs that retrieve some data and create/update other data, determine the percent of retrieve-only attributes and percent of create/update attributes (e.g., to retrieve customer data to create an order)
** Based on 3,800 application programs and database files in portfolio and $50 Million in development

Dummy (default) values
Defaults for mandatory fields
– SSN 999-99-9999
– Age 999
– Zip 99999
– Income 9,999,999.99
Impact: Inability to determine customer profiles
Impact: Inability to determine customer demographics

“Intelligent” dummy values
Defaults with meaning
– SSN 888-88-8888: Non-resident alien
– Income 999,999.99: Employee
– Age 000: Corporate customer
– Source Code ‘FF’: Account closed prior to 1991
Impact: Inability to write straightforward queries without knowing how to filter data

Missing Values
Operational systems do not always require informational or demographic data
– Gender
– Ethnicity
– Age
– Income
– Referring Source
Impact: Inability to analyze marketing channels

Multi-purpose fields
ONE field explicitly has MANY meanings
» Which business unit enters the data
» At what time in history it was entered
» A value in one or more other fields
Example: Appraisal Amount, redefined as Advertised Amount, redefined as Sold Date, redefined as Loan Type Code, redefined as ...
25 redefines = 25 attributes!
Not mutually exclusive!
Only the value of one is known for each record!
Impact: Inability to judge product profitability

Cryptic values (1)
Often found in “Kitchen Sink” fields
» Usually one byte (if not one bit)
» Highly cryptic (A, B, C, 1, 2, 3, ...)
» Non-intelligent, non-intuitive codes
» Often not mutually exclusive
Impact: Inability to empower end users to write their own queries

Cryptic values (2)
ONE field implicitly has MANY meanings
Master_Cd {A, B, C, D, E, F, G, H, I}
– {A, B, C}: Type of customer
– {D, E, F}: Type of supplier
– {G, H, I}: Regional constraints

Free-form address lines
Unstructured text
» no discernible pattern
» cannot be parsed
address-line-1: ROSENTHAL, LEVITZ, A
address-line-2: TTORNEYS
address-line-3: 10 MARKET, SAN FRANC
address-line-4: ISCO, CA 95111
Impact: Inability to perform market analysis

Contradicting values
Values in one field are inconsistent with values in another related field
– 1488 Flatbush Avenue, New York, NY 75261 (a Texas Zip)
– Type of real property: Single Family Residence, but Number of rental units: four (income property)
Impact: Inability to make reliable business decisions

Violation of business rules
Business Rule: Adjustable Rate Mortgages must have
» Maximum Interest Rate (Ceiling)
» Minimum Interest Rate (Floor)
Business Rule: A Ceiling is always higher than a Floor
ceiling-interest-rate: 8.25
floor-interest-rate: 14.75
(switched?)
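
The two business rules for adjustable rate mortgages can be expressed as an automated data quality check. What follows is a minimal sketch in Python under stated assumptions: the record type `AdjustableRateMortgage`, its field names, the `loan_id` value, and the function `rule_violations` are hypothetical and only illustrate the rules on this slide; they are not taken from any tool named in the deck.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdjustableRateMortgage:
    loan_id: str                            # hypothetical identifier
    ceiling_interest_rate: Optional[float]  # maximum interest rate
    floor_interest_rate: Optional[float]    # minimum interest rate


def rule_violations(loan: AdjustableRateMortgage) -> List[str]:
    """Return a description of each business rule the record violates."""
    problems = []
    # Rule 1: an adjustable rate mortgage must have both a ceiling and a floor.
    if loan.ceiling_interest_rate is None:
        problems.append("missing ceiling (maximum interest rate)")
    if loan.floor_interest_rate is None:
        problems.append("missing floor (minimum interest rate)")
    # Rule 2: a ceiling is always higher than a floor (checked only when
    # both values are present).
    if not problems and loan.ceiling_interest_rate <= loan.floor_interest_rate:
        problems.append("ceiling is not higher than floor")
    return problems


# The values from the slide: ceiling 8.25, floor 14.75.
suspect = AdjustableRateMortgage("L-001", 8.25, 14.75)
print(rule_violations(suspect))
```

The check only reports the violation. Deciding whether the two values were switched at data entry, as the slide suspects, still requires going back to the source.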
Impact: Inability to calculate product profitability

Reused primary key
Little history, if any, stored in operational files
» primary keys are customarily re-used
» may have a different rollup structure
January ‘94: branch 501 = San Francisco Main, region 1, area SW
August ‘97: branch 501 = San Luis Obispo, region 2, area SW
Impact: Inability to evaluate organizational performance

Non-unique primary key
Duplicate identification numbers
» Multiple customer numbers
Customer Name | Phone Number | Cust. Number
Philip K. Sherman | 818.357.5166 | 960601
Philip K. Sherman | 818.357.7711 | 960105
Philip K. Sherman | 818.357.8911 | 960003
» Multiple employee numbers
Date | Employee Name | Department | Empl. Number
July 1995 | Bob Smith | 213 (HR) | 21304762
January 1996 | Bob Smith | 432 (SRV) | 43218221
August 1999 | Bob Smith | 206 (MKT) | 20684762
Impact: Inability to determine customer relationships
Impact: Inability to analyze employee benefits trends

Missing data relationships
Data that should be related to other data in a dependent (parent-child) relationship
[Diagram: Branch, Employee, Benefit]
» Branch number 0765 does not exist in the BRANCH table
Impact: Inability to produce accurate rollups

Inappropriate data relationships
Data that is inadvertently related, but should not be
» two entity types with the same key values
Purchaser: Jackie Schmidt, 837221
Seller: Robert Black, 837221
Impact: Inability to determine customer or vendor relationships

Management Support
Management awareness of importance of data quality
Cost justification of data quality initiative
Ongoing commitment
Finding a business management sponsor

Triage - Prioritization
Which data to cleanse
Justification for cleansing
Ease of cleansing
Possibility of cleansing
Political support for cleansing

Cost of Cleansing
Automatic versus manual
– Tools to perform automatic cleansing
– Effort to support use of tools
Use of defaults
Knowledge/experience of those performing manual cleansing

Responsibility for Data Quality
“It’s not enough to say that data quality is everyone’s responsibility.”
Data Quality Administrator
Ongoing commitment
Data ownership responsibility
Operational versus data warehouse responsibility

Data Quality – Best Practices
Inventory the quality of your data
Sell the importance of data quality to management
Assign data quality responsibility
Triage the cleansing process

Data Quality – Our Status
Enter all the major problems you have or anticipate with data quality and don’t limit yourself to one slide.

Data Quality – What Steps We Should Take to Improve
Enter all the practical steps you should take and prioritize them. Don’t limit yourself to one slide.

Metadata Management
Master Metadata
– Technical Metadata (Developer’s View): Tables, Columns, Keys (primary/foreign), Ref. Integrity Rules, Indexes, ETL rules, Process logic
– Business Metadata (User’s View): Business Names, Data Definitions, Data Domains, Data Relationships, Business Rules, DQ Rules, Data Integrity Rules
– Usage Metadata (Administrator’s View): Data Lineage, Data Location, Data Usage, Data Volumes, Load Statistics, Error Statistics

Metadata is everywhere
[Diagram: a metadata migration process loads a metadata repository from sources maintained by technicians and business people. People shown: business analysts, data administrator, database administrator, ETL developer, application developer, data mining expert. Sources shown: word processing files, spreadsheets, CASE tools, DBMS dictionaries, ETL tools, OLAP tools, data mining tools. The repository offers a technician’s view (technical metadata) and a business person’s view (business metadata).]

Metadata as the Keystone
Single version of the truth
It’s the inventory of information
Tears down dysfunctional information fiefdoms
Opportunities for data standardization

Management Support for Metadata
IT and the Business
Management understanding of the importance of metadata
Impact on project schedules
Long term benefit of metadata
Importance for operational and data warehouse

Which Metadata to Capture
Don’t boil the ocean
What metadata is valuable
Ease and cost of capture
Political issues relating to capture

Responsibility for Capturing Metadata
Incentive for capturing
Management direction
Automatic and manual

Responsibility for Maintaining Metadata
Where does Metadata Repository Administration report?
Why are administration and maintenance important?
Long-term commitment

How Metadata Is Used
Business
– Understanding the data
– Understanding the meaning of results
– Avoiding incorrect conclusions
IT
– Research
– Impact analysis
– Tool interchange

Metadata – Best Practices
Determine which metadata to capture and use
Determine how the tools will capture and use metadata
Sell management on the importance of metadata
Assign metadata responsibility

Metadata – Where are we?
Include anything you have done, including a glossary or business and IT definitions.

Metadata – What Should We be Doing
As you enter these actions, consider including responsibility, but make sure you have talked to those people or departments before presenting to management.

Security and privacy
[Diagram: a network of workstation, terminals, communication server, remote access, database server, mainframe, LAN, file server, and Internet access, joined by connection paths A through H. A legend matrix marks, for each path, whether security exists or no security exists under each mechanism: mainframe security package, LAN security package, PC security package, password security, encryption function, DBMS security, generic security package.]

Categorization for Security/Privacy
Does all data have the same security/privacy requirements?
Who determines security/privacy requirements of data?
What are the regulatory requirements for security and privacy?
Does your organization have a Security Office? What authority does it have?
Responsibility For Data Security
Security Office
Internal auditors
Data Owners
Responsibility for administering
Testing security and privacy

Mechanism For Establishing Security Procedures
Security requirements
– Internal
– Regulatory
Tools that implement security
Communicating security requirements to those who implement

Security Audit
Validating procedures
Validating training
Testing and probing
Recommending mitigation
Frequency of audits

Regulatory Issues
Health Care – HIPAA
Finance
Brokerage – SEC
Insurance
Media – FCC

Security & Privacy – Best Practices
Raise the consciousness of security and privacy requirements
Connect with your Security Office
Determine security capabilities of tools
Assign responsibilities
Test and validate

Security & Privacy – What exposures do we have?
Hopefully you have talked to your Security Officer and anyone else who is responsible for the security of data.

Security & Privacy – What Steps do we Need to Take
Be sure to clear these actions with those responsible for security and privacy.

Performance
Benchmarking
Capacity planning
Designing (optimal schemas)
Coding (efficient SQL calls)
Monitoring and measuring
Tuning
– Database structures
– DBMS parameters and OS
– Communication links
– Hardware

Categorization for Performance
How good does response time need to be?
How does it differ from application to application?
What is the cost-benefit of excellent response time?
Were performance considerations included in the architecture?
Categorization for Availability
Scheduled hours (24 × 7, 18 × 6, …)
Availability during scheduled hours
How does it differ from system to system?
Is excellent availability cost justified?
Was availability included in the architecture?

Capacity Planning
Database size
Number of users
Number of transactions
Number of queries/reports
Time and day of usage
Complexity of transactions/queries/reports
Proactive response to capacity increase

Monitoring/Measuring
Response time
Resource utilization (CPU, disk access, network)
Who is using the system
When is the system being used
Chargebacks

Service Level Agreements
Response time
Availability
– Scheduled hours (hours/day, days/week)
– Availability during scheduled hours
Timeliness of data
Response to problems
Response to new requests
Who establishes agreements?
What’s realistic?
Incentives to meet SLAs

Reporting performance
IT
– Who needs to take action
– Who needs to see reports/alerts
Business
– Matching project agreements
– Expectations

Tuning
Awareness of problems – measurement tools and responsibilities
Tuning capability of platform, RDBMS, tools
Responsibility for tuning

Measurement Tools
Performance
Usage
Resource utilization
Network

Performance & Measurement – Best Practices
Determine what is advantageous to measure
Assign responsibilities
Designate tools for measurement
Report metrics to management

DBMS/Product Selection
Industrial-strength Enterprise Server
Mid-range Workgroup Server
Desktop Remote Client

Relational DBMS
Which RDBMS is the standard?
Relation to platform
What applications is it being used for?
Why standardize the RDBMS?
  Minimize the number of RDBMSs
  Less training required
  More leverage on RDBMS vendor
  Flexible assignments
  Fewer interface problems
  Fewer interface programs

Relation to platform
  RDBMS performance impacted by platform
  Platform may dictate (or strongly recommend) RDBMS choice
  Which decision comes first?
  [Figure labels: Desktop Remote Client; Mid-range Workgroup Server; Industrial-strength Enterprise Server]

How DBMS is being used
  Operational/OLTP
  Data Warehouse/Business Intelligence
  [Figure labels: Operational Systems; OM; ODS; EDW; DM; DW Databases]

Tools/Utilities
  Platform dependent
  DBMS dependent
  Expensive
  33% on the shelf
  Lots of product duplication
  Necessary?

Standards for Products
  Who sets standards?
  Are the standards known?
  Are they standards or guidelines?
  Who can give dispensation?

Criteria for Selection
  Need
  Cost
  Vendor
    – Support
    – Reputation
    – Financial stability

Responsibility for Selection
  Technical evaluators
  Strategic architect
  Management

Single Vendor vs Best of Breed
  Single vendor
    – Possibly a better relationship
    – Leverage
    – Not always the best products
    – Products should all work together
  Best-of-breed
    – Need to integrate yourself
    – Finger pointing when problems
    – Potential incompatibilities

Deals/Negotiations
  Have someone else negotiate
  Don’t let vendor know you have chosen them before you negotiate
  www.dobetterdeals.com (Joe Auer – ComputerWorld)

Relationship with Vendors
  Partnerships
  Money
  Issues
  Support
  Conferences
  Being a reference

Databases Required by the Application Packages
  Packages do not support all DBMSs
  Packages do not support all DBMSs equally well
  Does preferred DBMS violate database standard?
  Are support personnel (DBAs) available?

Impact of Package
  Machine Requirements
  Performance
  Availability

DBMS/Product Selection – Best Practices
  Determine real requirements
  Establish software standards
  Make use of existing software whenever possible
  Talk to organizations who are using the products

Business intelligence (BI)
  … provides decision makers a 360° view of their business
  [Figure (Source: TDWI): BI dashboard with panels labeled Alerts, Trends, Forecasts, Daily Sales, Market Growth, and Financial Performance Meters]
  Metrics listed: same store sales, customer retention, new customers, charge cards issued, 30 day past-due accounts, 60 day past-due accounts, 90 day past-due accounts, merchandise return rate, inventory turnover rate
  Values shown (actual, target, variance): $108.0m, $120.0m, -10%; 96%, 95%, +0.9%; 3.8k, 5.0k, -24.0%; 8.5k, 12.0k, -33.3%; 500, 400, +2.0%
  Alert labels: regulatory warning, market opportunity, compliance violation

Goals and Objectives
  Why have a data warehouse?
  Have goals and objectives been identified?
  Have they been communicated?
  Are they measured post-implementation?
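The actual, target, and variance values on the Business intelligence (BI) slide suggest that variance is the relative gap between actual and target. The sketch below assumes that reading, variance = (actual - target) / target; the deck does not state the formula, and the function name is hypothetical. The two input pairs are copied from the slide's values.

```python
def variance_pct(actual, target):
    """Relative variance of actual against target, in percent (assumed formula)."""
    if target == 0:
        raise ValueError("target must be non-zero")
    return 100.0 * (actual - target) / target


# Value pairs taken from the BI slide (units dropped for the arithmetic).
rows = [(108.0, 120.0), (3.8, 5.0)]
for actual, target in rows:
    result = variance_pct(actual, target)
    print(f"actual={actual} target={target} variance={result:+.1f}%")
```

Under this assumed formula the two pairs give -10.0% and -24.0%, matching the slide. Not every row of the extracted figure follows the same arithmetic, so treat the formula as one plausible reading rather than the dashboard's definition.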
Architecture
  Platform
  Tools/products
  How the data flows

DW and BI Tools
  RDBMS
  Data Modeling
  ETL
  Access and Analysis
  Data quality (Cleansing)
  Measurement

Data Mining
  Data farming                                         Data mining
  Verification of assumptions                          Discovery of the unknown
  Results based on known data relationships            Inferred results from data found in database
  Deductive method                                     Inductive method
  Yields information that can be proven to be factual  Yields information that is assumed to be true for some probability

Data Sources for Data Mining
  [Figure labels: Operational databases; DW databases; Orders; Shipments; ETL; Enterprise Data Warehouse; Account Master; Customer DM; Billing; Sales DM; Data Mining Databases; Data Mining Applications]

Spiral BI/DW Methodologies
  [Figure labels: Business Goals; Assessment & Strategy; Project Plan; Business Opportunity; Post-Impl. Review; Data Requirement; BI/DW Applications; Business Analysis; Data Inventory; Application Design; Implementation; Testing; Development]

Software Release Concept
  “Extreme scoping” - Larissa Moss
  “Refactoring” - Kent Beck
  “feels like prototyping”
  Project ≠ Application
  [Figure labels: Projects; First Release; Second Release; Third Release; Fourth Release; Fifth Release; Final Release; BI Application; Reusable & Expanding]

Using the Software Release Approach
  Unstable requirements can be tested and enhanced in small increments
  Scope is very small and manageable
  Technology infrastructure can be tested and proven
  Data volumes (per release) are relatively small
  Project schedules are easier to estimate because the scope is very small
  Development activities can be iteratively refined, honed, and adapted
  Mistakes are less expensive to fix early in the development process!
Using the Software Release Approach (continued)
  And the quality of the release deliverables (and ultimately the quality of the BI applications) will be higher!
  And the development process will get faster and faster!

Software Release Guidelines
  Deliver every three to six months (first release will take longer)
  Strictly control the scope and keep it very small
  Keep expectations realistic
  The enterprise infrastructure must be robust (technical and non-technical)
  Metadata must be an integral part of each release; otherwise, the releases will not be manageable
  Designs, programs, and tools must be flexible
  [Figure labels: First Release; Second Release; Third Release; Fourth Release; Fifth Release; Final Release; BI Application]

Iterative BI Application Development
  [Figure: Release 1 through Release 6 arranged around the BI Application. Each release repeats step labels including Business Case Assessment; Planning; Requirements & Data Analysis; Application Prototyping; Meta Data Repository Analysis; Data Analysis; ETL Design; Meta Data Repository Design; ETL Development; Application Development; Data Mining; Meta Data Repository Development; ETL Testing; Application Testing; Meta Data Repository Testing; Release Implementation; Post-Impl. Review]

Business Intelligence – Best Practices
  Set goals and objectives
  Set expectations early and often
  Establish cost justification
  Find a terrific sponsor
  Use a spiral methodology
  Deliver often with software releases

BI & DW – How well are we doing?
  Include applications, departments, number of users, usage, user satisfaction, ROI, management perception, …

DW & BI – What are we going to do to make our DW and BI sing?
  This might include training, selling to management and end users, new BI tools, new organizational responsibilities, …

Outline
  Benefits of a data governance strategy
  Components of a data governance strategy
  Organization, roles and responsibilities
  Impact of a data governance strategy on BI and IT
  How to implement a data governance strategy program

Organization, roles and responsibilities
  Data owner
  Data steward
  Data strategist
  Strategic architect
  Database administrator/designer
  Data administrator (EIM)
  Metadata administrator (EIM)
  Data quality analyst (EIM)
  Security officer

Data owner
  Assigned to business people (often data originators)
  Typically hold a senior position (directors or managers)
  Have authority to set policies and dictate business rules and security for the data
  Are accountable to the information consumers in the organization

Data steward
  Should be assigned to business people, but could be performed by senior business analysts from IT
  Must know the industry and the organization very well (often people with seniority)
  Requires an enterprise-wide understanding of the data and the business rules
  Have authority to communicate and enforce policies, business rules, and security for the data
  Mediate data disputes among business people and facilitate resolutions

Data strategist
  Understands the strategic business goals
  Knows the government regulations and governmental reporting requirements
  Understands the DBMS platforms and operating systems
  Knows the internal application databases (operational and BI)
  Is aware of future data demands and data volumes
  Creates and maintains the data governance strategy

Strategic architect
  Develops the overall architecture for both operational and BI environments to include:
    – Software
    – Utilities
    – Tools
    – Interfaces
  Determines if the BI/DW environment will be one-tier or multi-tier and what the platform components should be
  Participates in architecting databases and data flows

Database administrator/designer
  Understands user requirements and how databases are accessed and updated
  Knows different database design techniques (relational, multi-dimensional) and when to apply them
  Is responsible for the physical aspects of application databases:
    – Logical and physical database design
    – Partitioning and indexing
    – Dataset placement
    – Performance and tuning (databases and SQL)
    – Backup and recovery
  Maintains the application databases

Data administrator
  Knows the industry and the business processes
  Understands the data and the business rules that are used by those processes
  Has expertise in E/R modeling and knows the normalization rules
  Standardizes and integrates the data (logically) through the enterprise information architecture
  Creates and enforces data naming standards
  Collects and maintains business metadata:
    – Data names (fully spelled out business names)
    – Data definitions and metrics definitions
    – Business rules (data rules and process rules)

Metadata administrator
  Knows industry metadata standards
  Understands DW databases and ETL architectures
  Builds and maintains a metadata repository or administers a purchased MDR product
  Selects and installs metadata integration and access tools
  Integrates and loads metadata from various BI and developer tools (Data Modeling, Data Profiling, DBMS, ETL, OLAP)

Data quality analyst
  Knows the internal application databases and how to extract data from them
  Is familiar with data profiling and data cleansing tools
  Understands the user requirements, the business processes, and the business rules
  Audits operational source data to find and report violations of business rules and other DQ problems
  Participates in writing data cleansing specs
  Identifies root causes for dirty data
  Facilitates negotiations between data originators and information consumers about DQ improvements

Security officer
  Knows the governmental security and privacy regulations (HIPAA)
  Understands the business requirements for securing the data
  Understands security features and capabilities of the application components (DBMS, BI tools, Web portals)
  Ensures that appropriate security settings are placed on:
    – Databases
    – BI tools
    – Developer tools
    – Web portals

Organization – Do we have the right roles and responsibilities?
  Include roles and responsibilities that overlap, and identify any gaps where roles are not being filled.

Organization – What should we be considering?
  Be careful here. You are likely to step on toes. Be sure to vet any proposed changes with the appropriate management.
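The data quality analyst's audit of operational source data against business rules can be sketched as a small rule check. Everything concrete here is hypothetical: the record fields, the two rules, and the function name are invented for illustration and do not come from this deck or from any profiling tool.

```python
from collections import Counter

# Hypothetical business rules: a rule name mapped to a predicate that
# returns True when a record satisfies the rule.
RULES = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "credit_limit is not negative": lambda r: r.get("credit_limit", 0) >= 0,
}


def audit(records, rules=RULES):
    """Return (row_index, rule_name) for every business rule violation."""
    violations = []
    for index, record in enumerate(records):
        for name, is_valid in rules.items():
            if not is_valid(record):
                violations.append((index, name))
    return violations


# Made-up source rows standing in for an operational extract.
source_rows = [
    {"customer_id": "C001", "credit_limit": 5000},
    {"customer_id": "", "credit_limit": 1200},
    {"customer_id": "C003", "credit_limit": -50},
]

found = audit(source_rows)
by_rule = Counter(name for _, name in found)  # violation counts per rule
for name, count in by_rule.items():
    print(f"{name}: {count} violation(s)")
```

Counting violations per rule gives the analyst a starting point for root-cause analysis and for the data cleansing specs, and the row indexes identify which records to raise with the data originators.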
Outline
  Benefits of a data governance strategy
  Components of a data governance strategy
  Organization, roles and responsibilities
  Impact of a data governance strategy on BI and IT
  How to implement a data governance strategy program

Impact of a data governance strategy on BI and IT
  RELIABLE INFORMATION
  Better and faster decisions
  Increased analyst productivity
  Employee empowerment
  Cost containment
  Cash flow acceleration
  Revenue enhancement
  Fraud reduction
  Demand chain management
  Better customer service
  Lower customer attrition
  Better relationships with suppliers and customers
  Public relations and reputation

Gain Control
  Consistent security implementation
  Understand, define and assign ownership
  Understand, define and assign stewardship
  Minimize redundancy
  Inventory data
  Develop consistent terminology

Support the IT Strategy
  Provide departments, projects and personnel with guidelines for storing and accessing data
  Minimize the number of RDBMSs
  Establish, disseminate and maintain standards for shared data resources
  Deliver a high level of service
    – Performance
    – Availability
    – Response time
    – Responsiveness to user requests

Outline
  Benefits of a data governance strategy
  Components of a data governance strategy
  Organization, roles and responsibilities
  Impact of a data governance strategy on BI and IT
  How to implement a data governance strategy

Incremental Data Governance Strategy Implementation
  Don’t get into the details too soon
  Don’t be seen as a theorist; your actions must be pragmatic
  Don’t lead with long-term deliverables
  Don’t commit more than you can deliver
  Avoid unproven technology

Steps to Implement a Data Governance Strategy
  Conduct a data environment assessment
  Establish a target data environment
  Develop an implementation plan
  Sell data governance strategy within the organization
  Evaluate progress and justify your existence
  Revisit the plan

Summary
  Pitch the importance of a data governance strategy to your CIO or CTO
  Ask to either lead the effort or to be a permanent member of the team

Thank you
  Larissa Moss, Method Focus, Inc., methodfocus@earthlink.net
  Sid Adelman, Sid Adelman & Associates, sidadelman@aol.com
  ISBN 0-201-61635-1
  ISBN 0-321-24099-5
  ISBN 0-201-78420-3
  ISBN 0-201-76033-9