Data Relationship Management
Data Integration to Data Governance

Data In the News: Data Slipups
Rick Whiting, 10-May-2006
Inaccurate business data lead to botched marketing campaigns, failed CRM projects and angry customers.
A home valued at US$121,900 somehow wound up recorded in Porter County's computer system as being worth a whopping US$400 million. Naturally, the figure ended up on documents used to calculate tax rates. By the time the blunder was uncovered in February, the damage was done.

Market Forces Affecting the Use of Data
• Privacy Regulations: HIPAA, GLBA, PIPEDA, EU DPD
• Competitive Edge: straight-through processing, customer service
• Customer Pressure: consumer confidence, data inaccuracies, over-billing
• Business Governance: SEC/NAD rule, SARBOX, legal liability, mergers

What are Companies Doing in Response?

Credit Card Company: Where is the Sensitive Data?
Business Problem:
• A security breach would expose the company to regulatory fines, negative PR and customer backlash
Proposed Solution:
• Identify sensitive data flows in structured databases so critical data can be consolidated and properly secured
Roadblock:
• An estimate of 50 data analysts over 5 years makes the project appear unbounded and infeasible
Status:
• Project put on hold

Health Insurance Company: Outsourcing Development
Business Problem:
• Data must be sent to India for offshore application development
• Sensitive data must be masked for HIPAA compliance
Proposed Solution:
• Mask sensitive data before sending it outside the company
Roadblock:
• Sensitive data: where is it?
• Can two data sets that individually contain no sensitive data be combined to become sensitive?
Status:
• Manual discovery of sensitive data slows outsourcing to a crawl

Wall Street Firm: Data Consistency Will Increase Profitability
Business Problem:
• Transaction errors are expensive, and the risk of regulatory fines due to inconsistent reference data is unacceptable
Proposed Solution:
• Deploy a master data management (MDM) solution
Roadblock:
• 5 years to determine the business rules that relate the master data system to legacy systems
• Unable to map two tables to each other after 6 weeks of work (70 tables to map in total)
Status:
• Project on hold

Auto Insurance: Migrating Fragile Legacy Integration Code to Modern Tools
Business Problem:
• Business changes force expensive, hard-to-implement changes in hand-written legacy integration code
Proposed Solution:
• Migrate legacy code to a modern ETL (extract, transform, load) tool; maintaining ETL costs a fraction of maintaining legacy code
Roadblock:
• No one knows the code, so the cost of migration is unpredictable
Status:
• The company continues to change hand-written code manually, ad hoc, as the business demands

The Common “?” in the Project Schedule
• T = 0: Data Relationship Discovery
You have to know where your data is, and how it flows and relates across systems, if you hope to secure it, move it, consolidate it or integrate it.
[Project timeline diagram: “?” at T = 0, ahead of the Internal Security, Consistency/Master Data and Integration projects]

Don’t We Know Our Own Data?
Myth #1: “We know our data”
“I’m a professional. Of course I know my data!”
• Subject matter experts (SMEs) only know their own systems
• They can’t tell you how data changes and is transformed as it moves from system to system
  “Once it leaves my hands, it is someone else’s problem!”
• Relationships between systems are complex
  “Wow, that transformation is complex. Are you sure that is in my data?”
• SMEs sometimes change jobs!
  “I’m going to start my own consulting firm.”

Myth #2: “We know our data”
“All of my data follows the business rules for this system!”
• Business rules are broken all the time as data crosses business and system boundaries:
  • An 83-year-old man in system A is a “youthful driver” in system B
  • A bond yield is listed as 5% in system X and 5.3% in system Y
• Exceptions result in lost revenue, customer dissatisfaction and regulatory fines

Myth #3: “We know our data”
• Business rules change as organizations change:
  • Mergers and acquisitions
  • New products or services
  • Retired products or services
  • Reorganizations
  • New IT systems
“I can’t keep up with all the acquisitions and reorganizations. They mess up the way systems work together. It is very inconvenient.”

The Reality
Companies lack a global view of their corporate data map.

Current Trend: Data Governance
What is it?
• The latest over-hyped term
• Data Integration is to Data Governance as Tactical is to Strategic
Definition
• Data Governance encompasses the people, processes and procedures needed to create a consistent, enterprise-wide view of your data in order to:
  • Improve data security
  • Increase consistency of, and confidence in, decision making
  • Decrease the risk of regulatory fines

The Problem with Data Governance
• How do you do it?
• Where is the sensitive data?
• What are the business rules and data relationships?
• Where are the exceptions?
• How do you ensure a consistent, repeatable process?

Traditional Proposed Approach: Metadata
What is it?
• Another over-hyped term
• Data about data: datatype (character, integer, number, date, etc.), column width, frequency, cardinality, etc.
The Problem
• Metadata covers a single system only:
  • Profiling
• Traditional data integration tools do not discover metadata:
  • Cleansing, ETL, EAI and EII
The Reality
• Data analysts manually examine data values to figure out the data map
• The most sophisticated tool generally used today is: [image captioned “Traditional Data Relationship Discovery Tool”]

There is a Better Way
The Solution: Data-Driven Relationship Discovery
• A new approach to a 40-year-old problem
• Sophisticated heuristics and algorithms analyze actual data values
• Automates the discovery and validation of the following between structured data sets, in a consistent and repeatable manner:
  • Sensitive data flows
  • Business rules
  • Complex transformations

Solution: Data-Driven Exception & Discrepancy Discovery
• Identify exceptions to avoid:
  • Regulatory fines
  • Lost revenue
  • Customer dissatisfaction

Transformation: CASE WHEN AGE <= 25 THEN 'Y' ELSE 'N' END AS Youthful_Driver
Hit Rate = 90% (9 of 10 rows)
Application A    Application B
AGE              Youthful_Driver
17               Y
24               Y
55               N
28               N
40               N
33               N
83               Y    (Exception)
29               N
36               N
42               N

Transformation: ApplicationA.B_Y * 10000 = ApplicationB.Bond_Yield (a decimal yield expressed in basis points)
Hit Rate = 90% (9 of 10 rows)
Application A    Application B
B_Y              Bond_Yield
0.053            530
0.062            620
0.071            710
0.034            340
0.055            550
0.072            720
0.055            550
0.067            670
0.056            580  (Exception)
0.06             600

Data-Driven Discovery Results
Credit Card Company
Status: Project moving forward again
• Reduced estimated effort from 250 engineering years to 25 engineering years
• Eliminated project feasibility risk
Wall Street Firm
Status: Back on track
• Over 5x improvement (2 days vs. 6 weeks manually) in discovering business rules made the MDM project possible
• Found bond yield discrepancies
Health Insurance Company
Status: Outsourcing rollout accelerated
• Now confident in the accuracy and speed of sensitive data discovery
• Launching a new company-wide data masking service
Auto Insurance Company
Status: Predictable & affordable migration
• 80% reduction in the effort required to migrate hand-written code to the ETL tool
• Mapping process discovered potentially costly business rule errors

Summary: Data Governance = Strategic Data Integration
• Companies are implementing data governance projects to:
  • Improve security
  • Increase consistency
  • Decrease regulatory risk
• The first step of data governance: Discovery
• Automated data-driven discovery is a consistent, repeatable and proven approach to identifying:
  • Sensitive data
  • Business rules
  • Data exceptions

Key Contacts
Bob Shannon: U.S. East Coast Sales
Phone: (203) 878-8472
Email: bob@exeros.com
Brian Smogard: U.S. Central Sales
Phone: (612) 605-9236
Email: brian@exeros.com
Clive Harrison: U.S. West Coast and International Sales
Phone: (415) 608-4632
Email: clive@exeros.com
If you have any other follow-up questions, contact me:
Todd Goldman
Phone: (408) 919-0191 ext. 1115
Email: todd@exeros.com
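The “Hit Rate = 90%” figures on the “Solution: Data-Driven Exception & Discrepancy Discovery” slide can be reproduced with a short Python sketch. It assumes the rows of Application A and Application B have already been paired, for example by a shared key that the slides do not show. Each candidate transformation is scored by the fraction of row pairs it explains, and the rows that break it are reported as exceptions. The brute-force search over powers of ten and the names `hit_rate`, `scaled_by` and `youthful_driver` are illustrative assumptions, not the vendor's actual algorithm.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Pair = Tuple[Any, Any]


def hit_rate(pairs: Sequence[Pair],
             rule: Callable[[Any, Any], bool]) -> Tuple[float, List[Pair]]:
    """Return the fraction of row pairs that satisfy `rule`, plus the exceptions."""
    exceptions = [(a, b) for a, b in pairs if not rule(a, b)]
    return (len(pairs) - len(exceptions)) / len(pairs), exceptions


def scaled_by(factor: float) -> Callable[[Any, Any], bool]:
    """Candidate rule: value in Application A times `factor` equals value in B."""
    # isclose() absorbs binary floating-point error, e.g. in 0.055 * 10000
    return lambda a, b: math.isclose(a * factor, b, rel_tol=1e-9)


def youthful_driver(age: int, flag: str) -> bool:
    """Candidate rule: Youthful_Driver is 'Y' exactly when AGE <= 25."""
    return flag == ("Y" if age <= 25 else "N")


# Row pairs (Application A value, Application B value) copied from the slide
# tables; pairing rows by position is an assumption of this sketch.
bond_pairs = [(0.053, 530), (0.062, 620), (0.071, 710), (0.034, 340),
              (0.055, 550), (0.072, 720), (0.055, 550), (0.067, 670),
              (0.056, 580), (0.06, 600)]
age_pairs = [(17, "Y"), (24, "Y"), (55, "N"), (28, "N"), (40, "N"),
             (33, "N"), (83, "Y"), (29, "N"), (36, "N"), (42, "N")]

# Hypothetical "discovery" step: try a small set of candidate scale factors
# (1, 10, ..., 100000) and keep the one with the highest hit rate.
candidates = [10 ** k for k in range(6)]
best = max(candidates, key=lambda f: hit_rate(bond_pairs, scaled_by(f))[0])
bond_rate, bond_exceptions = hit_rate(bond_pairs, scaled_by(best))
age_rate, age_exceptions = hit_rate(age_pairs, youthful_driver)

print(best, bond_rate, bond_exceptions)  # 10000 0.9 [(0.056, 580)]
print(age_rate, age_exceptions)          # 0.9 [(83, 'Y')]
```

A real tool would have to search a far larger space of candidate transformations and column pairings. The sketch only shows how a hit rate just below 100% points directly at the rows worth investigating: here, the 83-year-old “youthful driver” and the 0.056 yield recorded as 580.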