Brian Kordelski – WW Sales Executive – IBM InfoSphere
12/07/2010
The Core Principles of Information Governance
© 2010 IBM Corporation

Governance is no longer an option
“By 2013, 25% of the companies in highly regulated industries will create and staff positions in accounting, human resources, compliance and audit and law that deal explicitly with the management of information via technology.”
– Gartner, Inc., “Organizing for Information Governance”, Debra Logan, November 2009
“[A]n [information management] strategy should incorporate lifecycle information governance practices [to ensure] consistent execution of ... business optimization, agility, and transformation [initiatives].”
– Forrester Research, Inc., “Refresh Your Information Management Strategy to Deliver Business Results”, Rob Karel & James G. Kobielus, August 2009
“If you are going to protect your company's most valuable asset, your data, you will begin to view data security as a component of a more comprehensive information governance strategy.”
– Hurwitz & Associates, “Why you need an information governance strategy for 2010”, Marcia Kaufman, December 2009

Information Governance Council Maturity Model
[Diagram: maturity model disciplines connected by “Requires”, “Enhances” and “Supports” relationships]

If we don’t proactively manage quality
Increased costs and missed revenue opportunities, impacting both financials and customer relationships, due to lack of data quality
– Incomplete and inaccurate master data created problems in receiving and/or shipping products, marketing literature and regulatory mailings, and 360-degree customer visibility.
A small error in the quality of the rating data leads to negative impact for the company and unhappy customers
– For a large telecom provider with a massive volume of telephone calls and telephone customers, even a small error in the rating data can mean significant revenue loss or customer turnover.
Data quality issues plague BI initiatives, creating a lack of trust in the data
– Several attempts to implement a data warehouse and analytics application at a major retailer had stalled due to data quality issues, which created frustration for the project team and a lack of trust in the data on the part of business users.

Requirements to manage the quality of data
Understand & Define
– Discover your data across systems
– Define common vocabulary
– Design your data structures
Develop & Test
– Develop database structures
– Create & refresh test data
– Validate test results
Cleanse & Manage Continuously
– Define Rules & Cleanse Data
– Actively Monitor & Manage Data
– Remediate Inconsistencies

Understand your information
Data can be distributed over multiple applications, databases and platforms
– Where are those databases located?
Complex, poorly documented data relationships
– Which data is sensitive, and which can be shared?
– Whole and partial sensitive data elements can be found in hundreds of tables and fields
Data relationships not understood because:
– Corporate memory is poor
– Documentation is poor or nonexistent
– Logical relationships (enforced through application logic or business rules) are hidden
[Diagram: Distributed Data Landscape]

Gain consistent terminology
How does each user define “Active Subscriber”?
Roles: Financial Officer, Business Analyst, Compliance Officer, Sales Lead, Marketing Manager, Business Intelligence Manager, CRM Project Manager, ERP Project Manager, IT Architect, Support Rep
Competing definitions:
– Mobile user who has used “any” service in the mobile network
– User who paid for the service at least 1 time in the past 90 days
– Only post-paid customers, not pre-paid customers
– Mobile user who has a phone plan, but not SMS
– User who makes at least 1 call over the period of 90 days

Cleanse and continuously manage your data
1. Create reusable quality rules & cleanse your data
– Leverage the knowledge gained during the understand & define steps
– Define what quality means to you
– Design your data quality rules and matching logic
2. Actively monitor & manage your data
– Standardize data formats
– Leverage precisely calibrated matching rules and remove duplicates
– Develop rules & quality metrics for monitoring
– Manage duplicate data, when required
3. Remediate inconsistencies in your data
– Monitor for problems or trends
– Investigate data lineage to find source of problem
– Repair data and source of problem
– Maintain monitoring to capture future problems
Make sure there is an owner of data quality AND management sponsorship

Monitor quality with integrated data rules
Create “Checks & Balances” to proactively identify quality concerns throughout the lifecycle
– Build & test rules for common or complex conditions
– Extend profiling through targeted analysis of specific data conditions or conformance to expected rules
– Establish benchmarks and baselines to help track data quality: is it deteriorating or remaining constant?
– Flag bad data for audit
Examples of Rules:
– The Gender field must be populated and must be in the list of accepted values
– The Social Security Number must be numeric and in the format 999-99-9999
– If Date of Birth Exists AND Date of Birth > 1900-01-01 and < TODAY Then Customer Type Equals ‘P’
– The Bank Account Branch ID is valid in the Branch Reference master list

IBM provides the solutions required to create high quality information
Understand & Define
– Discover your data across systems
– Define common vocabulary
– Design your data structures
Develop & Test
– Develop database structures
– Create & refresh test data
– Validate test results
Cleanse & Manage Continuously
– Define Rules & Cleanse Data
– Actively Monitor & Manage Data
– Remediate Inconsistencies

Organizational challenges from lack of data lifecycle management
New application functionality to meet business needs is not deployed on schedule
– No understanding of relationships between data objects repeatedly delays projects
– Greater data volumes take longer to clone, test, validate and deploy, which equates to longer test cycles
Increased operational and infrastructure costs impact IT budget
– Cloning databases requires more storage hardware
– Larger databases impact staff productivity and could mean additional license costs
Application defects are discovered after deployment
– Costs to resolve defects in production can be 10 – 100 times greater than those caught in the development environment
Unintentional disclosure of confidential data kept in test/development environments
“Forrester estimates that 85% of data stored in databases is inactive”
Source: Noel Yuhanna, Forrester Research, Database Archiving Remains An Important Part Of Enterprise DBMS Strategy, 8/13/07

The data multiplier effect
Production          1 TB
Development         1 TB
Test                1 TB
User Acceptance     1 TB
Backup              1 TB
Disaster Recovery   1 TB
Total               6 TB
Actual Data Burden = Size of production database + all replicated clones

Requirements to manage data across its lifecycle
Discover & Define
– Discover where data resides
– Classify & define data and relationships
– Define policies
Develop & Test
– Develop database structures & code
– Create & refresh test data
– Validate test results
Optimize, Archive & Access
– Enhance performance
– Manage data growth
– Report & retrieve archived data
Consolidate & Retire
– Rationalize application portfolio
– Move only the needed information
– Enable compliance with retention & e-discovery

Implement test data management with masking
Create targeted, right-sized test environments instead of cloning entire production environments
Mask data to protect privacy
Compare data pre/post test to identify quality issues
[Diagram: Production or Production Clone feeding Development, Test, QA and Training Environments]

Archive to manage data growth
Archiving is an intelligent process for moving inactive or infrequently accessed data that still has value, while providing the ability to search and retrieve the data
[Diagram: Production (current data); Archives (historical data, reference data, reporting data); Retrieve; Universal Access to Application Data via application, mashup, XML, ODBC / JDBC]

Diagnose and solve performance problems
Identify problems before they impact business
Diagnose performance problems quickly & easily
Implement a permanent solution, not a temporary workaround
Plan for the future while avoiding past mistakes

When you retire or consolidate applications, don’t move all of the data
Application portfolio has redundant systems acquired via mergers and acquisitions
Line of business divested; application is no longer needed
Legacy technologies not compatible with current IT direction
– Old database and/or application versions no longer supported by manufacturer
Required technical skills or application knowledge no longer available
Budget pressures: do more with less
In almost ALL cases, access to legacy data MUST be retained while the application and database are eliminated

IBM provides the solutions required to manage information throughout its lifecycle from requirement to retirement
Discover & Define
– Discover where data resides
– Classify & define data and relationships
– Define policies
Develop & Test
– Develop database structures & code
– Create & refresh test data
– Validate test results
Optimize, Archive & Access
– Enhance performance
– Manage data growth
– Report & retrieve archived data
Consolidate & Retire
– Rationalize application portfolio
– Move only the needed information
– Enable compliance with retention & e-discovery

The data privacy and protection risk continues
Confidential data inadvertently exposed or otherwise available to unauthorized viewers
– February 2010: About 600,000 customers of a major NYC bank received their annual tax documents with their Social Security numbers (combined with other numbers & letters) printed on the outside of the envelope.
SQL injection is fast becoming one of the biggest & most high-profile web security threats
– July 2010: Hackers obtained access to the user database and administration panel of a popular website by exploiting several SQL injection vulnerabilities. The exposed data included user names, passwords, e-mail addresses and IPs.
Unprotected test data sent to and used by test/development teams as well as third-party consultants
– February 2009: An FAA server used for application development & testing was breached, exposing the personally identifiable information of 45,000+ employees.
Confidential data that should be redacted can be hidden or embedded
– April 2010: A PDF of a subpoena in the case of “United States vs. Rob Blagojevich” was posted to a public website.
  However, the “redacted” text simply had a black box placed on top to hide the content; the actual text was still available.

Can today’s organizations successfully protect their information?
Where does your sensitive data reside across the enterprise?
How can your data be protected from both authorized and unauthorized access?
Can your confidential data in documents be safeguarded while still enabling the necessary business data to be shared?
How can access to your enterprise databases be protected, monitored and audited?
Can data in your non-production environments be protected, yet still be usable for training, application development and testing?
“Larry Ponemon, founder of the group that bears his name, said that survey shows a shift in the way C-level executives think about security software. Investing in data protection, he said, is now seen as less expensive than recovering from a data breach.”
-- InformationWeek

Requirements to manage the security and protection of data
Discover & Define
– Discover where sensitive data resides
– Classify & define data types
– Define policies & metrics
Secure & Protect
– Protect enterprise data from both authorized & unauthorized access
– Safeguard sensitive data in documents
– De-identify confidential data in non-production environments
Monitor & Audit
– Audit and report for compliance
– Monitor and enforce database access
– Assess database vulnerabilities

Discover where sensitive data may be hidden
Sensitive Relationship Discovery

System A, Table 1 (callout: Patient ID # embedded within another field)
Number    Name
4600986   AlexFulltheim
8150928   BarneySolo
6736304   BillAlexander
3802468   BobSmith
5567193   EileenKratchman
7409934   FredSimpson
6123913   GregLougainis
5061085   JamieSlattery
4182715   JimJohnson
8966020   MartinAston

System A, Table 15
Patient   Result   Test
3802468   N        53
4182715   N        53
4600986   N        32
5061085   N        53
5567193   N        72
6123913   Y        47
6736304   N        34
7409934   N        34
8150928   N        47
8966020   N        34

System Z, Table 25
Code   Name
53     Streptococcus pyogenes
72     Pregnancy
32     Alzheimer Disease
47     H1N1
34     Dermatamycoses

Relationships and sensitive data can’t always be found just by a simple data scan
– Sensitive data can be embedded within a field
– Sensitive data could be revealed through relationships across fields & systems
When dealing with hundreds of tables and millions of rows, this search is complex: you need the right solution
Compound sensitive data: Test results could potentially be revealed.

Protecting data is both an external and internal issue
Prevent “power users” from abusing their access to sensitive data (separation of duties)
– DBA and power users
Prevent authorized users from misusing sensitive data
– For example, third-party or off-shore developers
Prevent intrusion and theft of data
– For example, someone walking off with a back-up tape
– Hacker
– Database vulnerabilities (user id with no password or default password)

Protection of data requires a 360-degree strategy
Secure sensitive data values
– Across both structured and unstructured data
De-identify data
– Restricted data sharing with 3rd parties
– Generation of fictionalized test data for non-production
– Support off-shore deployment model
Stop unauthorized data access
– Render data useless via encryption
– Lock down SQL to prevent SQL injection
– Block suspicious network traffic
Security makes it possible for us to take risk, and innovate confidently.
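
Example: SQL injection and one application-side defense
The slide above lists "Lock down SQL to prevent SQL injection" without saying how; it most likely refers to database-level controls. The sketch below is not that product capability. It is a minimal illustration, assuming a hypothetical users table in an in-memory SQLite database, of why string-built SQL is exploitable and how a parameterized query, a common complementary defense in application code, avoids the problem. All function and table names are made up for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        [("alice",), ("bob",)],
    )
    return conn


def find_user_unsafe(conn, name):
    """Vulnerable: user input is concatenated into the SQL text."""
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    """Safer: the input is passed as a bound parameter, never parsed as SQL."""
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    payload = "x' OR '1'='1"  # classic injection string
    print("unsafe:", find_user_unsafe(conn, payload))  # leaks every row
    print("safe:  ", find_user_safe(conn, payload))    # matches no user
```

Parameterized queries only cover the application tier; the monitoring, encryption and network blocking listed on the slide address the other tiers.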
Protect sensitive data values within documents
Redact (or remove) sensitive unstructured data found in documents and forms, protecting confidential information while supporting the need to share critical business information
– Support compliance with industry-specific and global data privacy requirements or mandates
Leverage an automated redaction process for speed, accuracy and efficiency
– Ensure hidden source data (or metadata) within documents is redacted as well
Prevent unintentional disclosure by using role-based masking to confidently share data
Ensure multiple file formats are supported, including PDF, text, TIFF and Microsoft Word documents
[Example: Redact Full Name & Street Address]

De-identify data without impacting test & development
Mask or de-identify sensitive data elements that could be used to identify an individual
Ensure masked data is contextually appropriate to the data it replaced, so as not to impede testing
– Data is realistic but fictional
– Masked data is within permissible range of values
Support referential integrity of the masked data elements to prevent errors in testing
[Example: JASON MICHAELS is masked as ROBERT SMITH]
Personally identifiable information is masked with realistic but fictional data for testing & development purposes.

What happens with security complacency
Not being able to report compliance can lead to regulatory fines
– No audit report mechanism
– No fine-grained audit trail of database activities
Don’t know if there is a data breach until it’s too late
– Lack of awareness of suspicious access patterns
– Ongoing vs. single event: problems identifying patterns of unauthorized use
Not able to monitor super user activity to ensure data security standards are met
– Unable to detect intentional and unintentional events
“Most organizations do not have mechanisms in place to prevent database administrators and other privileged database users from reading or tampering with sensitive information [in business applications]…Fewer than two out of five respondents said they could prevent such tampering by super users.”
-- Independent User Group

Streamline and simplify compliance processes
Alerts of suspicious activity
Audit reporting and sign-offs
– User activity
– Object creation
– Database configuration
– Entitlements
Separation of duties: creation of policies vs. reporting on application of policies
Trace users between applications, databases
Fine-grained policies
Sign-off and escalation procedures
Integration with enterprise security systems (SIEM)

IBM provides the solutions required to secure and protect data privacy
Discover & Define
– Discover where sensitive data resides
– Classify & define data types
– Define policies & metrics
Secure & Protect
– Protect enterprise data from both authorized & unauthorized access
– Safeguard sensitive data in documents
– De-identify confidential data in non-production environments
Monitor & Audit
– Audit and report for compliance
– Monitor and enforce database access
– Assess database vulnerabilities

The IBM security strategy: Make security, by design, an enabler of innovative change
IBM as a trusted partner, delivering secure products and services
IBM as a trusted security vendor, providing key solutions across all security domains
15,000 researchers, developers and SMEs on security initiatives
– Data Security Steering Committee
– Security Architecture Board
– Secure Engineering Framework
3,000+ security & risk management patents
200+ security customer references and 50+ published case studies
40+ years of proven success securing the zSeries environment
Managing more than 7 billion security events per day for clients

Delivering trusted information for smarter business decisions across your entire information supply chain
[Diagram: information supply chain. Sources: Transactional & Collaborative Applications, Business Analytics Applications, External Information Sources. Stages: Integrate, Manage, Analyze. Assets: Master Data, Data, Content, Data Warehouses, Cubes, Streams, Streaming Information, Big Data. Govern: Quality, Lifecycle, Security & Privacy]

Enabling success
IBM Information Governance Unified Process
– Define Business Problem
– Obtain Executive Sponsorship
– Conduct Maturity Assessment
– Build Roadmap
– Establish Organization Blueprint
– Build Data Dictionary
– Understand Data
– Create Metadata Repository
– Appoint Data Stewards
– Create Specialized Centers of Excellence (COE)
– Implement Master Data Management
– Manage Data Quality
– Manage Security & Privacy
– Define Metrics
– Manage Life-cycle
– Measure Results
[Legend: each step is marked as either “Enable through Process” or “Enable through Technology”]

What can you do next …
Start small with a project, don’t try to do it all at once
– Free workshops and assessments
– Best of breed solutions to help you succeed
Join a movement: www.infogovcommunity.com
– Benchmark your organization online
– Work with others on the Maturity Model
– Compare best practices in online peer reviews
– Be recognized for what you contribute on the leader board
Read the book:
– The IBM Data Governance Unified Process: Driving Business Value with IBM Software and Best Practices
Visit our web page:
– ibm.com/informationgovernance

Thank you