Data Quality: What You Need to Know to Create and Sustain a Data Quality Program

Panel Members
– Daniel Wallace, Manager, Financial Informatics, Arkansas Blue Cross & Blue Shield
– Gayle Bunn, Data Warehouse Analyst, EDW, Blue Cross and Blue Shield of Idaho
– Amit Bhagat, President & Principal Consultant, Amitech Solutions

Data Quality Panel Objectives
To share information and insight on:
– Overall organizational approach to creating and sustaining a data quality program

Panel Presentation
Please provide us with a brief overview of the overall approach to creating and sustaining the data quality program in your organization.

What You Need to Know to Create and Sustain a Data Quality Program
Daniel Wallace
Manager, Financial Informatics
Arkansas Blue Cross & Blue Shield
Contact Info:
Phone: 501-396-4090
Email: dpwallace@arkbluecross.com

Agenda
Creating a Data Quality Program
– The People
– The Scope
– The Processes
– The Tools
Sustaining a Data Quality Program
– Policy
– Communication
– Demonstrate Value

Creating a Data Quality Program: The People
– Knowledge of the Business
– Multidiscipline Staff
– Skill Set
  • Ability to handle large and complex datasets
  • Ability to test and verify systems processes to understand causes of data issues
  • Ability to query/profile data using SQL, SAS, Excel
  • Ability to communicate with business areas and management

Creating a Data Quality Program: The Scope
– Importance of Defining
  • Likely to solve a real problem
  • Able to quantify value of DQ program
– Where to Begin
  • System Level?
  • Process Level?
  • Subject Area Level?
  • Application Level?
  • Project Level?

Creating a Data Quality Program: The Processes (Assess, Improve)
Assessment
– Data Profiling
– Define DQ Rules
– Define Measure (from DQ rules)
Improvement
– Data Cleansing
– Improve Processes
– Measure Quality
– Monitor Quality

Creating a Data Quality Program: The Tools
– Purpose/Need
  • Understanding your data
  • Profiling and Rule Discovery
  • Data Standardization
  • Data Cleansing
  • Metadata Management
– People Manage Data Quality, not Tools

Sustaining a Data Quality Program: The Need for a DQ Policy
Policy Guidelines
– Treat Information as a Product/Asset
– Focus on the Business Side
– Define Roles and Responsibilities
– Resolution Management
– Proactive Approach
– Data Standards

Sustaining a Data Quality Program: Communication
– Can make or break your DQ initiatives
– Stakeholders
  • Their Role in DQ/DG Program
  • A successful DQ program must be done with them
  • Include all functional areas that create or use data
  • Regular meetings needed

Sustaining a Data Quality Program: Demonstrate Value & Communicate It
– Identify DQ Issue to Target
– Engage Management
– Select Metrics to Measure, Establish Baseline
– Implement Solution
A DQ program can mitigate inefficiencies, excessive costs associated with poor data, and compliance risks, and can improve customer satisfaction.

Data Quality
Gayle Bunn
Blue Cross of Idaho

Biography
Gayle Bunn, MBA, PMP, BSEE
Data Warehouse Analyst
Enterprise Data Warehouse (EDW)
Blue Cross of Idaho
Responsible for EDW Data Quality, Support & Maintenance, Training, Customer Service, and Data Governance
Contact Info:
Phone: (208)331-7487
Email: gbunn@bcidaho.com

Current Steps at BCI
1. Started small – EDW focus
2. Established data quality workflow
3. Established 1 automated touch point
4. Added initial data quality metrics
  • Timeliness
  • Completeness
5. Socialized timeliness
6. Socialized completeness
7. Data quality evolved into many flavors
8. Established S.M.A.R.T. data quality metrics
9. Performed ongoing process improvement
10. Major milestone occurred!
11. Data governance and MDM emerges

1. Started Small – EDW Focus
Diagram elements: Data Analyst Community; EDW Team; Enterprise Data Warehouse (EDW) holding Member, Medical, Dental, and Drug data.
Callouts: “We need better data quality!” “We need to work together & discuss issues!”
Outcome: Data Quality Review Team (DQRT) formed.

2. Established Data Quality Workflow
Diagram elements: Data Analyst Community; Data Quality Review Team (DQRT); EDW Team; Enterprise Data Warehouse (EDW); SharePoint List.
Workflow labels: Wrong?; Document data quality issues; Prioritize; Manual Fix; Mark Fixed.
SharePoint List fields: Title, Description, Assigned To, Resolved (yes/no).
Callouts: “The data is still wrong!” “Faster please!” “Yay!”

3. Established 1 Automated Touch Point
Diagram elements: Extract, X-form, and Load steps feeding the Enterprise Data Warehouse (EDW) with Member, Medical, Dental, and Drug data. TP = Touch Point.
1 Automated Touch Point (check for missing data): Stop load if data is not complete!
Callouts: “Some of the data is missing!” “Yay!” “Can we have the data faster?” “Can we have more data?” “Cool!” “Hard to please!” “We need Service Level Agreements (SLAs)!”

4. Added Initial Data Quality Metrics
Diagram elements: Extract, X-form, and Load steps with touch points (TP), a new touch point, and an automated fix for common problems, feeding the Enterprise Data Warehouse (EDW).
More data delivered: Member, Medical, Vision, Grouper, Sales, Dental, Drug, Premium.
– Timeliness: Jobs completed on time.
– Completeness: Amount of data without noise. Noise = missing data in Fact Tables.
Callouts: “Yay!” “We need to socialize this!” “What does ‘completeness’ mean?” “Very cool!”

5. Socializing 1st Metric - Timeliness
– EDW SLA - SharePoint. Manual: Track when weekly/monthly jobs complete. (“I can tell when jobs finish!”)
– SQL Server Reporting Services (SSRS). Automate: Graph when jobs miss SLA. (“I can see where to improve!”)

6. Socializing 1st Metric - Completeness
SQL Server Integration Services (SSIS) to SharePoint. Automate: Track noise in data.
Track Noise in Fact Tables:
  Dimension PK   Value
  -1             Not Applicable
  -2             Error
  -3             Missing
  -4             Default
Callout: “Only 2.19% noise? The data is more complete than I thought!”
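
The noise tracking described above can be sketched in a few lines. This is a minimal illustration and not BCI's SSIS implementation: it assumes fact rows are available as dictionaries, the column names (`member_key`, `provider_key`) and sample rows are hypothetical, and any negative dimension key counts as noise, following the placeholder values in the table above.

```python
from collections import Counter

# Placeholder dimension keys and their meanings, as listed on the slide.
NOISE_CODES = {-1: "Not Applicable", -2: "Error", -3: "Missing", -4: "Default"}


def noise_report(fact_rows, dimension_keys):
    """Return (noise_pct, counts_by_code) for a list of fact rows.

    A row counts as noise if any of the listed dimension keys is negative.
    """
    noisy_rows = 0
    by_code = Counter()
    for row in fact_rows:
        bad = [row[k] for k in dimension_keys if row[k] < 0]
        if bad:
            noisy_rows += 1
        for code in bad:
            by_code[NOISE_CODES.get(code, "Unknown")] += 1
    pct = 100.0 * noisy_rows / len(fact_rows) if fact_rows else 0.0
    return round(pct, 2), dict(by_code)


# Hypothetical claim fact rows (illustrative data only).
claims = [
    {"member_key": 101, "provider_key": 7},
    {"member_key": -3, "provider_key": 7},
    {"member_key": 102, "provider_key": -1},
    {"member_key": 103, "provider_key": 9},
]
pct, counts = noise_report(claims, ["member_key", "provider_key"])
print(pct, counts)  # prints: 50.0 {'Missing': 1, 'Not Applicable': 1}
```

Breaking the count down by placeholder code, rather than reporting a single percentage, shows whether noise comes from genuinely missing source data or from load errors, which is the kind of detail a graph of noise issues would need.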
SQL Server Reporting Services (SSRS). Automate: Graph when noise issues occur. (“I can see where to improve!”)
NOISE: Count when dimension data is not available in a Fact record (PK<0).

What do we mean when we say data quality anyway?

7. Data Quality Evolved into Many Flavors
– Accurate: Reconciles Appropriately
– Complete: No Noise (missing data)
– Valid: Appropriate Data
– Integrity: Correct Business Rules
– Consistent: Matches Source
– Timely: On Time Delivery
Callouts: “I have a data quality problem!” “You mean opportunity!” “What flavor?”
Successfully performed in BCI’s Enterprise Data Warehouse (EDW).

8. Established Data Quality Metrics
Data Quality Metrics
– Accuracy (Reconciles)
  • % data loads where data reconciles
  • # accuracy incidents
– Consistency (Match Source)
  • % data loads where data matches source
  • # consistency incidents
– Timeliness (Right Time)
  • % data loads delivered on time
  • # timeliness incidents
– Integrity (Right Rules)
  • % loads with appropriate business rules applied
  • # integrity incidents
– Validity (Right Data)
  • % loads with appropriate date range
  • # validity incidents
– Completeness (No Noise)
  • % records without noise (missing data)
  • # noise incidents
Potential Data Quality Metrics
– Accessibility
  • % of Critical Data Fields provided
– Uniqueness
  • % total where duplicate records exist
– Compliance
  • # of regulatory noncompliance data issues with HIPAA, PHI
– Efficiency
  • Avg. time taken for data quality issues to be resolved

9. Performed Ongoing Process Improvement
Diagram elements: Data Sources; Enterprise Service Bus (ESB); Extract, X-form, and Load steps into the Enterprise Data Warehouse (EDW), with touch points (TP) and “Fix!” markers along the pipeline.
Checks shown: Accuracy (Reconcile); Validate (matches source); Completeness (no noise).
– Use Data Quality metrics to identify issues other TPs don’t.
– Use Data Quality process to fix source issues.

10. Major Milestone Occurred
Milestone: No Issues!
Diagram elements: Data Analyst Community; Data Quality Review Team (DQRT); EDW Team; Enterprise Data Warehouse (EDW); SharePoint List.
SharePoint List fields: Title, Description, Assigned To, Resolved (yes/no), Data Quality Area.
Callouts: “Finally!” “Yay!” “Yes!” “There’s one in every crowd!”

11. Data Governance & MDM Emerges
Diagram: Data Quality (Complete, Valid, Accurate, Consistent, Integrity, Timely) at the center, surrounded by Data Governance, surrounded by Master Data Management (MDM).
With Success: The small bird’s chirp of data quality was heard!
– Data Governance is emerging around Data Quality
– MDM is emerging around Data Governance

Critical Success Factors at BCI
1. Gain Steering Committee sponsorship
2. Establish a clear Mission Statement/Purpose
3. Develop Program Goals for the Team
4. Establish cross-functional DQRT representation (including across IS)
5. Create a non-blame, non-judgmental environment
6. Use a divide and conquer approach to issue resolution (broad participation)
7. Establish continuous improvement over time (Rome was not built in a day)
8. Keep a regular meeting schedule, with frequency dependent on need
9. Appoint a data quality champion

“Data Quality”: What You Need to Know to Create and Sustain a Data Quality Program
Amit Bhagat
President & Principal Consultant
Amitech Solutions
Contact Info:
Phone: 314-480-6301
Email: Amit.Bhagat@amitechsolutions.com

Agenda
– DQ Symptoms
– Use Case
– DQ Myths & Reality
– DQ Design
  • Approach
  • Business Need
  • Define
  • Profile
  • Remediate
  • Sustain

DQ Symptoms
– “The data is wrong – I will do it myself.”
– “We spent $5 million on the ‘claims’ system and it still sends incorrect payments.”
– “We get a different member month count depending on whom we ask.”
– “We are not sure if our MLR is correct.”

Use Case
– Business Problem: Ensure accurate risk scoring for membership under the ACA for payment transfer between carriers.
– Data Profiling: Missing or incorrect diagnosis code in claims data.
– Outcomes: Pay other plans, potentially 2% or more of loss ratio, because we may “appear” healthier than others in our market. Focus on diagnosis code as a critical data element.

DQ Myths & Reality
Myths
– Quality is solved by technology alone.
– Quality is an IT problem.
– Quality is best fixed at the point of entry.
– Quality is the sole responsibility of the data “owners.”
– Quality requires all data to be perfect.
Reality
– Quality requires people, process, culture, and technology to work in concert.
– Quality is a “fit for purpose” process that delivers the highest data quality over time.

DQ Design: Approach
– Business Need: Member Retention & Growth
– Function: Sales, Marketing, Customer Service
– Data Domain: Membership
– Attributes: Email, Phone Number

DQ Design: 5 Step Process
A cycle around Optimal DQ:
1. Business need
2. Define
3. Profile
4. Remediate
5. Sustain

1. Business Need
Determine the scope and business relevance of DQ effort.
Steps: Acquire business goals; Identify levers; Identify components; Identify candidates for DQ.
– Objective: Improve member retention by 5%
– Business Action: Increase member satisfaction, improve customer service
– Information Use (levers): Reduce hold time, improve member portal for self service, provide mobile app for provider directory; identify dissatisfied members
– Data Components: Member Satisfaction Surveys, Customer Service
– Data Candidates: Premium & Claims; Survey; Premium & Claims by Product; Membership, Customer Service call data

2. Define: DQ Objectives & Measures
Identify completion criteria for current DQ iteration:
– Reduce member duplicates by 10%.
Determine metrics to be developed:
– What you are measuring (measure).
– When you are measuring (milestone).
– Why you are measuring (business impact).
Business Driver: Accurately calculate the number of net new members
Sample Data Quality Metrics:
– Number of duplicate members
– Number of Members with missing SSN
– Number of Members without Member ID
– Number of Members with missing address

3. Profile
This step determines the exact sources, location, and types of techniques to use to assess DQ:
– Identify specific tools / techniques to be used.
– Review initial measures for relevance and accuracy.
– Verify accuracy of what was intended vs. actual.
– Analyze data for business rule conformance.
– Profiling reports are analyzed, and root causes and business impacts are identified and reported.

4. Remediate: Technology & Process
Develop the immediate and ongoing technical architecture and process components required to reduce or eliminate DQ problems.
Process & Standards
Consistent application of process and standards to outline the expectations for data quality across the enterprise.
– Develop and implement business processes
– Develop work flows to fix bad data at source
– Develop and implement data movement controls
Technology
Apply tools to cleanse and standardize data in the ETL process to ensure required levels of quality are met.
– Use cleansing & standardization tools
– Develop audit, balance, and control
– Integrate DQ with Enterprise Information Management program
Figure: ETL flow diagram. Labels in slide order: Source Data Files; Extract; Verify; Profile Data; Remediate; Transform; Implement Business Rules; Calculations & Aggregations; Master & Reference Data; Business Rules; Certify Data; Load; Publish “Certified Data”; Certified Data Store; Reporting Tool. Metadata labels: Data quality, Integration, Data governance, Rationalization, Audit/Balance, Reconciliation, Compliance. Rule hierarchy labels: Category, Sub-Category, Rule, Metric.

5. Sustain
This step covers the culture change, governance, and ongoing support and progress reporting of the DQ effort.
Change Management
Implementation of various culture change management efforts to sustain data quality efforts.
Figure: Change management risk chart. Labels: Users & Transformers; Data Sources; User Population; Market Risk and Operation Risk (each Low to High); Data Stewards, Report Generators, Data Users, Problem Resolution, etc.; Data Sources and Enabling Technology; Certified Data Environment.
Data Governance
Provides the framework and ongoing oversight to enable effective management.
Figure: Data stewardship matrix. Lines of business and functional areas (LOB 1, LOB 2, LOB 3, Finance, Credit, HR) each have a Data Content Owner. Subject areas (Customer, Location, Deal, Invoice) each have a Data Structure Owner or Subject Area Lead (Data Steward). Each combination has its own steward (for example, LOB 1 Customer Data Steward or Finance Deal Data Steward), and stewards work in a coordinated team. Other labels: CoE; Data Governance; Report & Metrics Governance; Information Quality Governance.

Summary
– Data quality is a known, “for sure” problem.
– Existing processes that create bad data must be addressed.
– Technology cannot be the only road to a solution.
– People: Perceptions of “doing bad things” are inevitable. Manage resistance, politics, priorities. Culture management mandatory.
– Technology: Integrate with EIM. Lots of new stuff!

Share Your Experience
Panel Members: Daniel Wallace, Gayle Bunn, Amit Bhagat

Question # 1
How does a data quality program fit into your strategy for information management?

Question # 2
Are you able to produce “one version of the truth” throughout the whole company, or do various versions surface from different areas?
What subject areas are you currently managing in your data quality program?

Question # 3
Are data definitions established at the individual, department, or enterprise level?
Are you leveraging a data governance program for data quality? How?

Question # 4
Describe what impact data quality has on the delivery of business value through analytics and BI.
Tell us how your organization manages data quality and how it responds to data quality issues (as a matter of project work, daily operations, planning, etc.).
Does your organization have ways of measuring or quantifying “poor quality” and the results of poor quality data?

Question # 5
In your organization, how do the various stakeholders around any given data quality project work together?

Question # 6
Have you integrated master data in your DQ program?
What was your approach?
How did it go?
– Successes?
– Lessons learned?

Question # 7
What are your next steps? New efforts toward data quality?