Columnar Database Experiences
Unlocking the Value of Big Healthcare Data
Enterprise Informatics, BCBSA
Nasir Khan
Bob Kero

Biography: Nasir Khan
Executive Director of Enterprise Informatics, BCBSA
Nasir has been with BCBSA for over 12 years and has over 25 years of leadership experience in the healthcare insurance, biomedical, banking and pharmaceutical industries. Recently he was named “One to Watch” by CIO magazine.

Biography: Bob Kero
Managing Director of Enterprise Informatics, BCBSA
Bob has been with BCBSA for over four years and has over 25 years of professional leadership experience in the healthcare and P&C insurance, consulting, healthcare provider, and federal R&D industries. Additionally, he has been a guest lecturer in the Graduate Department of Health Systems Management, RUSH University.

Agenda
• Living Big in Interesting Times
• Our Own Big Data Challenge
• Our Search for Effective Solutions
• What We Were Able to Achieve
• How Our Experience Can Help

Living Big in Interesting Times

What Is Big Data?
• Is it a terabyte of data? (1 TB = 1,000 gigabytes)
• Is it a petabyte of data? (1 PB = 1,000 terabytes)
• Is it an exabyte of data? (1 EB = 1,000 petabytes)
Big data: data whose size or structure is beyond the ability of an organization's existing technology or processes to use to full business advantage.

2014: The Big Data Inflection Point for Healthcare?
(Timeline graphic, 2011 to 2015: Medical Loss Ratio; Meaningful Use; ICD-10/HIPAA 5010; Accountable Care Organizations; Health Info Exchanges; HC Reform Coverage Mandates; Disease management/predictive modeling; Payer M&As; Consolidation; Value-Based Reimbursement/Shared Risk; Medical Home; Health Insurance Exchanges; Medicare Advantage Cuts; Diversification; Evidence-Based Medicine; Individual Insurance Growth; Social Media; Personalized Medicine; Provider Collaboration; International Expansion; Medicaid Expansion; Genetic Testing)

The Challenge: We Must Analyze Exponentially Growing Healthcare Data Assets

The Opportunity: Big Data Processing Offers $300B Potential Annual Savings to Healthcare
• Clinical, $165B: transparency in clinical data & clinical decision support
• R&D, $108B: personalized medicine & clinical trial design
• Accounts, $47B: advanced fraud detection & performance-based drug pricing
• Public Health, $9B: public health surveillance & response systems
• Business Model, $5B: aggregation of patient records & online communities
Source: McKinsey Global Institute, May 2011

The Objective: Big Data Solutions Facilitate Healthcare Transformation
• Evidence-Based Healthcare
  – Driven by healthcare outcomes
  – Requires analyzing structured and semi-structured data
• Health Outcomes
  – Looking at all patient data to provide optimal care
  – Scoring and outcomes-based incentive calculations
• Patient-Centered Care
  – Knowing about lifestyle choices helps improve health outcomes
  – Assess vital signs & diagnostic information from medical devices
• Disease Management
  – Processing of structured and unstructured data to identify and manage chronic and emerging diseases
• Drug Discovery & Genomics Analytics
  – Integration of clinical, compound & journal information
  – Combine clinical data with patient genomics

Our Own Big Data Challenge

BCBSA, FEP & BHI Healthcare Data Repositories
(Diagram. Sources: FEP Systems (Federal Employee Program claims), National Account claims, BHI Member Plan claims, third-party data. Daily and monthly feeds: claims, enrollment/membership, provider IDs, plans, contracts, products, surveys. Repositories: CCTI (members & claims; CCTI marts), PDR (providers), NDW (members & claims), ADaM (members & claims), BDR (BHI Data Repository). Outputs: standard reports, extract files, ad-hoc queries & reports.)

ADaM: Key Functionality Centered Around Business Flexibility
• Cost and utilization reporting (PMPM, utilization and trends)
• Detailed access to claim-line information, with data enhancements
• Development of custom comparisons
• Facility- and provider-level drill-down
• Proactive identification of at-risk members (concurrent and prospective)
• Population risk adjustment functionality
• Prescription cost and utilization information
• Evidence-based physician quality performance
• Ability to design custom reports to suit specific business needs

ADaM: An Encounter with the Challenges of Big Data
• June 2009: Ad-hoc queries only
  – Disappointing performance
• Aug 2009: Hardware & DB2 upgrade
  – Additional expense
  – No improvement in query performance
• Sep 2009: Indexes tuned for queries
  – Additional effort
  – Some performance improvement
• Dec 2009: Summary tables added
  – Additional effort
  – Places a heavy load on database servers
• 2010: Other reports deferred
  – Each additional report needed summary tables to meet performance requirements

Our Search for Effective Solutions

Assessment: Emerging Trends in Big Data DBMS Technology
• Column Storage: More data warehouses will be stored in highly compressed columnar fashion
• Clustering: Most large-scale database servers will achieve horizontal scalability through clustering
• In Memory: Most OLTP databases will be augmented by an in-memory database
• Smart Tuning: Many new systems will de-emphasize partitioning schemes, indexing strategies and buffer management
• NoSQL: Many reporting problems will be solved with no-schema/NoSQL databases

Initial Choice: HP Vertica, an Advanced Columnar Database
• Massively parallel processing
• Column storage
• Advanced compression
• Standard SQL interface
• High availability
• Automatic database design
• Native DB-aware clustering on low-cost x86 Linux nodes
• Simple integration with existing ETL and BI solutions
• Not supported:
  – Referential integrity
  – Triggers
  – Stored procedures

What Is a Columnar Database?
Imagine an Excel Spreadsheet to Load into a Database
Customer Purchases
CustID  Name   City    Item ID  Description     Qty  Total
000001  Smith  Tucson  100101   Green Widgets   1    $50.00
000001  Smith  Tucson  100102   Blue Widgets    2    $100.00
000001  Smith  Tucson  100103   Yellow Widgets  1    $50.00
000002  Jones  L.A.    100101   Green Widgets   2    $100.00
000002  Jones  L.A.    100106   Orange Widgets  1    $50.00

What Is a Columnar Database?
Let’s First Load It into a Standard Database, Oracle or IBM DB2
(Same Customer Purchases table, stored by row.)
• Storage: each record is stored by row
• Access rule: read data row by row, moving from left to right in each row
• Query: read the description of each widget ordered by ‘Smith’
• Result: reads attributes of no interest; slow for big data

What Is a Columnar Database?
Now Let’s Load It into a Columnar Database: HP Vertica
(Same Customer Purchases table, stored by column.)
• Storage: each record is stored by column
• Access rule: read only the data columns needed to answer the query
• Query: read the description of each widget ordered by ‘Smith’
• Result: skips all columns of no interest; fast for big data

Columnar Storage Allows Options for Data Compression of Repeating Values
(Same Customer Purchases table; compression ~60%.)
• Name column: 3 × Smith, 2 × Jones
• City column: 3 × Tucson, 2 × L.A.
• Requires much less disk I/O time to retrieve
• Spends fast CPU time on decompression to save slow disk I/O time

Columnar Databases Are Mostly Plug-Compatible with Standard Databases Like IBM DB2
            Columnar Database   DB2, Oracle     NoSQL Database
Interface   SQL                 SQL             MapReduce, JSON, other APIs
Behavior    Transactional       Transactional   Eventual consistency
Storage     Column storage      Row storage     In memory, document, key/value

Compatibility Enables a Manageable Replacement Strategy
            IBM DB2, Oracle, MS SQL Server   Columnar Database
App access  Access Layer                     Access Layer+
Interface   SQL                              SQL (with limits)
Behavior    Transactional                    Transactional
Storage     Rows                             Columns

Compatibility Enables a Manageable Replacement Strategy
Moving from IBM DB2, Oracle or MS SQL Server to a columnar database:
• Update the app access layer
• Update the database schema
• Replace stored procedures
• Update database connectors
• Update the data loading process
• Tune the database configuration

What We Were Able to Achieve

ADaM: Performance Issue Root Cause Assessment
• ADaM is a relatively small data mart!
  – Less than 2 TB of raw data
• The current system only scales at great expense
  – Hardware upgrades
  – More DB2 licenses
  – Complex database optimizations
• Required queries are inefficient in IBM DB2
• Optimizations only support predefined queries, so performance is limited when asking ‘what if’ questions

HP Vertica Offered Savings on Hardware and License Costs
                    Vertica POC                         ADaM Production
Database software   Vertica 4.0 pre-release build       DB2 LUW 9.5
Operating system    Red Hat Enterprise Linux 5          AIX 5.3
Compute platform    3 HP DL380 Intel nodes              1 IBM P550 + 2 IBM P570 Power6 nodes
SPECint2006 rate    324 (total)                         294 (total)
Storage platform    24 SCSI disks @ 146 GB, 10,000 rpm  288 FC disks @ 146 GB, 15,000 rpm
Usable storage      3 TB                                29 TB
• Hardware cost: ~9x lower with the Vertica POC
• Software cost: ~2.7x lower with the Vertica POC

HP Vertica ADaM POC Findings
• Query performance
  – 12 of 15 queries execute 40% to 1000% faster
  – Most execute at least 150% faster
  – Run time reduced by at least 60%
• Load performance
  – The ADaM database can be loaded in less than 8 hours
  – 500M rows/hour

How Our Experiences Can Help

Help Clarify Your Business Vision for Big Data Support
• Inventory your usage scenarios
  – Current
  – Known future
  – Wish-list/dream-list future
• Establish reasonable constraints
  – People and platforms
  – Money and time
• Develop target SLAs with stakeholders
  – Must-haves
  – Nice-to-haves

Help Understand and Clarify Your Specific Requirements
• What’s your tolerance for specialized hardware?
• What’s your tolerance for set-up effort?
• What’s your tolerance for ongoing administration?
• What are your insert and update requirements?
• At what volumes will you run fairly simple queries?
• What are your complex queries like?
• For which third-party tools do you need support?
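The POC findings above state speedups two ways, as “% faster” and as “run time reduced by”, and pair a load rate with a load window. The short Python sketch below relates these numbers. It assumes “N% faster” means the new run completes 1 + N/100 times as fast (our reading, not a definition taken from the POC report) and that the 500M rows/hour rate holds for the whole window; the function names are illustrative only.

```python
def runtime_reduction(pct_faster: float) -> float:
    """Fraction of run time saved when a query is `pct_faster` percent faster.

    Assumption: "N% faster" means a speed ratio of 1 + N/100.
    """
    speedup = 1 + pct_faster / 100
    return 1 - 1 / speedup


def rows_loadable(rows_per_hour: float, hours: float) -> float:
    """Upper bound on rows loaded in a window, assuming a constant load rate."""
    return rows_per_hour * hours


if __name__ == "__main__":
    # The 40% to 1000% range and the "at least 150% faster" figure from the POC
    for pct in (40, 150, 1000):
        print(f"{pct:>5}% faster -> run time reduced by {runtime_reduction(pct):.1%}")
    # 500M rows/hour sustained for the 8-hour load window
    print(f"{rows_loadable(500e6, 8):,.0f} rows in 8 hours")
```

Under that reading, “150% faster” corresponds exactly to a 60% run-time reduction, consistent with the slide, and the 40% to 1000% range corresponds to roughly 29% to 91% less run time. At 500M rows/hour, an 8-hour window allows about 4 billion rows.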
BCBSA Enterprise Informatics: A Big Data Projects Accelerator
• Flexible resource model
  – Business intelligence focused practice
  – Fast ramp-up and roll-off (elastic staffing)
• Mature processes
  – Rational Unified Process
  – Agile/SCRUM
  – Kimball Lifecycle
• World-class solution options
  – Ability to deliver ultra-large-scale informatics systems
  – Leveraging advanced database technologies
  – Service bus for fast deployment
• Medical informatics experience
  – How the business functions
  – What drives cost and revenue
  – What improves productivity and efficiency
• Value proposition
  – Staffing cost benefits
  – Faster time to delivery
  – Instant business alignment

Informatics Service Bus Enables Lower Risk & Faster Deployment of Big Data Apps
(Layered diagram)
• App hosting: Plan Connexion, Plan Custom Apps, BHI & Partner Apps
• Tools layer: Plan Custom Tools, SAS, COGNOS, DataStage, other standard tools
• Data layer: Plan Data (Vertica), Plan Data (DB2), NDW, IPDS, etc.
• Also shown: LDAP, SiteMinder, S-FTP, Web Services, Monitoring, Administration, SAML (Certificates), and a Client Access Portal

Summary
• Healthcare is undergoing radical & historic changes
  – Regulatory, business, research and clinical
• Use of big data will be mandatory to adapt and succeed
• Columnar databases enable efficient use of big data
  – Cost effectiveness, high performance, fault tolerance
• We are ready to support entry into the big data area
  – Database, healthcare, analytics, and business expertise
  – Available tools & infrastructure for POCs and deployment

Questions?
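The summary’s claim that columnar databases use big data efficiently rests on the row-versus-column access rule and the repeated-value compression shown on the Customer Purchases slides. The Python sketch below is a toy in-memory model, not how HP Vertica actually stores data: it pivots the example table into per-column lists, counts how many field values each layout touches to find the widgets ordered by ‘Smith’ (a rough stand-in for disk I/O), and run-length encodes the repeating Name and City columns. All function names are hypothetical.

```python
from itertools import groupby

# The "Customer Purchases" example from the slides (toy data).
COLUMNS = ["CustID", "Name", "City", "ItemID", "Description", "Qty", "Total"]
ROWS = [
    ("000001", "Smith", "Tucson", "100101", "Green Widgets", 1, 50.00),
    ("000001", "Smith", "Tucson", "100102", "Blue Widgets", 2, 100.00),
    ("000001", "Smith", "Tucson", "100103", "Yellow Widgets", 1, 50.00),
    ("000002", "Jones", "L.A.", "100101", "Green Widgets", 2, 100.00),
    ("000002", "Jones", "L.A.", "100106", "Orange Widgets", 1, 50.00),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (a column store's layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def row_store_query(rows, customer):
    """Row layout: every field of every row is read, wanted or not."""
    name_i, desc_i = COLUMNS.index("Name"), COLUMNS.index("Description")
    values_read, result = 0, []
    for row in rows:
        values_read += len(row)
        if row[name_i] == customer:
            result.append(row[desc_i])
    return result, values_read


def column_store_query(cols, customer):
    """Column layout: only the Name and Description columns are read."""
    names, descs = cols["Name"], cols["Description"]
    result = [d for n, d in zip(names, descs) if n == customer]
    return result, len(names) + len(descs)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (count, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(values)]


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(row_store_query(ROWS, "Smith"))     # three descriptions, 35 values read
    print(column_store_query(cols, "Smith"))  # same descriptions, 10 values read
    print(run_length_encode(cols["Name"]))    # [(3, 'Smith'), (2, 'Jones')]
    print(run_length_encode(cols["City"]))    # [(3, 'Tucson'), (2, 'L.A.')]
```

Both layouts return the same three descriptions, but the row layout touches all 35 values while the column layout touches 10. Run-length encoding only pays off when equal values sit next to each other, which is why column stores commonly sort data before encoding it; real systems also use other encodings and compress on disk pages rather than Python lists.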