BIG DATA & ANALYTICS 3.0
Chuck Lyon, Operations Lead
Enterprise Data Warehouse
3 April 2015

What is Big Data?
INTERMOUNTAIN'S DEFINITION
Using additional data sources and new analytic tools to produce superior, actionable analytic insights (not previously possible or cost effective), leading to:
• Improved healthcare outcomes
• Reduced cost
• Improved patient experience

New Data Sources and Analytic Tools
POSSIBILITIES FOR NEW, SUPERIOR INSIGHTS
Additional Data Sources
• Unstructured physician notes, discharge summaries, and clinical documentation
• High-volume, streaming clinical device data
• Personal device data
• Genomics data
• Cerner population health data (47 million patient lives)
• External data, social media, etc.
Potential New Tools
• Low-cost, high-volume storage and distributed processing (Hadoop)
• Semantic content recognition: unstructured to structured with clinical significance (NLP, natural language processing)
• Machine learning for correlation and causation discovery
• High-volume data federation, indexing, and search (SOLR)

Analytics 3.0 *
INTERMOUNTAIN'S PATH FORWARD
Analytics 3.0 is an emerging analytics movement producing superior descriptive, predictive, and prescriptive analytic insights by tightly integrating big data and traditional analytics to achieve insights and outcomes not previously possible.
1.0 Traditional Analytics (structured data, relational, statistical)
2.0 Big Data (unstructured data; volume, variety, velocity, veracity)
3.0 New, Superior Business Value (integrated big data AND traditional analytics)
* International Institute for Analytics

Planning and Roadmap
ANTICIPATED PHASES
Initial Use Cases
• Physiologic Data
• EDW Augmentation
  • Historical EMR (HELP) archiving
  • ETL offload from EDW
• NLP
  • Concept extraction from text documents
• Search
  • SOLR search over EDW data
  • End-user self-service
  • Data investigation
• Genomics
  • Storage of raw genomic files

Physiologic Monitor Data
Data Ingest to Visualization
• The device interface at Intermountain has existed for 30+ years
• Sampled data is pushed to the EMR at 15-minute intervals
• Data is deleted after a minimal storage time
• For researcher requests, the data pipe is dammed up, the data collected and sent, then deleted
• Need: store the data for historical analysis and complex event correlation; use algorithms to enhance clinical alerting and decision-making
• Current status: storing near-real-time data in Hadoop (10-minute queue processing); visualization via Tableau connected to Hive, as sketched below

Tableau Visualization of Hive Data
10-minute maximum latency
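As a concrete illustration of the flow above, here is a minimal Python sketch, assuming the commonly used `hdfs` (WebHDFS) and `pyhive` client libraries: device samples drained from a queue roughly every 10 minutes are landed as files under an HDFS directory, then read back through Hive, the same interface Tableau connects to. The host names, paths, queue contents, and the `physio_monitor_samples` table are hypothetical placeholders; the deck itself only states that near-real-time data is stored in Hadoop and visualized in Tableau via Hive.

```python
"""Illustrative sketch only: queue -> HDFS (10-minute batches) -> Hive query.
All host names, paths, and table/column names are hypothetical."""
import csv
import io
import time
from datetime import datetime, timezone

from hdfs import InsecureClient   # WebHDFS client (pip install hdfs)
from pyhive import hive           # HiveServer2 client (pip install pyhive)

HDFS_URL = "http://namenode.example.org:50070"   # hypothetical WebHDFS endpoint
LANDING_DIR = "/data/physio/monitor_samples"     # directory backing a Hive external table
BATCH_SECONDS = 600                              # ~10-minute queue processing window


def flush_batch(samples, client):
    """Write one batch of (device_id, metric, value, sampled_at) rows to HDFS."""
    if not samples:
        return
    buf = io.StringIO()
    csv.writer(buf).writerows(samples)
    # One file per batch; a Hive external table defined over LANDING_DIR picks it up.
    path = f"{LANDING_DIR}/batch_{int(time.time())}.csv"
    client.write(path, data=buf.getvalue().encode("utf-8"))


def query_latest(hive_host="hive.example.org"):
    """Read back through HiveServer2 (the same path a Tableau Hive connection uses).
    Assumes an external table named physio_monitor_samples over LANDING_DIR."""
    conn = hive.connect(host=hive_host, port=10000)
    cur = conn.cursor()
    cur.execute(
        "SELECT device_id, metric, AVG(value) "
        "FROM physio_monitor_samples "
        "GROUP BY device_id, metric"
    )
    return cur.fetchall()


if __name__ == "__main__":
    client = InsecureClient(HDFS_URL, user="edw")
    # In the real pipeline these rows would come off the device-interface queue;
    # a tiny hard-coded batch stands in for it here.
    demo_batch = [
        ("dev-001", "heart_rate", 72, datetime.now(timezone.utc).isoformat()),
        ("dev-001", "spo2", 98, datetime.now(timezone.utc).isoformat()),
    ]
    flush_batch(demo_batch, client)
    print(query_latest())
```

In practice Tableau would connect to HiveServer2 directly rather than through PyHive; the query function above simply stands in for that connection to show the read path against the same data.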
Value
WHAT HAS BIG DATA ACCOMPLISHED IN HEALTHCARE?
Clinical Benefits:
• Deriving optimum clinical pathways to reduce variance in treating various chronic and acute conditions
• Predicting the "next top 5%" of expensive patients
• Improving prediction accuracy of heart condition diagnosis from echocardiogram data
• Improving prediction of 30-day CHF readmissions using unstructured clinical data
• Using genomic data to better predict and prevent pre-term births
• Using genomic data to prescribe personalized treatment for leukemia patients

Operational Benefits:
• Retaining ICU device data for granular assessment of acute events
• Offloading EDW data landing and expanding it to include complete source application data
• Reducing data modeling and prescriptive ETL through ELT and discovery-based approaches (see the schema-on-read sketch at the end of this document)
• Retiring legacy applications with an active archive
• Storage and processing of genomic data

Financial Benefits:
• Improving recovery of claims

Contact Information
Charles (Chuck) Lyon
Enterprise Data Warehouse, Operations Lead
Intermountain Healthcare
Chuck.lyon@imail.org
801-507-8080
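For the ELT and active-archive items under Operational Benefits, the sketch below shows the schema-on-read pattern they rely on, assuming the same hypothetical `hdfs` and `pyhive` clients as above: raw source extracts are copied into Hadoop unchanged, and a Hive external table applies structure only at query time. The landing path, host names, extract file name, and `claims_archive` layout are illustrative assumptions, not details from the deck.

```python
"""Minimal ELT / schema-on-read sketch: land raw extracts as-is, defer the
transform to query time via a Hive external table. All names are hypothetical."""
from hdfs import InsecureClient   # pip install hdfs
from pyhive import hive           # pip install pyhive

RAW_DIR = "/archive/source_app/claims"   # hypothetical landing area in HDFS

# 1. EL: copy a raw source extract into HDFS without reshaping it.
#    (The local file name is a placeholder for whatever the source system emits.)
client = InsecureClient("http://namenode.example.org:50070", user="edw")
client.upload(RAW_DIR, "claims_extract_2015-04-03.csv")

# 2. T (deferred): declare a schema over the files already in place.
conn = hive.connect(host="hive.example.org", port=10000)
cur = conn.cursor()
cur.execute(f"""
    CREATE EXTERNAL TABLE IF NOT EXISTS claims_archive (
        claim_id STRING,
        member_id STRING,
        amount DOUBLE,
        service_date STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '{RAW_DIR}'
""")

# 3. Discovery-style query against the untransformed archive.
cur.execute("SELECT service_date, SUM(amount) FROM claims_archive GROUP BY service_date")
print(cur.fetchall())
```

The point of the pattern is that the transform step is deferred: the same raw files can serve as an active archive of the retired source application and support later discovery work without up-front modeling or prescriptive ETL.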