Health Analytics at Georgia Tech: From Information to Knowledge to Decision Making
Nicoleta Serban, PhD and Julie Swann, PhD
Industrial and Systems Engineering
Georgia Institute of Technology
May 2014

Data Science Framework
Information
• Infrastructure
• Management
Data
• Representation
• Sampling
▫ Data architectures
▫ Data integration, sharing and federation
▫ Data privacy rules
▫ Data wrangling
Knowledge
• Computation
• Tools
▫ Data mining
▫ Machine learning
▫ Statistical inference
▫ Network analysis
▫ Simulations
▫ Visualization
Decisions
• System engineering
▫ Deriving hypotheses
▫ Validating hypotheses
▫ Eliciting causal relations
▫ Designing, planning, and optimizing
▫ Testing, ranking, scoring
▫ System dynamics

Scope of Data in Healthcare
Data Types and Examples
1. Disease Registry: Cystic Fibrosis
2. Disease Progression: “Natural History” models
3. Electronic Health Records: Queries (CHOA, VHA) on specific projects
4. Facility Info: VA satellite clinics
5. Medical Claims Data: Medicaid (children and pregnant women, GA + 13 other states, 2005-2009)
6. National Survey or Examination Data: NHANES, HCUP KIDS
7. State Databases: GA’s Oasis, HCUP SEDD and SID
8. General: Census, National Provider Identifier (NPI), GIS

Medicaid claims data will be used as a test bed for decision-making support tools that target knowledge about the care of children on Medicaid.

CMS Medicaid Claims Data
• MAX Claims Data
▫ Personal Summary: patients, demographics, birthdate, etc.
▫ Inpatient: claims, diagnoses, procedures, LOS, payment
▫ Other Therapy: claims for physician, lab, clinic, outpatient
▫ Long Term Care: facility type, date of service, etc.
▫ Prescription Drug: paid drug claims
• Patient-level identifiable files with locations and a provider ID
• Years 2005-2009 for 14 states (2010-2011 upcoming)
▫ SE: Georgia, Alabama, Arkansas, Louisiana, Mississippi, N. Carolina, S.
Carolina, Tennessee, Texas
▫ Other: California, Minnesota, New York, Pennsylvania
• Study population: children and pregnant women

GT Project Champion: Beth Mynatt (IPaT, GT)
GT Lead on Information Technology: Matt Sanders (GTRI)
GT Research Leads: Nicoleta Serban and Julie Swann

Medicaid Project: Approved Topics
1) MEASURING AND EXPLAINING INEQUITIES: To assess the impact of healthcare system characteristics vs. inequities in healthcare, including inequities in geography, use, quality, expenditure and outcomes, among children enrolled in Medicaid, especially in states with historic inequities such as those in the Southeast.
2) OPTIMIZING INTERVENTIONS AND DELIVERY SYSTEMS: To analyze flows and policies across the system, e.g., the match between supply and demand, including financially, both geographically and across time, along with the corresponding costs or outcomes, and to analyze improved methods of delivery, including medical homes.

Medicaid Project: Implementation
Information:
• Identifiable patient-level claims
• 5 years + 14 states = 266,839,307,070 observations, 2 terabytes of information
Data:
• Represented as patient care trajectories: utilization, cost and patient characteristics
• Sampled by disease
Challenge #1: HIPAA and CMS data safeguards compliance
- data environment: access, sharing, linking, storage
Challenge #2: Database backbone
- projected research needs
- projected computational needs
Challenge #3: Data processing
- unavailability of tools to process-mine claims
- additional data and information needs
- expert opinion & collaborations

Medicaid Project: Safeguards
• Data stored in a secured location at Georgia Tech, with access to the identifiable patient files limited to a small set of employees approved by CMS & IRB
• Sharing of aggregated data is allowed with collaborators, if consistent with the research protocol
• Cells should have at least 11 entries
• Data undergoes a review process at GT before release from the data workstation
• Significant liability involved if a breach occurs
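The minimum-cell-size safeguard above can be illustrated with a short sketch. This is not the project's actual release procedure: the record layout, the field names (`county`, `age_group`) and the function name are hypothetical, and the only element taken from the slide is the threshold of 11 entries per cell.

```python
from collections import Counter

MIN_CELL_SIZE = 11  # threshold stated on the Safeguards slide


def aggregate_with_suppression(records, keys, min_cell_size=MIN_CELL_SIZE):
    """Count records per cell and mask any cell below the threshold.

    records: iterable of dicts (hypothetical patient-level rows)
    keys: tuple of field names that together define a cell
    Returns a dict mapping each cell to its count, or to None if suppressed.
    """
    counts = Counter(tuple(rec[k] for k in keys) for rec in records)
    return {
        cell: (n if n >= min_cell_size else None)
        for cell, n in counts.items()
    }


# Made-up rows for illustration only (not real claims data).
example_records = (
    [{"county": "A", "age_group": "4-9"}] * 15
    + [{"county": "B", "age_group": "4-9"}] * 4
)
table = aggregate_with_suppression(example_records, ("county", "age_group"))
for cell, count in sorted(table.items()):
    print(cell, "suppressed" if count is None else count)
```

A sketch like this only masks the small cells themselves. A real review would likely also have to check that a suppressed value cannot be recovered from row or column totals, which is not handled here.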
Medicaid Project: Health Analytics
Data:
• Baseline Metrics
• Care Pathway
• Access & Outcomes
Knowledge:
• Systematic disparities in access, outcomes and cost
• Network of providers
• Profiles of patient-level care pathways
Process Mining, Spatial Statistical Models, Functional Data Analysis, Unsupervised classification, Sequence clustering, Markov-decision processes, Optimization

Medicaid Project: Health Analytics
Knowledge:
• Systematic disparities in access, outcomes and cost
• Network of providers
• Profiles of patient-level care pathways
Decision Making:
• Policy interventions
• Network interventions
Markov-decision processes, Causal Inference, Optimization Modeling, Simulations

Medicaid Project: Research Scope
• Limitations
▫ Research must fit within the scope proposed to CMS
▫ Analysis of raw data must be conducted at GT
▫ Process for analyzing data is onerous, time-consuming, and “expensive”
▫ The most recent (~2) years of data are not available
• Positives
▫ We can benchmark GA against 13 other states
▫ Patients and/or providers can be followed longitudinally
▫ Permission from CMS or GT is not required to pursue research topics or to publish the related findings

Medicaid Project: Opportunities
• Developing the proof of concept in building large infrastructures for protected information
• Becoming the center for deployment of tools for mining claims data
• Advancing rigor in health analytics
• Educating students and visiting researchers
• Informing policy making in understanding and managing the healthcare system

Health analytics at GT bridges fundamental mathematical and computational modeling with health services research and health economics as a means of translating health and healthcare data into knowledge and decision making.
Health Analytics: Serban & Swann Group
• Healthcare Access & Outcomes
▫ Measurement
▫ Linking Access & Outcomes
• Interventions
▫ Policy & Network Interventions
▫ Cost-effectiveness: Telemedicine
• Pediatric Asthma
▫ Baseline Metrics
▫ Care Pathways in Utilization & Cost
• Collaborations between GT ISyE, GT IPaT, Children’s, CDC, VHA, DCH, DPH and other health entities

We define healthcare access as the equal opportunity of people to get appropriate care to maintain or improve their health. We focus primarily on making inferences on spatial access, which is particularly important for managing chronic diseases where regular visits and adherence to recommended care practices can reduce severe outcomes.

Healthcare Access: Five Dimensions

Health Analytics and Access
[Diagram: Measure Status Quo; Infer Disparities and Link to Outcomes; Evaluate Interventions]
• Measurement: Estimating spatial access of different populations by taking into account supply and demand trade-offs and system constraints.
• Inference, Equity: Studying systematic disparities in access to services between population groups.
• Inference, Linking to Outcomes: Understanding how access is associated with health outcomes geographically and longitudinally.
• Evaluating Interventions: Informing decision making in healthcare delivery through policy and network interventions that target improvement in spatial access with a significant projected impact on outcomes.

Access Measures: Pediatric Primary Care
• Data: National Provider Identifier (NPI), Medicaid claims, Census Bureau, Geographic Information System, among other sources
• Study Population: Children in 14 states
• Measurement Model: Matching patients to providers using optimization modeling, estimated at the census tract level
• Spatial Access Measures: Travel distance/time, Congestion & Coverage

Access Measures: Pediatric Primary Care
GEORGIA: Disparities across geography
[Maps: Congestion, Coverage]
Large discrepancies between urban and rural care. High congestion across the state except for some cities.
High coverage in broad regions surrounding the most populated cities. Low coverage in many rural areas.

GEORGIA: Disparities between Medicaid & non-Medicaid
[Maps: Congestion, Coverage]
Yellow and red regions indicate areas where the availability of services for non-Medicaid patients is superior to that for Medicaid patients, while blue regions indicate areas where the reverse is true.

Access ~ Outcomes: Pediatric Asthma
• Data: Healthcare Cost and Utilization Project (NC), DPH OASIS (GA)
• Study Population: Children ages 4-17 in GA & NC
• Geographic Access: travel distance between patients and matched providers, using optimization estimated at the census tract level
• Outcome measure: ED visit & hospitalization rates at the county level

Access ~ Outcomes: Pediatric Asthma
• Access is significant alone and in interactions with other factors
• Impact of access varies with geography
• Improving access is expected to reduce the occurrence of severe outcomes
[Figure: Predicted Reduction in Number of ED Visits in Georgia. x-axis: Reduction in Number of ED Visits (1 to 5, 5 to 10, 10 to 15, >15); y-axis: Number of County/Age Pairs; series: Specialist5, Specialist15, Specialist5:Primary15, Specialist15:Primary15]

Access ~ Outcomes: Cystic Fibrosis
• Data: clinical information derived from the Cystic Fibrosis Foundation Registry
• Study Population: 229,968 observations on 7,823 patients from 2002 to 2011
• Geographic Access: realized travel distance from the zip code of each patient to the care centers
• Outcome measure: %FEV1, a common outcome for research in cystic fibrosis

Access ~ Outcomes: Cystic Fibrosis
• Not a consistent relationship between geographic access and outcomes
• Access is significant for some age groups but not for all
• State-based analysis shows other factors impact outcome levels
[Figure: The distribution of %FEV1 & CF center locations in Georgia]

A policy intervention is a type of action that involves the design, revision, implementation or translation of a health policy for reducing costs and for improving health outcomes, healthcare access and quality. A network intervention is an action that alters an existing network of care, including networks consisting of medical facilities.

Policy Interventions
Improving Access to Pediatricians for Medicaid Population
• Considered policies that would:
▫ Improve patients’ mobility.
▫ Increase the percentage of physicians accepting any Medicaid patients.
▫ Increase the percentage of caseload physicians devote to Medicaid patients.
• Simulate policy change by altering inputs in the access measurement models.
• Evaluate impact at the state-wide and local level with respect to access and health outcomes.

Network Interventions
Research Question: How can the existing network be modified to meet specified goals?
Evaluation Criteria: Equity, Effectiveness, Efficiency
Interventions:
• Open new facilities
• Expand hours
• Mobile clinics, telemedicine

Cost-Effectiveness: Telemedicine
Telemedicine:
• Originated in the Netherlands in the early 1900s
• More than 100 definitions of telemedicine (WHO, 2010)
• Time Magazine has called telemedicine “healing by wire”
• Countries of implementation:
▫ Almost everywhere on the globe!
▫ Provider-driven implementations in the US (e.g., VHA)
• Example of an implementation:
▫ Tele-ophthalmology at VHA
▫ Cost-effectiveness: Diabetic Retinopathy Screening

Cost-Effectiveness: Telemedicine
• Step 1: Model individual disease progression
▫ Data: Sample of veterans with diabetes (Atlanta)
▫ Model: Markov Decision Model
• Step 2: Simulate individual disease profiles
▫ Data: VHA & general parameter input
▫ Model: Estimated Disease Progression Model
• Step 3: Compare traditional to telemedicine
▫ Simulate screening with and without telemedicine
▫ Utility measures: Quality Adjusted Life Years (QALY) vs. costs of the program

Cost-Effectiveness: Cost vs. QALY Ratio
[Figure: two panels of Average Cost/QALY, one against Patient Pool Size and one against Patient Age in Years, each marking the cost-effective region]
• Cost-effectiveness for pool sizes of 3500 or larger (~9000 VHA patients in Atlanta area)
• Cost-effectiveness for patients between the ages of 50-80

Our research spans multiple directions, including deriving a set of baseline measures for asthma care, linking access to outcomes, and identifying care pathways in utilization and cost. The end point is to design policy and network interventions to improve health outcomes and access with limited resources.

Baseline: Utilization, Cost & Treatment
Objective: Develop a set of baseline metrics for pediatric asthma to be used in designing and evaluating interventions to have the greatest impact with limited resources.
Pilot Study: Children with Medicaid insurance, ages 4-17, in Georgia, 2009

Baseline: Utilization Metrics by Race & Age
The Other population (mostly Hispanic) has the most visits per patient and the African American population has the most patients per 1000 children on Medicaid. By age, there are no differences in visits per patient, but the number of patients per 1000 children decreases with age.
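The two utilization metrics named above, visits per patient and patients per 1000 children on Medicaid, could be computed along the following lines. The slides do not give formulas, so the definitions used here are assumptions: visits per patient is taken as total visits divided by distinct patients with at least one visit, and patients per 1000 as distinct patients divided by enrolled children, times 1000. The input layout and the function name are hypothetical.

```python
from collections import defaultdict


def utilization_metrics(visits, enrolled_by_group):
    """Compute two baseline utilization metrics per group.

    visits: iterable of (patient_id, group) tuples, one per visit claim
            (hypothetical layout, not the MAX file format)
    enrolled_by_group: dict mapping group -> number of enrolled children
    Returns a dict mapping group -> {"visits_per_patient", "patients_per_1000"}.
    """
    visit_counts = defaultdict(int)
    patients = defaultdict(set)
    for patient_id, group in visits:
        visit_counts[group] += 1
        patients[group].add(patient_id)

    metrics = {}
    for group, enrolled in enrolled_by_group.items():
        n_patients = len(patients[group])
        metrics[group] = {
            # Assumed definition: total visits / distinct patients with a visit
            "visits_per_patient": (
                visit_counts[group] / n_patients if n_patients else 0.0
            ),
            # Assumed definition: distinct patients per 1000 enrolled children
            "patients_per_1000": (
                1000.0 * n_patients / enrolled if enrolled else 0.0
            ),
        }
    return metrics


# Made-up visit rows for illustration only.
demo_visits = [("p1", "G1"), ("p1", "G1"), ("p2", "G1"), ("p3", "G2")]
demo_enrolled = {"G1": 100, "G2": 50}
print(utilization_metrics(demo_visits, demo_enrolled))
```

Grouping by race or by age band only changes what is passed as the group label, so the same function would cover both breakdowns on the slide.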
Baseline: Cost Metrics by Race
The Other population has the highest charge per visit, followed by the African American population. Payment amount and prepaid value show no significant differences. The Other population has the highest charge per enrollee per month, followed by the African American population.

Baseline: Treatment Control Metrics
The African American population has a lower medication ratio than the other two populations, indicating a lower use of long-term controller medication. Fulton County and the surrounding areas have the lowest medication ratio in the state.

Baseline: Treatment Adherence Metrics

Care Pathways: Utilization & Cost
Objective: To identify underlying care pathways and to visualize the utilization relational system for pediatric asthma care in the Medicaid system using large patient-level claims data.
Pilot Study: Children with Medicaid insurance, ages 4-17, in Georgia, 2009

Care Pathways: Utilization
Care Pathways: Cost to MCOs
Care Pathways: Cost to the Medicaid System
Care Pathways: Cost & Utilization

Acknowledgements
Supporting Institutes and Organizations
- National Science Foundation (CAREER Award)
- Institute for People and Technology
- Children’s Healthcare of Atlanta
Research Team
IT Staff: Matthew Sanders and Paul Diederich
Postdoctoral fellow: Dr. Monica Gentili
Undergraduate students: Sarah Drath, Pravara Harati, Qiming Zhang, Sean Monahan
PhD Graduate students: Erin Garcia, Ross Hilton, Ben Johnson, Kevin Johnson, Zihao Li, Rodrique Ngueyep, Richard Zheng

Contact Us
• Nicoleta Serban: nserban@isye.gatech.edu or 404-385-7255
• Julie Swann: jswann@isye.gatech.edu or 404-385-3054