New Developments in Predictive Modeling
Jonathan Shreve, FSA, MAAA
Principal and Consulting Actuary, Milliman

Summary
• Overview
• Optimal Use of Risk Adjusters
• Lifestyle-Based Prediction

Predictive Modeling
• Methods to predict expected claims costs
• Uses historical data and calibrated models
• Many uses in the health insurance context:
  – Renewal underwriting
  – Cost impact modeling
  – Payment equalization
  – Care management

Health Insurance Market in the United States
• Individual
• Small group (2-50 employees)
• Medium group (51-500 employees)
• Large group (> 500 employees)

Risk Adjusters: Overview
• Risk adjusters measure morbidity
• Used for adjusting payments (Medicare), predictive modeling (small group rating), and medical management (disease management)
• Function of age, gender, and claim history (diagnoses and services, medical and/or Rx)
• Examples: ERG, ACG, DxCG, etc.

Risk Adjusters: Overview
• Claim detail is sorted and formatted
• Software assigns members to relatively broad diagnosis categories (e.g., Symmetry has 120 categories called Episode Risk Groups (ERGs))
• Output file (array) of 0s and 1s under each demographic category and each condition category for each member
• Regression to fit actual costs to the array of 0s and 1s
• Other risk adjusters

Risk Adjusters: Theoretical Value
[Figure: Cost curves. Expected points vs. years since diagnosis of condition (-3 to 6); series: Rating Points Assigned, Acute (appendicitis), Progressive (osteoarthritis).]

United States: Small Group Underwriting
• Small group rating
  – Health insurance coverage
  – Small group = 2 to 50 employees
  – Guaranteed issue
  – Limits on rate adjustments due to health status
  – Limits on rates offered to different groups

Introduction: Real World Considerations
• Delay between when rates are developed and the rating period
• Incomplete data (IBNR)
• Rating limits (total health status factor and changes)
• Turnover
• Competing against other carriers' new business methods, not their renewal methods
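The regression step in the risk adjuster overview can be sketched as ordinary least squares on the member-by-category array of 0s and 1s. This is a minimal illustration under stated assumptions, not any vendor's method. The three columns (one demographic flag and two condition flags), the toy costs, and the function names `fit_risk_weights` and `predict_costs` are hypothetical. Commercial tools use many more categories and their own calibration. `np.linalg.lstsq` is the standard NumPy least-squares solver.

```python
import numpy as np


def fit_risk_weights(indicators: np.ndarray, costs: np.ndarray) -> np.ndarray:
    """Least-squares fit of member costs on a 0/1 category array.

    indicators: members x categories array of 0s and 1s (hypothetical layout).
    costs: actual cost per member.
    Returns [intercept, weight_1, ..., weight_k].
    """
    design = np.column_stack([np.ones(len(costs)), indicators])
    weights, _, _, _ = np.linalg.lstsq(design, costs, rcond=None)
    return weights


def predict_costs(indicators: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Expected cost = intercept + sum of the weights of the flagged categories."""
    design = np.column_stack([np.ones(indicators.shape[0]), indicators])
    return design @ weights


# Hypothetical columns: [demographic flag, condition A, condition B]
members = np.array([
    [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1],
])
# Toy costs built from known weights so the fit can be checked exactly.
true_weights = np.array([1000.0, 500.0, 3000.0, 8000.0])
costs = predict_costs(members, true_weights)

weights = fit_risk_weights(members, costs)
```

With real claims, costs are noisy, so each fitted weight becomes an average incremental cost for its category. Dividing a member's predicted cost by the population average turns the prediction into a relative risk score.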
Introduction: Prior Studies
• Society of Actuaries Report (May 2002, Cumming et al.)
• Society of Actuaries Health Section Council Article (Aug 2003, Ellis, DxCGs)
• Society of Actuaries Report (Summer 2006)

Society of Actuaries: Assessment of Available Claims-Based Predictive Modeling/Risk Adjuster Tools
• Objective analysis of the predictive power of commercially available risk adjusters
• Updates the 2002 study
• Measures R², MAPE, and grouped statistics (including fit within disease category)

Society of Actuaries: Assessment of Available Claims-Based Predictive Modeling/Risk Adjuster Tools
Vendors/Products Included:
Company | Product
Ingenix | Episode Risk Groups (ERGs)
Ingenix | Pharmacy Risk Groups (PRGs)
Ingenix | Impact Pro
Johns Hopkins | Adjusted Clinical Groups (ACGs)
UCSD, Todd Gilmer | Medicaid Rx
MedAI | MedAI
DxCG | Diagnostic Cost Groups (DCGs)
DxCG | RxGroups
DxCG | Underwriting Models
3M | Clinical Risk Groups

Society of Actuaries: Assessment of Available Claims-Based Predictive Modeling/Risk Adjuster Tools
Biggest changes from prior study:
• New tools (e.g., MedAI)
• Improvement in tools
• Use of prior costs in some models
• Results with data lag

Publicly Available Risk Adjusters
• Medicaid Rx
• RxRisk
• CDPS
• Information from 2002 study: A Comparative Analysis of Claims-Based Methods of Health Risk Assessment for Commercial Populations, Cumming/Knutson/Cameron/Derrick
• Some restrictions on use may exist

Publicly Available Risk Adjusters
• Medicaid Rx
  – Pharmacy-based risk assessment model developed by Todd Gilmer and others at the University of California
  – Assigns each member to one or more of 45 condition categories based on prescription drugs used
  – Assigns each member to one of 11 age/gender categories
  – Predicts overall costs for each member
  – Includes separate sets of weights for adults and children

Publicly Available Risk Adjusters
• RxRisk
  – Pharmacy-based risk assessment model developed by Paul Fishman at Group Health Cooperative of Puget Sound
  – Assigns each member to one or more of 27 medical condition categories for adults, and up to 42 for children
  – Assigns members to one of 22 age/gender categories
  – Predicts total medical costs for each member

Publicly Available Risk Adjusters
• CDPS (www.medicine.ucsd.edu/fpm/cdps)
  – Diagnosis-based risk assessment model developed by Richard Kronick and others at the University of California
  – Originally intended for use with Medicaid, including disabled and Temporary Assistance for Needy Families (TANF) populations
  – Assigns members to up to 67 possible medical condition categories
  – Assigns members to one of 16 age/gender categories
  – Predicts total medical costs
  – Model contains different sets of weights for adults and children

Milliman Research: Optimal Renewal Guidelines
• Goal of research:
  – Understand current small group renewal practices
  – Identify optimal renewal methodologies

Introduction: Survey Results
• What methods are currently practiced to rate small groups at renewal?
  – Surveyed 21 carriers on small group methods
  – 30% of carriers used risk adjusters
  – 60% of groups

Introduction: Main Components
• Individualized data analysis
• Carrier analysis
• Competitive simulation

Introduction: Individualized Data
• Large multicarrier database used to review individual predictions
• Advantages
  – Large database
  – Good geographical representation
• Disadvantages
  – No group identifiers
  – Manual rate unavailable

Introduction: Carrier Data
• Advantages
  – Actual group data
  – Group manual rates available
• Disadvantages
  – Medium-sized data set
  – Geographical concentration
  – Biased

Models: Loss Ratio Model
• 1st renewal
$$\text{Future Claims} = \beta_0 \sum_{i=1}^{14} \alpha_i\,\text{Manual}_i + \beta_1\,\text{Experience}(\text{last } 9)$$
• 2nd renewal
$$\text{Future Claims} = \beta_0 \sum_{i=1}^{14} \alpha_i\,\text{Manual}_i + \beta_1\,\text{Experience}(\text{last } 12) + \beta_2\,\text{Experience}(\text{prior } 9)$$
NOTE: $\sum_i \beta_i = 1$

Models: Risk Adjuster Model
• 1st renewal
$$\text{Future Claims} = \beta_0 \sum_{i=1}^{14} \alpha_i\,\text{Manual}_i + \beta_1\,\text{Experience}(\text{last } 9) + \beta_2 \sum_{j=1}^{134} \gamma_j\,\text{ERGArray}_j$$
• 2nd renewal
$$\text{Future Claims} = \beta_0 \sum_{i=1}^{14} \alpha_i\,\text{Manual}_i + \beta_1\,\text{Experience}(\text{last } 12) + \beta_2\,\text{Experience}(\text{prior } 9) + \beta_3 \sum_{j=1}^{134} \gamma_j\,\text{ERGArray}_j$$
where $\sum_i \beta_i = 1$ and $\beta_i \ge 0$

Models: Service Category Model
• 1st renewal
$$\text{Future Claims} = \beta_0 \sum_{i=1}^{14} \alpha_i\,\text{Manual}_i + \beta_1\,\text{Inpatient} + \beta_2\,\text{Outpatient} + \beta_3\,\text{Rx}$$
where $\sum_i \beta_i = 1$ and $\beta_i \ge 0$

Results: Error Measures
• R-squared: % of variance from the mean explained by the rating variables
$$R^2 = 1 - \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$$
• MAPE: absolute error as % of total costs
$$\text{MAPE} = \frac{1}{n} \sum \frac{\lvert Y - \hat{Y} \rvert}{\bar{Y}}$$

Results: Theoretical
For a single member, uncapped:
Method | R-Square | MAPE (%)
Manual Rate | 5.70% | 101.00%
Traditional | 16.40% | 90.70%
Service Category | 22.60% | 84.10%
Risk Adjuster | 24.10% | 82.70%

Results: Error Calculation Example
• Small Group ABC
• Traditional prediction = 150% of manual
• Risk adjuster prediction = 125% of manual
• Actual claims equal 120% of manual
• Which method is better?
• Error / R-squared?
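The two error measures and the Group ABC question can be made concrete with a short sketch that follows the slide's definitions. The MAPE denominator is taken to be the mean actual cost. That is our reading of "absolute error as % of total costs", and it also avoids dividing by members with zero claims. The function names and the four-member data set are hypothetical.

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """R^2 = 1 - ESS/TSS, where ESS sums the squared prediction errors."""
    mean = sum(actual) / len(actual)
    ess = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    tss = sum((y - mean) ** 2 for y in actual)
    return 1.0 - ess / tss


def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute error expressed as a fraction of mean actual cost (assumed reading)."""
    n = len(actual)
    mean = sum(actual) / n
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n / mean


# Hypothetical annual costs for four members and one method's predictions.
actual = [0.0, 100.0, 200.0, 500.0]
predicted = [50.0, 100.0, 150.0, 450.0]
fit_r2 = r_squared(actual, predicted)  # 1 - 7,500 / 140,000
fit_mape = mape(actual, predicted)     # (150 / 4) / 200 = 0.1875

# Small Group ABC, all values as a percentage of manual.
actual_abc = 120.0
errors = {
    "traditional": abs(150.0 - actual_abc),
    "risk_adjuster": abs(125.0 - actual_abc),
}
better = min(errors, key=errors.get)
```

For a single group, the absolute error only shows which prediction landed closer this time: the risk adjuster, 5 points vs. 30. R² and MAPE become meaningful only when they are accumulated over many groups.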
Results: Credibility Weights
• 1st renewal, individual analysis
Methodology | Manual Rate | Loss Ratio | Risk Adjuster (ERG)
Loss Ratio | 73% | 27% | N/A
Risk Adjuster (ERG) | 11% | 14% | 75%
Service Category* | 56% | 44% | N/A
* Service category = 2% IP, 24% OP, and 18% Rx

Results: R-Square
• R-squared vs. rating caps (group size = 10)
[Figure: R-squared by rating cap (uncapped, 50%, 35%, 25%, 15%, 10%); series: Manual Rate, Traditional, Service Category, Risk Adjuster.]

Results: Mean Absolute Prediction Error (as %)
• MAPE vs. rating caps (group size = 10)
[Figure: MAPE by rating cap (uncapped, 50%, 35%, 25%, 15%, 10%); series: Manual Rate, Traditional, Service Category, Risk Adjuster.]

Results: Mean Absolute Prediction Error (as %)
• MAPE vs. group size (rating cap = 35%)
[Figure: MAPE by group size (1, 3, 10, 25, 50, 150); series: Manual Rate, Traditional, Service Category, Risk Adjuster.]

Results: Mean Absolute Prediction Error (as %)
• MAPE vs. group size (uncapped)
[Figure: MAPE by group size (1, 3, 10, 25, 50, 150); series: Manual Rate, Traditional, Service Category, Risk Adjuster.]

Results: Carrier Analysis
• Real groups
• Turnover
• Biased sample
• Traditional and risk adjuster results very similar!
• Health status correlation

Competitive Simulation: Introduction
• Based on carrier data
• Excel model, stochastic
• First renewal with 9 months of historic claims
• New business method accuracy simulated relative to renewal method accuracy (less accurate)
• New business quotes generated stochastically (Bayesian, from the renewal quote distribution) with some correlation among different carriers

Competitive Simulation: Results
• Small improvements in new business methods significantly increase profitability for new business and hurt profitability for renewal business
• Very sensitive to the point at which a group seeks new business quotes (try to keep your groups from getting quotes!)
• Number of competing quotes is important
• Accuracy and results are sensitive to the credibility of the risk adjuster and/or historic experience components

Research Conclusions
• The marginal value of improvements decreases as allowable rate variation decreases and as group size increases
• New business is less profitable than renewal business. Don't chase the wrong groups away.
• Competitive results are very sensitive to the accuracy of new business methods
• Credibility is affected by the accuracy/explanatory power of the manual rate and by the level of health status correlation

Recommendations
• Understand the effects of the rating environment
• Fundamentals (blocking and tackling)
• Objectively analyze which prediction method is right for you. Multiple methods may be most appropriate (by state, group size, costs, etc.).
• Use all relevant data/information on a group.
• Understand what your competitors are doing with new business.
• Assign credibility explicitly and carefully.
• Use a rigorous, systematic method to develop renewal quotes, with appropriate, efficient manual intervention.
• Capture all information on each renewal quote and what happens with the group. Analyze the data and modify your approach.

Lifestyle-Based Prediction
The US Surgeon General:
• 70% of the diseases and subsequent deaths in the U.S. are lifestyle-based
The Centers for Disease Control:
• Lifestyle-based chronic diseases account for 75% of the United States' $1.4 trillion medical care costs

Definition of Lifestyle Diseases
• Lifestyle diseases (also called diseases of longevity or diseases of civilization) are diseases that appear to increase in frequency as countries become more industrialized and people live longer. (WHO)
• A lifestyle disease is a disease associated with the way a person or group of people lives.
• Lifestyle diseases include atherosclerosis, heart disease, and stroke; obesity and type 2 diabetes; and diseases associated with smoking, alcohol, and drug abuse. Regular physical activity helps prevent obesity, heart disease, hypertension, diabetes, colon cancer, and premature mortality. (Stedman's Medical Dictionary)

Lifestyle-Based Diseases
• Lifestyle-based diseases/conditions
  – Diabetes
  – Hypertension
  – Cardiovascular
  – Stroke
  – COPD
  – Most cancers
  – Some mental health: depression, Alzheimer's, etc.
  – Others: osteoporosis, arthritis, back pain, etc.
  – Maternity

Lifestyle-Based Diseases
• Correlation between lifestyle and cancer (source: American Cancer Society 2004)
  – Diet: 35%
  – Smoking: 30%
  – Sexual behavior: 7%
  – Occupation: 4%
  – Alcohol: 3%
  – Sun radiation: 3%
  – Non-lifestyle: 18%

INTERHEART Study
• Over 90% of the risk of a heart attack (myocardial infarction) is attributed to lifestyle factors
  – Factors include abnormal lipids, smoking, hypertension, abdominal obesity, consumption of fruits and vegetables, alcohol, and regular physical activity
  – Family history, thought by many to be the major risk, accounts for only 1% of the population attributable risk

Lifestyle-Based Prediction (LBP)
• Most healthcare costs are driven by lifestyle choices
• Claims data does not reflect lifestyle
• How else can we gather this information?
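Population attributable risk, as cited for INTERHEART, depends both on how common a risk factor is and on how strongly it raises risk. That is how a factor believed to be major can still account for little of the population's risk. As a rough illustration only, the sketch below implements the textbook single-factor version (Levin's formula). INTERHEART derived its estimates from multivariable models, and the prevalence and relative-risk inputs here are hypothetical.

```python
def population_attributable_risk(prevalence: float, relative_risk: float) -> float:
    """Levin's formula: share of population cases attributable to one binary factor.

    prevalence: fraction of the population exposed to the factor (0 to 1).
    relative_risk: risk in the exposed divided by risk in the unexposed.
    """
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: a common factor with a modest effect vs. a rare but strong one.
common_modest = population_attributable_risk(prevalence=0.40, relative_risk=1.5)  # about 0.167
rare_strong = population_attributable_risk(prevalence=0.02, relative_risk=3.0)    # about 0.038
```

The strong but rare factor triples individual risk yet explains under 4% of cases. The common, modest factor explains about one case in six.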
Lifestyle-Based Prediction (LBP)
• Lifestyle-Based Prediction relies on the strong correlations that exist between lifestyle-based behaviors and diseases, in particular lifestyle-based diseases
• LBP shifts the focus of detection from poorly correlated medical events to highly correlated lifestyle behaviors

Challenges in Predictive Modeling
• Predictive models are only as good as the data that drive them
  – Challenge 1: New business
  – Challenge 2: High employee turnover
  – Challenge 3: Data consolidation
  – Challenge 4: Increase in lifestyle diseases

Development of Lifestyle-Based Prediction Models
• Over 700 fields of lifestyle-based data are appended to two data sets
  – Individuals with a disease state
  – Base group: an average representation of the group at large
• Clinical dataset development
• Various models are tested, including linear regression, logistic regression, CHAID analysis, discriminant analysis, Bayesian methods, and cluster analysis

Ties Between Lifestyles and Diseases
• Two types of statistical principles are used in LBP
  – Correlation: lifestyle-based behaviors that result in a higher propensity for an individual to have the disease
    • Obesity and latent lifestyle promote diabetes
  – Causality: lifestyle-based behaviors that exist or change as a result of the disease
    • Once diagnosed with diabetes, you become a diet food purchaser

Lifestyle-Based Prediction Example: Diabetes Profiling Example
Data Element | Employee A | Employee B | Diabetes Ratio A to B
Age | 40 | 40 | 1 to 1
Vehicle Type | MiniVan | MiniVan | 1 to 1
# of Children | 2 | 0 | 1 to 10
Outdoor Rec | 4 plus | No | 1 to 25
Education | College | Below HS | 1 to 40
Lifestyle Ind | MI7 | RE3 | 1 to 60
Hobbies | Active Outdoor | Reading | 1 to 80
….. | ….. | ….. | …..
….. | ….. | ….. | …..
Online Purchasing | Sporting Goods | Clothes | 1 to 110

Maternity Example
• Traditional maternity factors are based on age/sex/geographic/family enrollment
  – In fact, a simple Bayesian model using the number and ages of children can lift results by over 40%
  – Lifestyle-Based Prediction can dramatically improve accuracy by including the number and ages of children, financial indicators, household living parameters, etc.

Early Disease Detection Study (EDDS): Screening Data
• Over 100,000 patient screening records per condition
  – Abdominal Aortic Aneurysm (AA Screening)
  – Carotid Artery Disease (CA Ultrasound)
  – Congestive Heart Failure (Cardiac Echo)
  – Diabetes (Fasting Plasma Glucose)
  – Osteoporosis (Bone Densitometer)
  – Peripheral Arterial Disease (Ankle Brachial Index)

Early Disease Detection Study (EDDS): Health Information
• Health history
  – 45 personal health history elements
    • Medical histories: stroke, heart attack, CAD, etc.
    • Medical procedures: improve blood flow to heart or legs, prior screenings, medications, etc.
    • Medical symptoms: chest pain, loss of speech, blurred vision, etc.
  – 10 family history elements
    • Medical conditions
    • Medical procedures

Early Disease Detection Study (EDDS): Lifestyle Information
• Lifestyle elements
  – 8 exercise elements
    • How often do you exercise?
    • What types of exercise?
  – 5 tobacco elements
  – 8 nutritional elements
    • Caffeine intake
    • Calcium intake
    • Fast food intake
    • Food group intake

Early Disease Detection Study (EDDS): Results
• Predictive coefficients for the 21 lifestyle-based elements were relatively equal to those for the 55 health elements in all six cases
  – Minimum: Coronary Artery Disease
    • Lifestyle-based elements relatively equal to the health history elements on a stand-alone basis
  – Maximum: Osteoporosis
    • Lifestyle elements have twice the potential to affect the score compared to health history elements
  – Combining lifestyle with health elements increased health risk identification by over 45% (as measured by R-squared)

Currently in Place
• Applications and enrollment forms
  – Individuals and groups
• Family information
• Age, sex, and age differences in family members
• Employment
• Job description
• Height/weight
• Commute time
• Geography

HRAs and Other Surveys
• Excellent source for lifestyle-based data
• Several key problems
  – Expensive to administer (>$10/member)
  – Additional cost tied to participation incentives
  – Poor participation rates
  – Questionable results on the unhealthiest population
  – Timing issues for new business/members

Publicly Available Consumer Data: Who, What, Where & Why

Consumer Data in the United States
• The plethora of consumer data has dramatically changed the way we interact with consumers
• Consumer data measured in disk storage per person (DSPS)
  – 1985: 0.02 Mbytes/yr
  – 1995: 26 Mbytes/yr
  – 2005: 3,500 Mbytes/yr

Consumer Data – Why?
• Primarily used for marketing, customer service, and fraud purposes
• United States: Gramm-Leach-Bliley Act of 1999
  – Requires opt-out
  – "Permitted by law"
  – Joint marketing agreements

Consumer Data – Where?
• Government – Public Records
• Census
• Financial services
• Surveys
• Warranties
• Loyalty programs
• Internet purchases
• Subscriptions

Consumer Data – Who?
• 95% of U.S. households
• Historically: household-based
• Newest trend: individual-based
  – Observed
  – Implied

Consumer Data – What?
• Traditional demographics
  – Age, sex, race, etc.
• Financial
  – Homeowner, credit score, mortgage/auto/credit card balances, etc.
• Household
  – Marital status, number and ages of children, etc.

Consumer Data – What?
• Lifestyle-based elements
  – Physical activeness
    • Running, walking, cycling, aerobics, golf, tennis, etc.
  – Physical inactiveness
    • Television time, computer time, board games, stamp and coin collecting, etc.

Consumer Data – What?
• Lifestyle-based elements
  – Food purchases
    • Fast food, diet food, gourmet, vegetarian, etc.
    • Wine and other alcohol
  – Self improvement
    • Health fitness, dieting/weight loss, etc.
    • Mental wellness, personal improvement, etc.

Consumer Data – What?
• Lifestyle-based elements
  – Tobacco
  – Occupation
  – Travel
  – Motor vehicle type
  – Recreational vehicles
  – Other

The Expense of Consumer Data
• Medical data costs
  – MIB, Rx, historical medical, etc. start at about $10.00 per individual and go up
• Consumer data costs
  – Rapidly decreasing in price due to fierce competition
  – 5 years ago, 100 data elements cost $2.00/head
  – Today, over 500 data elements cost $0.25/head
  – The data needed for medical modeling costs about $0.10/head or less

Practical Applications:
• Individual
• Small group (2-50 employees)
• Medium group (51-500 employees)
• Large group (>500 employees)

Practical Applications: Tele-underwriting
• Determiner of the "at risk" population
  – Who to call
• Identifier of risk conditions
  – What questions to ask

Practical Application: Preferred Risk
• Determination of jet issue application
  – Clean application plus healthy score
• Determination of preferred status
  – Current techniques rely on a clean application plus "what"?
  – Lifestyle indicators provide the best "what"

Massive Consumer Database
• Over 55 million records in the US
  – Every US adult over the age of 50
  – Over 500 fields of lifestyle-based data
  – Updated monthly
  – Scored for marketing and health risk status monthly
• Looking at real-time hosted applications

Cancer Policy Example
• Model objective
  – Develop models to identify the riskiest cancer policies in terms of claims and to track the quality of the portfolio
  – Rank customers by their likelihood of having claims in the next 2.5 years
  – Use in conjunction with the underwriting rules to validate and improve the underwriting process

Risk Model: Logistic Regression
The risk model compared key demographic and lifestyle characteristics of policyholders or applicants who had claims in the performance window against those of people who did not. The rank and plot distributions of claims vs. non-claims were compared for each demographic attribute. Attributes that showed significantly different distributions or "trends" were selected for the logistic regression analysis.

The Key Drivers of the Application Risk Model:
• ISSUEAGE: customer age at the time of application
• CHILD: presence of children (yes/no)
• MARRITAL: marital status
• VEHREG: dominant vehicle lifestyle
• KID610: has kids between 6 and 10 years old
• VEHSUV: dominant vehicle lifestyle
• ADUL35: adult under age 35 in household
• ADUL65P: adult over age 65 in household
• RATIO1: weight/height for the first individual
• RATIO2: weight/height for the second individual

5% of the Customers Ranked by Scores Include 13% of Claims
[Figure: Cancer claims application risk model. Cumulative % of claims vs. cumulative % of policies ranked by score; model curve vs. random.]
Conclusion: the Lorenz curve shows that the application risk model rank-orders claim risk well.
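The ranking behind the Lorenz curve can be computed from any set of risk scores and claim flags. So can the hit rate that the Model Summary slide defines as the number of good policies sacrificed per bad policy stopped. The sketch below assumes that a higher score means higher risk. The function names `top_slice_stats` and `lorenz_points` and the ten-policy data are hypothetical, and the logistic regression that would produce real scores is not shown.

```python
def top_slice_stats(scores: list[float], had_claim: list[bool], top_fraction: float):
    """Share of all claims captured, and hit rate, within the highest-scoring policies.

    Policies are ranked by score, highest (riskiest) first. The hit rate is the
    number of non-claim policies in the slice per claim policy in the slice.
    Assumes at least one claim exists in the data.
    """
    ranked = sorted(zip(scores, had_claim), key=lambda pair: pair[0], reverse=True)
    k = round(top_fraction * len(ranked))
    claims_in_slice = sum(1 for _, claim in ranked[:k] if claim)
    captured = claims_in_slice / sum(had_claim)
    hit_rate = (k - claims_in_slice) / claims_in_slice if claims_in_slice else float("inf")
    return captured, hit_rate


def lorenz_points(scores: list[float], had_claim: list[bool]) -> list[tuple[float, float]]:
    """Cumulative (% of policies, % of claims) pairs for a gains/Lorenz chart."""
    ranked = sorted(zip(scores, had_claim), key=lambda pair: pair[0], reverse=True)
    total_claims = sum(had_claim)
    points, running = [(0.0, 0.0)], 0
    for i, (_, claim) in enumerate(ranked, start=1):
        running += claim
        points.append((100.0 * i / len(ranked), 100.0 * running / total_claims))
    return points


# Hypothetical scores (e.g., model probabilities) and observed claim flags.
scores = [0.91, 0.85, 0.72, 0.66, 0.60, 0.41, 0.33, 0.20, 0.15, 0.05]
claims = [True, True, False, False, True, False, False, False, True, False]
captured_20, hit_20 = top_slice_stats(scores, claims, 0.20)
captured_50, hit_50 = top_slice_stats(scores, claims, 0.50)
curve = lorenz_points(scores, claims)
```

Plotting `curve` against the diagonal (the "Random" line) reproduces the shape of the chart. In the cancer example, 5% of policies capturing 13% of claims corresponds to a lift of 2.6 over random selection.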
Model Summary
• By working on the top 20% of policies, we have the potential to cut 43% of claims, which represent 45% of dollar losses. The hit rate (number of good policies sacrificed per bad policy stopped) is 14 in the 2.5-year analysis window, and the model's lift yields a 117% gain in targeting.

Profit Impact Scenario
Cancer Model Application: Profit Impact Scenario Simulation
% of Policies Impacted | Model Window: Hit Rate | Model Window: Cum. Rate of Claim | Lifetime Window: Hit Rate | Lifetime Window: Cum. Rate of Claim | Expected Profit Savings ($mm)
5 | 11 | 8.4 | 4 | 21.0 | 6.3
10 | 13 | 7.3 | 4 | 18.3 | 9.1
15 | 13 | 7.0 | 5 | 17.5 | 12.3
20 | 14 | 6.9 | 5 | 17.3 | 15.8
25 | 15 | 6.4 | 5 | 16.0 | 15.9
30 | 16 | 5.9 | 6 | 14.8 | 14.5
35 | 17 | 5.5 | 6 | 13.8 | 12.6
40 | 18 | 5.3 | 7 | 13.3 | 11.9
45 | 19 | 5.0 | 7 | 12.5 | 9.3
50 | 20 | 4.8 | 7 | 12.0 | 7.2
55 | 21 | 4.6 | 8 | 11.5 | 4.5
60 | 22 | 4.4 | 8 | 11.0 | 1.2
65 | 22 | 4.3 | 8 | 10.8 | -0.7
70 | 23 | 4.1 | 9 | 10.3 | -5.0
75 | 25 | 3.9 | 9 | 9.8 | -10.0
80 | 26 | 3.7 | 10 | 9.3 | -15.6
85 | 27 | 3.5 | 10 | 8.8 | -21.8
90 | 28 | 3.4 | 11 | 8.5 | -25.9
95 | 29 | 3.3 | 11 | 8.3 | -30.3
100 | 31 | 3.2 | 12 | 8.0 | -35.0

Statistical Results
• Compared traditional underwriting and LBA scores to actual claims results
• LBA beat traditional underwriting in all statistical measures
  – Adjusted R-squared
  – Bias
  – MSE
  – MAD
  – AAD

Operational Overview - Individual
LBA to Traditional Underwriting - Individual
Category | LBA / TUW Ratio | Actual PMPM | Average Error (TUW) | Average Error (LBA)
Top 3% | 221% | $302.54 | $(76.55) | $(25.14)
Top 5% | 201% | $278.52 | $(69.68) | $(24.89)
Top 10% | 187% | $247.68 | $(61.08) | $(22.10)
Top 20% | 163% | $213.80 | $(48.25) | $(18.50)
Top 50% | 114% | $118.16 | $(14.25) | $(6.01)
Bottom 50% | 85% | $81.83 | $13.22 | $5.87
Bottom 20% | 72% | $68.84 | $18.22 | $8.50
Bottom 10% | 70% | $67.59 | $22.30 | $9.80
Bottom 5% | 68% | $64.18 | $23.30 | $11.02
Bottom 3% | 62% | $58.99 | $28.07 | $13.05
* Mean PMPM: $100.00

Conclusion
• Recognize that much of medical costs cannot be predicted by traditional methods
• Look for nontraditional data sources
• The real value of consumer data in the healthcare industry lies in its ability to predict lifestyle-based diseases.
• Whether used as an identifier of health risks or as an early predictor of a disease state, we see the use of Lifestyle-Based Analytics accelerating rapidly within the healthcare industry, and in particular the disease management industry.

Questions?
Jonathan Shreve, FSA, MAAA
Milliman
Jon.Shreve@Milliman.com
+ 001 303-299-9400