Quality Ratings and Premiums in the Medicare Advantage Market

Ian M. McCarthy∗
Department of Economics, Emory University

Michael Darden†
Department of Economics, Tulane University

January 2015

Abstract

We examine the response of Medicare Advantage contracts to published quality ratings. We identify the effect of star ratings on premiums using a regression discontinuity design that exploits plausibly random variation around rating thresholds. We find that 3, 3.5, and 4-star contracts in 2009 significantly increased their 2010 monthly premiums by $20 or more relative to contracts just below the respective threshold values. High quality contracts also disproportionately dropped $0 premium plans or expanded their offering of positive premium plans. Welfare results suggest that the estimated premium increases reduced consumer welfare by over $250 million among the affected beneficiaries.

JEL Classification: D21; D43; I11; C51
Keywords: Medicare Advantage, Premiums, Quality Ratings, Regression Discontinuity

∗ Emory University, Rich Memorial Building, Room 306, Atlanta, GA 30322. Email: ian.mccarthy@Emory.edu
† 206 Tilton Memorial Hall, Tulane University, New Orleans, LA 70115. E-mail: mdarden1@tulane.edu

1 Introduction

The role of Medicare Advantage (MA) plans in the provision of health insurance to Medicare beneficiaries has grown substantially.
Between 2003 and 2014, the share of Medicare eligible individuals in an MA health plan increased from 13.7% to 30%.1 To better inform enrollees of MA quality, in 2007, the Centers for Medicare and Medicaid Services (CMS) introduced a five-star rating system that provided a rating of one to five stars to each MA contract – a private organization that administers potentially many differentiated plans – in each of five quality domains.2 For the 2009 enrollment period, CMS began aggregating the domain-level quality scores into an overall star rating for each MA contract, such that each plan offered by a contract displayed the contract's overall quality star rating. Since 2012, contracts have been incentivized to earn high quality star ratings through star-dependent reimbursement and bonus schemes.

Early studies on the effects of the star rating program focus on the informational benefits to Medicare beneficiaries. To this end, the program has been found to have a relatively small positive effect on beneficiary choice, with heterogeneous effects across star ratings (Reid et al., 2013; Darden & McCarthy, forthcoming). However, one area thus far overlooked concerns the supply-side response to MA star ratings, where a natural consequence of the star rating program could be for contracts to adjust premiums and other plan characteristics in response to published quality ratings.3 Indeed, while the quality star program is often presented as a potential information shock to enrollees, the program could also serve as an information shock to health insurance contracts, better informing them of competitor quality and of their own quality signal to the market. For example, upon learning that its plans have the highest quality star rating in a market in 2009, a contract may choose to price out its quality advantage in 2010 by raising plan premiums.
Conversely, a relatively low-rated contract may lower its 2010 premium in response to its 2009 quality star rating. More generally, the extent to which policy may cause health insurance companies to adjust premiums is a central question in health and public economics.4

The current paper provides a comprehensive analysis of 2010 premium adjustments to the 2009 publication of MA contract quality stars. We investigate the specific mechanisms by which contracts can adjust their premiums in response to their quality ratings, and we calculate the corresponding welfare effects. We adopt a regression discontinuity (RD) design that exploits plausibly random variation around 2009 star thresholds, allowing us to separately identify the effect of reported quality on price from the overall relationship between quality and price. Our data on contract/plan market shares, reported contract quality, plan premiums, and other plan characteristics come from several publicly available sources.

Our results suggest strong premium adjustments following the 2009 star rating program, with average to above-average star-rated contracts significantly increasing premiums from 2009 to 2010. When we conduct our analysis at the contract level, we find that 3, 3.5, and 4-star contracts increase their average premiums across existing plans by $33.60, $29.30, and $31.85, respectively, relative to contracts with 2009 ratings just below the respective threshold values.

1 Kaiser Family Foundation MA Update, available at http://kff.org/medicare/fact-sheet/medicare-advantage-factsheet/.
2 For example, one domain on which contracts were rated was "Helping You Stay Healthy."
3 Preliminary evidence of a supply-side response to the publication of MA quality stars was found in Darden & McCarthy (forthcoming), albeit with a restricted sample of contract/plan/county/year observations.
4 For example, see Pauly et al. (2014) on the effects of the Affordable Care Act on individual insurance premiums.
At the plan level, we estimate mean increases of $19.40, $41.99, and $31.52 for 3, 3.5, and 4-star contract/plans, respectively. These effects are sizable compared to overall average premium increases of between $9 and $15. The results are also broadly consistent across a range of sensitivity analyses, including consideration of alternative bandwidths, falsification tests with counterfactual threshold values, and the exclusion of market-level covariates.

While an MA contract may directly adjust its plans' premiums in response to quality stars, the contract may also adjust the mix of plans it offers within a market (county). For example, in response to the published star ratings, a contract could alter the number of zero-premium plans; adjust the number of plans that include Medicare Part D coverage; change the drug deductible in plans that offer Part D coverage; or add/drop plans entirely. Indeed, our data show that nearly all of the regional variation in plan premiums is due to contracts' selection of plan offerings, as opposed to contracts charging different premiums in different areas of the country. We find that contracts just above the 3 and 3.5-star thresholds in 2009 are more likely to drop $0 premium plans in 2010, with 3.5-star contracts also more likely to introduce positive premium plans into new markets. We find no such disproportionate change in $0 or positive premium plans among contracts with a 4-star rating in 2009. Meanwhile, low quality contracts (those just above the 2.5-star threshold in 2009) maintain their 2009 plan offerings at largely the same premium levels in 2010, while contracts just below the 2.5-star threshold in 2009 are much more likely to exit the market altogether in 2010. Overall, our results suggest that the star rating program in 2009 may have caused low quality contracts to drop plans while generating large premium increases among contracts receiving 3-star ratings and above.
Adopting the consumer welfare calculations used in Town & Liu (2003) and Maruyama (2011), our estimated increases in premiums imply a reduction in consumer surplus of over $250 million among those beneficiaries enrolled in the relevant plans. To the extent that higher quality plans are replacing low quality plans at reasonable premium levels, plan entry and exit behaviors induced by the star rating program may partially offset this welfare loss; however, given the number of new plans estimated to have entered the market due to the star ratings, such offsets are likely relatively small (Maruyama, 2011).

In what follows, we discuss the institutional details of Medicare Advantage and the recent star rating program in Section 2. The data and methods are discussed in Sections 3 and 4, respectively. We present our results in Section 5, with a series of robustness checks discussed in Section 6. Section 7 examines the potential mechanisms underlying our estimated premium adjustments, and Section 8 summarizes the welfare effects associated with our estimated premium increases. The final section concludes.

2 Institutional Background

Since Medicare's inception, beneficiaries have had the option to receive benefits through private health insurance plans. The Balanced Budget Act of 1997 (BBA) classified all private Medicare health insurance plans as Medicare Part C plans, and it allowed for additional types of business models, including Preferred Provider Organizations (PPOs), Provider-Sponsored Organizations (PSOs), Private Fee-for-Service (PFFS) plans, and Medical Savings Accounts (MSAs). Later, the Medicare Modernization Act of 2003, in addition to establishing the beneficiary entitlement to prescription drug coverage, renamed Medicare Part C plans as Medicare Advantage (MA) plans. In each year since 2003, Medicare beneficiaries choose to enroll in traditional fee-for-service (FFS) Medicare or an MA plan during an open enrollment period from November 1st through December 31st.
Enrollees in an MA plan must pay the Medicare Part B premium in addition to any premium charged by the plan. In exchange, MA plans provide at least (and often more than) the services covered by traditional FFS Medicare. In 2009, 38% of MA plans charged no additional premium, while 77% of plans also offered prescription drug coverage. Given the generosity of plan coverage at possibly no additional cost relative to traditional Medicare FFS, the MA market has grown dramatically in recent years, with the share of Medicare eligible individuals in an MA plan increasing from 13.7% in 2003 to 30% in 2014.5

Broadly, an MA contract is an agreement between a private insurance company and CMS whereby the company agrees to insure Medicare beneficiaries in exchange for reimbursement. A contract is approved by CMS to operate in specific counties, and an approved contract typically offers a menu of MA plans that are differentiated by premium, prescription drug coverage, and, if covered, the prescription drug deductible. Most MA contracts are required to offer at least one plan that includes prescription drug coverage. For the 2015 enrollment year, 78% of all Medicare beneficiaries live in a county with access to at least one plan that offers prescription drug coverage (MA-PD) and charges no additional premium (above the Part B premium).6 In 2009, the mean number of MA plans available to beneficiaries was roughly 11 plans per county.7 However, there exists considerable regional variation in the availability of MA plans, and enrollments in MA plans are concentrated in a few national contracts.

5 Kaiser Family Foundation MA Update, available at http://kff.org/medicare/fact-sheet/medicare-advantage-factsheet/.
6 http://kff.org/medicare/issue-brief/medicare-advantage-2015-data-spotlight-overview-of-plan-changes/
Indeed, according to the Kaiser Family Foundation (KFF), 60% of all plans offered in 2015 are affiliated with just seven health insurance companies.8

Starting in the 2007 enrollment year, CMS began collecting and distributing a one- to five-star quality rating in each of five quality domains (e.g., "Helping You Stay Healthy"). Each domain was itself an aggregation of many individual quality metrics, such as the percentage of enrollees with access to an annual flu vaccine. These individual quality metrics are calculated based on data from a variety of sources, including the Healthcare Effectiveness Data and Information Set (HEDIS), the Consumer Assessment of Healthcare Providers and Systems (CAHPS), the Health Outcomes Survey (HOS), the Independent Review Entity (IRE), the Complaints Tracking Module (CTM), and CMS administrative data. Starting in enrollment year 2009, CMS began aggregating the domain-level quality stars to an overall contract rating of between one and five stars (in half-star increments).9 Since 2011, CMS has constructed the contract-specific quality ratings as a function of Part D coverage, when relevant. Our focus is on the 2009 and 2010 enrollment years, the first two years of the overall contract star rating program and the years in which all contracts, including those offering prescription drug coverage, were rated based on the same underlying quality metrics.

The literature on the MA quality rating initiatives has generally focused on enrollment effects. Recently, Reid et al. (2013) find large effects of increases in star ratings on enrollment that are homogeneous across the reported quality distribution, but results from that paper fail to disentangle the effects of quality from those of quality reporting on enrollment. Attempting to disentangle these effects, Darden & McCarthy (forthcoming) find heterogeneous effects of the quality star rating program on MA plan enrollment in 2009 and no significant effect in 2010.
At the plan level, they find that a marginally higher rated contract at the lower end of the quality distribution (e.g., a 3-star as compared to a 2.5-star contract) realized a positive and significant enrollment effect equal to 4.75 percentage points relative to traditional FFS Medicare in 2009 enrollments. This effect diminishes for higher rated contracts and vanishes for the 2010 enrollment year. The lack of an enrollment response to 2010 quality stars suggests that the 2009 star ratings may have acted as a one-time informational event, or that there was a supply-side response in 2010 based on the 2009 ratings.

Generally, the potential for supply-side responses to Medicare Advantage policy has received little attention from researchers. One recent exception is Stockley et al. (2014), who examine how MA plan premiums and benefits respond to variation in the benchmark payment rate, the subsidy received by the MA contract for each enrollee. Those authors find that contracts do not adjust premiums directly as a result of changes in benchmark payment rates, but rather adjust the generosity of plan benefits in response. Conversely, Darden & McCarthy (forthcoming) find that contract/plans in 2010 raise premiums in response to higher 2009 contract-level quality star ratings. However, the sample used to estimate the supply-side response of contracts in 2010 was restricted to just those contract/plans with (a) 10 or more enrollees in both 2009 and 2010 and (b) non-missing quality ratings in 2010.

7 Author's calculation. See Section 3 for a presentation of our data.
8 See http://kff.org/medicare/issue-brief/medicare-advantage-2015-data-spotlight-overview-of-plan-changes/.
9 For a complete discussion of the star rating program, see Darden & McCarthy (forthcoming).
Furthermore, that paper focuses only on direct premium increases, ignoring the possibility of indirect premium adjustments such as changing the number of zero-premium plans or adjusting the plan mix within a county. The current paper provides a comprehensive examination of the supply-side response to quality star ratings, examining the full population of approved MA contracts to evaluate several potential response mechanisms as well as potential welfare consequences.

3 Data

We collect data on market shares, contract/plan characteristics, and market area characteristics from several publicly available sources for calendar years 2009 and 2010.10 As a base, we use the Medicare Service Area files to form a census of MA contracts that were approved to operate in each county in the United States in 2009 and 2010. To these contract/county/year observations, we merge contract/plan/county/year data on enrollment and other contract characteristics.11 To our market share data, we merge further information on MA contract quality ratings, contract/plan premiums, county-level MA market share, CMS benchmark rates, fee-for-service costs, hospital discharges, and census data. The CMS quality information includes an overall summary star measure; star ratings for different domains of quality (e.g., helping you stay healthy); as well as star ratings and continuous summary scores for each individual metric (e.g., the percentage of women receiving breast cancer screening and an associated star rating). Data are not available for the overall continuous summary score (i.e., the score rounded to generate an overall star rating), but we are able to replicate this variable by aggregating the specific quality measures following CMS instructions. We explain this process thoroughly in Appendix B. Hospital discharge data are from the annual Hospital Cost Reporting Information System (HCRIS), and CMS benchmark rates and average FFS costs by county are publicly available from CMS.
Finally, county-level demographic and socioeconomic information are from the American Community Survey (ACS).

10 See Appendix C for a detailed discussion of our dataset and specific links.
11 CMS suppresses enrollment counts for contract/plans with 10 or fewer enrollees, but we keep these observations and impute enrollment. The Service Area files are needed because the enrollment files do not account for migration. For example, it is possible for the enrollment file to contain a positive enrollment record for a contract/plan in a county even if that contract is not approved to operate in the county. See Appendix C for further details.

Our enrollment data are available monthly; however, there is little variation in enrollment across months due to the nature of the open enrollment process at the end of each calendar year. Furthermore, all other variables of interest are specific to a calendar year. We therefore take the average enrollment of each plan across months in a given year. The resulting unit of observation is the contract/plan/county/year. Our analysis focuses only on health maintenance organization (HMO), local and regional preferred provider organization (PPO), and private fee-for-service (PFFS) contracts. We exclude all special needs plans and employer/union-specific plans (also known as 800-series plans), and we drop all observations that pertain to United States Territories and Outlying Areas. Our final sample includes 247,978 contract/plan/county/years. Table 1 provides summary statistics for our final dataset at the plan, county, and contract level.
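The collapse of monthly enrollment records to one annual mean per contract/plan/county/year can be sketched as follows. This is an illustrative implementation only; the record layout (a tuple of contract, plan, county, year, and enrollment) is a hypothetical stand-in for the actual CMS file structure.

```python
from collections import defaultdict

def annual_mean_enrollment(monthly_records):
    """Collapse monthly enrollment rows to one mean per contract/plan/county/year.

    `monthly_records` is an iterable of (contract, plan, county, year, enrollment)
    tuples; this key structure is a hypothetical stand-in for the CMS files.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, months observed]
    for contract, plan, county, year, enrollment in monthly_records:
        acc = totals[(contract, plan, county, year)]
        acc[0] += enrollment
        acc[1] += 1
    # Average over however many months the plan appears in the data.
    return {key: s / n for key, (s, n) in totals.items()}

# Hypothetical plan observed for three months:
rows = [("H123", "001", "13121", 2009, e) for e in (280, 290, 300)]
print(annual_mean_enrollment(rows)[("H123", "001", "13121", 2009)])  # 290.0
```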
The data consist of 51,442 and 34,642 plan/county observations in 2009 and 2010, respectively, with an increase in average MA enrollment per plan from 292 in 2009 to 361 in 2010.12 The county-level summary statistics also reveal an increasing penetration of MA in the overall Medicare market, from 15.6% in 2009 to 16.5% in 2010, alongside a decrease in the number of plans offered per county, an increase of just over $15 in average premiums, an increase in the percentage of plans offering prescription drug coverage, and an increase in the proportion of HMO and PPO plans relative to PFFS plans. Finally, the bottom panel of Table 1 illustrates a slight rightward shift in the distribution of star ratings from 2009 to 2010, with 1.5-star contracts either improving in rating in 2010 or exiting the market, and with a relative increase in the percentage of 4.5 and 5-star contracts in 2010.

Table 1

12 As indicated in Table 1, enrollment data are not available for all plans as CMS does not provide enrollment counts for plans with 10 or fewer enrollments. As such, the mean enrollment figures presented are higher than the true mean as they exclude a large number of plans with missing enrollment data.

4 Methodology

Since star ratings are assigned to contracts (rather than specific plans operating within a contract), our initial analysis follows Town & Liu (2003), Cawley et al. (2005), Dafny & Dranove (2008), Frakt et al. (2012), and others in aggregating plan characteristics to the contract level by taking the mean values across plans within a contract (in the same county). We then examine the relationship between a contract's quality star rating in 2009 and the contract's premiums in 2010. Denoting the vector of mean characteristics in market m (county) for contract c by ȳcm = (ȳcm,1, ..., ȳcm,K), we specify the
mean characteristic k for contract c as follows:

ȳcmk = f(qc, Xcm, Wm) + εcmk,   (1)

where qc denotes the contract's star rating in 2009, Xcm denotes other contract characteristics, Wm denotes 2010 market-level data on the age, race, and education profile of a given county, and εcmk is an error term independently distributed across characteristics and markets.13 Given our focus on premiums, our plan characteristics of interest consist of the average premium and the proportion of the contract's plans (in the same county) charging a $0 premium.14

The CMS quality rating system relies on a continuous summary score between 1 and 5 which is rounded to the nearest half. A contract with a 2.24 summary score is therefore rounded down to a 2-star rating, while a contract with a 2.26 summary score is rounded up to a 2.5-star rating. Intuitively, these two contracts are essentially identical in quality but received different quality ratings. We propose to exploit the nature of this rating system using a regression discontinuity (RD) design.15 More formally, denote by Rc the underlying summary score, by R̂ the threshold summary score at which a new star rating is achieved (e.g., R̂ = 2.25 when considering the 2.5-star rating), and by R̃c = Rc − R̂ the amount of improvement necessary to achieve an incremental improvement in rating. We then limit our analysis to contracts with summary scores within a pre-specified bandwidth, h, around each respective threshold value, R̂. For example, to analyze the impact of improving from 2.0 to 2.5 stars, the sample is restricted to contracts with summary scores of 2.25 ± h. To implement our approach, we specify plan/contract quality as follows:

qc = γ1 + γ2 × I(Rc > R̂) + γ3 × R̃c + γ4 × I(Rc > R̂) × R̃c,   (2)

where γ2 is the main parameter of interest.
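The rounding rule and the RD specification above can be sketched on simulated data. The sketch below rounds summary scores to half-star ratings, then recovers γ2 by least squares within the bandwidth around R̂ = 2.25. The sample size, noise level, and the assumed $20 jump are illustrative choices for this example, not values from the paper, and the tie-breaking rule at exact thresholds is an assumption.

```python
import math
import numpy as np

def star_rating(score):
    # Round a continuous 1-5 summary score to the nearest half star.
    # Ties are assumed to round up; CMS documentation governs the actual rule.
    return math.floor(2 * score + 0.5) / 2

print(star_rating(2.24), star_rating(2.26))  # 2.0 2.5: near-identical quality, different ratings

rng = np.random.default_rng(0)

# Simulated contracts within the bandwidth h = 0.125 around R-hat = 2.25,
# with an assumed $20 jump in the outcome for contracts above the threshold.
n, r_hat, h = 2000, 2.25, 0.125
R = rng.uniform(r_hat - h, r_hat + h, n)          # summary scores Rc
r_tilde = R - r_hat                               # running variable R-tilde
above = (R > r_hat).astype(float)                 # I(Rc > R-hat)
y = 10 + 20 * above + 5 * r_tilde + rng.normal(0, 2, n)  # outcome, e.g. premium change

# Design matrix matching equation (2): [1, I, R-tilde, I * R-tilde].
X = np.column_stack([np.ones(n), above, r_tilde, above * r_tilde])
gamma, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(gamma[1], 1))  # gamma_2: estimated jump at the threshold, close to the assumed 20
```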
Incorporating this RD framework into equation (1), and adopting a linear functional form for f(·), yields the final regression equation

ȳcmk = γ1 + γ2 × I(Rc > R̂) + γ3 × R̃c + γ4 × I(Rc > R̂) × R̃c + βc Xcm + βm Wm + εcmk,   (3)

where Wm and Xcm are as discussed previously. Our baseline analysis estimates equation (3) using ordinary least squares with a bandwidth of h = 0.125. We consider alternative bandwidths in Section 6, as well as a more traditional RD design with a triangular kernel (Imbens & Lemieux, 2008).

13 We cluster standard errors by contract; however, the results are qualitatively unchanged when clustering standard errors at the county level.
14 The overall plan type (e.g., HMO versus PPO) is typically contract-specific and therefore does not vary across plans within the same contract.
15 See Imbens & Lemieux (2008) for a detailed discussion of the RD design and its application in economics.

Changes in mean premiums at the contract level can arise in several ways, most directly via changes to premiums among specific plans. To investigate this possibility, we also estimate a regression of 2010 plan premiums as a function of the plans' 2009 premiums, 2009 star ratings and other contract-level variables, and 2009 county characteristics. This analysis is akin to estimating equation (3), but at the plan level rather than aggregated to the contract level. For this analysis, we examine only plans that were available in the same county in both 2009 and 2010.

5 Results

5.1 Average Premiums at the Contract Level

Table 2 presents the results of a standard OLS regression of mean contract characteristics in 2010 on the 2009 mean value, the contract's 2009 star rating, and additional county- and contract-level covariates. To the extent that contract quality is already reflected in the contract's mean plan characteristics, we would expect the effects of increasing star ratings to be relatively small in magnitude.
This is the case in Table 2, where we see small decreases in average premiums among 2.5 and 4-star contracts and small increases in premiums among 3 and 3.5-star contracts (relative to contracts with one-half star lower ratings). Note that, in order to better reflect the premium charged to a given enrollee in a specific contract, our analysis of average premiums at the contract level excludes plans with 10 or fewer enrollments.16 Our analysis at the plan level makes no such exclusion.

Table 2

The OLS results say little about the specific effects of an increase in reported quality on premiums. To address this question directly, Table 3 presents the initial RD results at the contract level for a bandwidth of h = 0.125. The results suggest a large premium increase for contracts receiving a 3, 3.5, or 4-star rating in 2009, with these contracts increasing average premiums by between $29 and $34 per month from their 2009 levels relative to contracts with one-half star lower ratings. By contrast, contracts receiving a 2.5-star rating showed no statistically significant increase in premiums. By virtue of the RD design and the nature of the CMS star rating program, we argue that these estimates can be interpreted as the causal effect of a one-half star increase in quality ratings separate from the quality of the contract itself. For example, 3.5-star contracts of comparable "true" quality to 3-star contracts were able to increase their premiums by an average of $29 per month.

16 Not surprisingly, low star-rated plans with 10 or fewer enrollments also charge much higher premiums relative to the same quality plans with higher enrollments. For example, in 2010, the average premium among 2.5-star plans with 10 or fewer enrollments was $63, compared to just $32 among 2.5-star plans with 11 or more enrollments. The results are nonetheless consistent when we include all plans and an indicator variable for missing enrollment data.
Looking purely at sample averages, all other contracts receiving a 3.5-star rating in 2009 increased their premiums by an average of $12, while 3-star contracts falling just below the 3.25 threshold increased their premiums by just over $3. We provide extensive robustness and sensitivity analyses for these results in Section 6.

Table 3

5.2 Premiums at the Plan Level

Table 4 summarizes the RD results for 2010 plan premiums as a function of 2009 premiums, county-level covariates, and the contract's quality rating as specified in equation (2). This analysis therefore estimates premium changes at the plan level (for the same plans offered in both 2009 and 2010), rather than analyzing average premiums at the contract level as in Table 3. For the same plan/county/contract, the results again show a large and statistically significant increase in premiums for 3, 3.5, and 4-star contracts, with premiums increasing by between $19 and $42 per month for the same plans.

Table 4

6 Robustness and Sensitivity Analysis

The appropriateness of our proposed RD design depends critically on whether contracts can manipulate their summary scores around the rating thresholds. Intuitively, such manipulation is unlikely because the star ratings are calculated from data collected two or three years prior to the current enrollment period; contracts therefore have little opportunity to adjust the underlying quality metrics in response to the thresholds for a given enrollment year. To test this formally, we apply the test proposed by McCrary (2008) for a discontinuity in the distribution of summary scores around the threshold values. The resulting t-statistics range from 0.15 to 0.96, suggesting no evidence of a discontinuity in the running variable at any of the threshold values.
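The logic of the density check can be illustrated with a deliberately simplified version: compare the counts of contracts just below and just above a threshold, which should be balanced absent manipulation. This binned comparison is only a crude stand-in for the local-linear density estimator that McCrary (2008) actually proposes, and the simulated scores are illustrative.

```python
import math
import random

def density_jump_z(scores, threshold, h):
    """Crude bunching check: compare counts in (threshold - h, threshold] versus
    (threshold, threshold + h]. Under no manipulation, each observation in the
    window falls on either side with probability 0.5, so (above - below)/sqrt(n)
    is approximately standard normal. A simplified stand-in for McCrary (2008)."""
    below = sum(1 for s in scores if threshold - h < s <= threshold)
    above = sum(1 for s in scores if threshold < s <= threshold + h)
    n = below + above
    if n == 0:
        return 0.0
    return (above - below) / math.sqrt(n)

random.seed(1)
scores = [random.uniform(2.0, 2.5) for _ in range(10000)]  # smooth running variable
z = density_jump_z(scores, 2.25, 0.125)
print(abs(z) < 1.96)  # with no bunching, the z-statistic is typically insignificant
```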
In the remainder of this section, we investigate the sensitivity of our results along several other dimensions: 1) bandwidth selection; 2) inclusion of covariates; and 3) falsification tests with counterfactual threshold values.

6.1 Choice of Bandwidth

The choice of bandwidth is a common area of concern in the RD literature (Imbens & Lemieux, 2008; Lee & Lemieux, 2010). To assess the sensitivity of our results to the choice of bandwidth, we replicated the local linear regression analysis from Tables 3 and 4 for alternative bandwidths ranging from 0.1 to 0.25 in increments of 0.005. The results for mean plan premiums at the contract level (Table 3) are illustrated in Figure 1, where each graph presents the estimated star-rating coefficient, γ̂2, along with the upper and lower 95% confidence bounds. Similar results for plan-level premium adjustments are presented in Figure 2. In general, our results are consistent across a range of alternative bandwidths.

Figure 1

6.2 Inclusion of Covariates

The RD literature generally advises against including covariates in a standard RD design (Imbens & Lemieux, 2008; Lee & Lemieux, 2010). The intuition for this advice is as follows: if treatment assignment is random within the relevant bandwidth, then the covariates should also be randomly assigned to the treated and control groups. However, in our setting, purely randomized quality scores at the contract level would not necessarily imply randomization in county-level variables. As such, we argue that county-level covariates belong in our analysis in order to control for geographic variation influencing contract location and plan offerings. Nonetheless, we assess the sensitivity of our analysis to the exclusion of these covariates by estimating a more traditional RD model with only the right-hand side variables presented in equation (2). We estimate the effect of a one-half star increase in quality ratings with a triangular kernel and our preferred bandwidth of h = 0.125.
The results, summarized in Table 8, are generally consistent with our initial findings in Tables 3 and 4, where we again see large increases in average premiums among 3, 3.5, and 4-star contracts relative to contracts just below the respective star-rating thresholds. One exception is the estimated effect on individual plan premiums for 4-star versus 3.5-star contracts presented in the bottom right of Table 8. In this case, unlike the estimates in Table 4, we find no significant increase in premiums among 4-star contracts, along with a reduction in the magnitude of the estimated effect. This is perhaps not surprising given the location of higher rated contracts throughout the country, where 4-star contracts are more concentrated in specific geographic areas relative to lower star-rated contracts.

Table 8

6.3 Falsification Tests

Finally, it is possible that the observed jumps at threshold values of 2.25, 2.75, etc. are driven by specific contracts that happen to fall above or below the threshold rather than by the star rating system itself. As a test, we therefore considered a series of counterfactual threshold values above and below the true threshold values. Intuitively, we should not see any jumps in premiums around these counterfactual thresholds. Figure 3 presents the results of this analysis for mean premiums at the contract/county level, where we estimated the effects just as we did for Figure 1 and Table 3. The results support 2.75 and 3.25 as the true threshold values, with the largest premium increases occurring just above those thresholds. The results for the 2.25 and 3.75 thresholds are less conclusive, with apparent jumps in premiums at what should be irrelevant thresholds such as 1.9, 3.65, and 3.85.

Figure 3

7 Mechanisms for Premium Adjustment

Comparing our contract-level (Table 3) and plan-level (Table 4) analyses, we see larger premium increases at the plan level for 3.5-star contracts and smaller increases at the plan level for 3-star contracts.
These results suggest that increases in average premiums at the contract level do not arise solely from increases in premiums of the same plans from 2009 to 2010. Rather, the results suggest that contracts also alter their plan mix from one year to the next (e.g., dropping plans within a contract, introducing new plans under the same contract, or expanding plans to new counties). Table 5 summarizes exit behaviors from 2009 to 2010 by star rating, where we see that low quality plans were significantly more likely to exit their respective markets than plans associated with higher star ratings. In particular, we see almost all 1.5-star plans leave the market from 2009 to 2010, with very little exit among 4 and 4.5-star plans.17 Regarding plan entry, Table 5 shows that, of the contracts receiving a 1.5-star rating in 2009 that still operate in 2010, 37% of the underlying plans entered into a new county in 2010. Similarly, 55% of 2-star plans (in 2009) entered into a new county in 2010, while higher rated contracts were relatively less likely to enter new markets. Collectively, the exit and entry figures reflect greater turnover in plan offerings among lower rated contracts relative to higher rated contracts. This is perhaps expected, as higher rated contracts may be more deliberate in their market entry/exit decisions and less likely to quickly cycle through new plans from one year to the next.

17 The 1.5-star contracts that stayed in the market from 2009 to 2010 also had a marginally higher star rating in 2010. As such, there are no 1.5-star contracts remaining in 2010 (see Table 1).

Table 5

7.1 Analysis of Plan Exit

To examine plan exit more directly, we follow Bresnahan & Reiss (1991), Cawley et al. (2005), Abraham et al. (2007), and others in assuming that an insurance company will only offer a plan in a given county if the plan positively contributes to the contract's profit.
Assuming profit is additively separable across geographic markets (counties), our observed plan choice indicator becomes:

    y_{c(j)m} = \begin{cases} 1 & \text{if } \pi_{c(j)m} = g(W_m, X_{c(j)m}) + \varepsilon_{c(j)m} \geq 0 \\ 0 & \text{if } \pi_{c(j)m} < 0 \end{cases}    (4)

where W_m again denotes county-level demographics, X_{c(j)m} denotes contract and plan characteristics (including the contract's 2009 quality, q_c, plan premium, Part D participation, etc.), and \varepsilon_{c(j)m} is an error term independently distributed across markets and plans. We adopt a reduced form, linear profit specification with covariates including the benchmark CMS payment rates, 2009 contract quality (q_c), the plan's enrollments in 2009, the number of physicians in the county, the average Medicare FFS cost per beneficiary in the county, and plan characteristics such as premiums, whether the plan offers prescription drug coverage, and indicators for HMO or PPO plan type. Within this specification, we also consider the RD design from equation 2. We estimate equation 4 with a linear probability model where y_{c(j)m} = 1 indicates that the contract continued to offer the plan in 2010 and y_{c(j)m} = 0 indicates the plan was dropped. By definition, this analysis is based on existing plans as of 2009.

The results of our RD analysis of plan exit are summarized in Table 6. The top panel presents results for all plans, while the remaining panels present results for plans with $0 premiums and plans with positive premiums, respectively. Overall, we see that 2.5-star contracts are significantly less likely to exit markets than 2-star contracts of similar overall quality. Relative to 2.5-star contracts, 3-star contracts show no significant differences in exit behaviors, but they are significantly more likely to drop their $0 premium plans and less likely to drop positive premium plans.
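The latent-profit rule in equation (4) combined with a linear probability model can be illustrated as follows. The covariates and coefficients are invented for the sketch and are not the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
benchmark = rng.normal(800, 50, n)   # hypothetical CMS benchmark rate
quality = rng.uniform(2.0, 4.5, n)   # 2009 contract star quality
premium = rng.exponential(35, n)     # plan premium

# latent profit as in equation (4), with an invented linear g(.)
pi = (-4.0 + 0.004 * benchmark + 0.5 * quality - 0.01 * premium
      + rng.logistic(0, 1, n))
keep = (pi >= 0).astype(float)       # y = 1: plan offered again next year

# linear probability model: OLS of the binary keep decision on covariates
X = np.column_stack([np.ones(n), benchmark, quality, premium])
beta, *_ = np.linalg.lstsq(X, keep, rcond=None)
```

In this simulation, the LPM recovers the qualitative pattern built into the latent profit: a positive slope on quality (`beta[2]`) and a negative slope on the premium (`beta[3]`).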
Somewhat surprisingly, contracts receiving a 3.5-star rating are more likely to drop plans overall; however, from the middle panel of Table 6, we see that this result is entirely driven by 3.5-star contracts dropping their $0 premium plans. Finally, 4-star contracts are significantly less likely to exit overall, particularly for their positive premium plans.18

Table 6

7.2 Analysis of Plan Entry

An important and relatively unique aspect of the MA market concerns the distinction between plan and contract-level decisions. Specifically, contracts must obtain CMS approval in order to be offered in a given county; however, conditional on receiving CMS approval, the decision of which plan(s) to offer in a county is relatively less regulated. As a result, we argue that the fixed costs of entry are primarily incurred at the contract level while the plan-level entry/exit decisions are based on the variable profits per enrollee (i.e., regardless of market share). With regard to plan entry, this unique CMS approval process alleviates many of the traditional econometric issues surrounding multiple equilibria or endogeneity of other players' actions in models of market entry with incomplete information (Berry & Reiss, 2007; Bajari et al., 2010; Su, 2012). Conditional on plan characteristics, our entry analysis therefore need only consider variable cost shifters and should be largely independent of the number or type of competing plans in the county.19

The full set of plans available to a contract in a given market m is identified by taking all plans offered under that contract across the entire state in the same year. All such plans are therefore considered "eligible" to be operated in any given county, and the contract must choose which of those plans to offer in each county, where y_{c(j)m} = 1 indicates that the plan was added to the county (under that contract) in 2010, and y_{c(j)m} = 0 indicates that the plan was not offered.
As with our analysis of plan exit, we estimate the entry-equivalent to equation 4 using a standard linear probability model, with entry considered as a function of 2010 county and plan characteristics as well as 2009 contract quality as in equation 2. Table 7 summarizes the results of our RD analysis for plan entry. Note that these results only apply to markets in which the contracts previously operated (i.e., we do not consider the contract-level entry decisions and instead focus specifically on the plan-level entry of pre-existing contracts). The RD results indicate that a one-half star improvement for 3 or 3.5-star contracts makes them significantly more likely to expand their plans into new markets. The bottom panels of Table 7 further reveal that the increase in probability of plan entry occurs for the positive premium plans, with 3.5-star contracts significantly less likely to enter new markets with their $0 premium plans.20

18 The robustness of our plan exit results to bandwidth selection is summarized in Appendix D. The overall results (top panel of Table 6) at the 2.75 threshold appear relatively sensitive to bandwidth selection, with the statistical significance, magnitude, and sign of the point estimates changing within bandwidths from 0.1 to 0.2. In terms of hypothesis testing, we interpret this as evidence in favor of the null that the star rating has no effect on plan exit at the 2.75 threshold. As such, the qualitative findings from our point estimates in Table 6 are unchanged.

19 Results are robust when we weaken this assumption and allow predicted 2010 market shares to influence entry behaviors. The results are excluded for brevity but available upon request.

Table 7

8 Welfare Effects

To examine the welfare effects of our estimated premium increases in Section 5, we follow Town & Liu (2003) and Maruyama (2011) in estimating a standard Berry-type model of plan choice based on market-level data (Berry, 1994).
Specifically, let the utility of individual i from selecting Medicare option c(j) in market area m be given as

    U_{ic(j)m} = \delta_{c(j)m} + \xi_{c(j)m} + \zeta_{ig} + (1 - \sigma)\varepsilon_{ic(j)m},    (5)

where \delta_{c(j)m} and \xi_{c(j)m} represent the mean level of utility derived from observed and unobserved contract-plan-market area characteristics, respectively. We include in \delta_{c(j)m} observed characteristics at the contract and plan level, including premiums, plan type (HMO, PPO, or PFFS), and the underlying summary score of the contract. Similar to Town & Liu (2003), we partition the set of Medicare options into two groups: 1) MA plans that offer prescription drug coverage (MA-PD plans); and 2) MA plans that do not offer prescription drug coverage (MA-Only). Traditional Medicare FFS is taken as our outside option. In addition to the i.i.d. extreme value error \varepsilon_{ic(j)m}, individual preferences are allowed to vary through group dummies \zeta_{ig}. This nested logit structure relaxes the independence of irrelevant alternatives assumption and allows for differential substitution patterns between nests. The nesting parameter, \sigma, captures the within-group correlation of utility levels.

Following Berry (1994) and others, the parameters in equation 5 can be estimated using market-level data on the relative share of MA plans. Specifically, our estimation equation is as follows:

    \ln(S_{c(j)m}) - \ln(S_{0m}) = x_{c(j)m}\beta - \alpha F_{c(j)} + \sigma \ln(S_{c(j)m|g}) + \xi_{c(j)m},    (6)

where x_{c(j)m} denotes observed plan/contract characteristics, and \xi_{c(j)m} denotes the mean utility derived from unobserved plan characteristics. We estimate the parameters of equation 6 using two-stage least squares (2SLS) due to the endogeneity of within-group shares, S_{c(j)m|g}, and plan premiums, F_{c(j)}.

20 The robustness of our plan entry results to bandwidth selection is summarized in Appendix D.
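Once shares are logged, equation (6) is linear and the 2SLS step is mechanical. A minimal sketch follows; the `two_sls` and `berry_dep_var` helpers and their array names are ours, not the paper's code.

```python
import numpy as np

def two_sls(y, X, Z):
    """Two-stage least squares: project the regressors X onto the
    instruments Z, then regress y on the projected regressors."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)  # projection onto col(Z)
    Xhat = Pz @ X
    return np.linalg.solve(Xhat.T @ X, Xhat.T @ y)

def berry_dep_var(S, S0):
    """Left-hand side of equation (6): ln(S_j) - ln(S_0)."""
    return np.log(S) - np.log(S0)
```

A standard sanity check on the estimator: when the instruments are the regressors themselves, 2SLS collapses to OLS.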
We take as instruments the number of contracts operating in a county, the number of hospitals in a county, the Herfindahl-Hirschman Index (HHI) for hospitals in a county (based on discharges), and the number of physicians in the county. The results of this regression are presented in Appendix D. With estimates of the mean observed utility, \hat{\delta}_{c(j)m}, and the within-group correlation, \hat{\sigma}, estimated monthly consumer surplus for a representative MA beneficiary is then derived as follows (Manski & McFadden, 1981; Town & Liu, 2003; Maruyama, 2011):

    W_i = \frac{1 - \hat{\sigma}}{\hat{\alpha}} \ln\left( \sum_{j \in J_m} \exp\left( \frac{\hat{\delta}_{c(j)m} + \hat{\xi}_{c(j)m}}{1 - \hat{\sigma}} \right) \right).    (7)

Our results yield an estimated $120 reduction in yearly consumer surplus per beneficiary for every $10 increase in premiums (all else equal). In 2010, there were approximately 1,080,000 beneficiaries enrolled in a 3, 3.5, or 4-star MA plan with a summary score just above the relevant threshold value. Assuming a $20 increase in premiums from 2009 to 2010 (the smallest estimated effect in Tables 3 and 4), this yields a total reduction in consumer surplus of approximately $259 million.

9 Discussion

The potential supply-side response of MA contracts to the CMS quality rating system is critical both from a policy perspective and from a consumer welfare perspective. If contracts can take advantage of improved quality scores by increasing premiums (holding the contract's true quality constant), then this suggests a lack of competitiveness in the MA market, with contracts raising prices without any true improvement in quality. Building on the initial results of Darden & McCarthy (forthcoming), the current paper finds strong evidence of such premium increases among average to above average star-rated contracts.
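Returning to the welfare calculation in Section 8, the $259 million figure is straightforward arithmetic on the reported quantities; a quick consistency check:

```python
# $120 reduction in yearly consumer surplus per $10 of monthly premium,
# a $20 premium increase, and ~1,080,000 affected beneficiaries
loss_per_beneficiary = 120 * (20 / 10)          # $240 per year
total_loss = loss_per_beneficiary * 1_080_000   # $259.2 million in total
```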
Based on the results in Section 5 and the range of sensitivity analyses in Section 6, we conclude that the increases in premiums for 3-star versus 2.5-star contracts (the 2.75 threshold) as well as for 3.5-star versus 3-star contracts (the 3.25 threshold) are not due to chance but instead reflect a true increase in premiums following an increase in reported quality. Meanwhile, we find no consistent changes in premiums for 2.5 relative to 2-star contracts. We find some initial evidence of increases in premiums among 4-star contracts relative to 3.5-star contracts; however, this finding is sensitive to bandwidth specification, and the effect does not persist in our falsification tests. Plan-level results for 4-star rated contracts are also sensitive to the inclusion of market-level covariates.

There are likely several reasons for a contract to increase 2010 premiums in response to its prior-year quality ratings. One natural reason is pure rent extraction: contracts may seek to capitalize on their high reported quality by charging a higher price to their existing customers. However, contracts may also increase premiums in order to better curb adverse selection. In this case, contracts of higher reported quality but comparable true quality may want to price out certain customers from the market, particularly if sicker beneficiaries are more likely to make decisions based in part on the quality ratings. With market-level data, we cannot empirically identify either of these effects individually. Nonetheless, our results generally suggest that the perceived benefits of the star rating program in terms of beneficiary decision-making are at least partially offset by the supply-side response of higher premiums.

References

Abraham, Jean, Gaynor, Martin, & Vogt, William B. 2007. Entry and Competition in Local Hospital Markets. The Journal of Industrial Economics, 55(2), 265–288.

Bajari, Patrick, Hong, Han, Krainer, John, & Nekipelov, Denis. 2010.
Estimating Static Models of Strategic Interactions. Journal of Business & Economic Statistics, 28(4).

Berry, Steven, & Reiss, Peter. 2007. Empirical Models of Entry and Market Structure. In: Armstrong, M., & Porter, R. (eds), Handbook of Industrial Organization, vol. 3. Amsterdam: Elsevier.

Berry, Steven T. 1994. Estimating Discrete-Choice Models of Product Differentiation. The RAND Journal of Economics, 242–262.

Bresnahan, Timothy F, & Reiss, Peter C. 1991. Entry and Competition in Concentrated Markets. Journal of Political Economy, 977–1009.

Cawley, John, Chernew, Michael, & McLaughlin, Catherine. 2005. HMO Participation in Medicare+Choice. Journal of Economics & Management Strategy, 14(3), 543–574.

Dafny, L., & Dranove, D. 2008. Do Report Cards Tell Consumers Anything They Don't Already Know? The Case of Medicare HMOs. The RAND Journal of Economics, 39(3), 790–821.

Darden, M., & McCarthy, I. forthcoming. The Star Treatment: Estimating the Impact of Star Ratings on Medicare Advantage Enrollments. Journal of Human Resources.

Frakt, Austin B, Pizer, Steven D, & Feldman, Roger. 2012. The Effects of Market Structure and Payment Rate on the Entry of Private Health Plans into the Medicare Market. Inquiry, 49(1), 15–36.

Imbens, G.W., & Lemieux, T. 2008. Regression Discontinuity Designs: A Guide to Practice. Journal of Econometrics, 142(2), 615–635.

Lee, David S, & Lemieux, Thomas. 2010. Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48, 281–355.

Manski, Charles F, & McFadden, Daniel. 1981. Structural Analysis of Discrete Data with Econometric Applications. MIT Press, Cambridge, MA.

Maruyama, Shiko. 2011. Socially Optimal Subsidies for Entry: The Case of Medicare Payments to HMOs. International Economic Review, 52(1), 105–129.

McCrary, Justin. 2008. Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test. Journal of Econometrics, 142(2), 698–714.

Pauly, Mark, Harrington, Scott, & Leive, Adam. 2014.
'Sticker Shock' in Individual Insurance under Health Reform. Tech. rept. National Bureau of Economic Research.

Reid, Rachel O, Deb, Partha, Howell, Benjamin L, & Shrank, William H. 2013. Association Between Medicare Advantage Plan Star Ratings and Enrollment. JAMA, 309(3), 267–274.

Stockley, Karen, McGuire, Thomas, Afendulis, Christopher, & Chernew, Michael E. 2014. Premium Transparency in the Medicare Advantage Market: Implications for Premiums, Benefits, and Efficiency. Tech. rept. National Bureau of Economic Research.

Su, Che-Lin. 2012. Estimating Discrete-Choice Games of Incomplete Information: Simple Static Examples. Quantitative Marketing and Economics, 1–41.

Town, Robert, & Liu, Su. 2003. The Welfare Impact of Medicare HMOs. RAND Journal of Economics, 719–736.

A Appendix A: Star Rating Metrics

The star rating system consists of five domains, with the names of each domain, the underlying metrics in each domain, and the data sources for each metric changing over the years. The metrics and relevant domains for 2009 are listed in Table 9.

Table 9

B Appendix B: Star Rating Calculations

Although the domains and individual metrics changed from year to year, the way in which overall star ratings were calculated was consistent across years. The calculations proceed in five steps, as described in more detail in the CMS technical notes for the 2009, 2010, and 2011 star rating calculations:

1. Raw summary scores for each individual metric are calculated as per the definition of the metric in question. As discussed in the text, these scores are derived from a variety of different datasets including HEDIS, CAHPS, HOS, and others. The resulting summary scores are observed in our dataset.

2. The summary scores in each metric are translated into a star rating.
For most measures, the star rating is assigned based on percentile rank; however, CMS makes additional adjustments in cases where the distribution of scores is skewed high or low. Scores derived from CAHPS have a more complicated star calculation, based on the percentile ranking combined with whether or not the score is significantly different from the national average. The resulting stars for each individual metric are observed in our dataset.

3. The star values from each metric are averaged within each respective domain to form domain-level stars, provided a minimum number of metric-level scores is available for each domain. For example, in 2009 and 2010, a domain-level star was only calculated if the contract had a star value for at least 6 of the 12 individual measures. The domain-level star ratings are observed in our dataset.

4. Overall Part C summary scores are then calculated by averaging the domain-level star ratings and adding an integration factor (i-Factor). The i-Factor is intended to reward consistency in a plan's quality across domains, and is calculated as follows:

(a) Derive the mean and variance of all individual metric summary scores for each contract.

(b) Form the distribution of the mean and variance across contracts.

(c) Assign an i-Factor of 0.4 for low variance (below the 30th percentile) and high mean (above the 85th percentile), 0.3 for medium variance (30th to 70th percentile) and high mean, 0.2 for low variance and relatively high mean (65th to 85th percentile), and 0.1 for medium variance and relatively high mean. All other contracts are assigned an i-Factor of 0.

5. Overall Part C star ratings are then calculated by rounding the overall summary score to the nearest half-star value.

We do not observe the i-Factors in the data. We therefore replicated the CMS methodology, ultimately matching the overall star ratings for 98.8% and 98.5% of the plans in 2009 and 2010, respectively.
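Steps 4 and 5 can be written as a short function. A minimal sketch: the default thresholds below are the 2009 cutoff values reported in the worked examples that follow, and the treatment of boundary ties is our assumption.

```python
def i_factor(mean, var,
             high_mean=3.6667, rel_high_mean=3.2381,
             high_var=1.3462, low_var=1.0362):
    """i-Factor assignment from step 4(c); boundary inclusivity assumed."""
    is_high_mean = mean >= high_mean
    is_rel_high_mean = rel_high_mean <= mean < high_mean
    is_low_var = var <= low_var
    is_med_var = low_var < var <= high_var
    if is_high_mean and is_low_var:
        return 0.4
    if is_high_mean and is_med_var:
        return 0.3
    if is_rel_high_mean and is_low_var:
        return 0.2
    if is_rel_high_mean and is_med_var:
        return 0.1
    return 0.0

def overall_star(mean, var):
    """Step 5: add the i-Factor and round to the nearest half star."""
    summary = mean + i_factor(mean, var)
    return round(summary * 2) / 2
```

For example, a contract with a mean star value of 3.694 and a variance of 1.018 (high mean, low variance) earns an i-Factor of 0.4, a summary score of 4.094, and a 4.0-star rating.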
As discussed in the text, plans for which we were unable to replicate star ratings were dropped from the analysis. Note also that star ratings are based on data from at least the previous calendar year, and sometimes from further back depending on ease of access from CMS. New plans therefore do not have a star rating available, nor was a star rating for such plans provided to beneficiaries.

Tables 10 and 11 present example calculations of the overall summary score and resulting star values for 5 contracts in 2009. The tables list the summary scores for the individual metrics along with the corresponding star values, each of which is observed in the raw data. The high mean and relatively high mean thresholds for i-Factor calculations were calculated to be 3.6667 and 3.2381, respectively. Similarly, the high variance and low variance thresholds were 1.3462 and 1.0362, respectively.

Tables 10 and 11

The calculations for each contract in Table 10 are discussed individually below:

1. Contract H0150: With a mean star value of 2.583 and a variance of 0.879, the contract received an i-Factor of 0 (due to a low mean), which provided an overall summary score of 2.583 and a star rating of 2.5.

2. Contract H0151: With a mean star value of 2.667 and a variance of 0.8, the contract received an i-Factor of 0 (again from a low mean), which provided an overall summary score of 2.667 and a star rating of 2.5, just 0.083 points away from receiving a 3-star rating.

3. Contract H1558: With a mean star value of 3.967 and a variance of 1.275, the contract received an i-Factor of 0.3 (high mean and medium variance), which provided an overall summary score of 4.267, just 0.0167 above the 4.25 threshold required to round up to a 4.5-star rating.

4. Contract H0755: With a mean star value of 3.5278 and a variance of 1.285, the contract received an i-Factor of 0.1 (relatively high mean and medium variance), which provided an overall summary score of 3.6278 and a star rating of 3.5.

5.
Contract H1230: With a mean star value of 3.694 and a variance of 1.018, the contract received an i-Factor of 0.4 (high mean and low variance), which provided an overall summary score of 4.094 and a star rating of 4.0.

C Appendix C: Data

Our analysis merges publicly available data from several sources. As our starting point, we merge together enrollment and contract information by month/year/contract id/plan id for all Medicare Advantage (MA) contract/plans from June 2008 through December of 2011.21 For a small number of counties, CMS reports enrollment counts at the Social Security Administration (SSA) level.22 For these observations, we aggregate enrollment to the county level, and, after limiting our focus to HMO, PPO, and PFFS type contracts, we have a dataset of 50,269,123 observations at the contract id/plan/county/month/year level.

The enrollment files alone cannot provide a valid census of the MA contracts operating in a given market (county) because of migration. For example, if contract A is approved to operate in Orange County, North Carolina, and an enrollee in contract A moves to Miami-Dade County, Florida, the enrollment files will report positive enrollment in contract A in Miami-Dade County regardless of whether contract A is approved to operate in Miami-Dade. To overcome this problem, CMS provides separate service area files that list all contracts approved to operate in a given county.23 In addition to the CMS service area files, we merge our enrollment dataset to quality star data at the contract/year level;24 CMS contract/plan premium data;25 Medicare Advantage market share data at the county/contract id level;26 and county-level census data from the American Community Survey for 2006–2010 in wide format. Given the size of the resulting data, we proceed in cleaning the data for 2009 and 2010 separately. In what follows, we document our cleaning of the 2009 data, with 2010 in parentheses.
Our 2009 (2010) data contain 19,290,326 (13,427,779) contract id/plan id/county/month observations. We begin by dropping the 331,272 (204,355) observations from U.S. Territories and Outlying Areas. Next, we drop all contract/plans that are specific to an employer or union-only group (also known as the "800-series" plans). While the decision to eliminate these plans reduces our sample by 17,051,609 (11,988,547) observations, these contract/plans are not available to the public and are not our primary focus. Next, we drop the 231,655 (159,439) observations of special needs plans. Finally, we drop the observations that did not merge perfectly between the CMS enrollment files and the service area files.

21 CMS records enrollment data in separate files from contract characteristic information. Data are available at http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Monthly-Enrollment-by-Contract-Plan-State-County.html.

22 The contract characteristic files contain a small number of duplicate observations, which we drop.

23 Data are available at http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/MA-Contract-Service-Area-by-State-County.html. For the few counties that are sub-divided by SSA, we aggregate to the county level.

24 Contract-level quality data available at http://www.cms.gov/Medicare/Prescription-Drug-Coverage/PrescriptionDrugCovGenIn/PerformanceData.html.

25 Data on plan premiums available at http://www.cms.gov/Medicare/Prescription-Drug-Coverage/PrescriptionDrugCovGenIn/index.html?redirect=/PrescriptionDrugCovGenIn/. County names and FIPS codes available at http://www.census.gov/popest/about/geo/codes.html.

26 MA penetration data available at http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/MA-State-County-Penetration.html.
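The sample restrictions above can be sketched in pandas. The column names (`state`, `plan_id`, `snp_flag`) are hypothetical stand-ins for the CMS field names, as is the set of territory codes.

```python
import pandas as pd

TERRITORIES = {"PR", "VI", "GU", "AS", "MP"}  # assumed territory codes

def clean_year(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the sample restrictions described in the text: drop
    territories, 800-series employer/union plans, and SNPs."""
    df = df[~df["state"].isin(TERRITORIES)]
    df = df[~df["plan_id"].between(800, 899)]  # 800-series plans
    df = df[df["snp_flag"] == 0]               # special needs plans
    return df
```

The remaining records would then be inner-merged with the service area files so that only contract/county pairs approved by CMS survive.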
These reflect either contracts with positive enrollment in a month/year/county that were not approved to operate in that county (due to migration) or contracts that were approved to operate in a county but had no corresponding enrollment record. Our final sample size for 2009 is 1,422,887 (841,790) contract id/plan id/county/month observations. We also collect hospital discharge data from the annual Hospital Cost Reporting Information System (HCRIS) as well as CMS benchmark rates and average FFS costs by county.27

27 Data are available at http://www.cms.gov/Research-Statistics-Data-and-Systems/Files-for-Order/CostReports/Cost-Reports-by-Fiscal-Year.html.

D Appendix D: Additional Analyses

D.1 Robustness Checks

Figure 2 illustrates the sensitivity of the plan-level RD analysis to our bandwidth selection. As should be the case, the figure closely follows that of the contract-level analysis from Figure 1. Generally, Figure 2 suggests that the findings from the point estimates in Table 4 are relatively persistent across alternative bandwidths (provided the bandwidths are sufficiently narrow and include a sufficient number of contracts).

Figure 2

Figures 4 and 5 present similar graphs for the analysis of plan exit and plan entry, respectively. The figures generally support the robustness of the point estimates in Tables 6 and 7 to our bandwidth selection. Our analysis of plan exit and entry at the 2.75 threshold (2.5 versus 3-star contracts) is one possible exception, with the statistical significance, magnitude, and sign of the point estimates changing within bandwidths from 0.1 to 0.2. In terms of hypothesis testing, we interpret this as evidence in favor of the null that the star rating has no effect on plan exit or entry at the 2.75 threshold.
As such, the qualitative findings from our point estimates in Table 6 are unchanged, while the overall findings from our analysis of plan entry (top panel of Table 7) are less definitive among 3.0 relative to 2.5-star rated contracts.

Figures 4 and 5

D.2 Welfare Analysis

The results of estimating equation 6 with OLS and 2SLS are presented in Table 12 along with the first-stage results for the 2SLS estimator.

E Tables and Figures

Table 1: Summary Statistics

                                       2009              2010
                                    Mean (S.D.)       Mean (S.D.)
Plan-level Data, n=51,442 and 34,642
  Enrollment^a                      291.55 (1,413)    361.17 (1,600)
  Overall Share, %                  1.18              1.26
  Within-nest Share, %              28.87             31.07
  Premium                           37.69 (42.23)     53.27 (52.97)
  Drug Coverage, %                  58.58             64.39
  HMO, %                            16.32             24.12
  PPO, %                            18.53             33.71
Market Characteristics, n=3,139 and 3,094
  MA Penetration                    15.59 (11.03)     16.50 (12.12)
  Mean Number of Plans              37.38 (22.31)     26.61 (17.58)
  Population > 65 in 1,000s         12.22 (34.90)     12.59 (35.74)
  Population > 85 in 1,000s         1.72 (5.11)       1.79 (5.34)
  Unemployed, %                     5.79              9.01
  White, %                          86.30             86.41
  Black, %                          9.11              9.18
  Female, %                         50.16             50.17
  College Graduates, %              18.68             18.62
  South, %                          42.02             42.63
Contract-level Star Ratings %, n=252 and 295
  1.5                               1.98              0.00
  2.0                               9.92              4.07
  2.5                               24.21             24.41
  3.0                               28.97             29.83
  3.5                               21.43             20.67
  4.0                               11.11             12.20
  4.5                               2.38              7.78
  5.0                               0.00              1.02
a Enrollment data available for 20,768 plans in 2009 and 17,334 plans in 2010. Remaining plans have 10 or fewer enrollments and specific enrollments are therefore not provided by CMS.

Table 2: OLS Results for Average Characteristics^a

                                  Star Indicator
                        2.5         3.0         3.5         4.0
y = Average Premium
  γ̂2                -5.18***     6.74***     6.15***    -8.84***
                     (1.55)      (1.39)      (1.54)      (2.52)
  N                  4,303       4,182       2,672       1,213
  R2                 0.52        0.66        0.71        0.75
y = Proportion of $0 Premium Plans
  γ̂2                 0.17***    -0.13***     0.03***    -0.03**
                     (0.02)      (0.01)      (0.01)      (0.01)
  N                  4,303       4,182       2,672       1,213
  R2                 0.36        0.70        0.75        0.63
a OLS regression of the 2010 mean characteristics on the relevant 2009 mean characteristic and star ratings.
Regressions estimated separately for each star rating, with γ̂2 denoting the estimated effect of a one-half star increase in quality ratings. Contract-level averages are based on all plans with more than 10 enrollments. Standard errors in parentheses are robust to clustering at the county level. Additional controls not in the table include county-level variables on the population over 65, population over 85, unemployment rate, percent white, percent black, percent female, regional dummy (south), percent graduating college, and the number of MA plans and contracts in the county, the CMS benchmark payment rate and average FFS cost, and number of physicians in the county, as well as contract-level variables including the number of counties in which the contract operated in 2009, whether the contract operates as an HMO or PPO, and the total number of enrollees under the contract in 2009. * p<0.1. ** p<0.05. *** p<0.01.

Table 3: RD Results for Average Characteristics^a

                                  Star Threshold
                        2.25        2.75        3.25        3.75
y = Average Premium
  γ̂2                 4.81        33.60***    29.30***    31.85***
                     (4.27)      (7.27)      (6.12)      (6.38)
  N                  2,029       982         432         309
  R2                 0.39        0.72        0.69        0.92
y = Proportion of $0 Premium Plans
  γ̂2                -0.14*      -0.16**      0.02       -0.13*
                     (0.08)      (0.06)      (0.04)      (0.07)
  N                  2,029       982         432         309
  R2                 0.21        0.90        0.72        0.55
a Results based on OLS regressions with RD approach and a bandwidth of h = 0.125. Robust standard errors in parentheses, clustered at the county level. Results were excluded for the 1.5 and 4.5 star ratings due to an insufficient number of contracts on the lower and upper ends of the 1.75 and 4.25 thresholds, respectively. Regressions estimated at the contract level, with dependent variables measured as the average value of each plan characteristic by contract (excluding plans with 10 or fewer enrollments).
Additional controls not in the table include county-level variables on the population over 65, population over 85, unemployment rate, percent white, percent black, percent female, regional dummy (south), percent graduating college, and the number of MA plans and contracts in the county, the CMS benchmark payment rate and average FFS cost, and number of physicians in the county, as well as contract-level variables including the number of counties in which the contract operated in 2009, whether the contract operates as an HMO or PPO, and the total number of enrollees under the contract in 2009. * p<0.1. ** p<0.05. *** p<0.01.

Table 4: RD Results for Plan-level Characteristics^a

                                  Star Threshold
                        2.25        2.75        3.25        3.75
y = 2010 premium
  γ̂2                 5.00**      19.40***    41.99***    31.52***
                     (2.10)      (3.93)      (5.17)      (5.10)
  N                  4,912       6,894       1,024       1,082
  R2                 0.63        0.76        0.83        0.94
y = Indicator for $0 premium plan in 2010
  γ̂2                 0.04       -0.32***     0.02       -0.15***
                     (0.04)      (0.05)      (0.03)      (0.05)
  N                  4,912       6,894       1,024       1,082
  R2                 0.24        0.89        0.51        0.59
a Results based on OLS regressions with RD approach and a bandwidth of h = 0.125. Robust standard errors in parentheses, clustered at the county level. Results were excluded for the 1.5 and 4.5 star ratings due to an insufficient number of contracts on the lower and upper ends of the 1.75 and 4.25 thresholds, respectively. Regressions estimated at the plan level for all plans in the dataset.
Additional controls not in the table include county-level variables on the population over 65, population over 85, unemployment rate, percent white, percent black, percent female, regional dummy (south), percent graduating college, and the number of MA plans and contracts in the county, the CMS benchmark payment rate and average FFS cost, and number of physicians in the county, as well as the plan's total number of enrollees in 2009 (set to 0 if missing), an indicator variable for missing number of enrollees (<10 enrollees in the plan), an indicator for HMO or PPO plan type, and the lagged dependent variable. * p<0.1. ** p<0.05. *** p<0.01.

Table 5: Summary of Plan Exit and Entry^a

2009 Rating    Exit (%)    Entry (%)
1.5 Star       99.49       36.51
2.0 Star       51.40       55.16
2.5 Star       53.58       52.79
3.0 Star       29.37       23.91
3.5 Star       25.97       17.20
4.0 Star       8.25        32.45
4.5 Star       8.24        7.72
All            49.77       38.20
a Exit defined as the same plan-county-contract observation in 2009 no longer active in 2010.

Table 6: RD Results for Plan Exit^a

                                  Star Threshold
                        2.25        2.75        3.25        3.75
Overall Results
  γ̂2                -0.83***    -0.07        0.12**     -0.25***
                     (0.06)      (0.09)      (0.06)      (0.06)
  N                  10,791      9,806       1,177       1,435
Among Plans with Premiums = $0
  γ̂2                -0.84***     0.25**      1.07***    -0.07
                     (0.06)      (0.11)      (0.30)      (0.05)
  N                  9,110       613         140         281
Among Plans with Premiums > $0
  γ̂2                -1.37***    -0.82***     0.04       -0.36***
                     (0.13)      (0.12)      (0.05)      (0.07)
  N                  1,681       9,193       1,037       1,154
a Results based on linear probability model with RD approach and a bandwidth of h = 0.125. Robust standard errors in parentheses, clustered at the county level. Results were excluded for the 1.5 and 4.5 star ratings due to an insufficient number of contracts on the lower and upper ends of the 1.75 and 4.25 thresholds, respectively.
Additional controls not in the table include county-level variables on the population over 65, population over 85, unemployment rate, percent white, percent black, percent female, regional dummy (south), percent graduating college, and the number of MA plans and contracts in the county, the CMS benchmark payment rate and average FFS cost, and number of physicians in the county, as well as 2009 plan characteristics and enrollment. * p<0.1. ** p<0.05. *** p<0.01.

Table 7: RD Results for Plan Entry^a

                                  Star Threshold
                        2.25        2.75        3.25        3.75
Overall Results
  γ̂2                 0.06       -0.23***     0.18***     0.30***
                     (0.12)      (0.07)      (0.06)      (0.06)
  N                  6,352       2,453       1,252       852
Among Plans with Premiums = $0
  γ̂2                -0.76***    -0.02       -1.80**      0.65***
                     (0.08)      (0.09)      (0.75)      (0.12)
  N                  3,360       793         171         331
Among Plans with Premiums > $0
  γ̂2                 2.34***    -1.28***     0.22***     0.20***
                     (0.16)      (0.19)      (0.07)      (0.06)
  N                  2,992       1,660       1,081       521
a Results based on linear probability model with RD approach and a bandwidth of h = 0.125. Robust standard errors in parentheses, clustered at the county level. Results were excluded for the 1.5 and 4.5 star ratings due to an insufficient number of contracts on the lower and upper ends of the 1.75 and 4.25 thresholds, respectively. Additional controls not in the table include county-level variables on the population over 65, population over 85, unemployment rate, percent white, percent black, percent female, regional dummy (south), percent graduating college, the CMS benchmark payment rate and average FFS cost, and number of physicians in the county, as well as plan characteristics (premium, Part D participation, and HMO versus PPO). * p<0.1. ** p<0.05. *** p<0.01.
Table 8: RD Results for Premiums without Covariates^a

  Star Threshold          2.25        2.75        3.25        3.75
  y = Mean Contract Premiums
    γ̂2                  12.82    16.25***    28.58***    26.97***
                       (3.26)      (4.53)      (5.09)     (12.66)
    N                   2,029         982         432         309
  y = Individual Plan Premiums
    γ̂2               -4.34***    10.88***    31.27***        8.36
                       (1.59)      (2.31)      (3.42)      (7.23)
    N                   4,912       6,894       1,024       1,082

  a. Results based on RD with a triangular kernel and a bandwidth of h = 0.125. Results were excluded for the 1.5 and 4.5 star ratings due to an insufficient number of contracts on the lower and upper ends of the 1.75 and 4.25 thresholds, respectively. * p<0.1. ** p<0.05. *** p<0.01.

Table 9: Domains, Metrics, and Data Sources for 2009 MA Star Rating Program^a

  Staying Healthy:
    Breast Cancer Screening (HEDIS)
    Colorectal Cancer Screening (HEDIS)
    Cardiovascular Care - Cholesterol Screening (HEDIS)
    Diabetes Care - Cholesterol Screening (HEDIS)
    Glaucoma Testing (HEDIS)
    Appropriate Monitoring of Patients Taking Long-Term Medications (HEDIS)
    Annual Flu Vaccine (CAHPS)
    Pneumonia Vaccine (CAHPS)
    Improving or Maintaining Physical Health (HOS)
    Improving or Maintaining Mental Health (HOS)
    Osteoporosis Testing (HOS)
    Monitoring Physical Activity (HOS)
  Getting Timely Care from Doctors:
    Access to Primary Care Doctor Visits (HEDIS)
    Follow-up Visit within 30 Days of Discharge after Hospital Stay for Mental Illness (HEDIS)
    Doctor Follow-up for Depression (HEDIS)
    Getting Needed Care without Delays (CAHPS)
  Plan Responsiveness and Care:
    Getting Appointments and Care Quickly (CAHPS)
    Overall Rating of Health Care Quality (CAHPS)
    Overall Rating of Health Plan (CAHPS)
    Call Answer Timeliness (HEDIS)
    Doctors Who Communicate Well (CAHPS)
    Customer Service (CAHPS)
  Managing Chronic Conditions:
    Osteoporosis Management (HEDIS)
    Diabetes Care - Eye Exam (HEDIS)
    Diabetes Care - Kidney Disease Monitoring (HEDIS)
    Diabetes Care - Blood Sugar Controlled (HEDIS)
    Diabetes Care - Cholesterol Controlled (HEDIS)
    Antidepressant Medication Management (HEDIS)
    Controlling Blood Pressure (HEDIS)
    Rheumatoid Arthritis Management (HEDIS)
    Testing to Confirm COPD (HEDIS)
    Continuous Beta Blocker Treatment (HEDIS)
    Improving Bladder Control (HOS)
    Reducing the Risk of Falling (HOS)
  Handling of Appeals:
    Plan Makes Timely Decisions about Appeals (IRE)
    Reviewing Appeals Decisions (IRE)

  a. Descriptions of domains and additional details are available at www.cms.gov. The data source for CMS calculations is provided in parentheses.

Table 10: Star Rating Calculation Examples

  [Per-metric star values and raw scores for five example contracts (H0150, H0151, H1558, H0755, H1230) across the 36 metrics listed in Table 9; the two-panel layout (Stars; Raw Scores) is not recoverable from this extraction.]

Table 11: Star Rating Calculation Examples, Cont.

                            H0150     H0151     H1558     H0755     H1230
  Mean Summary Score       2.5833    2.6667    3.9667    3.5278    3.6944
  Variance Summary Score   0.8786    0.80      1.2747    1.2849    1.0183
  i-Factor                 0         0         0.3       0.1       0.4
  Summary Score            2.5833    2.6667    4.2667    3.6278    4.0944
  Star Rating              2.5       2.5       4.5       3.5       4

Table 12: Welfare Analysis^a

                            OLS         2SLS
  Premium               -0.00**     -0.04***
                         (0.00)       (0.01)
  Within-group Share     0.71***      0.74***
                         (0.03)       (0.10)
  HMO                    -0.03       -1.26**
                         (0.09)       (0.61)
  PPO                    -0.21        -0.55
                         (0.13)       (0.38)
  Part D                  1.19***      2.22***
                         (0.11)       (0.48)
  Part D Cost            -0.00        -0.00
                         (0.00)       (0.00)
  Summary Score           0.43***      1.96***
                         (0.10)       (0.63)
  N                      20,738       18,300

  First-stage Statistics
                            Premium   Within-group Share
  Contract Count               0.07        -0.00
                             (0.30)       (0.01)
  Hospital Inpatient HHI      -0.27        1.45***
                             (1.31)       (0.04)
  Hospital Count            -0.43***      -0.00
                             (0.10)       (0.00)
  Total Physicians             0.00       -0.00***
                             (0.00)       (0.00)
  F-stat                       9.80       647.94

  a. Robust standard errors in parentheses, clustered at the contract level.
In the 2SLS estimation, premiums and within-group shares were instrumented using the number of contracts operating in a county, the number of hospitals in a county, the Herfindahl-Hirschman Index (HHI) for hospitals in a county (based on discharges), and the number of physicians in the county. * p<0.1. ** p<0.05. *** p<0.01.

Figure 1: Effect of Star Rating on Mean Contract Premium for Varying Bandwidths Around Thresholds 2.25, 2.75, 3.25 and 3.75
  [Four panels (a. 2.25, b. 2.75, c. 3.25, d. 3.75) plotting the star rating coefficient, γ̂2, against bandwidths from 0.1 to 0.25.]

Figure 2: Effect of Star Rating on Plan Premiums for Varying Bandwidths Around Thresholds 2.25, 2.75, 3.25 and 3.75
  [Four panels (a. 2.25, b. 2.75, c. 3.25, d. 3.75) plotting the star rating coefficient, γ̂2, against bandwidths from 0.1 to 0.5.]

Figure 3: Falsification Test: Effect of Star Rating on Mean Contract Premium around Counterfactual Thresholds
  [Four panels plotting the star rating coefficient, γ̂2, against counterfactual thresholds around the true 2.25, 2.75, 3.25, and 3.75 thresholds.]

Figure 4: Effect of Star Rating on Plan Exit for Varying Bandwidths Around Thresholds 2.25, 2.75, 3.25 and 3.75
  [Four panels (a. 2.25, b. 2.75, c. 3.25, d. 3.75) plotting the star rating coefficient, γ̂2, against bandwidths from 0.1 to 0.5.]

Figure 5: Effect of Star Rating on Plan Entry for Varying Bandwidths Around Thresholds 2.25, 2.75, 3.25 and 3.75
  [Four panels (a. 2.25, b. 2.75, c. 3.25, d. 3.75) plotting the star rating coefficient, γ̂2, against bandwidths from 0.1 to 0.5.]
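The summary-score arithmetic in Tables 10 and 11 — a contract's mean per-metric star value plus its i-Factor, converted to a published half-star rating — can be sketched as follows. The rounding rule (nearest half star) is an inference that reproduces all five worked examples in Table 11; it is not stated in the tables themselves.

```python
# Sketch of the star-rating calculation in Table 11: summary score equals
# the mean of the contract's per-metric star values plus its i-Factor, and
# the published rating rounds that score to the nearest half star (an
# assumed rounding rule, inferred from the five worked examples).

def star_rating(mean_metric_stars, i_factor):
    summary = mean_metric_stars + i_factor
    return round(summary * 2) / 2, summary

# (mean summary score, i-Factor) pairs taken from Table 11.
examples = {
    "H0150": (2.5833, 0.0),
    "H0151": (2.6667, 0.0),
    "H1558": (3.9667, 0.3),
    "H0755": (3.5278, 0.1),
    "H1230": (3.6944, 0.4),
}

for contract, (mean_stars, i_factor) in examples.items():
    stars, summary = star_rating(mean_stars, i_factor)
    print(f"{contract}: summary score {summary:.4f} -> {stars} stars")
```

Run on the Table 11 inputs, this reproduces the published ratings of 2.5, 2.5, 4.5, 3.5, and 4 stars for the five example contracts.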