Details of study selection and data abstraction

Data sources and searches

We conducted a MEDLINE search (PubMed interface) for studies published from January 1, 2003 through July 23, 2014, limited to human subjects, English language, and titles with abstracts. We excluded studies published before 2003, judging them unlikely to be relevant to the current environment. We used an iterative process to identify search terms. We began by identifying relevant policy proposals from a summary of reform recommendations from reputable policy centers4 and then identifying relevant articles from each proposal's reference list. We extracted medical subject heading (MeSH) terms from each article and performed a MEDLINE search using those terms. From the search results we identified additional relevant articles, extracted any additional unique MeSH terms from them, and re-ran the search, continuing this process until no new relevant MeSH terms emerged (Figure 1).

After the initial search, we identified additional articles through author tracking (examining the publications of every first and last author of included studies), by searching the reference lists of included articles, and by searching relevant citations using Web of Science. We performed a second round of reference and author tracking using papers identified during the initial reference and author tracking and from the Web of Science search. After completing our analysis, we updated the search to include the most recent reports: for the period August 1, 2014 through August 11, 2015, we reviewed the tables of contents of journals deemed likely to publish relevant articles, including any journal in which a previously included article had been published.
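The iterative term-identification step amounts to a fixed-point loop: search with the current MeSH terms, collect terms from the relevant hits, and stop when a round adds no new terms. The sketch below illustrates only that stopping rule on a toy in-memory corpus. The function `expand_search_terms`, its callbacks, the article IDs, and the MeSH tags are all hypothetical; a real search would query PubMed and depend on human relevance judgments, and this is not a description of the authors' actual tooling.

```python
from typing import Callable, Iterable, Set


def expand_search_terms(
    seed_articles: Iterable[str],
    mesh_terms_of: Callable[[str], Set[str]],
    search: Callable[[Set[str]], Set[str]],
    is_relevant: Callable[[str], bool],
    max_rounds: int = 50,
) -> Set[str]:
    """Grow a MeSH term set until a search round yields no new terms.

    All callbacks are hypothetical stand-ins: `search` plays the role of a
    MEDLINE query and `is_relevant` the role of a reviewer's judgment.
    """
    relevant: Set[str] = set(seed_articles)
    terms: Set[str] = set()
    for article in relevant:
        terms |= mesh_terms_of(article)
    for _ in range(max_rounds):
        # Re-run the search with the current term set; keep only relevant hits.
        relevant |= {a for a in search(terms) if is_relevant(a)}
        new_terms = set().union(*(mesh_terms_of(a) for a in relevant)) - terms
        if not new_terms:
            break  # stopping rule: no new relevant MeSH terms identified
        terms |= new_terms
    return terms


# Hypothetical toy corpus: article ID -> MeSH terms (illustrative only).
MESH = {
    "A": {"Medical Home"},
    "B": {"Medical Home", "Reimbursement, Incentive"},
    "C": {"Reimbursement, Incentive", "Cost Savings"},
    "D": {"Cost Savings"},
    "X": {"Neoplasms"},
}
RELEVANT = {"A", "B", "C", "D"}


def toy_search(terms: Set[str]) -> Set[str]:
    # OR-style search: any article tagged with at least one query term.
    return {a for a, tags in MESH.items() if tags & terms}


final_terms = expand_search_terms({"A"}, MESH.__getitem__, toy_search, RELEVANT.__contains__)
print(sorted(final_terms))  # → ['Cost Savings', 'Medical Home', 'Reimbursement, Incentive']
```

Starting from one seed article, the loop needs three rounds to reach the point where no new terms appear; the irrelevant article "X" is never retrieved because none of its terms enter the query.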
Reviewed journals included Academic Pediatrics, American Journal of Managed Care, American Journal of Public Health, Gerontologist, Health Affairs, Health Services Research, Journal of the American Medical Association, JAMA Internal Medicine, Journal of General Internal Medicine, Journal of Managed Care Pharmacy, Journal of Hospital Medicine, Medical Care, Military Medicine, New England Journal of Medicine, and Pediatrics. Figure 2 shows the flow of articles through the review.

Study selection

We included studies evaluating the impact of system-level interventions on value in clinical environments (e.g., physicians' offices, hospitals). We excluded studies in nonclinical environments (e.g., visiting nurse services) and those targeting only patients with a particular disease. We included only controlled studies, defined as pre-post studies, observational studies with a concurrent comparator or historical control, and randomized controlled trials; we excluded simulations. If a research group reported data on the same intervention for multiple time periods, we included the report covering the longest time period.

Titles identified in the searches were screened for relevance by one of four investigators (MJD, KD, DK, SK). Potentially relevant articles (including those identified through author, reference, and citation tracking) underwent full-text review for inclusion by one of the same investigators. To determine interrater reliability (Cohen's κ), two pairs of investigators reviewed a total of 296 full-text articles: one pair (SK, KD) reviewed the same 148 articles, and the other pair (DK, MJD) reviewed a second set of 148 articles, each deciding whether to include each article in the review. Disagreements were resolved by consensus.

Data extraction

All investigators extracted data from included articles.
We collected information on study design, setting, intervention, and results, including the type of intervener and payer, study size, specific intervention of concern, prospective vs. retrospective approach, cost or utilization outcomes, and quality metrics. We classified each article by type of intervention(s) and health care setting (primary care, multispecialty practice, hospital, single specialty, or other), and noted specifically whether interventions were identified as patient-centered medical home (PCMH) implementations or pay-for-performance initiatives. We also abstracted information on reported clinical quality measures and on cost and utilization outcomes. Data extraction was performed by one reviewer (RA, KD, MJD, DK, or SK) and checked for accuracy by a second reviewer (RA or DK). For articles identified in the updated search, data extraction was performed by RA and checked for accuracy by DK. Differences were resolved by discussion and consensus.
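The interrater reliability reported for the paired full-text reviews is Cohen's κ, which compares observed agreement p_o with the agreement p_e expected by chance from each reviewer's own include/exclude proportions: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it for two reviewers' decisions. The function name and the ten include/exclude calls are hypothetical illustrations, not data from this review, and the choice to return NaN when p_e = 1 (κ undefined) is an assumption of this sketch.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of articles where both reviewers made the same call.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each reviewer's marginal category counts.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return math.nan  # kappa undefined: both used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical include/exclude calls by two reviewers on 10 articles (not study data).
reviewer_1 = ["include", "include", "exclude", "exclude", "exclude",
              "include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "include", "exclude", "include", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))  # → 0.583
```

Here the reviewers agree on 8 of 10 articles (p_o = 0.80), but because each includes 4 of 10, chance alone predicts p_e = 0.52, so κ = 0.28 / 0.48 ≈ 0.58, noticeably lower than raw percent agreement.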