NCHS LINKED DATA FILES: A RESOURCE FOR RESEARCH AND POLICY
Academy Health Research Meeting
June 27, 2010
U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Centers for Disease Control and Prevention
National Center for Health Statistics

Overview
  NCHS Data Linkage Activities
  Analytic Issues
  Research Examples
  Data Access

NCHS Record Linkage Program
  Links survey data with data collected from administrative records
  Designed to maximize the scientific value of the NCHS population-based surveys
  Examine factors that influence chronic disease, disability, health care utilization, morbidity, and mortality

NCHS Linkage Activities
  Mortality
    National Death Index (NDI)
  Retirement and Disability
    Social Security Administration (SSA)
  Medicare and Medicaid enrollment and claims
    Centers for Medicare and Medicaid Services (CMS)
  Air Quality
    Environmental Protection Agency (EPA)

Research Potential of NCHS Linked Data
  Disparities
    Mortality patterns by race/ethnicity or socioeconomic status
  Health services
    Functional impairment and health care costs
  Disability
    Effects of chronic illness and obesity on disability and mortality
  Methodologic studies
    Validation of self-reports vs. administrative records
  Environmental Health
    Effects of air pollution on health outcomes

Linked Mortality Files: Mortality Follow-up through 2006
  Survey                                                       Restricted-use  Public-use
  National Health Interview Survey (1986-2004)                 X               X
  Second Longitudinal Study of Aging (LSOA II)                 X               X
  NHANES I Epidemiologic Follow-up Study (NHEFS, 1971-1992)    X
  NHANES II (1976-1980)                                        X
  NHANES III (1988-1994)                                       X               X
  NHANES (1999-2004)                                           X               X
  1985, 1995, 1997 National Nursing Home Surveys (NNHS)        X
  2004 National Nursing Home Survey (NNHS)                     X               X

Restricted-use vs. Public-use Files
  Restricted-use files
    Mortality follow-up for all eligible survey participants
    Includes children for the NHIS, NHANES II, NHANES III, NHANES 1999-2004, and NNHS
    Detailed date of death, birth date, and interview date
    Underlying and multiple cause of death information
    No perturbed information

Restricted-use vs. Public-use Files
  Public-use files
    Mortality follow-up for all eligible adult survey participants
    Date of death limited to quarter/year or a calculated follow-up variable
    Underlying cause of death
    MCOD information for diabetes, hypertension, and hip fracture
    Perturbed information for date or cause of death for selected decedents
    Vital status not perturbed
    Comparative analyses published on NCHS website

NCHS Linked Mortality Files
  NCHS Survey         Total Deaths through 2006
  NHIS 1986-2004      181,188
  LSOA II             5,806
  NHEFS               8,002
  NHANES II           5,276
  NHANES III          5,478
  NHANES 1999-2000    611
  NHANES 2001-2002    470
  NHANES 2003-2004    276
  2004 NNHS           6,767
  1997 NNHS           12,525
  1995 NNHS           6,778
  1985 NNHS           10,226

Linked Social Security Files
  Social Security data spans 1953-2007
  Old Age, Survivors, and Disability Insurance (OASDI)
    Master Beneficiary Record (MBR)
    Payment History Update System (PHUS)
    831 Disability Master File (831 DMF)
    Quarters of Coverage (Master Earnings File)
  Supplemental Security Income (SSI)
    Supplemental Security Record (SSR)

Linked Social Security Files
  NCHS surveys
    National Health Interview Survey (NHIS, 1994-2005)
    Second Longitudinal Study of Aging (LSOA II, 1994-2000)
    NHANES I Epidemiologic Follow-up Study (NHEFS, 1971-1992)
    NHANES III (1988-1994)
    NHANES (1999-2004)
    National Nursing Home Survey (1985, 1995, 1997, 2004)
  Data access through the NCHS Research Data Center

Periods of Coverage for SSA Files

Social Security Data Elements
  MBR (1962-2007) and SSR (1974-2007) files
    Type of claimant
    Benefit amount
    Filing and
      eligibility status
    Dual entitlement
  PHUS file (1984-2007)
    Actual monthly amount that a beneficiary received
    Amount of monthly Medicare premium

Social Security Data Elements
  831 Disability Master file (1989-2007)
    Status of disability application, e.g. filing date
    Diagnosis coding, e.g. primary and secondary impairments
  Quarters of Coverage (1953-2007)
    Summarized variables indicating an individual’s insured status, i.e. eligibility for receiving retirement or disability benefits

Research Potential of Linked Social Security Data
  Examine reliability of survey information for SSA program participation and benefits
  Compare the health characteristics of early retirees (age 62) to those who postpone benefits
  Policy analysis using validated survey data
    Predicting the number of people who will become disabled based upon survey reported health conditions
    Determining whether current disability entitlement funding levels will be adequate in the future

Linked Medicare and Medicaid Files
  CMS data spans 1991-2007
  THESE UPDATED FILES EXPECTED BY END 2010
  Medicare Standard Analytic Files (SAF)
    Denominator file
    MEDPAR
    Hospital outpatient
    Home Health Agency
    Hospice
    Carrier (physician/supplier Part B file)
    Durable Medical Equipment
  Summary Medicare Enrollment and Claims Files (SMEC)

Linked Medicare and Medicaid Files
  THESE UPDATED FILES EXPECTED BY MID 2011
  Medicare Part D
  Chronic Condition Warehouse (CCW) summary files
  End Stage Renal Disease (ESRD)
  Medicaid Analytic eXtract (MAX)

Linked Medicare and Medicaid Files
  NCHS surveys
    National Health Interview Survey (NHIS, 1994-2005)
    Second Longitudinal Study of Aging (LSOA II, 1994-2000)
    NHANES I Epidemiologic Follow-up Study (NHEFS, 1971-1992)
    NHANES II (1976-1980)
    NHANES III (1988-1994)
    NHANES (1999-2004)
    2004 National Nursing Home Survey
  Data access through the NCHS Research Data Center

Periods of Coverage for CMS Files
  Surveys:
    1994-1998 NHIS, NHEFS, NHANES II, NHANES III, LSOA II

Periods of Coverage for CMS Files
  Surveys: 1999-2005 NHIS, 1999-2004 NHANES, 2004 NNHS

Medicare Data Elements
  Denominator file (1991-2007)
    Monthly enrollment status
    Monthly HMO enrollment
    CMS date of death
  Standard analytic files (1991-2007)
    Diagnosis codes
    Service dates
    Reimbursement amount
    Charge amount

Medicare Data Elements
  Medicare Part D (2006-2007)
    Denominator file
      Demographics
      Enrollment
      Subsidy information, e.g. low income, retiree
    Drug event file
      Summary extracts, no individual claims
      Drug costs and payments submitted by Part D plan providers
      Quantity dispensed and days supply
      Patient paid amount

Medicare Data Elements
  CCW summary file (2005-2007)
    Clinical information for 21 chronic health conditions
      Acute MI, Alzheimer’s, Alzheimer’s related/senile dementia, atrial fibrillation, cataract, chronic kidney disease, COPD, depression, diabetes, glaucoma, heart failure, hip/pelvic fracture, ischemic heart disease, osteoporosis, RA/OA, stroke, breast cancer, colorectal cancer, prostate cancer, lung cancer, endometrial cancer
    Chronic condition flags
      Yearly (Dec. 31 end of the reference year)
      Mid-year (July 1 end of the reference year)
      Ever (first occurrence of condition)
      Yearly and mid-year flags provide indication of whether claims and/or coverage were met

Medicare Data Elements
  ESRD (1975-2008)
    Combined ESRD Patient Master and Death Notification File
      Primary and contributing causes of ESRD diagnosis
      Date of first dialysis and information on kidney transplants
    ESRD Entitlement/Registration file
      Information from Medicare Entitlement and/or Registration (Form 2728)
      Health insurance status
      Primary and contributing causes of ESRD diagnosis
      ESRD-related test results, dialysis information, and kidney transplant status
    ESRD Patient Event file
      ESRD related events, e.g.
        event date, type of event, and patient modality

Summary Medicare Data File
  Summary Medicare Enrollment and Claims file (SMEC) for 1991-2007
    Enrollment information from the Denominator file plus summary variables of claims and payments
    Variables modeled after MCBS cost and use files
      Total reimbursements per year
      Total number of claims by Medicare record type
      Summary of charges by Medicare record type
      Termination status & reason for termination
      Monthly HMO enrollment
      Medicare status code (i.e. Part A, B or both)

Medicaid Data Elements
  Medicaid (MAX, 1999-2004)
    Medicaid eligibility, service utilization and payment
    Person Summary
      Eligibility, demographics, HMO enrollment, summary of utilization and Medicaid payment by type of service
    Inpatient Hospitalization
      Complete hospital stay records, e.g. diagnoses, procedures, discharge status, length of stay and payment
    Long Term Care
      Claims records that include facility type, dates of service and discharge status
    Other Therapy
      Claims for all non-institutional Medicaid services, e.g. physician services, lab/x-ray and premium payments
    Prescription Drug
      Final action claims for paid prescription drugs
      One procedure code but no diagnosis codes

Research Potential of Linked Medicare and Medicaid Data
  Examine risk factors for health conditions
  Examine uninsured estimates
  Examine disparities in Medicare service utilization
  Examine effect of obesity on Medicare and Medicaid costs

Linked EPA Air Quality Data Files
  Annual pollution averages collected from air monitoring locations
  Linked by geographic codes (e.g.
    county or zip code)
  NCHS surveys
    National Health Interview Survey (1986-2005)
    NHANES III (1988-1994)
    National Hospital Discharge Survey (1999-2005)

Analytic Issues (Medicare selected)
  Survey respondents ineligible for linkage or unable to be matched
  Variance estimation
  Death
  Linked but no Medicare data
  Managed care enrollment
  Issues with administrative data
  Study feasibility

Ineligible Respondents
  Cannot attempt to link survey records to other data sources if respondent
    Refused to provide SSN; or
    Lacks sufficient key identifying information
  These cases are INELIGIBLE for linkage and MUST BE DROPPED from all analysis
  They are not a random sample of respondents

Ineligibles: Linked CMS Files
  NHIS    % ineligible    % linked (among eligible)
  1994    19              92
  1995    21              91
  1996    25              90
  1997    31              89
  1998    37              88
  1999    40              86
  2000    42              85
  2001    45              86
  2002    26              77
  2003    29              75
  2004    37              77
  2005    40              76

  Survey              % ineligible    % linked (among eligible)
  LSOA II             19              98
  NHEFS               6               96
  NHANES III          3               96
  NHANES 1999-2000    17              95
  NHANES 2001-2002    10              94
  NHANES 2003-2004    10              95
  2004 NNHS           1               98

Ineligibles: Linked CMS Files
  Increasing ineligibility among NHIS
    But match rate among those eligible remains high
  No eligibility adjusted sample weights available from NCHS
    At this time, analysts must determine extent of bias and methods to address
  Linked Medicare and Medicaid files
    Identify using the variable (CMS_MATCH) on the Feasibility Study Data

Variance Estimation
  In general, follow guidelines and recommendations of the specific survey
  Pooled analyses of NHIS Files
    Span design periods (1994 and 1995-2005)
    Create new (consistent) design variables
      NHIS 1995-1996 treated as if they were statistically independent from the 1997-2005 years
      Different design periods have different variable names, and stratum identifiers have different lengths
      Follow instructions available at www.cdc.gov/nchs/data/nhis/9705var.pdf

Identifying Deaths
  Survey participants interviewed before the availability of linked Medicare files could have died (e.g. NHEFS, NHANES II or NHANES III)
    Survey respondents who died before linked Medicare data can be identified by merging mortality information from the Linked Mortality files
    Needed to create analytic sample
  Persons may die during study period and cease to have Medicare records
    Look for a CMS date of death on each of the Denominator or SMEC files

No Denominator Record
  Why does not having a denominator record matter?
  Lack of denominator record can affect your analytic sample – why?
    Can’t determine managed care enrollment
    In general, managed care enrollees are excluded from sample (more on this to come)

No Denominator Record
  Deceased
    Prior to availability of linked Medicare data
      Check mortality through linked mortality files
    During periods of coverage for linked files
      Check CMS date of death
  Loss of entitlement
    Check termination codes on Denominator or SMEC files (PART_A_TERM_CD and PART_B_TERM_CD)
      Deceased
      Non payment of premium
      Voluntary withdrawal
      Other termination
  CMS record keeping inconsistencies

Denominator record but no claims data
  Entitled to Medicare, but not utilizing Medicare services during period that linked files cover
  Entitled to Medicare, but services being used are not reimbursable claims
  Some Medicare beneficiaries only enrolled in Part A
  CMS record keeping inconsistencies
  Note: Small fraction of linked participants may have claims data but no associated Denominator record

Managed Care Enrollment
  No claims submitted for beneficiaries enrolled in managed care plans (HMO)
    Do not have complete information on payments or services received
    Could miss health events that are being counted based upon submitted claims
  Complex issue.
    See ResDAC Technical brief at www.resdac.umn.edu/Tools/TBs/TN-009.asp
    Other studies www.resdac.umn.edu/AboutUs/Staff.asp

Managed Care Enrollment
  How managed care enrollees affect your research depends upon your question…
  Studies on reimbursements/charges
    Option may be to exclude those with any managed care enrollment because you don’t have complete information on payments or services received
  Studies on health outcomes/events
    Option may be to exclude those with any managed care enrollment because you could miss events
    Option may be to censor observations at time of first HMO enrollment
  Other methods for addressing HMO enrollment possible depending upon research question

Issues with Administrative Data
  Created for non-research purposes (i.e. program administration)
  Undocumented variation (e.g. among reporting entities)
  Administrative data updates
  Lack of documentation on data editing and/or imputation procedures
  Very large files which may include many non-relevant variables

CMS Feasibility Study Data
  Public-use data
  Indicates whether respondent was linked to CMS data (CMS_MATCH)
  Provides limited variables on whether linked respondent has data on any of administrative files
  Determine maximum sample size for each linked file
  Does not contain specific information, e.g.
    no information about amount or types of claims or benefits
  Access data from data linkage website

Data User Tools
  File layouts & detailed notes
  Sample SAS & STATA input statements for public-use linked mortality files
  Matching methodology reports
  Linkage rates for SSA & CMS linked data
  Analytic guidelines
  Feasibility data files for SSA & CMS Files - Download from web
  Comparative analysis of the public-use and restricted-use linked mortality data

Research Examples
  Obesity:
    Determining the impact of obesity on Medicare and Medicaid program expenditures
  Impact of being uninsured:
    Examining whether uninsured status before age 65 affects Medicare services
  Disparities:
    Examining socioeconomic differences in life expectancy
  Risk factors:
    Assessing their impact

Longevity, BMI, and Lifetime Medicare Expenditures
  BMI at age 45    Percent survived to age 65    LE at age 65 (years)    Estimated Lifetime Medicare Expenditures
  Normal           86.3                          19.5                    $91,144
  Overweight       85.9                          17.9                    $102,595
  Obese            78.3                          17.4                    $133,483
  Year 2000 dollars; BMI is body mass index (kg/m2)
  Data source: National Health and Nutrition Examination Survey Linked Medicare File

Insurance Status Prior to Age 65 and Medicare Services
                                      Insurance status before age 65    Percent difference
  Medicare services                                                     Uninsured relative    Public relative
  beginning at age 65                 Private    Public    Uninsured    to Private            to Private
  Expenditures (Year 2000 dollars)    $4,119     $7,053    $4,409       7                     71
  Total physician visits              6.05       7.45      4.60         -24                   23
    Office-based                      5.56       6.02      3.44         -38                   8
    Hospital outpatient               0.22       0.66      0.54         145                   200
    Hospital ER                       0.27       0.53      0.38         41                    96
  Source: Decker S. et al.
    “Health Service Use Among the Previously Uninsured: Is Subsidized Health Insurance Enough?” (under review)
  Data Source: National Health Interview Survey Linked Medicare Files

Gaps in Life Expectancy at Age 25 by Education
  [Figure: stacked bars of the gap in life expectancy (years) for men and women. Men: total = 8.8 years (segments 3.6, 1.8, 3.4). Women: total = 6.3 years (segments 0.9, 1.4, 4.0). Legend: < HS to HS; HS to some college; some college to college.]

Effect of Vitamin D on Risk of Hip Fracture
  [Figure: relative risk of fracture (axis 0 to 1) by quartile (Q1 to Q4) of the vitamin D distribution from NHANES.]
  Non-Hispanic whites 65 years of age and older
  *Adjusted for age, sex, femur neck BMD, previous fractures, dietary calcium, kilocalories and weight loss from maximum
  Source: NHANES III, Medicare Utilization and Expenditure Linked File; Looker et al J Bone Miner Res 2008;23:143.

Additional Information
  Contact us at: datalinkage@cdc.gov
  NCHS data linkage activities
    www.cdc.gov/nchs/data_access/data_linkage_activities.htm
  Mortality linkage
    www.cdc.gov/nchs/data_access/data_linkage/mortality.htm
  SSA linkage
    www.cdc.gov/nchs/data_access/data_linkage/ssa.htm
  CMS linkage
    www.cdc.gov/nchs/data_access/data_linkage/cms.htm
  Contact ResDAC - www.resdac.umn.edu/

Data Access: The Research Data Center
  Identify a need for restricted data
  Submit a proposal
  Choose a mode of access
  Assigned an RDC Analyst
  Review Committee
  Compile public data
  Complete confidentiality training and paperwork
  Fees

Step 1: Identify a need for restricted data
  Linked Data Products
  Geography
  Genetic Variables
  Exact Dates (e.g. birth, death, exam, interview)
  Detailed Race/Ethnicity
  Sensitive Variables (e.g.
    adolescent sexual behavior)

Step 2: Submit a proposal
  Purpose of the Proposal:
    To ensure that researchers have a defined research question.
    To assess disclosure risk based on types of output and requested restricted variables.
    Feasibility
    To determine the restricted variables needed for the project.
    To determine the mode of access and required software.
  Approval does NOT provide:
    Endorsement by NCHS
    Approval of scientific merit
    Indication of whether or not there are multiple projects looking at the same question

Step 3: Choose a mode of access
  NCHS RDC
  Census RDC
  Remote Access System (ANDRE)
  Staff Assisted
  Combination of any of the above
  We provide access to the data
  We never give you the dataset

Step 3: Choose a mode of access
  NCHS RDC
    Hyattsville, MD
    Atlanta, GA
    Many software options (STATA, SAS, SUDAAN, R, M+, HLM, ArcGIS)
    See the dataset
    Personal interaction with RDC staff
    Receive output within 5 days
    $300/day
  Census RDC
    10+ locations
    Many software options (STATA, SAS, SUDAAN, R, HLM)
    See the dataset
    Receive output within 3 weeks
    No daily cost if associated with the member organizations
    Requires Special Sworn Status

Step 3: Choose a mode of access
  Remote Access System (ANDRE)
    Submit programs from personal/work computer
    Receive output within 2-24 hours
    Limited to SAS and SUDAAN, some procedures are restricted
    Cannot see the dataset
    $750/month
  Staff Assisted
    Supplement to ANDRE
    Submit programs not allowed by ANDRE
    Submit programs using a different software
    $750/any fraction of day

Step 4: Assigned an RDC Analyst
  Roles of the Analyst:
    Review proposal for completeness, feasibility, disclosure risk
    Facilitate communication between researcher and Committee
    Merge restricted variables to public data provided by the researcher based on unique identifier
    Coordinate visits to NCHS RDCs
    Review output for disclosure risk

Step 5: Review Committee
  Review Committee
  (reviews for disclosure and availability of the data)
    RDC Analyst
    RDC Director
    Data System Representative
    Linkage Representative (if applicable)
    Confidentiality Officer
  Decisions:
    Approve
    Resubmit
    Reject (poses a disclosure risk, data do not exist)

Step 6: Compile public data
  Why Does the Researcher Compile the Public Data?
    Familiarizes researcher with the data
    Maintains efficiency of RDC
  Helpful Hints:
    Recommend doing this before submitting the proposal
    Proc contents = data dictionary
    Cannot include all variables – only those needed to answer this specific research question
    Try to include everything necessary, so we don’t have to keep updating the data set
      Decreases efficiency and increases cost

Step 7: Complete confidentiality training and paperwork
  Who?
    Primary Investigator, programmer, anyone else coming in contact with the data
  Paperwork
    3 forms, one must be notarized
    Completed for every new project
  Confidentiality Orientation
    10-15 minute online course
    One time ever
    Reviews policies and procedures

Step 8: Fees
  Set up fee = $750/day of set up
  Access fees: per day on site, per month remote access
  Payment must be received in advance

Step 9: Analyze data and submit output for review
  Data never leave the secure access points
  Released output must be
    reviewed first by RDC Analyst/ANDRE
    able to answer the question in your proposal
    final output intended for publication
    Emailed to you
  Data, Programs, and all Output remain on the computer for 1-2 years

RDC Resources
  Peter Meyer, MPH, MA
    Director, Research Data Center
    301-458-4375
    pmeyer1@cdc.gov
  Stephanie Robinson, MPH
    Health Research Analyst
    770-488-5092
    srobinson7@cdc.gov

Gaps in Coverage: Example
  [Figure: timeline with axis marks at 1988, 1991, 1994, 1997 and 2000. Labels: NHANES III baseline data (Phase 1, Phase 2); No Medicare information; Annual Medicare enrollment and claims data.]
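
Worked sketch: building an analytic sample

The analytic-issue slides (ineligible respondents, identifying deaths, managed care enrollment) and the coverage gap shown above can be combined into one sample-construction step. The Python below is only a minimal sketch under stated assumptions, not an NCHS-provided method. The record layout and every field name (eligible_for_linkage, linked_to_cms, first_hmo_date and so on) are hypothetical stand-ins, not actual variable names from the linked files. The coverage window is taken from the slides' "CMS data spans 1991-2007". Censoring at first HMO enrollment is just one of the options the slides list; excluding anyone with any managed care enrollment is another.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

# Linked CMS coverage window, taken from the slides ("CMS data spans 1991-2007").
COVERAGE_START = date(1991, 1, 1)
COVERAGE_END = date(2007, 12, 31)


@dataclass
class Respondent:
    """Hypothetical flattened record; all field names are illustrative only."""
    resp_id: str
    eligible_for_linkage: bool   # False if SSN refused or identifiers insufficient
    linked_to_cms: bool          # stand-in for a CMS_MATCH-style indicator
    interview_date: date
    death_date: Optional[date] = None      # linked mortality or CMS date of death
    first_hmo_date: Optional[date] = None  # first managed care enrollment


def follow_up_window(r: Respondent) -> Optional[Tuple[date, date]]:
    """Return (start, end) of usable claims follow-up, or None if excluded."""
    # Ineligible or unmatched respondents are dropped from the analysis.
    if not (r.eligible_for_linkage and r.linked_to_cms):
        return None
    # Respondents who died before linked Medicare data begin have no records.
    if r.death_date is not None and r.death_date < COVERAGE_START:
        return None
    # Follow-up cannot start before both the interview and the coverage window.
    start = max(COVERAGE_START, r.interview_date)
    # Censor at death or first HMO enrollment, whichever comes first.
    cutoffs = [COVERAGE_END, r.death_date, r.first_hmo_date]
    end = min(d for d in cutoffs if d is not None)
    if end <= start:
        return None  # no usable fee-for-service follow-up time
    return start, end


def build_analytic_sample(respondents: List[Respondent]) -> Dict[str, Tuple[date, date]]:
    """Map respondent id to follow-up window for everyone who is retained."""
    sample = {}
    for r in respondents:
        window = follow_up_window(r)
        if window is not None:
            sample[r.resp_id] = window
    return sample


# Made-up records for illustration; these are not real survey data.
SAMPLE = [
    Respondent("A", True, True, date(1989, 6, 1)),
    Respondent("B", False, False, date(1990, 2, 1)),
    Respondent("C", True, True, date(1988, 11, 1), death_date=date(1990, 3, 15)),
    Respondent("D", True, True, date(1993, 2, 1), first_hmo_date=date(1995, 7, 1)),
]

windows = build_analytic_sample(SAMPLE)
for resp_id, (start, end) in sorted(windows.items()):
    print(resp_id, start.isoformat(), end.isoformat())
```

In this sketch respondent A, interviewed in 1989, contributes follow-up only from 1991 onward, which is the gap the timeline illustrates. A real analysis would also need to handle Medicare entitlement dates, termination codes, and the survey design variables, none of which are modeled here.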