NCHS LINKED DATA FILES: A RESOURCE FOR RESEARCH AND POLICY

AcademyHealth Research Meeting
June 27, 2010
U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Centers for Disease Control and Prevention
National Center for Health Statistics
Overview
- NCHS Data Linkage Activities
- Analytic Issues
- Research Examples
- Data Access
NCHS Record Linkage Program
- Links survey data with data collected from administrative records
- Designed to maximize the scientific value of the NCHS population-based surveys
- Makes it possible to examine factors that influence chronic disease, disability, health care utilization, morbidity, and mortality
NCHS Linkage Activities
- Mortality: National Death Index (NDI)
- Retirement and Disability: Social Security Administration (SSA)
- Medicare and Medicaid enrollment and claims: Centers for Medicare and Medicaid Services (CMS)
- Air Quality: Environmental Protection Agency (EPA)
Research Potential of NCHS Linked Data
Research areas: disparities, health services, disability, methodologic studies, environmental health
- Effects of chronic illness and obesity on disability and mortality
- Functional impairment and health care costs
- Mortality patterns by race/ethnicity or socioeconomic status
- Validation of self-reports vs. administrative records
- Effects of air pollution on health outcomes
Linked Mortality Files: Mortality Follow-up through 2006
Availability by survey (restricted-use, public-use):
- National Health Interview Survey (1986-2004): restricted-use and public-use
- Second Longitudinal Study of Aging (LSOA II): restricted-use and public-use
- NHANES I Epidemiologic Follow-up Study (NHEFS, 1971-1992): restricted-use
- NHANES II (1976-1980): restricted-use
- NHANES III (1988-1994): restricted-use and public-use
- NHANES (1999-2004): restricted-use and public-use
- 1985, 1995, 1997 National Nursing Home Surveys (NNHS): restricted-use
- 2004 National Nursing Home Survey (NNHS): restricted-use and public-use
Restricted-use vs. Public-use Files
Restricted-use files
- Mortality follow-up for all eligible survey participants
  - Includes children for the NHIS, NHANES II, NHANES III, NHANES 1999-2004, and NNHS
- Detailed date of death, birth date, and interview date
- Underlying and multiple cause of death information
- No perturbed information
Restricted-use vs. Public-use Files
Public-use files
- Mortality follow-up for all eligible adult survey participants
- Date of death limited to quarter/year or a calculated follow-up variable
- Underlying cause of death
- Multiple cause of death (MCOD) information for diabetes, hypertension, and hip fracture
- Perturbed information for date or cause of death for selected decedents
  - Vital status not perturbed
- Comparative analyses published on NCHS website
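Because the public-use files give date of death only as quarter and year, analysts who cannot use the calculated follow-up variable have to approximate person-time. A minimal sketch, assuming deaths are placed at the middle of the reported quarter and survivors are censored at December 31, 2006 (the end of follow-up for these files). The function and argument names are illustrative, not variables on the file, and the file's own calculated follow-up variable should be preferred when available.

```python
from datetime import date
from typing import Optional

# Mortality follow-up in these files runs through 2006.
END_OF_FOLLOWUP = date(2006, 12, 31)
DAYS_PER_MONTH = 30.4375  # average Gregorian month length

def quarter_midpoint(year: int, quarter: int) -> date:
    """Approximate an unknown day within a calendar quarter by the 15th of its middle month."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    middle_month = 3 * (quarter - 1) + 2
    return date(year, middle_month, 15)

def followup_months(interview: date,
                    death_year: Optional[int] = None,
                    death_quarter: Optional[int] = None) -> float:
    """Months from interview to approximate death, or to end of follow-up if presumed alive."""
    if death_year is not None and death_quarter is not None:
        end = quarter_midpoint(death_year, death_quarter)
    else:
        end = END_OF_FOLLOWUP
    # A death in the interview quarter can fall before the midpoint; floor at zero.
    return max((end - interview).days, 0) / DAYS_PER_MONTH
```

For example, a respondent interviewed January 15, 2000 who died in the third quarter of 2000 is assigned August 15, 2000, or 213 days (about 7 months) of follow-up.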
NCHS Linked Mortality Files
NCHS Survey         Total Deaths through 2006
NHIS 1986-2004      181,188
LSOA II             5,806
NHEFS               8,002
NHANES II           5,276
NHANES III          5,478
NHANES 1999-2000    611
NHANES 2001-2002    470
NHANES 2003-2004    276
2004 NNHS           6,767
1997 NNHS           12,525
1995 NNHS           6,778
1985 NNHS           10,226
Linked Social Security Files
Social Security data spans 1953-2007
- Old Age, Survivors, and Disability Insurance (OASDI)
  - Master Beneficiary Record (MBR)
  - Payment History Update System (PHUS)
  - 831 Disability Master File (831 DMF)
  - Quarters of Coverage (Master Earnings File)
- Supplemental Security Income (SSI)
  - Supplemental Security Record (SSR)
Linked Social Security Files
NCHS surveys
- National Health Interview Survey (NHIS, 1994-2005)
- Second Longitudinal Study of Aging (LSOA II, 1994-2000)
- NHANES I Epidemiologic Follow-up Study (NHEFS, 1971-1992)
- NHANES III (1988-1994)
- NHANES (1999-2004)
- National Nursing Home Survey (1985, 1995, 1997, 2004)
Data access through the NCHS Research Data Center
Periods of Coverage for SSA Files
Social Security Data Elements
- MBR (1962-2007) and SSR (1974-2007) files
  - Type of claimant
  - Benefit amount
  - Filing and eligibility status
  - Dual entitlement
- PHUS file (1984-2007)
  - Actual monthly amount that a beneficiary received
  - Amount of monthly Medicare premium
Social Security Data Elements
- 831 Disability Master File (1989-2007)
  - Status of disability application, e.g. filing date
  - Diagnosis coding, e.g. primary and secondary impairments
- Quarters of Coverage (1953-2007)
  - Summarized variables indicating an individual’s insured status, i.e. eligibility for receiving retirement or disability benefits
Research Potential of Linked Social Security Data
- Examine reliability of survey information for SSA program participation and benefits
- Compare the health characteristics of early retirees (age 62) to those who postpone benefits
- Policy analysis using validated survey data
  - Predicting the number of people who will become disabled based upon survey-reported health conditions
  - Determining whether current disability entitlement funding levels will be adequate in the future
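Reliability of survey reports of SSA program participation is often summarized from a two-by-two table of self-report against the linked administrative record: sensitivity, specificity, and Cohen's kappa. A minimal sketch that treats the SSA record as the reference; the counts in the usage note are invented for illustration and are not results from the linked files.

```python
from dataclasses import dataclass

@dataclass
class Agreement:
    sensitivity: float  # P(self-report yes | SSA record yes)
    specificity: float  # P(self-report no | SSA record no)
    kappa: float        # chance-corrected agreement

def agreement(both_yes: int, survey_only: int, admin_only: int, both_no: int) -> Agreement:
    """Compare survey self-reports with administrative records (the reference)."""
    n = both_yes + survey_only + admin_only + both_no
    sensitivity = both_yes / (both_yes + admin_only)
    specificity = both_no / (both_no + survey_only)
    observed = (both_yes + both_no) / n
    # Agreement expected if self-report and record were independent.
    p_survey_yes = (both_yes + survey_only) / n
    p_admin_yes = (both_yes + admin_only) / n
    expected = p_survey_yes * p_admin_yes + (1 - p_survey_yes) * (1 - p_admin_yes)
    kappa = (observed - expected) / (1 - expected)
    return Agreement(sensitivity, specificity, kappa)
```

For example, agreement(80, 10, 20, 890) gives a sensitivity of 0.80, a specificity of about 0.99, and a kappa of about 0.83.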

Linked Medicare and Medicaid Files
CMS data spans 1991-2007
THESE UPDATED FILES EXPECTED BY END OF 2010
- Medicare Standard Analytic Files (SAF)
  - Denominator file
  - MEDPAR
  - Hospital outpatient
  - Home Health Agency
  - Hospice
  - Carrier (physician/supplier Part B file)
  - Durable Medical Equipment
- Summary Medicare Enrollment and Claims Files (SMEC)
Linked Medicare and Medicaid Files
THESE UPDATED FILES EXPECTED BY MID-2011
- Medicare Part D
- Chronic Condition Warehouse (CCW) summary files
- End Stage Renal Disease (ESRD)
- Medicaid Analytic eXtract (MAX)
Linked Medicare and Medicaid Files
NCHS surveys
- National Health Interview Survey (NHIS, 1994-2005)
- Second Longitudinal Study of Aging (LSOA II, 1994-2000)
- NHANES I Epidemiologic Follow-up Study (NHEFS, 1971-1992)
- NHANES II (1976-1980)
- NHANES III (1988-1994)
- NHANES (1999-2004)
- 2004 National Nursing Home Survey
Data access through the NCHS Research Data Center
Periods of Coverage for CMS Files
Surveys: 1994-1998 NHIS, NHEFS, NHANES II, NHANES III, LSOA II
Periods of Coverage for CMS Files
Surveys: 1999-2005 NHIS, 1999-2004 NHANES, 2004 NNHS
Medicare Data Elements
- Denominator file (1991-2007)
  - Monthly enrollment status
  - Monthly HMO enrollment
  - CMS date of death
- Standard analytic files (1991-2007)
  - Diagnosis codes
  - Service dates
  - Reimbursement amount
  - Charge amount
Medicare Data Elements
- Medicare Part D (2006-2007)
  - Denominator file
    - Demographics
    - Enrollment
    - Subsidy information, e.g. low income, retiree
  - Drug event file
    - Summary extracts, no individual claims
    - Drug costs and payments submitted by Part D plan providers
    - Quantity dispensed and days supply
    - Patient paid amount
Medicare Data Elements
- CCW summary file (2005-2007)
  - Clinical information for 21 chronic health conditions: acute MI, Alzheimer’s, Alzheimer’s-related/senile dementia, atrial fibrillation, cataract, chronic kidney disease, COPD, depression, diabetes, glaucoma, heart failure, hip/pelvic fracture, ischemic heart disease, osteoporosis, RA/OA, stroke, breast cancer, colorectal cancer, prostate cancer, lung cancer, endometrial cancer
  - Chronic condition flags
    - Yearly (Dec. 31 end of the reference year)
    - Mid-year (July 1 end of the reference year)
    - Ever (first occurrence of condition)
  - Yearly and mid-year flags indicate whether the claims and/or coverage criteria were met
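The claims/coverage distinction in the yearly and mid-year flags is usually stored as a single value per condition. A minimal sketch, assuming the coding 0 = neither criterion met, 1 = claims criterion only, 2 = coverage criterion only, 3 = both met; this coding is an assumption to confirm against the CCW file layout. Requiring both criteria is one conservative way to treat a condition as present.

```python
from typing import NamedTuple

class CcwFlag(NamedTuple):
    claims_met: bool
    coverage_met: bool

# Assumed coding of a yearly/mid-year condition flag; confirm against the file layout.
CCW_CODES = {
    0: CcwFlag(False, False),
    1: CcwFlag(True, False),
    2: CcwFlag(False, True),
    3: CcwFlag(True, True),
}

def decode_ccw_flag(value: int) -> CcwFlag:
    if value not in CCW_CODES:
        raise ValueError(f"unexpected CCW flag value: {value}")
    return CCW_CODES[value]

def has_condition(value: int, require_coverage: bool = True) -> bool:
    """Condition present if the claims criterion is met (and, optionally, coverage)."""
    flag = decode_ccw_flag(value)
    return flag.claims_met and (flag.coverage_met or not require_coverage)
```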
Medicare Data Elements
- ESRD (1975-2008)
  - Combined ESRD Patient Master and Death Notification File
    - Primary and contributing causes of ESRD diagnosis
    - Date of first dialysis and information on kidney transplants
  - ESRD Entitlement/Registration file
    - Information from Medicare Entitlement and/or Registration (Form 2728)
    - Health insurance status
    - Primary and contributing causes of ESRD diagnosis
    - ESRD-related test results, dialysis information, and kidney transplant status
  - ESRD Patient Event file
    - ESRD-related events, e.g. event date, type of event, and patient modality
Summary Medicare Data File
- Summary Medicare Enrollment and Claims file (SMEC) for 1991-2007
  - Enrollment information from the Denominator file plus summary variables of claims and payments
  - Variables modeled after the Medicare Current Beneficiary Survey (MCBS) Cost and Use files
    - Total reimbursements per year
    - Total number of claims by Medicare record type
    - Summary of charges by Medicare record type
    - Termination status & reason for termination
    - Monthly HMO enrollment
    - Medicare status code (i.e. Part A, B or both)
Medicaid Data Elements
- Medicaid (MAX, 1999-2004): Medicaid eligibility, service utilization and payment
  - Person Summary: eligibility, demographics, HMO enrollment, summary of utilization and Medicaid payment by type of service
  - Inpatient Hospitalization: complete hospital stay records, e.g. diagnoses, procedures, discharge status, length of stay and payment
  - Long Term Care: claims records that include facility type, dates of service and discharge status
  - Other Therapy: claims for all non-institutional Medicaid services, e.g. physician services, lab/x-ray and premium payments
  - Prescription Drug: final action claims for paid prescription drugs; one procedure code but no diagnosis codes
Research Potential of Linked Medicare and Medicaid Data
- Examine risk factors for health conditions
- Examine uninsured estimates
- Examine disparities in Medicare service utilization
- Examine effect of obesity on Medicare and Medicaid costs
Linked EPA Air Quality Data Files
- Annual pollution averages collected from air monitoring locations
- Linked by geographic codes (e.g. county or zip code)
- NCHS surveys
  - National Health Interview Survey (1986-2005)
  - NHANES III (1988-1994)
  - National Hospital Discharge Survey (1999-2005)
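Linking by geographic code amounts to a keyed lookup from each survey record's geographic code and year to the annual pollution average. A minimal sketch, assuming county-level linkage and hypothetical field names (county_fips, year, pm25); respondents in counties without monitoring data are left unlinked rather than imputed.

```python
from typing import Dict, List, Optional, Tuple

def link_air_quality(
    respondents: List[dict],
    annual_averages: Dict[Tuple[str, int], float],
) -> List[dict]:
    """Attach the annual pollution average for each respondent's county and survey year."""
    linked = []
    for person in respondents:
        key = (person["county_fips"], person["year"])
        value: Optional[float] = annual_averages.get(key)
        # None marks respondents with no monitoring data for that county-year.
        linked.append({**person, "pm25": value})
    return linked
```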
Analytic Issues (Medicare selected)
- Survey respondents ineligible for linkage or unable to be matched
- Variance estimation
- Death
- Linked but no Medicare data
- Managed care enrollment
- Issues with administrative data
- Study feasibility
Ineligible Respondents
- Cannot attempt to link survey records to other data sources if respondent
  - Refused to provide SSN; or
  - Lacks sufficient key identifying information
- These cases are INELIGIBLE for linkage and MUST BE DROPPED from all analyses
  - They are not a random sample of respondents
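Dropping ineligible cases is a filter applied before any linked-data analysis. A minimal sketch, assuming a hypothetical per-record field link_status with values "ineligible", "eligible_unlinked", and "linked"; the real linkage variable (for the CMS files, CMS_MATCH) and its codes should be taken from the file documentation. Eligible but unmatched respondents are retained in this sketch.

```python
from typing import Iterable, List

# Hypothetical status codes; map the real linkage variable onto these.
INELIGIBLE = "ineligible"
ELIGIBLE_UNLINKED = "eligible_unlinked"
LINKED = "linked"

def analytic_sample(records: Iterable[dict]) -> List[dict]:
    """Drop respondents ineligible for linkage (refused SSN or lacked identifying information)."""
    return [r for r in records if r["link_status"] != INELIGIBLE]
```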
Ineligibles: Linked CMS Files
NHIS year         % ineligible  % linked (among eligible)
1994              19            92
1995              21            91
1996              25            90
1997              31            89
1998              37            88
1999              40            86
2000              42            85
2001              45            86
2002              26            77
2003              29            75
2004              37            77
2005              40            76

Survey            % ineligible  % linked (among eligible)
LSOA II           19            98
NHEFS             6             96
NHANES III        3             96
NHANES 1999-2000  17            95
NHANES 2001-2002  10            94
NHANES 2003-2004  10            95
2004 NNHS         1             98
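Because the linkage rate is reported among eligible respondents only, the share of all respondents with a CMS link is the eligible share times the linkage rate. A short worked calculation using the NHIS rows of the table above:

```python
# NHIS: year -> (% ineligible, % linked among eligible), from the table above.
NHIS_RATES = {
    1994: (19, 92), 1995: (21, 91), 1996: (25, 90), 1997: (31, 89),
    1998: (37, 88), 1999: (40, 86), 2000: (42, 85), 2001: (45, 86),
    2002: (26, 77), 2003: (29, 75), 2004: (37, 77), 2005: (40, 76),
}

def overall_linked_pct(pct_ineligible: float, pct_linked_among_eligible: float) -> float:
    """Percent of all respondents linked = (% eligible) x (% linked among eligible) / 100."""
    return (100 - pct_ineligible) * pct_linked_among_eligible / 100

for year, (inel, linked) in sorted(NHIS_RATES.items()):
    print(year, round(overall_linked_pct(inel, linked), 1))
```

For 1994 this gives 81% eligible times 92% linked, about 74.5% of all respondents; for 2001 it gives 55% times 86%, about 47.3%. Most of that drop comes from ineligibility rather than from the match rate.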
Ineligibles: Linked CMS Files
- Increasing ineligibility among NHIS respondents
  - But match rate among those eligible remains high
- No eligibility-adjusted sample weights available from NCHS
  - At this time, analysts must determine the extent of bias and methods to address it
- Linked Medicare and Medicaid files
  - Identify ineligible respondents using the variable CMS_MATCH on the Feasibility Study Data
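Since NCHS provides no eligibility-adjusted weights, one option borrowed from nonresponse methods (a weighting-class adjustment, not an NCHS procedure) is to redistribute the weight of ineligible respondents to eligible respondents in the same class, such as age group by sex. A minimal sketch with hypothetical field names (weight, adj_class, eligible); it only reduces bias if eligibility is unrelated to the outcome within classes, which should be checked.

```python
from collections import defaultdict
from typing import Dict, List

def eligibility_adjusted_weights(records: List[dict]) -> List[dict]:
    """Weighting-class adjustment: within each class, scale eligible respondents' weights
    so they sum to the class total of all respondents' weights; ineligibles are dropped.

    A class with no eligible respondents would lose its weight entirely, so such
    classes should be collapsed with neighbors before calling this.
    """
    total: Dict[str, float] = defaultdict(float)
    eligible: Dict[str, float] = defaultdict(float)
    for r in records:
        total[r["adj_class"]] += r["weight"]
        if r["eligible"]:
            eligible[r["adj_class"]] += r["weight"]
    adjusted = []
    for r in records:
        if not r["eligible"]:
            continue
        factor = total[r["adj_class"]] / eligible[r["adj_class"]]
        adjusted.append({**r, "adj_weight": r["weight"] * factor})
    return adjusted
```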
Variance Estimation
- In general, follow the guidelines and recommendations of the specific survey
- Pooled analyses of NHIS files
  - Span design periods (1994 and 1995-2005)
  - Create new (consistent) design variables
    - NHIS 1995-1996 treated as if they were statistically independent from the 1997-2005 years
    - Different design periods have different variable names, and stratum identifiers have different lengths
  - Follow instructions available at www.cdc.gov/nchs/data/nhis/9705var.pdf
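Creating consistent design variables means mapping each year's variable names to common names and making strata from different design groups distinct. A minimal sketch, assuming hypothetical per-year variable names (STRATUM/PSU through 1996, STRAT_P/PSU_P from 1997) that must be checked against each year's documentation and the instructions at the URL above; the three groups (1994, 1995-1996, 1997-2005) follow the periods listed on this slide.

```python
from typing import Dict, Tuple

def design_group(year: int) -> Tuple[str, str, str]:
    """Hypothetical mapping: survey year -> (design group, stratum variable, PSU variable)."""
    if year <= 1994:
        return "A", "STRATUM", "PSU"
    if year <= 1996:
        return "B", "STRATUM", "PSU"
    return "C", "STRAT_P", "PSU_P"

def pooled_design_vars(record: Dict[str, int], year: int, width: int = 4) -> Dict[str, str]:
    """Build stratum/PSU identifiers that are unique across design groups.

    Zero-padding puts stratum codes of different lengths on a common width, and the
    group prefix keeps stratum 12 in 1995 distinct from stratum 12 in 1998, so the
    groups are treated as independent.
    """
    group, strat_var, psu_var = design_group(year)
    stratum = f"{group}{int(record[strat_var]):0{width}d}"
    # PSUs are nested within strata, so the stratum prefix makes them unique.
    psu = f"{stratum}-{int(record[psu_var])}"
    return {"pooled_stratum": stratum, "pooled_psu": psu}
```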
Identifying Deaths
- Survey participants interviewed before linked Medicare files became available could have died before those files begin (e.g. NHEFS, NHANES II or NHANES III)
- Survey respondents who died before linked Medicare data are available can be identified by merging mortality information from the Linked Mortality Files
  - Needed to create analytic sample
- Persons may die during study period and cease to have Medicare records
  - Look for a CMS date of death on each of the Denominator or SMEC files
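These checks combine into one classification per respondent. A minimal sketch, assuming the CMS coverage window starts January 1, 1991 (the start of the CMS data described earlier) and hypothetical inputs for the Linked Mortality File death date and the CMS date of death; disagreements between the two sources are not resolved here.

```python
from datetime import date
from typing import Optional

CMS_COVERAGE_START = date(1991, 1, 1)  # CMS data span 1991-2007

def death_status(lmf_death_date: Optional[date], cms_death_date: Optional[date]) -> str:
    """Classify a respondent for building a linked-Medicare analytic sample."""
    if lmf_death_date is not None and lmf_death_date < CMS_COVERAGE_START:
        # Died before any linked Medicare data exist: no Medicare records expected.
        return "died_before_coverage"
    if cms_death_date is not None or lmf_death_date is not None:
        # Died during the coverage period: Medicare records stop at death.
        return "died_during_coverage"
    return "no_death_recorded"
```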
No Denominator Record
Why does not having a denominator record matter?
- Lack of a denominator record can affect your analytic sample. Why?
  - Can’t determine managed care enrollment
  - In general, managed care enrollees are excluded from sample (more on this to come)
No Denominator Record
- Deceased
  - Prior to availability of linked Medicare data: check mortality through linked mortality files
  - During periods of coverage for linked files: check CMS date of death
- Loss of entitlement
  - Check termination codes on Denominator or SMEC files (PART_A_TERM_CD and PART_B_TERM_CD)
    - Deceased
    - Non-payment of premium
    - Voluntary withdrawal
    - Other termination
- CMS record keeping inconsistencies
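The termination-code check can be written as a lookup over PART_A_TERM_CD and PART_B_TERM_CD. A minimal sketch, assuming the coding 0 = not terminated, 1 = dead, 2 = non-payment of premium, 3 = voluntary withdrawal, 9 = other termination; these values are an assumption and should be confirmed against the Denominator or SMEC file layout.

```python
from typing import Optional

# Assumed code values; confirm against the file layout.
TERMINATION_REASONS = {
    0: "not terminated",
    1: "deceased",
    2: "non-payment of premium",
    3: "voluntary withdrawal",
    9: "other termination",
}

def termination_reason(part_a_term_cd: Optional[int], part_b_term_cd: Optional[int]) -> str:
    """Summarize why Medicare entitlement ended, from the Part A and Part B codes."""
    reasons = []
    for label, code in (("Part A", part_a_term_cd), ("Part B", part_b_term_cd)):
        if code is None or code == 0:
            continue
        reasons.append(f"{label}: {TERMINATION_REASONS.get(code, 'unknown code')}")
    return "; ".join(reasons) if reasons else "no termination recorded"
```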
Denominator record but no claims data
- Entitled to Medicare, but not utilizing Medicare services during the period that linked files cover
- Entitled to Medicare, but services being used are not reimbursable claims
Note:
- Small fraction of linked participants may have claims data but no associated Denominator record
  - Some Medicare beneficiaries only enrolled in Part A
  - CMS record keeping inconsistencies
Managed Care Enrollment
- No claims submitted for beneficiaries enrolled in managed care plans (HMO)
  - Do not have complete information on payments or services received
  - Could miss health events that are identified from submitted claims
- Complex issue. See ResDAC
  - Technical brief at www.resdac.umn.edu/Tools/TBs/TN-009.asp
  - Other studies: www.resdac.umn.edu/AboutUs/Staff.asp
Managed Care Enrollment
How managed care enrollees affect your research depends upon your question…
- Studies on reimbursements/charges
  - Option may be to exclude those with any managed care enrollment because you don’t have complete information on payments or services received
- Studies on health outcomes/events
  - Option may be to exclude those with any managed care enrollment because you could miss events
  - Option may be to censor observations at time of first HMO enrollment
- Other methods for addressing HMO enrollment are possible, depending upon research question
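Both options can be computed from the monthly HMO enrollment indicators on the Denominator or SMEC files. A minimal sketch, assuming a hypothetical sequence of 0/1 monthly HMO flags per person (oldest month first) and an event month indexed from the start of the study period.

```python
from typing import Optional, Sequence, Tuple

def any_hmo(monthly_hmo: Sequence[int]) -> bool:
    """Exclusion option: True if the person had any month of managed care enrollment."""
    return any(flag == 1 for flag in monthly_hmo)

def first_hmo_month(monthly_hmo: Sequence[int]) -> Optional[int]:
    """0-based index of the first HMO month, or None if never enrolled."""
    for month, flag in enumerate(monthly_hmo):
        if flag == 1:
            return month
    return None

def censored_followup(monthly_hmo: Sequence[int],
                      event_month: Optional[int]) -> Tuple[int, bool]:
    """Censoring option: follow-up ends at first HMO enrollment.

    Returns (months of fee-for-service follow-up, event observed?). An event in or
    after the first HMO month is not counted, since claims for that period may be missing.
    """
    first_hmo = first_hmo_month(monthly_hmo)
    end = len(monthly_hmo) if first_hmo is None else first_hmo
    if event_month is not None and event_month < end:
        return event_month + 1, True
    return end, False
```

For a person who joins an HMO in month 7 of a 12-month window, an event in month 9 is censored at 6 months, while an event in month 3 is counted after 3 months of follow-up.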
Issues with Administrative Data
- Created for non-research purposes (i.e. program administration)
  - Undocumented variation (e.g. among reporting entities)
  - Administrative data updates
  - Lack of documentation on data editing and/or imputation procedures
  - Very large files which may include many non-relevant variables
CMS Feasibility Study Data
- Public-use data
  - Indicates whether respondent was linked to CMS data (CMS_MATCH)
  - Provides limited variables on whether linked respondent has data on any of the administrative files
  - Used to determine the maximum sample size for each linked file
  - Does not contain specific information, e.g. no information about amount or types of claims or benefits
- Access data from data linkage website
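The maximum sample size for each linked file is a count of linked respondents with any data on that file, optionally within the planned analytic subgroup. A minimal sketch, assuming hypothetical 0/1 indicator names for the per-file availability variables and a linked field derived from CMS_MATCH; the real names are in the feasibility file documentation.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

# Hypothetical 0/1 availability indicators; use the names documented for the feasibility file.
FILE_FLAGS = ("has_denominator", "has_medpar", "has_carrier", "has_max")

def max_sample_sizes(records: Iterable[dict],
                     subgroup: Optional[Callable[[dict], bool]] = None) -> Counter:
    """Count linked respondents with any data on each administrative file."""
    counts: Counter = Counter()
    for r in records:
        if not r.get("linked"):
            continue  # unlinked or ineligible respondents cannot contribute
        if subgroup is not None and not subgroup(r):
            continue
        for flag in FILE_FLAGS:
            if r.get(flag) == 1:
                counts[flag] += 1
    return counts
```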
Data User Tools
- File layouts & detailed notes
- Sample SAS & STATA input statements for public-use linked mortality files
- Matching methodology reports
- Linkage rates for SSA & CMS linked data
- Analytic guidelines
- Feasibility data files for SSA & CMS files (download from web)
- Comparative analysis of the public-use and restricted-use linked mortality data
Research Examples
- Obesity: determining the impact of obesity on Medicare and Medicaid program expenditures
- Impact of being uninsured: examining whether uninsured status before age 65 affects Medicare services
- Disparities: examining socioeconomic differences in life expectancy
- Risk factors: assessing their impact
Longevity, BMI, and Lifetime Medicare Expenditures
BMI at age 45   Percent survived to age 65   LE at age 65 (years)   Estimated lifetime Medicare expenditures
Normal          86.3                         19.5                   $91,144
Overweight      85.9                         17.9                   $102,595
Obese           78.3                         17.4                   $133,483
LE is life expectancy. Expenditures in year 2000 dollars. BMI is body mass index (kg/m2).
Data source: National Health and Nutrition Examination Survey Linked Medicare File
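The BMI groups in this table correspond to the standard adult cut points in kg/m²: under 18.5 underweight, 18.5 to under 25 normal, 25 to under 30 overweight, 30 and over obese. The slide does not state the cut points used, so matching them to these standard categories is an assumption. A minimal sketch:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2

def bmi_category(value: float) -> str:
    """Standard adult BMI categories (assumed to match the table's groups)."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal"
    if value < 30:
        return "Overweight"
    return "Obese"
```

For example, 70 kg at 1.75 m gives a BMI of about 22.9, in the normal group.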
Insurance Status Prior to Age 65 and Medicare Services
Medicare services beginning at age 65, by insurance status before age 65
Medicare services from age 65 Private   Public    Uninsured   Uninsured vs. private (%)   Public vs. private (%)
Expenditures (year 2000 $)    $4,119    $7,053    $4,409      7                           71
Total physician visits        6.05      7.45      4.60        -24                         23
  Office-based                5.56      6.02      3.44        -38                         8
  Hospital outpatient         0.22      0.66      0.54        145                         200
  Hospital ER                 0.27      0.53      0.38        41                          96
Source: Decker S. et al. “Health Service Use Among the Previously Uninsured: Is Subsidized Health Insurance Enough?” (under review)
Data Source: National Health Interview Survey Linked Medicare Files
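The percent-difference columns are (group value minus private value) divided by the private value, times 100. Recomputing them from the table's own values reproduces the published figures, which is a quick check when re-keying such a table:

```python
# (private, public, uninsured) values from the table above.
ROWS = {
    "Expenditures": (4119, 7053, 4409),
    "Total physician visits": (6.05, 7.45, 4.60),
    "Office-based": (5.56, 6.02, 3.44),
    "Hospital outpatient": (0.22, 0.66, 0.54),
    "Hospital ER": (0.27, 0.53, 0.38),
}

def pct_diff(value: float, reference: float) -> float:
    """Percent difference of value relative to reference."""
    return (value - reference) / reference * 100

for name, (private, public, uninsured) in ROWS.items():
    print(f"{name}: uninsured {round(pct_diff(uninsured, private))}, "
          f"public {round(pct_diff(public, private))}")
```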
Gaps in Life Expectancy at Age 25 by Education
[Figure: stacked bars showing the gap in life expectancy at age 25 between education levels (< HS to HS, HS to some college, some college to college) for men and women. Total gap: men 8.8 years, women 6.3 years.]
Effect of Vitamin D on Risk of Hip Fracture
[Figure: relative risk of hip fracture* by vitamin D distribution from NHANES (in quartiles, Q1-Q4); non-Hispanic whites 65 years of age and older.]
*Adjusted for age, sex, femur neck BMD, previous fractures, dietary calcium, kilocalories and weight loss from maximum
Source: NHANES III, Medicare Utilization and Expenditure Linked File; Looker et al. J Bone Miner Res 2008;23:143.
Additional Information
- Contact us at: datalinkage@cdc.gov
- NCHS data linkage activities
  - www.cdc.gov/nchs/data_access/data_linkage_activities.htm
- Mortality linkage
  - www.cdc.gov/nchs/data_access/data_linkage/mortality.htm
- SSA linkage
  - www.cdc.gov/nchs/data_access/data_linkage/ssa.htm
- CMS linkage
  - www.cdc.gov/nchs/data_access/data_linkage/cms.htm
  - Contact ResDAC - www.resdac.umn.edu/
Data Access: The Research Data Center
- Identify a need for restricted data
- Submit a proposal
- Choose a mode of access
- Assigned an RDC Analyst
- Review Committee
- Compile public data
- Complete confidentiality training and paperwork
- Fees
Step 1: Identify a need for restricted data
- Linked Data Products
- Geography
- Genetic Variables
- Exact Dates (e.g. birth, death, exam, interview)
- Detailed Race/Ethnicity
- Sensitive Variables (e.g. adolescent sexual behavior)
Step 2: Submit a proposal
Purpose of the Proposal:
- To ensure that researchers have a defined research question.
- To assess disclosure risk based on types of output and requested restricted variables.
- Feasibility
  - To determine the restricted variables needed for the project.
  - To determine the mode of access and required software.
Approval does NOT provide:
- Endorsement by NCHS
- Approval of scientific merit
- Indication of whether or not there are multiple projects looking at the same question
Step 3: Choose a mode of access
- NCHS RDC
- Census RDC
- Remote Access System (ANDRE)
- Staff Assisted
- Combination of any of the above
We provide access to the data. We never give you the dataset.
Step 3: Choose a mode of access
NCHS RDC
- Hyattsville, MD and Atlanta, GA
- Many software options (STATA, SAS, SUDAAN, R, M+, HLM, ArcGIS)
- See the dataset
- Personal interaction with RDC staff
- Receive output within 5 days
- $300/day
Census RDC
- 10+ locations
- Many software options (STATA, SAS, SUDAAN, R, HLM)
- See the dataset
- Receive output within 3 weeks
- No daily cost if associated with the member organizations
- Requires Special Sworn Status
Step 3: Choose a mode of access
Remote Access System (ANDRE)
- Submit programs from personal/work computer
- Receive output within 2-24 hours
- Limited to SAS and SUDAAN; some procedures are restricted
- Cannot see the dataset
- $750/month
Staff Assisted
- Supplement to ANDRE
- Submit programs not allowed by ANDRE
- Submit programs using a different software
- $750/any fraction of day
Step 4: Assigned an RDC Analyst
Roles of the Analyst:
- Review proposal for completeness, feasibility, disclosure risk
- Facilitate communication between researcher and Committee
- Merge restricted variables to public data provided by the researcher based on unique identifier
- Coordinate visits to NCHS RDCs
- Review output for disclosure risk
Step 5: Review Committee
Review Committee (reviews for disclosure and availability of the data)
- RDC Analyst
- RDC Director
- Data System Representative
- Linkage Representative (if applicable)
- Confidentiality Officer
Decisions:
- Approve
- Resubmit
- Reject (poses a disclosure risk, data do not exist)
Step 6: Compile public data
Why Does the Researcher Compile the Public Data?
- Familiarizes researcher with the data
- Maintains efficiency of RDC
Helpful Hints:
- Recommend doing this before submitting the proposal
- PROC CONTENTS = data dictionary
- Cannot include all variables; only those needed to answer this specific research question
- Try to include everything necessary, so we don’t have to keep updating the data set
  - Repeated updates decrease efficiency and increase cost
Step 7: Complete confidentiality training and paperwork
Who?
- Primary Investigator, programmer, anyone else coming in contact with the data
Paperwork
- 3 forms, one must be notarized
- Completed for every new project
Confidentiality Orientation
- 10-15 minute online course
- One time ever
- Reviews policies and procedures
Step 8: Fees
- Set up fee = $750/day of set up
- Access fees: per day on site, per month remote access
- Payment must be received in advance
Step 9: Analyze data and submit output for review
- Data never leave the secure access points
- Released output must be
  - reviewed first by RDC Analyst/ANDRE
  - able to answer the question in your proposal
  - final output intended for publication
- Released output is emailed to you
- Data, programs, and all output remain on the computer for 1-2 years
RDC Resources
Peter Meyer, MPH, MA
Director, Research Data Center
301-458-4375
pmeyer1@cdc.gov
Stephanie Robinson, MPH
Health Research Analyst
770-488-5092
srobinson7@cdc.gov
Gaps in Coverage: Example
[Figure: timeline from 1988 to 2000 showing NHANES III baseline data (Phase 1 and Phase 2) alongside annual Medicare enrollment and claims data, with a period of no Medicare information.]