Using the Gateway to Global Aging Data for Cross-Country Analysis
A sample analysis for the APRU Data Workshop
This document provides users with examples of how to set up and perform some simple descriptive
analyses with the RAND Harmonized data products, including the RAND HRS and the H SHARE, using
Stata. This sample analysis assumes that the user has already created an H SHARE file for analysis.
Sample Analysis
Research question: Does retirement affect self-report of health?
Analysis parameters: Our analysis will look at changes in self-report of health for respondents who were
working in 2004 but were retired in 2006. Comparing self-reported health over time for these persons
will help us determine whether retirement affects self-reported health. We can also analyze whether
the change in self-report of health before and after retirement is the same for all countries or differs by
country.
Part 1 – Creating cross-national dataset
Our cross-national dataset will contain respondents from the RAND HRS and the H SHARE. Using the
Gateway to Global Aging Data, we will create a cross-national dataset with the measures required to
identify our subpopulations and measure changes in self-report of health at retirement.
1. Record the location of the RAND HRS Stata dataset on your computer in your user profile on the
Gateway to Global Aging Data. Select the Data Options link from the top right user menu. Using
the “+ add data locations” button, add the folder that contains the RAND HRS dataset file. Make
sure the location ends in either a forward slash (/) or a backslash (\), depending on your
operating system.
2. Add RAND HRS variables of interest to your item cart. For our analysis of whether self-reported
health changed with retirement between 2004 and 2006, we will be using the following HRS
Wave 7 and Wave 8 variables.
• r7lbrf – W7 R Labor Force Status
• r8lbrf – W8 R Labor Force Status
• r7shlt – W7 Self-report of health
• r8shlt – W8 Self-report of health
• r8wtresp – W8 Person-Level Analysis Weight
The RAND HRS does not include longitudinal weights, so for this analysis we will use the provided
Wave 8 respondent-level analysis weight (r8wtresp). For a more accurate cross-country analysis, we
would need to create and use a longitudinal weight.
3. Add H SHARE variables of interest to your item cart. For this analysis of whether self-reported
health changed with retirement between 2004 and 2006, you will need the following SHARE
Wave 1 and Wave 2 variables.
• r1lbrf_s – W1 R Labor Force Status
• r2lbrf_s – W2 R Labor Force Status
• r1shlt – W1 Self-report of health
• r2shlt – W2 Self-report of health
• r2lwtresp – W2 Respondent-Level Longitudinal Weight, combined sample
4. Generate a cross-national dataset using the Gateway to Global Aging Data. To do this, once you
have added all variables of interest to the item cart, click on Item Cart in the right user menu
and select all variables added from the RAND HRS and H SHARE. Then click the “create .do file”
link below the item cart. If your browser prompts you to save or open the .do file, choose to
open it with Stata; if your browser downloads the .do file automatically, open the downloaded
file in Stata. Stata should then run the .do file without any editing. Once it finishes, Stata
will have loaded all variables selected in your item cart, along with survey-specific identifiers
and country/survey identifiers.
5. If you prefer to analyze this cross-country dataset in a statistical package other than Stata, you
can save the dataset with the Stata save command and convert it with a program such as
Stat/Transfer. If you do not have access to Stat/Transfer, you can usually read the .dta dataset
into your stat package using an “import” or “get” function. When reading a .dta dataset into
another package, it is best to first save the dataset in an older Stata format (such as Stata
version 9) using the command saveold; note that which older format saveold writes depends on the
Stata release you are running.
Part 2 – Identifying population of interest
Our subsample of interest is HRS and SHARE respondents who reported they were working in the 2004
interview and reported they were retired in the 2006 interview. We have already looked at how SHARE
surveys respondents’ employment status, but let’s also check the coding for the RAND HRS employment
measures. Because we are interested in respondent-level values for HRS Wave 7 (2004) and Wave 8
(2006), we will use the RAND HRS variables r7lbrf and r8lbrf to identify our subsample. The coding
of these variables can be found in the RAND HRS Codebook or by running a tab command in Stata.
tab r7lbrf, m
tab r8lbrf, m
  r7lbrf:w7 r |
  labor force |
       status |      Freq.     Percent        Cum.
--------------+-----------------------------------
   1.works ft |      5,182        4.22        4.22
   2.works pt |      1,119        0.91        5.13
 3.unemployed |        313        0.26        5.39
 4.partly ret |      1,540        1.25        6.64
    5.retired |      9,451        7.70       14.34
   6.disabled |        563        0.46       14.80
7.not in lbrf |      1,961        1.60       16.40
            . |    102,614       83.60      100.00
--------------+-----------------------------------
        Total |    122,743      100.00
  r8lbrf:w8 r |
  labor force |
       status |      Freq.     Percent        Cum.
--------------+-----------------------------------
   1.works ft |      4,222        3.44        3.44
   2.works pt |        933        0.76        4.20
 3.unemployed |        215        0.18        4.37
 4.partly ret |      1,487        1.21        5.59
    5.retired |      9,480        7.72       13.31
   6.disabled |        519        0.42       13.73
7.not in lbrf |      1,613        1.31       15.05
            . |    104,274       84.95      100.00
--------------+-----------------------------------
        Total |    122,743      100.00
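We can see that HRS respondents are asked about employment using different categories than SHARE
respondents. We therefore need to identify our subsample using one criterion for HRS respondents
and another for SHARE respondents: HRS respondents who worked full or part time (codes 1 and 2) in
Wave 7 and were partly retired or retired (codes 4 and 5) in Wave 8, and SHARE respondents coded 1
on r1lbrf_s and 5 on r2lbrf_s. Let’s look at our subsample by country, using the variable
isocountry, which the Gateway automatically includes to identify every country in the dataset.
(Run these lines from a do-file: the /// line continuation is not recognized when commands are
typed interactively in the Command window.)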
gen subsamp=0
replace subsamp=1 if (inlist(r7lbrf,1,2) & inlist(r8lbrf,4,5)) | ///
(r1lbrf_s==1 & r2lbrf_s==5)
label variable subsamp "Subsample flag: newly retired"
tab isocountry subsamp
        UN numerical |  Subsample flag: newly
        country code |         retired
                     |         0          1 |     Total
---------------------+----------------------+----------
         040.Austria |     6,164         50 |     6,214
         056.Belgium |     7,097         82 |     7,179
  203.Czech Republic |     7,643          0 |     7,643
         208.Denmark |     3,491         71 |     3,562
         233.Estonia |     6,828          0 |     6,828
          250.France |     7,787         71 |     7,858
         276.Germany |     4,008         85 |     4,093
          300.Greece |     3,815         44 |     3,859
         348.Hungary |     3,076          0 |     3,076
         372.Ireland |     1,134          0 |     1,134
          376.Israel |     3,096        137 |     3,233
           380.Italy |     5,205         76 |     5,281
     528.Netherlands |     4,673         68 |     4,741
          616.Poland |     2,684          0 |     2,684
        620.Portugal |     2,080          0 |     2,080
        705.Slovenia |     2,756          0 |     2,756
           724.Spain |     5,149         39 |     5,188
          752.Sweden |     3,785        117 |     3,902
     756.Switzerland |     4,416         30 |     4,446
840.United States of |    36,012        974 |    36,986
---------------------+----------------------+----------
               Total |   120,899      1,844 |   122,743
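As the above table indicates, the HRS contributes just over half of our subsample (974 of the 1,844
respondents who fit our sample criteria), while seven SHARE countries (Czech Republic, Estonia,
Hungary, Ireland, Poland, Portugal, and Slovenia) contribute no respondents who meet the criteria.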
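The flag logic in the Stata replace command above can be mirrored in a few lines of Python, which may help when porting the analysis to another package. This is a minimal sketch under stated assumptions: the function name newly_retired and the record layout (a survey label plus the two labor force codes) are hypothetical, None stands for a Stata missing value, and the SHARE codes 1 and 5 are copied from the Stata command rather than taken from the H SHARE codebook.

```python
from typing import Optional

# HRS codes from the RAND HRS tab output above:
# 1 = works full time, 2 = works part time, 4 = partly retired, 5 = retired.
HRS_WORKING = {1, 2}
HRS_RETIRED = {4, 5}
# SHARE codes copied from the Stata command (r1lbrf_s==1 & r2lbrf_s==5).
SHARE_BEFORE = 1
SHARE_AFTER = 5


def newly_retired(survey: str,
                  lbrf_before: Optional[int],
                  lbrf_after: Optional[int]) -> int:
    """Return 1 if the respondent meets the subsample criteria, else 0.

    Hypothetical helper: a missing code (None) never matches any listed
    code, just as a Stata missing value never equals 1, 2, 4, or 5.
    """
    if lbrf_before is None or lbrf_after is None:
        return 0
    if survey == "HRS":
        return int(lbrf_before in HRS_WORKING and lbrf_after in HRS_RETIRED)
    if survey == "SHARE":
        return int(lbrf_before == SHARE_BEFORE and lbrf_after == SHARE_AFTER)
    return 0


print(newly_retired("HRS", 1, 5))       # → 1 (full time, then retired)
print(newly_retired("HRS", 5, 5))       # → 0 (already retired in 2004)
print(newly_retired("SHARE", 1, 5))     # → 1
print(newly_retired("SHARE", None, 5))  # → 0 (missing Wave 1 status)
```

Branching on the survey label assumes what the Stata code relies on implicitly: HRS respondents have missing SHARE labor force codes and vice versa, so only one clause of the Stata condition can ever be true.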
Part 3 – Adjusting for multiple country sampling and applying weights
Because the unit of interest is the respondent, we use respondent-level weights. Because our analysis is
longitudinal, we use longitudinal weights where provided (and, for a more serious analysis, would derive
a longitudinal weight for HRS respondents). Because sampling procedures differ in each country, we also
allow for survey design by country: each country is treated as a stratum, and, because svyset is given
no PSU variable, each observation is treated as an independent but unequally weighted sampling unit
within its country.
gen weight=.
replace weight = r8wtresp if survey=="HRS"
replace weight = r2lwtresp if survey=="SHARE"
svyset [pw=weight], strata(isocountry)
svydes
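Survey: Describing stage 1 sampling units

      pweight: weight
          VCE: linearized
  Single unit: missing
     Strata 1: isocountry
         SU 1: <observations>
        FPC 1: <zero>

                                    #Obs per Unit
                             ----------------------------
 Stratum   #Units     #Obs      min      mean      max
--------  -------  -------  -------  --------  -------
      40     1092     1092        1       1.0        1
      56     2687     2687        1       1.0        1
     208     1183     1183        1       1.0        1
     250     1902     1902        1       1.0        1
     276     1518     1518        1       1.0        1
     300     2098     2098        1       1.0        1
     380     1726     1726        1       1.0        1
     528     1724     1724        1       1.0        1
     724     1341     1341        1       1.0        1
     752     1974     1974        1       1.0        1
     756      664      664        1       1.0        1
     840    18469    18469        1       1.0        1
--------  -------  -------  -------  --------  -------
      12    36378    36378        1       1.0        1
                     86365  = #Obs with missing values in the
                              survey characteristics
                   -------
                    122743

Only 12 of the 20 countries appear as strata. The other eight (Czech Republic, Estonia, Hungary,
Ireland, Israel, Poland, Portugal, and Slovenia) have no observations with a valid weight, so they
fall among the 86,365 observations with missing survey characteristics that svy excludes from every
estimate. This includes Israel’s 137 newly retired respondents.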
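The weight assignment above, and the count of observations that svydes reports as missing survey characteristics, can be sketched as follows. This is an illustration only: respondents are represented as hypothetical dictionaries keyed by the Stata variable names, the helper names are our own, None stands for a Stata missing value, and the sample weights are invented.

```python
from typing import Optional


def analysis_weight(resp: dict) -> Optional[float]:
    """Pick the weight used in svyset: r8wtresp for HRS, r2lwtresp for SHARE."""
    if resp["survey"] == "HRS":
        return resp.get("r8wtresp")
    if resp["survey"] == "SHARE":
        return resp.get("r2lwtresp")
    return None


def split_by_weight(respondents: list) -> tuple:
    """Return (respondents usable by svy, number with a missing weight)."""
    kept, missing = [], 0
    for resp in respondents:
        w = analysis_weight(resp)
        if w is None:
            missing += 1  # excluded from every svy estimate
        else:
            kept.append({**resp, "weight": w})
    return kept, missing


# Invented example records.
sample = [
    {"survey": "HRS", "isocountry": 840, "r8wtresp": 4321.0},
    {"survey": "SHARE", "isocountry": 752, "r2lwtresp": 1250.5},
    {"survey": "SHARE", "isocountry": 376},  # no longitudinal weight
]
kept, missing = split_by_weight(sample)
print(len(kept), missing)  # → 2 1
```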
Part 4 – Estimating employed self-report of health and retired self-report of health
The HRS and the SHARE both measure self-report of health on the same 1 (excellent) to 5 (poor)
scale. We have already looked at how the H SHARE codes self-reported health, but let’s also check the
coding for the RAND HRS. Because we are interested in respondent-level values for Wave 7 (2004) and
Wave 8 (2006), we will use the RAND HRS variables r7shlt and r8shlt to measure changes in
self-report of health. Let’s check their coding in Stata.
tab r7shlt, m
tab r8shlt, m
   r7shlt:w7 |
 self-report |
   of health |      Freq.     Percent        Cum.
-------------+-----------------------------------
1. excellent |      2,363        1.93        1.93
2. very good |      5,476        4.46        6.39
     3. good |      6,280        5.12       11.50
     4. fair |      4,135        3.37       14.87
     5. poor |      1,858        1.51       16.39
           . |    102,614       83.60       99.99
          .d |         13        0.01      100.00
          .r |          4        0.00      100.00
-------------+-----------------------------------
       Total |    122,743      100.00
   r8shlt:w8 |
 self-report |
   of health |      Freq.     Percent        Cum.
-------------+-----------------------------------
1. excellent |      2,032        1.66        1.66
2. very good |      5,261        4.29        5.94
     3. good |      5,623        4.58       10.52
     4. fair |      3,874        3.16       13.68
     5. poor |      1,654        1.35       15.03
           . |    104,274       84.95       99.98
          .d |         23        0.02      100.00
          .m |          1        0.00      100.00
          .r |          1        0.00      100.00
-------------+-----------------------------------
       Total |    122,743      100.00
We can see that the RAND HRS uses the same set of codes as the H SHARE (as the shared variable name
shlt indicates). To assess change in self-reported health, we first compute the change in health
between the 2004 and the 2006 interviews.
gen shltch=.
replace shltch=r8shlt-r7shlt if survey=="HRS"
replace shltch=r2shlt-r1shlt if survey=="SHARE"
sum shltch
. sum shltch

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      shltch |     38140    .1219979     .931309         -4          4
As with our SHARE analysis, a negative value for our change variable indicates that the respondent
reported better health in 2006 than in 2004 and a positive value indicates that the respondent reported
worse health.
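The change variable and its sign convention can be sketched in Python as follows. This is a minimal illustration, not part of the Gateway output: the helper names health_change and describe are hypothetical, and None stands for any Stata missing value (., .d, .m, .r), all of which make the Stata difference missing. Because both waves use codes 1 to 5, the change can only range from -4 to 4, which matches the Min and Max in the summary above.

```python
from typing import Optional


def health_change(before: Optional[int], after: Optional[int]) -> Optional[int]:
    """Change in self-reported health on the 1 (excellent) to 5 (poor) scale.

    Mirrors shltch = r8shlt - r7shlt (HRS) or r2shlt - r1shlt (SHARE):
    a missing value in either wave gives a missing change.
    """
    if before is None or after is None:
        return None
    return after - before


def describe(change: Optional[int]) -> str:
    """Translate the sign of the change into words."""
    if change is None:
        return "missing"
    if change < 0:
        return "better"  # moved toward 1 = excellent
    if change > 0:
        return "worse"   # moved toward 5 = poor
    return "unchanged"


print(health_change(2, 4), describe(health_change(2, 4)))  # → 2 worse
print(health_change(3, 1), describe(health_change(3, 1)))  # → -2 better
print(health_change(None, 3))                              # → None
```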
Next, we produce population estimates of the change in self-reported health for our subsample.
Because we have already produced estimates for France and Sweden, let’s use the Stata svy: mean
command to produce an estimate for the U.S. population.
svy, subpop(if subsamp & isocountry==840): mean shltch
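Survey: Mean estimation

Number of strata =       1        Number of obs     =      18466
Number of PSUs   =   16958        Population size   =   77783565
                                  Subpop. no. obs   =        944
                                  Subpop. size      =    4452968
                                  Design df         =      16957

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      shltch |   .0874471   .0355286      .0178073    .1570869
--------------------------------------------------------------
Note: 11 strata omitted because they contain no subpopulation
      members.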
We see that newly retired U.S. respondents reported, on average, a small decline in health after
retirement (a mean change of 0.087 points on the 5-point scale, 95% CI 0.018 to 0.157), but the decline
appears to be much smaller than that for newly retired respondents in Sweden.
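To see roughly what svy, subpop() computes, here is a sketch of a weighted subpopulation mean with a linearized (Taylor-series) standard error for the design declared earlier: countries as strata and, since no PSU variable is given, each observation as its own sampling unit. It uses the textbook stratified with-replacement variance formula; we assume it matches Stata’s default linearized variance for this design, but it has not been checked against Stata, the function name is hypothetical, and the toy data are invented. The point it illustrates is that respondents outside the subpopulation stay in the design with a score of zero, which is why subpop() is used instead of an if restriction, and why strata with no subpopulation members (the “omitted” strata in the note above) add nothing to the variance.

```python
from collections import defaultdict
from math import nan, sqrt


def svy_subpop_mean(y, w, strata, in_subpop):
    """Weighted subpopulation mean and linearized standard error.

    y, w, strata, in_subpop are parallel lists; each observation is its
    own PSU. y may be None for observations outside the subpopulation.
    """
    W = sum(wi for wi, s in zip(w, in_subpop) if s)
    mean = sum(wi * yi for yi, wi, s in zip(y, w, in_subpop) if s) / W

    # Linearized score: nonzero only for subpopulation members.
    scores = [wi * (yi - mean) / W if s else 0.0
              for yi, wi, s in zip(y, w, in_subpop)]

    by_stratum = defaultdict(list)
    for z, h in zip(scores, strata):
        by_stratum[h].append(z)

    var = 0.0
    for zs in by_stratum.values():
        n = len(zs)
        if n < 2:
            # Single-unit stratum: leave the SE missing, in the spirit of
            # the "Single unit: missing" setting shown by svydes.
            return mean, nan
        zbar = sum(zs) / n
        var += n / (n - 1) * sum((z - zbar) ** 2 for z in zs)
    return mean, sqrt(var)


# Invented toy data: stratum "A" fully in the subpopulation,
# stratum "B" with one member and one non-member.
y = [1, 3, 2, None]
w = [1.0, 1.0, 2.0, 1.0]
strata = ["A", "A", "B", "B"]
in_subpop = [True, True, True, False]
print(svy_subpop_mean(y, w, strata, in_subpop))  # → (2.0, 0.5)
```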
Part 5 – Testing cross-country differences in self-report of health before and after retirement
Across our three country populations of interest, we have seen three patterns of change in self-reported
health for newly retired respondents: no significant change for those in France, a modest but
statistically significant decline for those in the U.S., and a larger decline in Sweden. Using an
estimation test in Stata, let’s check whether the difference between the U.S. and Swedish declines in
self-reported health at retirement is statistically significant.
svy, subpop(if subsamp & inlist(isocountry,840,752)): mean shltch, over(isocountry)
test [shltch]_subpop_1=[shltch]_subpop_2
Survey: Mean estimation

Number of strata =       2        Number of obs     =      20440
Number of PSUs   =   18932        Population size   =   80811887
                                  Subpop. no. obs   =       1061
                                  Subpop. size      =  4603925.9
                                  Design df         =      18930

    _subpop_1: isocountry = 752.Sweden
    _subpop_2: isocountry = 840.United States of America

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
shltch       |
   _subpop_1 |   .5374067   .1008804      .3396722    .7351412
   _subpop_2 |   .0874471   .0355286      .0178078    .1570864
--------------------------------------------------------------
Note: 10 strata omitted because they contain no subpopulation
      members.

Adjusted Wald test

 ( 1)  [shltch]_subpop_1 - [shltch]_subpop_2 = 0

       F(  1, 18930) =   17.70
            Prob > F =    0.0000
Using an adjusted Wald test of the difference between our mean estimates for the U.S. and Swedish
populations, we see that the difference in self-report of health for newly retired respondents is
statistically significant (F(1, 18930) = 17.70, p < 0.0001). More specifically, newly retired U.S.
respondents report, on average, a decline in self-reported health between the interview before and the
interview after retirement, but that decline (0.087 points) is significantly smaller than the one
reported by newly retired Swedish respondents (0.537 points). Combining all our analyses, these data
suggest that different countries experience different patterns of change in self-reported health at
retirement.
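As a rough check, the reported F statistic can be reproduced from the two means and standard errors alone. This sketch assumes the covariance between the Swedish and U.S. estimates is zero, which holds here because the two countries are separate strata, and it uses a normal approximation for the p-value (reasonable with 18,930 design degrees of freedom) rather than the F(1, 18930) distribution Stata uses. With a single restriction, the adjusted Wald F equals the squared t statistic of the difference.

```python
from math import erfc, sqrt

# Estimates copied from the svy: mean output above.
sweden_mean, sweden_se = 0.5374067, 0.1008804
us_mean, us_se = 0.0874471, 0.0355286

# Separate strata, so the variance of the difference is the sum of variances.
diff = sweden_mean - us_mean
se_diff = sqrt(sweden_se ** 2 + us_se ** 2)
t = diff / se_diff
f_stat = t ** 2  # one restriction: adjusted Wald F = t squared

# Two-sided p-value from the normal approximation.
p_value = erfc(abs(t) / sqrt(2))

print(f"F = {f_stat:.2f}, Prob > F = {p_value:.4f}")  # → F = 17.70, Prob > F = 0.0000
```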