Using the Gateway to Global Aging Data for Cross-Country Analysis

A sample analysis for the APRU Data Workshop

This document provides users with examples of how to set up and perform some simple descriptive analyses with the RAND Harmonized data products, including the RAND HRS and the H SHARE, using Stata. This sample analysis assumes that the user has already created an H SHARE file for analysis.

Sample Analysis

Research question: Does retirement affect self-report of health?

Analysis parameters: Our analysis will look at changes in self-report of health for respondents who were working in 2004 but were retired in 2006. Comparing self-reported health over time for these persons will help us determine whether retirement affects self-reported health. We can also analyze whether the change in self-report of health before and after retirement is the same for all countries or differs by country.

Part 1 – Creating cross-national dataset

Our cross-national dataset will contain respondents from the RAND HRS and the H SHARE. Using the Gateway to Global Aging Data, we will create a cross-national dataset with the measures required to identify our subpopulations and measure changes in self-report of health at retirement.

1. List the location of the RAND HRS Stata dataset on your computer in your user profile on the Gateway to Global Aging Data. Select the Data Options link from the top right user menu. Using the “+ add data locations” button, list the location of the RAND HRS dataset file by adding the data location for the RAND HRS. Make sure your data set location ends in either a forward slash (/) or a backslash (\), depending on your operating system.

2. Add RAND HRS variables of interest to your item cart. For our analysis of whether self-reported health changed with retirement between 2004 and 2006, we will be using the following HRS Wave 7 and Wave 8 variables.

    r7lbrf – W7 R Labor Force Status
    r8lbrf – W8 R Labor Force Status
    r7shlt – W7 Self-report of health
    r8shlt – W8 Self-report of health
    r8wtresp – W8 Person-Level Analysis Weight

The RAND HRS does not include longitudinal weights so, for this analysis, we will use the provided analysis weight. For a more accurate cross-country analysis, we would need to create and use a longitudinal weight.

3. Add H SHARE variables of interest to your item cart. For this analysis of whether self-reported health changed with retirement between 2004 and 2006, you will need the following SHARE Wave 1 and Wave 2 variables.

    r1lbrf_s – W1 R Labor Force Status
    r2lbrf_s – W2 R Labor Force Status
    r1shlt – W1 Self-report of health
    r2shlt – W2 Self-report of health
    r2lwtresp – W2 Respondent-Level Longitudinal Weight, combined sample

4. Generate a cross-national dataset using the Gateway to Global Aging Data. To do this, once you have added all variables of interest to the item cart, click on Item Cart in the right user menu and select all variables added from the RAND HRS and H SHARE. Then click the “create .do file” link below the item cart. If prompted by your browser to either save or open the .do file, opt to open the .do file using Stata. If your browser automatically downloads the .do file, open the .do file using Stata. Stata should begin to run the .do program without any manipulation of the .do file. Once finished, Stata should be loaded with all variables selected in your item cart as well as with survey-specific identifiers and country/survey identifiers.

5. If you prefer to analyze this cross-country dataset in a statistical package other than Stata, you can save and convert the Stata dataset using the Stata save command and a program such as Stat/Transfer. If you do not have access to Stat/Transfer you can usually read the .dta dataset into your stat package using an “import” or “get” function.
When reading a .dta dataset into another package, it is best to first save the dataset in a Stata version 9 dataset format using the command saveold.

Part 2 – Identifying population of interest

Our subsample of interest is HRS and SHARE respondents who reported they were working in the 2004 interview and reported they were retired in the 2006 interview. We have already looked at how SHARE surveys respondents’ employment status, but let’s also check the coding for the RAND HRS employment measures. Because we are interested in respondent-level data and in the values for HRS Wave 7 (2004) and Wave 8 (2006), we will be using RAND HRS variables r7lbrf and r8lbrf to identify our subsample. The coding of these variables can be found in the RAND HRS Codebook or by using a tab statement in Stata.

    tab r7lbrf, m
    tab r8lbrf, m

    r7lbrf:w7 r labor force status       Freq.    Percent       Cum.
    1.works ft                           5,182       4.22       4.22
    2.works pt                           1,119       0.91       5.13
    3.unemployed                           313       0.26       5.39
    4.partly ret                         1,540       1.25       6.64
    5.retired                            9,451       7.70      14.34
    6.disabled                             563       0.46      14.80
    7.not in lbrf                        1,961       1.60      16.40
    .                                  102,614      83.60     100.00
    Total                              122,743     100.00

    r8lbrf:w8 r labor force status       Freq.    Percent       Cum.
    1.works ft                           4,222       3.44       3.44
    2.works pt                             933       0.76       4.20
    3.unemployed                           215       0.18       4.37
    4.partly ret                         1,487       1.21       5.59
    5.retired                            9,480       7.72      13.31
    6.disabled                             519       0.42      13.73
    7.not in lbrf                        1,613       1.31      15.05
    .                                  104,274      84.95     100.00
    Total                              122,743     100.00

We can see that HRS respondents are surveyed about employment using different categories than SHARE respondents. We will need to identify our subsample using one criterion for the HRS respondents and another for the SHARE respondents.

Let’s look at our subsample by country. We can use the automatically included H variable isocountry to identify all countries included in our dataset.

    gen subsamp=0
    replace subsamp=1 if (inlist(r7lbrf,1,2) & inlist(r8lbrf,4,5)) | ///
        (r1lbrf_s==1 & r2lbrf_s==5)
    label variable subsamp "Subsample flag: newly retired"
    tab isocountry subsamp

    UN numerical              Subsample flag: newly retired
    country code                    0          1      Total
    040.Austria                 6,164         50      6,214
    056.Belgium                 7,097         82      7,179
    203.Czech Republic          7,643          0      7,643
    208.Denmark                 3,491         71      3,562
    233.Estonia                 6,828          0      6,828
    250.France                  7,787         71      7,858
    276.Germany                 4,008         85      4,093
    300.Greece                  3,815         44      3,859
    348.Hungary                 3,076          0      3,076
    372.Ireland                 1,134          0      1,134
    376.Israel                  3,096        137      3,233
    380.Italy                   5,205         76      5,281
    528.Netherlands             4,673         68      4,741
    616.Poland                  2,684          0      2,684
    620.Portugal                2,080          0      2,080
    705.Slovenia                2,756          0      2,756
    724.Spain                   5,149         39      5,188
    752.Sweden                  3,785        117      3,902
    756.Switzerland             4,416         30      4,446
    840.United States of       36,012        974     36,986
    Total                     120,899      1,844    122,743

As the above table indicates, the HRS contains a large number of respondents who fit our sample criteria.

Part 3 – Adjusting for multiple country sampling and applying weights

Because the unit of interest is the respondent, we use respondent-level weights. Because our analysis is longitudinal, we use longitudinal weights where provided (and would derive a longitudinal weight for HRS respondents for a more serious analysis). Because sampling procedures differ in each country, we also allow for survey design by country by treating countries as strata and all households as an independent but unequally weighted sample within the country.

    gen weight=.
    replace weight = r8wtresp if survey=="HRS"
    replace weight = r2lwtresp if survey=="SHARE"
    svyset [pw=weight], strata(isocountry)
    svydes

    Survey: Describing stage 1 sampling units

          pweight: weight
              VCE: linearized
      Single unit: missing
         Strata 1: isocountry
             SU 1: <observations>
            FPC 1: <zero>

                                      #Obs per Unit
    Stratum     #Units      #Obs      min    mean     max
         40       1092      1092        1     1.0       1
         56       2687      2687        1     1.0       1
        208       1183      1183        1     1.0       1
        250       1902      1902        1     1.0       1
        276       1518      1518        1     1.0       1
        300       2098      2098        1     1.0       1
        380       1726      1726        1     1.0       1
        528       1724      1724        1     1.0       1
        724       1341      1341        1     1.0       1
        752       1974      1974        1     1.0       1
        756        664       664        1     1.0       1
        840      18469     18469        1     1.0       1
         12      36378     36378        1     1.0       1
                           86365 = #Obs with missing values in the survey characteristics
                          122743

Part 4 – Estimating employed self-report of health and retired self-report of health

The HRS and the SHARE both measure self-report of health using the same 1 (excellent) to 5 (poor) scale. We have already looked at how the H SHARE codes self-report of health, but let’s also check the coding for the RAND HRS. Because we are interested in respondent-level data and in the values for Wave 7 (2004) and Wave 8 (2006), we will be using RAND HRS variables r7shlt and r8shlt to identify changes in self-report of health. Let’s check their coding in Stata.

    tab r7shlt, m
    tab r8shlt, m

    r7shlt:w7 self-report of health      Freq.    Percent       Cum.
    1. excellent                         2,363       1.93       1.93
    2. very good                         5,476       4.46       6.39
    3. good                              6,280       5.12      11.50
    4. fair                              4,135       3.37      14.87
    5. poor                              1,858       1.51      16.39
    .                                  102,614      83.60      99.99
    .d                                      13       0.01     100.00
    .r                                       4       0.00     100.00
    Total                              122,743     100.00

    r8shlt:w8 self-report of health      Freq.    Percent       Cum.
    1. excellent                         2,032       1.66       1.66
    2. very good                         5,261       4.29       5.94
    3. good                              5,623       4.58      10.52
    4. fair                              3,874       3.16      13.68
    5. poor                              1,654       1.35      15.03
    .                                  104,274      84.95      99.98
    .d                                      23       0.02     100.00
    .m                                       1       0.00     100.00
    .r                                       1       0.00     100.00
    Total                              122,743     100.00

We can see that the RAND HRS uses the same set of codes as does the H SHARE (as is indicated by the use of the same variable name).
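The same coding check can be run on the SHARE side of the merged file. The following is a minimal sketch, not part of the original workshop material. It assumes the cross-national dataset from Part 1 is in memory and that the string variable survey takes the values "HRS" and "SHARE", as in the weight commands above.

```stata
* Sketch only: assumes the merged Gateway dataset from Part 1 is in memory
* and that survey is a string variable with values "HRS" and "SHARE".
tab r1shlt if survey=="SHARE", m
tab r2shlt if survey=="SHARE", m

* Count any non-missing values that fall outside the shared 1-5 scale.
* inrange() returns 0 for missing values, so missing codes are excluded
* explicitly with !missing() to avoid counting them as out of range.
foreach v of varlist r7shlt r8shlt r1shlt r2shlt {
    count if !missing(`v') & !inrange(`v', 1, 5)
}
```

If every count is zero, all four variables use only the values 1 through 5 plus missing codes, which is what the change measure in the next step relies on.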
To assess change in self-reported health, we first produce a measure of change in health between the 2004 and the 2006 interviews.

    gen shltch=.
    replace shltch=r8shlt-r7shlt if survey=="HRS"
    replace shltch=r2shlt-r1shlt if survey=="SHARE"
    sum shltch

    . sum shltch

    Variable        Obs        Mean    Std. Dev.    Min    Max
    shltch        38140    .1219979     .931309      -4      4

As with our SHARE analysis, a negative value for our change variable indicates that the respondent reported better health in 2006 than in 2004 and a positive value indicates that the respondent reported worse health.

Next, we produce population estimates of the change in self-reported health for our subsample. Because we have already produced estimates for France and Sweden, let’s use the Stata svy: mean command to produce an estimate for the U.S. population.

    svy, subpop(if subsamp & isocountry==840): mean shltch

    Survey: Mean estimation

    Number of strata =       1      Number of obs    =    18466
    Number of PSUs   =   16958      Population size  = 77783565
                                    Subpop. no. obs  =      944
                                    Subpop. size     =  4452968
                                    Design df        =    16957

                            Linearized
                     Mean    Std. Err.    [95% Conf. Interval]
    shltch       .0874471    .0355286     .0178073    .1570869

    Note: 11 strata omitted because they contain no subpopulation members.

We see that newly-retired U.S. respondents reported, on average, a decline in health after retirement, but the decline appears to be much smaller than that for newly-retired respondents in Sweden.

Part 5 – Testing cross-country differences in self-report of health before and after retirement

Using our three country-populations of interest, we have seen three patterns of change in self-reported health for newly-retired respondents: no significant change for those in France, a modest but statistically significant decline for those in the U.S., and a larger decline for those in Sweden. Using an estimation test in Stata, let’s check whether the difference between the declines in self-reported health at retirement in the U.S. and Swedish populations is statistically significant.

    svy, subpop(if subsamp & inlist(isocountry,840,752)): mean shltch, over(isocountry)
    test [shltch]_subpop_1=[shltch]_subpop_2

    Survey: Mean estimation

    Number of strata =       2      Number of obs    =     20440
    Number of PSUs   =   18932      Population size  =  80811887
                                    Subpop. no. obs  =      1061
                                    Subpop. size     = 4603925.9
                                    Design df        =     18930

    _subpop_1: isocountry = 752.Sweden
    _subpop_2: isocountry = 840.United States of America

                            Linearized
    Over             Mean    Std. Err.    [95% Conf. Interval]
    shltch
      _subpop_1  .5374067    .1008804     .3396722    .7351412
      _subpop_2  .0874471    .0355286     .0178078    .1570864

    Note: 10 strata omitted because they contain no subpopulation members.

    Adjusted Wald test

     ( 1)  [shltch]_subpop_1 - [shltch]_subpop_2 = 0

           F(  1, 18930) =   17.70
                Prob > F =    0.0000

Using an adjusted Wald test to compare our mean estimates for the U.S. and Swedish populations, we see that the difference in the change in self-report of health for newly-retired respondents is statistically significant. More specifically, we see that the U.S. population on average experiences a decline in self-reported health in the first and second year after retirement but that the decline is significantly smaller than what the Swedish population experiences. Combining all our analyses, we can say that these data suggest that different countries experience different patterns in changes in self-reported health at retirement.
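The comparison above covers two countries at a time. As a hedged sketch, not part of the original workshop material, the same estimate could be produced for every country in one command by dropping the country restriction from the subpopulation. It assumes svyset has been run as in Part 3 and that subsamp and shltch exist as created earlier.

```stata
* Sketch only: assumes the svyset from Part 3 is in effect and that
* subsamp (0/1 flag) and shltch (change in self-reported health) exist.
* Observations with a missing weight are excluded from svy estimation,
* so only countries with a usable weight contribute estimates.
svy, subpop(subsamp): mean shltch, over(isocountry)
```

The names Stata assigns to the per-country estimates differ across releases: the _subpop_# names shown in this document come from an older release, while Stata 16 and later use factor-variable notation for over() groups. Any test command that follows should use the names reported by the installed version.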