HRS Summer 2015 Workshop Harmonized CHARLS example

Using the Harmonized CHARLS
A sample analysis for the HRS Summer Workshop
This document provides users with an example of how to perform some simple descriptive analyses with
the Harmonized China Health and Retirement Longitudinal Study (Harmonized CHARLS) data using Stata.
Research question: Are there gender differences in cognition, and does this difference vary across
cohorts?
Analysis parameters: Our analysis will examine cognition scores for both men and women and examine
how the gender difference varies across age categories.
Part A – Measuring cognition
CHARLS surveys cognition using several different measures. For this analysis we will consider two of
them, both based on word recall. The first is the respondent’s ability to immediately repeat, in any
order, ten Chinese nouns just read to them (immediate word recall). The second is the respondent’s
ability to recall the same list of words 4 minutes later (delayed word recall).
The Harmonized CHARLS includes a summary score of these two word recall measures, which is the sum
of the number of words the respondent recalled in the immediate word recall and the delayed word
recall. This summary score is stored in the Harmonized CHARLS variable r1tr20.
Information about the distribution of this variable can be found in the Harmonized CHARLS codebook or
by using the sum command in Stata:
sum r1tr20
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      r1tr20 |      14294    7.148244    3.433539          0         20
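To make the construction of a summary score like this concrete, here is a minimal sketch on made-up data. The variable names imm_recall, del_recall and total_recall are hypothetical and are not Harmonized CHARLS variables. In this sketch a respondent who is missing either component gets a missing total; whether the Harmonized CHARLS treats partial responses the same way should be checked in the codebook.

```stata
* Toy data: three hypothetical respondents (not CHARLS data)
clear
input imm_recall del_recall
5 4
7 6
3 .
end

* Summary score = immediate recall + delayed recall (possible range 0-20)
generate total_recall = imm_recall + del_recall

* Stata propagates missing values through arithmetic,
* so the third respondent has no summary score
list imm_recall del_recall total_recall
```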
Part B – Measuring gender and age
The Harmonized CHARLS stores gender in the variable ragender. The coding of this variable can be
found in the Harmonized CHARLS Codebook or by using the tab command in Stata:
tab ragender, m
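 ragender:R |
     Gender |      Freq.     Percent        Cum.
------------+-----------------------------------
     1.male |      8,471       47.85       47.85
   2.female |      9,221       52.08       99.93
      .d:dk |          2        0.01       99.94
 .m:missing |          8        0.05       99.98
  .r:refuse |          3        0.02      100.00
------------+-----------------------------------
      Total |     17,705      100.00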
Age is stored in the Harmonized CHARLS variable r1agey. Let’s look at a brief summary of the ages
sampled in CHARLS using the sum command in Stata:
sum r1agey, d
                  r1agey:w1 R age in years
-------------------------------------------------------------
      Percentiles      Smallest
 1%           43             22
 5%           46             22
10%           47             25       Obs               17682
25%           51             31       Sum of Wgt.       17682

50%           58                      Mean           59.22611
                        Largest       Std. Dev.      10.16237
75%           66            100
90%           74            101       Variance       103.2738
95%           78            101       Skewness       .5764451
99%           85            102       Kurtosis       2.911414
For this analysis let’s consider 5-year age groups. To build a categorical variable of age groups we can
use the egen command with its cut() function:
egen r1agecat = cut(r1agey), at(45,50,55,60,65,70,75,103) icodes
tab r1agecat, m
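   r1agecat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      3,415       19.29       19.29
          1 |      2,568       14.50       33.79
          2 |      3,543       20.01       53.80
          3 |      2,917       16.48       70.28
          4 |      1,892       10.69       80.97
          5 |      1,385        7.82       88.79
          6 |      1,609        9.09       97.88
          . |        376        2.12      100.00
------------+-----------------------------------
      Total |     17,705      100.00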
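The integer codes 0 to 6 produced by the icodes option are hard to read in later output, so it can help to attach value labels. The sketch below uses a handful of made-up ages; the names age, agecat and agecat_lbl are hypothetical. It also illustrates two properties of cut(): values outside the range [45, 103) are set to missing, which is consistent with the missing category in the table above, and the last group runs from 75 up to the final cut point rather than spanning 5 years.

```stata
* Toy ages (not CHARLS data): one below 45, the rest inside the cut points
clear
input age
40
45
49
50
74
75
102
end

* Same cut points as in the text; icodes returns 0, 1, 2, ... per interval
egen agecat = cut(age), at(45,50,55,60,65,70,75,103) icodes

* Attach readable labels to the integer codes
label define agecat_lbl 0 "45-49" 1 "50-54" 2 "55-59" 3 "60-64"
label define agecat_lbl 4 "65-69" 5 "70-74" 6 "75+", add
label values agecat agecat_lbl

* Age 40 lies outside [45,103), so its category is missing
tabulate agecat, missing
```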
Part C – Applying CHARLS weights
Like the HRS, CHARLS provides weights for producing population estimates. For this analysis, we
use individual-level weights that account for both household and individual non-response. This
weight is recorded in the Harmonized CHARLS variable r1wtrespb. To set this weight for our analysis
we will use the svyset command in Stata:
svyset [pw=r1wtrespb]
Once we have set the survey weight, we can prefix our commands with svy to have Stata produce
weighted estimates.
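As a small illustration of what the weight does to a point estimate, the sketch below compares an unweighted and a weighted mean on three made-up observations. The variables y and w are hypothetical and stand in for an outcome and a probability weight; nothing here uses CHARLS data.

```stata
* Toy data: the third observation represents three times as many people
clear
input y w
2 1
4 1
9 3
end

* Declare the probability weight, as in the text
svyset [pw=w]

* Unweighted mean: (2 + 4 + 9) / 3 = 5
mean y

* Weighted mean: (2*1 + 4*1 + 9*3) / (1 + 1 + 3) = 33 / 5 = 6.6
svy: mean y
```

The weighted estimate is pulled toward 9 because that observation carries the largest weight.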
Part D – Analyzing cognition across genders
Using Stata’s svy: mean command, we first estimate the mean cognition score for the entire Chinese
population covered by CHARLS and then separately for men and women.
svy: mean r1tr20
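Survey: Mean estimation

Number of strata =       1       Number of obs   =      14294
Number of PSUs   =   14294       Population size =  439988435
                                 Design df       =      14293

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      r1tr20 |   7.320842   .0456508       7.23136    7.410323
--------------------------------------------------------------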
svy, subpop(if ragender==1): mean r1tr20
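Survey: Mean estimation

Number of strata =       1       Number of obs   =      16004
Number of PSUs   =   16004       Population size =  501824823
                                 Subpop. no. obs =       6770
                                 Subpop. size    =  207192844
                                 Design df       =      16003

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      r1tr20 |   7.428225   .0682556      7.294436    7.562014
--------------------------------------------------------------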
svy, subpop(if ragender==2): mean r1tr20
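Survey: Mean estimation

Number of strata =       1       Number of obs   =      15997
Number of PSUs   =   15997       Population size =  500334685
                                 Subpop. no. obs =       7513
                                 Subpop. size    =  232416974
                                 Design df       =      15996

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      r1tr20 |   7.226687   .0611051      7.106914     7.34646
--------------------------------------------------------------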
It does appear that men on average have a slightly higher cognition score than women. We can also test
whether the difference between cognition scores for men and women is statistically significant:
svy: mean r1tr20, over(ragender)
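Survey: Mean estimation

Number of strata =       1       Number of obs   =      14283
Number of PSUs   =   14283       Population size =  439609818
                                 Design df       =      14282

    _subpop_1: ragender = 1.male
    _subpop_2: ragender = 2.female

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
r1tr20       |
   _subpop_1 |   7.428225   .0682559      7.294435    7.562015
   _subpop_2 |   7.226687   .0611054      7.106912    7.346461
--------------------------------------------------------------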
test [r1tr20]_subpop_1 = [r1tr20]_subpop_2
Adjusted Wald test

 ( 1)  [r1tr20]_subpop_1 - [r1tr20]_subpop_2 = 0

       F(  1, 14282) =    4.84
            Prob > F =    0.0278
Using an adjusted Wald test of the difference between our mean estimates for men and women, we see
that there is a statistically significant difference in cognition between men and women at the 5%
level (p = 0.0278).
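As a rough check on where the F statistic comes from, the arithmetic below uses only the means and standard errors printed in the over(ragender) output above. It treats the two subgroup estimates as uncorrelated, which is an assumption made here for illustration and not something the Stata output states; Stata's own test uses the full estimated covariance matrix.

```latex
\[
\begin{aligned}
\hat{\Delta} &= 7.428225 - 7.226687 = 0.201538,\\
\widehat{\mathrm{SE}}(\hat{\Delta}) &\approx \sqrt{0.0682559^{2} + 0.0611054^{2}} \approx 0.0916,\\
F &\approx \left(\frac{0.201538}{0.0916}\right)^{2} \approx 4.84 .
\end{aligned}
\]
```

Under that assumption the result agrees with the reported F(1, 14282) = 4.84.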
Part E – Analyzing cognition across genders and age categories
We also wanted to consider whether there might be a cohort effect, with the size of the gender
difference varying across age groups. Again let’s examine this using the svy: mean command:
svy: mean r1tr20, over(ragender r1agecat)
Survey: Mean estimation

Number of strata =       1       Number of obs   =      13995
Number of PSUs   =   13995       Population size =  430382865
                                 Design df       =      13994

         Over: ragender r1agecat
    _subpop_1: 1.male 0
    _subpop_2: 1.male 1
    _subpop_3: 1.male 2
    _subpop_4: 1.male 3
    _subpop_5: 1.male 4
    _subpop_6: 1.male 5
    _subpop_7: 1.male 6
    _subpop_8: 2.female 0
    _subpop_9: 2.female 1
   _subpop_10: 2.female 2
   _subpop_11: 2.female 3
   _subpop_12: 2.female 4
   _subpop_13: 2.female 5
   _subpop_14: 2.female 6

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
r1tr20       |
   _subpop_1 |   8.580753   .1344866      8.317141    8.844365
   _subpop_2 |   7.669246   .1806594      7.315129    8.023362
   _subpop_3 |   7.609232   .1217161      7.370652    7.847812
   _subpop_4 |   7.215394   .0984498       7.02242    7.408369
   _subpop_5 |   7.441968   .3388299      6.777817     8.10612
   _subpop_6 |    6.23575    .155438      5.931071     6.54043
   _subpop_7 |   5.615598   .3246259      4.979288    6.251908
   _subpop_8 |   8.460031   .1239588      8.217055    8.703007
   _subpop_9 |   7.512848   .1533828      7.212198    7.813499
  _subpop_10 |   7.414409   .1705229      7.080162    7.748657
  _subpop_11 |   7.030072   .1305285      6.774218    7.285925
  _subpop_12 |   6.597254   .1740406      6.256111    6.938397
  _subpop_13 |   5.967469   .1915682       5.59197    6.342969
  _subpop_14 |   4.442941   .1512499      4.146471    4.739411
--------------------------------------------------------------
The gender difference does appear to be smaller in our youngest cohort (about 0.12 points) than in
our oldest cohort (about 1.17 points). Let’s now test the statistical significance of each:
test [r1tr20]_subpop_1 = [r1tr20]_subpop_8
Adjusted Wald test

 ( 1)  [r1tr20]_subpop_1 - [r1tr20]_subpop_8 = 0

       F(  1, 13994) =    0.44
            Prob > F =    0.5092
test [r1tr20]_subpop_7 = [r1tr20]_subpop_14
Adjusted Wald test

 ( 1)  [r1tr20]_subpop_7 - [r1tr20]_subpop_14 = 0

       F(  1, 13994) =   10.72
            Prob > F =    0.0011
Using adjusted Wald tests, we find no statistically significant difference in cognition between men
and women in our youngest cohort (p = 0.5092), but a statistically significant difference between
men and women in our oldest cohort (p = 0.0011).
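The two tests above look at each cohort separately. One possible way to ask directly whether the gender gap itself differs between cohorts is an interaction term in a survey-weighted regression. This is a different technique from the svy: mean and test steps used in this document, and it is sketched here on made-up data only. The variables score, female (0 = male, 1 = female), old (0 = younger group, 1 = older group) and w are hypothetical.

```stata
* Toy data (not CHARLS): two respondents in each gender-by-cohort cell
clear
input score female old w
9 0 0 1
8 0 0 1
9 1 0 1
7 1 0 1
6 0 1 1
5 0 1 1
5 1 1 1
3 1 1 1
end

svyset [pw=w]

* Full factorial model: main effects of gender and cohort plus interaction
svy: regress score i.female##i.old

* The interaction coefficient is the difference between the two gender gaps:
* (4 - 5.5) - (8 - 8.5) = -1 in this toy data
lincom 1.female#1.old
```

With real data, the confidence interval and p-value reported by lincom would indicate whether the gap in one cohort differs from the gap in the other.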