DO FILE

clear all
version 10
set more off
capture log close
set memory 250m
* set up a global macro containing file path to data directory
*global dir1 "\\iserraid2\ConferenceData\final"
*cd M:\
cd "D:\Home\anandi\1-Courses\EC969\SL-AN\DataPrep"
* open log file
log using Week2Lecture1.log, replace
* open dataset
use Week2Lecture1, clear
// Let's see what these variables look like: What are the variable names, value
// labels, their mean, s.d., frequency distribution? Are there any missings?
describe
summ
// Are there any missing values?
tab1 sex white country memorig ivfio, missing
// What are the different samples?
tab memorig
// Look at distributions of cross-sectional respondent weights and see how they vary by sample.
tabstat xrwtuk1, stat(mean min max sd) by(memorig) longstub nototal
// Examine the variable of interest - wage
summ wage, de
// Why is this missing for some people?
tab employed if wage==., m
tab ivfio if employed==1 & wage==., m
// Compute unweighted mean, standard errors and confidence interval for wage.
summ wage
ci wage
// Compute weighted mean, standard deviation, standard error and confidence interval for wage.
summ wage [aweight=xrwtuk1]
ci wage [aweight=xrwtuk1]
// What happens if you use -pweight- instead?
capture noisily summ wage [pweight=xrwtuk1]
capture noisily ci wage [pweight=xrwtuk1]
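// A hedged aside (not part of the original exercise): -regress- does accept pweights,
// so regressing wage on a constant only is one way to obtain the weighted mean with a
// robust standard error without using -svyset-. The constant (_cons) is the weighted
// mean. The sample design (strata and PSUs) is still ignored here.
regress wage [pweight=xrwtuk1]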
// Compute weighted mean, standard error, confidence interval and standard deviation for wage,
// correctly informing Stata that the weights are probability weights,
// but again not correcting for the sample design, i.e. assuming that the sample is a simple
// random sample. This will produce a correct UK population estimate of the mean
// hourly wage, but the standard error of the estimate will be incorrect because the BHPS is a
// stratified and clustered sample.
// First inform Stata about the weights and then compute the weighted mean etc.
svyset [pweight = xrwtuk1]
svy: mean wage
estat sd
// What happens if you use aweight instead?
capture noisily svyset [aweight = xrwtuk1]
// Compute weighted mean, standard errors and confidence interval for wage after informing Stata
// of the correct sample design.
// First clear any existing design information from Stata's memory, then inform Stata about
// the design variables, and then compute the weighted means etc.
svyset, clear
svyset [pweight = xrwtuk1], psu(psu) strata(strata)
svy: mean wage
// This returns the mean wage, but does not return a standard error or confidence interval.
// Find out why.
svydes
// You will find that there is a stratum (-8) with just 1 unit (psu) within it. Which region or
// sample is that?
tab1 memorig if strata==-8
// Exclude that sample from the analysis
svy: mean wage if memorig ~= 7
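// A hedged alternative to excluding the sample (not part of the original exercise):
// -svyset- has a singleunit() option for strata that contain one sampling unit.
// singleunit(certainty) treats such a stratum as a certainty unit that contributes
// nothing to the variance. Whether that is defensible for the Northern Ireland sample
// is an assumption to check against the BHPS documentation.
svyset [pweight = xrwtuk1], psu(psu) strata(strata) singleunit(certainty)
svy: mean wage
// Restore the settings used in the rest of this do-file (singleunit(missing) is the default).
svyset [pweight = xrwtuk1], psu(psu) strata(strata)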
// Compute the weighted and unweighted mean wage for the different countries (England,
// Scotland, Wales and Northern Ireland)
tab country memorig, missing // (optional) How does country compare with memorig?
// Look at distributions of (cross-section) RESPONDENT weights and see how these vary by country
// of residence (not by sample):
tabstat xrwtuk1, stat(mean min max sd count) by(country) longstub nototal
// Compute the unweighted mean wage for each country.
bysort country: ci wage
// Drop Northern Ireland sub-sample
drop if memorig==7
// Drop missing country cases
drop if country==.
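// Compute the weighted mean of wage for each country after telling Stata that the weights are
// probability weights and correcting for sample design.
svyset [pweight = xrwtuk1], psu(psu) strata(strata)
** Use the if qualifier (this discards the other observations before variance estimation, which
** can leave strata with a single sampling unit and hence missing standard errors)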
svy: mean wage if country==1
svy: mean wage if country==2
svy: mean wage if country==3
** Use the subpop option
svy, subpop(if country==1): mean wage
svy, subpop(if country==2): mean wage
svy, subpop(if country==3): mean wage
** Use the over option
svy: mean wage, over(country)
// Compute the weighted mean of wage for men and women in the three remaining countries
** Use the over option
svy: mean wage, over(country sex)
** Use the subpop option
svy, subpop(if country==1 & sex==1): mean wage
svy, subpop(if country==1 & sex==2): mean wage
svy, subpop(if country==2 & sex==1): mean wage
svy, subpop(if country==2 & sex==2): mean wage
svy, subpop(if country==3 & sex==1): mean wage
svy, subpop(if country==3 & sex==2): mean wage
// Test differences in pay across the different countries.
svy: mean wage, over(country)
test [wage]England = [wage]Scotland = [wage]Wales
// Test gender differences in pay across the different countries.
svy: mean wage, over(country sex)
test [wage]_subpop_1=[wage]_subpop_2
test [wage]_subpop_3=[wage]_subpop_4
test [wage]_subpop_5=[wage]_subpop_6
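// A hedged addition (not part of the original exercise): -test- reports only the test
// statistic, so -lincom- can be used to see the size of each gender gap together with
// its standard error and confidence interval. The _subpop_# names are those listed by
// the preceding -svy: mean wage, over(country sex)- output.
lincom [wage]_subpop_1 - [wage]_subpop_2   // England: male minus female
lincom [wage]_subpop_3 - [wage]_subpop_4   // Wales: male minus female
lincom [wage]_subpop_5 - [wage]_subpop_6   // Scotland: male minus female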
// Compute design effects and design factor
quietly svy: mean wage
estat effects, deff deft
// [Optional] Plot the unweighted and weighted means with their confidence intervals using the
// user-written command -ciplot-. Use -findit ciplot- to locate and install it.
ciplot wage, by(country) saving(graph1, replace)
ciplot wage [aw=xrwtuk1], by(country) saving(graph2, replace)
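** Including Northern Ireland: treat each household as its own PSU in the Northern Ireland
** sample, so that its stratum no longer contains a single sampling unit
use Week2Lecture1, clear
replace psu=hid if memorig==7
svyset [pweight = xrwtuk1], psu(psu) strata(strata)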
svy: mean wage, over(country)
log close
exit
LOG FILE
--------------------------------------------------------------------------------
      name:  <unnamed>
       log:  D:\Home\anandi\1-Courses\EC969\SL-AN\DataPrep\Week2Lecture1.log
  log type:  text
 opened on:   1 Mar 2011, 15:23:40
.
.
. * open dataset
. use Week2Lecture1, clear
.
. // Let's see what these variables look like: What are the variable names, value
. // labels, their mean, s.d., frequency distribution? Are there any missings?
.
. describe
Contains data from Week2Lecture1.dta
  obs:        14,897
 vars:            56                          24 Feb 2011 15:53
 size:     1,564,185 (99.4% of memory free)
--------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------
pid             long   %12.0g                 cross-wave person identifier
sex             byte   %8.0g       sex        sex
dobm            byte   %8.0g       dobm       month of birth
doby            int    %8.0g       doby       year of birth
memorig         byte   %8.0g       memorig    sample origin
racel           byte   %8.0g       racel      ethnic group membership (long version)
hid             long   %12.0g                 household identification number
pno             byte   %8.0g                  person number
jbstat          byte   %8.0g       mjbstat    current economic activity
hlprb           byte   %8.0g       mhlprb     health problems: none
hlprba          byte   %8.0g       mhlprba    health problems: arms, legs, hands, etc
hlprbb          byte   %8.0g       mhlprbb    health problems: sight
hlprbc          byte   %8.0g       mhlprbc    health problems: hearing
hlprbd          byte   %8.0g       mhlprbd    health problems: skin conditions/allergy
hlprbe          byte   %8.0g       mhlprbe    health problems: chest/breathing
hlprbf          byte   %8.0g       mhlprbf    health problems: heart/blood pressure
hlprbg          byte   %8.0g       mhlprbg    health problems: stomach or digestion
hlprbh          byte   %8.0g       mhlprbh    health problems: diabetes
hlprbi          byte   %8.0g       mhlprbi    health problems: anxiety, depression, et
hlprbj          byte   %8.0g       mhlprbj    health problems: alcohol or drugs
hlprbk          byte   %8.0g       mhlprbk    health problems: epilepsy
hlprbl          byte   %8.0g       mhlprbl    health problems: migraine
hlprbn          byte   %8.0g       mhlprbn    health problems: cancer
hlprbo          byte   %8.0g       mhlprbo    health problems: stroke
hlprbm          byte   %8.0g       mhlprbm    health problems: other
jbhas           byte   %8.0g       mjbhas     did paid work last week
jboff           byte   %8.0g       mjboff     no work last week but has job
jbsemp          byte   %8.0g       mjbsemp    employee or self-employed: current job
jbhrs           byte   %8.0g       mjbhrs     no. of hours normally worked per week
ivfio           byte   %8.0g       mivfio     individual interview outcome
mastat          byte   %8.0g       mmastat    marital status
age             byte   %8.0g       mage       age at date of interview
nchild          byte   %8.0g       mnchild    number of own children in household
region          byte   %8.0g       mregion    region / metropolitan area
qfedhi          byte   %8.0g       mqfedhi    highest educational qualification
paygu           float  %9.0g       mpaygu     usual gross pay per month: current job
xrwght          float  %9.0g                  x-sectional respondent weight
xrwtuk1         float  %9.0g                  x-sect'l resp. weight inc new samples
xrwtuk2         float  %9.0g                  x-sect'l resp. weight within uk estimate
nch02           byte   %8.0g       mnch02     number children in household aged 0-2
nch34           byte   %8.0g       mnch34     number children in household aged 3-4
nch511          byte   %8.0g       mnch511    number children in household aged 5-11
nch1215         byte   %8.0g       mnch1215   number children in household aged 12-15
nch1618         byte   %8.0g       mnch1618   number dependent children in hh 16+
strata          int    %8.0g                  stratification class
psu             int    %8.0g                  primary sampling unit
_merge          byte   %8.0g
wage            float  %9.0g                  usual hourly wage
employed        float  %9.0g                  whether in paid employment last week
youngchildren   float  %9.0g                  If children <5 yrs in HH
england         float  %9.0g
wales           float  %9.0g
scotland        float  %9.0g
N_Ireland       float  %9.0g
country         byte   %16.0g      country    countries of UK
white           float  %9.0g                  Ethnicity: White
--------------------------------------------------------------------------------
Sorted by:
. summ
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         pid |     14897    6.44e+07     4.63e+07  1.00e+07   1.40e+08
         sex |     14897    1.558569     .4965745         1          2
        dobm |     14897    6.458884     3.420427        -9         12
        doby |     14897    1956.784     24.89484        -9       1988
     memorig |     14897    3.361952     2.586713         1          7
-------------+--------------------------------------------------------
       racel |     14897    1.575955     3.005079        -8         18
         hid |     14897    1.34e+07     294516.3  1.30e+07   1.39e+07
         pno |     14897    1.673021      .901808         1         11
      jbstat |     14897    3.455461     1.948806         1         10
       hlprb |     14897    -4.86749     3.904957        -9          0
-------------+--------------------------------------------------------
      hlprba |     14897     .206216     .9160706        -9          1
      hlprbb |     14897   -.0216151     .8083313        -9          1
      hlprbc |     14897    .0098678     .8278009        -9          1
      hlprbd |     14897    .0429617     .8465235        -9          1
      hlprbe |     14897    .0632342     .8571598        -9          1
-------------+--------------------------------------------------------
      hlprbf |     14897    .1041149     .8767922        -9          1
      hlprbg |     14897    .0072498     .8262452        -9          1
      hlprbh |     14897   -.0348392     .7996424        -9          1
      hlprbi |     14897     .012083     .8291086        -9          1
      hlprbj |     14897    -.068403      .776142        -9          1
-------------+--------------------------------------------------------
      hlprbk |     14897    -.066255     .7777107        -9          1
      hlprbl |     14897     .001074     .8225304        -9          1
      hlprbn |     14897   -.0598107     .7823625        -9          1
      hlprbo |     14897    -.062093     .7807242        -9          1
      hlprbm |     14897   -.0277908     .8043123        -9          1
-------------+--------------------------------------------------------
       jbhas |     14897    1.456333      .499721        -1          2
       jboff |     14897   -3.455729     4.955244        -8          3
      jbsemp |     14897   -2.849634     4.452904        -8          1
       jbhrs |     14897    15.73673     22.22558        -8         99
       ivfio |     14897    1.086326     .3776966         1          3
-------------+--------------------------------------------------------
      mastat |     14897    2.576626     2.029402        -1          6
         age |     14897     45.8305     18.98171        15         99
      nchild |     14897    .4921125     .9107354         0          7
      region |     14897    12.50064     6.846798        -9         19
      qfedhi |     14897    5.710143     5.060525        -9         13
-------------+--------------------------------------------------------
       paygu |     14897    781.0079     1095.092        -8   29794.92
      xrwght |     14897     .502307     .5843616         0        2.5
     xrwtuk1 |     14897    .9690967     .9195383         0   5.054481
     xrwtuk2 |     14897    .9369517     .5081068         0   16.24493
       nch02 |     14897    .0614218     .2486266         0          2
-------------+--------------------------------------------------------
       nch34 |     14897    .0659193     .2584848         0          2
      nch511 |     14897    .2438075     .5782308         0          5
     nch1215 |     14897    .1784252     .4657831         0          3
     nch1618 |     14897    .0649795     .2613045         0          2
      strata |     14897    54.52937     51.95504        -8        151
-------------+--------------------------------------------------------
         psu |     14897    210.3743      194.614        -8        575
      _merge |     14897           3            0         3          3
        wage |      8038    9.855252     7.044385   .251938   238.9328
    employed |     14897    .5722629     .4947671         0          1
youngchild~n |     14897    .1190173     .3790271         0          3
-------------+--------------------------------------------------------
     england |     14897    .4551252     .4979989         0          1
       wales |     14897    .1727865     .3780753         0          1
    scotland |     14897    .1805733     .3846771         0          1
   N_Ireland |     14897    .1753373     .3802681         0          1
     country |     14656    2.077374     1.163245         1          4
-------------+--------------------------------------------------------
       white |     14106    .9762512     .1522708         0          1
.
. // Are there any missing values?
. tab1 sex white country memorig ivfio, missing
-> tabulation of sex

            sex |      Freq.     Percent        Cum.
----------------+-----------------------------------
           male |      6,576       44.14       44.14
         female |      8,321       55.86      100.00
----------------+-----------------------------------
          Total |     14,897      100.00

-> tabulation of white

 Ethnicity: |
      White |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        335        2.25        2.25
          1 |     13,771       92.44       94.69
          . |        791        5.31      100.00
------------+-----------------------------------
      Total |     14,897      100.00

-> tabulation of country

 countries of UK |      Freq.     Percent        Cum.
-----------------+-----------------------------------
         England |      6,780       45.51       45.51
           Wales |      2,574       17.28       62.79
        Scotland |      2,690       18.06       80.85
Northern Ireland |      2,612       17.53       98.38
               . |        241        1.62      100.00
-----------------+-----------------------------------
           Total |     14,897      100.00

-> tabulation of memorig

          sample origin |      Freq.     Percent        Cum.
------------------------+-----------------------------------
        original sample |      7,941       53.31       53.31
       wales new sample |      2,206       14.81       68.11
    scotland new sample |      2,138       14.35       82.47
        n.i. new sample |      2,612       17.53      100.00
------------------------+-----------------------------------
                  Total |     14,897      100.00

-> tabulation of ivfio

   individual interview outcome |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                 full interview |     14,086       94.56       94.56
                proxy interview |        336        2.26       96.81
                telephone intvw |        475        3.19      100.00
--------------------------------+-----------------------------------
                          Total |     14,897      100.00
.
. // What are the different samples?
. tab memorig
          sample origin |      Freq.     Percent        Cum.
------------------------+-----------------------------------
        original sample |      7,941       53.31       53.31
       wales new sample |      2,206       14.81       68.11
    scotland new sample |      2,138       14.35       82.47
        n.i. new sample |      2,612       17.53      100.00
------------------------+-----------------------------------
                  Total |     14,897      100.00
.
. // Look at distributions of cross-sectional respondent weights and see how they vary by sample.
. tabstat xrwtuk1, stat(mean min max sd) by(memorig) longstub nototal

memorig            variable |      mean       min       max        sd
----------------------------+----------------------------------------
original sample     xrwtuk1 |  1.570268         0  3.025512  .8695618
wales new sample    xrwtuk1 |  .2765412         0  5.054481  .2173516
scotland new sam    xrwtuk1 |  .4626941         0  3.753553  .3132991
n.i. new sample     xrwtuk1 |  .1408289         0  .7508973  .0602463
---------------------------------------------------------------------
.
. // Examine the variable of interest - wage
. summ wage, de
                      usual hourly wage
-------------------------------------------------------------
      Percentiles      Smallest
 1%      2.11209        .251938
 5%     3.979806        .503876
10%     4.534883        .503876       Obs                8038
25%     5.818429       .5436955       Sum of Wgt.        8038

50%     8.169528                      Mean           9.855252
                        Largest       Std. Dev.      7.044385
75%     12.05271       116.3686
90%     17.00976        117.632       Variance       49.62336
95%     20.68929       152.9346       Skewness       8.247111
99%     31.97686       238.9328       Kurtosis       188.2287
.
. // Why is this missing for some people?
. tab employed if wage==., m
 whether in |
       paid |
 employment |
  last week |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      6,372       92.90       92.90
          1 |        487        7.10      100.00
------------+-----------------------------------
      Total |      6,859      100.00

. tab ivfio if employed==1 & wage==., m

   individual interview outcome |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                proxy interview |        183       37.58       37.58
                telephone intvw |        304       62.42      100.00
--------------------------------+-----------------------------------
                          Total |        487      100.00
.
. // Compute unweighted mean, standard errors and confidence interval for wage.
. summ wage
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |      8038    9.855252     7.044385   .251938   238.9328

. ci wage

    Variable |        Obs        Mean   Std. Err.        [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       8038    9.855252    .0785722         9.70123    10.00927
.
. // Compute weighted mean, standard errors, confidence interval and standard error for wage.
. summ wage [aweight=xrwtuk1]
    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
        wage |    7924  8293.86803    10.28872   7.357536    .251938   238.9328

. ci wage [aweight=xrwtuk1]

    Variable |        Obs        Mean   Std. Err.        [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       7924    10.28872    .0826533         10.1267    10.45074
.
. // What happens if you use -pweight- instead?
. capture noisily summ wage [pweight=xrwtuk1]
pweight not allowed
. capture noisily ci wage [pweight=xrwtuk1]
pweight not allowed
.
.
. // Compute weighted mean, standard errors, confidence interval and standard deviation for wage, correctly informing Stata that the weights are
> probability weights
. // but again not correcting for the sample design, i.e. assuming that the sample is a simple random sample. This will produce correct UK popu
> lation estimate of mean
. // monthly pay but the standard error of the estimate will be incorrect as the BHPS is a stratified and clustered sample.
.
. // First inform Stata about the design variables and then compute the weighted means etc.
. svyset [pweight = xrwtuk1]

      pweight: xrwtuk1
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

. svy: mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =    8926
Number of PSUs   =    8926          Population size  = 8293.87
                                    Design df        =    8925

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.28872   .1063986      10.08015    10.49729
--------------------------------------------------------------

. estat sd

-------------------------------------
             |       Mean   Std. Dev.
-------------+-----------------------
        wage |   10.28872    7.357536
-------------------------------------
.
. // What happens if you use aweight instead?
. capture noisily svyset [aweight = lxrwtuk1]
aweight not allowed
.
. // Compute weighted mean, standard errors and confidence interval for wage after informing Stata of the correct sample design.
. // First inform Stata about the design variables. (But before doing that remember to clear Stata’s memory of any existing design information)
> and then compute the weighted means etc.
. svyset, clear
. svyset [pweight = xrwtuk1], psu(psu) strata(strata)

      pweight: xrwtuk1
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

. svy: mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     121          Number of obs    =    8926
Number of PSUs   =     399          Population size  = 8293.87
                                    Design df        =     278

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.28872          .             .           .
--------------------------------------------------------------
Note: missing standard error because of stratum with single
      sampling unit.
.
. // This returns mean income, but does not return standard error or confidence interval: Find out why?.
. svydes

Survey: Describing stage 1 sampling units

      pweight: xrwtuk1
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

                                      #Obs per Unit
                              ----------------------------
 Stratum    #Units      #Obs       min      mean       max
--------  --------  --------  --------  --------  --------
      -8        1*      2612      2612    2612.0      2612
       1         3        63        14      21.0        29
       2         3        57        17      19.0        23
       3         4        68         6      17.0        30
       4         3        53         8      17.7        28
       5         3        60        14      20.0        24
       6         3       105        33      35.0        37
       7         3        73        20      24.3        30
       8         3       103        31      34.3        37
       9         3        49         7      16.3        21
      10         3        88        16      29.3        40
      11         8       225        15      28.1        39
      12         8       256        14      32.0        51
      13         8       293        24      36.6        46
      14         8       278        24      34.8        50
      15         8       204         8      25.5        35
      16         8       241        18      30.1        51
      17         3        89        22      29.7        34
      18         4       168        32      42.0        48
      19         3       110        25      36.7        44
      20         4       117        16      29.3        41
      21         3       110        35      36.7        40
      22         4       162        30      40.5        50
      23         2        78        38      39.0        40
      24         2        91        43      45.5        48
      25         3        94        24      31.3        38
      26         2        98        45      49.0        53
      27         3       104        32      34.7        36
      28         3        88        27      29.3        32
      29         3       116        21      38.7        53
      30         3       120        27      40.0        55
      31         3       126        20      42.0        60
      32         3       110        35      36.7        39
      33         3        72        22      24.0        27
      34         3        71        16      23.7        33
      35         3        74        19      24.7        28
      36         3        55        15      18.3        21
      37         2        72        36      36.0        36
      38         3        73        22      24.3        27
      39         3       149        39      49.7        60
      40         3       108        32      36.0        39
      41         3        97        28      32.3        37
      42         3        89        23      29.7        38
      43         3        83        20      27.7        37
      44         3        51        10      17.0        23
      45         3        80        26      26.7        27
      46         3       119        33      39.7        50
      47         3        87        20      29.0        36
      48         3       112        35      37.3        41
      49         2        76        37      38.0        39
      50         3        98        21      32.7        39
      51         3       106        20      35.3        55
      52         3       125        35      41.7        51
      53         2        73        27      36.5        46
      54         3        92        26      30.7        38
      55         2        57        28      28.5        29
      56         2        48        13      24.0        35
      57         2        60        23      30.0        37
      58         2        69        30      34.5        39
      59         3       132        34      44.0        52
      60         3       115        29      38.3        53
      61         3        80        21      26.7        35
      62         2        70        27      35.0        43
      63         2        80        33      40.0        47
      64         2        87        38      43.5        49
      65         3       126        38      42.0        47
      66         3        77        22      25.7        32
      67         3        74        15      24.7        34
      68         3       123        17      41.0        57
      69         3       130        41      43.3        45
      70         4       128        17      32.0        44
      71         4       105        20      26.3        33
      72         3       102        27      34.0        41
      73         4       107        14      26.8        42
      74         3        79         7      26.3        42
      75         4       133        19      33.3        50
     101         3        78        20      26.0        34
     102         5       116        19      23.2        33
     103         4        93         7      23.3        30
     104         4       133        28      33.3        40
     105         3       115        36      38.3        41
     106         4       118        26      29.5        35
     107         3        92        28      30.7        35
     108         4       112        19      28.0        35
     109         3        92        25      30.7        40
     110         3        84        23      28.0        32
     111         3        90        24      30.0        38
     112         4       130        20      32.5        45
     113         3        66        15      22.0        26
     114         3        83        20      27.7        34
     115         2        67        30      33.5        37
     116         3        54        16      18.0        21
     117         3        92        25      30.7        37
     118         2        66        26      33.0        40
     119         3        92        19      30.7        39
     120         3        94        19      31.3        39
     121         2        58        27      29.0        31
     122         5       168        21      33.6        40
     123         2        65        29      32.5        36
     124         3       105        31      35.0        38
     131         3        87        26      29.0        34
     132         2        56        28      28.0        28
     133         4        81         9      20.3        26
     134         3       122        22      40.7        63
     135         3        85        26      28.3        30
     136         4        86        18      21.5        26
     137         4       111         5      27.8        44
     138         3        60        18      20.0        23
     139         4        90        15      22.5        34
     140         4        99        16      24.8        39
     141         3       108        32      36.0        39
     142         4       125        24      31.3        37
     143         3        82        21      27.3        35
     144         4       105        21      26.3        35
     145         3       105        32      35.0        37
     146         4       124        27      31.0        37
     147         3        97        26      32.3        40
     148         4       119        20      29.8        43
     149         3       138        41      46.0        54
     150         4       116        22      29.0        43
     151         3        85        21      28.3        37
--------  --------  --------  --------  --------  --------
     121       400     14897         5      37.2      2612
.
. // You will find that there is a stratum (-8) with just 1 unit (psu) within it. Which region or sample is that?
. tab1 memorig if strata==-8

-> tabulation of memorig if strata==-8

          sample origin |      Freq.     Percent        Cum.
------------------------+-----------------------------------
        n.i. new sample |      2,612      100.00      100.00
------------------------+-----------------------------------
                  Total |      2,612      100.00
.
. // Exclude that sample from the analysis
. svy: mean wage if memorig ~= 7
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     120          Number of obs    =    7527
Number of PSUs   =     398          Population size  = 8101.44
                                    Design df        =     278

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.31203   .1224545      10.07097    10.55308
--------------------------------------------------------------
.
.
.
.
. // Compute the different weighted and unweighted mean wage for the different countries (England, Scotland, Wales and Northern Ireland)
. tab country memorig, missing // (optional) How does country compare with memorig?

                 |                sample origin
 countries of UK |  original  wales new   scotland   n.i. new |     Total
-----------------+--------------------------------------------+----------
         England |     6,697         58         25          0 |     6,780
           Wales |       444      2,127          3          0 |     2,574
        Scotland |       634          0      2,056          0 |     2,690
Northern Ireland |         0          0          0      2,612 |     2,612
               . |       166         21         54          0 |       241
-----------------+--------------------------------------------+----------
           Total |     7,941      2,206      2,138      2,612 |    14,897
.
. // Look at distributions of (cross-section) RESPONDENT weights and see how these vary by country of residence (not by sample):
. tabstat xrwtuk1, stat(mean min max sd count) by(country) longstub nototal

country            variable |      mean       min       max        sd         N
----------------------------+--------------------------------------------------
England             xrwtuk1 |  1.750575         0  5.054481  .7905738      6780
Wales               xrwtuk1 |  .2686751         0  2.653648   .156446      2574
Scotland            xrwtuk1 |   .447012         0  1.837523  .2589535      2690
Northern Ireland    xrwtuk1 |  .1408289         0  .7508973  .0602463      2612
-------------------------------------------------------------------------------
.
. // Compute the unweighted mean wage for each country.
. bysort country: ci wage

--------------------------------------------------------------------------------
-> country = England

    Variable |        Obs        Mean   Std. Err.        [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       3889    10.36125    .1262336        10.11376    10.60874

--------------------------------------------------------------------------------
-> country = Wales

    Variable |        Obs        Mean   Std. Err.        [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       1208    8.707197    .1525756        8.407855     9.00654

--------------------------------------------------------------------------------
-> country = Scotland

    Variable |        Obs        Mean   Std. Err.        [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       1451     9.79849    .1594425        9.485727    10.11125

--------------------------------------------------------------------------------
-> country = Northern Ireland

    Variable |        Obs        Mean   Std. Err.        [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       1324    9.330372    .1879657        8.961628    9.699115

--------------------------------------------------------------------------------
-> country = .

    Variable |        Obs        Mean   Std. Err.        [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |        166    11.03793    .4429368        10.16338    11.91249
.
. // Drop Northern Ireland sub-sample
. drop if memorig==7
(2612 observations deleted)
.
. // Drop missing country cases
. drop if country==.
(241 observations deleted)
.
.
.
. // Compute the weighted mean of wage for each country after telling Stata that the weights are probability weights and correcting for sample d
> esign.
. svyset [pweight = xrwtuk1], psu (psu) strata (strata)

      pweight: xrwtuk1
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

.
. ** Use the if option
. svy: mean wage if country==1
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      94          Number of obs    =    4255
Number of PSUs   =     257          Population size  = 6836.02
                                    Design df        =     163

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.40639          .             .           .
--------------------------------------------------------------
Note: 4 strata omitted because they contain no population
      members.
Note: missing standard error because of stratum with single
      sampling unit.

. svy: mean wage if country==2
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      49          Number of obs    =    1439
Number of PSUs   =     111          Population size  = 351.385
                                    Design df        =      62

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |    8.81579          .             .           .
--------------------------------------------------------------
Note: 1 stratum omitted because it contains no population
      members.
Note: missing standard error because of stratum with single
      sampling unit.

. svy: mean wage if country==3
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      33          Number of obs    =    1641
Number of PSUs   =     100          Population size  = 691.216
                                    Design df        =      67

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   9.738896          .             .           .
--------------------------------------------------------------
Note: 1 stratum omitted because it contains no population
      members.
Note: missing standard error because of stratum with single
      sampling unit.
.
. ** Use the subpop option
. svy, subpop(if country==1): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      94          Number of obs    =    7136
Number of PSUs   =     314          Population size  = 7833.91
                                    Subpop. no. obs  =    3821
                                    Subpop. size     = 6836.02
                                    Design df        =     220

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.40639   .1409328      10.12864    10.68414
--------------------------------------------------------------
Note: 26 strata omitted because they contain no subpopulation
      members.

. svy, subpop(if country==2): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      49          Number of obs    =    3882
Number of PSUs   =     163          Population size  = 4076.64
                                    Subpop. no. obs  =    1182
                                    Subpop. size     = 351.385
                                    Design df        =     114

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |    8.81579   .1765602      8.466026    9.165555
--------------------------------------------------------------
Note: 71 strata omitted because they contain no subpopulation
      members.

. svy, subpop(if country==3): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      33          Number of obs    =    2266
Number of PSUs   =     112          Population size  = 1607.04
                                    Subpop. no. obs  =    1433
                                    Subpop. size     = 691.216
                                    Design df        =      79

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   9.738896    .222134      9.296749    10.18104
--------------------------------------------------------------
Note: 87 strata omitted because they contain no subpopulation
      members.
.
. ** Use the over option
. svy: mean wage, over(country)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     120          Number of obs    =    7346
Number of PSUs   =     398          Population size  = 7878.62
                                    Design df        =     278

      England: country = England
        Wales: country = Wales
     Scotland: country = Scotland

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
wage         |
     England |   10.40639   .1409328      10.12896    10.68382
       Wales |    8.81579   .1765602      8.468226    9.163355
    Scotland |   9.738896    .222134      9.301618    10.17617
--------------------------------------------------------------
.
.
. // Compute the weighted mean of wage for men and women in the four countries
. ** Use the over option
. svy: mean wage, over(country sex)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     120          Number of obs    =    7346
Number of PSUs   =     398          Population size  = 7878.62
                                    Design df        =     278

         Over: country sex
    _subpop_1: England male
    _subpop_2: England female
    _subpop_3: Wales male
    _subpop_4: Wales female
    _subpop_5: Scotland male
    _subpop_6: Scotland female

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
wage         |
   _subpop_1 |   11.66497    .209374      11.25281    12.07713
   _subpop_2 |   9.133208   .1466152      8.844591    9.421825
   _subpop_3 |   9.775749    .253263      9.277192    10.27431
   _subpop_4 |   7.960065   .1906461      7.584771    8.335358
   _subpop_5 |   10.86205   .3361621       10.2003    11.52379
   _subpop_6 |   8.702004   .2262984      8.256528     9.14748
--------------------------------------------------------------
.
. ** Use the subpop option
. svy, subpop(if country==1 & sex==1): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      86          Number of obs    =    7847
Number of PSUs   =     288          Population size  = 10641.1
                                    Subpop. no. obs  =    1855
                                    Subpop. size     = 3437.72
                                    Design df        =     202

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   11.66497    .209374      11.25213     12.0778
--------------------------------------------------------------
Note: 34 strata omitted because they contain no subpopulation
      members.
. svy, subpop(if country==1 & sex==2): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      87          Number of obs    =    7424
Number of PSUs   =     289          Population size  =  9552.9
                                    Subpop. no. obs  =    1966
                                    Subpop. size     = 3398.29
                                    Design df        =     202

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   9.133208   .1466152      8.844116    9.422301
--------------------------------------------------------------
Note: 33 strata omitted because they contain no subpopulation
      members.
. svy, subpop(if country==2 & sex==1): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      44          Number of obs    =    4046
Number of PSUs   =     148          Population size  = 3585.01
                                    Subpop. no. obs  =     539
                                    Subpop. size     = 165.607
                                    Design df        =     104

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   9.775749    .253263      9.273519    10.27798
--------------------------------------------------------------
Note: 76 strata omitted because they contain no subpopulation
      members.
. svy, subpop(if country==2 & sex==2): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      47          Number of obs    =    4195
Number of PSUs   =     158          Population size  = 3977.85
                                    Subpop. no. obs  =     643
                                    Subpop. size     = 185.779
                                    Design df        =     111

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   7.960065   .1906461      7.582287    8.337842
--------------------------------------------------------------
Note: 73 strata omitted because they contain no subpopulation
      members.
. svy, subpop(if country==3 & sex==1): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      32          Number of obs    =    2788
Number of PSUs   =     110          Population size  =  1735.1
                                    Subpop. no. obs  =     669
                                    Subpop. size     = 331.806
                                    Design df        =      78

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.86205   .3360138      10.19309      11.531
--------------------------------------------------------------
Note: 88 strata omitted because they contain no subpopulation
      members.
. svy, subpop(if country==3 & sex==2): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      30          Number of obs    =    2419
Number of PSUs   =     105          Population size  = 1339.54
                                    Subpop. no. obs  =     764
                                    Subpop. size     = 359.409
                                    Design df        =      75

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   8.702004   .2261375      8.251515    9.152493
--------------------------------------------------------------
Note: 90 strata omitted because they contain no subpopulation
      members.
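. // Sketch, not part of the original run (no output shown): for English men the
. // -subpop()- mean and standard error match the -over()- results, but the
. // interval differs slightly because the -subpop()- run omits the strata with no
. // subpopulation members and so reports a smaller design df (202 rather than
. // 278). Assuming the limits are computed as mean -/+ t(df) * std. err., the two
. // lower limits logged above, 11.25281 and 11.25213, should be reproduced up to
. // rounding by:
. display 11.66497 - invttail(278, 0.025)*.209374
. display 11.66497 - invttail(202, 0.025)*.209374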
.
. // Test differences in pay across the different countries.
. svy: mean wage, over(country)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     120          Number of obs    =    7346
Number of PSUs   =     398          Population size  = 7878.62
                                    Design df        =     278

      England: country = England
        Wales: country = Wales
     Scotland: country = Scotland

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
wage         |
     England |   10.40639   .1409328      10.12896    10.68382
       Wales |    8.81579   .1765602      8.468226    9.163355
    Scotland |   9.738896    .222134      9.301618    10.17617
--------------------------------------------------------------
. test [wage]England = [wage]Scotland = [wage]Wales

Adjusted Wald test

 ( 1)  [wage]England - [wage]Scotland = 0
 ( 2)  [wage]England - [wage]Wales = 0

       F(  2,   277) =   24.07
            Prob > F =    0.0000
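. // Sketch, not part of the original run (no output shown): the joint test above
. // indicates only that the three country means are not all equal. Assuming
. // -lincom- accepts the same [wage]label references that -test- uses after
. // -svy: mean-, each pairwise difference and its design-based standard error
. // could be examined with:
. lincom [wage]England - [wage]Wales
. lincom [wage]England - [wage]Scotland
. lincom [wage]Scotland - [wage]Wales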
.
. // Test gender differences in pay across the different countries.
. svy: mean wage, over(country sex)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     120          Number of obs    =    7346
Number of PSUs   =     398          Population size  = 7878.62
                                    Design df        =     278

         Over: country sex
    _subpop_1: England male
    _subpop_2: England female
    _subpop_3: Wales male
    _subpop_4: Wales female
    _subpop_5: Scotland male
    _subpop_6: Scotland female

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
wage         |
   _subpop_1 |   11.66497    .209374      11.25281    12.07713
   _subpop_2 |   9.133208   .1466152      8.844591    9.421825
   _subpop_3 |   9.775749    .253263      9.277192    10.27431
   _subpop_4 |   7.960065   .1906461      7.584771    8.335358
   _subpop_5 |   10.86205   .3361621       10.2003    11.52379
   _subpop_6 |   8.702004   .2262984      8.256528     9.14748
--------------------------------------------------------------
. test [wage]_subpop_1=[wage]_subpop_2

Adjusted Wald test

 ( 1)  [wage]_subpop_1 - [wage]_subpop_2 = 0

       F(  1,   278) =  122.84
            Prob > F =    0.0000
. test [wage]_subpop_3=[wage]_subpop_4

Adjusted Wald test

 ( 1)  [wage]_subpop_3 - [wage]_subpop_4 = 0

       F(  1,   278) =   45.08
            Prob > F =    0.0000
. test [wage]_subpop_5=[wage]_subpop_6

Adjusted Wald test

 ( 1)  [wage]_subpop_5 - [wage]_subpop_6 = 0

       F(  1,   278) =   35.44
            Prob > F =    0.0000
.
.
. // Compute design effects and design factor
. quietly svy: mean wage
. estat effects, deff deft
----------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.       DEFF      DEFT
-------------+--------------------------------------------
        wage |   10.27689   .1249507     1.84038   1.35661
----------------------------------------------------------
.
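. // Sketch, not part of the original run (no output shown): DEFF compares
. // variances and DEFT compares standard errors, so DEFT should be close to the
. // square root of DEFF. Assuming that relationship holds for this design (no
. // finite population correction), the first line should give roughly 1.35661.
. // The second divides the linearized std. err. by DEFT to approximate the
. // standard error that a simple random sample of the same size would give.
. display sqrt(1.84038)
. display .1249507/1.35661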
.
. // [Optional] Plot the mean and its confidence interval, unweighted and then
. // weighted, using -ciplot-. Use -findit ciplot- to locate and install it.
. ciplot wage, by(country) saving(graph1, replace)
(file graph1.gph saved)
. ciplot wage [aw=xrwtuk1], by(country) saving(graph2, replace)
(file graph2.gph saved)
.
. ** Including Northern Ireland
. use Week2Lecture1, clear
. replace psu=hid if memorig==7
psu was int now long
(2612 real changes made)
. svyset [pweight = xrwtuk1], psu (psu) strata (strata)

      pweight: xrwtuk1
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>
. svy: mean wage, over(country)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     121          Number of obs    =    8745
Number of PSUs   =    1301          Population size  = 8071.04
                                    Design df        =    1180

      England: country = England
        Wales: country = Wales
     Scotland: country = Scotland
    _subpop_4: country = Northern Ireland

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
wage         |
     England |   10.40639   .1409328      10.12988     10.6829
       Wales |    8.81579   .1765602      8.469383    9.162197
    Scotland |   9.738896    .222134      9.303074    10.17472
   _subpop_4 |   9.307484   .1992926      8.916477    9.698492
--------------------------------------------------------------
.
. log close
      name:  <unnamed>
       log:  D:\Home\anandi\1-Courses\EC969\SL-AN\DataPrep\Week2Lecture1.log
  log type:  text
 closed on:   1 Mar 2011, 15:23:53