Third International Conference on
Establishment Surveys
Montreal, Quebec
June 18-21, 2007
Presented by:
Kirk Wolter, NORC and the
University of Chicago
Outline of Lecture –
Introduction (Chapter 1)
Textbook Methods (Chapter 1)
Replication-Based Methods
Random Group (Chapter 2)
Balanced Half-Samples (Chapter 3)
Jackknife (Chapter 4)
Bootstrap (Chapter 5)
Taylor Series (Chapter 6)
Generalized Variance Functions (Chapter 7)
Chapter 1: Introduction
1. Finite population, $U = \{1, \ldots, N\}$
- Residents of Canada
- Restaurants in Montreal
- Farms in Quebec
- Schools in Ottawa
2. Sample, $s$
- Simple random sampling, without replacement
- Systematic sampling
- Stratification
- Clustering
- Double sampling
Chapter 1: Introduction
5. Probability sampling design, $P$
- $P(s) \ge 0$
- $\sum_{s} P(s) = 1$
8. Characteristic of interest, $Y_i$
- $Y_i = 1$ if the $i$th resident is employed, $Y_i = 0$ if not employed
- $Y_i$ = yield in tons of the $i$th farm
Chapter 1: Introduction
12. Parameter, $\theta$
- Proportion of residents who are employed
- Total production of farms
- Trend in price index for restaurants
- Regression of sales on area for pharmacies
13. Estimator, $\hat{\theta}$
- $\hat{\theta} = \hat{\theta}(s)$, a function of the sample
Chapter 1: Introduction
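14. Expectation and variance
- $E\{\hat{\theta}\} = \sum_{s} P(s)\,\hat{\theta}(s)$
- $\mathrm{Var}\{\hat{\theta}\} = E\{(\hat{\theta} - E\{\hat{\theta}\})^2\} = \sum_{s} P(s)\,[\hat{\theta}(s) - E\{\hat{\theta}\}]^2$
16. Estimator of variance, $v(\hat{\theta})$
- $E\{v(\hat{\theta})\} = \mathrm{Var}\{\hat{\theta}\}$
- $P\{\,|v(\hat{\theta})/\mathrm{Var}\{\hat{\theta}\} - 1| > \varepsilon\,\} \to 0$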
Textbook Methods
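1. Design: srs wor of size $n$
Estimator: $\hat{Y} = f^{-1} \sum_{i=1}^{n} y_i$, where $f = n/N$
Variance Estimator: $v(\hat{Y}) = N^2 (1 - f)\, s^2 / n$, where
$s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)$ and $\bar{y} = \sum_{i=1}^{n} y_i / n$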
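The srs wor estimator of a total and its textbook variance estimator can be computed in a few lines. The following is a minimal sketch and not part of the original slides; the function name `srswor_total` and the sample values are illustrative assumptions.

```python
from statistics import variance

def srswor_total(y, N):
    """Expansion estimator of a total under srs wor, with its textbook variance estimator."""
    n = len(y)
    f = n / N                      # sampling fraction f = n/N
    y_hat = sum(y) / f             # f^{-1} times the sample sum
    s2 = variance(y)               # sample variance with divisor n - 1
    v = N**2 * (1 - f) * s2 / n    # N^2 (1 - f) s^2 / n
    return y_hat, v

# Hypothetical sample of n = 4 units from a population of N = 20.
y_hat, v = srswor_total([3.0, 5.0, 7.0, 9.0], N=20)
print(y_hat, v)
```

The factor `1 - f` is the finite population correction; it drops out when sampling is with replacement.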
Textbook Methods
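2. Design: srs wor at both the first and second stages of sampling
Estimator:
$$\hat{Y} = f_1^{-1} \sum_{i=1}^{n} f_{2i}^{-1} \sum_{j=1}^{m_i} y_{ij}, \qquad f_1 = n/N, \qquad f_{2i} = m_i / M_i$$
Variance Estimator:
$$v(\hat{Y}) = N^2 (1 - f_1)\, \frac{1}{n} \sum_{i=1}^{n} \frac{(M_i \bar{y}_{i.} - \hat{Y}/N)^2}{n - 1} + \frac{N}{n} \sum_{i=1}^{n} M_i^2 (1 - f_{2i})\, s_i^2 / m_i$$
$$s_i^2 = \sum_{j=1}^{m_i} (y_{ij} - \bar{y}_{i.})^2 / (m_i - 1), \qquad \bar{y}_{i.} = \sum_{j=1}^{m_i} y_{ij} / m_i$$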
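The two-stage formulas split the variance estimator into a between-PSU term and a within-PSU term. The sketch below illustrates that split; it is not from the original slides, and the function name `two_stage_total` and the data layout (a list of `(M_i, observations)` pairs) are assumptions made for the example.

```python
from statistics import mean, variance

def two_stage_total(clusters, N):
    """clusters: list of (M_i, [y_i1, ..., y_im_i]) for the n sampled PSUs; N: PSUs in the population."""
    n = len(clusters)
    f1 = n / N
    # Estimated PSU totals M_i * ybar_i.
    psu_totals = [M * mean(ys) for M, ys in clusters]
    y_hat = sum(psu_totals) / f1
    # Between-PSU term: N^2 (1 - f1) * sum (M_i ybar_i. - Y_hat/N)^2 / (n (n - 1)).
    between = N**2 * (1 - f1) * sum((t - y_hat / N) ** 2 for t in psu_totals) / (n * (n - 1))
    # Within-PSU term: (N/n) * sum M_i^2 (1 - f2i) s_i^2 / m_i.
    within = (N / n) * sum(
        M**2 * (1 - len(ys) / M) * variance(ys) / len(ys) for M, ys in clusters
    )
    return y_hat, between + within

# Hypothetical data: 3 PSUs sampled out of N = 6, two elements observed in each.
y_hat, v = two_stage_total([(10, [1, 3]), (20, [2, 4]), (10, [5, 5])], N=6)
print(y_hat, v)
```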
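Replication-Based Methods
$$v(\hat{\theta}) = C \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta})^2$$
where $\hat{\theta}_\alpha$ is the estimator computed from the $\alpha$th replicate, $k$ is the number of replicates, and $C$ is a constant that depends on the method.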
Chapter 2: The Method of Random Groups
Interpenetrating samples
Replicated samples
Ultimate cluster
Resampling
Random groups
Chapter 2: The Method of Random Groups
(i) Draw a sample, $s_1$
No restrictions on the sampling methodology
(ii) Replace the first sample
Draw second sample, $s_2$
Use same sampling methodology
Repeat until $k \ge 2$ samples are obtained: $s_1, s_2, \ldots, s_k$
Chapter 2: The Method of Random Groups
Common estimation procedure:
Editing procedures
Adjustments for nonresponse
Outlier procedures
Estimator of parameter
Chapter 2: The Method of Random Groups
Common measurement process:
Field work
Callbacks
Clerical screening and coding
Conversion to machine-readable form
Chapter 2: The Method of Random Groups
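Estimators of the Parameter of Interest, $\theta$:
Random group estimators: $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k$
Overall estimators: $\hat{\bar{\theta}} = \frac{1}{k} \sum_{\alpha=1}^{k} \hat{\theta}_\alpha$ and $\hat{\theta}$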
Chapter 2: The Method of Random Groups
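Two Examples:
Population total, $Y = \sum_{i=1}^{N} Y_i$
- $\hat{Y}_\alpha = \sum_{i \in s_\alpha} W_{\alpha i} Y_i$
- $\hat{\bar{Y}} = \frac{1}{k} \sum_{\alpha=1}^{k} \hat{Y}_\alpha$
- $\hat{Y} = \sum_{i \in s} W_i Y_i$
Ratio, $R = Y/X$
- $\hat{R}_\alpha = \hat{Y}_\alpha / \hat{X}_\alpha$
- $\hat{\bar{R}} = \frac{1}{k} \sum_{\alpha=1}^{k} \hat{R}_\alpha$
- $\hat{R} = \hat{Y} / \hat{X}$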
Chapter 2: The Method of Random Groups
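Estimators of $\mathrm{Var}\{\hat{\bar{\theta}}\}$ or $\mathrm{Var}\{\hat{\theta}\}$:
$$v(\hat{\bar{\theta}}) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\bar{\theta}})^2$$
$$v_1(\hat{\theta}) = v(\hat{\bar{\theta}})$$
$$v_2(\hat{\theta}) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta})^2$$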
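Given the $k$ random group estimates, both variance estimators are simple sums of squared deviations. The following sketch is an illustration only; the function name `random_group_variance` and the numeric inputs are hypothetical.

```python
from statistics import mean

def random_group_variance(theta_groups, theta_full):
    """Return (theta_bar, v1, v2) from k random group estimates and the full-sample estimate."""
    k = len(theta_groups)
    theta_bar = mean(theta_groups)                      # mean of the k group estimates
    # v1: squared deviations about the mean of the group estimates.
    v1 = sum((t - theta_bar) ** 2 for t in theta_groups) / (k * (k - 1))
    # v2: squared deviations about the full-sample estimator.
    v2 = sum((t - theta_full) ** 2 for t in theta_groups) / (k * (k - 1))
    return theta_bar, v1, v2

# Hypothetical estimates from k = 4 random groups and from the full sample.
theta_bar, v1, v2 = random_group_variance([10.0, 12.0, 14.0, 16.0], theta_full=13.5)
print(theta_bar, v1, v2)
```

Because the mean minimizes the sum of squared deviations, `v2` is never smaller than `v1`; the two coincide when the full-sample estimator equals the mean of the group estimates, as it does for linear estimators.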
Chapter 2: The Method of Random Groups
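Properties:
$$E\{v(\hat{\bar{\theta}})\} = \mathrm{Var}\{\hat{\bar{\theta}}\}$$
$$CV\{v(\hat{\bar{\theta}})\} = \frac{[\mathrm{Var}\{v(\hat{\bar{\theta}})\}]^{1/2}}{\mathrm{Var}\{\hat{\bar{\theta}}\}} = \left\{ \frac{\beta_4(\hat{\theta}_1) - (k-3)/(k-1)}{k} \right\}^{1/2}$$
where $\beta_4(\hat{\theta}_1)$ is the kurtosis of the distribution of a random group estimator.
$$v_2(\hat{\theta}) \ge v_1(\hat{\theta})$$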
Chapter 2: The Method of Random Groups
Confidence Intervals:
$$\left( \hat{\theta} - c \sqrt{v(\hat{\theta})},\ \hat{\theta} + c \sqrt{v(\hat{\theta})} \right)$$
$c = z_{\alpha/2}$ or $t_{k-1,\,\alpha/2}$
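The interval is the point estimate plus or minus a multiplier times the estimated standard error. The sketch below uses the normal multiplier $z_{\alpha/2}$ only, because the Python standard library provides a normal quantile but no Student's $t$ quantile; with few random groups the $t_{k-1}$ multiplier is larger and would need another source. The function name `normal_interval` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def normal_interval(theta_hat, v, alpha=0.05):
    """Two-sided interval theta_hat +/- z_{alpha/2} * sqrt(v)."""
    c = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of the standard normal
    half_width = c * sqrt(v)
    return theta_hat - half_width, theta_hat + half_width

# Hypothetical estimate and variance estimate.
lower, upper = normal_interval(13.0, 4.0)
print(lower, upper)
```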
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
$L$ strata
$N_h$ units per stratum
$N$ size of entire population
$n_h = 2$ units selected per stratum, srs wr
Example: restaurants in Montreal
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
$\bar{Y}$ = average number of customers served by Montreal restaurants on a Monday night
$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$$
$W_h = N_h / N$
$\bar{y}_h = (y_{h1} + y_{h2}) / 2$
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
Textbook Estimator of Variance
$$v(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 s_h^2 / 2 = \sum_{h=1}^{L} W_h^2 d_h^2 / 4$$
$$s_h^2 = \sum_{i=1}^{2} (y_{hi} - \bar{y}_h)^2, \qquad d_h = y_{h1} - y_{h2}$$
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
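Random Group Estimator of Variance
$k = 2$ independent random groups are available: $(y_{11}, y_{21}, \ldots, y_{L1})$ and $(y_{12}, y_{22}, \ldots, y_{L2})$
$$v_{RG}(\bar{y}_{st}) = \frac{1}{2(1)} \sum_{\alpha=1}^{2} (\bar{y}_{st,\alpha} - \bar{y}_{st})^2 = (\bar{y}_{st,1} - \bar{y}_{st,2})^2 / 4$$
$$\bar{y}_{st,\alpha} = \sum_{h=1}^{L} W_h y_{h\alpha}$$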
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
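Half-Sample Methodology
$\delta_{h1\alpha} = 1$ if unit $(h, 1)$ is selected for the $\alpha$th half-sample, $0$ otherwise; $\delta_{h2\alpha} = 1 - \delta_{h1\alpha}$
$2^L$ possible half-samples
$$\bar{y}_{st,\alpha} = \sum_{h=1}^{L} W_h (\delta_{h1\alpha} y_{h1} + \delta_{h2\alpha} y_{h2})$$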
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
Choosing a Manageable Number, $k$, of Half-Samples
$$v_k(\bar{y}_{st}) = \sum_{\alpha=1}^{k} (\bar{y}_{st,\alpha} - \bar{y}_{st})^2 / k$$
- $k$ random half-samples
- $k$ balanced half-samples
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
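Table 3.2.1. Definition of Balanced Half-Sample Replicates for 5, 6, 7, or 8 Strata

                    Stratum (h)
Replicate           1   2   3   4   5   6   7   8
$\delta_h^{(1)}$   +1  -1  -1  +1  -1  +1  +1  -1
$\delta_h^{(2)}$   +1  +1  -1  -1  +1  -1  +1  -1
$\delta_h^{(3)}$   +1  +1  +1  -1  -1  +1  -1  -1
$\delta_h^{(4)}$   -1  +1  +1  +1  -1  -1  +1  -1
$\delta_h^{(5)}$   +1  -1  +1  +1  +1  -1  -1  -1
$\delta_h^{(6)}$   -1  +1  -1  +1  +1  +1  -1  -1
$\delta_h^{(7)}$   -1  -1  +1  -1  +1  +1  +1  -1
$\delta_h^{(8)}$   -1  -1  -1  -1  -1  -1  -1  -1

An entry of +1 places unit $(h, 1)$ in the half-sample; an entry of -1 places unit $(h, 2)$ in it.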
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
Properties of the Balanced Half-Sample Methods
$$v_k(\bar{y}_{st}) = v(\bar{y}_{st})$$
$$\frac{1}{k} \sum_{\alpha=1}^{k} \bar{y}_{st,\alpha} = \bar{y}_{st}, \quad \text{provided } k > L$$
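Both properties can be checked numerically with the replicate pattern of Table 3.2.1. The sketch below is an illustration, not part of the slides: the function name `balanced_half_sample_variance`, the stratum weights, and the observations are hypothetical, and the first $L$ columns of the 8 x 8 pattern are used for $L = 5$ strata.

```python
from statistics import mean

# Rows are replicates, columns are strata: +1 picks unit (h,1), -1 picks unit (h,2).
H8 = [
    [+1, -1, -1, +1, -1, +1, +1, -1],
    [+1, +1, -1, -1, +1, -1, +1, -1],
    [+1, +1, +1, -1, -1, +1, -1, -1],
    [-1, +1, +1, +1, -1, -1, +1, -1],
    [+1, -1, +1, +1, +1, -1, -1, -1],
    [-1, +1, -1, +1, +1, +1, -1, -1],
    [-1, -1, +1, -1, +1, +1, +1, -1],
    [-1, -1, -1, -1, -1, -1, -1, -1],
]

def balanced_half_sample_variance(y1, y2, W, H=H8):
    """Return (y_st, replicate estimates, v_k, textbook v) for a 2-per-stratum design."""
    L = len(W)
    if L > len(H[0]):
        raise ValueError("pattern has too few columns for this many strata")
    y_st = sum(W[h] * (y1[h] + y2[h]) / 2 for h in range(L))
    # Half-sample estimate: one unit per stratum, chosen by the sign in column h.
    reps = [
        sum(W[h] * (y1[h] if row[h] == 1 else y2[h]) for h in range(L))
        for row in H
    ]
    v_k = sum((r - y_st) ** 2 for r in reps) / len(reps)
    v_textbook = sum(W[h] ** 2 * (y1[h] - y2[h]) ** 2 / 4 for h in range(L))
    return y_st, reps, v_k, v_textbook

# Hypothetical data for L = 5 strata.
W = [0.3, 0.2, 0.2, 0.2, 0.1]
y1 = [10.0, 20.0, 15.0, 30.0, 25.0]
y2 = [12.0, 18.0, 19.0, 26.0, 35.0]
y_st, reps, v_k, v_textbook = balanced_half_sample_variance(y1, y2, W)
print(v_k, v_textbook, mean(reps), y_st)
```

The cross-stratum terms cancel because the columns of the pattern are mutually orthogonal, and the replicate mean reproduces $\bar{y}_{st}$ because each of the first seven columns sums to zero.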
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
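$Y$ = total number of employed persons in Canada
$$\hat{Y} = \sum_{h=1}^{L} \left( \frac{\hat{Y}_{h1}}{2 p_{h1}} + \frac{\hat{Y}_{h2}}{2 p_{h2}} \right)$$
$p_{hi}$ = selection probability of the $(h, i)$th PSU, proportional to housing units
$\hat{Y}_{hi}$ = estimator of total number of employed persons in the $(h, i)$th PSU
$$v(\hat{Y}) = \sum_{h=1}^{L} \left( \hat{Y}_{h1}/p_{h1} - \hat{Y}_{h2}/p_{h2} \right)^2 / 4$$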
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
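Balanced Half-Sample Methodology
$$\hat{Y}_\alpha = \sum_{h=1}^{L} \left( \delta_{h1\alpha} \hat{Y}_{h1}/p_{h1} + \delta_{h2\alpha} \hat{Y}_{h2}/p_{h2} \right)$$
$$v_k(\hat{Y}) = \sum_{\alpha=1}^{k} (\hat{Y}_\alpha - \hat{Y})^2 / k$$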
Chapter 3: Variance
Estimation Based on
Balanced Half-Samples
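Alternative Half-Sample Estimators of Variance
$$v_k(\hat{\theta}) = \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta})^2 / k$$
$$v_k^c(\hat{\theta}) = \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha^c - \hat{\theta})^2 / k$$
$$\bar{v}_k(\hat{\theta}) = \left[ v_k(\hat{\theta}) + v_k^c(\hat{\theta}) \right] / 2$$
$$v_k^{\dagger}(\hat{\theta}) = \frac{1}{4k} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta}_\alpha^c)^2$$
where $\hat{\theta}_\alpha^c$ is the estimator computed from the complement of the $\alpha$th half-sample.
Estimators are not necessarily equal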
Chapter 4:
The Jackknife Method
Quenouille (1949) – bias reduction
Tukey (1958) – variance estimation, testing, interval estimation
Resampling
Chapter 4:
The Jackknife Method
Random sample: $y_1, y_2, \ldots, y_n$
Random groups: $n = km$
Parameter: $\theta$ (example: yield per acre of farms in Quebec)
Estimator: $\hat{\theta}$
Chapter 4:
The Jackknife Method
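Drop out $m$: $\hat{\theta}_{(\alpha)}$, $\alpha = 1, \ldots, k$
Pseudovalue: $\hat{\theta}_\alpha = k \hat{\theta} - (k - 1) \hat{\theta}_{(\alpha)}$
Quenouille’s estimator: $\hat{\bar{\theta}} = \frac{1}{k} \sum_{\alpha=1}^{k} \hat{\theta}_\alpha$
Variance estimators:
$$v_1(\hat{\bar{\theta}}) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\bar{\theta}})^2$$
$$v_2(\hat{\bar{\theta}}) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta})^2$$
Special case: $k = n$, $m = 1$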
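As a check on the pseudovalue formula, consider the special case $k = n$, $m = 1$ with $\hat{\theta} = \bar{y}$. The sample mean is chosen here only as an illustration; it does not appear on the slide. The pseudovalues reduce to the observations themselves, and the jackknife variance estimator reduces to the familiar $s^2/n$:

```latex
\begin{align*}
\hat{\theta}_{(\alpha)} &= \frac{n\bar{y} - y_\alpha}{n - 1}, \\
\hat{\theta}_\alpha &= n\bar{y} - (n - 1)\,\hat{\theta}_{(\alpha)} = y_\alpha, \\
\hat{\bar{\theta}} &= \frac{1}{n}\sum_{\alpha=1}^{n} y_\alpha = \bar{y}, \\
v_1 &= \frac{1}{n(n-1)}\sum_{\alpha=1}^{n} (y_\alpha - \bar{y})^2 = \frac{s^2}{n}.
\end{align*}
```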
Chapter 4:
The Jackknife Method
Full-sample estimator: $\hat{\theta}$
Variance estimator:
$$v(\hat{\theta}) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta})^2$$
Chapter 4:
The Jackknife Method
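Example: ratio $R = Y/X$
$\hat{R} = \bar{y} / \bar{x}$
$\hat{R}_{(\alpha)} = \bar{y}_{(\alpha)} / \bar{x}_{(\alpha)}$
$\hat{R}_\alpha = k\, \bar{y}/\bar{x} - (k - 1)\, \bar{y}_{(\alpha)} / \bar{x}_{(\alpha)}$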
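The ratio example can be worked through in code for the drop-one case ($k = n$, $m = 1$). This is a minimal sketch under that assumption; the function name `jackknife_ratio` and the data are hypothetical.

```python
def jackknife_ratio(y, x):
    """Drop-one jackknife for the ratio R = Y/X; returns (r_hat, quenouille, v1, v2)."""
    k = len(y)
    sum_y, sum_x = sum(y), sum(x)
    r_hat = sum_y / sum_x                     # ybar/xbar equals the ratio of sums
    pseudovalues = []
    for yi, xi in zip(y, x):
        r_drop = (sum_y - yi) / (sum_x - xi)  # ratio with observation i dropped
        pseudovalues.append(k * r_hat - (k - 1) * r_drop)
    quenouille = sum(pseudovalues) / k        # mean of the pseudovalues
    v1 = sum((p - quenouille) ** 2 for p in pseudovalues) / (k * (k - 1))
    v2 = sum((p - r_hat) ** 2 for p in pseudovalues) / (k * (k - 1))
    return r_hat, quenouille, v1, v2

# Hypothetical paired observations.
r_hat, quenouille, v1, v2 = jackknife_ratio([2.0, 4.5, 5.5, 9.0], [1.0, 2.0, 3.0, 4.0])
print(r_hat, quenouille, v1, v2)
```

When every $y_i$ is exactly proportional to $x_i$, all drop-one ratios equal $\hat{R}$ and both variance estimates are zero, which is a convenient sanity check.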
Chapter 4:
The Jackknife Method
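Drop out observation(s) from individual strata
Full-sample estimator $\hat{\theta}$; drop-out estimators $\hat{\theta}_{(hi)}$
$$v_1(\hat{\theta}) = \sum_{h=1}^{L} \frac{n_h - 1}{n_h} \sum_{i=1}^{n_h} (\hat{\theta}_{(hi)} - \hat{\theta})^2$$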
Chapter 4:
The Jackknife Method
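Example: $\theta$ = total employed persons
Drop out ultimate clusters
$$\hat{\theta} = \hat{Y} = \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i / p_i = \sum_{i} \sum_{j} W_{ij} Y_{ij}$$
$$\hat{\theta}_{(\alpha)} = \frac{1}{m(k-1)} \sum_{i=1}^{m(k-1)} \hat{Y}_i / p_i = \sum_{i} \sum_{j} W_{(\alpha)ij} Y_{ij}$$
$W_{(\alpha)ij} = \frac{mk}{m(k-1)} W_{ij}$ if PSU $i$ is not dropped out, and $W_{(\alpha)ij} = 0$ if PSU $i$ is dropped out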
Chapter 5: The
Bootstrap Method
Works with replicates of potentially any size, $n^*$
Original Application –
$Y_1, \ldots, Y_n$ are iid random variables (scalar or vector) from a distribution function $F$
$\theta$ is to be estimated
Estimator: $\hat{\theta}$
Chapter 5: The
Bootstrap Method
A bootstrap sample (or bootstrap replicate) is a simple random sample with replacement (srs wr) of size $n^*$ selected from the original sample: $Y_1^*, \ldots, Y_{n^*}^*$
$\hat{\theta}^*$ denotes the estimator of the same functional form as $\hat{\theta}$
Chapter 5: The
Bootstrap Method
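Ideal Bootstrap Estimator of $\mathrm{Var}\{\hat{\theta}\}$
$v_1(\hat{\theta}) = \mathrm{Var}_*\{\hat{\theta}^*\}$, where $\mathrm{Var}_*$ signifies the conditional variance, given the original sample
Monte Carlo Bootstrap Estimator of $\mathrm{Var}\{\hat{\theta}\}$
i. Draw a large number, $A$, of independent bootstrap replicates from the main sample and label the corresponding observations as $Y_{\alpha 1}^*, \ldots, Y_{\alpha n^*}^*$, for $\alpha = 1, \ldots, A$;
ii. For each bootstrap replicate, compute the corresponding estimator $\hat{\theta}_\alpha^*$ of the parameter of interest; and
iii. Calculate the variance between the $\hat{\theta}_\alpha^*$ values:
$$v_2(\hat{\theta}) = \frac{1}{A - 1} \sum_{\alpha=1}^{A} (\hat{\theta}_\alpha^* - \hat{\bar{\theta}}^*)^2, \qquad \hat{\bar{\theta}}^* = \frac{1}{A} \sum_{\alpha=1}^{A} \hat{\theta}_\alpha^*.$$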
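The three Monte Carlo steps map directly onto a short loop. The sketch below is an illustration under stated assumptions: the function name `bootstrap_variance`, the default replicate size, the seed, and the data are all hypothetical choices, and `estimator` can be any function of a list of observations.

```python
import random
from statistics import mean

def bootstrap_variance(sample, estimator, A=1000, n_star=None, seed=0):
    """Monte Carlo bootstrap variance estimate from A replicates of size n_star."""
    rng = random.Random(seed)
    if n_star is None:
        n_star = len(sample)        # assumed default; any replicate size may be passed
    # Steps i and ii: draw A srs wr replicates and compute the estimator on each.
    replicates = [estimator(rng.choices(sample, k=n_star)) for _ in range(A)]
    # Step iii: variance between the replicate estimates, with divisor A - 1.
    rep_mean = mean(replicates)
    return sum((r - rep_mean) ** 2 for r in replicates) / (A - 1)

# Hypothetical sample; the estimator is the sample mean.
data = [float(i) for i in range(1, 11)]
v2 = bootstrap_variance(data, mean, A=4000, n_star=len(data) - 1, seed=1)
print(v2)
```

Increasing `A` brings the Monte Carlo estimate closer to the ideal bootstrap estimator; the remaining difference is simulation error only.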
Chapter 5: The
Bootstrap Method
Application to the Finite Population –
Simple Random Sampling with Replacement (srs wr)
Data: $y_1, \ldots, y_n$
Parameter of Interest: $\bar{Y}$
Standard Estimator: $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$
Chapter 5: The
Bootstrap Method
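Bootstrap Sample: $y_1^*, \ldots, y_{n^*}^*$
Estimator: $\bar{y}^* = \frac{1}{n^*} \sum_{i=1}^{n^*} y_i^*$
Bootstrap Moments:
$$E_*\{y_1^*\} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$$
$$\mathrm{Var}_*\{y_1^*\} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{n - 1}{n}\, s^2$$
Ideal Bootstrap Estimator of Variance:
$$v_1(\bar{y}) = \mathrm{Var}_*\{\bar{y}^*\} = \frac{\mathrm{Var}_*\{y_1^*\}}{n^*} = \frac{n - 1}{n}\, \frac{s^2}{n^*}$$
Unbiased Choice: $n^* = n - 1$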
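The reason $n^* = n - 1$ is the unbiased choice follows from taking expectations over the srs wr design. Here $\sigma^2$ denotes the population variance, a symbol introduced for this derivation and not used on the slide, so that $E\{s^2\} = \sigma^2$ and $\mathrm{Var}\{\bar{y}\} = \sigma^2/n$:

```latex
\begin{align*}
E\{v_1(\bar{y})\}
  &= \frac{n-1}{n}\,\frac{E\{s^2\}}{n^*}
   = \frac{n-1}{n^*}\,\frac{\sigma^2}{n}
   = \frac{n-1}{n^*}\,\mathrm{Var}\{\bar{y}\}, \\
E\{v_1(\bar{y})\} &= \mathrm{Var}\{\bar{y}\}
  \quad\Longleftrightarrow\quad n^* = n - 1 .
\end{align*}
```

With $n^* = n$, the ideal bootstrap estimator is biased downward by the factor $(n-1)/n$.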
Chapter 5: The
Bootstrap Method
Multistage Sampling with pps wr Sampling at the First Stage
Observed Data: $y_{ij}$, where $i$ indexes the selected PSU and $j$ indexes the completed interview within the PSU
Parameter of Interest: $Y$
Estimator:
$$\hat{Y} = \frac{1}{n} \sum_{i=1}^{n} \frac{\hat{Y}_i}{p_i} = \frac{1}{n} \sum_{i=1}^{n} z_i$$
$$\hat{Y}_i = \sum_{j} w_{ij} y_{ij}, \qquad z_i = \hat{Y}_i / p_i$$
Chapter 5: The
Bootstrap Method
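Bootstrap Sample: $z_1^*, z_2^*, \ldots, z_{n^*}^*$
Bootstrap Moments:
$$E_*\{z_1^*\} = \frac{1}{n} \sum_{i=1}^{n} z_i = \hat{Y}$$
$$\mathrm{Var}_*\{z_1^*\} = \frac{1}{n} \sum_{i=1}^{n} (z_i - \hat{Y})^2$$
Ideal Bootstrap Estimator of Variance:
$$v_1(\hat{Y}) = \mathrm{Var}_*\{\hat{Y}^*\} = \frac{\mathrm{Var}_*\{z_1^*\}}{n^*} = \frac{1}{n^*}\, \frac{1}{n} \sum_{i=1}^{n} (z_i - \hat{Y})^2, \qquad \hat{Y}^* = \frac{1}{n^*} \sum_{i=1}^{n^*} z_i^*.$$
Unbiased Choice: $n^* = n - 1$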
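For pps wr sampling the ideal bootstrap estimator has a closed form, so no resampling is needed to evaluate it. The following sketch, with the hypothetical function name `pps_wr_bootstrap_variance` and made-up inputs, computes it and defaults to the replicate size $n^* = n - 1$.

```python
def pps_wr_bootstrap_variance(y_hat_i, p_i, n_star=None):
    """Closed-form ideal bootstrap variance for the pps wr estimator of a total."""
    n = len(y_hat_i)
    z = [yh / p for yh, p in zip(y_hat_i, p_i)]   # z_i = Y_hat_i / p_i
    y_hat = sum(z) / n                            # estimator of the total
    if n_star is None:
        n_star = n - 1                            # the unbiased choice
    # (1/n*) (1/n) sum (z_i - Y_hat)^2
    v1 = sum((zi - y_hat) ** 2 for zi in z) / (n * n_star)
    return y_hat, v1

# Hypothetical PSU-level estimated totals and per-draw selection probabilities.
y_hat, v1 = pps_wr_bootstrap_variance([10.0, 24.0, 8.0], [0.1, 0.2, 0.1])
print(y_hat, v1)
```

With the default replicate size the result coincides with the textbook pps wr variance estimator, $\sum_i (z_i - \hat{Y})^2 / \{n(n-1)\}$.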
Chapter 6: Taylor
Series Methods
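Assume a complex survey design
$\mathbf{Y} = (Y_1, \ldots, Y_p)$, vector of population totals
$\hat{\mathbf{Y}} = (\hat{Y}_1, \ldots, \hat{Y}_p)$, estimator of $\mathbf{Y}$
$\theta = g(\mathbf{Y})$, parameter of interest, such as the ratio $Y_1 / Y_2$
$\hat{\theta} = g(\hat{\mathbf{Y}})$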
Chapter 6: Taylor
Series Methods
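$$\hat{\theta} - \theta = \sum_{j=1}^{p} \frac{\partial g(\mathbf{Y})}{\partial y_j} (\hat{Y}_j - Y_j) + R$$
$$\mathrm{MSE}\{\hat{\theta}\} \doteq \mathrm{Var}\left\{ \sum_{j=1}^{p} \frac{\partial g(\mathbf{Y})}{\partial y_j} (\hat{Y}_j - Y_j) \right\} = \mathbf{d}\, \boldsymbol{\Sigma}\, \mathbf{d}'$$
$$d_j = \frac{\partial g(\mathbf{Y})}{\partial y_j}, \qquad \boldsymbol{\Sigma} = E\{ (\hat{\mathbf{Y}} - \mathbf{Y})' (\hat{\mathbf{Y}} - \mathbf{Y}) \}$$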
Chapter 6: Taylor
Series Methods
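$$v(\hat{\theta}) = \hat{\mathbf{d}}\, \hat{\boldsymbol{\Sigma}}\, \hat{\mathbf{d}}'$$
$\hat{d}_j = \partial g(\hat{\mathbf{Y}}) / \partial y_j$
$\hat{\boldsymbol{\Sigma}}$ by textbook or replication-based method applied to the $y$-data
Alternative algorithm
$$\hat{Y}_j = \sum_{i \in s} W_i Y_{ji}$$
$$\hat{U} = \sum_{i \in s} W_i U_i, \qquad U_i = \sum_{j=1}^{p} \frac{\partial g(\mathbf{Y})}{\partial y_j} Y_{ji}$$
$$\mathrm{MSE}\{\hat{\theta}\} \doteq \mathrm{Var}\{\hat{U}\}$$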
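The alternative algorithm is easiest to see for the ratio $g(Y_1, Y_2) = Y_1 / Y_2$, whose partial derivatives are $1/Y_2$ and $-Y_1/Y_2^2$. The sketch below evaluates them at the estimated totals, forms the linearized variable $U_i$, and then applies a with-replacement variance formula to $\hat{U}$. That single-stage with-replacement design, the function name `taylor_ratio_variance`, and the data are assumptions made for illustration; a different design would call for its own textbook or replication-based variance formula at the last step.

```python
def taylor_ratio_variance(y, x, w):
    """Linearization variance estimate for R_hat = Y_hat / X_hat (with-replacement design assumed)."""
    n = len(y)
    y_hat = sum(wi * yi for wi, yi in zip(w, y))
    x_hat = sum(wi * xi for wi, xi in zip(w, x))
    r_hat = y_hat / x_hat
    # U_i = (dg/dy1) y_i + (dg/dy2) x_i = (y_i - R_hat * x_i) / X_hat at the estimated totals.
    u = [(yi - r_hat * xi) / x_hat for yi, xi in zip(y, x)]
    # With-replacement variance of U_hat = sum w_i U_i, written through z_i = n w_i U_i.
    z = [n * wi * ui for wi, ui in zip(w, u)]
    u_hat = sum(z) / n
    v = sum((zi - u_hat) ** 2 for zi in z) / (n * (n - 1))
    return r_hat, v

# Hypothetical sample with equal weights.
r_hat, v = taylor_ratio_variance([2.0, 5.0, 5.0, 9.0], [1.0, 2.0, 3.0, 4.0], [5.0] * 4)
print(r_hat, v)
```

Reducing the $p$ variables to the single variable $U_i$ avoids estimating the full covariance matrix $\boldsymbol{\Sigma}$.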
Chapter 7: Generalized
Variance Functions
1. Population total, $X$
2. Estimator of the total, $\hat{X}$
3. Relative variance, $V^2 = \mathrm{Var}\{\hat{X}\} / X^2$
4. $V^2 = \alpha + \beta / X$
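Once relative variances have been estimated directly for a set of survey items, the parameters $\alpha$ and $\beta$ can be fitted and the curve used to predict the relative variance of other totals. The sketch below uses ordinary least squares of $V^2$ on $1/X$; that fitting method, the function names `fit_gvf` and `predict_relvar`, and the data are assumptions for illustration, and weighted or iterative fits are common alternatives.

```python
def fit_gvf(x_hats, relvars):
    """Least-squares fit of relvar = alpha + beta / X; returns (alpha, beta)."""
    t = [1.0 / x for x in x_hats]             # regressor 1/X
    n = len(t)
    t_bar = sum(t) / n
    v_bar = sum(relvars) / n
    beta = sum((ti - t_bar) * (vi - v_bar) for ti, vi in zip(t, relvars)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    alpha = v_bar - beta * t_bar
    return alpha, beta

def predict_relvar(alpha, beta, x):
    """Predicted relative variance for an estimated total x."""
    return alpha + beta / x

# Hypothetical estimated totals and their directly estimated relative variances.
totals = [1000.0, 5000.0, 20000.0, 100000.0]
relvars = [0.051, 0.011, 0.0035, 0.0015]
alpha, beta = fit_gvf(totals, relvars)
print(alpha, beta, predict_relvar(alpha, beta, 50000.0))
```

The predicted standard error of a total $\hat{X}$ is then $\hat{X}$ times the square root of the predicted relative variance.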