The Delta Method and Applications

This handout introduces the “Delta Method” of finding approximations based on Taylor
series expansions to the variance of functions of random variables. The basic idea will be
illustrated with some simple examples, and the use of the Delta Method in sampling applications will be indicated.
In the previous handout on ratio estimation, the variance of the ratio estimator r of the
population ratio R was approximated as:
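\[ \mathrm{Var}(r) = \mathrm{Var}\!\left(\frac{\bar{y}}{\bar{x}}\right) \approx \left(\frac{N-n}{N}\right) \frac{1}{\mu_x^2} \cdot \frac{\sigma_r^2}{n}. \]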
In general, variances of ratios, as in the above case, do not have exact formulas.
Taylor Series Expansion: The Taylor series expansion of a function f (·) about a value a
is given as:
\[ f(x) = f(a) + f'(a)(x-a) + f''(a)\frac{(x-a)^2}{2!} + \cdots, \]
where we can often drop the higher order terms to give the approximation:
\[ f(x) \approx f(a) + f'(a)(x-a). \]
Letting \(a = \mu_x\), the mean of \(X\), a Taylor series expansion of \(y = f(x)\) about \(\mu_x\) gives the
approximation:
\[ y = f(x) \approx f(\mu_x) + f'(\mu_x)(x - \mu_x). \]
Taking the variance of both sides yields:
\[ \mathrm{Var}(Y) = \mathrm{Var}(f(X)) \approx [f'(\mu_x)]^2\, \mathrm{Var}(X). \]
• So, if Y is any function of a random variable X, we need only calculate the variance of
X and the first derivative of the function to approximate the variance of Y .
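The step from the linear approximation to the variance approximation can be spelled out as follows. This is only a sketch of the intermediate algebra; it relies on \(f(\mu_x)\) and \(f'(\mu_x)\) being constants (since \(\mu_x\) is not random), so that the rule \(\mathrm{Var}(c + bZ) = b^2\,\mathrm{Var}(Z)\) applies.

```latex
\begin{align*}
\mathrm{Var}(f(X)) &\approx \mathrm{Var}\big(f(\mu_x) + f'(\mu_x)(X - \mu_x)\big) \\
                   &= [f'(\mu_x)]^2\,\mathrm{Var}(X - \mu_x) \\
                   &= [f'(\mu_x)]^2\,\mathrm{Var}(X).
\end{align*}
```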
Example: Suppose \(Y = X^2\). Then \(f(x) = x^2\) and \(f'(x) = 2x\), so that:
\[ \mathrm{Var}(Y) \approx (2\mu_x)^2\, \mathrm{Var}(X) = 4\mu_x^2 \sigma_x^2. \]
Example: Suppose \(Y = 1/X\). Then \(f(x) = 1/x\) and \(f'(x) = -1/x^2\), so that:
\[ \mathrm{Var}(Y) \approx \left[-\frac{1}{\mu_x^2}\right]^2 \mathrm{Var}(X) = \frac{\sigma_x^2}{\mu_x^4}. \]
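The two examples above can be checked numerically with a few lines of Python. This is a minimal sketch, not part of the handout: the function name `delta_method_variance` and the illustrative values \(\mu_x = 5\), \(\sigma_x^2 = 4\) are assumptions chosen only for demonstration.

```python
def delta_method_variance(f_prime, mu_x, var_x):
    """First-order approximation Var(f(X)) ~ [f'(mu_x)]^2 * Var(X).

    f_prime is the derivative of f, supplied by the caller (hypothetical helper).
    """
    return f_prime(mu_x) ** 2 * var_x


# Illustrative values only (assumed, not from the handout).
mu_x, var_x = 5.0, 4.0

# Y = X^2, f'(x) = 2x  ->  4 * mu_x^2 * sigma_x^2
approx_square = delta_method_variance(lambda x: 2 * x, mu_x, var_x)

# Y = 1/X, f'(x) = -1/x^2  ->  sigma_x^2 / mu_x^4
approx_inverse = delta_method_variance(lambda x: -1 / x**2, mu_x, var_x)

print(approx_square, approx_inverse)
```

Passing the derivative as a function keeps the sketch general: any differentiable \(f\) can be handled by supplying its \(f'\).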
Two-Variable Taylor Series Expansion: Suppose now we have random variables \(X\), \(Y\). A
Taylor series expansion of \(f(x, y)\) about the values \((x_0, y_0)\) is given by:
\[ f(x,y) = f(x_0,y_0) + \left.\frac{\partial f(x,y)}{\partial x}\right|_{(x_0,y_0)} (x - x_0) + \left.\frac{\partial f(x,y)}{\partial y}\right|_{(x_0,y_0)} (y - y_0) + \left[\text{2nd and higher order terms}\right]. \]
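Example: Suppose \(f(x, y) = y/x\). Then:
\[ \frac{\partial f(x,y)}{\partial x} = \frac{-y}{x^2}, \qquad \frac{\partial f(x,y)}{\partial y} = \frac{1}{x}. \]
Expanding about \((x_0, y_0) = (\mu_x, \mu_y)\):
\[ \Longrightarrow\; f(x,y) = \frac{y}{x} \approx \frac{\mu_y}{\mu_x} + \frac{-\mu_y}{\mu_x^2}(x - \mu_x) + \frac{1}{\mu_x}(y - \mu_y) \]
\[ \Longrightarrow\; \mathrm{Var}\!\left(\frac{Y}{X}\right) \approx \frac{\mu_y^2}{\mu_x^4}\,\mathrm{Var}(X) + \frac{1}{\mu_x^2}\,\mathrm{Var}(Y) - \frac{2\mu_y}{\mu_x^3}\,\mathrm{Cov}(X, Y). \]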
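The variance in this example follows from the rule for the variance of a linear combination. As a sketch, write the linearized ratio as \(c + aX + bY\); the shorthand \(a\), \(b\), \(c\) is introduced here only for illustration, with \(c\) collecting the constant terms.

```latex
\begin{align*}
a &= -\frac{\mu_y}{\mu_x^2}, \qquad b = \frac{1}{\mu_x}, \\
\mathrm{Var}(c + aX + bY) &= a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y) \\
&= \frac{\mu_y^2}{\mu_x^4}\,\mathrm{Var}(X) + \frac{1}{\mu_x^2}\,\mathrm{Var}(Y) - \frac{2\mu_y}{\mu_x^3}\,\mathrm{Cov}(X, Y).
\end{align*}
```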
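By analogy with the above result then, the approximate variance of the ratio estimator is:
\[ \mathrm{Var}\!\left(\frac{\bar{y}}{\bar{x}}\right) \approx \left[ \frac{\mu_y^2}{\mu_x^4} \cdot \frac{\sigma_x^2}{n} + \frac{1}{\mu_x^2} \cdot \frac{\sigma_y^2}{n} - \frac{2\mu_y}{\mu_x^3} \cdot \frac{\rho\sigma_x\sigma_y}{n} \right] \cdot \underbrace{\left(\frac{N-n}{N}\right)}_{\text{if the fpc is required}}, \]
where: \(\mathrm{Cov}(\bar{x}, \bar{y}) = \dfrac{\mathrm{Cov}(X, Y)}{n} = \dfrac{\rho\sigma_x\sigma_y}{n}\).
• Is this the same as the approximate variance for the ratio estimator given earlier?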
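One way to check is sketched below. It assumes that \(\sigma_r^2\) in the earlier formula is the population variance of the residuals \(y_i - R x_i\) with \(R = \mu_y/\mu_x\). That definition belongs to the ratio estimation handout and is not restated here, so it should be treated as an assumption.

```latex
\begin{align*}
\sigma_r^2 &= \mathrm{Var}(Y - RX)
            = \sigma_y^2 + R^2\sigma_x^2 - 2R\,\rho\sigma_x\sigma_y,
            \qquad R = \frac{\mu_y}{\mu_x}, \\
\frac{1}{\mu_x^2}\cdot\frac{\sigma_r^2}{n}
           &= \frac{1}{\mu_x^2}\cdot\frac{\sigma_y^2}{n}
            + \frac{\mu_y^2}{\mu_x^4}\cdot\frac{\sigma_x^2}{n}
            - \frac{2\mu_y}{\mu_x^3}\cdot\frac{\rho\sigma_x\sigma_y}{n}.
\end{align*}
```

Under that assumption the two expressions agree term by term once the factor \((N-n)/N\) is applied to both.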
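The corresponding estimated variance of the ratio estimator is given by:
\[ \widehat{\mathrm{Var}}\!\left(\frac{\bar{y}}{\bar{x}}\right) \approx \frac{1}{n}\left[ \frac{\bar{y}^2}{\bar{x}^4}\, s_x^2 + \frac{1}{\bar{x}^2}\, s_y^2 - \frac{2\bar{y}}{\bar{x}^3}\, \hat{\rho}\, s_x s_y \right]. \]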
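As a rough illustration, the estimated variance can be computed from paired sample data as follows. This is a sketch under stated assumptions: the function name `estimated_ratio_variance` and the optional `population_size` argument (which applies the fpc factor \((N-n)/N\) from the preceding formula) are hypothetical and not part of the handout. The sample covariance is used in place of \(\hat{\rho}\, s_x s_y\), which is the same quantity.

```python
from statistics import mean, variance


def estimated_ratio_variance(x, y, population_size=None):
    """Delta Method estimate of Var(ybar / xbar) from paired samples.

    population_size is an optional, assumed argument: when given, the
    finite population correction (N - n) / N is applied.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    xbar, ybar = mean(x), mean(y)
    s2x, s2y = variance(x), variance(y)
    # Sample covariance, equal to rho_hat * s_x * s_y.
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

    var_hat = (
        (ybar**2 / xbar**4) * s2x
        + (1 / xbar**2) * s2y
        - (2 * ybar / xbar**3) * sxy
    ) / n

    if population_size is not None:
        var_hat *= (population_size - n) / population_size
    return var_hat


# When y is exactly proportional to x the ratio never varies,
# so the estimated variance is zero.
print(estimated_ratio_variance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```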
Some Useful Approximations: The linear approximation via a Taylor series expansion gives
the approximate variance for the following three useful functions of random variables X and
Y where ρ is the correlation between X and Y .
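1. \(\mathrm{Var}\!\left(\dfrac{1}{X}\right) \approx \left(\dfrac{1}{\mu_X^4}\right)\sigma_X^2.\)
2. \(\mathrm{Var}\!\left(\dfrac{Y}{X}\right) \approx \left(\dfrac{\mu_Y^2}{\mu_X^4}\right)\sigma_X^2 + \left(\dfrac{1}{\mu_X^2}\right)\sigma_Y^2 - 2\left(\dfrac{\mu_Y}{\mu_X^3}\right)\rho\,\sigma_X\sigma_Y.\)
3. \(\mathrm{Var}(XY) \approx \mu_Y^2\sigma_X^2 + \mu_X^2\sigma_Y^2 + 2\mu_X\mu_Y\,\rho\,\sigma_X\sigma_Y.\)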
To obtain estimates of these variances, simply substitute sample values of the means, variances and correlation.
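The three approximations translate directly into code. The sketch below is illustrative only: the function names and argument order are assumptions, and the inputs may be either population values or the sample values substituted for them.

```python
def var_inverse(mu_x, sd_x):
    """Formula 1: Var(1/X) ~ sd_x^2 / mu_x^4."""
    return sd_x**2 / mu_x**4


def var_ratio(mu_x, mu_y, sd_x, sd_y, rho):
    """Formula 2: Var(Y/X), first-order approximation."""
    return (
        (mu_y**2 / mu_x**4) * sd_x**2
        + (1 / mu_x**2) * sd_y**2
        - 2 * (mu_y / mu_x**3) * rho * sd_x * sd_y
    )


def var_product(mu_x, mu_y, sd_x, sd_y, rho):
    """Formula 3: Var(XY), first-order approximation."""
    return (
        mu_y**2 * sd_x**2
        + mu_x**2 * sd_y**2
        + 2 * mu_x * mu_y * rho * sd_x * sd_y
    )


# Illustrative values only (assumed): mu_x=2, mu_y=4, sd_x=1, sd_y=2, rho=1.
# Here Y = 2X exactly, so the ratio is constant and its variance is 0.
print(var_inverse(2.0, 1.0))
print(var_ratio(2.0, 4.0, 1.0, 2.0, 1.0))
print(var_product(2.0, 4.0, 1.0, 2.0, 1.0))
```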
Some applications:
• Estimating the Population Ratio: To estimate the population ratio, \(R = \mu_Y/\mu_X\), from
a simple random sample \((x_1, y_1), \ldots, (x_n, y_n)\), the estimator is \(r = \bar{y}/\bar{x}\). Using formula
2 above with the sample means \(\bar{x}\) and \(\bar{y}\) as the random variables yields the variance
given earlier in this handout, and in equation (4) of page 60 of the text, although some
algebra is needed to reach that simplified form.
• PPS sampling with replacement (Chapter 6, Section 1): The following summarizes the
results on the handout titled “Sampling with Probability Proportional to Size (Supplement),” and includes estimated variances from the Delta Method where needed.
As before, consider the following notation. Let:
N = the population size,
n = the sample size,
\(x_i\) = the size of the ith unit in the population,
\(p_i = x_i/\tau_x\) = the probability of selecting the ith unit on each draw,
\(y_i\) = the response variable (variable of interest).
The estimates and approximate standard errors, using the Hansen-Hurwitz estimator
where possible, of relevant population quantities are given below.
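1. Estimating \(N\):
\[ \hat{N} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{p_i} \quad \text{(unbiased)}, \qquad \widehat{\mathrm{Var}}(\hat{N}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left( \frac{1}{p_i} - \hat{N} \right)^2 \quad \text{(Hansen-Hurwitz)} \]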
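2. Estimating \(\mu_X\) and \(\tau_X\):
\[ \hat{\mu}_X = \frac{1}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} \dfrac{1}{x_i}} \quad \text{(biased)}, \qquad \widehat{\mathrm{Var}}(\hat{\mu}_X) = \left(\frac{1}{\bar{v}^4}\right) \frac{s_v^2}{n} \quad \text{(Delta Method)} \]
where \(v_i = 1/x_i\), and \(\bar{v}\) & \(s_v^2\) are the sample mean and variance of the \(v_i\)'s.
\[ \hat{\tau}_X = \frac{N}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} \dfrac{1}{x_i}} \quad \text{(biased)}, \qquad \widehat{\mathrm{Var}}(\hat{\tau}_X) = N^2\, \widehat{\mathrm{Var}}(\hat{\mu}_X) = \left(\frac{N^2}{\bar{v}^4}\right) \frac{s_v^2}{n} \quad \text{(Delta Method)} \]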
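3. Estimating \(\tau_Y\) and \(\mu_Y\):
\[ \hat{\tau}_Y = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{p_i} \quad \text{(unbiased)}, \qquad \widehat{\mathrm{Var}}(\hat{\tau}_Y) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left( \frac{y_i}{p_i} - \hat{\tau}_Y \right)^2 = \frac{s_w^2}{n} \quad \text{(Hansen-Hurwitz)} \]
where \(w_i = y_i/p_i\) and \(s_w^2\) is the sample variance of the \(w_i\)'s (note that \(\hat{\tau}_Y = \bar{w}\)).
Estimating \(\mu_Y\), \(N\) known:
\[ \hat{\mu}_Y = \frac{\hat{\tau}_Y}{N} \quad \text{(unbiased)}, \qquad \widehat{\mathrm{Var}}(\hat{\mu}_Y) = \frac{1}{N^2}\, \widehat{\mathrm{Var}}(\hat{\tau}_Y) \quad \text{(Hansen-Hurwitz)} \]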
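Estimating \(\mu_Y\), \(N\) unknown:
\[ \hat{\mu}_Y = \frac{\hat{\tau}_Y}{\hat{N}} = \frac{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} \dfrac{y_i}{p_i}}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} \dfrac{1}{p_i}} = \frac{\displaystyle\sum_{i=1}^{n} \dfrac{y_i}{x_i}}{\displaystyle\sum_{i=1}^{n} \dfrac{1}{x_i}} \quad \text{(biased)} \]
\[ \widehat{\mathrm{Var}}(\hat{\mu}_Y) = \left(\frac{\bar{t}^2}{\bar{v}^4}\right) \frac{s_v^2}{n} + \left(\frac{1}{\bar{v}^2}\right) \frac{s_t^2}{n} - 2\left(\frac{\bar{t}}{\bar{v}^3}\right) \frac{\hat{\rho}_{t,v}\, s_t s_v}{n} \quad \text{(Delta Method)} \]
where \(t_i = y_i/x_i\), \(v_i = 1/x_i\), and \(\bar{t}\), \(\bar{v}\), \(s_t^2\), & \(s_v^2\) are the sample means and variances, and
\(\hat{\rho}_{t,v}\) is the sample correlation between the \(t_i\)'s and the \(v_i\)'s.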
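The PPS estimators above can be sketched in a few lines of Python. This is an illustrative sketch, not the handout's own code: the names `hansen_hurwitz` and `pps_estimates` are hypothetical, the total size \(\tau_x\) is assumed known (it is needed to form \(p_i = x_i/\tau_x\)), and the sample covariance of the \(t_i\)'s and \(v_i\)'s stands in for \(\hat{\rho}_{t,v}\, s_t s_v\).

```python
from statistics import mean, variance


def hansen_hurwitz(values):
    """Mean of the per-draw values and its estimated variance s^2 / n."""
    return mean(values), variance(values) / len(values)


def pps_estimates(x, y, tau_x):
    """PPS-with-replacement estimates from sampled sizes x and responses y.

    tau_x (total size over the population) is assumed known.
    """
    n = len(x)
    p = [xi / tau_x for xi in x]

    # Hansen-Hurwitz estimators of N and tau_Y.
    n_hat, var_n_hat = hansen_hurwitz([1 / pi for pi in p])
    tau_y_hat, var_tau_y_hat = hansen_hurwitz(
        [yi / pi for yi, pi in zip(y, p)]
    )

    # Delta Method pieces: t_i = y_i / x_i and v_i = 1 / x_i.
    t = [yi / xi for xi, yi in zip(x, y)]
    v = [1 / xi for xi in x]
    tbar, vbar = mean(t), mean(v)
    s2t, s2v = variance(t), variance(v)
    stv = sum((ti - tbar) * (vi - vbar) for ti, vi in zip(t, v)) / (n - 1)

    return {
        "N_hat": n_hat,
        "var_N_hat": var_n_hat,
        "tau_y_hat": tau_y_hat,
        "var_tau_y_hat": var_tau_y_hat,
        "mu_x_hat": 1 / vbar,
        "var_mu_x_hat": s2v / (vbar**4 * n),
        "mu_y_hat": tbar / vbar,  # equals tau_y_hat / N_hat
        "var_mu_y_hat": (
            (tbar**2 / vbar**4) * s2v
            + s2t / vbar**2
            - 2 * (tbar / vbar**3) * stv
        ) / n,
    }


# Made-up data for illustration only.
est = pps_estimates([10.0, 20.0, 40.0], [5.0, 12.0, 18.0], tau_x=100.0)
for name, value in est.items():
    print(name, round(value, 4))
```

The helper `hansen_hurwitz` is reused for both \(\hat{N}\) and \(\hat{\tau}_Y\) because each is just the sample mean of a per-draw quantity (\(1/p_i\) or \(y_i/p_i\)).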
• Other Applications: Suppose we want to estimate \(R = \mu_Y/\mu_X\) and we already have
independent estimates \(\hat{\mu}_X\) and \(\hat{\mu}_Y\) of the means available (from two different studies,
for example) along with estimated variances \(\widehat{\mathrm{Var}}(\hat{\mu}_X)\) and \(\widehat{\mathrm{Var}}(\hat{\mu}_Y)\) (i.e., the standard
errors squared). It doesn’t matter what sampling plans or sample sizes were used to
generate these independent estimates as long as valid standard errors can be calculated.
Then, we can estimate \(R\) by \(\hat{R} = \hat{\mu}_Y/\hat{\mu}_X\) and, by the Delta Method, estimate \(\mathrm{Var}(\hat{R})\)
by:
\[ \widehat{\mathrm{Var}}(\hat{R}) = \left(\frac{\hat{\mu}_Y^2}{\hat{\mu}_X^4}\right) \widehat{\mathrm{Var}}(\hat{\mu}_X) + \left(\frac{1}{\hat{\mu}_X^2}\right) \widehat{\mathrm{Var}}(\hat{\mu}_Y). \]
– Note that the covariance term drops out because these are assumed to be independent estimates.
Example: Suppose it is desired to estimate the average expenditure per day for visitors to
Yellowstone National Park. One study estimated the average expenditures per trip, but did
not obtain trip length information. The estimate was $240 with a standard error of $60.
Another study estimated the average length of a trip but did not gather expenditure data.
The estimate was 2.3 days with a standard error of 0.5 days. With these two studies then,
the estimated expenditure per day is 240/2.3 = $104.35 per person. The estimated variance
of the estimate is
\[ \left(\frac{240^2}{(2.3)^4}\right)(0.5)^2 + \left(\frac{1}{(2.3)^2}\right)(60)^2 = 1195.1, \]
so the estimated standard error is \(\sqrt{1195.1}\) = $34.57.
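The arithmetic in this example can be reproduced with a short script. This is a minimal sketch: the function name `ratio_of_independent_estimates` is hypothetical, and it assumes, as the example does, that the two estimates are independent so that no covariance term is needed.

```python
from math import sqrt


def ratio_of_independent_estimates(est_y, se_y, est_x, se_x):
    """Ratio est_y / est_x with its Delta Method variance and standard error.

    Assumes the two estimates are independent (no covariance term).
    """
    ratio = est_y / est_x
    var_hat = (est_y**2 / est_x**4) * se_x**2 + (1 / est_x**2) * se_y**2
    return ratio, var_hat, sqrt(var_hat)


# Expenditure per trip: $240 (SE $60); trip length: 2.3 days (SE 0.5 days).
ratio, var_hat, se = ratio_of_independent_estimates(240.0, 60.0, 2.3, 0.5)
print(f"{ratio:.2f} {var_hat:.1f} {se:.2f}")  # prints "104.35 1195.1 34.57"
```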
Similarly, if we were interested in estimating the product of two means for which we already had independent estimates of the individual means, we could use formula 3 from the list of useful approximations earlier in this handout, dropping the last term since the estimates are independent.