Recent Advances in the Analysis of Missing Data with Non-ignorable Missingness

Jae-Kwang Kim
Department of Statistics, Iowa State University
July 4th, 2014
1. Introduction
2. Full likelihood-based ML estimation
3. GMM method
4. Partial Likelihood approach
5. Exponential tilting method
6. Concluding Remarks
Observed likelihood
• (X, Y): random variables; y is subject to missingness
• f(y | x; θ): model of y given x
• g(δ | x, y; φ): model of δ given (x, y)
• Observed likelihood:
L_obs(θ, φ) = ∏_{δi=1} f(yi | xi; θ) g(δi | xi, yi; φ) × ∏_{δi=0} ∫ f(yi | xi; θ) g(δi | xi, yi; φ) dyi
• Under what conditions are the parameters identifiable (or estimable)?
Lemma
Suppose that we can decompose the covariate vector x into two parts, u and z, such that
g(δ | y, x) = g(δ | y, u)   (1)
and, for any given u, there exist z_{u,1} and z_{u,2} such that
f(y | u, z = z_{u,1}) ≠ f(y | u, z = z_{u,2}).   (2)
Under some other minor conditions, all the parameters in f and g are identifiable.
Remark
• Condition (1) means δ ⊥ z | (y, u).
• That is, given (y, u), z does not help in explaining δ.
• Thus, z plays the role of an instrumental variable in econometrics:
f(y* | x*, z*) = f(y* | x*),  Cov(z*, x*) ≠ 0.
Here, y* = δ, x* = (y, u), and z* = z.
• We may call z the nonresponse instrumental variable.
• Rigorous theory developed by Wang et al. (2014).
Introduction
Remark
• MCAR (missing completely at random): P(δ | y) does not depend on y.
• MAR (missing at random): P(δ | y) = P(δ | y_obs).
• NMAR (not missing at random): P(δ | y) ≠ P(δ | y_obs).
• Thus, MCAR is a special case of MAR.
Parameter estimation under the existence of a nonresponse instrumental variable
• Full likelihood-based ML estimation
• Generalized method of moment (GMM) approach (Section 6.3 of KS)
• Conditional likelihood approach (Section 6.2 of KS)
• Pseudo likelihood approach (Section 6.4 of KS)
• Exponential tilting method (Section 6.5 of KS)
• Latent variable approach (Section 6.6 of KS)
Full likelihood-based ML estimation
• We wish to find η̂ = (θ̂, φ̂) that maximizes the observed likelihood
L_obs(η) = ∏_{δi=1} f(yi | xi; θ) g(δi | xi, yi; φ) × ∏_{δi=0} ∫ f(yi | xi; θ) g(δi | xi, yi; φ) dyi
• Mean score theorem: Under some regularity conditions, finding the MLE that maximizes the observed likelihood is equivalent to finding the solution to
S̄(η) ≡ E{S(η) | y_obs, δ; η} = 0,
where y_obs is the observed data. The conditional expectation of the score function is called the mean score function.
Example 1 [Example 2.5 of KS]
1. Suppose that the study variable yi follows a Bernoulli distribution with success probability pi, where
pi = pi(β) = exp(x_i'β) / {1 + exp(x_i'β)}
for some unknown parameter β, and xi is a vector of covariates in the logistic regression model for yi. We assume that 1 is in the column space of xi.
2. Under complete response, the score function for β is
S1(β) = Σ_{i=1}^{n} {yi − pi(β)} xi.
Example 1 (Cont'd)
3. Let δi be the response indicator for yi, with δi ~ Bernoulli(πi), where
πi = exp(x_i'φ0 + yi φ1) / {1 + exp(x_i'φ0 + yi φ1)}.
We assume that xi is always observed, but yi is missing if δi = 0.
4. Under missing data, the mean score function for β, the conditional expectation of the original score function given the observed data, is
S̄1(β, φ) = Σ_{δi=1} {yi − pi(β)} xi + Σ_{δi=0} Σ_{y=0}^{1} wi(y; β, φ) {y − pi(β)} xi,
where wi(y; β, φ) is the conditional probability of yi = y given xi and δi = 0:
wi(y; β, φ) = Pβ(yi = y | xi) Pφ(δi = 0 | yi = y, xi) / Σ_{z=0}^{1} Pβ(yi = z | xi) Pφ(δi = 0 | yi = z, xi).
Thus, S̄1(β, φ) is also a function of φ.
Example 1 (Cont'd)
5. If the response mechanism is MAR, so that φ1 = 0, then
wi(y; β, φ) = Pβ(yi = y | xi) / Σ_{z=0}^{1} Pβ(yi = z | xi) = Pβ(yi = y | xi),
and so
S̄1(β, φ) = Σ_{δi=1} {yi − pi(β)} xi = S̄1(β).
6. If MAR does not hold, then (β̂, φ̂) can be obtained by solving S̄1(β, φ) = 0 and S̄2(β, φ) = 0 jointly, where
S̄2(β, φ) = Σ_{δi=1} {δi − π(φ; xi, yi)} (xi, yi) + Σ_{δi=0} Σ_{y=0}^{1} wi(y; β, φ) {δi − πi(φ; xi, y)} (xi, y).
EM algorithm
• We are interested in finding η̂ that maximizes L_obs(η). The MLE can be obtained by solving S_obs(η) = 0, which is equivalent to solving S̄(η) = 0 by the mean score theorem.
• The EM algorithm provides an alternative method of solving S̄(η) = 0 by writing
S̄(η) = E{S_com(η) | y_obs, δ; η}
and using the following iterative method:
η̂^(t+1) ← solve E{S_com(η) | y_obs, δ; η̂^(t)} = 0.
EM algorithm
Definition
Let η^(t) be the current value of the parameter estimate of η. The EM algorithm iteratively carries out the following E-step and M-step:
• E-step: Compute
Q(η | η^(t)) = E{ln f(y, δ; η) | y_obs, δ; η^(t)}.
• M-step: Find η^(t+1) that maximizes Q(η | η^(t)) with respect to η.
EM algorithm
Example 1 (Cont'd)
• E-step:
S̄1(β | β^(t), φ^(t)) = Σ_{δi=1} {yi − pi(β)} xi + Σ_{δi=0} Σ_{j=0}^{1} w_ij^(t) {j − pi(β)} xi,
where
w_ij^(t) = Pr(Yi = j | xi, δi = 0; β^(t), φ^(t)) = Pr(Yi = j | xi; β^(t)) Pr(δi = 0 | xi, j; φ^(t)) / Σ_{y=0}^{1} Pr(Yi = y | xi; β^(t)) Pr(δi = 0 | xi, y; φ^(t)),
and
S̄2(φ | β^(t), φ^(t)) = Σ_{δi=1} {δi − π(xi, yi; φ)} (x_i', yi)' + Σ_{δi=0} Σ_{j=0}^{1} w_ij^(t) {δi − πi(xi, j; φ)} (x_i', j)'.
EM algorithm
Example 1 (Cont'd)
• M-step: The parameter estimates are updated by solving
[S̄1(β | β^(t), φ^(t)), S̄2(φ | β^(t), φ^(t))] = (0, 0)
for β and φ.
• For categorical missing data, the conditional expectation in the E-step can be computed as a weighted mean with weights w_ij^(t). Ibrahim (1990) called this method EM by weighting.
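The E-step of EM by weighting can be sketched in a few lines. A minimal sketch for the binary-y setting of Example 1, assuming a logistic outcome model and a logistic response model (the helper names and inputs below are illustrative, not from the slides):

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def weights_w(x, beta, phi0, phi1):
    """w_i(y) = P(y_i = y | x_i, delta_i = 0) for y in {0, 1}."""
    p1 = expit(x @ beta)                       # outcome model P(y = 1 | x)
    miss0 = 1.0 - expit(x @ phi0 + 0 * phi1)   # P(delta = 0 | x, y = 0)
    miss1 = 1.0 - expit(x @ phi0 + 1 * phi1)   # P(delta = 0 | x, y = 1)
    num0, num1 = (1.0 - p1) * miss0, p1 * miss1
    denom = num0 + num1
    return num0 / denom, num1 / denom

def mean_score_beta(x, y, delta, beta, phi0, phi1):
    """S1-bar(beta, phi): observed score plus weighted imputed score."""
    p = expit(x @ beta)
    obs = (delta * (y - p))[:, None] * x           # respondents' contribution
    w0, w1 = weights_w(x, beta, phi0, phi1)
    mis = ((1 - delta) * (w0 * (0 - p) + w1 * (1 - p)))[:, None] * x
    return (obs + mis).sum(axis=0)
```

Under MAR (φ1 = 0) the weights reduce to the outcome model probabilities, so the imputed part of the mean score vanishes, matching item 5 of the example.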
Monte Carlo EM
Motivation: Monte Carlo samples in the EM algorithm can be used as imputed values.
1. In the EM algorithm defined by
• [E-step] Compute Q(η | η^(t)) = E{ln f(y, δ; η) | y_obs, δ; η^(t)};
• [M-step] Find η^(t+1) that maximizes Q(η | η^(t)),
the E-step is computationally cumbersome because it involves an integral.
2. Wei and Tanner (1990): In the E-step, first draw
y*_mis^(1), ..., y*_mis^(m) ~ f(y_mis | y_obs, δ; η^(t))
and approximate
Q(η | η^(t)) ≈ (1/m) Σ_{j=1}^{m} ln f(y_obs, y*_mis^(j), δ; η).
Monte Carlo EM
Example 2 [Example 3.15 of KS]
• Suppose that yi ~ f(yi | xi; θ). Assume that xi is always observed but we observe yi only when δi = 1, where δi ~ Bernoulli[πi(φ)] and
πi(φ) = exp(φ0 + φ1 xi + φ2 yi) / {1 + exp(φ0 + φ1 xi + φ2 yi)}.
• To implement the MCEM method, in the E-step we need to generate samples from
f(yi | xi, δi = 0; θ̂, φ̂) = f(yi | xi; θ̂){1 − πi(φ̂)} / ∫ f(yi | xi; θ̂){1 − πi(φ̂)} dyi.
Monte Carlo EM
Example 2 (Cont'd)
• We can use the following rejection method to generate samples from f(yi | xi, δi = 0; θ̂, φ̂):
1. Generate yi* from f(yi | xi; θ̂).
2. Using yi*, compute
πi*(φ̂) = exp(φ̂0 + φ̂1 xi + φ̂2 yi*) / {1 + exp(φ̂0 + φ̂1 xi + φ̂2 yi*)}.
Accept yi* with probability 1 − πi*(φ̂).
3. If yi* is not accepted, go to Step 1.
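The rejection steps above can be coded directly for a concrete outcome model. A sketch assuming a normal f(y | x); the mean function, σ, and φ values used in any call are hypothetical choices for illustration:

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def draw_missing_y(x, mean_fn, sigma, phi, rng, size=1):
    """Rejection sampling from f(y | x, delta = 0) under a logistic response model."""
    out = []
    while len(out) < size:
        y = rng.normal(mean_fn(x), sigma)              # Step 1: propose from f(y | x)
        pi = expit(phi[0] + phi[1] * x + phi[2] * y)   # response probability pi*(phi)
        if rng.uniform() < 1.0 - pi:                   # Step 2: accept w.p. 1 - pi
            out.append(y)                              # Step 3 is the loop itself
    return np.array(out)
```

Accepted draws over-represent y values with low response probability, which is exactly what the target density f(yi | xi, δi = 0) requires.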
Monte Carlo EM
Example 2 (Cont'd)
• Using the m imputed values of yi, denoted yi*(1), ..., yi*(m), the M-step can be implemented by solving
Σ_{i=1}^{n} Σ_{j=1}^{m} S(θ; xi, yi*(j)) = 0
and
Σ_{i=1}^{n} Σ_{j=1}^{m} {δi − π(φ; xi, yi*(j))} (1, xi, yi*(j)) = 0,
where S(θ; xi, yi) = ∂ log f(yi | xi; θ)/∂θ.
Basic setup
• (X, Y): random variables
• θ: defined by solving E{U(θ; X, Y)} = 0.
• yi is subject to missingness:
δi = 1 if yi responds; δi = 0 if yi is missing.
• We want to find wi such that the solution θ̂w to
Σ_{i=1}^{n} δi wi U(θ; xi, yi) = 0
is consistent for θ.
Basic Setup
• Result 1: The choice of
wi = 1 / E(δi | xi, yi)   (3)
makes the resulting estimator θ̂w consistent.
• Result 2: If δi ~ Bernoulli(πi), then using wi = 1/πi also makes the resulting estimator consistent, but it is less efficient than θ̂w using wi in (3).
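A small simulation illustrating Result 1: weighting respondents by 1/E(δ | x, y) removes the selection bias of the unweighted respondent mean. The normal outcome model and logistic response model below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
y = rng.normal(1.0, 1.0, size=n)                 # true mean is 1.0
pi = 1.0 / (1.0 + np.exp(-(0.5 + 0.7 * y)))      # E(delta | y): depends on y itself
delta = (rng.uniform(size=n) < pi).astype(float)

naive = y[delta == 1].mean()                     # biased: large y over-represented
weighted = np.sum(delta * y / pi) / np.sum(delta / pi)  # w_i = 1 / E(delta_i | y_i)
```

Because response probability increases with y here, the naive respondent mean is biased upward, while the weighted estimator recovers the true mean.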
Parameter estimation: GMM method
• Because z is a nonresponse instrumental variable, we may assume
P(δ = 1 | x, y) = π(φ0 + φ1 u + φ2 y)
for some (φ0, φ1, φ2).
• Kott and Chang (2010): Construct a set of estimating equations such as
Σ_{i=1}^{n} {δi / π(φ0 + φ1 ui + φ2 yi) − 1} (1, ui, zi) = 0
that are unbiased to zero.
• We may have an overidentified situation: use the generalized method of moments (GMM).
GMM method
Example 3
• Suppose that we are interested in estimating the parameters in the regression model
yi = β0 + β1 x1i + β2 x2i + ei,   (4)
where E(ei | xi) = 0.
• Assume that yi is subject to missingness and that
P(δi = 1 | x1i, x2i, yi) = exp(φ0 + φ1 x1i + φ2 yi) / {1 + exp(φ0 + φ1 x1i + φ2 yi)}.
Thus, x2i is the nonresponse instrumental variable in this setup.
Example 3 (Cont'd)
• A consistent estimator of φ can be obtained by solving
Û2(φ) ≡ Σ_{i=1}^{n} {δi / π(φ; x1i, yi) − 1} (1, x1i, x2i) = (0, 0, 0).   (5)
Roughly speaking, a solution to (5) exists almost surely if E{∂Û2(φ)/∂φ} is of full rank in a neighborhood of the true value of φ. If x2 is a vector, then (5) is overidentified and an exact solution to (5) may not exist; in that case, the GMM algorithm can be used.
• A solution to Û2(φ) = 0 can be obtained by finding the minimizer of Q(φ) = Û2(φ)'Û2(φ) or QW(φ) = Û2(φ)'W Û2(φ), where W = {V(Û2)}^{-1}.
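A quick numerical check that the estimating functions in (5) are unbiased to zero at the true φ. The data-generating values below (linear outcome model, logistic response) are hypothetical:

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def U2(phi, x1, x2, y, delta):
    """Mean of the estimating functions (delta/pi - 1) * (1, x1, x2)."""
    pi = expit(phi[0] + phi[1] * x1 + phi[2] * y)
    resid = delta / pi - 1.0
    g = np.column_stack([np.ones_like(x1), x1, x2])
    return (resid[:, None] * g).mean(axis=0)

rng = np.random.default_rng(2)
n = 200_000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + x1 + x2 + rng.normal(size=n)
phi_true = np.array([0.5, 0.3, -0.4])
delta = (rng.uniform(size=n) < expit(phi_true[0] + phi_true[1] * x1
                                     + phi_true[2] * y)).astype(float)
```

At φ_true each component of U2 is an average of zero-mean terms, so it should be near (0, 0, 0); at other values of φ it generally is not, which is what the GMM minimizer exploits.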
Example 3 (Cont'd)
• Once the solution φ̂ to (5) is obtained, a consistent estimator of β = (β0, β1, β2) can be obtained by solving
Û1(β, φ̂) ≡ Σ_{i=1}^{n} (δi/π̂i) {yi − β0 − β1 x1i − β2 x2i} (1, x1i, x2i) = (0, 0, 0)   (6)
for β.
Asymptotic Properties
• The asymptotic variance of the GMM estimator φ̂ that minimizes Û2(φ)'Σ̂^{-1}Û2(φ) is
V(φ̂) ≅ (Γ'Σ^{-1}Γ)^{-1},
where Γ = E{∂Û2(φ)/∂φ} and Σ = V(Û2).
• The variance is estimated by (Γ̂'Σ̂^{-1}Γ̂)^{-1}, where Γ̂ = ∂Û2/∂φ evaluated at φ̂ and Σ̂ is an estimated variance-covariance matrix of Û2(φ) evaluated at φ̂.
Asymptotic Properties
• The asymptotic variance of β̂ obtained from (6), with φ̂ computed from the GMM, can be obtained by
V(θ̂) ≅ (Γa' Σa^{-1} Γa)^{-1},
where Γa = E{∂Û(θ)/∂θ}, Σa = V(Û), Û = (Û1', Û2')', and θ = (β, φ).
Partial Likelihood approach
• A classical likelihood-based approach for parameter estimation under nonignorable nonresponse is to maximize L_obs(θ, φ) with respect to (θ, φ), where
L_obs(θ, φ) = ∏_{δi=1} f(yi | xi; θ) g(δi | xi, yi; φ) × ∏_{δi=0} ∫ f(yi | xi; θ) g(δi | xi, yi; φ) dyi.
• Such an approach can be called a full likelihood-based approach because it uses the full information available in the observed data.
• However, it is well known that such a full likelihood-based approach is quite sensitive to failure of the assumed model.
• The partial likelihood-based approach (or conditional likelihood approach), on the other hand, uses a subset of the sample.
Conditional Likelihood approach
Idea
• Since
f(y | x) g(δ | x, y) = f1(y | x, δ) g1(δ | x)
for some f1 and g1, we can write
L_obs(θ) = ∏_{δi=1} f1(yi | xi, δi = 1) g1(δi | xi) × ∏_{δi=0} ∫ f1(yi | xi, δi = 0) g1(δi | xi) dyi = ∏_{δi=1} f1(yi | xi, δi = 1) × ∏_{i=1}^{n} g1(δi | xi).
• The conditional likelihood is defined to be the first component:
Lc(θ) = ∏_{δi=1} f1(yi | xi, δi = 1) = ∏_{δi=1} f(yi | xi; θ) π(xi, yi) / ∫ f(y | xi; θ) π(xi, y) dy,
where π(xi, yi) = Pr(δi = 1 | xi, yi).
Conditional Likelihood approach
Example
• Assume that the original sample is a random sample from an exponential distribution with mean µ = 1/θ; that is, the probability density function of y is f(y; θ) = θ exp(−θy) I(y > 0).
• Suppose that we observe yi only when yi > K for a known K > 0.
• Thus, the response indicator is defined by δi = 1 if yi > K and δi = 0 otherwise.
Conditional Likelihood approach
Example
• To compute the maximum likelihood estimator from the observed likelihood, note that
S_obs(θ) = Σ_{δi=1} (1/θ − yi) + Σ_{δi=0} {1/θ − E(yi | δi = 0)}.
• Since
E(Y | Y ≤ K) = 1/θ − K exp(−θK) / {1 − exp(−θK)},
the maximum likelihood estimator of θ can be obtained by the following iteration:
θ̂^(t+1) = [ȳr − {(n − r)/r} K exp(−K θ̂^(t)) / {1 − exp(−K θ̂^(t))}]^{-1},   (7)
where r = Σ_{i=1}^{n} δi and ȳr = r^{-1} Σ_{i=1}^{n} δi yi.
Conditional Likelihood approach
Example
• Since πi = Pr(δi = 1 | yi) = I(yi > K) and E(πi) = E{I(yi > K)} = exp(−Kθ), the conditional likelihood reduces to
∏_{δi=1} θ exp{−θ(yi − K)}.
The maximum conditional likelihood estimator of θ is
θ̂c = 1/(ȳr − K).
Since E(y | y > K) = µ + K, the maximum conditional likelihood estimator of µ, which is µ̂c = 1/θ̂c, is unbiased for µ.
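A simulation comparing the two estimators in this example: the observed-likelihood iteration (7) and the closed-form conditional MLE θ̂c = 1/(ȳr − K). The sample size, θ, and K below are hypothetical:

```python
import numpy as np

def mle_observed(y, K, n_iter=100):
    """Observed-likelihood MLE of theta via the fixed-point iteration (7)."""
    delta = y > K
    r, n = delta.sum(), len(y)
    ybar_r = y[delta].mean()
    theta = 1.0 / ybar_r                       # starting value
    for _ in range(n_iter):
        adj = ((n - r) / r) * K * np.exp(-K * theta) / (1.0 - np.exp(-K * theta))
        theta = 1.0 / (ybar_r - adj)
    return theta

rng = np.random.default_rng(4)
theta_true, K = 1.0, 0.5
y = rng.exponential(1.0 / theta_true, size=50_000)

theta_mle = mle_observed(y, K)                 # uses all n via the iteration
theta_c = 1.0 / (y[y > K].mean() - K)          # conditional MLE: respondents only
```

Both are consistent; the conditional MLE discards the information that n − r values fell at or below K, trading some efficiency for simplicity.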
Conditional Likelihood approach
Remark
• Under some regularity conditions, the solution θ̂c that maximizes Lc(θ) satisfies
Ic^{1/2} (θ̂c − θ) →_L N(0, I),
where
Ic(θ) = −E{∂Sc(θ)/∂θ'} = Σ_{i=1}^{n} [E(Si Si' πi | xi; θ) − {E(Si πi | xi; θ)}^{⊗2} / E(πi | xi; θ)],
Sc(θ) = ∂ ln Lc(θ)/∂θ, and Si(θ) = ∂ ln f(yi | xi; θ)/∂θ.
• The approach works only when π(x, y) is a known function.
• It does not require the nonresponse instrumental variable assumption.
• It is popular for biased sampling problems.
Pseudo Likelihood approach
Idea
• Consider bivariate (xi, yi) with density f(y | x; θ) h(x), where yi is subject to missingness.
• We are interested in estimating θ.
• Suppose that Pr(δ = 1 | x, y) depends only on y (i.e., x is a nonresponse instrument).
• Note that f(x | y, δ) = f(x | y).
• Thus, we can consider the following conditional likelihood:
Lc(θ) = ∏_{δi=1} f(xi | yi, δi = 1) = ∏_{δi=1} f(xi | yi).
• We can consider maximizing the pseudo likelihood
Lp(θ) = ∏_{δi=1} f(yi | xi; θ) ĥ(xi) / ∫ f(yi | x; θ) ĥ(x) dx,
where ĥ(x) is a consistent estimator of the marginal density of x.
Pseudo Likelihood approach
Idea
• We may use the empirical density for ĥ(x); that is, ĥ(x) = 1/n at each x = xi. In this case,
Lc(θ) = ∏_{δi=1} f(yi | xi; θ) / Σ_{k=1}^{n} f(yi | xk; θ).
• We can extend the idea to the case x = (u, z), where z is a nonresponse instrument. In this case, the conditional likelihood becomes
∏_{i:δi=1} p(zi | yi, ui) = ∏_{i:δi=1} f(yi | ui, zi; θ) p(zi | ui) / ∫ f(yi | ui, z; θ) p(z | ui) dz.   (8)
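The empirical-density pseudo log-likelihood is easy to code and behaves like an ordinary likelihood in θ. A sketch assuming a normal outcome model y = βx + e with known unit variance and a response mechanism depending only on y; the simulation settings are hypothetical:

```python
import numpy as np

def pseudo_loglik(beta, x, y, delta, sigma=1.0):
    """log Lp(beta) with the empirical density h-hat(x) = 1/n."""
    xr, yr = x[delta == 1], y[delta == 1]
    # log f(y_i | x_i; beta) for respondents (normal kernel, constants cancel)
    num = -0.5 * (yr - beta * xr) ** 2 / sigma**2
    # log sum_k f(y_i | x_k; beta) over the full sample of x's (log-sum-exp)
    logf = -0.5 * (yr[:, None] - beta * x[None, :]) ** 2 / sigma**2
    m = logf.max(axis=1)
    den = m + np.log(np.exp(logf - m[:, None]).sum(axis=1))
    return (num - den).sum()

rng = np.random.default_rng(5)
n, beta_true = 2_000, 1.0
x = rng.normal(size=n)
y = beta_true * x + rng.normal(size=n)
resp_prob = 1.0 / (1.0 + np.exp(-y))                 # depends on y only
delta = (rng.uniform(size=n) < resp_prob).astype(float)
```

Because the denominator conditions on the observed y's, the selection effect cancels and the criterion peaks near the true β even though the respondents are a biased subsample.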
Pseudo Likelihood approach
• Let p̂(z | u) be an estimated conditional probability density of z given u. Substituting this estimate into the likelihood in (8), we obtain the following pseudo likelihood:
∏_{i:δi=1} f(yi | ui, zi; θ) p̂(zi | ui) / ∫ f(yi | ui, z; θ) p̂(z | ui) dz.   (9)
• The pseudo maximum likelihood estimator (PMLE) of θ, denoted θ̂p, can be obtained by solving
Sp(θ; α̂) ≡ Σ_{δi=1} [S(θ; xi, yi) − E{S(θ; ui, z, yi) | yi, ui; θ, α̂}] = 0
for θ, where S(θ; x, y) = S(θ; u, z, y) = ∂ log f(y | x; θ)/∂θ and
E{S(θ; ui, z, yi) | yi, ui; θ, α̂} = ∫ S(θ; ui, z, yi) f(yi | ui, z; θ) p(z | ui; α̂) dz / ∫ f(yi | ui, z; θ) p(z | ui; α̂) dz.
Pseudo Likelihood approach
• The Fisher-scoring method for obtaining the PMLE is given by
θ̂p^(t+1) = θ̂p^(t) + {Ip(θ̂^(t), α̂)}^{-1} Sp(θ̂^(t), α̂),
where
Ip(θ, α̂) = Σ_{δi=1} [E{S(θ; ui, z, yi)^{⊗2} | yi, ui; θ, α̂} − E{S(θ; ui, z, yi) | yi, ui; θ, α̂}^{⊗2}].
• Variance estimation is very complicated; the jackknife or bootstrap can be used.
Exponential tilting method
Motivation
• The observed likelihood function can be written
L_obs(φ) = ∏_{i=1}^{n} {πi(φ)}^{δi} [∫ {1 − πi(φ)} f(y | xi) dy]^{1−δi},
where f(y | x) is the true conditional distribution of y given x.
• To find the MLE of φ, we solve the mean score equation S̄(φ) = 0, where
S̄(φ) = Σ_{i=1}^{n} [δi Si(φ) + (1 − δi) E{Si(φ) | xi, δi = 0}],   (10)
where Si(φ) = {δi − πi(φ)} ∂ logit πi(φ)/∂φ is the score function of φ for the density g(δ | x, y; φ) = π^δ (1 − π)^{1−δ} with π = π(x, y; φ).
Motivation
• The conditional expectation in (10) can be evaluated by using
f(y | x, δ = 0) = f(y | x) P(δ = 0 | x, y) / E{P(δ = 0 | x, y) | x}.   (11)
Two problems occur:
1. It requires correct specification of f(y | x; θ), and is known to be sensitive to the choice of f(y | x; θ).
2. It is computationally heavy, often requiring Monte Carlo computation.
Exponential tilting method
Remedy (for Problem One)
Idea: Instead of specifying a parametric model for f(y | x), consider specifying a parametric model for f(y | x, δ = 1), denoted by f1(y | x). In this case,
E{Si(φ) | xi, δi = 0} = ∫ Si(φ) f1(y | xi) O(xi, y; φ) dy / ∫ f1(y | xi) O(xi, y; φ) dy,
where
O(x, y; φ) = {1 − π(φ; x, y)} / π(φ; x, y).
Remark
• This rests on the identity
f0(yi | xi) = f1(yi | xi) × O(xi, yi) / E{O(xi, Yi) | xi, δi = 1},   (12)
where fδ(yi | xi) = f(yi | xi, δi = δ) and
O(xi, yi) = Pr(δi = 0 | xi, yi) / Pr(δi = 1 | xi, yi)   (13)
is the conditional odds of nonresponse.
• Kim and Yu (2011) considered a kernel-based nonparametric regression method of estimating f(y | x, δ = 1) to obtain E(Y | x, δ = 0).
• If the response probability follows a logistic regression model
π(ui, yi) ≡ Pr(δi = 1 | ui, yi) = exp(φ0 + φ1 ui + φ2 yi) / {1 + exp(φ0 + φ1 ui + φ2 yi)},   (14)
the expression (12) can be simplified to
f0(yi | xi) = f1(yi | xi) × exp(γ yi) / E{exp(γ Y) | xi, δi = 1},   (15)
where γ = −φ2 and f1(y | x) is the conditional density of y given x and δ = 1.
• Model (15) states that the density for the nonrespondents is an exponential tilting of the density for the respondents. The parameter γ is the tilting parameter that determines the amount of departure from ignorability of the response mechanism. If γ = 0, the response mechanism is ignorable and f0(y | x) = f1(y | x).
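A numerical illustration of the tilting identity (15): if f1 is N(µ, σ²), tilting by exp(γy) shifts the mean to µ + γσ². The Monte Carlo check below uses self-normalized importance weights; the parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, gamma = 0.0, 1.0, 0.8
y1 = rng.normal(mu, sigma, size=200_000)   # draws from f1(y | x)

w = np.exp(gamma * y1)                     # tilting factor exp(gamma * y)
tilted_mean = np.average(y1, weights=w)    # mean of y under f0 via (15)
```

Here tilted_mean should be close to mu + gamma * sigma**2 = 0.8; with γ = 0 the weights are constant and f0 = f1, the ignorable case.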
Exponential tilting method
Problem Two
How to compute
E{Si(φ) | xi, δi = 0} = ∫ Si(φ) O(xi, y; φ) f1(y | xi) dy / ∫ O(xi, y; φ) f1(y | xi) dy
without relying on Monte Carlo computation?
• We need to compute
E1{Q(xi, Y) | xi} = ∫ Q(xi, y) f1(y | xi) dy.
• If xi were null, then we would approximate the integral by the empirical distribution among δ = 1.
• Use
∫ Q(xi, y) f1(y | xi) dy = ∫ Q(xi, y) {f1(y | xi)/f1(y)} f1(y) dy ∝ Σ_{δj=1} Q(xi, yj) f1(yj | xi)/f1(yj),
where
f1(y) = ∫ f1(y | x) f(x | δ = 1) dx ∝ Σ_{δi=1} f1(y | xi).
Exponential tilting method
• In practice, f1(y | x) is unknown and is estimated by f̂1(y | x) = f1(y | x; γ̂).
• Thus, given γ̂, a fully efficient estimator of φ can be obtained by solving
S2(φ, γ̂) ≡ Σ_{i=1}^{n} [δi S(φ; xi, yi) + (1 − δi) S̄0(φ | xi; γ̂, φ)] = 0,   (16)
where
S̄0(φ | xi; γ̂, φ) = Σ_{δj=1} S(φ; xi, yj) f1(yj | xi; γ̂) O(φ; xi, yj)/f̂1(yj) / Σ_{δj=1} f1(yj | xi; γ̂) O(φ; xi, yj)/f̂1(yj)
and
f̂1(y) = n_R^{-1} Σ_{i=1}^{n} δi f1(y | xi; γ̂).
• One may use the EM algorithm to solve (16) for φ.
Exponential tilting method
• Step 1: Using the responding part of (xi, yi), obtain γ̂ in the model f1(y | x; γ) by solving
S1(γ) ≡ Σ_{δi=1} S1(γ; xi, yi) = 0.   (17)
• Step 2: Given γ̂ from Step 1, obtain φ̂ by solving (16): S2(φ, γ̂) = 0.
• Step 3: Using φ̂ computed from Step 2, the PSA estimator of θ can be obtained by solving
Σ_{i=1}^{n} (δi/π̂i) U(θ; xi, yi) = 0,   (18)
where π̂i = πi(φ̂).
Exponential tilting method
Remark
• In many cases, x is categorical and f1(y | x) can be fully nonparametric.
• If x has a continuous part, nonparametric kernel smoothing can be used.
• The proposed method appears robust against failure of the assumed model for f1(y | x; γ).
• Asymptotic normality of the PSA estimator can be established, and the linearization method can be used for variance estimation (details skipped).
• By augmenting the estimating function, we can also impose a calibration constraint such as
Σ_{i=1}^{n} (δi/π̂i) xi = Σ_{i=1}^{n} xi.
Exponential tilting method
Example 4 (Example 6.5 of KS)
• Assume that both xi = (zi, ui) and yi are categorical, taking values in Sz × Su and Sy, respectively.
• We are interested in estimating θk = Pr(Y = k) for k ∈ Sy.
• We have nonresponse in y; let δi be the response indicator for yi. We assume that the response probability satisfies
Pr(δ = 1 | x, y) = π(u, y; φ).
• To estimate φ, we first compute the observed conditional probability of y among the respondents:
p̂1(y | xi) = Σ_{δj=1} I(xj = xi, yj = y) / Σ_{δj=1} I(xj = xi).
Exponential tilting method
Example 4 (Cont'd)
• The EM algorithm can be implemented via (16) with
S̄0(φ | xi; φ) = Σ_{δj=1} S(φ; δi, ui, yj) p̂1(yj | xi) O(φ; ui, yj)/p̂1(yj) / Σ_{δj=1} p̂1(yj | xi) O(φ; ui, yj)/p̂1(yj),
where O(φ; u, y) = {1 − π(u, y; φ)}/π(u, y; φ) and
p̂1(y) = n_R^{-1} Σ_{i=1}^{n} δi p̂1(y | xi).
• Alternatively, we can use
S̄0(φ | xi; φ) = Σ_{y∈Sy} S(φ; δi, ui, y) p̂1(y | xi) O(φ; ui, y) / Σ_{y∈Sy} p̂1(y | xi) O(φ; ui, y).   (19)
Exponential tilting method
Example 4 (Cont'd)
• Once π̂(u, y) = π(u, y; φ̂) is computed, we can use
θ̂k,ET = n^{-1} [Σ_{δi=1} I(yi = k) + Σ_{δi=0} Σ_{y∈Sy} w*_iy I(y = k)],
where w*_iy are the fractional weights computed by
w*_iy = [{π̂(ui, y)}^{-1} − 1] p̂1(y | xi) / Σ_{y'∈Sy} [{π̂(ui, y')}^{-1} − 1] p̂1(y' | xi).
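The fractional weights are a one-liner to compute. A minimal sketch for a single nonrespondent i, with hypothetical response probabilities and respondent-based conditional probabilities over the categories of y:

```python
import numpy as np

def fractional_weights(pi_hat, p1_hat):
    """w*_iy proportional to ({pi_hat(u_i, y)}^{-1} - 1) * p1_hat(y | x_i), normalized over y."""
    raw = (1.0 / pi_hat - 1.0) * p1_hat
    return raw / raw.sum()

# Hypothetical values for one nonrespondent over three categories of y
pi_hat = np.array([0.8, 0.6, 0.4])     # estimated response probabilities by category
p1_hat = np.array([0.5, 0.3, 0.2])     # p-hat_1(y | x_i) from the respondents
w = fractional_weights(pi_hat, p1_hat)
```

Categories with lower response probability receive proportionally larger weight, which is how the estimator shifts mass toward the values that nonrespondents are more likely to hold.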
Real Data Example
Exit Poll: The Assembly election (2012, Gang-dong district in Seoul)

Gender   Age     Party A   Party B   Other   Refusal     Total
Male     20-29        93       115       4        28       240
Male     30-39       104       233       8        82       427
Male     40-49       146       295       5        49       495
Male     50-         560       350       3       174     1,087
Female   20-29       106       159       8        62       335
Female   30-39       129       242       5        70       446
Female   40-49       170       262       5        69       506
Female   50-         501       218       7       211       937
Total              1,809     1,874      45       745     4,473
Truth             62,489    57,909   1,624             122,022
Comparison of the methods (%)
Table: Analysis result: Gang-dong district in Seoul

Method                    Party A   Party B   Other
No adjustment                48.5      50.3     1.2
Adjustment (Age × Sex)       49.0      49.8     1.2
New Method                   51.0      47.7     1.2
Truth                        51.2      47.5     1.3
Analysis result in Seoul (48 seats)
Table: Analysis result: 48 seats in Seoul

Method                    Party A   Party B   Other
No adjustment                  10        36       2
Adjustment (Age × Sex)         10        36       2
New Method                     15        29       4
Truth                          16        30       2
Concluding remarks
• The approach uses a model for the response probability.
• Parameter estimation for the response model can be implemented using the idea of the maximum likelihood method.
• An instrumental variable is needed for identifiability of the response model.
• Likelihood-based approach vs. GMM approach.
• There are fewer tools for model diagnostics or model validation.
• This is a promising area of research.
REFERENCES
Ibrahim, J. G. (1990), ‘Incomplete data in generalized linear models’, Journal of the
American Statistical Association 85, 765–769.
Kim, J. K. and C. L. Yu (2011), ‘A semi-parametric estimation of mean functionals with
non-ignorable missing data’, Journal of the American Statistical Association
106, 157–165.
Kott, P. S. and T. Chang (2010), ‘Using calibration weighting to adjust for nonignorable
unit nonresponse’, Journal of the American Statistical Association 105, 1265–1275.
Wang, S., J. Shao and J. K. Kim (2014), 'Identifiability and estimation in problems with nonignorable nonresponse', Statistica Sinica 24, 1097–1116.
Wei, G. C. and M. A. Tanner (1990), ‘A Monte Carlo implementation of the EM
algorithm and the poor man’s data augmentation algorithms’, Journal of the
American Statistical Association 85, 699–704.