Generalized Functional Linear Models with
Semiparametric Single-Index Interactions
Web Appendix
Yehua Li
Department of Statistics, University of Georgia, Athens, GA 30602, yehuali@uga.edu
Naisyin Wang
Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107,
nwangaa@umich.edu
Raymond J. Carroll
Department of Statistics, Texas A&M University, TAMU 3143, College Station, TX
77843-3143, carroll@stat.tamu.edu
A. Algorithm Implementation With One-step Updates
To speed up the estimation algorithm, we use a one-step update version of the iterative algorithm in Stages 1 and 2, as follows. The key is to replace the full optimization steps of minimizing (5) within the iterative updating of $S(\cdot)$ and $\Theta$ by a one-step Newton-Raphson algorithm. This eliminates the computational burden of full iteration within the inner loop when the estimates of $S(\cdot)$ and $\Theta$ are still far from the target and, as a consequence, speeds up the iterative process.

Recall that $q_1(x, y) = \{y - g^{-1}(x)\}\rho_1(x)$ and $q_2(x, y) = \{y - g^{-1}(x)\}\rho_1'(x) - \rho_2(x)$, where $\rho_\ell(x) = \{d g^{-1}(x)/dx\}^{\ell} / V\{g^{-1}(x)\}$, $\ell = 1, 2$, and $Z_{ij,1} = Z_{i1} - Z_{j1}$. When we fix the parametric component $\Theta$ and minimize (5) with respect to $(a_{0j}, a_{1j})$, we use a Newton-Raphson algorithm to perform the following one-step update:
$$
\begin{pmatrix} \widehat a_{0j} \\ \widehat a_{1j} \end{pmatrix}
= \begin{pmatrix} \check a_{0j} \\ \check a_{1j} \end{pmatrix}
- \left\{ \sum_{i=1}^{n} w_{ij}\, q_2(\check\eta_{ij}, Y_i)\,(\check\alpha_2^{\rm T}\xi_i)^2
\begin{pmatrix} 1 \\ \check\theta^{\rm T} Z_{ij,1} \end{pmatrix}
\begin{pmatrix} 1 \\ \check\theta^{\rm T} Z_{ij,1} \end{pmatrix}^{\rm T} \right\}^{-1}
\left\{ \sum_{i=1}^{n} w_{ij}\, q_1(\check\eta_{ij}, Y_i)\,(\check\alpha_2^{\rm T}\xi_i)
\begin{pmatrix} 1 \\ \check\theta^{\rm T} Z_{ij,1} \end{pmatrix} \right\},
\quad (A.1)
$$
where $\check\eta_{ij} = \{\check\alpha_1 + (\check a_{0j} + \check a_{1j}\check\theta^{\rm T} Z_{ij,1})\check\alpha_2\}^{\rm T}\xi_i + \check\beta^{\rm T} Z_i$; all estimates on the right-hand side of (A.1), and later of (A.2) and (A.3), denote their values before the update. To enforce the constraint that $n^{-1}\sum_{i=1}^{n} S(\theta^{\rm T} Z_{i1}) = 0$, we set $\widehat a_{0j} = \widehat a_{0j} - n^{-1}\sum_{\ell=1}^{n} \widehat a_{0\ell}$.
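The one-step update (A.1) is a single weighted Newton-Raphson step for the local coefficients $(a_{0j}, a_{1j})$. The following minimal Python sketch is our own illustration, not the authors' code; the arrays of $q_1$ and $q_2$ values, kernel weights, and single-index terms are assumed to be supplied by the caller.

```python
import numpy as np

def one_step_local_update(a0, a1, w, q1_vals, q2_vals, a2xi, tZ):
    """One Newton-Raphson step for the local coefficients, as in (A.1).

    w       : (n,) kernel weights w_ij for the j-th target point
    q1_vals : (n,) q1(eta_ij, Y_i) evaluated at the current estimates
    q2_vals : (n,) q2(eta_ij, Y_i) evaluated at the current estimates
    a2xi    : (n,) values of alpha_2^T xi_i
    tZ      : (n,) values of theta^T Z_{ij,1}
    """
    X = np.column_stack([np.ones_like(tZ), tZ])        # local design (1, theta^T Z)
    # Hessian-type matrix: sum_i w * q2 * (alpha_2^T xi)^2 * X X^T
    H = ((w * q2_vals * a2xi**2)[:, None, None]
         * np.einsum('ni,nj->nij', X, X)).sum(axis=0)
    # score vector: sum_i w * q1 * (alpha_2^T xi) * X
    g = ((w * q1_vals * a2xi)[:, None] * X).sum(axis=0)
    return np.array([a0, a1]) - np.linalg.solve(H, g)  # one-step update
```

After all $\widehat a_{0j}$ are updated, the centering constraint is imposed by subtracting their mean, e.g. `a0_all -= a0_all.mean()`.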
Similarly, we update the parametric components other than $\theta$ by
$$
\begin{pmatrix} \widehat\alpha_1 \\ \widehat\alpha_2 \\ \widehat\beta \end{pmatrix}
= \begin{pmatrix} \check\alpha_1 \\ \check\alpha_2 \\ \check\beta \end{pmatrix}
- \left\{ \sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_2(\check\eta_{ij}, Y_i)
\begin{pmatrix} \xi_i \\ \widehat a_{0j}\xi_i \\ Z_i \end{pmatrix}
\begin{pmatrix} \xi_i \\ \widehat a_{0j}\xi_i \\ Z_i \end{pmatrix}^{\rm T} \right\}^{-1}
\left\{ \sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_1(\check\eta_{ij}, Y_i)
\begin{pmatrix} \xi_i \\ \widehat a_{0j}\xi_i \\ Z_i \end{pmatrix} \right\}.
\quad (A.2)
$$
To enforce the constraint, we adjust the value of $\alpha_2$ by letting $\widehat\alpha_2 = {\rm sign}(\widehat\alpha_{21}) \times \widehat\alpha_2 / \|\widehat\alpha_2\|$.
Finally, we update $\theta$:
$$
\widehat\theta = \check\theta
- \left\{ \sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_2(\check\eta_{ij}, Y_i)\,(\widehat a_{1j}\widehat\alpha_2^{\rm T}\xi_i)^2\, Z_{ij,1} Z_{ij,1}^{\rm T} \right\}^{-1}
\left\{ \sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_1(\check\eta_{ij}, Y_i)\,(\widehat a_{1j}\widehat\alpha_2^{\rm T}\xi_i)\, Z_{ij,1} \right\}.
\quad (A.3)
$$
To adjust for the constraint on $\theta$, we let $\widehat\theta = {\rm sign}(\widehat\theta_1) \times \widehat\theta / \|\widehat\theta\|$.
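The normalizations applied to $\widehat\alpha_2$ and $\widehat\theta$ share the same form: rescale to unit Euclidean norm and fix the sign of the first entry. A small helper, purely illustrative:

```python
import numpy as np

def normalize_index(v):
    """Rescale v to unit norm with a positive first entry, the
    identifiability adjustment applied to alpha_2-hat and theta-hat."""
    v = np.asarray(v, dtype=float)
    return np.sign(v[0]) * v / np.linalg.norm(v)
```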
B. Technical Proofs
We sketch the proofs of our main theoretical results.
Throughout, $\eta\{\alpha, \beta, \theta, S(\cdot)\} = \eta\{\Theta, S(\cdot)\}$ denotes the expression $\alpha_1^{\rm T}\xi + S(\theta^{\rm T} Z_1)\alpha_2^{\rm T}\xi + \beta^{\rm T} Z$, and the subindices "$i$" and "$ij$" of $\eta$ indicate the replacement of each random variable by the corresponding observation. Provided that the clarity of the presentation is preserved, we also leave out arguments of functions to simplify certain equations.
B.1
Proof of Theorem 1
For the iterative estimation procedure proposed in Section 3.1, we denote the current value of the parameter with a subscript "curr" and the previous value with a subscript "prev". For the first step, we fix $\widetilde\Theta_{\rm prev}$ and update $S(z)$ by minimizing (4). We let $a_0 = S_0(\theta_0^{\rm T} z)$ and $a_1^* = b S_0'(\theta_0^{\rm T} z)$, writing $a_1^* = b a_1$ generically, denote $Z^*_{i0,1} = (Z_{i1} - z)/b$, and define, for any $z \in R^{d_1}$, the multivariate kernel weight $w_i(z) = [\sum_{\ell=1}^{n} H\{(Z_{\ell 1} - z)/b\}]^{-1} H\{(Z_{i1} - z)/b\}$, where $b$ is the bandwidth. Recall that $Z_{ij,1} = Z_{i1} - Z_{j1}$. The updated estimate $(\widetilde a_{0,{\rm curr}}, \widetilde a^*_{1,{\rm curr}})^{\rm T}$ satisfies the local estimating equation
$$
0 = \sum_{i=1}^{n} w_i(z)\, q_1[\eta_i\{\widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \widetilde\theta_{\rm prev}, (\widetilde a_{0,{\rm curr}} + \widetilde a^*_{1,{\rm curr}}\widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1})\}, Y_i]
\times (\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)
\begin{pmatrix} 1 \\ \widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1} \end{pmatrix}.
$$
By a standard Taylor expansion, we have
$$
0 = \sum_{i=1}^{n} w_i(z)\, q_1[\eta_i\{\widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \widetilde\theta_{\rm prev}, (a_0 + a_1^*\widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1})\}, Y_i]\,(\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)
\begin{pmatrix} 1 \\ \widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1} \end{pmatrix}
$$
$$
+ \left[\sum_{i=1}^{n} w_i(z)\, q_2[\eta_i\{\widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \widetilde\theta_{\rm prev}, (a_0 + a_1^*\widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1})\}, Y_i]\,(\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)^2
\begin{pmatrix} 1 \\ \widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1} \end{pmatrix}
\begin{pmatrix} 1 \\ \widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1} \end{pmatrix}^{\rm T}\right]
\begin{pmatrix} \widetilde a_{0,{\rm curr}} - a_0 \\ \widetilde a^*_{1,{\rm curr}} - a_1^* \end{pmatrix}
\times \{1 + o_p(1)\}.
\quad (B.1)
$$
Denote $\delta_n^* = b^2 + \{\log(n)/(n b^{d_1})\}^{1/2}$ and $\Delta(\widetilde\Theta_{\rm prev}) = \|(\widetilde\alpha_{\rm prev}^{\rm T}, \widetilde\beta_{\rm prev}^{\rm T})^{\rm T} - (\alpha_0^{\rm T}, \beta_0^{\rm T})^{\rm T}\|$. Define
$$
\gamma_1(z; \Theta, S) = E[q_1\{\eta_i(\Theta, S), Y_i\}(\alpha_2^{\rm T}\xi_i) \mid Z_{i1} = z],
$$
$$
V_1(z; \Theta, S) = -E[q_2\{\eta_i(\Theta, S), Y_i\}(\alpha_2^{\rm T}\xi_i)^2 \mid Z_{i1} = z].
$$
By (B.1) and some standard derivations, we obtain that, uniformly for $z \in \mathcal{D}$,
$$
\sum_{i=1}^{n} w_i(z)\, q_1[\eta_i\{\widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \widetilde\theta_{\rm prev}, (a_0 + a_1^*\widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1})\}, Y_i]\,(\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)
$$
$$
= \sum_{i=1}^{n} w_i(z)\, q_1[\widetilde\alpha_{1,{\rm prev}}^{\rm T}\xi_i + (a_0 + a_1^*\widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1})\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i + \widetilde\beta_{\rm prev}^{\rm T} Z_i, Y_i]\,(\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)
= \gamma_1(z; \widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \theta_0, S_0) + O_p(\delta_n^*),
$$
and
$$
\sum_{i=1}^{n} w_i(z)\, q_1\{\widetilde\alpha_{1,{\rm prev}}^{\rm T}\xi_i + (a_0 + a_1^*\widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1})\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i + \widetilde\beta_{\rm prev}^{\rm T} Z_i, Y_i\}\,(\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)\,\widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1}
$$
$$
= \widetilde\theta_{\rm prev}^{\rm T} \sum_{i=1}^{n} w_i(z)\, q_1\{\widetilde\alpha_{1,{\rm prev}}^{\rm T}\xi_i + S(\theta_0^{\rm T} Z_{i1})\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i + \widetilde\beta_{\rm prev}^{\rm T} Z_i, Y_i\}\,(\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)\, Z^*_{i0,1}
$$
$$
+ \Big[\sum_{i=1}^{n} w_i(z)\,\{S_0(\theta_0^{\rm T} z) + b S_0'(\theta_0^{\rm T} z)\widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1} - S_0(\theta_0^{\rm T} Z_{i1})\}
\, q_2\{\widetilde\alpha_{1,{\rm prev}}^{\rm T}\xi_i + S(\theta_0^{\rm T} z)\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i + \widetilde\beta_{\rm prev}^{\rm T} Z_i, Y_i\}\,(\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)^2\,(\widetilde\theta_{\rm prev}^{\rm T} Z^*_{i0,1})\Big] \times \{1 + o_p(1)\}
$$
$$
= b\,\widetilde\theta_{\rm prev}^{\rm T}\Big\{\frac{\partial}{\partial z}\gamma_1 + \frac{1}{f_{Z_1}(z)}\frac{\partial f_{Z_1}(z)}{\partial z}\gamma_1\Big\}(z; \widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \theta_0, S_0)
- b(1 - \widetilde\theta_{\rm prev}^{\rm T}\theta_0)\, S_0'(\theta_0^{\rm T} z)\, V_1(z; \widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \theta_0, S_0) + O_p(\delta_n^*)
$$
$$
= -b(1 - \widetilde\theta_{\rm prev}^{\rm T}\theta_0)\, S_0'(\theta_0^{\rm T} z)\, V_1(z; \widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \theta_0, S_0) + O_p\{b\Delta(\widetilde\Theta_{\rm prev})\} + O_p(\delta_n^*).
$$
By similar calculations, we can derive from (B.1) that, uniformly for $z \in \mathcal{D}$,
$$
\begin{pmatrix} \widetilde a_{0,{\rm curr}} - a_0 \\ \widetilde a^*_{1,{\rm curr}} - a_1^* \end{pmatrix}
= \begin{pmatrix} V_1(z; \widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \theta_0, S_0) & V_{12} \\ V_{12} & V_1(z; \widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \theta_0, S_0) \end{pmatrix}^{-1}
\begin{pmatrix} \gamma_1(z; \widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \theta_0, S_0) \\
-b(1 - \widetilde\theta_{\rm prev}^{\rm T}\theta_0)\, S_0'(\theta_0^{\rm T} z)\, V_1(z; \widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \theta_0, S_0) + O_p\{b\Delta(\widetilde\Theta_{\rm prev})\} \end{pmatrix}
+ O_p(\delta_n^*),
$$
where
$$
V_{12} = b\,\widetilde\theta_{\rm prev}^{\rm T}\Big\{\frac{\partial}{\partial z} V_1 + \frac{1}{f_{Z_1}(z)}\frac{\partial f_{Z_1}(z)}{\partial z} V_1\Big\}(z; \widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \theta_0, S_0) = O_p(b)
$$
is a negligible term.
Define $V_2(z; \Theta, S)$ as
$$
V_2(z; \Theta, S) = -E\left[ q_2\{\eta_i(\Theta, S), Y_i\}(\alpha_2^{\rm T}\xi_i)
\begin{pmatrix} \xi_i \\ S(\theta^{\rm T} Z_{i1})\xi_i \\ Z_i \end{pmatrix} \,\middle|\, Z_{i1} = z \right].
$$
By the fact that $\gamma_1(z; \Theta_0, S_0) = 0$ for all $z$, a Taylor expansion of $\gamma_1$ in $\alpha$ and $\beta$, and further derivations, we obtain
$$
\widetilde a_{0,{\rm curr}} - a_0 = -V_1^{-1}(z; \widetilde\alpha_{\rm prev}, \widetilde\beta_{\rm prev}, \theta_0, S_0)\,
V_2^{\rm T}(z; \Theta_0, S_0)
\begin{pmatrix} \widetilde\alpha_{1,{\rm prev}} - \alpha_{1,0} \\ \widetilde\alpha_{2,{\rm prev}} - \alpha_{2,0} \\ \widetilde\beta_{\rm prev} - \beta_0 \end{pmatrix}
\times \{1 + o_p(1)\} + O_p(\delta_n^*),
\quad (B.2)
$$
and
$$
\widetilde a^*_{1,{\rm curr}} = b\, S_0'(\theta_0^{\rm T} z)\,\widetilde\theta_{\rm prev}^{\rm T}\theta_0 + O_p\{b\Delta(\widetilde\Theta_{\rm prev})\} + O_p(\delta_n^*).
\quad (B.3)
$$
Next, holding the link function and $\theta$ fixed at their current values, we update $\alpha$ and $\beta$. At convergence, $\widetilde\alpha_{\rm curr}$ and $\widetilde\beta_{\rm curr}$ satisfy the estimating equation (B.4),
$$
0 = \sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_1[\eta_{ij}\{\widetilde\alpha_{\rm curr}, \widetilde\beta_{\rm curr}, \widetilde\theta_{\rm prev}, (\widetilde a_{0j,{\rm curr}} + \widetilde a^*_{1j,{\rm curr}}\widetilde\theta_{\rm prev}^{\rm T} Z^*_{ij,1})\}, Y_i]
\begin{pmatrix} \xi_i \\ (\widetilde a_{0j} + \widetilde a^*_{1j}\widetilde\theta_{\rm prev}^{\rm T} Z^*_{ij,1})\xi_i \\ Z_i \end{pmatrix}
$$
$$
= \sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_1[\eta_{ij}\{\alpha_0, \beta_0, \widetilde\theta_{\rm prev}, (\widetilde a_{0j,{\rm curr}} + \widetilde a^*_{1j,{\rm curr}}\widetilde\theta_{\rm prev}^{\rm T} Z^*_{ij,1})\}, Y_i]
\begin{pmatrix} \xi_i \\ \widetilde a_{0j,{\rm curr}}\xi_i \\ Z_i \end{pmatrix}
$$
$$
+ \left[\sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_2[\eta_{ij}\{\alpha_0, \beta_0, \widetilde\theta_{\rm prev}, (\widetilde a_{0j,{\rm curr}} + \widetilde a^*_{1j,{\rm curr}}\widetilde\theta_{\rm prev}^{\rm T} Z^*_{ij,1})\}, Y_i]
\begin{pmatrix} \xi_i \\ \widetilde a_{0j,{\rm curr}}\xi_i \\ Z_i \end{pmatrix}^{\otimes 2}\right]
\begin{pmatrix} \widetilde\alpha_{1,{\rm curr}} - \alpha_{1,0} \\ \widetilde\alpha_{2,{\rm curr}} - \alpha_{2,0} \\ \widetilde\beta_{\rm curr} - \beta_0 \end{pmatrix}
\times \{1 + o_p(1)\} + O_p(b^2),
\quad (B.4)
$$
where $d^{\otimes 2} = d d^{\rm T}$. Define
$$
\gamma_2(z; \Theta, S) = E\left[ q_1\{\eta_i(\Theta, S), Y_i\}
\begin{pmatrix} \xi_i \\ S(\theta^{\rm T} Z_{i1})\xi_i \\ Z_i \end{pmatrix} \,\middle|\, Z_{i1} = z \right],
\qquad
V_3(z; \Theta, S) = -E\left[ q_2\{\eta_i(\Theta, S), Y_i\}
\begin{pmatrix} \xi_i \\ S(\theta^{\rm T} Z_{i1})\xi_i \\ Z_i \end{pmatrix}^{\otimes 2} \,\middle|\, Z_{i1} = z \right].
$$
By (B.2), (B.4), the fact that $\gamma_2(z; \Theta_0, S_0) = 0$ for all $z$, and the expansion $\gamma_2(Z_{j1}; \Theta_0, S_{\rm curr}) = -V_2(Z_{j1}; \Theta_0, S_0)(\widetilde a_{0j,{\rm curr}} - a_{0j}) \times \{1 + o_p(1)\}$, we obtain
$$
\begin{pmatrix} \widetilde\alpha_{1,{\rm curr}} - \alpha_{1,0} \\ \widetilde\alpha_{2,{\rm curr}} - \alpha_{2,0} \\ \widetilde\beta_{\rm curr} - \beta_0 \end{pmatrix}
= \Big\{ n^{-1}\sum_{j} V_3(Z_{j1}; \Theta_0, S_0) \Big\}^{-1}
\Big\{ n^{-1}\sum_{j} (V_2 V_2^{\rm T}/V_1)(Z_{j1}; \Theta_0, S_0) \Big\}
\begin{pmatrix} \widetilde\alpha_{1,{\rm prev}} - \alpha_{1,0} \\ \widetilde\alpha_{2,{\rm prev}} - \alpha_{2,0} \\ \widetilde\beta_{\rm prev} - \beta_0 \end{pmatrix}
\times \{1 + o_p(1)\} + O_p(\delta_n^*).
\quad (B.5)
$$
One source of additional variation that could easily be missed is the $\widetilde\Theta_{\rm prev}$ embedded within the expressions for $\widetilde a_{\rm curr}$, as given by (B.2) and (B.3). This is carefully taken into account in our derivations.
Asymptotically, the iterations converge if the distances between the current and previous estimates of $\alpha$ and $\beta$ go to zero as the iteration number goes to $\infty$. By (B.5), for large $n$ this occurs when $E\{V_3(Z_1; \Theta_0, S_0)\}^{-1} E\{(V_2 V_2^{\rm T}/V_1)(Z_1; \Theta_0, S_0)\}$ has eigenvalues strictly less than 1. By the Cauchy-Schwarz inequality, we can show that, for all $z \in \mathcal{D}$, $(V_2 V_2^{\rm T})(z; \Theta, S) \le (V_1 \times V_3)(z; \Theta, S)$, where equality holds only if the order of the products of nonlinear functions and the conditional expectations is exchangeable, which does not hold in our model. Consequently, this establishes the asymptotic convergence property of the iterative procedure. As a result, at the limit,
$$
\big(\widetilde\alpha_{1,{\rm curr}} - \alpha_{1,0},\; \widetilde\alpha_{2,{\rm curr}} - \alpha_{2,0},\; \widetilde\beta_{\rm curr} - \beta_0\big) = O_p(\delta_n^*).
\quad (B.6)
$$
The constraints on $S(\cdot)$ and $\alpha_2$ guarantee that the estimators converge to the uniquely defined parameters.
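The convergence mechanism here is that of a linear error recursion: writing the parameter error as $e$, (B.5) behaves like $e_{\rm curr} \approx M e_{\rm prev}$, which contracts to zero exactly when the spectral radius of $M$ is strictly less than 1. A quick numerical illustration with a hypothetical $2\times 2$ iteration matrix (chosen by us, not derived from the model):

```python
import numpy as np

M = np.array([[0.5, 0.2],
              [0.1, 0.4]])                    # hypothetical iteration matrix
rho = max(abs(np.linalg.eigvals(M)))          # spectral radius, here 0.6
e = np.array([1.0, -1.0])                     # initial estimation error
for _ in range(50):
    e = M @ e                                 # one sweep of the iteration
# with rho < 1 the error shrinks geometrically, on the order of rho**50
```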
Finally, using the $(\widetilde\alpha, \widetilde\beta)$ at convergence, we update $\theta$ by solving the local estimating equation
$$
0 = n^{-1}\sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_1[\eta_{ij}\{\widetilde\alpha, \widetilde\beta, \widetilde\theta_{\rm curr}, (\widetilde a_{0j} + \widetilde a^*_{1j}\widetilde\theta_{\rm curr}^{\rm T} Z^*_{ij,1})\}, Y_i]\,(\widetilde\alpha_2^{\rm T}\xi_i)\,\widetilde a^*_{1j} Z^*_{ij,1}.
$$
We expand the right-hand side of the equation as before,
$$
0 = n^{-1}\sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_1\{\widetilde\alpha_1^{\rm T}\xi_i + (\widetilde a_{0j} + \widetilde a^*_{1j}\widetilde\theta_{\rm curr}^{\rm T} Z^*_{ij,1})\widetilde\alpha_2^{\rm T}\xi_i + \widetilde\beta^{\rm T} Z_i, Y_i\}\,(\widetilde\alpha_2^{\rm T}\xi_i)\,\widetilde a^*_{1j} Z^*_{ij,1}
$$
$$
= n^{-1}\sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_1\{\widetilde\alpha_1^{\rm T}\xi_i + S(\theta_0^{\rm T} Z_{i1})\widetilde\alpha_2^{\rm T}\xi_i + \widetilde\beta^{\rm T} Z_i, Y_i\}\,(\widetilde\alpha_2^{\rm T}\xi_i)\,\widetilde a^*_{1j} Z^*_{ij,1}
$$
$$
+ n^{-1}\sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_2\{\widetilde\alpha_1^{\rm T}\xi_i + S(\theta_0^{\rm T} Z_{i1})\widetilde\alpha_2^{\rm T}\xi_i + \widetilde\beta^{\rm T} Z_i, Y_i\}\,(\widetilde\alpha_2^{\rm T}\xi_i)^2\,\widetilde a^*_{1j} Z^*_{ij,1}
\,(\widetilde a_{0j} - a_{0j} + a_{0j} - a_{0i} + \widetilde a^*_{1j}\widetilde\theta_{\rm curr}^{\rm T} Z^*_{ij,1}) \times \{1 + o_p(1)\}
$$
$$
= n^{-1}\sum_{j=1}^{n} b\,\widetilde a^*_{1j}\Big\{\frac{\partial}{\partial z} + \frac{1}{f_{Z_1}(Z_{j1})}\frac{\partial f_{Z_1}(Z_{j1})}{\partial z}\Big\}\gamma_1(Z_{j1}; \widetilde\alpha, \widetilde\beta, \theta_0, S_0) + O_p(b\delta_n^*)
$$
$$
+ n^{-1}\sum_{j=1}^{n} b\,\widetilde a^*_{1j}(\widetilde a_{0j} - a_{0j})\Big\{\frac{\partial}{\partial z} + \frac{1}{f_{Z_1}(Z_{j1})}\frac{\partial f_{Z_1}(Z_{j1})}{\partial z}\Big\} V_1(Z_{j1}; \widetilde\alpha, \widetilde\beta, \theta_0, S_0)
$$
$$
- n^{-1}\sum_{j=1}^{n} b\,\widetilde a^*_{1j}\{-S_0'(\theta_0^{\rm T} Z_{j1})\}\, V_1(Z_{j1}; \widetilde\alpha, \widetilde\beta, \theta_0, S_0)\,\theta_0
- n^{-1}\sum_{j=1}^{n} (\widetilde a^*_{1j})^2\, V_1(Z_{j1}; \widetilde\alpha, \widetilde\beta, \theta_0, S_0)\,\widetilde\theta_{\rm curr}
+ o_p(b^2\delta_n^*).
$$
By (B.3), we obtain that $\widetilde\theta_{\rm curr} = (\widetilde\theta_{\rm prev}^{\rm T}\theta_0)^{-1}\theta_0 + O_p(b^{-1}\delta_n^*)$. By standardizing $\widetilde\theta_{\rm curr}$ and correcting the sign of the first entry, we have
$$
\widetilde\theta_{\rm curr} - \theta_0 = O_p(b^{-1}\delta_n^*),
\quad (B.7)
$$
which converges to 0 by Condition (C2.1).
B.2
Proof of Theorem 2
Following notation similar to that of Carroll et al. (1997), we denote $U_i = \theta_0^{\rm T} Z_{i1}$ and $\widehat U_i = \widehat\theta^{\rm T} Z_{i1}$. For the refined estimator, the kernel weight is defined as $w_i(u) = K\{(\widehat U_i - u)/h\} / \sum_{\ell=1}^{n} K\{(\widehat U_\ell - u)/h\}$ for $u \in R$. Denote $\delta_n = \{\log(n)/(nh)\}^{1/2}$ and $\Delta(\widehat\Theta) = \|\widehat\Theta - \Theta_0\|$.
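For concreteness, the kernel weights $w_i(u)$ can be sketched as below; a Gaussian kernel is assumed purely for illustration, as the proof only requires a generic kernel $K$:

```python
import numpy as np

def kernel_weights(U_hat, u, h):
    """w_i(u) = K{(U_i - u)/h} / sum_l K{(U_l - u)/h}, Gaussian K assumed."""
    K = np.exp(-0.5 * ((np.asarray(U_hat) - u) / h) ** 2)  # kernel values
    return K / K.sum()                                     # weights sum to one
```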
First, we fix the value of $\Theta$ at the value from the previous iteration, denoted by $\widehat\Theta_{\rm prev}$. For any $z \in \mathcal{D} \subset R^{d_1}$, as in the previous subsection, we let $Z^*_{i0,1} = (Z_{i1} - z)/h$ and let $(\widehat a_{0,{\rm curr}}, \widehat a^*_{1,{\rm curr}})$ be the updated estimators of $a_0$ and $a_1^*$, which solve the local estimating equation (B.8),
$$
0 = \sum_{i=1}^{n} w_i(\widehat\theta_{\rm prev}^{\rm T} z)\, q_1\big[\eta_i\{\widehat\alpha_{\rm prev}, \widehat\beta_{\rm prev}, \widehat\theta_{\rm prev}, (\widehat a_{0,{\rm curr}} + \widehat a^*_{1,{\rm curr}}\widehat\theta_{\rm prev}^{\rm T} Z^*_{i0,1})\}, Y_i\big]
\times (\widehat\alpha_{2,{\rm prev}}^{\rm T}\xi_i)
\begin{pmatrix} 1 \\ \widehat\theta_{\rm prev}^{\rm T} Z^*_{i0,1} \end{pmatrix}
$$
$$
= \sum_{i=1}^{n} w_i(\widehat\theta_{\rm prev}^{\rm T} z)\, q_1(\eta_i, Y_i)(\alpha_{2,0}^{\rm T}\xi_i)
\begin{pmatrix} 1 \\ \widehat\theta_{\rm prev}^{\rm T} Z^*_{i0,1} \end{pmatrix} \times \{1 + o_p(1)\}
$$
$$
+ \sum_{i=1}^{n} w_i(\widehat\theta_{\rm prev}^{\rm T} z)\, q_2(\eta_i, Y_i)(\alpha_{2,0}^{\rm T}\xi_i)^2\,
\{\widehat a_{0,{\rm curr}} + \widehat a^*_{1,{\rm curr}}\widehat\theta_{\rm prev}^{\rm T} Z^*_{i0,1} - S(\theta_0^{\rm T} Z_{i1})\}
\begin{pmatrix} 1 \\ \widehat\theta_{\rm prev}^{\rm T} Z^*_{i0,1} \end{pmatrix}
$$
$$
+ \sum_{i=1}^{n} w_i(\widehat\theta_{\rm prev}^{\rm T} z)\, q_2(\eta_i, Y_i)(\alpha_{2,0}^{\rm T}\xi_i)
\begin{pmatrix} 1 \\ \widehat\theta_{\rm prev}^{\rm T} Z^*_{i0,1} \end{pmatrix}
\begin{pmatrix} \xi_i \\ S(\theta_0^{\rm T} Z_{i1})\xi_i \\ Z_i \end{pmatrix}^{\rm T}
\begin{pmatrix} \widehat\alpha_{1,{\rm prev}} - \alpha_{1,0} \\ \widehat\alpha_{2,{\rm prev}} - \alpha_{2,0} \\ \widehat\beta_{\rm prev} - \beta_0 \end{pmatrix}
\times \{1 + o_p(1)\}.
\quad (B.8)
$$
For the observations with $|\widehat\theta_{\rm prev}^{\rm T} Z_{i1} - u| \le h$, we obtain
$$
\widehat a_{0,{\rm curr}} + \widehat a^*_{1,{\rm curr}}\widehat\theta_{\rm prev}^{\rm T} Z^*_{i0,1} - S(\theta_0^{\rm T} Z_{i1})
= \begin{pmatrix} 1 \\ \widehat\theta_{\rm prev}^{\rm T} Z^*_{i0,1} \end{pmatrix}^{\rm T}
\begin{pmatrix} \widehat a_{0,{\rm curr}} - a_0 \\ \widehat a^*_{1,{\rm curr}} - a_1^* \end{pmatrix}
+ S'(\theta_0^{\rm T} z)\, Z_{i0,1}^{\rm T}(\widehat\theta_{\rm prev} - \theta_0)
- \frac{1}{2} S^{(2)}(\theta_0^{\rm T} z)(\widehat\theta_{\rm prev}^{\rm T} Z_{i0,1})^2 + O(h^3) + o_p(\|\widehat\theta_{\rm prev} - \theta_0\|).
$$
Define
$$
V_1(z; \theta) = E\{\rho_2(\eta_i)(\alpha_{2,0}^{\rm T}\xi_i)^2 \mid \theta^{\rm T} Z_{i1} = \theta^{\rm T} z\},
\qquad
\pi_{i1} = \rho_2(\eta_i)(\alpha_{2,0}^{\rm T}\xi_i)
\begin{pmatrix} \xi_i \\ S(\theta_0^{\rm T} Z_{i1})\xi_i \\ Z_i \end{pmatrix},
\qquad
\pi_{i2} = \rho_2(\eta_i)(\alpha_{2,0}^{\rm T}\xi_i)^2 Z_{i1},
$$
$$
\widehat\pi_1(z; \theta) = E(\pi_{i1} \mid \theta^{\rm T} Z_{i1} = \theta^{\rm T} z),
\qquad
\widehat\pi_2(z; \theta) = E(\pi_{i2} \mid \theta^{\rm T} Z_{i1} = \theta^{\rm T} z),
\quad (B.9)
$$
$$
V_2(z; \theta) = \begin{pmatrix} \widehat\pi_1(z; \theta) \\ S^{(1)}(\theta^{\rm T} z)\{\widehat\pi_2(z; \theta) - V_1(z; \theta)\, z\} \end{pmatrix}.
$$
By solving the local estimating equation (B.8), we obtain the following results, uniformly for all $u = \theta_0^{\rm T} z$ with $z \in \mathcal{D}$:
$$
\widehat a_{0,{\rm curr}} - a_0 = -(V_1^{-1} V_2^{\rm T})(z; \theta_0)\,(\widehat\Theta_{\rm prev} - \Theta_0) + R_{0,n}(z)
+ \frac{\sigma_K^2}{2} S^{(2)}(\theta_0^{\rm T} z)\, h^2 + O_p(h^3) + o_p\{\Delta(\widehat\Theta_{\rm prev})\},
$$
$$
\widehat a^*_{1,{\rm curr}} - a_1^* = O_p\{(h + \delta_n) \times \Delta(\widehat\Theta_{\rm prev})\} + O_p(h^3 + \delta_n),
\quad (B.10)
$$
where, letting $\epsilon_i = q_1(\eta_i, Y_i)$,
$$
R_{0,n}(z) = V_1^{-1}(z; \widehat\theta_{\rm prev}) \sum_{i=1}^{n} w_i(\widehat\theta_{\rm prev}^{\rm T} z)(\alpha_{2,0}^{\rm T}\xi_i)\,\epsilon_i.
\quad (B.11)
$$
Again, the expressions in (B.10) allow us to account for the $\widehat\Theta_{\rm prev}$ embedded within the estimated $a$'s in equation (B.13) below. Next, we fix $\widehat S(\cdot)$ and $\widehat S^{(1)}(\cdot)$ and update $\widehat\Theta$ by solving the estimating equation (B.12),
$$
0 = n^{-1}\sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}\, q_1\big[\eta_i\{\widehat\alpha_{\rm curr}, \widehat\beta_{\rm curr}, \widehat\theta_{\rm curr}, (\widehat a_{0j,{\rm curr}} + \widehat a_{1j,{\rm curr}}\widehat\theta_{\rm curr}^{\rm T} Z_{ij,1})\}, Y_i\big]
\times
\begin{pmatrix} \xi_i \\ (\widehat a_{0j,{\rm curr}} + \widehat a_{1j,{\rm curr}}\widehat\theta_{\rm curr}^{\rm T} Z_{ij,1})\xi_i \\ Z_i \\ (\widehat\alpha_{2,{\rm curr}}^{\rm T}\xi_i)\,\widehat a_{1j,{\rm curr}} Z_{ij,1} \end{pmatrix}.
\quad (B.12)
$$
For observations with $|\widehat\theta_{\rm curr}^{\rm T} Z_{ij,1}| \le h$, we have
$$
\widehat a_{0j,{\rm curr}} + \widehat a_{1j,{\rm curr}}\widehat\theta_{\rm curr}^{\rm T} Z_{ij,1} - S(\theta_0^{\rm T} Z_{i1})
= S^{(1)}(\theta_0^{\rm T} Z_{j1})\, Z_{ij,1}^{\rm T}(\widehat\theta_{\rm curr} - \theta_0)
+ \begin{pmatrix} 1 \\ \widehat\theta_{\rm curr}^{\rm T} Z_{ij,1} \end{pmatrix}^{\rm T}
\begin{pmatrix} \widehat a_{0j,{\rm curr}} - a_{0j} \\ \widehat a_{1j,{\rm curr}} - a_{1j} \end{pmatrix}
- \frac{1}{2} S^{(2)}(\theta_0^{\rm T} Z_{j1})(\widehat\theta_{\rm prev}^{\rm T} Z_{ij,1})^2 + O_p(h^3) + o_p(\|\widehat\theta_{\rm curr} - \theta_0\|).
$$
Define $\nu_{Z_1}(z; \theta) = E(Z_1 \mid \theta^{\rm T} Z_1 = \theta^{\rm T} z)$. The right-hand side of equation (B.12) can be rewritten as
$$
0 = n^{-1}\sum_{i=1}^{n} \epsilon_i
\begin{pmatrix} \xi_i \\ S(\theta_0^{\rm T} Z_{i1})\xi_i \\ Z_i \\ (\alpha_{2,0}^{\rm T}\xi_i)\, S^{(1)}(\theta_0^{\rm T} Z_{i1})\{Z_{i1} - \nu_{Z_1}(Z_{i1}; \theta_0)\} \end{pmatrix}
\times \{1 + o_p(1)\}
$$
$$
+ n^{-1}\sum_{j=1}^{n} \Big[ -V_3(Z_{j1}; \theta_0)(\widehat\Theta_{\rm curr} - \Theta_0) - V_2(Z_{j1}; \theta_0)(\widehat a_{0j,{\rm curr}} - a_{0j})
+ \frac{\sigma_K^2 h^2}{2} S^{(2)}(\theta_0^{\rm T} Z_{j1})\, V_2(Z_{j1}; \theta_0) \Big] \{1 + o_p(1)\}
+ o_p(n^{-1/2}) + o_p\{\Delta(\widehat\Theta_{\rm prev})\},
\quad (B.13)
$$
where $V_3(z; \theta)$ is a symmetric matrix whose $(\ell_1, \ell_2)$th block is denoted by $V_3^{[\ell_1, \ell_2]}$ for $\ell_1, \ell_2 = 1, 2$, and
$$
V_3^{[1,1]}(z; \theta) = E\left\{ \rho_2(\eta_i)
\begin{pmatrix} \xi_i \\ S(\theta_0^{\rm T} Z_{i1})\xi_i \\ Z_i \end{pmatrix}^{\otimes 2}
\,\middle|\, \theta^{\rm T} Z_{i1} = \theta^{\rm T} z \right\},
$$
$$
V_3^{[1,2]}(z; \theta) = S^{(1)}(\theta_0^{\rm T} z)\{E(\pi_{i1} Z_{i1}^{\rm T} \mid \theta^{\rm T} Z_{i1} = \theta^{\rm T} z) - \widehat\pi_1(z; \theta)\, z^{\rm T}\},
\qquad
V_3^{[2,1]}(z; \theta) = \{V_3^{[1,2]}(z; \theta)\}^{\rm T},
$$
$$
V_3^{[2,2]}(z; \theta) = \{S^{(1)}(\theta_0^{\rm T} z)\}^2 \Big[ E\{\rho_2(\eta_i)(\alpha_{2,0}^{\rm T}\xi_i)^2 Z_{i1} Z_{i1}^{\rm T} \mid \theta^{\rm T} Z_{i1} = \theta^{\rm T} z\}
- \widehat\pi_2(z; \theta)\, z^{\rm T} - z\,\widehat\pi_2^{\rm T}(z; \theta) + V_1(z; \theta)\, z z^{\rm T} \Big].
$$
By plugging (B.10) into (B.13), we have
$$
\widehat\Theta_{\rm curr} - \Theta_0 = A^- N_n + A^- C(\widehat\Theta_{\rm prev} - \Theta_0) + o_p(n^{-1/2}) + o_p(\|\widehat\Theta_{\rm prev} - \Theta_0\|),
\quad (B.14)
$$
where
$$
N_n = n^{-1}\sum_{i=1}^{n} \epsilon_i
\begin{pmatrix}
\xi_i - V_1^{-1}(Z_{i1}; \theta_0)(\alpha_{2,0}^{\rm T}\xi_i)\, E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi)\xi \mid \theta_0^{\rm T} Z_1 = \theta_0^{\rm T} Z_{i1}\} \\
S(\theta_0^{\rm T} Z_{i1})\xi_i - V_1^{-1}(Z_{i1}; \theta_0)(\alpha_{2,0}^{\rm T}\xi_i)\, E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi) S(\theta_0^{\rm T} Z_1)\xi \mid \theta_0^{\rm T} Z_1 = \theta_0^{\rm T} Z_{i1}\} \\
Z_i - V_1^{-1}(Z_{i1}; \theta_0)(\alpha_{2,0}^{\rm T}\xi_i)\, E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi) Z \mid \theta_0^{\rm T} Z_1 = \theta_0^{\rm T} Z_{i1}\} \\
(\alpha_{2,0}^{\rm T}\xi_i)\, S^{(1)}(\theta_0^{\rm T} Z_{i1})\{Z_{i1} - V_1^{-1}(Z_{i1}; \theta_0)\,\widehat\pi_2(Z_{i1}; \theta_0)\}
\end{pmatrix},
$$
$$
C = E\left[ V_1^{-1}
\begin{pmatrix}
\widehat\pi_1 \widehat\pi_1^{\rm T} & S^{(1)}(\theta_0^{\rm T} Z_1)\,\widehat\pi_1(\widehat\pi_2 - V_1 Z_1)^{\rm T} \\
S^{(1)}(\theta_0^{\rm T} Z_1)\,(\widehat\pi_2 - V_1 Z_1)\widehat\pi_1^{\rm T} & \{S^{(1)}(\theta_0^{\rm T} Z_1)\}^2 (\widehat\pi_2 - V_1 Z_1)(\widehat\pi_2 - V_1 Z_1)^{\rm T}
\end{pmatrix} \right],
\qquad
A = E\{V_3(Z_1; \theta_0)\}.
\quad (B.15)
$$
Define
$$
g_1 = \rho_2^{1/2}(\eta)\,\big[\xi^{\rm T}, \{S(\theta_0^{\rm T} Z_1)\xi\}^{\rm T}, Z^{\rm T}\big]^{\rm T},
\qquad
g_2 = S^{(1)}(\theta_0^{\rm T} Z_1)\,\rho_2^{1/2}(\eta)(\alpha_{2,0}^{\rm T}\xi)\, Z_1,
\qquad
g = (g_1^{\rm T}, g_2^{\rm T})^{\rm T},
$$
$$
\varrho = \{\widehat\pi_1^{\rm T}(Z_1; \theta_0),\; S^{(1)}(\theta_0^{\rm T} Z_1)\,\widehat\pi_2^{\rm T}(Z_1; \theta_0)\}^{\rm T},
$$
and let $D = A - C = E(g g^{\rm T}) - E(V_1^{-1}\varrho\varrho^{\rm T})$. It can be shown that both $A$ and $C$ are positive semi-definite matrices with rank $2p + 2d - 1$. As in the previous subsection, the asymptotic convergence property is achieved provided that the eigenvalues of $A^- C$ are strictly less than 1. This can be established by the Cauchy-Schwarz inequality, which shows that $D$ is positive semi-definite with the same rank as $A$ and $C$, so that the eigenvalues of $A^- C$ are strictly less than 1. Consequently, at the end of the iterations, $\widehat\Theta_{\rm curr} - \Theta_0 = A^- N_n + o_p(n^{-1/2})$. By the Central Limit Theorem, $N_n$ is asymptotically normal. Defining
$$
B = n \times {\rm cov}(N_n),
\quad (B.16)
$$
the asymptotic normality of $\widehat\Theta$ follows immediately. Finally, the asymptotic normal distribution of $\widehat S(u)$ follows from (B.10).
B.3
Proof of Theorem 3
Proof of Lemma 1: The first result in Lemma 1, that $\|\widehat\psi_k - \psi_k\| = O_p\{h_\varpi^2 + (n h_\varpi)^{-1/2}\}$, was established by Hall et al. (2006) for the case of fixed $m$. With the smoothing parameter $h_\varpi$ in the range defined in (C4.4), we have $\|\widehat\psi_k - \psi_k\| = O_p(n^{-1/3})$. It is easily shown that
$$
\widehat\xi_{ik} - \xi_{ik} = \zeta_{1,ik} + \zeta_{2,ik} + \zeta_{3,ik},
\quad (B.17)
$$
where $\zeta_{1,ik} = \{(b-a)/m\}\sum_{j=1}^{m} X_i(t_{ij})\psi_k(t_{ij}) - \int_a^b X_i(t)\psi_k(t)\,dt$, $\zeta_{2,ik} = \{(b-a)/m\}\sum_{j=1}^{m} U_{ij}\psi_k(t_{ij})$, and $\zeta_{3,ik} = \{(b-a)/m\}\sum_{j=1}^{m} W_{ij}\{\widehat\psi_k(t_{ij}) - \psi_k(t_{ij})\}$. It can then be established that, with $X_i(t)$ and $\psi_k(t)$ continuously differentiable, $\zeta_{1,ik}$, representing the numerical integration error, is of order $O_p(m^{-1})$, and that $\zeta_{2,ik} = O_p(m^{-1/2})$. Lemma 1 now follows because $\zeta_{3,ik} = O_p(n^{-1/3})$ by the Cauchy-Schwarz inequality, since
$$
|\zeta_{3,ik}| \le \Big\{\frac{b-a}{m}\sum_{j=1}^{m} W_{ij}^2\Big\}^{1/2}
\Big[\frac{b-a}{m}\sum_{j=1}^{m} \{\widehat\psi_k(t_{ij}) - \psi_k(t_{ij})\}^2\Big]^{1/2}
= O_p(\|\widehat\psi_k - \psi_k\|) = O_p(n^{-1/3}).
$$
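The $O_p(m^{-1})$ rate of the numerical-integration error $\zeta_{1,ik}$ can be checked numerically: for a smooth toy integrand standing in for $X_i(t)\psi_k(t)$ (our choice, purely for illustration), the $m$-point Riemann sum error roughly halves when $m$ doubles.

```python
import numpy as np

a, b = 0.0, 1.0
f = lambda t: np.sin(2 * np.pi * t) + t**2   # smooth toy integrand
exact = 1.0 / 3.0                            # its integral over [0, 1]
errs = []
for m in (100, 200):
    t = a + (b - a) * np.arange(m) / m       # m equally spaced design points
    errs.append(abs((b - a) / m * f(t).sum() - exact))
# errs[0] / errs[1] is close to 2, consistent with an O(1/m) error
```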
Proof of Theorem 3: The proof of Theorem 3 essentially follows the same lines as that of Theorem 1, but with extra effort to keep track of the additional terms due to $\widehat\xi_i - \xi_i$, as given by equation (B.17). It can be shown that, for equations (B.2), (B.3) and (B.6), the effect of $\xi_i$ being estimated is an additional term of order $O_p(m^{-1} + n^{-1/3})$ on the right-hand side of the equations. For equation (B.7), this results in an additional term of order $O_p(b^{-1} m^{-1} + b^{-1} n^{-1/3})$. Under the bandwidth assumption (C4.4), the estimator $\widetilde\Theta$ remains consistent.
B.4
Proof of Theorem 4
The proof of Theorem 4 follows the same lines as that of Theorem 2, except that the term $\epsilon_i$ in equations (B.11)-(B.13) should be replaced by
$$
\widetilde\epsilon_i = q_1\{\alpha_{1,0}^{\rm T}\widehat\xi_i + S_0(\theta_0^{\rm T} Z_{i1})\alpha_{2,0}^{\rm T}\widehat\xi_i + \beta_0^{\rm T} Z_i, Y_i\}
= \epsilon_i + q_2(\eta_i, Y_i)\{\alpha_{1,0} + S_0(\theta_0^{\rm T} Z_{i1})\alpha_{2,0}\}^{\rm T}(\widehat\xi_i - \xi_i).
$$
Let $A$ and $C$ be as defined in the proof of Theorem 2. Equation (B.14) becomes
$$
\widehat\Theta_{\rm curr} - \Theta_0 = A^- \widetilde N_n + A^- C(\widehat\Theta_{\rm prev} - \Theta_0) + o_p(n^{-1/2}) + o_p(\|\widehat\Theta_{\rm prev} - \Theta_0\|),
$$
where $\widetilde N_n = N_n + N_{1,n}$, $N_{1,n} = n^{-1}\sum_{i=1}^{n} E_i(\widehat\xi_i - \xi_i)$, and
$$
E_i = \begin{pmatrix}
\xi_i - V_1^{-1}(Z_{i1}; \theta_0)(\alpha_{2,0}^{\rm T}\xi_i)\, E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi)\xi \mid \theta_0^{\rm T} Z_1 = \theta_0^{\rm T} Z_{i1}\} \\
S(\theta_0^{\rm T} Z_{i1})\xi_i - V_1^{-1}(Z_{i1}; \theta_0)(\alpha_{2,0}^{\rm T}\xi_i)\, E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi) S(\theta_0^{\rm T} Z_1)\xi \mid \theta_0^{\rm T} Z_1 = \theta_0^{\rm T} Z_{i1}\} \\
Z_i - V_1^{-1}(Z_{i1}; \theta_0)(\alpha_{2,0}^{\rm T}\xi_i)\, E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi) Z \mid \theta_0^{\rm T} Z_1 = \theta_0^{\rm T} Z_{i1}\} \\
(\alpha_{2,0}^{\rm T}\xi_i)\, S^{(1)}(\theta_0^{\rm T} Z_{i1})\{Z_{i1} - V_1^{-1}(Z_{i1}; \theta_0)\,\widehat\pi_2(Z_{i1}; \theta_0)\}
\end{pmatrix}
\times q_2(\eta_i, Y_i)\{\alpha_{1,0} + S_0(\theta_0^{\rm T} Z_{i1})\alpha_{2,0}\}^{\rm T}.
$$
Define $\Xi(t) = E\{X_i(t) E_i\}$, which is a $(2p + d + d_1) \times p$ matrix of functions. Denote $\psi(t) = (\psi_1, \cdots, \psi_p)^{\rm T}(t)$; then $N_{1,n} = \langle\Xi, \widehat\psi - \psi\rangle_{L^2} + o_p(n^{-1/2})$. By Lemma 2, the $j$th element of $N_{1,n}$ is
$$
N_{1,n}^{[j]} = \sum_{\ell=1}^{p} \langle\Xi^{[j,\ell]}(t), \widehat\psi_\ell(t) - \psi_\ell(t)\rangle_{L^2} + o_p(n^{-1/2})
= n^{-1/2} \sum_{\ell=1}^{p} \sum_{k \ne \ell} \frac{\langle\Xi^{[j,\ell]}, \psi_k\rangle_{L^2}}{\omega_\ell - \omega_k}
\int\!\!\int Z(s,t)\psi_\ell(s)\psi_k(t)\,ds\,dt + o_p(n^{-1/2}).
$$
Since $Z$ is a Gaussian random field, $n^{1/2} N_{1,n}$ converges to a Gaussian random vector. It can further be verified that $N_n$ and $N_{1,n}$ are independent. Define
$$
B_1 = \lim_{n\to\infty} n \times {\rm cov}(N_{1,n}).
\quad (B.18)
$$
The asymptotic normality of $\widehat\Theta$ now follows from arguments similar to those for Theorem 2.