Generalized Functional Linear Models with Semiparametric Single-Index Interactions: Web Appendix

Yehua Li, Department of Statistics, University of Georgia, Athens, GA 30602, yehuali@uga.edu
Naisyin Wang, Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107, nwangaa@umich.edu
Raymond J. Carroll, Department of Statistics, Texas A&M University, TAMU 3143, College Station, TX 77843-3143, carroll@stat.tamu.edu

A. Algorithm Implementation With One-step Updates

To speed up the estimation algorithm, we use a one-step-update version of the iterative algorithm in Stages 1 and 2, as follows. The key is to replace the full optimization steps of minimizing (5) within the iterative updating of $S(\cdot)$ and $\Theta$ by a one-step Newton-Raphson update. This eliminates the computational burden of full iteration within the inner loop when the estimates of $S(\cdot)$ and $\Theta$ are still far from the target and, as a consequence, speeds up the iterative process.

Recall that $q_1(x,y)=\{y-g^{-1}(x)\}\rho_1(x)$ and $q_2(x,y)=\{y-g^{-1}(x)\}\rho_1'(x)-\rho_2(x)$, where $\rho_\ell(x)=\{d g^{-1}(x)/dx\}^{\ell}/V\{g^{-1}(x)\}$, $\ell=1,2$, and $Z_{ij,1}=Z_{i1}-Z_{j1}$.

When we fix the parametric component $\Theta$ and minimize (5) with respect to $(a_{0j},a_{1j})$, we use a Newton-Raphson algorithm to perform the following one-step update:
\[
\begin{pmatrix}\widehat a_{0j}\\ \widehat a_{1j}\end{pmatrix}
=\begin{pmatrix}\check a_{0j}\\ \check a_{1j}\end{pmatrix}
-\Bigl\{\sum_{i=1}^n w_{ij}\,q_2(\check\eta_{ij},Y_i)(\check\alpha_2^{\rm T}\xi_i)^2
\begin{pmatrix}1\\ \check\theta^{\rm T}Z_{ij,1}\end{pmatrix}
\begin{pmatrix}1\\ \check\theta^{\rm T}Z_{ij,1}\end{pmatrix}^{\rm T}\Bigr\}^{-1}
\times\Bigl\{\sum_{i=1}^n w_{ij}\,q_1(\check\eta_{ij},Y_i)(\check\alpha_2^{\rm T}\xi_i)
\begin{pmatrix}1\\ \check\theta^{\rm T}Z_{ij,1}\end{pmatrix}\Bigr\},
\qquad{\rm (A.1)}
\]
where $\check\eta_{ij}=\{\check\alpha_1+(\check a_{0j}+\check a_{1j}\check\theta^{\rm T}Z_{ij,1})\check\alpha_2\}^{\rm T}\xi_i+\check\beta^{\rm T}Z_i$; all estimates on the right-hand side of (A.1), and later of (A.2) and (A.3), denote their values before the update. To enforce the constraint that $n^{-1}\sum_{i=1}^n S(\theta^{\rm T}Z_{i1})=0$, we set $\widehat a_{0j}=\widehat a_{0j}-n^{-1}\sum_{\ell=1}^n \widehat a_{0\ell}$.
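The update (A.1) replaces an inner optimization by a single Newton-Raphson step on the kernel-weighted local likelihood. The following is a minimal numerical sketch of such a step; all function and argument names are illustrative (the score functions `q1`, `q2`, the weights, and the index values are supplied by the caller), not the paper's code.

```python
import numpy as np

def one_step_update(a0j, a1j, q1, q2, eta, y, w, s, u):
    """One Newton-Raphson step for the local coefficients (a0j, a1j), as in (A.1).

    Illustrative arguments (none are from the paper's implementation):
      q1, q2 : vectorized score functions q_l(x, y)
      eta    : current linear predictors eta_ij, shape (n,)
      y      : responses Y_i, shape (n,)
      w      : kernel weights w_ij, shape (n,)
      s      : values alpha2^T xi_i, shape (n,)
      u      : single-index values theta^T Z_{ij,1}, shape (n,)
    """
    X = np.column_stack([np.ones_like(u), u])         # local design (1, theta^T Z)
    # Hessian-type term: sum_i w q2(eta, y) s^2 X X^T
    H = (X * (w * q2(eta, y) * s**2)[:, None]).T @ X
    # score term: sum_i w q1(eta, y) s X
    g = X.T @ (w * q1(eta, y) * s)
    a = np.array([a0j, a1j]) - np.linalg.solve(H, g)  # one Newton step
    return a[0], a[1]
```

For a Gaussian response with identity link ($q_1(x,y)=y-x$, $q_2\equiv-1$), a single step from any starting value lands exactly at the weighted least-squares solution, which is one way to sanity-check the update.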
Similarly, we update the parametric components other than $\theta$ by
\[
\begin{pmatrix}\widehat\alpha_1\\ \widehat\alpha_2\\ \widehat\beta\end{pmatrix}
=\begin{pmatrix}\check\alpha_1\\ \check\alpha_2\\ \check\beta\end{pmatrix}
-\Bigl\{\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_2(\check\eta_{ij},Y_i)
\begin{pmatrix}\xi_i\\ \widehat a_{0j}\xi_i\\ Z_i\end{pmatrix}^{\otimes2}\Bigr\}^{-1}
\times\Bigl\{\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_1(\check\eta_{ij},Y_i)
\begin{pmatrix}\xi_i\\ \widehat a_{0j}\xi_i\\ Z_i\end{pmatrix}\Bigr\}.
\qquad{\rm (A.2)}
\]
To enforce the constraint, we adjust the value of $\widehat\alpha_2$ by letting $\widehat\alpha_2={\rm sign}(\widehat\alpha_{21})\times\widehat\alpha_2/\|\widehat\alpha_2\|$. Finally, we update $\theta$:
\[
\widehat\theta=\check\theta-\Bigl\{\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_2(\check\eta_{ij},Y_i)(\widehat a_{1j}\widehat\alpha_2^{\rm T}\xi_i)^2 Z_{ij,1}Z_{ij,1}^{\rm T}\Bigr\}^{-1}
\Bigl\{\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_1(\check\eta_{ij},Y_i)(\widehat a_{1j}\widehat\alpha_2^{\rm T}\xi_i)Z_{ij,1}\Bigr\}.
\qquad{\rm (A.3)}
\]
To adjust for the constraint on $\theta$, we let $\widehat\theta={\rm sign}(\widehat\theta_1)\times\widehat\theta/\|\widehat\theta\|$.

B. Technical Proofs

We sketch the proofs of our main theoretical results. Throughout, $\eta\{\Theta,S(\cdot)\}$ denotes the expression $\alpha_1^{\rm T}\xi+S(\theta^{\rm T}Z_1)\alpha_2^{\rm T}\xi+\beta^{\rm T}Z$, and the subindices "$i$" and "$ij$" of $\eta$ indicate the replacement of each random variable by the corresponding observations. Provided that the clarity of the presentation is preserved, we also leave out arguments of functions to simplify certain equations.

B.1 Proof of Theorem 1

For the iterative estimation procedure proposed in Section 3.1, we denote the current value of a parameter with the subscript "curr", and the previous value with the subscript "prev". For the first step, we fix $\widetilde\Theta_{\rm prev}$ and update $S(z)$ by minimizing (4). We let $a_0=S_0(\theta_0^{\rm T}z)$ and $a_1^*=bS_0'(\theta_0^{\rm T}z)$, write $a_1^*=ba_1$, denote $Z^*_{i0,1}=(Z_{i1}-z)/b$, and define, for any $z\in R^{d_1}$, the multivariate kernel weight as $w_i(z)=[\sum_{\ell=1}^n H\{(Z_{\ell1}-z)/b\}]^{-1}H\{(Z_{i1}-z)/b\}$, where $b$ is the bandwidth. Recall that $Z_{ij,1}=Z_{i1}-Z_{j1}$. The updated estimate $(\widetilde a_{0,{\rm curr}},\widetilde a^*_{1,{\rm curr}})^{\rm T}$ satisfies the local estimating equation
\[
0=\sum_{i=1}^n w_i(z)\,q_1\bigl[\eta_i\{\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\widetilde\theta_{\rm prev},(\widetilde a_{0,{\rm curr}}+\widetilde a^*_{1,{\rm curr}}\widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1})\},Y_i\bigr](\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)\begin{pmatrix}1\\ \widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1}\end{pmatrix}.
\]
By a standard Taylor expansion, we have
\[
0=\sum_{i=1}^n w_i(z)\,q_1\bigl[\eta_i\{\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\widetilde\theta_{\rm prev},(a_0+a_1^*\widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1})\},Y_i\bigr](\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)\begin{pmatrix}1\\ \widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1}\end{pmatrix}
\]
\[
\quad+\Bigl\{\sum_{i=1}^n w_i(z)\,q_2\bigl[\eta_i\{\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\widetilde\theta_{\rm prev},(a_0+a_1^*\widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1})\},Y_i\bigr](\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)^2\begin{pmatrix}1\\ \widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1}\end{pmatrix}^{\otimes2}\Bigr\}
\begin{pmatrix}\widetilde a_{0,{\rm curr}}-a_0\\ \widetilde a^*_{1,{\rm curr}}-a_1^*\end{pmatrix}\times\{1+o_p(1)\}.
\qquad{\rm (B.1)}
\]
Denote $\delta_n^*=b^2+\{\log(n)/(nb^{d_1})\}^{1/2}$ and $\Delta(\widetilde\Theta_{\rm prev})=\|(\widetilde\alpha_{\rm prev}^{\rm T},\widetilde\beta_{\rm prev}^{\rm T})^{\rm T}-(\alpha_0^{\rm T},\beta_0^{\rm T})^{\rm T}\|$. Define
\[
\gamma_1(z;\Theta,S)=E[q_1\{\eta_i(\Theta,S),Y_i\}(\alpha_2^{\rm T}\xi_i)\mid Z_{i1}=z],\qquad
V_1(z;\Theta,S)=-E[q_2\{\eta_i(\Theta,S),Y_i\}(\alpha_2^{\rm T}\xi_i)^2\mid Z_{i1}=z].
\]
By (B.1) and some standard derivations, we obtain that, uniformly for $z\in\mathcal D$,
\[
\sum_{i=1}^n w_i(z)\,q_1\bigl[\eta_i\{\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\widetilde\theta_{\rm prev},(a_0+a_1^*\widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1})\},Y_i\bigr](\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)
\]
\[
=\sum_{i=1}^n w_i(z)\,q_1\bigl[\widetilde\alpha_{1,{\rm prev}}^{\rm T}\xi_i+(a_0+a_1^*\widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1})\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i+\widetilde\beta_{\rm prev}^{\rm T}Z_i,Y_i\bigr](\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)
=\gamma_1(z;\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\theta_0,S_0)+O_p(\delta_n^*),
\]
and
\[
\sum_{i=1}^n w_i(z)\,q_1\{\widetilde\alpha_{1,{\rm prev}}^{\rm T}\xi_i+(a_0+a_1^*\widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1})\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i+\widetilde\beta_{\rm prev}^{\rm T}Z_i,Y_i\}(\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)\,\widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1}
\]
\[
=\widetilde\theta_{\rm prev}^{\rm T}\sum_{i=1}^n w_i(z)\,q_1\{\widetilde\alpha_{1,{\rm prev}}^{\rm T}\xi_i+S(\theta_0^{\rm T}Z_{i1})\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i+\widetilde\beta_{\rm prev}^{\rm T}Z_i,Y_i\}(\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)Z^*_{i0,1}
\]
\[
\quad+\Bigl[\sum_{i=1}^n w_i(z)\{S_0(\theta_0^{\rm T}Z_{i1})-S_0(\theta_0^{\rm T}z)-bS_0'(\theta_0^{\rm T}z)\widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1}\}\,
q_2\{\widetilde\alpha_{1,{\rm prev}}^{\rm T}\xi_i+S(\theta_0^{\rm T}z)\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i+\widetilde\beta_{\rm prev}^{\rm T}Z_i,Y_i\}(\widetilde\alpha_{2,{\rm prev}}^{\rm T}\xi_i)^2(\widetilde\theta_{\rm prev}^{\rm T}Z^*_{i0,1})\Bigr]\times\{1+o_p(1)\}
\]
\[
=b\,\widetilde\theta_{\rm prev}^{\rm T}\Bigl\{\frac{\partial}{\partial z}\gamma_1+\frac{1}{f_{Z_1}(z)}\frac{\partial f_{Z_1}(z)}{\partial z}\gamma_1\Bigr\}(z;\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\theta_0,S_0)
-b(1-\widetilde\theta_{\rm prev}^{\rm T}\theta_0)S_0'(\theta_0^{\rm T}z)V_1(z;\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\theta_0,S_0)+O_p(\delta_n^*)
\]
\[
=-b(1-\widetilde\theta_{\rm prev}^{\rm T}\theta_0)S_0'(\theta_0^{\rm T}z)V_1(z;\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\theta_0,S_0)+O_p\{b\Delta(\widetilde\Theta_{\rm prev})\}+O_p(\delta_n^*).
\]
By similar calculations, we can derive from (B.1) that, uniformly for $z\in\mathcal D$,
\[
\begin{pmatrix}\widetilde a_{0,{\rm curr}}-a_0\\ \widetilde a^*_{1,{\rm curr}}-a_1^*\end{pmatrix}
=\begin{pmatrix}V_1(z;\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\theta_0,S_0)&V_{12}\\ V_{12}&V_1(z;\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\theta_0,S_0)\end{pmatrix}^{-1}
\begin{pmatrix}\gamma_1(z;\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\theta_0,S_0)\\ -b(1-\widetilde\theta_{\rm prev}^{\rm T}\theta_0)S_0'(\theta_0^{\rm T}z)V_1(z;\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\theta_0,S_0)+O_p\{b\Delta(\widetilde\Theta_{\rm prev})\}\end{pmatrix}
+O_p(\delta_n^*),
\]
where
\[
V_{12}=b\,\widetilde\theta_{\rm prev}^{\rm T}\Bigl\{\frac{\partial}{\partial z}V_2+\frac{1}{f_{Z_1}(z)}\frac{\partial f_{Z_1}(z)}{\partial z}V_2\Bigr\}(z;\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\theta_0,S_0)=O_p(b)
\]
is a negligible term. Define $V_2(z;\Theta,S)$ as
\[
V_2(z;\Theta,S)=-E\Bigl[q_2\{\eta_i(\Theta,S),Y_i\}(\alpha_2^{\rm T}\xi_i)\begin{pmatrix}\xi_i\\ S(\theta^{\rm T}Z_{i1})\xi_i\\ Z_i\end{pmatrix}\Bigm| Z_{i1}=z\Bigr].
\]
By the fact that $\gamma_1(z;\Theta_0,S_0)=0$ for all $z$, a Taylor expansion of $\gamma_1$ in $\alpha$ and $\beta$, and further derivations, we obtain
\[
\widetilde a_{0,{\rm curr}}-a_0=-V_1^{-1}(z;\widetilde\alpha_{\rm prev},\widetilde\beta_{\rm prev},\theta_0,S_0)\,V_2^{\rm T}(z;\Theta_0,S_0)\begin{pmatrix}\widetilde\alpha_{1,{\rm prev}}-\alpha_{1,0}\\ \widetilde\alpha_{2,{\rm prev}}-\alpha_{2,0}\\ \widetilde\beta_{\rm prev}-\beta_0\end{pmatrix}\times\{1+o_p(1)\}+O_p(\delta_n^*),
\qquad{\rm (B.2)}
\]
and
\[
\widetilde a^*_{1,{\rm curr}}=bS_0'(\theta_0^{\rm T}z)\,\widetilde\theta_{\rm prev}^{\rm T}\theta_0+O_p\{b\Delta(\widetilde\Theta_{\rm prev})\}+O_p(\delta_n^*).
\qquad{\rm (B.3)}
\]
Next, holding the link function and $\theta$ fixed at their current values, we update $\alpha$ and $\beta$. At convergence, $\widetilde\alpha_{\rm curr}$ and $\widetilde\beta_{\rm curr}$ satisfy the estimating equation (B.4),
\[
0=\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_1\bigl[\eta_i\{\widetilde\alpha_{\rm curr},\widetilde\beta_{\rm curr},\widetilde\theta_{\rm prev},(\widetilde a_{0j,{\rm curr}}+\widetilde a^*_{1j,{\rm curr}}\widetilde\theta_{\rm prev}^{\rm T}Z^*_{ij,1})\},Y_i\bigr]\begin{pmatrix}\xi_i\\ (\widetilde a_{0j}+\widetilde a^*_{1j}\widetilde\theta_{\rm prev}^{\rm T}Z^*_{ij,1})\xi_i\\ Z_i\end{pmatrix}
\]
\[
=\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_1\bigl[\eta\{\alpha_0,\beta_0,\widetilde\theta_{\rm prev},(\widetilde a_{0j,{\rm curr}}+\widetilde a^*_{1j,{\rm curr}}\widetilde\theta_{\rm prev}^{\rm T}Z^*_{ij,1})\},Y_i\bigr]\begin{pmatrix}\xi_i\\ \widetilde a_{0j,{\rm curr}}\xi_i\\ Z_i\end{pmatrix}
\]
\[
\quad+\Bigl\{\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_2\bigl[\eta_{ij}\{\alpha_0,\beta_0,\widetilde\theta_{\rm prev},(\widetilde a_{0j,{\rm curr}}+\widetilde a^*_{1j,{\rm curr}}\widetilde\theta_{\rm prev}^{\rm T}Z^*_{ij,1})\},Y_i\bigr]\begin{pmatrix}\xi_i\\ \widetilde a_{0j,{\rm curr}}\xi_i\\ Z_i\end{pmatrix}^{\otimes2}\Bigr\}
\begin{pmatrix}\widetilde\alpha_{1,{\rm curr}}-\alpha_{1,0}\\ \widetilde\alpha_{2,{\rm curr}}-\alpha_{2,0}\\ \widetilde\beta_{\rm curr}-\beta_0\end{pmatrix}\times\{1+o_p(1)\}+O_p(b^2),
\qquad{\rm (B.4)}
\]
where $d^{\otimes2}=dd^{\rm T}$. Define
\[
\gamma_2(z;\Theta,S)=E\Bigl[q_1\{\eta_i(\Theta,S),Y_i\}\begin{pmatrix}\xi_i\\ S(\theta^{\rm T}Z_{i1})\xi_i\\ Z_i\end{pmatrix}\Bigm| Z_{i1}=z\Bigr],\qquad
V_3(z;\Theta,S)=-E\Bigl[q_2\{\eta_i(\Theta,S),Y_i\}\begin{pmatrix}\xi_i\\ S(\theta^{\rm T}Z_{i1})\xi_i\\ Z_i\end{pmatrix}^{\otimes2}\Bigm| Z_{i1}=z\Bigr].
\]
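The multivariate kernel weights $w_i(z)$ used in this proof can be computed directly. The sketch below assumes a product Gaussian kernel for $H$; this choice is ours for illustration only, since the theory requires only a suitable multivariate kernel.

```python
import numpy as np

def kernel_weights(Z1, z, b):
    """Normalized local weights w_i(z) = H{(Z_i1 - z)/b} / sum_l H{(Z_l1 - z)/b}.

    A sketch assuming H is a product Gaussian kernel (an illustrative choice).
    Z1 : (n, d1) array of covariates Z_{i1};  z : (d1,) evaluation point;
    b  : bandwidth.  Returns weights that sum to 1 over i.
    """
    U = (Z1 - z) / b                          # scaled differences (Z_i1 - z)/b
    H = np.exp(-0.5 * np.sum(U**2, axis=1))   # product Gaussian kernel values
    return H / H.sum()                        # normalize so sum_i w_i(z) = 1
```

By construction the weights are positive, sum to one, and concentrate on observations with $Z_{i1}$ close to $z$ relative to the bandwidth $b$.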
By (B.2), (B.4), the fact that $\gamma_2(z;\Theta_0,S_0)=0$ for all $z$, and the expansion $\gamma_2(Z_{j1};\Theta_0,\widetilde S_{\rm curr})=-V_2(Z_{j1};\Theta_0,S_0)(\widetilde a_{0j,{\rm curr}}-a_{0j})\times\{1+o_p(1)\}$, we obtain
\[
\begin{pmatrix}\widetilde\alpha_{1,{\rm curr}}-\alpha_{1,0}\\ \widetilde\alpha_{2,{\rm curr}}-\alpha_{2,0}\\ \widetilde\beta_{\rm curr}-\beta_0\end{pmatrix}
=\Bigl\{n^{-1}\sum_j V_3(Z_{j1};\Theta_0,S_0)\Bigr\}^{-1}\Bigl\{n^{-1}\sum_j(V_2V_2^{\rm T}/V_1)(Z_{j1};\Theta_0,S_0)\Bigr\}
\begin{pmatrix}\widetilde\alpha_{1,{\rm prev}}-\alpha_{1,0}\\ \widetilde\alpha_{2,{\rm prev}}-\alpha_{2,0}\\ \widetilde\beta_{\rm prev}-\beta_0\end{pmatrix}
\times\{1+o_p(1)\}+O_p(\delta_n^*).
\qquad{\rm (B.5)}
\]
One source of additional variation that could easily be missed is the $\widetilde\Theta_{\rm prev}$ embedded within the expressions of $\widetilde a_{\rm curr}$, as given by (B.2) and (B.3). This is carefully taken into account in our derivations. Asymptotically, the iterations converge if the distances between the current and previous estimates of the $\alpha$'s and $\beta$'s go to zero as the iteration number goes to $\infty$. By (B.5), for large $n$, this occurs when $E\{V_3(Z_1;\Theta_0,S_0)\}^{-1}E\{(V_2V_2^{\rm T}/V_1)(Z_1;\Theta_0,S_0)\}$ has eigenvalues strictly less than 1. By the Cauchy-Schwarz inequality, we can show that, for all $z\in\mathcal D$, $(V_2V_2^{\rm T})(z;\Theta,S)\le(V_1\times V_3)(z;\Theta,S)$, where equality holds only if the order of the products of nonlinear functions and the conditional expectations is exchangeable, which does not hold in our model. Consequently, this establishes the asymptotic convergence property of the iterative procedure. As a result, in the limit,
\[
\bigl(\widetilde\alpha_{1,{\rm curr}}-\alpha_{1,0},\;\widetilde\alpha_{2,{\rm curr}}-\alpha_{2,0},\;\widetilde\beta_{\rm curr}-\beta_0\bigr)=O_p(\delta_n^*).
\qquad{\rm (B.6)}
\]
The constraints on $S(\cdot)$ and $\alpha_2$ guarantee that the estimators converge to the uniquely defined parameters.

Finally, using $(\widetilde\alpha,\widetilde\beta)$ at convergence, we update $\theta$ by solving the local estimating equation
\[
0=n^{-1}\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_1\bigl[\eta_{ij}\{\widetilde\alpha,\widetilde\beta,\widetilde\theta_{\rm curr},(\widetilde a_{0j}+\widetilde a^*_{1j}\widetilde\theta_{\rm curr}^{\rm T}Z^*_{ij,1})\},Y_i\bigr](\widetilde\alpha_2^{\rm T}\xi_i)\,\widetilde a^*_{1j}Z^*_{ij,1}.
\]
We expand the right-hand side of the equation as before,
\[
0=n^{-1}\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_1\{\widetilde\alpha_1^{\rm T}\xi_i+(\widetilde a_{0j}+\widetilde a^*_{1j}\widetilde\theta_{\rm curr}^{\rm T}Z^*_{ij,1})\widetilde\alpha_2^{\rm T}\xi_i+\widetilde\beta^{\rm T}Z_i,Y_i\}(\widetilde\alpha_2^{\rm T}\xi_i)\,\widetilde a^*_{1j}Z^*_{ij,1}
\]
\[
=n^{-1}\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_1\{\widetilde\alpha_1^{\rm T}\xi_i+S(\theta_0^{\rm T}Z_{i1})\widetilde\alpha_2^{\rm T}\xi_i+\widetilde\beta^{\rm T}Z_i,Y_i\}(\widetilde\alpha_2^{\rm T}\xi_i)\,\widetilde a^*_{1j}Z^*_{ij,1}
\]
\[
\quad+n^{-1}\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_2\{\widetilde\alpha_1^{\rm T}\xi_i+S(\theta_0^{\rm T}Z_{i1})\widetilde\alpha_2^{\rm T}\xi_i+\widetilde\beta^{\rm T}Z_i,Y_i\}(\widetilde\alpha_2^{\rm T}\xi_i)^2\,\widetilde a^*_{1j}Z^*_{ij,1}
\bigl(\widetilde a_{0j}-a_{0j}+a_{0j}-a_{0i}+\widetilde a^*_{1j}\widetilde\theta_{\rm curr}^{\rm T}Z^*_{ij,1}\bigr)\times\{1+o_p(1)\}
\]
\[
=n^{-1}\sum_{j=1}^n b\,\widetilde a^*_{1j}\Bigl\{\frac{\partial}{\partial z}+\frac{1}{f_{Z_1}(Z_{j1})}\frac{\partial f_{Z_1}(Z_{j1})}{\partial z}\Bigr\}\gamma_1(Z_{j1};\widetilde\alpha,\widetilde\beta,\theta_0,S_0)+O_p(b\delta_n^*)
\]
\[
\quad+n^{-1}\sum_{j=1}^n b\,\widetilde a^*_{1j}(\widetilde a_{0j}-a_{0j})\Bigl\{\frac{\partial}{\partial z}+\frac{1}{f_{Z_1}(Z_{j1})}\frac{\partial f_{Z_1}(Z_{j1})}{\partial z}\Bigr\}V_1(Z_{j1};\widetilde\alpha,\widetilde\beta,\theta_0,S_0)
\]
\[
\quad-n^{-1}\sum_{j=1}^n b\,\widetilde a^*_{1j}\{-S_0'(\theta_0^{\rm T}Z_{j1})\}V_1(Z_{j1};\widetilde\alpha,\widetilde\beta,\theta_0,S_0)\,\theta_0
-n^{-1}\sum_{j=1}^n(\widetilde a^*_{1j})^2V_1(Z_{j1};\widetilde\alpha,\widetilde\beta,\theta_0,S_0)\,\widetilde\theta_{\rm curr}+o_p(b^2\delta_n^*).
\]
By (B.3), we obtain that $\widetilde\theta_{\rm curr}=(\widetilde\theta_{\rm prev}^{\rm T}\theta_0)^{-1}\theta_0+O_p(b^{-1}\delta_n^*)$. By standardizing $\widetilde\theta_{\rm curr}$ and correcting the sign of the first entry, we have
\[
\widetilde\theta_{\rm curr}-\theta_0=O_p(b^{-1}\delta_n^*),
\qquad{\rm (B.7)}
\]
which converges to 0 by Condition (C2.1).

B.2 Proof of Theorem 2

Following notation similar to that of Carroll et al. (1997), we denote $U_i=\theta_0^{\rm T}Z_{i1}$ and $\widehat U_i=\widehat\theta^{\rm T}Z_{i1}$. For the refined estimator, the kernel weight is defined as $w_i(u)=K\{(\widehat U_i-u)/h\}/\sum_{\ell=1}^nK\{(\widehat U_\ell-u)/h\}$ for $u\in R$. Denote $\delta_n=\{\log(n)/(nh)\}^{1/2}$ and $\Delta(\widehat\Theta)=\|\widehat\Theta-\Theta_0\|$.

First, we fix the value of $\Theta$ at the value from the previous iteration, denoted by $\widehat\Theta_{\rm prev}$. For any $z\in\mathcal D\subset R^{d_1}$, as in the previous subsection, we let $Z^*_{i0,1}=(Z_{i1}-z)/h$ and let $(\widehat a_{0,{\rm curr}},\widehat a^*_{1,{\rm curr}})$ be the updated estimators of $a_0$ and $a_1^*$, which solve the local estimating equation (B.8),
\[
0=\sum_{i=1}^n w_i(\widehat\theta_{\rm prev}^{\rm T}z)\,q_1\bigl[\eta_i\{\widehat\alpha_{\rm prev},\widehat\beta_{\rm prev},\widehat\theta_{\rm prev},(\widehat a_{0,{\rm curr}}+\widehat a^*_{1,{\rm curr}}\widehat\theta_{\rm prev}^{\rm T}Z^*_{i0,1})\},Y_i\bigr](\widehat\alpha_{2,{\rm prev}}^{\rm T}\xi_i)\begin{pmatrix}1\\ \widehat\theta_{\rm prev}^{\rm T}Z^*_{i0,1}\end{pmatrix}
\]
\[
=\sum_{i=1}^n w_i(\widehat\theta_{\rm prev}^{\rm T}z)\,q_1(\eta_i,Y_i)(\alpha_{2,0}^{\rm T}\xi_i)\begin{pmatrix}1\\ \widehat\theta_{\rm prev}^{\rm T}Z^*_{i0,1}\end{pmatrix}\times\{1+o_p(1)\}
\]
\[
\quad+\Bigl[\sum_{i=1}^n w_i(\widehat\theta_{\rm prev}^{\rm T}z)\,q_2(\eta_i,Y_i)(\alpha_{2,0}^{\rm T}\xi_i)^2\{\widehat a_{0,{\rm curr}}+\widehat a^*_{1,{\rm curr}}\widehat\theta_{\rm prev}^{\rm T}Z^*_{i0,1}-S(\theta_0^{\rm T}Z_{i1})\}\begin{pmatrix}1\\ \widehat\theta_{\rm prev}^{\rm T}Z^*_{i0,1}\end{pmatrix}
\]
\[
\quad+\sum_{i=1}^n w_i(\widehat\theta_{\rm prev}^{\rm T}z)\,q_2(\eta_i,Y_i)(\alpha_{2,0}^{\rm T}\xi_i)\begin{pmatrix}1\\ \widehat\theta_{\rm prev}^{\rm T}Z^*_{i0,1}\end{pmatrix}\begin{pmatrix}\xi_i\\ S(\theta_0^{\rm T}Z_{i1})\xi_i\\ Z_i\end{pmatrix}^{\rm T}\begin{pmatrix}\widehat\alpha_{1,{\rm prev}}-\alpha_{1,0}\\ \widehat\alpha_{2,{\rm prev}}-\alpha_{2,0}\\ \widehat\beta_{\rm prev}-\beta_0\end{pmatrix}\Bigr]\times\{1+o_p(1)\}.
\qquad{\rm (B.8)}
\]
For the observations with $|\widehat\theta_{\rm prev}^{\rm T}Z_{i1}-u|\le h$, we obtain
\[
\widehat a_{0,{\rm curr}}+\widehat a^*_{1,{\rm curr}}\widehat\theta_{\rm prev}^{\rm T}Z^*_{i0,1}-S(\theta_0^{\rm T}Z_{i1})
=\begin{pmatrix}\widehat a_{0,{\rm curr}}-a_0\\ \widehat a^*_{1,{\rm curr}}-a_1^*\end{pmatrix}^{\rm T}\begin{pmatrix}1\\ \widehat\theta_{\rm prev}^{\rm T}Z^*_{i0,1}\end{pmatrix}
+S^{(1)}(\theta_0^{\rm T}z)Z_{i0,1}^{\rm T}(\widehat\theta_{\rm prev}-\theta_0)
-\frac12 S^{(2)}(\theta_0^{\rm T}z)(\widehat\theta_{\rm prev}^{\rm T}Z_{i0,1})^2+O(h^3)+o_p(\|\widehat\theta_{\rm prev}-\theta_0\|).
\]
Define
\[
V_1(z;\theta)=E\{\rho_2(\eta_i)(\alpha_{2,0}^{\rm T}\xi_i)^2\mid\theta^{\rm T}Z_{i1}=\theta^{\rm T}z\},\qquad
\pi_{i1}=\rho_2(\eta_i)(\alpha_{2,0}^{\rm T}\xi_i)\begin{pmatrix}\xi_i\\ S(\theta_0^{\rm T}Z_{i1})\xi_i\\ Z_i\end{pmatrix},\qquad
\pi_{i2}=\rho_2(\eta_i)(\alpha_{2,0}^{\rm T}\xi_i)^2Z_{i1},
\qquad{\rm (B.9)}
\]
\[
\breve\pi_1(z;\theta)=E(\pi_{i1}\mid\theta^{\rm T}Z_{i1}=\theta^{\rm T}z),\qquad
\breve\pi_2(z;\theta)=E(\pi_{i2}\mid\theta^{\rm T}Z_{i1}=\theta^{\rm T}z),\qquad
V_2(z;\theta)=\begin{pmatrix}\breve\pi_1(z;\theta)\\ S^{(1)}(\theta^{\rm T}z)\{\breve\pi_2(z;\theta)-V_1(z;\theta)z\}\end{pmatrix}.
\]
By solving the local estimating equation in (B.8), we obtain the following results, uniformly for all $u=\theta_0^{\rm T}z$ with $z\in\mathcal D$:
\[
\widehat a_{0,{\rm curr}}-a_0=-(V_1^{-1}V_2^{\rm T})(z;\theta_0)\,(\widehat\Theta_{\rm prev}-\Theta_0)+R_{0,n}(z)
+\frac{\sigma_K^2}{2}S^{(2)}(\theta_0^{\rm T}z)h^2+O_p(h^3)+o_p\{\Delta(\widehat\Theta_{\rm prev})\},
\]
\[
\widehat a^*_{1,{\rm curr}}-a_1^*=O_p\{(h+\delta_n)\times\Delta(\widehat\Theta_{\rm prev})\}+O_p(h^3+\delta_n),
\qquad{\rm (B.10)}
\]
where, letting $\epsilon_i=q_1(\eta_i,Y_i)$,
\[
R_{0,n}(z)=V_1^{-1}(z;\widehat\theta_{\rm prev})\sum_{i=1}^n w_i(\widehat\theta_{\rm prev}^{\rm T}z)(\alpha_{2,0}^{\rm T}\xi_i)\epsilon_i.
\qquad{\rm (B.11)}
\]
Again, the expressions in (B.10) allow us to account for the $\widehat\Theta_{\rm prev}$ embedded in the estimated $a$'s in equation (B.13) below. Next, we fix $\widehat S(\cdot)$ and $\widehat S^{(1)}(\cdot)$ and update $\widehat\Theta$ by solving the estimating equation (B.12),
\[
0=n^{-1}\sum_{j=1}^n\sum_{i=1}^n w_{ij}\,q_1\bigl[\eta_i\{\widehat\alpha_{\rm curr},\widehat\beta_{\rm curr},\widehat\theta_{\rm curr},(\widehat a_{0j,{\rm curr}}+\widehat a_{1j,{\rm curr}}\widehat\theta_{\rm curr}^{\rm T}Z_{ij,1})\},Y_i\bigr]
\begin{pmatrix}\xi_i\\ (\widehat a_{0j,{\rm curr}}+\widehat a_{1j,{\rm curr}}\widehat\theta_{\rm curr}^{\rm T}Z_{ij,1})\xi_i\\ Z_i\\ (\widehat\alpha_{2,{\rm curr}}^{\rm T}\xi_i)\widehat a_{1j,{\rm curr}}Z_{ij,1}\end{pmatrix}.
\qquad{\rm (B.12)}
\]
For observations with $|\widehat\theta_{\rm curr}^{\rm T}Z_{ij,1}|\le h$, we have
\[
\widehat a_{0j,{\rm curr}}+\widehat a_{1j,{\rm curr}}\widehat\theta_{\rm curr}^{\rm T}Z_{ij,1}-S(\theta_0^{\rm T}Z_{i1})
=\begin{pmatrix}\widehat a_{0j,{\rm curr}}-a_{0j}\\ \widehat a_{1j,{\rm curr}}-a_{1j}\end{pmatrix}^{\rm T}\begin{pmatrix}1\\ \widehat\theta_{\rm curr}^{\rm T}Z_{ij,1}\end{pmatrix}
+S^{(1)}(\theta_0^{\rm T}Z_{j1})Z_{ij,1}^{\rm T}(\widehat\theta_{\rm curr}-\theta_0)
-\frac12S^{(2)}(\theta_0^{\rm T}Z_{j1})(\widehat\theta_{\rm prev}^{\rm T}Z_{ij,1})^2+O_p(h^3)+o_p(\|\widehat\theta_{\rm curr}-\theta_0\|).
\]
Define $\nu_{Z_1}(z;\theta)=E(Z_1\mid\theta^{\rm T}Z_1=\theta^{\rm T}z)$. The right-hand side of equation (B.12) can be rewritten as
\[
0=n^{-1}\sum_{i=1}^n\epsilon_i\begin{pmatrix}\xi_i\\ S(\theta_0^{\rm T}Z_{i1})\xi_i\\ Z_i\\ (\alpha_{2,0}^{\rm T}\xi_i)S^{(1)}(\theta_0^{\rm T}Z_{i1})\{Z_{i1}-\nu_{Z_1}(Z_{i1},\theta_0)\}\end{pmatrix}\times\{1+o_p(1)\}
\]
\[
\quad+n^{-1}\sum_{j=1}^n\Bigl[-V_3(Z_{j1};\theta_0)(\widehat\Theta_{\rm curr}-\Theta_0)-V_2(Z_{j1};\theta_0)(\widehat a_{0j,{\rm curr}}-a_{0j})
+\frac{\sigma_K^2h^2}{2}S^{(2)}(\theta_0^{\rm T}Z_{j1})V_2(Z_{j1};\theta_0)\Bigr]\times\{1+o_p(1)\}
+o_p(n^{-1/2})+o_p\{\Delta(\widehat\Theta_{\rm prev})\},
\qquad{\rm (B.13)}
\]
where $V_3(z;\theta)$ is a symmetric matrix whose $(\ell_1,\ell_2)$th block is denoted by $V_3^{[\ell_1,\ell_2]}$ for $\ell_1,\ell_2=1,2$, and
\[
V_3^{[1,1]}(z;\theta)=E\Bigl[\rho_2(\eta_i)\begin{pmatrix}\xi_i\\ S(\theta_0^{\rm T}Z_{i1})\xi_i\\ Z_i\end{pmatrix}^{\otimes2}\Bigm|\theta^{\rm T}Z_{i1}=\theta^{\rm T}z\Bigr],
\]
\[
V_3^{[1,2]}(z;\theta)=S^{(1)}(\theta_0^{\rm T}z)\{E(\pi_{i1}Z_{i1}^{\rm T}\mid\theta^{\rm T}Z_{i1}=\theta^{\rm T}z)-\breve\pi_1(z;\theta)z^{\rm T}\},\qquad
V_3^{[2,1]}(z;\theta)=\{V_3^{[1,2]}(z;\theta)\}^{\rm T},
\]
\[
V_3^{[2,2]}(z;\theta)=\{S^{(1)}(\theta_0^{\rm T}z)\}^2\bigl[E\{\rho_2(\eta_i)(\alpha_{2,0}^{\rm T}\xi_i)^2Z_{i1}Z_{i1}^{\rm T}\mid\theta^{\rm T}Z_{i1}=\theta^{\rm T}z\}-\breve\pi_2(z;\theta)z^{\rm T}-z\breve\pi_2^{\rm T}(z;\theta)+V_1(z;\theta)zz^{\rm T}\bigr].
\]
By plugging (B.10) into (B.13), we have
\[
\widehat\Theta_{\rm curr}-\Theta_0=A^-N_n+A^-C(\widehat\Theta_{\rm prev}-\Theta_0)+o_p(n^{-1/2})+o_p(\|\widehat\Theta_{\rm prev}-\Theta_0\|),
\qquad{\rm (B.14)}
\]
where
\[
N_n=n^{-1}\sum_{i=1}^n\epsilon_i\begin{pmatrix}
\xi_i-V_1^{-1}(Z_{i1};\theta_0)(\alpha_{2,0}^{\rm T}\xi_i)E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi)\xi\mid\theta_0^{\rm T}Z_1=\theta_0^{\rm T}Z_{i1}\}\\
S(\theta_0^{\rm T}Z_{i1})\xi_i-V_1^{-1}(Z_{i1};\theta_0)(\alpha_{2,0}^{\rm T}\xi_i)E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi)S(\theta_0^{\rm T}Z_1)\xi\mid\theta_0^{\rm T}Z_1=\theta_0^{\rm T}Z_{i1}\}\\
Z_i-V_1^{-1}(Z_{i1};\theta_0)(\alpha_{2,0}^{\rm T}\xi_i)E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi)Z\mid\theta_0^{\rm T}Z_1=\theta_0^{\rm T}Z_{i1}\}\\
(\alpha_{2,0}^{\rm T}\xi_i)S^{(1)}(\theta_0^{\rm T}Z_{i1})\{Z_{i1}-V_1^{-1}(Z_{i1};\theta_0)\breve\pi_2(Z_{i1},\theta_0)\}
\end{pmatrix},
\]
\[
C=E\Bigl[V_1^{-1}\begin{pmatrix}\breve\pi_1\breve\pi_1^{\rm T}&S^{(1)}(\theta_0^{\rm T}Z_1)\breve\pi_1(\breve\pi_2-V_1Z_1)^{\rm T}\\ S^{(1)}(\theta_0^{\rm T}Z_1)(\breve\pi_2-V_1Z_1)\breve\pi_1^{\rm T}&\{S^{(1)}(\theta_0^{\rm T}Z_1)\}^2(\breve\pi_2-V_1Z_1)(\breve\pi_2-V_1Z_1)^{\rm T}\end{pmatrix}\Bigr],
\]
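The recursion (B.14) is, to first order, a linear fixed-point iteration, which converges exactly when the governing matrix has all eigenvalues strictly inside the unit circle. A generic numerical illustration of this criterion follows; the matrix used here is arbitrary, standing in for (and not derived from) the paper's $A^-C$.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue modulus of M. The linear iteration
    x_next = M @ x + c converges from any start iff this is < 1,
    mirroring the requirement that the eigenvalues of A^- C be
    strictly less than 1 (M here is a generic matrix)."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def fixed_point_iterate(M, c, x0, n_iter=200):
    """Iterate x <- M x + c; when spectral_radius(M) < 1 this
    converges geometrically to the fixed point (I - M)^{-1} c."""
    x = x0
    for _ in range(n_iter):
        x = M @ x + c
    return x
```

With spectral radius 0.5, for example, the distance to the fixed point shrinks by roughly half per iteration, which is the same geometric-contraction behavior invoked for the estimation iterations.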
and
\[
A=E\{V_3(Z_1;\theta_0)\}.
\qquad{\rm (B.15)}
\]
Define
\[
g_1=\rho_2^{1/2}(\eta)\bigl[\xi^{\rm T},\{S(\theta_0^{\rm T}Z_1)\xi\}^{\rm T},Z^{\rm T}\bigr]^{\rm T},\qquad
g_2=S^{(1)}(\theta_0^{\rm T}Z_1)\rho_2^{1/2}(\eta)(\alpha_{2,0}^{\rm T}\xi)Z_1,\qquad
g=(g_1^{\rm T},g_2^{\rm T})^{\rm T},
\]
\[
\varrho=\{\breve\pi_1^{\rm T}(Z_1;\theta_0),\,S^{(1)}(\theta_0^{\rm T}Z_1)\breve\pi_2^{\rm T}(Z_1;\theta_0)\}^{\rm T},
\]
and let $D=A-C=E(gg^{\rm T})-E(V_1^{-1}\varrho\varrho^{\rm T})$. It can be shown that both $A$ and $C$ are positive semi-definite matrices with rank $2p+2d-1$. As in the previous subsection, an asymptotic convergence property is achieved provided that the eigenvalues of $A^-C$ are strictly less than 1. This can be established by the Cauchy-Schwarz inequality, so that $D$ is positive semi-definite with the same rank as $A$ and $C$, and the eigenvalues of $A^-C$ are strictly less than 1. Consequently, at the end of the iterations, $\widehat\Theta_{\rm curr}-\Theta_0=A^-N_n+o_p(n^{-1/2})$. By the Central Limit Theorem, $N_n$ is asymptotically normal. Defining
\[
B=n\times{\rm cov}(N_n),
\qquad{\rm (B.16)}
\]
the asymptotic normality result for $\widehat\Theta$ follows immediately. Finally, the asymptotic normal distribution of $\widehat S(u)$ follows from (B.10).

B.3 Proof of Theorem 3

Proof of Lemma 1: The first result in Lemma 1, that $\|\widehat\psi_k-\psi_k\|=O_p\{h_\varpi^2+(nh_\varpi)^{-1/2}\}$, was established by Hall et al. (2006) for the case of fixed $m$. With the smoothing parameter $h_\varpi$ in the range defined in (C4.4), we have $\|\widehat\psi_k-\psi_k\|=O_p(n^{-1/3})$. It is easily shown that
\[
\widehat\xi_{ik}-\xi_{ik}=\zeta_{1,ik}+\zeta_{2,ik}+\zeta_{3,ik},
\qquad{\rm (B.17)}
\]
where $\zeta_{1,ik}=\{(b-a)/m\}\sum_{j=1}^mX_i(t_{ij})\psi_k(t_{ij})-\int_a^bX_i(t)\psi_k(t)\,dt$, $\zeta_{2,ik}=\{(b-a)/m\}\sum_{j=1}^mU_{ij}\psi_k(t_{ij})$, and $\zeta_{3,ik}=\{(b-a)/m\}\sum_{j=1}^mW_{ij}\{\widehat\psi_k(t_{ij})-\psi_k(t_{ij})\}$. It can then be established that, with $X_i(t)$ and $\psi_k(t)$ continuously differentiable, $\zeta_{1,ik}$, representing the numerical integration error, is of order $O_p(m^{-1})$, and that $\zeta_{2,ik}=O_p(m^{-1/2})$. Lemma 1 now follows because $\zeta_{3,ik}=O_p(n^{-1/3})$ by the Cauchy-Schwarz inequality, since
\[
|\zeta_{3,ik}|\le\Bigl\{\frac{b-a}{m}\sum_{j=1}^mW_{ij}^2\Bigr\}^{1/2}\Bigl[\frac{b-a}{m}\sum_{j=1}^m\{\widehat\psi_k(t_{ij})-\psi_k(t_{ij})\}^2\Bigr]^{1/2}=O_p(\|\widehat\psi_k-\psi_k\|)=O_p(n^{-1/3}).
\]

Proof of Theorem 3: The proof of Theorem 3 essentially follows the same lines as that of Theorem 1, but with extra effort to keep track of the additional terms due to $\widehat\xi_i-\xi_i$, as given by equation (B.17). It can be shown that, for equations (B.2), (B.3) and (B.6), the effect of $\xi_i$ being estimated is an additional term of order $O_p(m^{-1}+n^{-1/3})$ on the right-hand side of the equations. For equation (B.7), this results in an additional term of order $O_p(b^{-1}m^{-1}+b^{-1}n^{-1/3})$. Under the bandwidth assumption (C4.4), the estimator $\widetilde\Theta$ remains consistent.

B.4 Proof of Theorem 4

The proof of Theorem 4 follows the same lines as that of Theorem 2, except that the term $\epsilon_i$ in equations (B.11)-(B.13) should be replaced by
\[
\widetilde\epsilon_i=q_1\{\alpha_{1,0}^{\rm T}\widehat\xi_i+S_0(\theta_0^{\rm T}Z_{i1})\alpha_{2,0}^{\rm T}\widehat\xi_i+\beta_0^{\rm T}Z_i,Y_i\}
=\epsilon_i+q_2(\eta_i,Y_i)\{\alpha_{1,0}+S_0(\theta_0^{\rm T}Z_{i1})\alpha_{2,0}\}^{\rm T}(\widehat\xi_i-\xi_i).
\]
Let $A$ and $C$ be as defined in the proof of Theorem 2. Equation (B.14) becomes
\[
\widehat\Theta_{\rm curr}-\Theta_0=A^-\widetilde N_n+A^-C(\widehat\Theta_{\rm prev}-\Theta_0)+o_p(n^{-1/2})+o_p(\|\widehat\Theta_{\rm prev}-\Theta_0\|),
\]
where $\widetilde N_n=N_n+N_{1,n}$, $N_{1,n}=n^{-1}\sum_{i=1}^nE_i(\widehat\xi_i-\xi_i)$, and
\[
E_i=\begin{pmatrix}
\xi_i-V_1^{-1}(Z_{i1};\theta_0)(\alpha_{2,0}^{\rm T}\xi_i)E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi)\xi\mid\theta_0^{\rm T}Z_1=\theta_0^{\rm T}Z_{i1}\}\\
S(\theta_0^{\rm T}Z_{i1})\xi_i-V_1^{-1}(Z_{i1};\theta_0)(\alpha_{2,0}^{\rm T}\xi_i)E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi)S(\theta_0^{\rm T}Z_1)\xi\mid\theta_0^{\rm T}Z_1=\theta_0^{\rm T}Z_{i1}\}\\
Z_i-V_1^{-1}(Z_{i1};\theta_0)(\alpha_{2,0}^{\rm T}\xi_i)E\{\rho_2(\eta)(\alpha_{2,0}^{\rm T}\xi)Z\mid\theta_0^{\rm T}Z_1=\theta_0^{\rm T}Z_{i1}\}\\
(\alpha_{2,0}^{\rm T}\xi_i)S^{(1)}(\theta_0^{\rm T}Z_{i1})\{Z_{i1}-V_1^{-1}(Z_{i1};\theta_0)\breve\pi_2(Z_{i1},\theta_0)\}
\end{pmatrix}
\times q_2(\eta_i,Y_i)\{\alpha_{1,0}+S_0(\theta_0^{\rm T}Z_{i1})\alpha_{2,0}\}^{\rm T}.
\]
Define $\Xi(t)=E\{X_i(t)E_i\}$, which is a $(2p+d+d_1)\times p$ matrix of functions. Denote $\psi(t)=(\psi_1,\cdots,\psi_p)^{\rm T}(t)$; then $N_{1,n}=\langle\Xi,\widehat\psi-\psi\rangle_{L_2}+o_p(n^{-1/2})$. By Lemma 2, the $j$th element of $N_{1,n}$ is
\[
N_{1,n}^{[j]}=\sum_{\ell=1}^p\langle\Xi^{[j,\ell]}(t),\widehat\psi_\ell(t)-\psi_\ell(t)\rangle_{L_2}+o_p(n^{-1/2})
=n^{-1/2}\sum_{\ell=1}^p\sum_{k\ne\ell}\frac{\langle\Xi^{[j,\ell]},\psi_k\rangle_{L_2}}{\omega_\ell-\omega_k}\int\!\!\int Z(s,t)\psi_\ell(s)\psi_k(t)\,ds\,dt+o_p(n^{-1/2}).
\]
Since $Z$ is a Gaussian random field, $n^{1/2}N_{1,n}$ converges to a Gaussian random vector. It can further be verified that $N_n$ and $N_{1,n}$ are independent. Define
\[
B_1=\lim_{n\to\infty}n\times{\rm cov}(N_{1,n}).
\qquad{\rm (B.18)}
\]
The asymptotic normality of $\widehat\Theta$ now follows from arguments similar to those for Theorem 2.
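The Riemann-sum approximation of the functional principal component scores, whose error is the $\zeta_{1,ik}$ term in Lemma 1 of Section B.3, can be sketched numerically. Function and variable names below are illustrative, not from the paper's implementation.

```python
import numpy as np

def riemann_scores(X_vals, psi_vals, a, b):
    """Riemann-sum approximation of a score, as in Lemma 1:
    xi_ik ~ {(b - a)/m} * sum_j X_i(t_ij) * psi_k(t_ij),
    approximating the integral of X_i(t) * psi_k(t) over [a, b].

    X_vals, psi_vals : values of X_i and psi_k on an equally spaced
    grid of m points in [a, b]. The discrepancy from the exact
    integral is the zeta_{1,ik} = O_p(m^{-1}) numerical-integration
    error discussed in the proof of Lemma 1.
    """
    m = len(X_vals)
    return (b - a) / m * np.sum(np.asarray(X_vals) * np.asarray(psi_vals))
```

Evaluating on a midpoint grid, the approximation is exact for linear integrands, and for smooth $X_i$ and $\psi_k$ the error vanishes as the number of observation times $m$ grows, consistent with the $O_p(m^{-1})$ rate used in the proof.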