Supplemental Paper for: Anti-Proportional Bandwidth Selection for Smoothing (Non-)Sparse Functional Data with Covariate Adjustments

Dominik Liebl
Institute for Financial Economics and Statistics, University of Bonn
Adenauerallee 24-42, 53113 Bonn, Germany

Outline: Section 1 contains the proofs of Theorems 3.1 and 3.2. Further discussions and supplementary results on the bandwidth results and on the rule-of-thumb procedure used in the application are given in Section 2. The data sources are described in detail in Section 3.

1 Proofs

1.1 Proof of Theorem 3.1

Proof of Theorem 3.1, part (i): Let $x$ and $z$ be interior points of $\operatorname{supp}(f_{XZ})$ and define $H_\mu=\operatorname{diag}(h_{\mu,X}^2,h_{\mu,Z}^2)$, $\mathbf{X}=(X_{11},\dots,X_{nT})^\top$, and $\mathbf{Z}=(Z_1,\dots,Z_T)^\top$. By a Taylor expansion of $\mu$ around $(x,z)$, the conditional bias of the estimator $\hat\mu(x,z;H_\mu)$ can be written as
\begin{align}
\mathbb{E}\big(\hat\mu(x,z;H_\mu)-\mu(x,z)\,\big|\,\mathbf{X},\mathbf{Z}\big)
&=\frac{1}{2}\,u_1^\top\Big((Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\Big)^{-1}\times\nonumber\\
&\quad\times(Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}\big(Q_\mu(x,z)+R_\mu(x,z)\big),\tag{24}
\end{align}
where $Q_\mu(x,z)$ is a $Tn\times 1$ vector with typical elements
\[
(X_{it}-x,\,Z_t-z)\,\mathcal{H}_\mu(x,z)\,(X_{it}-x,\,Z_t-z)^\top\in\mathbb{R},
\]
with $\mathcal{H}_\mu(x,z)$ denoting the Hessian matrix of the regression function $\mu$ at $(x,z)$ (not to be confused with the bandwidth matrix $H_\mu$). The $Tn\times 1$ vector $R_\mu(x,z)$ holds the remainder terms $o\big((X_{it}-x,\,Z_t-z)(X_{it}-x,\,Z_t-z)^\top\big)\in\mathbb{R}$.

Next we derive asymptotic approximations for the $3\times 3$ matrix $(Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]$ and the $3\times 1$ vector $(Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}Q_\mu(x,z)$ on the right-hand side of Eq. (24). Using standard procedures from kernel density estimation it is easy to derive that
\[
(Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]
=\begin{pmatrix}
f_{XZ}(x,z)+o_p(1) & \nu_2(K_\mu)\,Df_{XZ}(x,z)^\top H_\mu+o_p(\mathbf{1}^\top H_\mu)\\[2pt]
\nu_2(K_\mu)\,H_\mu\,Df_{XZ}(x,z)+o_p(H_\mu\mathbf{1}) & \nu_2(K_\mu)\,H_\mu\,f_{XZ}(x,z)+o_p(H_\mu)
\end{pmatrix},
\]
where $\mathbf{1}=(1,1)^\top$ and $Df_{XZ}(x,z)$ is the vector of first-order partial derivatives (i.e., the gradient) of the pdf $f_{XZ}$ at $(x,z)$. Inversion of the above block matrix yields
\begin{align}
\Big((Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\Big)^{-1}
=\begin{pmatrix}
(f_{XZ}(x,z))^{-1}+o_p(1) & -Df_{XZ}(x,z)^\top(f_{XZ}(x,z))^{-2}+o_p(\mathbf{1}^\top)\\[2pt]
-Df_{XZ}(x,z)\,(f_{XZ}(x,z))^{-2}+o_p(\mathbf{1}) & \big(\nu_2(K_\mu)H_\mu f_{XZ}(x,z)\big)^{-1}+o_p(H_\mu^{-1})
\end{pmatrix}.\tag{25}
\end{align}
The $3\times 1$ vector $(Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}Q_\mu(x,z)$ can be partitioned as follows:
\[
(Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}Q_\mu(x,z)
=\begin{pmatrix}\text{upper element}\\ \text{lower block}\end{pmatrix},
\]
where the $1\times 1$ dimensional upper element can be approximated as
\begin{align}
(Tn)^{-1}\sum_{i,t}K_{\mu,h}(X_{it}-x,Z_t-z)\,(X_{it}-x,\,Z_t-z)\,\mathcal{H}_\mu(x,z)\,(X_{it}-x,\,Z_t-z)^\top\nonumber\\
=(\nu_2(\kappa))^2\operatorname{tr}\{H_\mu\mathcal{H}_\mu(x,z)\}\,f_{XZ}(x,z)+o_p(\operatorname{tr}(H_\mu))\tag{26}
\end{align}
and the $2\times 1$ dimensional lower block equals
\begin{align}
(Tn)^{-1}\sum_{i,t}K_{\mu,h}(X_{it}-x,Z_t-z)\,\big[(X_{it}-x,\,Z_t-z)\,\mathcal{H}_\mu(x,z)\,(X_{it}-x,\,Z_t-z)^\top\big]\,(X_{it}-x,\,Z_t-z)^\top
=O_p\big(H_\mu^{3/2}\mathbf{1}\big).\tag{27}
\end{align}
Plugging the approximations of Eqs. (25)-(27) into the first summand of the conditional bias expression in Eq. (24) leads to
\[
\frac{1}{2}u_1^\top\Big((Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\Big)^{-1}(Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}Q_\mu(x,z)
=\frac{1}{2}(\nu_2(\kappa))^2\operatorname{tr}\{H_\mu\mathcal{H}_\mu(x,z)\}+o_p(\operatorname{tr}(H_\mu)).
\]
Furthermore, it is easily seen that the second summand of the conditional bias expression in Eq. (24), which holds the remainder terms, satisfies
\[
\frac{1}{2}u_1^\top\Big((Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\Big)^{-1}(Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}R_\mu(x,z)
=o_p(\operatorname{tr}(H_\mu)).
\]
Summing the two latter expressions yields the asymptotic approximation of the conditional bias
\[
\mathbb{E}\big(\hat\mu(x,z;H_\mu)-\mu(x,z)\,\big|\,\mathbf{X},\mathbf{Z}\big)
=\frac{1}{2}(\nu_2(\kappa))^2\operatorname{tr}\{H_\mu\mathcal{H}_\mu(x,z)\}+o_p(\operatorname{tr}(H_\mu)).
\]
This is the bias statement of Theorem 3.1, part (i).
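To fix ideas, the following minimal Python sketch implements the bivariate local-linear estimator whose weighted least-squares form underlies Eq. (24), using a diagonal bandwidth matrix and a product Epanechnikov kernel (the kernel used in the paper's application) as an example. The code is an illustration under our own naming conventions, not the paper's implementation.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel kappa(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def local_linear_2d(x0, z0, X, Z, Y, h_x, h_z):
    """Bivariate local-linear estimate mu_hat(x0, z0; H) with
    H = diag(h_x^2, h_z^2) and product kernel K(u, v) = kappa(u) kappa(v).
    X, Z, Y are flat arrays of length T*n holding the pooled panel data.
    Solves the weighted least-squares problem behind Eq. (24):
    min over (b0, b1, b2) of sum_i w_i (Y_i - b0 - b1 (X_i-x0) - b2 (Z_i-z0))^2."""
    Xx = X - x0                                    # centered regressors
    Zz = Z - z0
    w = epanechnikov(Xx / h_x) * epanechnikov(Zz / h_z) / (h_x * h_z)
    D = np.column_stack([np.ones_like(X), Xx, Zz])  # design [1, X_x, Z_z]
    WD = D * w[:, None]                             # W_{mu,xz} [1, X_x, Z_z]
    beta, *_ = np.linalg.lstsq(WD.T @ D, WD.T @ Y, rcond=None)
    return beta[0]                                  # u_1' beta = mu_hat(x0, z0)
```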
Proof of Theorem 3.1, part (ii): In the following we derive the conditional variance of the local linear estimator:
\begin{align}
\mathbb{V}\big(\hat\mu(x,z;H_\mu)\,\big|\,\mathbf{X},\mathbf{Z}\big)
&=u_1^\top\big([1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\big)^{-1}\,[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\times\nonumber\\
&\qquad\times\big([1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\big)^{-1}u_1\nonumber\\
&=u_1^\top\big((Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\big)^{-1}\big((Tn)^{-2}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\big)\times\nonumber\\
&\qquad\times\big((Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\big)^{-1}u_1,\tag{28}
\end{align}
where $\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})$ is the $Tn\times Tn$ matrix with typical elements
\[
\operatorname{Cov}(Y_{it},Y_{js}|X_{it},X_{js},Z_t,Z_s)=\gamma_{|t-s|}\big((X_{it},Z_t),(X_{js},Z_s)\big)+\sigma^2\,1(i=j\ \text{and}\ t=s),
\]
with $1(\cdot)$ being the indicator function. We begin by analyzing the $3\times 3$ matrix $(Tn)^{-2}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]$ using the following Lemmas 1.1-1.3.

Lemma 1.1 The upper-left scalar (block) of the matrix $(Tn)^{-2}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]$ is given by
\begin{align*}
(Tn)^{-2}\,\mathbf{1}^\top W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}\mathbf{1}
&=(Tn)^{-1}f_{XZ}(x,z)\,|H_\mu|^{-1/2}R(K_\mu)\big(\gamma(x,x,z)+\sigma^2\big)\big(1+O_p(\operatorname{tr}(H_\mu^{1/2}))\big)\\
&\quad+T^{-1}(f_{XZ}(x,z))^2\left[h_{\mu,Z}^{-1}R(\kappa)\,\frac{n-1}{n}\,\frac{\gamma(x,x,z)}{f_Z(z)}+c_r\right]\big(1+O_p(\operatorname{tr}(H_\mu^{1/2}))\big)\\
&=O_p\big((Tn)^{-1}|H_\mu|^{-1/2}\big)+O_p\big(T^{-1}h_{\mu,Z}^{-1}\big).
\end{align*}

Lemma 1.2 The $1\times 2$ dimensional upper-right block of the matrix $(Tn)^{-2}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]$ is given by
\begin{align*}
(Tn)^{-2}\,\mathbf{1}^\top W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}
\begin{pmatrix}(X_{11}-x,\,Z_1-z)\\\vdots\\(X_{nT}-x,\,Z_T-z)\end{pmatrix}
&=(Tn)^{-1}f_{XZ}(x,z)\,|H_\mu|^{-1/2}(\mathbf{1}^\top H_\mu^{1/2})\,R(K_\mu)\big(\gamma(x,x,z)+\sigma^2\big)\big(1+O_p(\operatorname{tr}(H_\mu^{1/2}))\big)\\
&\quad+T^{-1}(f_{XZ}(x,z))^2(\mathbf{1}^\top H_\mu^{1/2})\left[h_{\mu,Z}^{-1}R(\kappa)\,\frac{n-1}{n}\,\frac{\gamma(x,x,z)}{f_Z(z)}+c_r\right]\big(1+O_p(\operatorname{tr}(H_\mu^{1/2}))\big)\\
&=O_p\big((Tn)^{-1}|H_\mu|^{-1/2}(\mathbf{1}^\top H_\mu^{1/2})\big)+O_p\big(T^{-1}(\mathbf{1}^\top H_\mu^{1/2})\,h_{\mu,Z}^{-1}\big).
\end{align*}
The $2\times 1$ dimensional lower-left block of the matrix is simply the transposed version of this result.

Lemma 1.3 The $2\times 2$ lower-right block of the matrix $(Tn)^{-2}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]$ is given by
\begin{align*}
(Tn)^{-2}\big((X_{11}-x,\,Z_1-z)^\top,\dots,(X_{nT}-x,\,Z_T-z)^\top\big)\,W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}
\begin{pmatrix}(X_{11}-x,\,Z_1-z)\\\vdots\\(X_{nT}-x,\,Z_T-z)\end{pmatrix}\qquad\\
=(Tn)^{-1}f_{XZ}(x,z)\,|H_\mu|^{-1/2}H_\mu R(K_\mu)\big(\gamma(x,x,z)+\sigma^2\big)\big(1+O_p(\operatorname{tr}(H_\mu^{1/2}))\big)\\
+T^{-1}(f_{XZ}(x,z))^2 H_\mu\left[h_{\mu,Z}^{-1}R(\kappa)\,\frac{n-1}{n}\,\frac{\gamma(x,x,z)}{f_Z(z)}+c_r\right]\big(1+O_p(\operatorname{tr}(H_\mu^{1/2}))\big)\\
=O_p\big((Tn)^{-1}|H_\mu|^{-1/2}H_\mu\big)+O_p\big(T^{-1}H_\mu h_{\mu,Z}^{-1}\big).
\end{align*}

Using the approximations for the block elements of the matrix $(Tn)^{-2}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]$ given in Lemmas 1.1-1.3, and the approximation for the matrix $\big((Tn)^{-1}[1,\mathbf{X}_x,\mathbf{Z}_z]^\top W_{\mu,xz}[1,\mathbf{X}_x,\mathbf{Z}_z]\big)^{-1}$ given in Eq. (25), we can approximate the conditional variance of the bivariate local linear estimator in Eq. (28). Some tedious yet straightforward matrix algebra leads to
\[
\mathbb{V}\big(\hat\mu(x,z;H_\mu)\,\big|\,\mathbf{X},\mathbf{Z}\big)
=(Tn)^{-1}|H_\mu|^{-1/2}R(K_\mu)\,\frac{\gamma(x,x,z)+\sigma^2}{f_{XZ}(x,z)}\,(1+o_p(1))
+T^{-1}\left[\frac{n-1}{n}\,h_{\mu,Z}^{-1}R(\kappa)\,\frac{\gamma(x,x,z)}{f_Z(z)}+c_r\right](1+o_p(1)),
\]
which is asymptotically equivalent to the variance statement of Theorem 3.1, part (ii).

Next we prove Lemma 1.1; the proofs of Lemmas 1.2 and 1.3 can be done correspondingly. To show Lemma 1.1 it is convenient to split the sum into three parts,
\[
(Tn)^{-2}\,\mathbf{1}^\top W_{\mu,xz}\operatorname{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}\mathbf{1}=s_1+s_2+s_3,
\]
collecting the variance terms ($i=j$, $t=s$), the cross-time covariance terms ($t\neq s$), and the same-time, cross-unit covariance terms ($i\neq j$, $t=s$), respectively.
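This three-way split can be illustrated numerically. The following sketch (illustrative only; it assumes the observations are stacked as $(i,t)\mapsto iT+t$ and is not the paper's code) partitions the weighted quadratic form into $s_1$, $s_2$, and $s_3$ and verifies that the three parts add up to the total:

```python
import numpy as np

def split_s1_s2_s3(w, C, n, T):
    """Split the quadratic form (Tn)^{-2} w' C w into the three sums s_1
    (i = j, t = s), s_2 (t != s), and s_3 (i != j, t = s) used in the proof
    of Lemma 1.1. Here w holds the kernel weights K_{mu,h}(X_it - x, Z_t - z)
    and C is the Tn x Tn matrix Cov(Y | X, Z); both use the index (i, t) -> i*T + t."""
    idx_i = np.repeat(np.arange(n), T)        # unit index of each row
    idx_t = np.tile(np.arange(T), n)          # time index of each row
    same_i = idx_i[:, None] == idx_i[None, :]
    same_t = idx_t[:, None] == idx_t[None, :]
    W = np.outer(w, w) * C / (n * T) ** 2     # all weighted covariance terms
    s1 = W[same_i & same_t].sum()             # variance terms: i = j, t = s
    s2 = W[~same_t].sum()                     # cross-time terms: t != s
    s3 = W[~same_i & same_t].sum()            # same-time, cross-unit: i != j, t = s
    assert np.isclose(s1 + s2 + s3, W.sum())  # the three cases partition all pairs
    return s1, s2, s3
```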
Using standard procedures from kernel density estimation leads to
\begin{align}
s_1&=(Tn)^{-2}\sum_{i,t}\big(K_{\mu,h}(X_{it}-x,Z_t-z)\big)^2\,\mathbb{V}(Y_{it}|\mathbf{X},\mathbf{Z})\nonumber\\
&=(Tn)^{-1}|H_\mu|^{-1/2}f_{XZ}(x,z)\,R(K_\mu)\big(\gamma(x,x,z)+\sigma^2\big)+O\big((Tn)^{-1}|H_\mu|^{-1/2}\operatorname{tr}(H_\mu^{1/2})\big),\tag{29}\\
s_2&=(Tn)^{-2}\sum_{i,j}\sum_{t\neq s}K_{\mu,h}(X_{it}-x,Z_t-z)\operatorname{Cov}(Y_{it},Y_{js}|\mathbf{X},\mathbf{Z})\,K_{\mu,h}(X_{js}-x,Z_s-z)\nonumber\\
&=T^{-1}(f_{XZ}(x,z))^2\,c_r+O_p\big(T^{-1}\operatorname{tr}(H_\mu^{1/2})\big),\tag{30}\\
s_3&=(Tn)^{-2}\sum_{i\neq j}\sum_{t}h_{\mu,X}^{-1}\kappa\big(h_{\mu,X}^{-1}(X_{it}-x)\big)\big(h_{\mu,Z}^{-1}\kappa(h_{\mu,Z}^{-1}(Z_t-z))\big)^2\operatorname{Cov}(Y_{it},Y_{jt}|\mathbf{X},\mathbf{Z})\,h_{\mu,X}^{-1}\kappa\big(h_{\mu,X}^{-1}(X_{jt}-x)\big)\nonumber\\
&=T^{-1}(f_{XZ}(x,z))^2\,h_{\mu,Z}^{-1}\,\frac{n-1}{n}\,\frac{\gamma(x,x,z)}{f_Z(z)}\,R(\kappa)+O_p\big(T^{-1}\operatorname{tr}(H_\mu^{1/2})\big),\tag{31}
\end{align}
where $|c_r|\leq 2c_\gamma\,r/(1-r)<\infty$. Summing up (29)-(31) leads to the result in Lemma 1.1. Lemmas 1.2 and 1.3 differ from Lemma 1.1 only with respect to the additional factors $\mathbf{1}^\top H_\mu^{1/2}$ and $H_\mu$. These come in through the usual substitution step applied to the additional data parts $(X_{it}-x,\,Z_t-z)$.

1.2 Proof of Theorem 3.2

The proof of Theorem 3.2 follows the same arguments as the proof of Theorem 3.1. The only additional issue we need to consider is that we do not directly observe the dependent variables, namely, the true theoretical raw covariances $C_{ijt}$ as defined in Eq. (4), but only their empirical versions $\hat C_{ijt}$ as defined in Eq. (9). In the following we show that this additional estimation error is asymptotically negligible in comparison to the approximation error of $\hat\gamma$ derived under usage of the theoretical raw covariances $C_{ijt}$, as given in Eq. (20) of Theorem 3.3. Observe that we can expand $\hat C_{ijt}$ as follows:
\begin{align*}
\hat C_{ijt}=C_{ijt}&+\big(Y_{it}-\mu(X_{it},Z_t)\big)\big(\mu(X_{jt},Z_t)-\hat\mu(X_{jt},Z_t)\big)\\
&+\big(Y_{jt}-\mu(X_{jt},Z_t)\big)\big(\mu(X_{it},Z_t)-\hat\mu(X_{it},Z_t)\big)\\
&+\big(\mu(X_{it},Z_t)-\hat\mu(X_{it},Z_t)\big)\big(\mu(X_{jt},Z_t)-\hat\mu(X_{jt},Z_t)\big).
\end{align*}
Further, it follows from Eq. (19) in Theorem 3.3 that
\[
\hat\mu(X_{it},Z_t;h_{\mu,X,\mathrm{opt}},h_{\mu,Z,\mathrm{opt}})-\mu(X_{it},Z_t)\,\big|\,\mathbf{X},\mathbf{Z}
=\begin{cases}O_p\big((Tn)^{-2/3}\big)&\text{if }0\leq\theta\leq 1/5,\\[2pt] O_p\big(T^{-4/5}\big)&\text{if }\theta>1/5,\end{cases}
\]
for all $i$ and $t$, where $h_{\mu,X,\mathrm{opt}}$ and $h_{\mu,Z,\mathrm{opt}}$ denote the $\theta$-specific AMISE-optimal bandwidth choices as defined in Eqs. (13), (15), (34), and (35). Therefore, under our setup we have that
\[
\hat C_{ijt}-C_{ijt}\,\big|\,\mathbf{X},\mathbf{Z}
=\begin{cases}O_p\big((Tn)^{-2/3}\big)&\text{if }0\leq\theta\leq 1/5,\\[2pt] O_p\big(T^{-4/5}\big)&\text{if }\theta>1/5,\end{cases}
\]
which is of an order of magnitude smaller than the approximation error of $\hat\gamma$ derived under usage of the theoretical raw covariances $C_{ijt}$, as given in Eq. (20) of Theorem 3.3.

2 Further discussions

2.1 AMISE.I optimal bandwidth selection

The following expressions are derived under the hypothesis that the first variance terms in parts (ii) of Theorems 3.1 and 3.2 are dominating. The AMISE.I expression for the local linear estimator $\hat\mu$ is given by
\begin{align}
\mathrm{AMISE.I}_{\hat\mu}(h_{\mu,X},h_{\mu,Z})
=(Tn)^{-1}h_{\mu,X}^{-1}h_{\mu,Z}^{-1}R(K_\mu)\,Q_{\mu,1}
+\frac{1}{4}(\nu_2(K_\mu))^2\big[h_{\mu,X}^4\,I_{\mu,XX}+2h_{\mu,X}^2h_{\mu,Z}^2\,I_{\mu,XZ}+h_{\mu,Z}^4\,I_{\mu,ZZ}\big],\tag{32}
\end{align}
where
\begin{align*}
Q_{\mu,1}&=\int_{S_{XZ}}\big(\gamma(x,x,z)+\sigma^2\big)\,d(x,z),\\
I_{\mu,XX}&=\int_{S_{XZ}}\big(\mu^{(2,0)}(x,z)\big)^2\,w_\mu\,d(x,z),\\
I_{\mu,ZZ}&=\int_{S_{XZ}}\big(\mu^{(0,2)}(x,z)\big)^2\,w_\mu\,d(x,z),\ \text{and}\\
I_{\mu,XZ}&=\int_{S_{XZ}}\mu^{(2,0)}(x,z)\,\mu^{(0,2)}(x,z)\,w_\mu\,d(x,z).
\end{align*}
This is a known expression for the AMISE of a two-dimensional local linear estimator with a diagonal bandwidth matrix (see, e.g., Herrmann et al. (1995) [1]) and can be derived using the formulas in Wand and Jones (1994) [2].

[1] Herrmann, E., J. Engel, M. Wand, and T. Gasser (1995). A bandwidth selector for bivariate kernel regression. Journal of the Royal Statistical Society, Series B (Methodological) 57(1), 171-180.
[2] Wand, M. and M. Jones (1994). Multivariate plug-in bandwidth selection. Computational Statistics 9(2), 97-116.
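Once the plug-in quantities are available, Eq. (32) is straightforward to evaluate numerically. The following hypothetical Python helper (names are ours, not the paper's) does exactly that; a direct grid search over bandwidth pairs can then replace the closed-form solutions of Eqs. (34)-(35) below:

```python
import numpy as np

def amise_I_mu(h_x, h_z, T, n, R_K, nu2_K, Q1, I_xx, I_xz, I_zz):
    """Evaluate the AMISE.I criterion of Eq. (32) for the mean estimator.
    R_K = R(K_mu), nu2_K = nu_2(K_mu); Q1 and the I-terms are the plug-in
    integrals defined below Eq. (32), e.g. approximated via the global
    polynomial pilot fits of Section 2.3."""
    variance = R_K * Q1 / (T * n * h_x * h_z)
    bias2 = 0.25 * nu2_K**2 * (h_x**4 * I_xx
                               + 2.0 * h_x**2 * h_z**2 * I_xz
                               + h_z**4 * I_zz)
    return variance + bias2

# Example of a brute-force minimization over a bandwidth grid:
# grid = np.logspace(-2, 0, 50)
# best = min((amise_I_mu(hx, hz, T, n, R_K, nu2_K, Q1, Ixx, Ixz, Izz), hx, hz)
#            for hx in grid for hz in grid)
```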
The AMISE.I expression for the local linear estimator $\hat\gamma$ is less common, as it involves the two distinct bandwidths $h_{\gamma,X}$ (used in both $X$-directions) and $h_{\gamma,Z}$, which leads to
\begin{align}
\mathrm{AMISE.I}_{\hat\gamma}(h_{\gamma,X},h_{\gamma,Z})
=(TN)^{-1}h_{\gamma,X}^{-2}h_{\gamma,Z}^{-1}R(K_\gamma)\,Q_{\gamma,1}
+\frac{1}{4}(\nu_2(K_\gamma))^2\big[2h_{\gamma,X}^4\,(I_{\gamma,X_1X_1}+I_{\gamma,X_1X_2})+4h_{\gamma,X}^2h_{\gamma,Z}^2\,I_{\gamma,X_1Z}+h_{\gamma,Z}^4\,I_{\gamma,ZZ}\big],\tag{33}
\end{align}
where
\begin{align*}
Q_{\gamma,1}&=\int_{S_{XXZ}}\big(\tilde\gamma((x_1,x_2),(x_1,x_2),z)+\sigma_\varepsilon^2(x_1,x_2,z)\big)\,d(x_1,x_2,z),\\
I_{\gamma,X_1X_1}&=\int_{S_{XXZ}}\big(\gamma^{(2,0,0)}(x_1,x_2,z)\big)^2\,w_\gamma\,d(x_1,x_2,z),\\
I_{\gamma,X_1X_2}&=\int_{S_{XXZ}}\gamma^{(2,0,0)}(x_1,x_2,z)\,\gamma^{(0,2,0)}(x_1,x_2,z)\,w_\gamma\,d(x_1,x_2,z),\\
I_{\gamma,X_1Z}&=\int_{S_{XXZ}}\gamma^{(2,0,0)}(x_1,x_2,z)\,\gamma^{(0,0,2)}(x_1,x_2,z)\,w_\gamma\,d(x_1,x_2,z),\ \text{and}\\
I_{\gamma,ZZ}&=\int_{S_{XXZ}}\big(\gamma^{(0,0,2)}(x_1,x_2,z)\big)^2\,w_\gamma\,d(x_1,x_2,z).
\end{align*}
Equation (33) can again be derived using the formulas in Wand and Jones (1994), with the further simplifications that $I_{\gamma,X_1X_1}=I_{\gamma,X_2X_2}$, $I_{\gamma,X_1X_2}=I_{\gamma,X_2X_1}$, and $I_{\gamma,X_1Z}=I_{\gamma,X_2Z}$ due to the symmetry of the covariance function, where the expressions $I_{\gamma,X_2X_2}$, $I_{\gamma,X_2X_1}$, and $I_{\gamma,X_2Z}$ are defined in correspondence with those above.

The AMISE.I expressions (32) and (33) allow us to derive the AMISE.I optimal bandwidth pairs analytically. For the mean function we have
\begin{align}
h_{\mu,X,\mathrm{AMISE.I}}&=\left(\frac{R(K_\mu)\,Q_{\mu,1}\,I_{\mu,ZZ}^{3/4}}{Tn\,(\nu_2(K_\mu))^2\,\big[I_{\mu,XX}^{1/2}I_{\mu,ZZ}^{1/2}+I_{\mu,XZ}\big]\,I_{\mu,XX}^{3/4}}\right)^{1/6},\tag{34}\\
h_{\mu,Z,\mathrm{AMISE.I}}&=\left(\frac{I_{\mu,XX}}{I_{\mu,ZZ}}\right)^{1/4}h_{\mu,X,\mathrm{AMISE.I}},\tag{35}
\end{align}
which corresponds to the result in Herrmann et al. (1995). The AMISE.I optimal bandwidths for the covariance function are given by
\begin{align}
h_{\gamma,X,\mathrm{AMISE.I}}&=\left(\frac{4\sqrt{2}\,I_{\gamma,ZZ}^{3/2}\,R(K_\gamma)\,Q_{\gamma,1}}{TN\,(\nu_2(K_\gamma))^2\,(3I_{\gamma,X_1Z}+C_I)\,(C_I-I_{\gamma,X_1Z})^{3/2}}\right)^{1/7},\tag{36}\\
h_{\gamma,Z,\mathrm{AMISE.I}}&=\left(\frac{C_I-I_{\gamma,X_1Z}}{2I_{\gamma,ZZ}}\right)^{1/2}h_{\gamma,X,\mathrm{AMISE.I}},\tag{37}
\end{align}
where $C_I=\big(I_{\gamma,X_1Z}^2+4\,(I_{\gamma,X_1X_1}+I_{\gamma,X_1X_2})\,I_{\gamma,ZZ}\big)^{1/2}$. Obviously, the expressions in (36) and (37) are much less readable than those in (34) and (35). This is the price of the higher dimension when estimating $\gamma$ instead of $\mu$.

2.2 AMISE.II optimal bandwidth selection

The AMISE.II expression for the three-dimensional local linear estimator $\hat\gamma$ is given by
\begin{align}
\mathrm{AMISE.II}_{\hat\gamma}(h_{\gamma,X},h_{\gamma,Z})
=\underbrace{(TN)^{-1}h_{\gamma,X}^{-2}h_{\gamma,Z}^{-1}R(K_\gamma)\,Q_{\gamma,1}}_{\text{2nd order}}
+\underbrace{T^{-1}h_{\gamma,Z}^{-1}R(\kappa)\,Q_{\gamma,2}}_{\text{1st order}}\nonumber\\
+\frac{1}{4}(\nu_2(K_\gamma))^2\Big[\underbrace{2h_{\gamma,X}^{4}\,(I_{\gamma,X_1X_1}+I_{\gamma,X_1X_2})}_{\text{3rd order}}
+\underbrace{4h_{\gamma,X}^{2}h_{\gamma,Z}^{2}\,I_{\gamma,X_1Z}}_{\text{2nd order}}
+\underbrace{h_{\gamma,Z}^{4}\,I_{\gamma,ZZ}}_{\text{1st order}}\Big],\tag{38}
\end{align}
where $Q_{\gamma,2}=\int_{S_{XXZ}}\tilde\gamma((x_1,x_2),(x_1,x_2),z)\,f_{XX}(x_1,x_2)\,d(x_1,x_2,z)$ and all other quantities are defined in the preceding section below Eq. (33).
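To make the five-term structure of Eq. (38) concrete, the following sketch (illustrative names and interface, not the paper's code) returns the terms separately, so that the order relations discussed next can be inspected numerically for given bandwidths and plug-in quantities:

```python
def amise_II_gamma(h_x, h_z, T, N, R_K, R_kappa, nu2_K,
                   Q1, Q2, I_x1x1, I_x1x2, I_x1z, I_zz):
    """Evaluate the five terms of the AMISE.II criterion in Eq. (38).
    The order labels in the comments follow the asymptotic ordering that
    holds under the AMISE.II scenario, i.e. when h_x = o(h_z) and
    (N h_x**2)**(-1) = o(1). The full criterion is the sum of all five."""
    v2 = R_K * Q1 / (T * N * h_x**2 * h_z)             # 2nd-order variance term
    v1 = R_kappa * Q2 / (T * h_z)                      # 1st-order variance term
    b3 = 0.5 * nu2_K**2 * h_x**4 * (I_x1x1 + I_x1x2)   # 3rd-order bias term
    b2 = nu2_K**2 * h_x**2 * h_z**2 * I_x1z            # 2nd-order bias term
    b1 = 0.25 * nu2_K**2 * h_z**4 * I_zz               # 1st-order bias term
    return v2, v1, b3, b2, b1                          # AMISE.II = sum of these
```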
The lowest possible AMISE value under the AMISE.II scenario can be achieved if there exists an $X$-bandwidth that, first, allows us to profit from the (partial) annulment of the classical bias-variance tradeoff and, second, assures that the AMISE.II scenario remains maintained. The first requirement is met if the $X$-bandwidth is of a smaller order of magnitude than the $Z$-bandwidth, i.e., if $h_{\gamma,X}=o(h_{\gamma,Z})$. This restriction makes those bias components that depend on $h_{\gamma,X}$ asymptotically negligible, since it implies that $h_{\gamma,X}^2h_{\gamma,Z}^2=o(h_{\gamma,Z}^4)$ and therefore that $h_{\gamma,X}^4=o(h_{\gamma,X}^2h_{\gamma,Z}^2)$. The latter leads to the order relations between the third, fourth, and fifth AMISE.II terms as indicated in Eq. (38). The second requirement is met if the $X$-bandwidth does not converge to zero too fast, namely if $(Nh_{\gamma,X}^2)^{-1}=o(1)$, which implies the order relation between the first two AMISE.II terms as indicated in Eq. (38).

2.3 Global polynomial fits

We suggest approximating the unknown quantities of the AMISE.I optimal bandwidth expressions in Eqs. (34)-(37) and those of the AMISE.II optimal bandwidth expressions in Eqs. (13)-(16) using five different global polynomial models of order four, referred to as $\mu_{\mathrm{poly}}$, $\gamma_{\mathrm{poly}}$, $\tilde\gamma_{\mathrm{poly}}$, $[\gamma(x,x,z)+\sigma^2]_{\mathrm{poly}}$, and $[\tilde\gamma((x_1,x_2),(x_1,x_2),z)+\sigma_\varepsilon^2(x_1,x_2,z)]_{\mathrm{poly}}$. Given estimates of these polynomial models, we can approximate the unknown quantities $I_{\mu,XX}$, $I_{\mu,XZ}$, $I_{\mu,ZZ}$, $Q_{\mu,1}$, $Q_{\mu,2}$, $I_{\gamma,X_1X_1}$, $I_{\gamma,X_1Z}$, $I_{\gamma,ZZ}$, $Q_{\gamma,1}$, and $Q_{\gamma,2}$ by the empirical versions of
\begin{align*}
I_{\mu_{\mathrm{poly}},XX}&=\int_{S_{XZ}}\big(\mu_{\mathrm{poly}}^{(2,0)}(x,z)\big)^2\,w_\mu\,d(x,z),\\
I_{\mu_{\mathrm{poly}},XZ}&=\int_{S_{XZ}}\mu_{\mathrm{poly}}^{(2,0)}(x,z)\,\mu_{\mathrm{poly}}^{(0,2)}(x,z)\,w_\mu\,d(x,z),\\
I_{\mu_{\mathrm{poly}},ZZ}&=\int_{S_{XZ}}\big(\mu_{\mathrm{poly}}^{(0,2)}(x,z)\big)^2\,w_\mu\,d(x,z),\\
Q_{\mu_{\mathrm{poly}},1}&=\int_{S_{XZ}}[\gamma(x,x,z)+\sigma^2]_{\mathrm{poly}}\,d(x,z),\\
Q_{\mu_{\mathrm{poly}},2}&=\int_{S_{XZ}}\gamma_{\mathrm{poly}}(x,x,z)\,f_X(x)\,d(x,z),\\
I_{\gamma_{\mathrm{poly}},X_1X_1}&=\int_{S_{XXZ}}\big(\gamma_{\mathrm{poly}}^{(2,0,0)}(x_1,x_2,z)\big)^2\,w_\gamma\,d(x_1,x_2,z),\\
I_{\gamma_{\mathrm{poly}},X_1Z}&=\int_{S_{XXZ}}\gamma_{\mathrm{poly}}^{(2,0,0)}(x_1,x_2,z)\,\gamma_{\mathrm{poly}}^{(0,0,2)}(x_1,x_2,z)\,w_\gamma\,d(x_1,x_2,z),\\
I_{\gamma_{\mathrm{poly}},ZZ}&=\int_{S_{XXZ}}\big(\gamma_{\mathrm{poly}}^{(0,0,2)}(x_1,x_2,z)\big)^2\,w_\gamma\,d(x_1,x_2,z),\\
Q_{\gamma_{\mathrm{poly}},1}&=\int_{S_{XXZ}}[\tilde\gamma((x_1,x_2),(x_1,x_2),z)+\sigma_\varepsilon^2]_{\mathrm{poly}}\,d(x_1,x_2,z),\\
Q_{\gamma_{\mathrm{poly}},2}&=\int_{S_{XXZ}}\tilde\gamma_{\mathrm{poly}}((x_1,x_2),(x_1,x_2),z)\,f_{XX}(x_1,x_2)\,d(x_1,x_2,z).
\end{align*}
Remember that the weight functions $w_\mu=w_\mu(x,z)$ and $w_\gamma=w_\gamma(x_1,x_2,z)$ are defined as $w_\mu(x,z)=f_{XZ}(x,z)$ and $w_\gamma(x_1,x_2,z)=f_{XXZ}(x_1,x_2,z)$. Estimates of these densities are constructed from kernel density estimates of the single pdfs $f_X$ and $f_Z$, where we use the Epanechnikov kernel and the bandwidth selection procedure of Sheather and Jones (1991) [3]. Note that it is necessary to estimate the models $\mu_{\mathrm{poly}}$ and $\gamma_{\mathrm{poly}}$ with interaction terms, since otherwise their mixed partial derivatives would degenerate. All necessary further details are given in the following list; a code sketch of the first of these fits follows the list.

- The model $\mu_{\mathrm{poly}}$ is fitted by regressing $Y_{it}$ on powers (each up to the fourth power) of $X_{it}$, $Z_t$, and $X_{it}\cdot Z_t$, for all $i$ and $t$.
- The model $\gamma_{\mathrm{poly}}$ is fitted by regressing $C^{\mathrm{poly}}_{ijt}=(Y_{it}-\mu_{\mathrm{poly}}(X_{it},Z_t))(Y_{jt}-\mu_{\mathrm{poly}}(X_{jt},Z_t))$ on powers (each up to the fourth power) of $X_{it}$, $X_{jt}$, $Z_t$, $X_{it}\cdot Z_t$, and $X_{jt}\cdot Z_t$, for all $t$ and all $i,j$ with $i\neq j$.
- The model $\tilde\gamma_{\mathrm{poly}}$ is fitted by regressing $C^{\mathrm{poly}}_{(ij),(kl),t}=(C^{\mathrm{poly}}_{ijt}-\gamma_{\mathrm{poly}}(X_{it},X_{jt},Z_t))(C^{\mathrm{poly}}_{klt}-\gamma_{\mathrm{poly}}(X_{kt},X_{lt},Z_t))$ on powers (each up to the fourth power) of $X_{it}$, $X_{jt}$, $X_{kt}$, $X_{lt}$, and $Z_t$, for all $t$ and all $i,j,k,l$ with $(i,j)\neq(k,l)$.
- The model $[\gamma(x,x,z)+\sigma^2]_{\mathrm{poly}}$ is fitted by regressing the noise-contaminated diagonal values $C^{\mathrm{poly}}_{iit}$ on powers (each up to the fourth power) of $X_{it}$ and $Z_t$, for all $i$ and $t$.
- The model $[\tilde\gamma((x_1,x_2),(x_1,x_2),z)+\sigma_\varepsilon^2(x_1,x_2,z)]_{\mathrm{poly}}$ is fitted by regressing the noise-contaminated diagonal values $C^{\mathrm{poly}}_{(ij),(ij),t}$ on powers (each up to the fourth power) of $X_{it}$, $X_{jt}$, and $Z_t$, for all $i$, $j$, and $t$.

[3] Sheather, S. J. and M. C. Jones (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B (Methodological) 53(3), 683-690.
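The following minimal Python sketch illustrates the first pilot fit, $\mu_{\mathrm{poly}}$ (it assumes flattened arrays Y, X, Z of equal length $Tn$; the design and all names are our illustration, and the paper's exact implementation may differ):

```python
import numpy as np

def fit_mu_poly(Y, X, Z, order=4):
    """Rule-of-thumb pilot fit of mu_poly: regress Y_it on powers (up to
    `order`) of X_it, Z_t, and the interaction X_it * Z_t, as in the first
    bullet above. Returns the OLS coefficients and the design builder, so
    that the fit can be evaluated (and differentiated) on any grid."""
    def design(x, z):
        cols = [np.ones_like(x)]
        cols += [x**p for p in range(1, order + 1)]        # X, X^2, X^3, X^4
        cols += [z**p for p in range(1, order + 1)]        # Z, ..., Z^4
        cols += [(x * z)**p for p in range(1, order + 1)]  # interaction powers
        return np.column_stack(cols)
    D = design(X, Z)
    beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return beta, design

# Because w_mu = f_XZ, the weighted integrals are expectations, so their
# empirical versions are sample means, e.g. (with mu_xx the second partial
# derivative of the fitted polynomial in x, obtained analytically or via
# np.gradient on a grid):  I_mu_XX_hat = np.mean(mu_xx(X, Z)**2)
```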
For the computation of the Bonferroni-type confidence intervals we also need to approximate the error variance $\sigma^2$. This is done via the empirical version of
\begin{align}
\sigma^2_{\mathrm{poly}}=\frac{1}{\int_{S_{XZ}}1\,d(x,z)}\int_{S_{XZ}}\Big([\gamma(x,x,z)+\sigma^2]_{\mathrm{poly}}-\gamma_{\mathrm{poly}}(x,x,z)\Big)\,d(x,z).\tag{39}
\end{align}
Of course, it is always possible (and often necessary) to use more complex polynomial models that include more interaction terms and higher-order polynomials. Given our relatively simply structured data, however, this rule-of-thumb method works very well. The global polynomial fits of the mean functions are shown in Figure 6. Their general shapes are plausible, and we can expect them to be very useful in approximating the above unknown quantities, although these pilot estimates are not perfect substitutes for the final local linear estimates. In particular, the temperature effects in the second time period show some implausible wave shapes, which do not show up in the nonparametric fit shown in Figure 5.

3 Data Sources

The data for our analysis come from four different sources. Hourly spot prices of the German electricity market are provided by the European Energy Power Exchange (EPEX) (www.epexspot.com); hourly values of Germany's gross electricity demand and electricity exchanges with other countries are provided by the European Network of Transmission System Operators for Electricity (www.entsoe.eu); German wind and solar power infeed data are provided by the transparency platform of the European energy exchange (www.eex-transparency.com); and German air temperature data are available from the German Weather Service (www.dwd.de).

The data dimensions are given by $n=24$, $N=552$, $T=261$, and $T=262$, where the latter two numbers are the numbers of working days one year before and one year after Germany's nuclear phaseout.

[Figure 6: Approximated mean functions using global polynomial regression models. Left panel: one year before Germany's nuclear phaseout; right panel: first year after Germany's nuclear phaseout. Both panels plot electricity demand (MW) against temperature (°C), with a gray scale encoding price levels from 29 to 88 EUR/MWh.]

Very few (0.2%) of the data tuples $(Y_{it},X_{it},Z_t)$ with prices $Y_{it}>200$ EUR/MWh are considered outliers and removed. Such extreme prices cannot be explained by the merit-order model, since the marginal costs of electricity production usually do not exceed a value of about 200 EUR/MWh. Prices above this threshold are often referred to as "price spikes" and have to be modeled using different approaches (Ch. 4 in Burger et al. (2008) [4]).

[4] Burger, M., B. Graeber, and G. Schindlmayr (2008). Managing Energy Risk: An Integrated View on Power and Other Energy Markets (1st ed.). Wiley.
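For completeness, the outlier rule just described amounts to a simple threshold filter; the sketch below assumes flattened arrays and uses illustrative names (t_index links each observation to its day $t$, so the matching $Z_t$ values can be dropped alongside):

```python
import numpy as np

def drop_price_spikes(Y, X, t_index, threshold=200.0):
    """Remove data tuples (Y_it, X_it, Z_t) with prices above 200 EUR/MWh,
    the merit-order plausibility bound discussed above."""
    keep = Y <= threshold
    print(f"removed {np.mean(~keep):.2%} of observations as price spikes")
    return Y[keep], X[keep], t_index[keep]
```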