Supplemental Paper for:
Anti-Proportional Bandwidth Selection for Smoothing
(Non-)Sparse Functional Data with Covariate Adjustments
Dominik Liebl
Institute for Financial Economics and Statistics
University of Bonn
Adenauerallee 24-42
53113 Bonn, Germany
Outline:
Section 1 contains the proofs of Theorems 3.1 and 3.2. Further discussions and supplementary results on the bandwidth selection and the rule-of-thumb procedure used in the application are found in Section 2. The data sources are described in detail in Section 3.
1 Proofs

1.1 Proof of Theorem 3.1
Proof of Theorem 3.1, part (i): Let x and z be interior points of supp(f_XZ) and define H_µ = diag(h²_{µ,X}, h²_{µ,Z}), X = (X_11, ..., X_nT)^⊤, and Z = (Z_1, ..., Z_T)^⊤. By a Taylor expansion of µ around (x, z), the conditional bias of the estimator µ̂(x, z; H_µ) can be written as

$$
\mathrm{E}(\hat{\mu}(x,z;H_{\mu})-\mu(x,z)\,|\,\mathbf{X},\mathbf{Z})
=\tfrac{1}{2}\,u_{1}^{\top}\big((Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]\big)^{-1}
\times(Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}\big(Q_{\mu}(x,z)+R_{\mu}(x,z)\big),
\quad(24)
$$

where Q_µ(x, z) is a Tn × 1 vector with typical elements

$$
(X_{it}-x,\,Z_{t}-z)\,H_{\mu}(x,z)\,(X_{it}-x,\,Z_{t}-z)^{\top}\in\mathbb{R},
$$

with H_µ(x, z) being the Hessian matrix of the regression function µ(x, z). The Tn × 1 vector R_µ(x, z) holds the remainder terms

$$
o\big((X_{it}-x,\,Z_{t}-z)(X_{it}-x,\,Z_{t}-z)^{\top}\big)\in\mathbb{R}.
$$
Next we derive asymptotic approximations for the 3 × 3 matrix ((Tn)^{-1}[1, X_x, Z_z]^⊤ W_{µ,xz} [1, X_x, Z_z])^{-1} and the 3 × 1 matrix (Tn)^{-1}[1, X_x, Z_z]^⊤ W_{µ,xz} Q_µ(x, z) on the right-hand side of Eq. (24). Using standard procedures from kernel density estimation it is easy to derive that

$$
(Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]
=\begin{pmatrix}
f_{XZ}(x,z)+o_{p}(1) & \nu_{2}(K_{\mu})\,Df_{XZ}(x,z)^{\top}H_{\mu}+o_{p}(\mathbf{1}^{\top}H_{\mu})\\[2pt]
\nu_{2}(K_{\mu})\,H_{\mu}Df_{XZ}(x,z)+o_{p}(H_{\mu}\mathbf{1}) & \nu_{2}(K_{\mu})\,H_{\mu}f_{XZ}(x,z)+o_{p}(H_{\mu})
\end{pmatrix},
$$

where 𝟏 = (1, 1)^⊤ and Df_XZ(x, z) is the vector of first-order partial derivatives (i.e., the gradient) of the pdf f_XZ at (x, z). Inversion of the above block matrix yields

$$
\big((Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]\big)^{-1}
=\begin{pmatrix}
(f_{XZ}(x,z))^{-1}+o_{p}(1) & -Df_{XZ}(x,z)^{\top}(f_{XZ}(x,z))^{-2}+o_{p}(\mathbf{1}^{\top})\\[2pt]
-Df_{XZ}(x,z)\,(f_{XZ}(x,z))^{-2}+o_{p}(\mathbf{1}) & (\nu_{2}(K_{\mu})H_{\mu}f_{XZ}(x,z))^{-1}+o_{p}(H_{\mu}^{-1})
\end{pmatrix}.
\quad(25)
$$
The 3 × 1 matrix (Tn)^{-1}[1, X_x, Z_z]^⊤ W_{µ,xz} Q_µ(x, z) can be partitioned as follows:

$$
(Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}\,Q_{\mu}(x,z)
=\begin{pmatrix}\text{upper element}\\ \text{lower block}\end{pmatrix},
$$

where the 1 × 1 dimensional upper element can be approximated by a scalar variable equal to

$$
(Tn)^{-1}\sum_{it}K_{\mu,h}(X_{it}-x,Z_{t}-z)\,(X_{it}-x,Z_{t}-z)\,H_{\mu}(x,z)\,(X_{it}-x,Z_{t}-z)^{\top}
=(\nu_{2}(\kappa))^{2}\,\mathrm{tr}\{H_{\mu}H_{\mu}(x,z)\}\,f_{XZ}(x,z)+o_{p}(\mathrm{tr}(H_{\mu}))
\quad(26)
$$

and the 2 × 1 dimensional lower block is equal to

$$
(Tn)^{-1}\sum_{it}K_{\mu,h}(X_{it}-x,Z_{t}-z)\,(X_{it}-x,Z_{t}-z)\,H_{\mu}(x,z)\,(X_{it}-x,Z_{t}-z)^{\top}\times(X_{it}-x,Z_{t}-z)^{\top}
=O_{p}\big(H_{\mu}^{3/2}\mathbf{1}\big).
\quad(27)
$$
Plugging the approximations of Eqs. (25)-(27) into the first summand of the conditional bias expression in Eq. (24) leads to

$$
\tfrac{1}{2}\,u_{1}^{\top}\big((Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]\big)^{-1}(Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}\,Q_{\mu}(x,z)
=\tfrac{1}{2}(\nu_{2}(\kappa))^{2}\,\mathrm{tr}\{H_{\mu}H_{\mu}(x,z)\}+o_{p}(\mathrm{tr}(H_{\mu})).
$$

Furthermore, it is easily seen that the second summand of the conditional bias expression in Eq. (24), which holds the remainder term, satisfies

$$
\tfrac{1}{2}\,u_{1}^{\top}\big((Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]\big)^{-1}(Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}\,R_{\mu}(x,z)
=o_{p}(\mathrm{tr}(H_{\mu})).
$$

Summation of the two latter expressions yields the asymptotic approximation of the conditional bias

$$
\mathrm{E}(\hat{\mu}(x,z;H_{\mu})-\mu(x,z)\,|\,\mathbf{X},\mathbf{Z})
=\tfrac{1}{2}(\nu_{2}(\kappa))^{2}\,\mathrm{tr}\{H_{\mu}H_{\mu}(x,z)\}+o_{p}(\mathrm{tr}(H_{\mu})).
$$

This is the bias statement of Theorem 3.1, part (i).
Proof of Theorem 3.1, part (ii): In the following we derive the conditional variance of the local linear estimator:

$$
\mathrm{V}(\hat{\mu}(x,z;H_{\mu})\,|\,\mathbf{X},\mathbf{Z})
=u_{1}^{\top}\big([1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]\big)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}\,\mathrm{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]\big([1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]\big)^{-1}u_{1}
$$
$$
=u_{1}^{\top}\big((Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]\big)^{-1}\big((Tn)^{-2}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}\,\mathrm{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]\big)\big((Tn)^{-1}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]^{\top}W_{\mu,xz}[1,\mathbf{X}_{x},\mathbf{Z}_{z}]\big)^{-1}u_{1},
\quad(28)
$$

where Cov(Y|X, Z) is the Tn × Tn matrix with typical elements

$$
\mathrm{Cov}(Y_{it},Y_{js}\,|\,X_{it},X_{js},Z_{t},Z_{s})
=\gamma_{|t-s|}\big((X_{it},Z_{t}),(X_{js},Z_{s})\big)+\sigma^{2}\,\mathbf{1}(i=j\text{ and }t=s),
$$

with 𝟏(·) being the indicator function.

We begin with analyzing the 3 × 3 matrix (Tn)^{-2}[1, X_x, Z_z]^⊤ W_{µ,xz} Cov(Y|X, Z) W_{µ,xz} [1, X_x, Z_z] using the following three Lemmas 1.1-1.3.
Lemma 1.1 The upper-left scalar (block) of the matrix (Tn)^{-2}[1, X_x, Z_z]^⊤ W_{µ,xz} Cov(Y|X, Z) W_{µ,xz} [1, X_x, Z_z] is given by

$$
(Tn)^{-2}\mathbf{1}^{\top}W_{\mu,xz}\,\mathrm{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}\mathbf{1}
=(Tn)^{-1}f_{XZ}(x,z)\,|H_{\mu}|^{-1/2}R(K_{\mu})\big(\gamma(x,x,z)+\sigma^{2}\big)\big(1+O_{p}(\mathrm{tr}(H_{\mu}^{1/2}))\big)
$$
$$
\;+\;T^{-1}(f_{XZ}(x,z))^{2}\left(\frac{n-1}{n}\,h_{\mu,Z}^{-1}R(\kappa)\,\frac{\gamma(x,x,z)}{f_{Z}(z)}+c_{r}\right)\big(1+O_{p}(\mathrm{tr}(H_{\mu}^{1/2}))\big)
=O_{p}\big((Tn)^{-1}|H_{\mu}|^{-1/2}\big)+O_{p}\big(T^{-1}h_{\mu,Z}^{-1}\big).
$$
Lemma 1.2 The 1 × 2 dimensional upper-right block of the matrix (Tn)^{-2}[1, X_x, Z_z]^⊤ W_{µ,xz} Cov(Y|X, Z) W_{µ,xz} [1, X_x, Z_z] is given by

$$
(Tn)^{-2}\mathbf{1}^{\top}W_{\mu,xz}\,\mathrm{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}
\begin{pmatrix}(X_{11}-x,\,Z_{1}-z)\\ \vdots\\ (X_{nT}-x,\,Z_{T}-z)\end{pmatrix}
$$
$$
=(Tn)^{-1}f_{XZ}(x,z)\,|H_{\mu}|^{-1/2}(\mathbf{1}^{\top}H_{\mu}^{1/2})R(K_{\mu})\big(\gamma(x,x,z)+\sigma^{2}\big)\big(1+O_{p}(\mathrm{tr}(H_{\mu}^{1/2}))\big)
\;+\;T^{-1}(f_{XZ}(x,z))^{2}(\mathbf{1}^{\top}H_{\mu}^{1/2})\left(\frac{n-1}{n}\,h_{\mu,Z}^{-1}R(\kappa)\,\frac{\gamma(x,x,z)}{f_{Z}(z)}+c_{r}\right)\big(1+O_{p}(\mathrm{tr}(H_{\mu}^{1/2}))\big)
$$
$$
=O_{p}\big((Tn)^{-1}|H_{\mu}|^{-1/2}(\mathbf{1}^{\top}H_{\mu}^{1/2})\big)+O_{p}\big(T^{-1}(\mathbf{1}^{\top}H_{\mu}^{1/2})h_{\mu,Z}^{-1}\big).
$$

The 2 × 1 dimensional lower-left block of the matrix (Tn)^{-2}[1, X_x, Z_z]^⊤ W_{µ,xz} Cov(Y|X, Z) W_{µ,xz} [1, X_x, Z_z] is simply the transposed version of this result.
Lemma 1.3 The 2 × 2 lower-right block of the matrix (Tn)^{-2}[1, X_x, Z_z]^⊤ W_{µ,xz} Cov(Y|X, Z) W_{µ,xz} [1, X_x, Z_z] is given by

$$
(Tn)^{-2}\big((X_{11}-x,\,Z_{1}-z)^{\top},\dots,(X_{nT}-x,\,Z_{T}-z)^{\top}\big)\,W_{\mu,xz}\,\mathrm{Cov}(\mathbf{Y}|\mathbf{X},\mathbf{Z})\,W_{\mu,xz}
\begin{pmatrix}(X_{11}-x,\,Z_{1}-z)\\ \vdots\\ (X_{nT}-x,\,Z_{T}-z)\end{pmatrix}
$$
$$
=(Tn)^{-1}f_{XZ}(x,z)\,|H_{\mu}|^{-1/2}H_{\mu}R(K_{\mu})\big(\gamma(x,x,z)+\sigma^{2}\big)\big(1+O_{p}(\mathrm{tr}(H_{\mu}^{1/2}))\big)
\;+\;T^{-1}(f_{XZ}(x,z))^{2}H_{\mu}\left(\frac{n-1}{n}\,h_{\mu,Z}^{-1}R(\kappa)\,\frac{\gamma(x,x,z)}{f_{Z}(z)}+c_{r}\right)\big(1+O_{p}(\mathrm{tr}(H_{\mu}^{1/2}))\big)
$$
$$
=O_{p}\big((Tn)^{-1}|H_{\mu}|^{-1/2}H_{\mu}\big)+O_{p}\big(T^{-1}H_{\mu}h_{\mu,Z}^{-1}\big).
$$
Using the approximations for the block elements of the matrix (Tn)^{-2}[1, X_x, Z_z]^⊤ W_{µ,xz} Cov(Y|X, Z) W_{µ,xz} [1, X_x, Z_z], given by Lemmas 1.1-1.3, and the approximation for the matrix ((Tn)^{-1}[1, X_x, Z_z]^⊤ W_{µ,xz} [1, X_x, Z_z])^{-1}, given in (25), we can approximate the conditional variance of the bivariate local linear estimator, given in (28). Some tedious yet straightforward matrix algebra leads to

$$
\mathrm{V}(\hat{\mu}(x,z;H_{\mu})\,|\,\mathbf{X},\mathbf{Z})
=(Tn)^{-1}|H_{\mu}|^{-1/2}\,\frac{R(K_{\mu})\big(\gamma(x,x,z)+\sigma^{2}\big)}{f_{XZ}(x,z)}\,(1+o_{p}(1))
\;+\;T^{-1}\left(\frac{n-1}{n}\,h_{\mu,Z}^{-1}R(\kappa)\,\frac{\gamma(x,x,z)}{f_{Z}(z)}+c_{r}\right)(1+o_{p}(1)),
$$

which is asymptotically equivalent to the variance statement of Theorem 3.1, part (ii).
Next we prove Lemma 1.1; the proofs of Lemmas 1.2 and 1.3 can be done correspondingly. To show Lemma 1.1 it is convenient to split the sum such that (Tn)^{-2} 𝟏^⊤ W_{µ,xz} Cov(Y|X, Z) W_{µ,xz} 𝟏 = s₁ + s₂ + s₃. Using standard procedures from kernel density estimation leads to

$$
s_{1}=(Tn)^{-2}\sum_{it}\big(K_{\mu,h}(X_{it}-x,Z_{t}-z)\big)^{2}\,\mathrm{V}(Y_{it}|\mathbf{X},\mathbf{Z})
=(Tn)^{-1}|H_{\mu}|^{-1/2}f_{XZ}(x,z)R(K_{\mu})\big(\gamma(x,x,z)+\sigma^{2}\big)+O\big((Tn)^{-1}|H_{\mu}|^{-1/2}\mathrm{tr}(H_{\mu}^{1/2})\big),
\quad(29)
$$

$$
s_{2}=(Tn)^{-2}\sum_{ij}\sum_{\substack{ts\\ t\neq s}}K_{\mu,h}(X_{it}-x,Z_{t}-z)\,\mathrm{Cov}(Y_{it},Y_{js}|\mathbf{X},\mathbf{Z})\,K_{\mu,h}(X_{js}-x,Z_{s}-z)
=T^{-1}(f_{XZ}(x,z))^{2}c_{r}+O_{p}\big(T^{-1}\mathrm{tr}(H_{\mu}^{1/2})\big),
\quad(30)
$$

$$
s_{3}=(Tn)^{-2}\sum_{\substack{ij\\ i\neq j}}\sum_{t}h_{\mu,X}^{-1}\kappa\big(h_{\mu,X}^{-1}(X_{it}-x)\big)\big(h_{\mu,Z}^{-1}\kappa(h_{\mu,Z}^{-1}(Z_{t}-z))\big)^{2}\,\mathrm{Cov}(Y_{it},Y_{jt}|\mathbf{X},\mathbf{Z})\times h_{\mu,X}^{-1}\kappa\big(h_{\mu,X}^{-1}(X_{jt}-x)\big)
$$
$$
=T^{-1}(f_{XZ}(x,z))^{2}\,\frac{n-1}{n}\,h_{\mu,Z}^{-1}R(\kappa)\,\frac{\gamma(x,x,z)}{f_{Z}(z)}+O_{p}\big(T^{-1}\mathrm{tr}(H_{\mu}^{1/2})\big),
\quad(31)
$$

where |c_r| ≤ 2c_γ r/(1 - r) < ∞. Summing up (29)-(31) leads to the result in Lemma 1.1. Lemmas 1.2 and 1.3 differ from Lemma 1.1 only with respect to the additional factors 𝟏^⊤H_µ^{1/2} and H_µ. These come in due to the usual substitution step for the additional data parts (X_it − x, Z_t − z).
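The bound on c_r follows from a geometric-sum argument. As a sketch, assuming (as the form of the bound suggests; the precise assumption is stated in the main paper, not here) that the autocovariances decay geometrically, |γ_τ((x,z),(x,z))| ≤ c_γ r^τ for constants c_γ < ∞ and 0 < r < 1:

$$
|c_{r}|\;\le\;2\sum_{\tau=1}^{\infty}\sup_{(x,z)}\big|\gamma_{\tau}((x,z),(x,z))\big|
\;\le\;2c_{\gamma}\sum_{\tau=1}^{\infty}r^{\tau}
\;=\;\frac{2c_{\gamma}r}{1-r}\;<\;\infty,
$$

where the factor 2 accounts for the two orderings t < s and t > s of each pair of time points.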
1.2 Proof of Theorem 3.2
The proof of Theorem 3.2 follows the same arguments as the proof of Theorem 3.1. The only additional issue we need to consider is that we do not directly observe the dependent variables, namely, the true theoretical raw covariances C_ijt as defined in Eq. (4), but only their empirical versions Ĉ_ijt as defined in Eq. (9). In the following we show that this additional estimation error is asymptotically negligible in comparison to the approximation error of γ̂ derived under the usage of the theoretical raw covariances C_ijt as given in Eq. (20) of Theorem 3.3.

Observe that we can expand Ĉ_ijt as follows:

$$
\hat{C}_{ijt}=C_{ijt}+(Y_{it}-\mu(X_{it},Z_{t}))(\mu(X_{jt},Z_{t})-\hat{\mu}(X_{jt},Z_{t}))
+(Y_{jt}-\mu(X_{jt},Z_{t}))(\mu(X_{it},Z_{t})-\hat{\mu}(X_{it},Z_{t}))
+(\mu(X_{it},Z_{t})-\hat{\mu}(X_{it},Z_{t}))(\mu(X_{jt},Z_{t})-\hat{\mu}(X_{jt},Z_{t})).
$$
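This expansion is a simple add-and-subtract identity. As a sketch, assuming (in line with Eqs. (4) and (9) of the main paper) that C_ijt = (Y_it − µ(X_it, Z_t))(Y_jt − µ(X_jt, Z_t)) and Ĉ_ijt = (Y_it − µ̂(X_it, Z_t))(Y_jt − µ̂(X_jt, Z_t)):

$$
\hat{C}_{ijt}=\big[(Y_{it}-\mu(X_{it},Z_{t}))+(\mu(X_{it},Z_{t})-\hat{\mu}(X_{it},Z_{t}))\big]
\big[(Y_{jt}-\mu(X_{jt},Z_{t}))+(\mu(X_{jt},Z_{t})-\hat{\mu}(X_{jt},Z_{t}))\big],
$$

and multiplying out the two brackets yields exactly the four summands displayed above.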
Further, it follows from Eq. (19) in Theorem 3.3 that

$$
\hat{\mu}(X_{it},Z_{t};h_{\mu,X,\mathrm{opt}},h_{\mu,Z,\mathrm{opt}})-\mu(X_{it},Z_{t})\,\big|\,\mathbf{X},\mathbf{Z}
=\begin{cases}
O_{p}\big((Tn)^{-2/3}\big) & \text{if }0\le\theta\le 1/5\\[2pt]
O_{p}\big(T^{-4/5}\big) & \text{if }\theta>1/5,
\end{cases}
$$

for all i and t, where h_{µ,X,opt} and h_{µ,Z,opt} denote the θ-specific AMISE optimal bandwidth choices as defined in Eqs. (13), (15), (34), and (35). Therefore, under our setup we have that

$$
\hat{C}_{ijt}-C_{ijt}\,\big|\,\mathbf{X},\mathbf{Z}
=\begin{cases}
O_{p}\big((Tn)^{-2/3}\big) & \text{if }0\le\theta\le 1/5\\[2pt]
O_{p}\big(T^{-4/5}\big) & \text{if }\theta>1/5,
\end{cases}
$$

which is of an order of magnitude smaller than the approximation error of γ̂ derived under the usage of the theoretical raw covariances C_ijt as given in Eq. (20) of Theorem 3.3.
2 Further discussions

2.1 AMISE.I optimal bandwidth selection
The following expressions are derived under the hypothesis that the first variance terms in parts (ii) of Theorems 3.1 and 3.2 are dominating. The AMISE.I expression for the local linear estimator µ̂ is given by

$$
\mathrm{AMISE.I}_{\hat{\mu}}(h_{\mu,X},h_{\mu,Z})
=(Tn)^{-1}h_{\mu,X}^{-1}h_{\mu,Z}^{-1}R(K_{\mu})\,Q_{\mu,1}
+\tfrac{1}{4}(\nu_{2}(K_{\mu}))^{2}\big(h_{\mu,X}^{4}I_{\mu,XX}+2h_{\mu,X}^{2}h_{\mu,Z}^{2}I_{\mu,XZ}+h_{\mu,Z}^{4}I_{\mu,ZZ}\big),
\quad(32)
$$

where

$$
\begin{aligned}
Q_{\mu,1}&=\int_{S_{XZ}}\big(\gamma(x,x,z)+\sigma^{2}\big)\,d(x,z),\\
I_{\mu,XX}&=\int_{S_{XZ}}\big(\mu^{(2,0)}(x,z)\big)^{2}\,w_{\mu}\,d(x,z),\\
I_{\mu,ZZ}&=\int_{S_{XZ}}\big(\mu^{(0,2)}(x,z)\big)^{2}\,w_{\mu}\,d(x,z),\quad\text{and}\\
I_{\mu,XZ}&=\int_{S_{XZ}}\mu^{(2,0)}(x,z)\,\mu^{(0,2)}(x,z)\,w_{\mu}\,d(x,z).
\end{aligned}
$$

This is a known expression for the AMISE of a two-dimensional local linear estimator with a diagonal bandwidth matrix (see, e.g., Herrmann et al. (1995)¹) and can be derived using the formulas in Wand and Jones (1994)².
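To make the structure of Eq. (32) concrete, here is a minimal Python sketch that evaluates AMISE.I_µ̂ on a bandwidth grid. The kernel constants R(K_µ) and ν₂(K_µ) and the plug-in quantities Q_{µ,1}, I_{µ,XX}, I_{µ,XZ}, I_{µ,ZZ} are assumed to be supplied by the user (e.g., from the rule-of-thumb fits of Section 2.3); the numerical values below are placeholders, not quantities from the paper.

```python
import numpy as np

def amise1_mu(h_x, h_z, T, n, R_K, nu2_K, Q1, I_xx, I_xz, I_zz):
    """Evaluate the AMISE.I expression (32) for the local linear
    estimator of mu at bandwidths (h_x, h_z)."""
    variance = R_K * Q1 / (T * n * h_x * h_z)          # first-order variance term
    bias2 = 0.25 * nu2_K**2 * (h_x**4 * I_xx           # squared-bias terms
                               + 2 * h_x**2 * h_z**2 * I_xz
                               + h_z**4 * I_zz)
    return variance + bias2

# Placeholder plug-in values (illustration only):
T, n = 261, 24
consts = dict(R_K=0.36, nu2_K=0.04, Q1=1.0, I_xx=2.0, I_xz=0.5, I_zz=1.5)

# Grid search as a numerical cross-check of the closed-form minimizer:
grid = np.linspace(0.01, 1.0, 200)
H_x, H_z = np.meshgrid(grid, grid, indexing="ij")
values = amise1_mu(H_x, H_z, T, n, **consts)
i, j = np.unravel_index(values.argmin(), values.shape)
print(f"grid minimum at h_x = {grid[i]:.3f}, h_z = {grid[j]:.3f}")
```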
The AMISE.I expression for the local linear estimator γ̂ is less common as it is based on two bandwidths, which leads to

$$
\mathrm{AMISE.I}_{\hat{\gamma}}(h_{\gamma,X},h_{\gamma,Z})
=(TN)^{-1}h_{\gamma,X}^{-2}h_{\gamma,Z}^{-1}R(K_{\gamma})\,Q_{\gamma,1}
+\tfrac{1}{4}(\nu_{2}(K_{\gamma}))^{2}\big(2h_{\gamma,X}^{4}(I_{\gamma,X_{1}X_{1}}+I_{\gamma,X_{1}X_{2}})+4h_{\gamma,X}^{2}h_{\gamma,Z}^{2}I_{\gamma,X_{1}Z}+h_{\gamma,Z}^{4}I_{\gamma,ZZ}\big),
\quad(33)
$$
where

$$
\begin{aligned}
Q_{\gamma,1}&=\int_{S_{XXZ}}\big(\tilde{\gamma}((x_{1},x_{2}),(x_{1},x_{2}),z)+\sigma_{\varepsilon}^{2}(x_{1},x_{2},z)\big)\,d(x_{1},x_{2},z),\\
I_{\gamma,X_{1}X_{1}}&=\int_{S_{XXZ}}\big(\gamma^{(2,0,0)}(x_{1},x_{2},z)\big)^{2}\,w_{\gamma}\,d(x_{1},x_{2},z),\\
I_{\gamma,X_{1}X_{2}}&=\int_{S_{XXZ}}\gamma^{(2,0,0)}(x_{1},x_{2},z)\,\gamma^{(0,2,0)}(x_{1},x_{2},z)\,w_{\gamma}\,d(x_{1},x_{2},z),\\
I_{\gamma,X_{1}Z}&=\int_{S_{XXZ}}\gamma^{(2,0,0)}(x_{1},x_{2},z)\,\gamma^{(0,0,2)}(x_{1},x_{2},z)\,w_{\gamma}\,d(x_{1},x_{2},z),\quad\text{and}\\
I_{\gamma,ZZ}&=\int_{S_{XXZ}}\big(\gamma^{(0,0,2)}(x_{1},x_{2},z)\big)^{2}\,w_{\gamma}\,d(x_{1},x_{2},z).
\end{aligned}
$$
Equation (33) can again be derived using the formulas in Wand and Jones (1994)², with the further simplifications that I_{γ,X₁X₁} = I_{γ,X₂X₂}, I_{γ,X₁X₂} = I_{γ,X₂X₁}, and I_{γ,X₁Z} = I_{γ,X₂Z} due to the symmetry of the covariance function, where the expressions I_{γ,X₂X₂}, I_{γ,X₂X₁}, and I_{γ,X₂Z} are defined in correspondence with those above.

¹ Herrmann, E., J. Engel, M. Wand, and T. Gasser (1995). A bandwidth selector for bivariate kernel regression. Journal of the Royal Statistical Society, Series B (Methodological) 57(1), 171-180.
² Wand, M. and M. Jones (1994). Multivariate plug-in bandwidth selection. Computational Statistics 9(2), 97-116.
The AMISE.I expressions (32) and (33) allow us to derive the AMISE.I optimal bandwidth pairs analytically. For the mean function we have

$$
h_{\mu,X,\mathrm{AMISE.I}}
=\left(\frac{R(K_{\mu})\,Q_{\mu,1}\,I_{\mu,ZZ}^{3/4}}
{Tn\,(\nu_{2}(K_{\mu}))^{2}\,I_{\mu,XX}^{3/4}\big(I_{\mu,XX}^{1/2}I_{\mu,ZZ}^{1/2}+I_{\mu,XZ}\big)}\right)^{1/6}
\quad(34)
$$
$$
h_{\mu,Z,\mathrm{AMISE.I}}
=\left(\frac{I_{\mu,XX}}{I_{\mu,ZZ}}\right)^{1/4}h_{\mu,X,\mathrm{AMISE.I}},
\quad(35)
$$
which corresponds to the result in Herrmann et al. (1995)¹. The AMISE.I optimal bandwidths for the covariance function are given by
$$
h_{\gamma,X,\mathrm{AMISE.I}}
=\left(\frac{4\sqrt{2}\,R(K_{\gamma})\,Q_{\gamma,1}\,I_{\gamma,ZZ}^{3/2}}
{TN\,(\nu_{2}(K_{\gamma}))^{2}\,\big(3I_{\gamma,X_{1}Z}+C_{I}\big)\big(C_{I}-I_{\gamma,X_{1}Z}\big)^{3/2}}\right)^{1/7}
\quad(36)
$$
$$
h_{\gamma,Z,\mathrm{AMISE.I}}
=\left(\frac{C_{I}-I_{\gamma,X_{1}Z}}{2\,I_{\gamma,ZZ}}\right)^{1/2}h_{\gamma,X,\mathrm{AMISE.I}},
\quad(37)
$$

where $C_{I}=\big(I_{\gamma,X_{1}Z}^{2}+4(I_{\gamma,X_{1}X_{1}}+I_{\gamma,X_{1}X_{2}})\,I_{\gamma,ZZ}\big)^{1/2}$. Obviously, the expressions in (36) and (37) are much less readable than those in (34) and (35). This is the burden of the higher dimension faced when estimating γ instead of µ.
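The closed forms (34)-(37) translate directly into code. The following is a minimal Python sketch, assuming the kernel constants and the integral quantities Q and I have already been obtained (e.g., via the global polynomial fits of Section 2.3); all variable names are ours.

```python
import numpy as np

def bw_mu_amise1(T, n, R_K, nu2_K, Q1, I_xx, I_xz, I_zz):
    """AMISE.I-optimal bandwidth pair (34)-(35) for the mean estimator."""
    h_x = (R_K * Q1 * I_zz**0.75
           / (T * n * nu2_K**2 * I_xx**0.75
              * (np.sqrt(I_xx * I_zz) + I_xz))) ** (1 / 6)
    h_z = (I_xx / I_zz) ** 0.25 * h_x
    return h_x, h_z

def bw_gamma_amise1(T, N, R_K, nu2_K, Q1, I_x1x1, I_x1x2, I_x1z, I_zz):
    """AMISE.I-optimal bandwidth pair (36)-(37) for the covariance estimator."""
    C_I = np.sqrt(I_x1z**2 + 4 * (I_x1x1 + I_x1x2) * I_zz)
    h_x = (4 * np.sqrt(2) * R_K * Q1 * I_zz**1.5
           / (T * N * nu2_K**2 * (3 * I_x1z + C_I)
              * (C_I - I_x1z)**1.5)) ** (1 / 7)
    h_z = np.sqrt((C_I - I_x1z) / (2 * I_zz)) * h_x
    return h_x, h_z
```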
2.2 AMISE.II optimal bandwidth selection
The AMISE.II expression for the three-dimensional local linear estimator γ̂ is given by

$$
\mathrm{AMISE.II}_{\hat{\gamma}}(h_{\gamma,X},h_{\gamma,Z})
=\underbrace{(TN)^{-1}h_{\gamma,X}^{-2}h_{\gamma,Z}^{-1}R(K_{\gamma})\,Q_{\gamma,1}}_{\text{2nd order}}
+\underbrace{T^{-1}h_{\gamma,Z}^{-1}R(\kappa)\,Q_{\gamma,2}}_{\text{1st order}}
+\tfrac{1}{4}(\nu_{2}(K_{\gamma}))^{2}\Big(\underbrace{2h_{\gamma,X}^{4}(I_{\gamma,X_{1}X_{1}}+I_{\gamma,X_{1}X_{2}})}_{\text{3rd order}}
+\underbrace{4h_{\gamma,X}^{2}h_{\gamma,Z}^{2}I_{\gamma,X_{1}Z}}_{\text{2nd order}}
+\underbrace{h_{\gamma,Z}^{4}I_{\gamma,ZZ}}_{\text{1st order}}\Big),
\quad(38)
$$

where $Q_{\gamma,2}=\int_{S_{XXZ}}\tilde{\gamma}((x_{1},x_{2}),(x_{1},x_{2}),z)\,f_{XX}(x_{1},x_{2})\,d(x_{1},x_{2},z)$ and all other quantities are defined below Eq. (33) in the preceding section.
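As with Eq. (32), Eq. (38) is straightforward to evaluate numerically. A minimal Python sketch, again with user-supplied kernel constants and plug-in quantities (the variable names are ours):

```python
def amise2_gamma(h_x, h_z, T, N, R_K, R_kappa, nu2_K,
                 Q1, Q2, I_x1x1, I_x1x2, I_x1z, I_zz):
    """Evaluate the AMISE.II expression (38) for the covariance estimator."""
    var_2nd = R_K * Q1 / (T * N * h_x**2 * h_z)   # 2nd-order variance term
    var_1st = R_kappa * Q2 / (T * h_z)            # 1st-order variance term
    bias2 = 0.25 * nu2_K**2 * (2 * h_x**4 * (I_x1x1 + I_x1x2)  # 3rd order
                               + 4 * h_x**2 * h_z**2 * I_x1z   # 2nd order
                               + h_z**4 * I_zz)                # 1st order
    return var_2nd + var_1st + bias2
```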
The lowest possible AMISE value under the AMISE.II scenario can be achieved if there exists an X-bandwidth which, first, allows us to profit from the (partial) annulment of the classical bias-variance tradeoff, but, second, ensures that the AMISE.II scenario remains valid. The first requirement is met if the X-bandwidth is of a smaller order of magnitude than the Z-bandwidth, i.e., if h_{γ,X} = o(h_{γ,Z}). This restriction makes those bias components that depend on h_{γ,X} asymptotically negligible, since it implies that h²_{γ,X}h²_{γ,Z} = o(h⁴_{γ,Z}) and therefore that h⁴_{γ,X} = o(h²_{γ,X}h²_{γ,Z}). The latter leads to the order relations between the third, fourth, and fifth AMISE.II terms as indicated in Eq. (38). The second requirement is met if the X-bandwidth does not converge to zero too fast, namely if (N h²_{γ,X})^{-1} = o(1), which implies the order relation between the first two AMISE.II terms as indicated in Eq. (38).
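A concrete rate pair may help to see that both requirements can be met simultaneously. The following choice is purely illustrative and not taken from the paper: suppose $h_{\gamma,Z}\asymp T^{-1/5}$ and $h_{\gamma,X}\asymp N^{-1/3}$, with $N$ growing fast enough that $T^{3/5}/N\to 0$. Then

$$
(Nh_{\gamma,X}^{2})^{-1}\asymp N^{-1/3}\to 0
\qquad\text{and}\qquad
\frac{h_{\gamma,X}}{h_{\gamma,Z}}\asymp\frac{T^{1/5}}{N^{1/3}}=\Big(\frac{T^{3/5}}{N}\Big)^{1/3}\to 0,
$$

so that h_{γ,X} = o(h_{γ,Z}) and (N h²_{γ,X})^{-1} = o(1) hold at the same time.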
2.3 Global polynomial fits
We suggest approximating the unknown quantities of the AMISE.I optimal bandwidth expressions in Eqs. (34), (35), (36), and (37) and those of the AMISE.II optimal bandwidth expressions in Eqs. (15), (16), (13), and (14) using five different global polynomial models of order four, referred to as µ_poly, γ_poly, γ̃_poly, [γ(x, x, z) + σ²]_poly, and [γ̃((x₁, x₂), (x₁, x₂), z) + σ_ε²(x₁, x₂, z)]_poly. Given estimates of these polynomial models, we can approximate the unknown quantities I_{µ,XX}, I_{µ,XZ}, I_{µ,ZZ}, Q_{µ,1}, Q_{µ,2}, I_{γ,X₁X₁}, I_{γ,X₁Z}, I_{γ,ZZ}, Q_{γ,1}, and Q_{γ,2} by the empirical versions of
$$
\begin{aligned}
I_{\mu_{\mathrm{poly}},XX}&=\int_{S_{XZ}}\big(\mu_{\mathrm{poly}}^{(2,0)}(x,z)\big)^{2}\,w_{\mu}\,d(x,z),\\
I_{\mu_{\mathrm{poly}},XZ}&=\int_{S_{XZ}}\mu_{\mathrm{poly}}^{(2,0)}(x,z)\,\mu_{\mathrm{poly}}^{(0,2)}(x,z)\,w_{\mu}\,d(x,z),\\
I_{\mu_{\mathrm{poly}},ZZ}&=\int_{S_{XZ}}\big(\mu_{\mathrm{poly}}^{(0,2)}(x,z)\big)^{2}\,w_{\mu}\,d(x,z),\\
Q_{\mu_{\mathrm{poly}},1}&=\int_{S_{XZ}}[\gamma(x,x,z)+\sigma^{2}]_{\mathrm{poly}}\,d(x,z),\\
Q_{\mu_{\mathrm{poly}},2}&=\int_{S_{XZ}}\gamma_{\mathrm{poly}}(x,x,z)\,f_{X}(x)\,d(x,z),\\
I_{\gamma_{\mathrm{poly}},X_{1}X_{1}}&=\int_{S_{XZ}}\big(\gamma_{\mathrm{poly}}^{(2,0,0)}(x,x,z)\big)^{2}\,w_{\mu}\,d(x,z),\\
I_{\gamma_{\mathrm{poly}},X_{1}Z}&=\int_{S_{XXZ}}\gamma_{\mathrm{poly}}^{(2,0,0)}(x,x,z)\,\gamma_{\mathrm{poly}}^{(0,0,2)}(x,x,z)\,w_{\gamma}\,d(x,x,z),\\
I_{\gamma_{\mathrm{poly}},ZZ}&=\int_{S_{XXZ}}\big(\gamma_{\mathrm{poly}}^{(0,0,2)}(x,x,z)\big)^{2}\,w_{\gamma}\,d(x,x,z),\\
Q_{\gamma_{\mathrm{poly}},1}&=\int_{S_{XXZ}}[\tilde{\gamma}((x_{1},x_{2}),(x_{1},x_{2}),z)+\sigma_{\varepsilon}^{2}]_{\mathrm{poly}}\,d(x_{1},x_{2},z),\\
Q_{\gamma_{\mathrm{poly}},2}&=\int_{S_{XXZ}}\tilde{\gamma}_{\mathrm{poly}}((x_{1},x_{2}),(x_{1},x_{2}),z)\,f_{XX}(x_{1},x_{2})\,d(x_{1},x_{2},z).
\end{aligned}
$$
Recall that the weight functions w_µ = w_µ(x, z) and w_γ = w_γ(x₁, x₂, z) are defined as w_µ(x, z) = f_XZ(x, z) and w_γ(x₁, x₂, z) = f_XXZ(x₁, x₂, z). Estimates for these densities are constructed from kernel density estimates of the single pdfs f_X and f_Z, where we use the Epanechnikov kernel and the bandwidth selection procedure of Sheather and Jones (1991)³. Note that it is necessary to estimate the models µ_poly and γ_poly with interactions, since otherwise their partial derivatives would degenerate. All necessary further details are given in the following list (a code sketch for the first model follows after the list):
• The model µ_poly is fitted via regressing Y_it on powers (each up to the fourth power) of X_it, Z_t, and X_it·Z_t for all i and t.

• The model γ_poly is fitted via regressing C^poly_ijt = (Y_it − µ_poly(X_it, Z_t))(Y_jt − µ_poly(X_jt, Z_t)) on powers (each up to the fourth power) of X_it, X_jt, Z_t, X_it·Z_t, and X_jt·Z_t for all t and all i, j with i ≠ j.

• The model γ̃_poly is fitted via regressing C^poly_{(ij),(kl),t} = (C^poly_ijt − γ_poly(X_it, X_jt, Z_t))(C^poly_klt − γ_poly(X_kt, X_lt, Z_t)) on powers (each up to the fourth power) of X_it, X_jt, X_kt, X_lt, and Z_t for all t and all i, j, k, l with (i, j) ≠ (k, l).

• The model [γ(x, x, z) + σ²]_poly is fitted via regressing the noise-contaminated diagonal values C^poly_iit on powers (each up to the fourth power) of X_it and Z_t for all i and t.

• The model [γ̃((x₁, x₂), (x₁, x₂), z) + σ_ε²(x₁, x₂, z)]_poly is fitted via regressing the noise-contaminated diagonal values C^poly_{(ij),(ij),t} on powers (each up to the fourth power) of X_it, X_jt, and Z_t for all i, j, and t.

³ Sheather, S. J. and M. C. Jones (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B (Methodological) 53(3), 683-690.
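As referenced above, here is a minimal Python sketch of the first bullet, i.e., fitting µ_poly by ordinary least squares on fourth-order powers of X_it, Z_t, and their interaction X_it·Z_t. The design-matrix layout is our own choice; the paper does not prescribe an implementation.

```python
import numpy as np

def fit_mu_poly(y, x, z):
    """OLS fit of the quartic polynomial model mu_poly with interaction.

    y, x, z : 1-D arrays over all (i, t) pairs (y = Y_it, x = X_it, z = Z_t).
    Returns the coefficient vector and a callable mu_poly(x, z).
    """
    def design(x, z):
        cols = [np.ones_like(x)]
        cols += [x**p for p in range(1, 5)]         # powers of X_it
        cols += [z**p for p in range(1, 5)]         # powers of Z_t
        cols += [(x * z)**p for p in range(1, 5)]   # powers of X_it * Z_t
        return np.column_stack(cols)

    beta, *_ = np.linalg.lstsq(design(x, z), y, rcond=None)
    return beta, lambda x_new, z_new: design(np.asarray(x_new, float),
                                             np.asarray(z_new, float)) @ beta
```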
For the computation of the Bonferroni-type confidence intervals we also need to approximate the error variance σ². This is done via the empirical version of

$$
\sigma_{\mathrm{poly}}^{2}
=\frac{1}{\int_{S_{XZ}}1\,d(x,z)}\int_{S_{XZ}}\big([\gamma(x,x,z)+\sigma^{2}]_{\mathrm{poly}}-\gamma_{\mathrm{poly}}(x,x,z)\big)\,d(x,z).
\quad(39)
$$
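Numerically, Eq. (39) is just an average of the difference between the two fitted surfaces over the support. A minimal sketch on an equally spaced grid, where the fitted callables gps2_poly (for [γ(x,x,z)+σ²]_poly) and gamma_poly are assumed given:

```python
import numpy as np

def sigma2_poly(gps2_poly, gamma_poly, x_grid, z_grid):
    """Approximate Eq. (39): average of [gamma(x,x,z)+sigma^2]_poly minus
    gamma_poly(x, x, z) over an equally spaced (x, z) grid; the normalizing
    constant 1 / integral(1 d(x,z)) turns the Riemann sum into a mean."""
    X, Z = np.meshgrid(x_grid, z_grid, indexing="ij")
    diff = gps2_poly(X, Z) - gamma_poly(X, X, Z)  # gamma_poly evaluated on the diagonal
    return diff.mean()
```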
Of course, it is always possible (and often necessary) to use more complex polynomial models that include more interaction terms and higher-order polynomials. However, due to the relatively simple structure of the data, this rule-of-thumb method works very well here. The global polynomial fits of the mean functions are shown in Figure 6. Their general shapes are plausible and we can expect them to be very useful in approximating the above unknown quantities, although these pilot estimates are not perfect substitutes for the final local linear estimates. In particular, the temperature effects in the second time period show some implausible wave shapes, which do not show up in the nonparametric fit shown in Figure 5.
3 Data Sources
The data for our analysis come from four different sources. Hourly spot prices of the German electricity market are provided by the European Power Exchange EPEX (www.epexspot.com), hourly values of Germany's gross electricity demand and electricity exchanges with other countries are provided by the European Network of Transmission System Operators for Electricity (www.entsoe.eu), German wind and solar power infeed data are provided by the transparency platform of the European Energy Exchange (www.eex-transparency.com), and German air temperature data are available from the German Weather Service (www.dwd.de).
The data dimensions are given by n = 24, N = 552, T = 261, and T = 262, where the latter
two numbers are the number of working days one year before and one year after Germany’s nuclear
phaseout. Very few (0.2%) of the data tuples (Y_it, X_it, Z_t), namely those with prices Y_it > 200 EUR/MWh, are considered outliers and removed. Such extreme prices cannot be explained by the merit-order model, since the marginal costs of electricity production usually do not exceed a value of about 200 EUR/MWh. Prices above this threshold are often referred to as "price spikes" and have to be modeled using different approaches (Ch. 4 in Burger et al. (2008)⁴).

[Figure 6: Approximated mean functions using global polynomial regression models. Two panels, "One year before Germany's nuclear phaseout" and "First year after Germany's nuclear phaseout", each plotting Electr. Demand (MW; 20,000-80,000) against Temperature (°C; -10 to 30), with a gray scale marking price levels of 29, 48, 69, and 88 EUR/MWh.]
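For completeness, the outlier rule is a one-line filter. A minimal pandas sketch, where the column name "price" is our own assumption (the paper does not specify a data layout):

```python
import pandas as pd

def remove_price_spikes(df: pd.DataFrame, threshold: float = 200.0) -> pd.DataFrame:
    """Drop data tuples (Y_it, X_it, Z_t) whose price exceeds the
    merit-order plausibility threshold of 200 EUR/MWh."""
    kept = df[df["price"] <= threshold]
    share_removed = 1 - len(kept) / len(df)
    print(f"removed {share_removed:.1%} of the observations")  # approx. 0.2% in the paper
    return kept
```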
⁴ Burger, M., B. Graeber, and G. Schindlmayr (2008). Managing Energy Risk: An Integrated View on Power and Other Energy Markets (1st ed.). Wiley.