Keys

advertisement
2011-02-22
STAT602X
Homework 3
Keys
9. Find a set of basis functions for the natural (linear outside the interval (ξ1 , ξK )) quadratic
regression splines with knots at ξ1 < ξ2 < · · · < ξK .
(10 pts) Based on HTF’s construction, the general form for the truncated-power basis set, let
hj (x) = xj−1 ,
j = 1, . . . 2,
h2+l (x) = (x − ξl )2+ , l = 1, . . . K.
Then, H ≡ {hi (x)}2+K
i=1 is a basis set of piecewise linear function. For a natural quadratic
spline, the functions are asked to be linear outside the interval (ξ1 , ξK ). Therefore, two more
constraints at each side are required. As Ex.5.4 of HTF, I have
f (x) =
2
∑
αj x
j−1
+
j=1
and let
yeilds βK = −
∑K−1
k=1
K
∑
βk (x − ξk )2+ ,
k=1
K
∑
d2
βk ≡ 0
f (x) = 2
dx2
k=1
βk . Then,
f (x) =
2
∑
j−1
αj x
+
j=1
K−1
∑
βk [(x − ξk )2+ − (x − ξK )2+ ]
k=1
{
}
indicates N ≡ 1, x, {(x − ξk )2+ − (x − ξK )2+ }k=1,...,K−1 is a basis for the natural quadratic
spline. Note that N is dependent on the given starting set H.
10. (B-Splines) For a < ξ1 < ξ2 < · · · < ξK < b consider the B-spline bases of order m, {Bi,m (x)}
defined recursively as follows. For j < 1 define ξj = a, and for j > K let ξj = b. Define
Bi,1 (x) = I[ξi ≤ x < ξi+1 ]
(in case ξi = ξi+1 take Bi,1 (x) ≡ 0) and then
Bi,m (x) =
ξi+m − x
x − ξi
Bi,(m−1) (x) +
Bi+1,(m−1) (x)
ξi+m−1 − ξi
ξi+m − ξi+1
(where we understand that if Bi,1 (x) ≡ 0 its term drops out of the expression above). For a = −0.1
and b = 1.1 and ξi = (i − 1)/10 for i = 1, 2, . . . , 11, plot the non-zero Bi,3 (x). Consider all linear
–1–
2011-02-22
STAT602X
Homework 3
Keys
combinations of these functions. Argue that any such linear combination is piecewise quadratic
with first derivatives at every ξi . If it is possible to do so, identify one or more linear constraints
∑
on the coefficients (call them ci ) that will make qc (x) = i ci Bi,3 (x) linear to the left of ξ1 (but
otherwise minimally constrain the form of qc (x)).
(10 pts) The attached code gives a recurisve function defining the formula Bi,m (x) and generates a plot for i = −2, −1, . . . , 11, m = 3, and x ∈ [−0.1, 1.1] where the vertical dot lines
indicate the ξi ’s.
i = −2
i = −1
i=0
i=1
i=2
i=3
i=4
i=5
i=6
i=7
i=8
i=9
i = 10
i = 11
0.0
0.2
Bi, 3(x)
0.4
0.6
0.8
1.0
B−Spline bases of order 3
0.0
0.5
1.0
x
R Code
a <- -0.1
b <- 1.1
K <- 11
xi <- (1:K - 1) / 10
N.blocks <- K + 1
### Recursive funtion for B_{i, m}(x)
B <- function(x, i, m){
ret <- 0
if(m == 1){
### deal with m = 1 and stop recursion.
if(i >= 1 && i <= K - 1){
if(x >= xi[i] && x < xi[i + 1]){
ret <- 1
}
} else if((i == 0 && x >= a && x < xi[1]) || (i == K && x >= xi[K] && x < b)){
ret <- 1
–2–
2011-02-22
STAT602X
Homework 3
Keys
}
} else{
### deal with m > 1 and start recursion by calling B() itself.
xi.i <- if(i < 1) a else if(i > K) b else xi[i]
xi.i.m.1 <- if(i + m - 1 < 1) a else if(i + m - 1 > K) b else xi[i + m - 1]
xi.i.m <- if(i + m < 1) a else if(i + m > K) b else xi[i + m]
xi.i.1 <- if(i + 1 < 1) a else if(i + 1 > K) b else xi[i + 1]
term.1 <- if(xi.i.m.1 == xi.i) 0 else (x - xi.i) / (xi.i.m.1 - xi.i)
term.2 <- if(xi.i.m == xi.i.1) 0 else (xi.i.m - x) / (xi.i.m - xi.i.1)
ret <- term.1 * B(x, i, m - 1) + term.2 * B(x, i + 1, m - 1)
}
ret
}
### Compute B_{i, 3}(x)
N.x <- 200
x.all <- seq(a, b, length = N.x)
ret.B <- matrix(0, nrow = N.x, ncol = K + 3)
m <- 3
for(i in 1:ncol(ret.B)){
for(j in 1:N.x){
ret.B[j, i] <- B(x.all[j], i - 3, m)
}
}
# Extra column for i = -2, -1, and 0.
# Input i - 3 such that from -2 to 11.
### Generate plot
colors <- c("orange", "green3", "blue")
par(mar = c(3, 3, 2, 1), mgp = c(2, 1, 0))
plot(NULL, NULL, type = "n", xlab = "x", ylab = expression(B["i, 3"](x)),
main = paste("B-Spline bases of order ", m, sep = ""),
xlim = c(a - 0.3, b), ylim = range(ret.B, na.rm = TRUE))
ret <- lapply(1:ncol(ret.B),
function(i) lines(x.all, ret.B[,i], lwd = 2,
col = colors[i %% 3 + 1], lty = i %% 5 + 1))
abline(v = xi, lty = 3)
legend(-0.4, 0.9, paste("i = ", 1:ncol(ret.B) - 3, sep = ""),
col = colors[1:ncol(ret.B) %% 3 + 1], cex = 0.7,
lty = 1:ncol(ret.B) %% 5 + 1, lwd = 2)
(5 pts) Argue for piecewise quadratic with first derivatives at every ξi . Let Ii (x) ≡ Bi,1 (x)
and Ii (x) = 0 for any i is out of expression. I have
x − ξi
ξi+3 − x
Bi,2 (x) +
Bi+1,2 (x)
ξi+2 − ξi
ξi+3 − ξi+1
[
]
x − ξi
ξi+2 − x
x − ξi
Ii (x) +
Ii+1 (x)
=
ξi+2 − ξi ξi+1 − ξi
ξi+2 − ξi+1
[
]
ξi+3 − x
x − ξi+1
ξi+3 − x
+
Ii+1 (x) +
Ii+2 (x)
ξi+3 − ξi+1 ξi+2 − ξi+1
ξi+3 − ξi+2
Bi,3 (x) =
which is in a quadratic form of x. Also, for j ̸= i + 1, i + 2,
lim−
x→ξj
d
d
Bi,3 (x) = lim+ Bi,3 (x) = 0 .
dx
x→ξj dx
–3–
2011-02-22
STAT602X
Homework 3
Keys
For ξi+1 ,
[
]
d
1
x − ξi
x − ξi
1
lim−
Bi,3 (x) = lim−
+
= 10 ,
dx
ξi+2 − ξi ξi+1 − ξi
x→ξi+1
x→ξi+1 ξi+2 − ξi ξi+1 − ξi
[
]
−(x − ξi+1 ) + (ξi+3 − x)
d
(ξi+2 − x) − (x − ξi )
lim+
Bi,3 (x) = lim+
+
= 10 .
dx
(ξi+3 − ξi+1 )(ξi+2 − ξi+1 )
x→ξi+1
x→ξi+1 (ξi+2 − ξi )(ξi+2 − ξi+1 )
For ξi+2 , similarly,
lim−
x→ξi+2
d
d
Bi,3 (x) = lim+
Bi,3 (x) = −10 .
dx
x→ξi+2 dx
So, the first derivative exists for all x ∈ [a, b]. Therefore, the same conclusion holds for any
linear combination of Bi,3 (x). (The derivatives work for i = 0, . . . , 9, but the same argument
can be made for all i)
∑
(5 pts) Find ci such that qc (x) = i ci Bi,3 (x) is linear to the left of ξ1 = 0. Note that
Bi,3 (ξ1− ) = 0 for i ̸= −2, −1, 0, so let ci = 0 for the corresponding i. For linearity, I only need
a constraint on c−2 , c−1 , and c0 which require
]
[
d2
d2
d2
d2
lim
q (x) = lim− c−2 2 B−2,3 (x) + c−1 2 B−1,3 (x) + c0 2 B0,3 (x) ≡ 0 .
2 c
dx
dx
dx
x→ξ1− dx
x→ξ1
Note that only I0 (x) ̸= 0 for x → ξ1− . Then, (ξ−1 = ξ0 = a = −0.1)
d2
2
lim− 2 B0,3 (x) = lim−
= 100 ,
x→ξ1 dx
x→ξ1 (ξ2 − ξ0 )(ξ1 − ξ0 )
lim−
x→ξ1
d2
−2
−2
B−1,3 (x) =
+
= −300 ,
2
dx
(ξ1 − ξ−1 )(ξ1 − ξ0 ) (ξ2 − ξ0 )(ξ1 − ξ0 )
and
lim−
x→ξ1
d2
−2
B−2,3 (x) =
= −200 .
2
dx
(ξ1 − ξ−1 )(ξ1 − ξ0 )
So, the constraint is the set {ci } such that −2c−2 − 3c−1 + c0 = 0 and cj = 0 for j ̸= 0, −1, −2.
11. (3.23 of HTF) Suppose that columns of X with rank p have been standardized, as has Y .
Suppose also that
1
|⟨xj , Y ⟩| = λ ∀j = 1, . . . , p
N
Let β̂
ols
be the usual least squares coefficient vector and Ŷ
–4–
ols
be the usual projection of Y onto the
2011-02-22
STAT602X
Homework 3
Keys
column space of X. Define Ŷ (α) = αX β̂
1
N
ols
1
N
ols
= X β̂
ols
ols
and Ŷ (α) = αŶ
1
N
=
1
N
=
1
N
=
1
N
ols
)
∀j = 1, . . . , p
). Show this is decreasing in α. What is the implication
⟨
⟩
xj , Y − Ŷ (α) =
(∵ xj ⊥ Y − Ŷ
for α ∈ [0, 1]. Find
⟨
⟩
xj , Y − Ŷ (α) in terms of α, λ, and (Y − Ŷ )′ (Y − Ŷ
of this as regards the LAR algorithm?
(5 pts) Note that Ŷ
ols
ols
. For all j = 1, . . . , p,
⟨
⟩
ols xj , Y − αŶ
⟨
⟩
ols
ols
ols xj , Y − Ŷ + Ŷ − αŶ
⟨
⟩
⟨
⟩
ols
ols
ols + xj , Ŷ − αŶ
xj , Y − Ŷ
⟨
⟩
ols xj , (1 − α)Ŷ
= (1 − α)λ .
(No graded) See 3.23 of HTF for the complete question. Need to show: the correlation is
(1 − α)λ
λ(α) = √
(1 − α)2 + α(2−α)
RSS
N
ols
ols
where RSS = (Y − Ŷ )′ (Y − Ŷ ). Note that Var(Y ) = Var(xj ) = 1 for all j since
standardized, and Cov(Ŷ , Y − Ŷ ) = 0 since perpendicular. Then, we have
Var(Y − Ŷ (α)) = Var(Y − αY + αY − αŶ )
= (1 − α)2 Var(Y ) + α2 Var(Y − Ŷ ) + 2α(1 − α)Cov(Y , Y − Ŷ )
RSS
= (1 − α)2 + α2
+ 2α(1 − α)Cov(Y , Y − Ŷ )
N
RSS
+ 2α(1 − α)[Cov(Y − Ŷ , Y − Ŷ ) − Cov(Ŷ , Y − Ŷ )]
= (1 − α)2 + α2
N
RSS
= (1 − α)2 + α2
+ 2α(1 − α)Var(Y − Ŷ )
N
RSS
.
= (1 − α)2 + α(2 − α)
N
–5–
2011-02-22
STAT602X
Homework 3
Keys
Therefore, we have
λ(α) = √
Cov(xj , Y − Ŷ (α))
(1 − α)λ
=√
.
(1 − α)2 + α(2−α)
RSS
Var(xj )Var(Y − Ŷ (α))
N
(No graded) Need to show: λ(α) is decreased in α.
d
1
λ(α) = −λ RSS < 0.
dα
N
(No graded) The LAR algorithm along the selection process keeps the correlation of
residuals and variables in the active set decreasing, and keep the corelation tied/equal for all
variables in the active set.
12. a) Suppose that a < x1 < x2 < · · · < xN < b and s(x) is a natural cubic spline with knots
at the xi interpolating the points (xi , yi ) (i.e. s(xi ) = yi ). Let z(x) be any twice continuously
differentiable function on [a, b] also interpolating the points (xi , yi ). Then
∫
b
∫
′′
(s (x)) dx ≤
2
b
′′
(z (x))2 dx
a
a
(Hint: Consider d(x) = z(x) − s(x), write
∫
b
′′
∫
(d (x)) dx =
a
b
∫
′′
(z (x)) dx −
2
a
b
′′
∫
(s (x)) dx − 2
2
a
2
b
′′
′′
s (x)d (x)dx
a
′′′
and use integration by parts and the fact that s (x) is piecewise constant.)
(5 pts) By the hint, I need to show
∫b
a
′′
′′
′′′
s (x)d (x)dx = 0. Let x0 = a, xN +1 = b, and si = s (x)
–6–
2011-02-22
STAT602X
Homework 3
Keys
for x ∈ [xi , xi+1 ]. Also, we have d(xi ) = z(xi ) − s(xi ) = yi − yi = 0.
∫
b
′′
′′
′′
′
s (x)d (x)dx = s (x)d
a
′′
′′
(∵ s (a) = s (b) = 0)
=
=
=
∫
(x)|ba
N ∫ xi+1
∑
i=0
N
∑
i=0
N
∑
b
−
′′′
′
s (x)d (x)dx
a
′′′
′
s (x)d (x)dx
xi
∫
xi+1
si
′
d (x)dx
xi
si [d(xi+1 ) − d(xi )]
i=0
= 0
∫ b ′′
∫ b ′′
∫ b ′′
∫ b ′′
Therefore, a (d (x))2 dx = a (z (x))2 dx − a (s (x))2 dx and a (d (x))2 dx ≥ 0, so
∫ b ′′
∫ b ′′
(s (x))2 dx ≤ a (z (x))2 dx.
a
∫ b ′′
∑
2
b) Use a) and prove that the minimizer of N
(y
−
h(x
))
+
λ
(h (x))2 dx over the set of twice
i
i
i=1
a
continuously differentiable functions on [a, b] is a natural cubic spline with knots at the xi .
(5 pts) Let C 2 [a, b] be the set of twice continuously differentiable functions on [a, b]. Assume
z(x) ∈ C 2 [a, b] is the minimizer rather than s(x). i.e.
∫ b
∫ b
N
N
∑
∑
′′
′′
2
2
2
(yi − z(xi )) + λ
(z (x)) dx ≤
(yi − h(xi )) + λ
(h (x))2 dx
a
i=1
a
i=1
∫b
for all h(x) ∈ C 2 [a, b]. But from a), I have s(xi ) = z(xi ) and
which yield
N
∑
i=1
∫
(yi − s(xi )) + λ
b
′′
(s (x)) dx ≤
2
2
N
∑
a
i=1
a
′′
(s (x))2 dx ≤
∫
(yi − z(xi )) + λ
2
b
∫b
a
′′
(z (x))2 dx
′′
(z (x))2 dx
a
and contradict to the assumption. Therefore, s(x) has to be the minimizer.
13. Let H be the set of absolutely continuous functions on [0, 1] with square integrable first
–7–
2011-02-22
STAT602X
Homework 3
Keys
derivatives (that exist except possibly at a set of measure 0). Equip H with an inner product
∫
⟨f, g⟩H = f (0)g(0) +
1
′
′
f (x)g (x)dx
0
a) Show that
R(x, z) = 1 + min(x, z)
is a reproducing kernel for this Hilbert space.
(5 pts)
∫
1
d
d
⟨R(x, z), f (z)⟩H = R(x, 0)f (0) +
R(x, z) f (z)dz
dz
0 dz
∫ x
∫ 1
d
d
d
d
= f (0) +
R(x, z) f (z)dz +
R(x, z) f (z)dz
dz
dz
dz
x dz
∫0 x
= f (0) +
f ′ (z)dz + 0
0
= f (x)
b) Using Heckman’s development, describe as completely as possible,
argmin
h∈H
( N
∑
∫
(yi − h(xi ))2 + λ
1
)
′
(h (x))2 dx
0
i=1
(10 pts) The followings are based on Heckman’s notion.
As the Equation (2), we can define µ ≡ h(x), Fi (µ) = µ (i.e. Fi is an identity), and
Lµ(x) ≡ h′ (x), then this question fit in the general penalized least squares problem. This
Lµ(x) is also of the form of the Equation (3). Therefore, by the Theorem 3, the minimizer
exists and is of the form of the Equation (7). For this question, the solution is of the form
∑
µ̂(x) = α̂1 u1 (x) + N
i=1 β̂i ηi1 (x) for the given inner product. For finding µ̂(x), we can utilize
the Equation (8) and it’s solutions, the Equation (11) and (12), but we have to change
notations and dimensions.
For R1 (x, xi ), we already did in the part a).
–8–
2011-02-22
STAT602X
Homework 3
Keys
For ηi1 (x), by the definition of the Theorem 3,
ηi1 (x) = Fi (R1 (x, xi )) = R1 (x, xi ) = min(x, xi ).
This can be verified from the Green’s function: G(x, z) = 1 for x ≤ z, 0 else. Also, R1 (x, xi ) =
∫1
G(x, u)G(xi , u)du.
0
For u1 (x), the Theorem 5 gives a way to obtain a basis for kernel of L, and an example is
given in Heckman’s page 10. We have m = 1 and Lµ(x) = h′ (x), and the solution of x = 0
is r1 = 0. Therefore,
u1 (x) = exp(r1 x) = 1.
For α̂ and β̂, as given in the Equation (4), we solve
(Y − T α − Kβ)′ (Y − T α − Kβ) + λβ ′ Kβ
where α = α0 , β = (β1 , . . . , βn )′ , Y = (Y1 , . . . , Yn )′ , Ti1 = 1, i = 1, . . . , n, K ij = R1xj (xi ),
i, j = 1, . . . , n, and R1xj (xi ) = min(xi , xj ) as given in part a). (i.e. K ij = (min(xi , xj ))N ×N .)
D = I is an identity matrix in this setup.
Follow the same derivative in Heckman’s page 15, we have
M = K + λI
and from the Equations (11) and (12)
α̂ = (T ′ M −1 T )−1 T ′ M −1 Y
and
β̂ = M −1 [I − T (T ′ M −1 T )−1 T ′ M −1 ]Y .
So, the solution is
µ̂(x) = α̂1 u1 (x) +
N
∑
β̂i ηi1 (x)
i=1
= argmin
h∈H
( N
∑
∫
(yi − h(xi ))2 + λ
–9–
)
′
(h (x))2 dx .
0
i=1
1
2011-02-22
STAT602X
Homework 3
Keys
c) Using Heckman’s development, describe as completely as possible,
argmin
( N
∑
h∈H
∫
∫
xi
(yi −
h(t)dt)2 + λ
0
i=1
1
)
′
(h (x))2 dx
0
(10 pts) Similarly to the part b) and the example in Heckman’s page 16, we have to rewrite
∫x
the notation carefully. Let Fi (µ(x)) ≡ 0 i h(x)dx (i.e. fi (x) = 1) and Lµ(x) ≡ h′ (x).
For A, R1 (x, z) = min(x, z) is the same in the part a).
For B,
∫
ηi1 (x) = Fi (R1 (x, z)) =
0
xi
[
]
1 2
1 2
R1 (x, z)dz = xi I(x ≥ xi ) + xxi − x I(x < xi ).
2
2
For C, u1 = 1 as given in the part b),
Ti1 = Fi (u1 ) = xi
and
∫
xi
Kij = Fi (ηj1 (x)) =
ηj1 (x)dx
0
[
]
∫ xi
1 2
1 2
x I(x ≥ xj ) + xxj − x I(x < xj )dx
=
2 j
2
0
1 2
=
x (xi − xj )I(xi ≥ xj ) +
2 j
)
(
)
]
[(
1 3 1 3
1 2
1 3
x xj − xi I[xi < xj ] +
x − x I[xi ≥ xj ]
2 i
6
2 j 6 j
1 2
1
=
xj (3xi − xj )I(xi ≥ xj ) + x2i (3xj − xi )I(xi < xj ).
6
6
Note that Kij = Kji .
For D, α̂ and β̂ are the same formula as the part b) but substitue T , K and M by the
above settings A, B, and C.
– 10 –
2011-02-22
STAT602X
Homework 3
Keys
For E, the solution is similarly
µ̂(x) = α̂1 u1 (x) +
N
∑
β̂i ηi1 (x)
i=1
= argmin
h∈H
( N
∑
∫
(yi − h(xi ))2 + λ
1
)
′
(h (x))2 dx .
0
i=1
with substitution by the above settings from A to E.
Total: 75 pts (Q9: 10, Q10: 20, Q11: 10, Q12:10, Q13:25). Any resonable solutions are acceptable.
– 11 –
Download