2011-02-22 STAT602X Homework 3 Keys 9. Find a set of basis functions for the natural (linear outside the interval (ξ1 , ξK )) quadratic regression splines with knots at ξ1 < ξ2 < · · · < ξK . (10 pts) Based on HTF’s construction, the general form for the truncated-power basis set, let hj (x) = xj−1 , j = 1, . . . 2, h2+l (x) = (x − ξl )2+ , l = 1, . . . K. Then, H ≡ {hi (x)}2+K i=1 is a basis set of piecewise linear function. For a natural quadratic spline, the functions are asked to be linear outside the interval (ξ1 , ξK ). Therefore, two more constraints at each side are required. As Ex.5.4 of HTF, I have f (x) = 2 ∑ αj x j−1 + j=1 and let yeilds βK = − ∑K−1 k=1 K ∑ βk (x − ξk )2+ , k=1 K ∑ d2 βk ≡ 0 f (x) = 2 dx2 k=1 βk . Then, f (x) = 2 ∑ j−1 αj x + j=1 K−1 ∑ βk [(x − ξk )2+ − (x − ξK )2+ ] k=1 { } indicates N ≡ 1, x, {(x − ξk )2+ − (x − ξK )2+ }k=1,...,K−1 is a basis for the natural quadratic spline. Note that N is dependent on the given starting set H. 10. (B-Splines) For a < ξ1 < ξ2 < · · · < ξK < b consider the B-spline bases of order m, {Bi,m (x)} defined recursively as follows. For j < 1 define ξj = a, and for j > K let ξj = b. Define Bi,1 (x) = I[ξi ≤ x < ξi+1 ] (in case ξi = ξi+1 take Bi,1 (x) ≡ 0) and then Bi,m (x) = ξi+m − x x − ξi Bi,(m−1) (x) + Bi+1,(m−1) (x) ξi+m−1 − ξi ξi+m − ξi+1 (where we understand that if Bi,1 (x) ≡ 0 its term drops out of the expression above). For a = −0.1 and b = 1.1 and ξi = (i − 1)/10 for i = 1, 2, . . . , 11, plot the non-zero Bi,3 (x). Consider all linear –1– 2011-02-22 STAT602X Homework 3 Keys combinations of these functions. Argue that any such linear combination is piecewise quadratic with first derivatives at every ξi . If it is possible to do so, identify one or more linear constraints ∑ on the coefficients (call them ci ) that will make qc (x) = i ci Bi,3 (x) linear to the left of ξ1 (but otherwise minimally constrain the form of qc (x)). (10 pts) The attached code gives a recurisve function defining the formula Bi,m (x) and generates a plot for i = −2, −1, . . . , 11, m = 3, and x ∈ [−0.1, 1.1] where the vertical dot lines indicate the ξi ’s. i = −2 i = −1 i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9 i = 10 i = 11 0.0 0.2 Bi, 3(x) 0.4 0.6 0.8 1.0 B−Spline bases of order 3 0.0 0.5 1.0 x R Code a <- -0.1 b <- 1.1 K <- 11 xi <- (1:K - 1) / 10 N.blocks <- K + 1 ### Recursive funtion for B_{i, m}(x) B <- function(x, i, m){ ret <- 0 if(m == 1){ ### deal with m = 1 and stop recursion. if(i >= 1 && i <= K - 1){ if(x >= xi[i] && x < xi[i + 1]){ ret <- 1 } } else if((i == 0 && x >= a && x < xi[1]) || (i == K && x >= xi[K] && x < b)){ ret <- 1 –2– 2011-02-22 STAT602X Homework 3 Keys } } else{ ### deal with m > 1 and start recursion by calling B() itself. xi.i <- if(i < 1) a else if(i > K) b else xi[i] xi.i.m.1 <- if(i + m - 1 < 1) a else if(i + m - 1 > K) b else xi[i + m - 1] xi.i.m <- if(i + m < 1) a else if(i + m > K) b else xi[i + m] xi.i.1 <- if(i + 1 < 1) a else if(i + 1 > K) b else xi[i + 1] term.1 <- if(xi.i.m.1 == xi.i) 0 else (x - xi.i) / (xi.i.m.1 - xi.i) term.2 <- if(xi.i.m == xi.i.1) 0 else (xi.i.m - x) / (xi.i.m - xi.i.1) ret <- term.1 * B(x, i, m - 1) + term.2 * B(x, i + 1, m - 1) } ret } ### Compute B_{i, 3}(x) N.x <- 200 x.all <- seq(a, b, length = N.x) ret.B <- matrix(0, nrow = N.x, ncol = K + 3) m <- 3 for(i in 1:ncol(ret.B)){ for(j in 1:N.x){ ret.B[j, i] <- B(x.all[j], i - 3, m) } } # Extra column for i = -2, -1, and 0. # Input i - 3 such that from -2 to 11. ### Generate plot colors <- c("orange", "green3", "blue") par(mar = c(3, 3, 2, 1), mgp = c(2, 1, 0)) plot(NULL, NULL, type = "n", xlab = "x", ylab = expression(B["i, 3"](x)), main = paste("B-Spline bases of order ", m, sep = ""), xlim = c(a - 0.3, b), ylim = range(ret.B, na.rm = TRUE)) ret <- lapply(1:ncol(ret.B), function(i) lines(x.all, ret.B[,i], lwd = 2, col = colors[i %% 3 + 1], lty = i %% 5 + 1)) abline(v = xi, lty = 3) legend(-0.4, 0.9, paste("i = ", 1:ncol(ret.B) - 3, sep = ""), col = colors[1:ncol(ret.B) %% 3 + 1], cex = 0.7, lty = 1:ncol(ret.B) %% 5 + 1, lwd = 2) (5 pts) Argue for piecewise quadratic with first derivatives at every ξi . Let Ii (x) ≡ Bi,1 (x) and Ii (x) = 0 for any i is out of expression. I have x − ξi ξi+3 − x Bi,2 (x) + Bi+1,2 (x) ξi+2 − ξi ξi+3 − ξi+1 [ ] x − ξi ξi+2 − x x − ξi Ii (x) + Ii+1 (x) = ξi+2 − ξi ξi+1 − ξi ξi+2 − ξi+1 [ ] ξi+3 − x x − ξi+1 ξi+3 − x + Ii+1 (x) + Ii+2 (x) ξi+3 − ξi+1 ξi+2 − ξi+1 ξi+3 − ξi+2 Bi,3 (x) = which is in a quadratic form of x. Also, for j ̸= i + 1, i + 2, lim− x→ξj d d Bi,3 (x) = lim+ Bi,3 (x) = 0 . dx x→ξj dx –3– 2011-02-22 STAT602X Homework 3 Keys For ξi+1 , [ ] d 1 x − ξi x − ξi 1 lim− Bi,3 (x) = lim− + = 10 , dx ξi+2 − ξi ξi+1 − ξi x→ξi+1 x→ξi+1 ξi+2 − ξi ξi+1 − ξi [ ] −(x − ξi+1 ) + (ξi+3 − x) d (ξi+2 − x) − (x − ξi ) lim+ Bi,3 (x) = lim+ + = 10 . dx (ξi+3 − ξi+1 )(ξi+2 − ξi+1 ) x→ξi+1 x→ξi+1 (ξi+2 − ξi )(ξi+2 − ξi+1 ) For ξi+2 , similarly, lim− x→ξi+2 d d Bi,3 (x) = lim+ Bi,3 (x) = −10 . dx x→ξi+2 dx So, the first derivative exists for all x ∈ [a, b]. Therefore, the same conclusion holds for any linear combination of Bi,3 (x). (The derivatives work for i = 0, . . . , 9, but the same argument can be made for all i) ∑ (5 pts) Find ci such that qc (x) = i ci Bi,3 (x) is linear to the left of ξ1 = 0. Note that Bi,3 (ξ1− ) = 0 for i ̸= −2, −1, 0, so let ci = 0 for the corresponding i. For linearity, I only need a constraint on c−2 , c−1 , and c0 which require ] [ d2 d2 d2 d2 lim q (x) = lim− c−2 2 B−2,3 (x) + c−1 2 B−1,3 (x) + c0 2 B0,3 (x) ≡ 0 . 2 c dx dx dx x→ξ1− dx x→ξ1 Note that only I0 (x) ̸= 0 for x → ξ1− . Then, (ξ−1 = ξ0 = a = −0.1) d2 2 lim− 2 B0,3 (x) = lim− = 100 , x→ξ1 dx x→ξ1 (ξ2 − ξ0 )(ξ1 − ξ0 ) lim− x→ξ1 d2 −2 −2 B−1,3 (x) = + = −300 , 2 dx (ξ1 − ξ−1 )(ξ1 − ξ0 ) (ξ2 − ξ0 )(ξ1 − ξ0 ) and lim− x→ξ1 d2 −2 B−2,3 (x) = = −200 . 2 dx (ξ1 − ξ−1 )(ξ1 − ξ0 ) So, the constraint is the set {ci } such that −2c−2 − 3c−1 + c0 = 0 and cj = 0 for j ̸= 0, −1, −2. 11. (3.23 of HTF) Suppose that columns of X with rank p have been standardized, as has Y . Suppose also that 1 |⟨xj , Y ⟩| = λ ∀j = 1, . . . , p N Let β̂ ols be the usual least squares coefficient vector and Ŷ –4– ols be the usual projection of Y onto the 2011-02-22 STAT602X Homework 3 Keys column space of X. Define Ŷ (α) = αX β̂ 1 N ols 1 N ols = X β̂ ols ols and Ŷ (α) = αŶ 1 N = 1 N = 1 N = 1 N ols ) ∀j = 1, . . . , p ). Show this is decreasing in α. What is the implication ⟨ ⟩ xj , Y − Ŷ (α) = (∵ xj ⊥ Y − Ŷ for α ∈ [0, 1]. Find ⟨ ⟩ xj , Y − Ŷ (α) in terms of α, λ, and (Y − Ŷ )′ (Y − Ŷ of this as regards the LAR algorithm? (5 pts) Note that Ŷ ols ols . For all j = 1, . . . , p, ⟨ ⟩ ols xj , Y − αŶ ⟨ ⟩ ols ols ols xj , Y − Ŷ + Ŷ − αŶ ⟨ ⟩ ⟨ ⟩ ols ols ols + xj , Ŷ − αŶ xj , Y − Ŷ ⟨ ⟩ ols xj , (1 − α)Ŷ = (1 − α)λ . (No graded) See 3.23 of HTF for the complete question. Need to show: the correlation is (1 − α)λ λ(α) = √ (1 − α)2 + α(2−α) RSS N ols ols where RSS = (Y − Ŷ )′ (Y − Ŷ ). Note that Var(Y ) = Var(xj ) = 1 for all j since standardized, and Cov(Ŷ , Y − Ŷ ) = 0 since perpendicular. Then, we have Var(Y − Ŷ (α)) = Var(Y − αY + αY − αŶ ) = (1 − α)2 Var(Y ) + α2 Var(Y − Ŷ ) + 2α(1 − α)Cov(Y , Y − Ŷ ) RSS = (1 − α)2 + α2 + 2α(1 − α)Cov(Y , Y − Ŷ ) N RSS + 2α(1 − α)[Cov(Y − Ŷ , Y − Ŷ ) − Cov(Ŷ , Y − Ŷ )] = (1 − α)2 + α2 N RSS = (1 − α)2 + α2 + 2α(1 − α)Var(Y − Ŷ ) N RSS . = (1 − α)2 + α(2 − α) N –5– 2011-02-22 STAT602X Homework 3 Keys Therefore, we have λ(α) = √ Cov(xj , Y − Ŷ (α)) (1 − α)λ =√ . (1 − α)2 + α(2−α) RSS Var(xj )Var(Y − Ŷ (α)) N (No graded) Need to show: λ(α) is decreased in α. d 1 λ(α) = −λ RSS < 0. dα N (No graded) The LAR algorithm along the selection process keeps the correlation of residuals and variables in the active set decreasing, and keep the corelation tied/equal for all variables in the active set. 12. a) Suppose that a < x1 < x2 < · · · < xN < b and s(x) is a natural cubic spline with knots at the xi interpolating the points (xi , yi ) (i.e. s(xi ) = yi ). Let z(x) be any twice continuously differentiable function on [a, b] also interpolating the points (xi , yi ). Then ∫ b ∫ ′′ (s (x)) dx ≤ 2 b ′′ (z (x))2 dx a a (Hint: Consider d(x) = z(x) − s(x), write ∫ b ′′ ∫ (d (x)) dx = a b ∫ ′′ (z (x)) dx − 2 a b ′′ ∫ (s (x)) dx − 2 2 a 2 b ′′ ′′ s (x)d (x)dx a ′′′ and use integration by parts and the fact that s (x) is piecewise constant.) (5 pts) By the hint, I need to show ∫b a ′′ ′′ ′′′ s (x)d (x)dx = 0. Let x0 = a, xN +1 = b, and si = s (x) –6– 2011-02-22 STAT602X Homework 3 Keys for x ∈ [xi , xi+1 ]. Also, we have d(xi ) = z(xi ) − s(xi ) = yi − yi = 0. ∫ b ′′ ′′ ′′ ′ s (x)d (x)dx = s (x)d a ′′ ′′ (∵ s (a) = s (b) = 0) = = = ∫ (x)|ba N ∫ xi+1 ∑ i=0 N ∑ i=0 N ∑ b − ′′′ ′ s (x)d (x)dx a ′′′ ′ s (x)d (x)dx xi ∫ xi+1 si ′ d (x)dx xi si [d(xi+1 ) − d(xi )] i=0 = 0 ∫ b ′′ ∫ b ′′ ∫ b ′′ ∫ b ′′ Therefore, a (d (x))2 dx = a (z (x))2 dx − a (s (x))2 dx and a (d (x))2 dx ≥ 0, so ∫ b ′′ ∫ b ′′ (s (x))2 dx ≤ a (z (x))2 dx. a ∫ b ′′ ∑ 2 b) Use a) and prove that the minimizer of N (y − h(x )) + λ (h (x))2 dx over the set of twice i i i=1 a continuously differentiable functions on [a, b] is a natural cubic spline with knots at the xi . (5 pts) Let C 2 [a, b] be the set of twice continuously differentiable functions on [a, b]. Assume z(x) ∈ C 2 [a, b] is the minimizer rather than s(x). i.e. ∫ b ∫ b N N ∑ ∑ ′′ ′′ 2 2 2 (yi − z(xi )) + λ (z (x)) dx ≤ (yi − h(xi )) + λ (h (x))2 dx a i=1 a i=1 ∫b for all h(x) ∈ C 2 [a, b]. But from a), I have s(xi ) = z(xi ) and which yield N ∑ i=1 ∫ (yi − s(xi )) + λ b ′′ (s (x)) dx ≤ 2 2 N ∑ a i=1 a ′′ (s (x))2 dx ≤ ∫ (yi − z(xi )) + λ 2 b ∫b a ′′ (z (x))2 dx ′′ (z (x))2 dx a and contradict to the assumption. Therefore, s(x) has to be the minimizer. 13. Let H be the set of absolutely continuous functions on [0, 1] with square integrable first –7– 2011-02-22 STAT602X Homework 3 Keys derivatives (that exist except possibly at a set of measure 0). Equip H with an inner product ∫ ⟨f, g⟩H = f (0)g(0) + 1 ′ ′ f (x)g (x)dx 0 a) Show that R(x, z) = 1 + min(x, z) is a reproducing kernel for this Hilbert space. (5 pts) ∫ 1 d d ⟨R(x, z), f (z)⟩H = R(x, 0)f (0) + R(x, z) f (z)dz dz 0 dz ∫ x ∫ 1 d d d d = f (0) + R(x, z) f (z)dz + R(x, z) f (z)dz dz dz dz x dz ∫0 x = f (0) + f ′ (z)dz + 0 0 = f (x) b) Using Heckman’s development, describe as completely as possible, argmin h∈H ( N ∑ ∫ (yi − h(xi ))2 + λ 1 ) ′ (h (x))2 dx 0 i=1 (10 pts) The followings are based on Heckman’s notion. As the Equation (2), we can define µ ≡ h(x), Fi (µ) = µ (i.e. Fi is an identity), and Lµ(x) ≡ h′ (x), then this question fit in the general penalized least squares problem. This Lµ(x) is also of the form of the Equation (3). Therefore, by the Theorem 3, the minimizer exists and is of the form of the Equation (7). For this question, the solution is of the form ∑ µ̂(x) = α̂1 u1 (x) + N i=1 β̂i ηi1 (x) for the given inner product. For finding µ̂(x), we can utilize the Equation (8) and it’s solutions, the Equation (11) and (12), but we have to change notations and dimensions. For R1 (x, xi ), we already did in the part a). –8– 2011-02-22 STAT602X Homework 3 Keys For ηi1 (x), by the definition of the Theorem 3, ηi1 (x) = Fi (R1 (x, xi )) = R1 (x, xi ) = min(x, xi ). This can be verified from the Green’s function: G(x, z) = 1 for x ≤ z, 0 else. Also, R1 (x, xi ) = ∫1 G(x, u)G(xi , u)du. 0 For u1 (x), the Theorem 5 gives a way to obtain a basis for kernel of L, and an example is given in Heckman’s page 10. We have m = 1 and Lµ(x) = h′ (x), and the solution of x = 0 is r1 = 0. Therefore, u1 (x) = exp(r1 x) = 1. For α̂ and β̂, as given in the Equation (4), we solve (Y − T α − Kβ)′ (Y − T α − Kβ) + λβ ′ Kβ where α = α0 , β = (β1 , . . . , βn )′ , Y = (Y1 , . . . , Yn )′ , Ti1 = 1, i = 1, . . . , n, K ij = R1xj (xi ), i, j = 1, . . . , n, and R1xj (xi ) = min(xi , xj ) as given in part a). (i.e. K ij = (min(xi , xj ))N ×N .) D = I is an identity matrix in this setup. Follow the same derivative in Heckman’s page 15, we have M = K + λI and from the Equations (11) and (12) α̂ = (T ′ M −1 T )−1 T ′ M −1 Y and β̂ = M −1 [I − T (T ′ M −1 T )−1 T ′ M −1 ]Y . So, the solution is µ̂(x) = α̂1 u1 (x) + N ∑ β̂i ηi1 (x) i=1 = argmin h∈H ( N ∑ ∫ (yi − h(xi ))2 + λ –9– ) ′ (h (x))2 dx . 0 i=1 1 2011-02-22 STAT602X Homework 3 Keys c) Using Heckman’s development, describe as completely as possible, argmin ( N ∑ h∈H ∫ ∫ xi (yi − h(t)dt)2 + λ 0 i=1 1 ) ′ (h (x))2 dx 0 (10 pts) Similarly to the part b) and the example in Heckman’s page 16, we have to rewrite ∫x the notation carefully. Let Fi (µ(x)) ≡ 0 i h(x)dx (i.e. fi (x) = 1) and Lµ(x) ≡ h′ (x). For A, R1 (x, z) = min(x, z) is the same in the part a). For B, ∫ ηi1 (x) = Fi (R1 (x, z)) = 0 xi [ ] 1 2 1 2 R1 (x, z)dz = xi I(x ≥ xi ) + xxi − x I(x < xi ). 2 2 For C, u1 = 1 as given in the part b), Ti1 = Fi (u1 ) = xi and ∫ xi Kij = Fi (ηj1 (x)) = ηj1 (x)dx 0 [ ] ∫ xi 1 2 1 2 x I(x ≥ xj ) + xxj − x I(x < xj )dx = 2 j 2 0 1 2 = x (xi − xj )I(xi ≥ xj ) + 2 j ) ( ) ] [( 1 3 1 3 1 2 1 3 x xj − xi I[xi < xj ] + x − x I[xi ≥ xj ] 2 i 6 2 j 6 j 1 2 1 = xj (3xi − xj )I(xi ≥ xj ) + x2i (3xj − xi )I(xi < xj ). 6 6 Note that Kij = Kji . For D, α̂ and β̂ are the same formula as the part b) but substitue T , K and M by the above settings A, B, and C. – 10 – 2011-02-22 STAT602X Homework 3 Keys For E, the solution is similarly µ̂(x) = α̂1 u1 (x) + N ∑ β̂i ηi1 (x) i=1 = argmin h∈H ( N ∑ ∫ (yi − h(xi ))2 + λ 1 ) ′ (h (x))2 dx . 0 i=1 with substitution by the above settings from A to E. Total: 75 pts (Q9: 10, Q10: 20, Q11: 10, Q12:10, Q13:25). Any resonable solutions are acceptable. – 11 –