Instructions

There are 7 exercises, and the task is to complete as many of them as possible within 4 hours. Your solutions are to be handed in at the first lecture, and we will go through the exercises together. The exercises contain all the necessary mathematics used throughout the course. Please note that some of the exercises are quite difficult; I do not expect you to be able to complete all of them, but it is important that you work at least 5 hours on this task. The reason I want you to do these exercises is that the students taking the course have different backgrounds, and I think it is important that we establish a common "mathematical language" as early as possible. It is sufficient to write your solutions with pen and paper, and they do not have to be neatly written.

Notation

Matrix notation
scalars (i.e. single-valued parameters): small letters in italic, e.g. $x$
vectors: small letters in bold, e.g. $\mathbf{v}$
matrices: capital letters in bold, e.g. $\mathbf{A}$
transpose of a matrix: $\mathbf{A}^T$ or $\mathbf{A}'$
determinant of a matrix: $|\mathbf{A}|$ or $\det(\mathbf{A})$
inverse of a matrix: $\mathbf{A}^{-1}$
trace of a matrix (i.e. the sum of the diagonal elements): $\operatorname{tr}(\mathbf{A})$
matrix rank: $r(\mathbf{A})$
element $i$ in vector $\mathbf{v}$: $v_i$
element in row $i$ and column $j$ of matrix $\mathbf{A}$: $A_{ij}$

Statistical notation
random variables: capital letters in italic, e.g. $X$
vector of random variables: small letters in bold, e.g. $\mathbf{y}$
expected value: $E(X)$
variance: $\operatorname{var}(X)$
covariance: $\operatorname{cov}(X, Y)$
variance-covariance matrix: $V(\mathbf{y})$
$X$ is normally distributed with mean $\mu$ and variance $\sigma_e^2$: $X \sim N(\mu, \sigma_e^2)$

Exercise 1: Matrix Algebra

$$x + 2y = 7$$
$$3x + 4y = 5$$

a) Solve the set of equations above.
b) Write the set of equations in matrix form.

Exercise 2: Some More Matrix Algebra

$$\mathbf{A} = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 5 \\ 7 \end{pmatrix}$$

a) Calculate $\det(\mathbf{A})$.
b) Calculate $\mathbf{A}^{-1}$.
c) Use $\mathbf{A}^{-1}$ to find the solution to $\mathbf{Ax} = \mathbf{b}$.
d) Find the eigenvalues of $\mathbf{A}$.
e) Check that the sum of the eigenvalues equals $\operatorname{tr}(\mathbf{A})$, and that the product of the eigenvalues equals $\det(\mathbf{A})$.

Exercise 3: Variances and Variance-Covariance Matrices

Let $X \sim N(0, 1)$.

a) $\operatorname{var}(2X) = {}$?
b) Let $U_i$, $i = 1, 2, 3$, be independent and identically distributed (i.e. "iid") from $N(0, 1)$, and let $\mathbf{u} = (U_1, U_2, U_3)^T$.
   i) $V(\mathbf{u}) = {}$?
   ii) Let $\mathbf{Z} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}$. Calculate $V(\mathbf{Zu})$.

Exercise 4: Ordinary Least Squares

$$\mathbf{W} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}, \qquad \mathbf{X} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 0 \\ 1 & 0 \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} 10 \\ 12 \\ 17 \\ 11 \end{pmatrix}$$

Suppose we have the linear model $\mathbf{y} = \mathbf{Xb} + \mathbf{e}$, with iid $e_i \sim N(0, \sigma_e^2)$.

a) Estimate $\mathbf{b}$ from the formula $\hat{\mathbf{b}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$.
b) Calculate $r(\mathbf{X})$.
c) Estimate the residual variance from the formula $\hat{\sigma}_e^2 = (\mathbf{y} - \mathbf{X}\hat{\mathbf{b}})^T(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}})/(N - r(\mathbf{X}))$, where $N = 4$.
d) Find $r(\mathbf{W})$. Solve this question by making a good guess, or by calculating the number of non-zero eigenvalues of $\mathbf{W}^T\mathbf{W}$.

Exercise 5: Numerical Methods

Suppose we have the equation $f(x) = 0$, where $f(x)$ is some function of $x$. We can find a solution to this problem with the iterative Newton-Raphson procedure
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)},$$
where the subscript denotes the iteration number and $x_0$ is the starting value.

Let $f(x) = x^2 - 2x$.

a) For which values of $x$ is $f(x) = 0$?
b) Find an approximation of these values by calculating $x_2$ in the Newton-Raphson procedure with $x_0 = -1$.
c) Repeat the previous calculations with $x_0 = 3$.
d) How can one use the Newton-Raphson procedure to find an approximate value of $x$ that maximizes $y = \frac{1}{3}x^3 - x^2 + 7$?

Exercise 6: Matrix Differentiation

$$\mathbf{M} = \begin{pmatrix} 1 & 3 \\ 4 & 3 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 3 \\ 5 \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

Let $c = \mathbf{b}^T\mathbf{x}$, $\mathbf{p} = \mathbf{Mx}$, and $q = \mathbf{x}^T\mathbf{Mx}$. Calculate the following derivatives:

a) $\partial c/\partial x_1$
b) $\partial c/\partial \mathbf{x}$
c) $\partial \mathbf{p}/\partial \mathbf{x}$
d) $\partial q/\partial \mathbf{x}$
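Optional Numerical Checks (Python)

If you have access to Python with numpy, the short sketches below can be used to check your pen-and-paper answers. They are optional additions for self-checking, not part of the hand-in, and they assume the matrices exactly as printed in the exercises above. The first sketch covers Exercises 1 and 2, using numpy's standard linear-algebra routines:

    import numpy as np

    # Exercise 1: x + 2y = 7 and 3x + 4y = 5, written as A1 @ (x, y) = b1
    A1 = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
    b1 = np.array([7.0, 5.0])
    print(np.linalg.solve(A1, b1))        # solution (x, y)

    # Exercise 2
    A = np.array([[1.0, 3.0],
                  [2.0, 4.0]])
    b = np.array([5.0, 7.0])
    print(np.linalg.det(A))               # a) determinant
    print(np.linalg.inv(A))               # b) inverse
    print(np.linalg.inv(A) @ b)           # c) solution to Ax = b
    lam = np.linalg.eigvals(A)            # d) eigenvalues
    print(lam)
    print(lam.sum(), np.trace(A))         # e) sum of eigenvalues vs trace
    print(lam.prod(), np.linalg.det(A))   # e) product of eigenvalues vs determinant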
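For Exercise 3 b), you can estimate $V(\mathbf{Zu})$ by simulation and compare the result with your analytical answer: draw many iid $N(0,1)$ vectors $\mathbf{u}$ and take the sample covariance of $\mathbf{Zu}$.

    import numpy as np

    Z = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])

    rng = np.random.default_rng(1)
    u = rng.standard_normal((3, 100_000))  # each column is one draw of u
    print(np.cov(Z @ u))                   # compare with your analytical V(Zu)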
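For Exercise 4, the formulas translate directly into numpy; np.linalg.matrix_rank gives $r(\mathbf{X})$ and $r(\mathbf{W})$, and np.linalg.eigvalsh shows the eigenvalues of $\mathbf{W}^T\mathbf{W}$ used in part d):

    import numpy as np

    X = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
    W = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 0.0, 1.0]])
    y = np.array([10.0, 12.0, 17.0, 11.0])
    N = len(y)

    # a) b_hat = (X^T X)^{-1} X^T y, computed via a linear solve
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(b_hat)

    # b) rank of X
    rX = np.linalg.matrix_rank(X)
    print(rX)

    # c) residual variance with N - r(X) degrees of freedom
    resid = y - X @ b_hat
    print(resid @ resid / (N - rX))

    # d) rank of W, and the eigenvalues of W^T W (count the non-zero ones)
    print(np.linalg.matrix_rank(W))
    print(np.linalg.eigvalsh(W.T @ W))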
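For Exercise 5, a minimal Newton-Raphson loop; run it with the two starting values given above and watch which root each one converges to:

    def newton_raphson(f, fprime, x0, n_iter=2):
        """Iterate x_{i+1} = x_i - f(x_i)/f'(x_i), returning all iterates."""
        xs = [x0]
        for _ in range(n_iter):
            xs.append(xs[-1] - f(xs[-1]) / fprime(xs[-1]))
        return xs

    f = lambda x: x**2 - 2*x
    fprime = lambda x: 2*x - 2

    print(newton_raphson(f, fprime, x0=-1.0))  # b) x0 = -1
    print(newton_raphson(f, fprime, x0=3.0))   # c) x0 = 3

    # d) To maximize y = x^3/3 - x^2 + 7, apply the same procedure to y'(x),
    #    since a maximum of y is a root of y'(x) = 0 (check the sign of y'' there).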
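For Exercise 6, one way to check your matrix derivatives is to compare them with finite-difference approximations at an arbitrary test point (the point $x_0$ below is just an example):

    import numpy as np

    M = np.array([[1.0, 3.0], [4.0, 3.0]])
    b = np.array([3.0, 5.0])

    c = lambda x: b @ x      # c = b^T x
    q = lambda x: x @ M @ x  # q = x^T M x

    x0 = np.array([0.7, -1.2])  # arbitrary test point
    h = 1e-6
    E = np.eye(2)

    # central finite differences, one coordinate at a time
    grad_c = np.array([(c(x0 + h*E[j]) - c(x0 - h*E[j])) / (2*h) for j in range(2)])
    grad_q = np.array([(q(x0 + h*E[j]) - q(x0 - h*E[j])) / (2*h) for j in range(2)])

    print(grad_c)  # compare with your answer to b), evaluated at x0
    print(grad_q)  # compare with your answer to d), evaluated at x0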
Exercise 7: Likelihood Theory

Suppose we have the same linear model as in Exercise 4, $\mathbf{y} = \mathbf{Xb} + \mathbf{e}$ with iid $e_i \sim N(0, \sigma_e^2)$, but in the present exercise we assume that the residual variance is known: $\sigma_e^2 = 10$. The likelihood $L$ is the probability of observing $\mathbf{y}$ given a parameter value of $\mathbf{b}$, i.e. $L = P(\mathbf{y} \mid \mathbf{b})$. We can also define a likelihood for each single observation, $L_i = P(y_i \mid \mathbf{b})$. Since we assume that the residuals are normally distributed, we have
$$L_i = \frac{1}{\sqrt{2\pi\sigma_e^2}} \exp\!\left(-\frac{(y_i - \mathbf{X}_i\mathbf{b})^2}{2\sigma_e^2}\right),$$
where $\mathbf{X}_i$ is the $i$th row of $\mathbf{X}$. The likelihood for all observations is then the product of $L_1$, $L_2$, $L_3$, and $L_4$; that is,
$$L = \prod_{i=1}^{N} L_i.$$
We wish to estimate $\mathbf{b}$ by maximizing the likelihood $L$, which is equivalent to maximizing the logarithm of the likelihood, $l = \log(L)$. So we can get the maximum likelihood estimates by solving $\frac{\partial l}{\partial \mathbf{b}} = \mathbf{0}$.

a) Show that $l = -\frac{N}{2}\log(2\pi\sigma_e^2) - \frac{1}{2\sigma_e^2}\sum_{i=1}^{N}(y_i - \mathbf{X}_i\mathbf{b})^2$.
b) Check that $l = -\frac{N}{2}\log(2\pi\sigma_e^2) - \frac{1}{2\sigma_e^2}(\mathbf{y} - \mathbf{Xb})^T(\mathbf{y} - \mathbf{Xb})$.
c) Find the maximum likelihood estimate of $\mathbf{b}$.
d) $\frac{\partial l}{\partial \mathbf{b}}$ is called the gradient of $l$. The matrix of second derivatives is called the Hessian and is denoted $\mathbf{H}$. The Hessian is symmetric, and in our example we have
$$\mathbf{H} = \begin{pmatrix} \frac{\partial^2 l}{\partial b_1^2} & \frac{\partial^2 l}{\partial b_1 \partial b_2} \\ \frac{\partial^2 l}{\partial b_2 \partial b_1} & \frac{\partial^2 l}{\partial b_2^2} \end{pmatrix}.$$
Show that $\mathbf{H} = -\frac{1}{\sigma_e^2}\mathbf{X}^T\mathbf{X}$ in our example.
e) The observed Fisher information matrix is defined as $\mathbf{I} = -\mathbf{H}$, and maximum likelihood theory gives that $\hat{\mathbf{b}} \sim N(\mathbf{b}, \mathbf{I}^{-1})$. Find the variance-covariance matrix of $\hat{\mathbf{b}}$. What are the standard errors of $\hat{b}_1$ and $\hat{b}_2$?
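Finally, an optional sketch for Exercise 7 in the same spirit as the checks above, assuming the $\mathbf{X}$ and $\mathbf{y}$ from Exercise 4 and the known residual variance $\sigma_e^2 = 10$; it evaluates the log-likelihood from part b), the Hessian from part d), and the variance-covariance matrix and standard errors from part e):

    import numpy as np

    X = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
    y = np.array([10.0, 12.0, 17.0, 11.0])
    sigma2 = 10.0  # residual variance, known in this exercise
    N = len(y)

    def loglik(b):
        """l(b) = -N/2 log(2 pi sigma^2) - (y - Xb)^T (y - Xb) / (2 sigma^2)"""
        r = y - X @ b
        return -N / 2 * np.log(2 * np.pi * sigma2) - (r @ r) / (2 * sigma2)

    # c) with sigma^2 known, the ML estimate of b equals the OLS estimate
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(b_hat, loglik(b_hat))

    # d) Hessian H = -X^T X / sigma^2; e) I = -H and V(b_hat) = I^{-1}
    H = -(X.T @ X) / sigma2
    V = np.linalg.inv(-H)        # variance-covariance matrix of b_hat
    print(V)
    print(np.sqrt(np.diag(V)))   # standard errors of b_hat_1 and b_hat_2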