PS1 for Econometrics 101, Warwick Econ Ph.D

Exercise 1: working with probabilities (1/3)

The probability density function for the random variable $X$ is given by:
\[
f_X(x) = \begin{cases} 6x(1 - x) & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
1. Find $P(X \le 1/2)$.
2. Find the cumulative distribution function $F_X(x) = P(X \le x)$.
3. Calculate $P(X \le 1/2)$ using $F_X(x)$. What is the median of $X$?

Exercise 2: working with probabilities (2/3)

Let $X$ be a random variable with probability density function
\[
f_X(x) = \begin{cases} x & \text{if } 0 < x < 1 \\ 2 - x & \text{if } 1 < x < 2 \\ 0 & \text{otherwise} \end{cases}
\]
Find $V(X)$.

Exercise 3: working with probabilities (3/3)

Let $X$ be a continuous random variable with pdf
\[
f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad x \in \mathbb{R}.
\]
Show that $E(X) = 0$ and $V(X) = 1$, knowing that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

Exercise 4: transpose, inner products, and outer products.

The transpose of a matrix $A$ is a matrix $B$ such that the element on the $i$th row and $j$th column of $B$ is the element on the $j$th row and $i$th column of $A$. In other words, the rows of $B$ are the columns of $A$. The transpose of $A$ is denoted $A'$. If $A$ is a column vector, its transpose is the corresponding row vector. If $A$ is a square matrix, $A' = A$ if and only if $A$ is symmetric with respect to its diagonal.

a) Let
\[
A = \begin{pmatrix} 3 & 2 \\ 1 & 0 \\ 5 & 2 \end{pmatrix}.
\]
Find $A'$.

b) Let
\[
B = \begin{pmatrix} 4 & 8 \\ 8 & 3 \end{pmatrix}.
\]
Find $B'$.

c) Let
\[
C = \begin{pmatrix} 4 \\ 8 \end{pmatrix}.
\]
Find $C'$.

Let $X$ and $\beta$ be two $k \times 1$ vectors. The inner product of $X$ and $\beta$ is a real number equal to $X'\beta$ (as $X'$ is $1 \times k$ and $\beta$ is $k \times 1$, $X'\beta$ is indeed a real number). This number is the sum over $i$ of the product of the $i$th coordinates of $X$ and $\beta$. The outer product of $X$ and $\beta$ is a $k \times k$ matrix equal to $X\beta'$ (as $X$ is $k \times 1$ and $\beta'$ is $1 \times k$, $X\beta'$ is indeed a $k \times k$ matrix). The element on the $i$th row and $j$th column of $X\beta'$ is the product of the $i$th coordinate of $X$ and of the $j$th coordinate of $\beta$.

d) Let $X = (2, 3)'$ and $\beta = (6, -4)'$. Compute $X'\beta$. Check that it is equal to $\beta'X$.

e) Compute the matrices $X\beta'$ and $\beta X'$. Are they equal? Is there another relationship between the two?

f) Compute the matrix $XX'$.
Which property does it satisfy?

Exercise 5: expectation and variance matrices.

For any matrix $A$ whose elements are random variables $A_{ij}$, $E(A)$ is the matrix whose elements are the expectations of the random variables in $A$: the element on the $i$th row and $j$th column of $E(A)$ is $E(A_{ij})$, the expectation of the element on the $i$th row and $j$th column of $A$.

a) Let
\[
X = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \\ X_{31} & X_{32} \end{pmatrix}
\]
be a $3 \times 2$ matrix of random variables, with $E(X_{11}) = 1$, $E(X_{12}) = 4$, $E(X_{21}) = -1$, $E(X_{22}) = 3$, $E(X_{31}) = -6$, $E(X_{32}) = 4$. Give the $3 \times 2$ matrix $E(X)$. Give the $2 \times 3$ matrix $E(X')$. Check that $E(X') = E(X)'$. This result holds more generally: for any matrix of random variables $X$, $E(X') = E(X)'$.

Let $X$ be a $k \times 1$ vector of random variables. $V(X)$ is a $k \times k$ matrix, such that the element on the $i$th row and $j$th column of $V(X)$ is $\mathrm{cov}(X_i, X_j)$.

b) Let $X = (X_1, X_2)'$ be a $2 \times 1$ vector of random variables. Give an explicit expression of the $2 \times 2$ matrix $(X - E(X))(X - E(X))'$, and then show that $E((X - E(X))(X - E(X))') = V(X)$. This result holds more generally: for any $k \times 1$ vector of random variables, $E((X - E(X))(X - E(X))') = V(X)$.

c) Let
\[
B = \begin{pmatrix} 4 & 8 \\ 8 & 3 \end{pmatrix}
\]
and $X = (X_1, X_2)'$. Check that $E(BX) = BE(X)$. This result holds more generally: for any $k \times k$ deterministic matrix $B$ and $k \times 1$ vector of random variables $X$, $E(BX) = BE(X)$. To which property of the expectation operator is this equality due?

d) Let
\[
X = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix}
\]
be a $2 \times 2$ matrix of random variables, and let
\[
C = \begin{pmatrix} 4 & 8 \\ 2 & 3 \end{pmatrix}.
\]
Compute the matrices $CXC'$, $E(CXC')$, and $CE(X)C'$, and check that $E(CXC') = CE(X)C'$. This result holds more generally: for any $k \times k$ matrix of random variables $X$ and for any $k \times k$ deterministic matrix $C$, $E(CXC') = CE(X)C'$.

e) Use the results from questions b), c), and d) to show that for any $k \times 1$ vector of random variables $X$ and for any $k \times k$ deterministic matrix $B$, $V(BX) = BV(X)B'$.

Exercise 6: A super consistent estimator

Assume you observe an iid sample of $n$ random variables $(Y_i)$ following the uniform distribution on $[0, \theta]$.¹ $\theta$ is the unknown parameter we would like to estimate.

a) Show that $E(Y_i) = \theta/2$. Use this to form an estimator $\hat{\theta}_{MM}$ for $\theta$. Show that $\hat{\theta}_{MM}$ is $\sqrt{n}$-consistent.

Consider the following alternative estimator for $\theta$:
\[
\hat{\theta}_{ML} = \max_{1 \le i \le n} \{Y_i\}.
\]

b) Why does using $\hat{\theta}_{ML}$ to estimate $\theta$ sound like a natural idea?

c) Show that for any $x \in [0, \theta]$, $P(\hat{\theta}_{ML} \le x) = \left(\frac{x}{\theta}\right)^n$, for $x < 0$, $P(\hat{\theta}_{ML} \le x) = 0$, and for $x > \theta$, $P(\hat{\theta}_{ML} \le x) = 1$.

d) Use this to show that
\[
n \, \frac{\theta - \hat{\theta}_{ML}}{\theta} \hookrightarrow U,
\]
where $U$ follows an exponential distribution with parameter 1. Hint: to prove this, you need to use the definition of convergence in distribution in your lecture notes.

e) Which estimator is the best: $\hat{\theta}_{MM}$ or $\hat{\theta}_{ML}$?

f) Illustrate this through a Monte-Carlo study. Draw 1000 iid realizations of variables following a uniform distribution on $[0, 1]$ in Stata (you need to use the uniform() command), compute $\hat{\theta}_{MM}$ and $\hat{\theta}_{ML}$. What is the value of $\theta$ in this example? Which estimator is the closest to $\theta$?

g) Let $t_x$ denote the $x$th quantile of the exp(1) distribution. Show that
\[
IC(\alpha) = \left[ \hat{\theta}_{ML}, \; \hat{\theta}_{ML} + \hat{\theta}_{ML} \frac{t_{1-\alpha}}{n} \right]
\]
is a confidence interval for $\theta$ with asymptotic coverage $1 - \alpha$.

¹ We call this a parametric model, because the distribution of the data is fully known up to something, $\theta$, which is a parameter, a one dimensional object. When you just say: the $(Y_i)$ are iid, the distribution of the data is known up to $F$, the cdf of the $(Y_i)$. $F$ can be any element of an infinite dimensional set, the set of all possible cdf. We call such models non-parametric models.

Exercise 7: Roy selection model (1951), and randomized experiments.

a) Try to prove the following theorem:

Theorem 0.0.1 (Using uniforms to generate other continuous distributions) Let $F$ denote a strictly increasing cdf. If $U$ follows the uniform $[0, 1]$ distribution, then the cdf of $F^{-1}(U)$ is $F$.

Hint: the cdf of $F^{-1}(U)$ is the function $G(x) = P(F^{-1}(U) \le x)$. You need to show that $G(x) = F(x)$.
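The statement of Theorem 0.0.1 can be checked numerically before proving it. The following is only a sketch, written in Python rather than Stata, and it assumes one particular choice of $F$: the exp(1) cdf $F(x) = 1 - e^{-x}$, which is strictly increasing on $[0, \infty)$ and has inverse $F^{-1}(u) = -\ln(1 - u)$. The function names, the seed, and the number of draws are hypothetical choices made for this illustration. The code draws uniforms, applies $F^{-1}$, and compares the empirical cdf $G$ of the transformed draws with $F$ at a few points.

```python
import math
import random


def exp1_cdf(x):
    """F(x) = 1 - exp(-x) for x >= 0 (and 0 below): the exp(1) cdf assumed here."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def exp1_inverse_cdf(u):
    """F^{-1}(u) = -ln(1 - u), defined for u in [0, 1)."""
    return -math.log(1.0 - u)


def empirical_cdf(sample, x):
    """Share of the sample that is <= x: an estimate of G(x) = P(F^{-1}(U) <= x)."""
    return sum(1 for value in sample if value <= x) / len(sample)


random.seed(0)     # arbitrary seed, only for reproducibility
n_draws = 100_000  # hypothetical sample size for this illustration

# random.random() returns a uniform draw in [0, 1), so 1 - u is never 0.
draws = [exp1_inverse_cdf(random.random()) for _ in range(n_draws)]

# Compare the empirical cdf of F^{-1}(U) with F at a few points.
for x in (0.5, 1.0, 2.0):
    print(f"x = {x}: empirical G(x) = {empirical_cdf(draws, x):.3f}, "
          f"F(x) = {exp1_cdf(x):.3f}")
```

A numerical match of this kind is not a proof; it only suggests what the argument for $G(x) = F(x)$ has to deliver.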
b) The inverse of the cdf of a random variable following the exp($\lambda$) distribution is $x \mapsto -\frac{1}{\lambda} \ln(1 - x)$. It follows from the previous theorem that if $U$ follows a uniform distribution on $(0, 1)$, $-\frac{1}{\lambda} \ln(1 - U)$ follows an exp($\lambda$) distribution. Use this to generate in Stata a data base with a first variable containing 1000 draws of the exp(1) distribution (we will call this variable $Y_0$), and a second variable containing 1000 draws of the exp(0.8) distribution (we will call this variable $Y_1$). The command to generate draws from uniform distributions in Stata is uniform().

Say that the 1000 observations in your data are 1000 unemployed people, $Y_0$ is the monthly wage (in thousand pounds) they will have 6 months from now if they do not participate in a training program, and $Y_1$ is the wage they will get if they do participate. We would like to measure the effectiveness of this training program, and to do this the indicator we use is $E(Y_1 - Y_0)$.

c) In this exercise, we know the probability distribution of $Y_0$ and $Y_1$: $Y_0$ follows an exp(1) distribution, and $Y_1$ follows an exp(0.8) distribution. Use this, and standard properties of exponential distributions, to show that $E(Y_1 - Y_0) = 0.25$. What is the effect of this program on participants' monthly wages?

We are first going to assume that the unemployed self-select into the training program, and that the decision rule they use is the following one: $D_s = \mathbf{1}\{Y_1 - Y_0 > 0.1\}$, where $D_s$ is equal to 1 if the unemployed chooses to participate.

d) Give an interpretation to this decision rule from the perspective of economic theory. What do $Y_1 - Y_0$ and 0.1 represent?

e) This simple decision rule is called the Roy selection model. On which assumption does it rely? Any idea to come up with a more realistic decision rule?

f) Generate the $D_s$ variable in the data (you just need to compute a dummy equal to 1 when $Y_1 - Y_0 > 0.1$, and to 0 otherwise). Generate also $Y$, the observed outcome for each observation. $Y = Y_1$ for people with $D_s = 1$ and $Y = Y_0$ for people with $D_s = 0$. Compute the sample mean of $Y$ among people with $D_s = 1$ minus the sample mean of $Y$ among people with $D_s = 0$. That's an estimator for $E(Y | D_s = 1) - E(Y | D_s = 0)$, the naive measure of the effect of the treatment we mentioned in the lectures. Is this estimator close to the average treatment effect, 0.25? Can you explain why this is the case?

Now, assume that for each unemployed you toss a fair coin, compel her to follow the program if she gets heads, and compel her not to follow it if she gets tails. The variable $D_r$ denotes the result of that lottery: it is equal to 1 if the unemployed gets heads.

g) Generate the $D_r$ variable in the data (you can create a dummy equal to 1 if uniform() $\le 1/2$). Compute the sample mean of $Y$ among people with $D_r = 1$ minus the sample mean of $Y$ among people with $D_r = 0$. Is the value of the estimator close to $E(Y_1 - Y_0)$? Compare the sample mean of $Y_0$ among people with $D_r = 1$ to the sample mean of $Y_0$ among people with $D_r = 0$. Compare the sample mean of $Y_1$ among people with $D_r = 1$ to the sample mean of $Y_1$ among people with $D_r = 0$. Why do those two comparisons illustrate that randomized experiments cancel out selection bias?

h) The randomized experiment allows us to measure $E(Y_1 - Y_0)$, up to some statistical uncertainty. In this particular context, explain where the uncertainty comes from.

i) Generate 300 samples of 1000 realizations of random variables following the same distribution as $Y_0$ and $Y_1$, compute for each of them the 90% confidence interval for $E(Y_1 - Y_0)$ using the formula in Theorem 2.6.1 in the notes, and compute the percentage of times when $E(Y_1 - Y_0)$ does not lie in the confidence interval. According to the theory, which percentage of time should $E(Y_1 - Y_0)$ lie in its confidence interval? Does what you find confirm the theory? Run the same exercise again with only 100 observations, and with only 30 observations.
Is asymptotia very far?

j) The confidence interval we derived in the previous question relies on Theorem 2.6.1, which assumes that $V(Y_0) = V(Y_1)$. Is this assumption satisfied here? Does this seem to matter?

k) Use Theorem 2.6.2 in the notes to show that with $n = 1000$, $\alpha = 0.05$, and $\beta = 0.8$, the MDE of this experiment is 0.18 standard deviation of $Y$. Use the ANOVA formula ($V(Y) = E(V(Y | D_r)) + V(E(Y | D_r))$) to compute the standard deviation of $Y$, and show that the true effect is equal to 0.22 standard deviation of $Y$. Is this a well designed experiment?

l) Assume we can only have 653 participants in the experiment. Use Theorem 2.6.2 in the notes to show that with $n = 653$, $\alpha = 0.05$, and $\beta = 0.8$, the MDE of this experiment is 0.22 standard deviation of $Y$, which is exactly equal to the true effect of the experiment. Generate 300 pairs of variables following the same distribution as $Y_0$ and $Y_1$, compute for each of them the 95% confidence interval for $E(Y_1 - Y_0)$, and compute the percentage of times when 0 lies in the confidence interval. According to the theory, which percentage of times should 0 lie within the confidence interval of $E(Y_1 - Y_0)$? Does the normal approximation work fine for those exponentially distributed data?

m) Write me a 10-line email summarizing your results: How do your results illustrate the fact that randomized experiments are a good tool to measure the effect of a treatment? How do they illustrate that there is still statistical uncertainty in the results of a randomized experiment?
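The contrast between self-selection (question f of Exercise 7) and random assignment (question g) can be sketched in a few lines. This is an illustration under stated assumptions, not the Stata solution the exercise asks for: it is written in Python, it uses a sample of 100,000 observations instead of 1000 to keep sampling noise small, and every name in it (`draw_exponential`, `difference_in_means`, the seed) is a hypothetical choice. Exponential draws are built from uniforms with the inverse cdf $-\frac{1}{\lambda} \ln(1 - U)$ given in question b).

```python
import math
import random

random.seed(2024)  # arbitrary seed, only for reproducibility
n = 100_000        # assumption: larger than the 1000 used in the exercise


def draw_exponential(rate):
    """One exp(rate) draw built from a uniform draw via -ln(1 - U) / rate."""
    return -math.log(1.0 - random.random()) / rate


# Potential outcomes: Y0 ~ exp(1) without training, Y1 ~ exp(0.8) with training.
y0 = [draw_exponential(1.0) for _ in range(n)]
y1 = [draw_exponential(0.8) for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


def difference_in_means(d):
    """Mean of observed Y among D = 1 minus mean of observed Y among D = 0.

    The observed outcome is Y = Y1 when D = 1 and Y = Y0 when D = 0.
    """
    treated = [b for b, di in zip(y1, d) if di == 1]
    untreated = [a for a, di in zip(y0, d) if di == 0]
    return mean(treated) - mean(untreated)


# Roy decision rule: participate when the gain exceeds 0.1.
d_s = [1 if b - a > 0.1 else 0 for a, b in zip(y0, y1)]
# Fair coin toss: participate when a uniform draw is at most 1/2.
d_r = [1 if random.random() <= 0.5 else 0 for _ in range(n)]

naive_self_selected = difference_in_means(d_s)
naive_randomized = difference_in_means(d_r)
sample_ate = mean(y1) - mean(y0)  # infeasible in practice: uses both potential outcomes

print(f"difference in means under self-selection: {naive_self_selected:.3f}")
print(f"difference in means under randomization:  {naive_randomized:.3f}")
print(f"sample average of Y1 - Y0:                {sample_ate:.3f}")
```

With draws of this kind, the difference in means under randomization should stay close to $E(Y_1 - Y_0) = 0.25$, while the one under self-selection should come out noticeably larger. Rerunning with `n = 1000` and different seeds gives a feel for the statistical uncertainty discussed in question h).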