1. For the regression line \(y_i = a + e_i\) (where \(y_i\) = earnwke, weekly earnings), derive the OLS estimator.

Finding a least squares estimator takes several steps.

0) Write down the model.

\[ y_i = a + e_i \]

1) Write down an expression for e-hat in terms of the model's parameters, then use that to get an expression for the sum of squared errors.

\[ \hat{e}_i = y_i - \hat{a} \]
\[ f(\hat{a}) = \sum_i \hat{e}_i^2 = \sum_i (y_i - \hat{a})^2 \]

2) Take the derivative of this expression with respect to each of the model's parameters, and set it equal to zero. You do this to minimize the expression.[1] It is a least squares estimator because it minimizes (makes "least") the sum of squares of the errors. In this case, the "model's parameters" consist of just the a-hat you're trying to estimate.

\[
\begin{aligned}
\frac{\partial f(\hat{a})}{\partial \hat{a}}
&= \frac{\partial}{\partial \hat{a}} \sum_i \hat{e}_i^2 \\
&= \sum_i \frac{\partial \hat{e}_i^2}{\partial \hat{a}} && \text{[derivative of sum = sum of derivatives]} \\
&= \sum_i \frac{\partial \hat{e}_i^2}{\partial \hat{e}_i} \frac{\partial \hat{e}_i}{\partial \hat{a}} && \text{[chain rule]} \\
&= \sum_i 2\hat{e}_i \frac{\partial \hat{e}_i}{\partial \hat{a}} && \text{[taking derivative of first part]} \\
&= \sum_i 2 (y_i - \hat{a})(-1) && \text{[substituting in definition of e-hat, taking derivative of second part]} \\
&= -2 \sum_i (y_i - \hat{a}) = 0 && \text{[set it equal to zero!]}
\end{aligned}
\]

Note that the condition that the errors sum to zero, \(\sum_i \hat{e}_i = 0\), comes from taking the derivative and setting it equal to zero![2]

[1] Why? Because the derivative gives the slope of a function. Where that slope is equal to zero, the function reaches a minimum (or a maximum!). Draw a picture to see this.
[2] In fact, the least squares estimator for any model with an intercept, a, will have the property that the errors sum to zero, because of this first order condition.

3) Solve for your parameter: a-hat!

\[
\begin{aligned}
\sum_i (y_i - \hat{a}) &= 0 \\
\sum_i y_i - \sum_i \hat{a} &= 0 \\
\sum_i y_i &= N\hat{a} \\
\hat{a} &= \frac{1}{N} \sum_i y_i = \bar{y}
\end{aligned}
\]

In other words, a-hat is y-bar! The least squares estimator of the mean is just y-bar.

Use that estimator (which we will pretend is the population estimator[3]) and the data in ps2_01.xls to calculate a-hat. If you do that in Excel you get a-hat = 230, just the average of all the earnings observations.

You are now told that the population standard deviation is \(\sigma\) (= 97.86).
Calculate the probability that:
i) Someone earns more than $400
ii) Someone earns $200-350
iii) Someone earns less than $182

For this problem, we pretend we are drawing from a population that is normally distributed with a mean \(\mu\) of a-hat (y-bar) and variance \(\sigma^2\). Then we can use the fact that

\[ y \sim N(\mu, \sigma^2) \quad\Rightarrow\quad z = \frac{y - \mu}{\sigma} \sim N(0, 1) \]

so we can look things up in a z (standard normal) table.

[3] It is the population estimator if the data we are using is the entire population!

i) Calculate P(y>400)

\[
\begin{aligned}
P(y > 400) &= P\left(\frac{y - 230}{97.86} > \frac{400 - 230}{97.86}\right) \\
&= P(z > 1.74) \\
&= 0.5 - P(0 < z < 1.74) \\
&= 0.5 - 0.4591 = 0.0409
\end{aligned}
\]

The key is that we have to arrange things so that we can look the number up in the table, which gives, for some number c, P(0<z<c). Some of the properties we can use are:

Property 1: Symmetry. P(z<-c) = P(z>c) and P(-c<z<0) = P(0<z<c)
Property 2: P(z>c) = 0.5 - P(0<z<c) (since P(z>0) = 0.5)

Draw a picture; it helps.

ii) Calculate P(200<y<350)

\[
\begin{aligned}
P(200 < y < 350) &= P\left(\frac{200 - 230}{97.86} < \frac{y - 230}{97.86} < \frac{350 - 230}{97.86}\right) \\
&= P(-.31 < z < 1.23) \\
&= P(-.31 < z < 0) + P(0 < z < 1.23) \\
&= P(0 < z < .31) + P(0 < z < 1.23) \\
&= .1217 + .3907 = .5124
\end{aligned}
\]

iii) Calculate P(y<182)

\[
\begin{aligned}
P(y < 182) &= P\left(\frac{y - 230}{97.86} < \frac{182 - 230}{97.86}\right) \\
&= P(z < -.49) \\
&= P(z > .49) \\
&= 0.5 - P(0 < z < .49) \\
&= 0.5 - .1879 = .3121
\end{aligned}
\]

2. For the data in ps2_02.xls, compute a population mean and variance for earnings. Using these figures, compute the probability that someone earns:
i) $200-$300
ii) Less than $500
iii) More than $700
iv) More than the mean wage
v) Between $550 and $650

The population mean is the expected value. It has the form

\[ \mu = E[Y] = \sum_i y_i p_i \]

where \(p_i = P(y_i)\), the marginal probability of the outcome \(y_i\). It's the sum of the outcomes multiplied by their probabilities. To do this, we must first calculate the probability associated with each income band in Excel by dividing the number of people in each band by the total number of people (7912). You then multiply each of these by the midpoint of the income band, taken to be the "outcome", and then sum up. You end up with E[Y] = 355.3. See the spreadsheet for the calculation.
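The band-midpoint calculation can also be sketched outside the spreadsheet. The following is only an illustration: the function name `grouped_mean` and the three bands are hypothetical placeholders, not the ps2_02.xls data, so the printed number is not the 355.3 reported above.

```python
def grouped_mean(midpoints, counts):
    """Expected value E[Y] = sum(y_i * p_i), with p_i = count_i / total."""
    total = sum(counts)
    # Probability of each band = people in the band / total people
    probs = [c / total for c in counts]
    # Weight each band midpoint (the "outcome") by its probability and sum
    return sum(y * p for y, p in zip(midpoints, probs))


# Hypothetical income bands (NOT the ps2_02.xls data): midpoints and head counts
midpoints = [100.0, 300.0, 500.0]
counts = [2, 5, 3]
print(grouped_mean(midpoints, counts))
```

With the real data, `midpoints` and `counts` would hold one entry per income band, and `sum(counts)` would equal 7912.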
Once we have that, we can calculate the variance of y according to the formula:

\[ \sigma^2 = V(y) = \sum_i (y_i - E[y])^2 p_i \]

See the spreadsheet, where you get \(\sigma^2 = 25578\) or \(\sigma = 159.9\).

Once you've got these, you can calculate the probabilities the same way as in the last problem, by assuming \(y \sim N(\mu, \sigma^2) = N(355.3, 25578)\).

i) P(200<y<300) = P(-.97<z<-.35) = P(.35<z<.97) = P(0<z<.97) - P(0<z<.35) = .3340 - .1368 = .1972
ii) P(y<500) = P(z<.90) = 0.5 + P(0<z<.90) = 0.5 + 0.3159 = .8159
iii) P(y>700) = P(z>2.16) = 0.5 - P(0<z<2.16) = 0.5 - .4846 = .0154
iv) The probability that you're more than the mean is 0.5 in a symmetric distribution!
v) P(550<y<650) = P(1.22<z<1.84) = P(0<z<1.84) - P(0<z<1.22) = .4671 - .3888 = .0783

Also, I suppose for question 3, part i) you'd have to use this approach as well:

i) P(100<y<200) = P(-1.60<z<-.97) = P(.97<z<1.60) = P(0<z<1.60) - P(0<z<.97) = .4452 - .3340 = .1112

(Also, see the spreadsheet for a calculation of P(0<z<199).)

3…For the rest, see spreadsheet.
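The table lookups in these answers can be cross-checked numerically. Here is a minimal sketch, assuming the same normal approximation and rounding each z-score to two decimals to mimic a printed z table. The helper names `phi`, `z_score` and `prob_between` are illustrative, not part of the problem set.

```python
import math


def phi(z):
    """Standard normal CDF, P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_score(y, mu, sigma):
    """Standardize y, rounded to 2 decimals as in a table lookup."""
    if math.isinf(y):
        return y  # an open-ended bound stays at +/- infinity
    return round((y - mu) / sigma, 2)


def prob_between(lo, hi, mu, sigma):
    """P(lo < y < hi) for y ~ N(mu, sigma^2)."""
    return phi(z_score(hi, mu, sigma)) - phi(z_score(lo, mu, sigma))


# Problem 1: mean 230, standard deviation 97.86
print(prob_between(400, math.inf, 230, 97.86))    # P(y > 400)
print(prob_between(200, 350, 230, 97.86))         # P(200 < y < 350)
print(prob_between(-math.inf, 182, 230, 97.86))   # P(y < 182)

# Problem 2: mean 355.3, standard deviation 159.9
print(prob_between(200, 300, 355.3, 159.9))       # P(200 < y < 300)
print(prob_between(550, 650, 355.3, 159.9))       # P(550 < y < 650)
```

Small differences from the hand-calculated answers in the fourth decimal place are expected, because the printed table values are themselves rounded.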