Solution PS2_1

1. For the regression line y_i = a + e_i (where y_i = earnwke, weekly earnings), derive the OLS estimator.
The problem of finding a least squares estimator has several steps.
0) Write down the model.
$$y_i = a + e_i$$
1) Write down an expression for e-hat in terms of the model’s parameters, then use that
to get an expression for the sum of squared errors.
$$\hat{e}_i = y_i - \hat{a}$$
$$f(\hat{a}) = \sum_i \hat{e}_i^2 = \sum_i (y_i - \hat{a})^2$$
2) Take the derivative of this expression with respect to each of the model’s parameters,
and set it equal to zero. You do this to minimize the expression.[1] It is a least
squares estimator because it minimizes (makes “least”) the sum of squares of the
errors. In this case, the “model’s parameters” consist of just the a-hat you’re trying
to estimate.
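$$
\begin{aligned}
\frac{\partial f(\hat{a})}{\partial \hat{a}}
 &= \sum_i \frac{\partial \hat{e}_i^2}{\partial \hat{a}} && \text{[derivative of sum = sum of derivatives]} \\
 &= \sum_i \frac{\partial \hat{e}_i^2}{\partial \hat{e}_i} \frac{\partial \hat{e}_i}{\partial \hat{a}} && \text{[chain rule]} \\
 &= \sum_i 2\hat{e}_i \frac{\partial \hat{e}_i}{\partial \hat{a}} && \text{[taking derivative of first part]} \\
 &= \sum_i 2 (y_i - \hat{a})(-1) && \text{[substituting in definition of e-hat, taking derivative of second part]} \\
 &= -2 \sum_i (y_i - \hat{a}) = 0 && \text{[set it equal to zero!]}
\end{aligned}
$$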
Note that this first-order condition says that the sum of the errors equals zero; that result
comes directly from taking the derivative and setting it equal to zero.[2]
[1] Why? Because the derivative gives the slope of a function. Where that slope is equal to zero, the function reaches a
minimum (or a maximum!). Draw a picture to see this.
[2] In fact, the least squares estimator for any model with an intercept, a, will have the property that the errors sum to
zero, because of this first order condition.
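3) Solve for your parameter – a-hat!
$$
\begin{aligned}
 & \sum_i (y_i - \hat{a}) = 0 \\
 \Rightarrow\ & \sum_i y_i - \sum_i \hat{a} = 0 \\
 \Rightarrow\ & \sum_i y_i = N\hat{a} \\
 \Rightarrow\ & \hat{a} = \frac{1}{N} \sum_i y_i = \bar{y}
\end{aligned}
$$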
In other words, a-hat is y-bar! The least squares estimator of the mean is just y-bar.
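As a quick numerical check of the derivation, the sketch below evaluates the sum of squared errors on a grid of candidate values and confirms that y-bar gives the smallest one. The earnings figures are made up for illustration (they are not the contents of ps2_01.xls), and the names `sse`, `candidates`, and `best` are hypothetical.

```python
# Hypothetical weekly earnings; NOT the values in ps2_01.xls.
y = [120.0, 180.0, 250.0, 310.0, 400.0]


def sse(a_hat, ys):
    """Sum of squared errors f(a_hat) = sum over i of (y_i - a_hat)^2."""
    return sum((yi - a_hat) ** 2 for yi in ys)


# The closed-form answer from step 3: a-hat = y-bar.
y_bar = sum(y) / len(y)

# Evaluate f on a grid of candidates centred on y-bar (step size 0.5).
candidates = [y_bar + 0.5 * k for k in range(-40, 41)]
best = min(candidates, key=lambda a: sse(a, y))

# The first order condition says the residuals sum to zero at a-hat.
residual_sum = sum(yi - y_bar for yi in y)

print("y-bar:", y_bar)
print("grid minimizer:", best)
print("sum of residuals at y-bar:", residual_sum)
```

A grid search is used only because it makes the "least" in least squares visible; in practice you would just compute the average.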
Use that estimator -- which we will pretend is the population estimator[3] -- and the data in ps2_01.xls
to calculate a-hat.
If you do that in excel you get a-hat = 230, just the average of all the earnings
observations.
You are now told that the population standard deviation is σ (=97.86). Calculate the probability that
i) Someone earns more than $400
ii) Someone earns $200-350
iii) Someone earns less than $182
For this problem, we pretend we are drawing from a population that is normally
distributed with a mean of a-hat (y-bar) and variance σ². Then we can use the fact that…
$$y \sim N(\mu, \sigma^2) \;\Rightarrow\; z = \frac{y - \mu}{\sigma} \sim N(0, 1)$$
…so we can look things up in a z (standard normal) table.
[3] It is the population estimator if the data we are using is the entire population!
i) Calculate P(y>400)
$$
\begin{aligned}
P(y > 400) &= P\left(\frac{y - 230}{97.86} > \frac{400 - 230}{97.86}\right) \\
 &= P(z > 1.74) \\
 &= 0.5 - P(0 < z < 1.74) \\
 &= 0.5 - 0.4591 \\
 &= 0.0409
\end{aligned}
$$
The key is we have to arrange things so that we can look the number up in the table,
which gives, for some number c, P(0<z<c). Some of the properties we can use are:
Property 1: Symmetry P(z<-c) = P(z>c) and P(-c<z<0)=P(0<z<c)
Property 2: P(z>c) = 0.5-P(0<z<c) (Since P(z>0)=0.5.)
Draw a picture – it helps.
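The table lookup and the two properties can be sketched in Python. This is only an illustration, not part of the original solution: the function names `table_area`, `upper_tail`, and `lower_tail` are hypothetical, and `math.erf` stands in for the printed table through the identity P(0<z<c) = 0.5·erf(c/√2).

```python
import math


def table_area(c):
    """P(0 < z < c) for standard normal z and c >= 0: the number a z-table lists."""
    return 0.5 * math.erf(c / math.sqrt(2.0))


def upper_tail(c):
    """Property 2: P(z > c) = 0.5 - P(0 < z < c), for c >= 0."""
    return 0.5 - table_area(c)


def lower_tail(c):
    """Property 1 (symmetry): P(z < -c) = P(z > c), for c >= 0."""
    return upper_tail(c)


# Part i): standardize 400 with mean 230 and sigma 97.86, rounding z to two
# decimals the way a printed table forces you to.
z = round((400 - 230) / 97.86, 2)
p_more_than_400 = upper_tail(z)

print("z =", z)
print("P(0 < z < c) =", round(table_area(z), 4))
print("P(y > 400)   =", round(p_more_than_400, 4))
```

Rounding z before the lookup is what makes the result line up with the table-based figures in the text; skipping the rounding gives a slightly different fourth decimal.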
ii) Calculate P(200<y<350)
$$
\begin{aligned}
P(200 < y < 350) &= P\left(\frac{200 - 230}{97.86} < \frac{y - 230}{97.86} < \frac{350 - 230}{97.86}\right) \\
 &= P(-.31 < z < 1.23) \\
 &= P(-.31 < z < 0) + P(0 < z < 1.23) \\
 &= P(0 < z < .31) + P(0 < z < 1.23) \\
 &= .1217 + .3907 \\
 &= .5124
\end{aligned}
$$
iii) Calculate P(y<182)
$$
\begin{aligned}
P(y < 182) &= P\left(\frac{y - 230}{97.86} < \frac{182 - 230}{97.86}\right) \\
 &= P(z < -.49) \\
 &= P(z > .49) \\
 &= 0.5 - P(0 < z < .49) \\
 &= 0.5 - .1879 \\
 &= .3121
\end{aligned}
$$
2. For the data in ps2_02.xls, compute a population mean and variance for earnings. Using these figures,
compute the probability that someone earns:
i) $200-$300
ii) Less than $500
iii) More than $700
iv) More than the mean wage
v) Between $550 and $650
The population mean is the expected value. It has the form
$$\mu = E[Y] = \sum_i y_i p_i$$
…where p_i = P(y_i), the marginal probability of this outcome y_i. It’s the sum of the
outcomes multiplied by their probabilities. To do this, we must first calculate the
probabilities associated with each income band in excel by dividing the number of
people in each band by the total number of people (7912). You then multiply each of
these by the midpoint of the income band (taken to be the “outcome”) and then sum
up. You end up with E[Y]=355.3. See the spreadsheet for the calculation.
Once we have that, we can calculate the variance of y according to the formula:
$$\sigma^2 = V(y) = \sum_i (y_i - E[y])^2 p_i$$
See the spreadsheet, where you get σ² = 25578 or σ = 159.9.
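The grouped-data calculation of the mean and variance can be sketched as follows. The income bands and head counts are invented for illustration; they are not the ps2_02.xls data, so the results will not match 355.3 or 25578. The variable names are hypothetical too.

```python
# Hypothetical income bands as (lower bound, upper bound, number of people).
# These are NOT the figures in ps2_02.xls.
bands = [
    (100, 200, 20),
    (200, 300, 50),
    (300, 400, 30),
]

total = sum(count for _, _, count in bands)

# The midpoint of each band is taken to be the "outcome" y_i.
midpoints = [(lo + hi) / 2 for lo, hi, _ in bands]

# p_i = share of people in band i.
probs = [count / total for _, _, count in bands]

# E[Y] = sum of y_i * p_i
mean = sum(y * p for y, p in zip(midpoints, probs))

# V(Y) = sum of (y_i - E[Y])^2 * p_i
variance = sum((y - mean) ** 2 * p for y, p in zip(midpoints, probs))
std_dev = variance ** 0.5

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", std_dev)
```

Using midpoints treats everyone in a band as earning the same amount, which is an approximation built into this approach.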
Once you’ve got these, you can calculate the probabilities the same way as the last
problem, by assuming y~N(μ, σ²)=N(355.3, 25578).
i) P(200<y<300) = P(-.97<z<-.35) = P(.35<z<.97) = P(0<z<.97)-P(0<z<.35) = .3340-.1368 = .1972
ii) P(y<500) = P(z<.90) = 0.5+P(0<z<.90) = 0.5+0.3159 = .8159
iii) P(y>700) = P(z>2.16) = 0.5-P(0<z<2.16) = 0.5-.4846 = .0154
iv) The probability that you earn more than the mean is 0.5 in a symmetric distribution!
v) P(550<y<650) = P(1.22<z<1.84) = P(0<z<1.84)-P(0<z<1.22) = .4671-.3888 = .0783
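These five answers can be reproduced with a short script. It is a sketch that assumes the mean (355.3) and standard deviation (159.9) reported in the text, and it rounds each z-score to two decimals to mimic a printed table. The helper names `z_score` and `prob_between` are hypothetical.

```python
from statistics import NormalDist

MU, SIGMA = 355.3, 159.9      # mean and standard deviation reported in the text
std_normal = NormalDist()     # the standard normal, N(0, 1)


def z_score(y):
    """Standardize y, rounded to two decimals as a printed z-table requires."""
    return round((y - MU) / SIGMA, 2)


def prob_between(lo, hi):
    """P(lo < y < hi), computed from table-style rounded z-scores."""
    return std_normal.cdf(z_score(hi)) - std_normal.cdf(z_score(lo))


answers = {
    "200 < y < 300": prob_between(200, 300),
    "y < 500": std_normal.cdf(z_score(500)),
    "y > 700": 1 - std_normal.cdf(z_score(700)),
    "y > mean": 1 - std_normal.cdf(z_score(MU)),
    "550 < y < 650": prob_between(550, 650),
}

for label, p in answers.items():
    print(f"P({label}) = {p:.4f}")
```

Any small gap between these values and the hand calculation comes from the four-decimal rounding of the table entries.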
Also, I suppose for question 3, part i) you’d have to use this approach as well:
i) P(100<y<200) = P(-1.60<z<-.97) = P(.97<z<1.60) = P(0<z<1.60)-P(0<z<.97) = .4452-.3340 = .1112
(Also, see the spreadsheet for a calculation of P(0<z<199).)
3…For the rest, see spreadsheet.