
Econ 322, Lecture 7, Part 2

[Page 136] So we can use regression analysis to explain why some observations have high values of Y and some observations have low values of Y. In fact, that's the purpose of regression analysis: to explain what's generating these differences in Y and why. Why do some individuals earn high wages whereas other individuals earn low wages? We can think of this role that regression analysis has in explaining differences in Y by breaking the differences in Y down into three components, the first being just the total differences in Y. So we think of the total deviation from the mean, and again we're going to square it when we're aggregating, because if we don't square it, the deviations will sum to 0. So we define the sum of total deviation squared, SST, as the sum from I = 1 to N of (Y I minus the mean, Y bar) squared.
Now when we're doing regression analysis, one thing we want to understand is: how much of this total deviation can be explained by the characteristics of the individuals in the sample? In simple regression analysis, we're just looking at one characteristic. So how much of Y can X explain? What is the magnitude of that explanatory relationship? Again, I'm somewhat assuming causality there, which we need to be careful about, and it certainly wouldn't hold in most simple regression analyses. But this second term, the sum of explained deviation squared, is really looking at how much of these differences in Y, how much of these differences in wages, can be explained by the explanatory variables that we're using, by these X variables, or, in the case of a simple regression analysis, by that single X variable.
So the second term, the sum of explained deviation squared, or SSE, is equal to the sum from I = 1 to N of (the predicted value for each I minus the mean of Y) squared, that is, (Y hat I minus Y bar) squared, where again this Y hat I is the predicted value of Y given our estimators beta hat 0 and beta hat 1. OK, the final term is the part of the differences in Y that we can't explain over and above controlling for the characteristics that we think are driving differences in wages, say for example education levels. There are differences in wages that seem to be just random, that aren't explained by any of the variables that we have in our regression, or, in the case of a simple regression analysis, by the single variable we have in our regression. By the way, if other variables explain Y but we don't include them in our regression, so we just have Y as a function of the single X variable, then all of those other characteristics are going into this unexplained part. If we're not controlling for them, they become part of the residual term, part of the unexplained deviations squared, but we'll talk more about that later.
At any rate, the third component is SSR, the sum of the unexplained deviation squared, and this R comes from the residual, right, which is our estimate of the error term. It's our estimate of what part is not explained given our estimators. So the SSR is equal to the sum from I = 1 to N of (Y I, the actual data point, minus Y hat I, the predicted data point) squared, and you know that this term is the residual squared. So the sum of squared residuals is where this is coming from: SSR is equal to the sum from I = 1 to N of the residual squared, that is, our predictions of the error term, squared.
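Collecting the three definitions in one place, in LaTeX (my transcription of the spoken formulas):

```latex
\mathrm{SST} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad
\mathrm{SSE} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad
\mathrm{SSR} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2.
```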
[Page 137] So we can show that the sum of squared total deviation in Y is equal to the sum of squared explained deviation plus the sum of squared unexplained deviation, or the sum of squared residuals. And I'll go through the proof step by step. We know that SST is equal to the sum from I = 1 to N of (Y I minus Y bar) squared. We can decompose this Y I minus Y bar into two terms, that is, (Y I minus Y hat I) plus (Y hat I minus Y bar). Now, why do we want to separate it into these two terms? Well, because the first term is the residual, and we're looking to get the sum of squared residuals out of this, so getting a residual there is going to help us with that; and the second term is the explained difference, or the explained deviation of an individual's predicted Y from the mean Y. OK, so that's our first step in getting toward these two separate terms.
So let's simplify this. We know that Y I minus Y hat I is equal to the residual, U hat I. So this expression becomes the sum from I = 1 to N of (U hat I plus Y hat I minus Y bar), all squared, and I'll just write down the next step in between here so that we have a nice easy, slow day today. OK, so I'm sliding that over. And we know that if we expand this squared term, we get U hat I squared, plus 2 times U hat I times (Y hat I minus Y bar), and I've left space there because we get two of those cross terms when we multiply out the square, plus the last term, the square of the second term in the brackets, which is (Y hat I minus Y bar) squared.
OK, now we have, and I suppose I should put square brackets around this, all of this summed from I = 1 to N. I'm not going to bother rewriting the I = 1 to N each time; I'll just write the summation sign. There we go. OK, so zoom back out again and slide this over, and away we go. Again, because all the components in this expression are additive, we can work through it and get three separate summations. So SST then equals the sum from I = 1 to N of U hat I squared, plus the sum from I = 1 to N of 2 times U hat I times (Y hat I minus Y bar), plus the sum from I = 1 to N of (Y hat I minus Y bar) squared. Now, the first term, as you know, is the sum of squared residuals. The last term, by definition, is the sum of squared explained deviations from the mean. What about this term in the middle, and I'll use a different color here, the 2 times U hat I times (Y hat I minus Y bar)? How do we know that the sum of that from I = 1 to N is equal to 0? Well, because by construction, the sample covariance between the residuals and the fitted values is 0.
So when we construct our OLS estimators, we construct them such that the sample covariance between X and U hat (sorry, my stylus is acting up again) is equal to 0, right? You'll recall that from the previous slides. Therefore, knowing the properties of covariance operators, and I'm just going to give myself some more space here, we can write down the covariance between (beta hat 0 plus beta hat 1 times X minus Y bar) and U hat. In this expression, X is the only variable; everything else is a constant. So knowing the properties of covariance operators, we know that the covariance between this expression in X and U hat is also equal to 0. So by construction, the sample covariance between the residuals and the fitted values (these are the Y hat I, and then we're subtracting out the constant Y bar) is 0. That means the covariance between the two factors in that middle term is 0, and so the sum of that middle term is equal to 0.
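As a quick numerical check of this decomposition, here is a minimal sketch of my own (not from the slides), with made-up data and assuming only numpy:

```python
import numpy as np

# A quick numerical check of SST = SSE + SSR -- my own sketch, with
# made-up data, not something from the lecture slides.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(8, 20, n)                # say, years of education (invented)
y = 1.0 + 0.5 * x + rng.normal(0, 2, n)  # observed Y = line + random noise

# OLS slope and intercept, computed from the formulas in the lecture
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x                      # fitted values
u_hat = y - y_hat                        # residuals

sst = np.sum((y - y.mean()) ** 2)        # total deviation squared
sse = np.sum((y_hat - y.mean()) ** 2)    # explained deviation squared
ssr = np.sum(u_hat ** 2)                 # sum of squared residuals

print(sst, sse + ssr)               # agree up to floating-point rounding
print(np.cov(u_hat, y_hat)[0, 1])   # sample covariance of residuals and
                                    # fitted values: zero by construction
```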
[Page 138] So in the next few slides we'll talk about some of the properties of the OLS estimators, the properties that are important for us when we're doing an econometric analysis. The first property is that the OLS estimators are unbiased. In terms of the definition, what does that mean? It means that the expectation of the estimator is equal to the true population parameter. So we will prove this by showing that the expectation of our OLS estimator beta hat 1 is equal to the true population parameter. Let's zoom in here and adjust this and write down, actually, maybe I'll just put it right in here, what beta hat 1 actually is, so we remember it. I need more space than this; even so, it's going to be quite small when I zoom out again. OK, so what is our beta hat 1? Our beta hat 1, remember, is equal to the sum from I = 1 to N (though I'm not going to keep writing that down) of (X I minus X bar), that is, X's deviations from the mean, times Y's deviations from the mean, that is, (Y I minus Y bar), divided by the sum from I = 1 to N of X's deviations from the mean, squared. OK, so I'll zoom out and recenter and we'll have that for reference. There we go.
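For reference, that estimator written out (the same formula as on the slide):

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.
```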
OK, so how do we prove that the expectation of this beta hat 1 is equal to the true population parameter beta 1? Well, because this is a little bit messy, let's start with just the numerator. What we're going to do is expand the numerator and then simplify. The numerator is the sum from I = 1 to N of X's deviations from the mean times Y's deviations from their mean. So we expand through this expression: the X I times the Y I gives the first term; then we've got the X I times the Y bar, and it's a positive times a negative, so it enters negatively; then the X bar times the Y I, again entering negatively; and finally the X bar times the Y bar, where a negative times a negative is a positive, so that's positive X bar times Y bar. Now what we're going to do is collect some like terms. We've got X I times Y I and X bar times Y I, so we can collect those, and what we have is Y I times (X I minus X bar); then we subtract the second term, X I times Y bar; and finally we add in X bar times Y bar.
OK, now again, these are additive and subtractive terms inside the summation operator, so by the properties of summation operators we can work through this and obtain three separate summations: the first term being the sum from I = 1 to N of X's deviations from the mean times Y I, minus the sum from I = 1 to N of X I times Y bar, plus the sum from I = 1 to N of X bar times Y bar. Now we can simplify these, right? Because here we have constants, and the sum of a constant from I = 1 to N is just N times that constant. X bar is a constant and Y bar is a constant, so X bar times Y bar is a constant, and this final term simplifies to N times X bar times Y bar. The middle term is a constant times a variable; the constant can be pulled out of the summation operator, and then we have Y bar times the sum from I = 1 to N of the X I. And whenever we have the sum of a single variable like that, it becomes N times its mean; we've already worked that out in previous slides, so we don't have to do it again. So this penultimate term is negative N times X bar times Y bar, and the first term remains as is. You should note that we now have negative N times X bar times Y bar plus N times X bar times Y bar, so those last two terms cancel out. And our numerator for our OLS estimator becomes the sum from I = 1 to N of (X I minus X bar) times Y I. So we can rewrite this beta hat 1 estimator, and I'll show you on the next slide: beta hat 1 is equal to the sum from I = 1 to N of (X I minus X bar) times Y I, which we just derived, over the sum from I = 1 to N of X's deviations from its mean, squared.
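To write out in LaTeX what we just did with the numerator:

```latex
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
  = \sum_{i=1}^{n} (X_i - \bar{X})Y_i \;-\; \bar{Y}\sum_{i=1}^{n} X_i \;+\; n\bar{X}\bar{Y}
  = \sum_{i=1}^{n} (X_i - \bar{X})Y_i \;-\; n\bar{X}\bar{Y} \;+\; n\bar{X}\bar{Y}
  = \sum_{i=1}^{n} (X_i - \bar{X})Y_i.
```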
[Page 139] OK, now we can plug in the true data generating process for Y (remember, this Y I is the observed value of Y) in order to get the following. Now why do we do this? Why do we want to plug in the fact that the underlying data generating process is that any observed Y I is equal to beta 0 plus beta 1 times X I plus some error term? And remember, these are unobserved error terms and unobserved population parameters. The reason we want to plug this in is that we have to take the expectation of this beta hat 1 and see what its value is in relation to the true population parameter. In order to do that, the true population parameter has to enter into our equation here, and the way we enter it is by noting that the observed value of Y is equal to the population regression line plus some unobserved error term. So we plug that in here, and then we have that beta hat 1 is equal to the sum from I = 1 to N of X's deviations from the mean times Y I, where Y I equals beta 0 plus beta 1 times X I plus the error term U I, all divided by the sum from I = 1 to N of X's deviations from its mean, squared.
Now again, we're going to expand and then simplify the numerator; we're still working on the numerator here. OK, so let's expand by multiplying the (X I minus X bar) through this term. So we have (X I minus X bar) times beta 0, (X I minus X bar) times beta 1 times X I, and (X I minus X bar) times U I. The first term, pulling the constant through the summation operator, gives us beta 0 times the sum from I = 1 to N of (X I minus X bar); the second term, pulling the beta 1 through the summation operator, is beta 1 times the sum from I = 1 to N of (X I minus X bar) times X I; and the final term is the sum from I = 1 to N of (X I minus X bar) times U I.
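So the numerator has become:

```latex
\sum_{i=1}^{n} (X_i - \bar{X})Y_i
  = \beta_0 \sum_{i=1}^{n} (X_i - \bar{X})
  \;+\; \beta_1 \sum_{i=1}^{n} (X_i - \bar{X})X_i
  \;+\; \sum_{i=1}^{n} (X_i - \bar{X})u_i.
```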
OK, so remember that the sum of deviations from the mean equals 0, that is, the sum from I = 1 to N of (X I minus X bar) equals 0. So this first sum equals 0, which means that beta 0 times this sum equals 0. The second thing to note is that the sum from I = 1 to N of (X I minus X bar) times X I is actually equal to the sum from I = 1 to N of (X I minus X bar) squared. That's the sum of squared total deviations of X, and I'll just draw a line between them: this term is equal to the sum of squared total deviations of X, which is also, by the way, what we have in our denominator. These are common manipulations that we do in simple mathematical examples, so you can work through that on your own; I won't go through it in detail. We'll just move on to the next slide.
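(For completeness, here is that step worked out; it just uses the fact that the deviations from the mean sum to 0:)

```latex
\sum_{i=1}^{n} (X_i - \bar{X})X_i
  = \sum_{i=1}^{n} (X_i - \bar{X})X_i - \bar{X}\underbrace{\sum_{i=1}^{n} (X_i - \bar{X})}_{=\,0}
  = \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})
  = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \mathrm{SST}_X.
```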
[Page 140] Our OLS estimator then becomes much, much simpler. Now that we've simplified the numerator, we have that beta hat 1 is equal to beta 1 times the sum of squared total deviations of X, plus the sum from I = 1 to N of (X I minus X bar) times U I, all divided by the sum of squared total deviations of X. OK, so we can separate this out into two terms; they both have the same denominator. So this is equal to beta 1, the true population parameter, times SST X divided by SST X, so those SST X's cancel out, plus the sum from I = 1 to N of (X I minus X bar) times U I divided by the sum of squared total deviations of X. And then you get to the next stage, which is simply SST X to the negative 1 times the sum from I = 1 to N of (X I minus X bar) times U I, while the first term is just beta 1 times 1, that is, beta 1.
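In other words:

```latex
\hat{\beta}_1 = \beta_1 + \mathrm{SST}_X^{-1} \sum_{i=1}^{n} (X_i - \bar{X})u_i,
\qquad \text{where } \mathrm{SST}_X = \sum_{i=1}^{n} (X_i - \bar{X})^2.
```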
OK, so we're getting somewhere, right? We've got our OLS estimator written in terms of our true population parameter and some other stuff. So in order to prove unbiasedness, remember, we have to prove that the expectation of this beta hat 1 is equal to beta 1. Let's see how we do that.
[Page 141] We take the expectation of beta hat 1 and we solve. So this is what we're aiming for, and it's always handy when you're doing math to know where you want to end up, at least approximately; if you want to do a proof, you've got to head in that direction. OK, so let's take the expectation of beta hat 1. Remember, our equation for beta hat 1 is given right here, so we take the expectation of the left-hand side and the expectation of the right-hand side, and we get that the expectation of beta hat 1 is equal to the expectation of (beta 1 plus SST X to the negative 1 times the sum from I = 1 to N of (X I minus X bar) times U I). Now, with the expectations operator, as with the summations operator, any additive terms can be split into two separate expectations. So the expectation of (beta 1 plus blah) is equal to the expectation of beta 1 plus the expectation of the blah. And the expectation of a constant is just that constant, so the expectation of beta 1 just gives us beta 1.
For the second half: SST X to the negative 1 is a constant (the sum of squared total deviations from the mean is a constant), so it can be pulled through the expectations operator. And we're doing additive operations inside the summation, so the expectations operator can be worked through it too: the expectation of the sum is equal to the sum of the expectations. So this second term becomes SST X to the negative 1 times the sum from I = 1 to N of the expectation of (X I minus X bar) times U I. Well, if the OLS assumption that the error term and the X are uncorrelated holds, that is, if the expectation of U conditional on X equals 0, if our error terms aren't more extreme for higher values of X or for lower values of X, and this is one of the assumptions we make with OLS estimation, then each of those expectations equals 0. So the sum is 0, and SST X to the negative 1 times 0 is 0, so this last term is equal to 0; let's just write that in there. And therefore the expectation of beta hat 1 is equal to beta 1. QED, quod erat demonstrandum. I like using this also because I did my PhD at Queen's, and the Queen's Economics Department called ourselves QED, quod erat demonstrandum, "as was to be shown" in Latin.
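To see this unbiasedness numerically, here is a small Monte Carlo sketch of my own (not from the slides), with made-up parameter values, where the zero-conditional-mean assumption holds by construction:

```python
import numpy as np

# Monte Carlo illustration of unbiasedness -- my own sketch with made-up
# parameter values, not from the slides. E[u|x] = 0 holds by construction
# because u is drawn independently of x.
rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 0.5
n, reps = 100, 10_000

estimates = np.empty(reps)
for r in range(reps):
    x = rng.uniform(8, 20, n)
    u = rng.normal(0, 2, n)       # error drawn independently of x
    y = beta0 + beta1 * x + u     # the true data generating process
    estimates[r] = (np.sum((x - x.mean()) * (y - y.mean()))
                    / np.sum((x - x.mean()) ** 2))

print(estimates.mean())  # close to beta1 = 0.5: the estimator is unbiased
```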
At any rate, if you're ever asked to do a proof on a term test or an exam, the proofs in this set of slides we've gone through are the sort of proof that I would expect. I wanted to mention one more thing before moving on to the next slide in the next lecture, and that is this assumption that X and U aren't correlated, that is, the assumption that the expected value of the error term conditional on X is equal to 0. This is an assumption that we're making with linear regression analysis, and it's an assumption for which we actually have to provide some sort of support: we have to convince our audience that it's a reasonable assumption to make, because if it doesn't hold, then we would expect our OLS estimators to be biased. We would expect the expectation of beta hat 1 not to equal the true population parameter. We will talk later in the course about what happens when this assumption is violated in specific ways and what our expectations for beta hat 1 are then, and we'll actually be able to show, in certain cases, what the direction of the bias would be if we know there's going to be a bias in beta hat 1.
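As a preview, and again just a sketch of my own with made-up numbers: if we rerun the same kind of simulation but build a positive correlation between X and U into the data generating process, the average of beta hat 1 no longer centers on the true beta 1:

```python
import numpy as np

# The same kind of simulation, but now u is built to be positively
# correlated with x, so E[u|x] != 0. Again my own sketch with made-up
# numbers. The average of beta hat 1 drifts away from the true beta1.
rng = np.random.default_rng(2)
beta0, beta1 = 1.0, 0.5
n, reps = 100, 10_000

estimates = np.empty(reps)
for r in range(reps):
    x = rng.uniform(8, 20, n)
    u = rng.normal(0, 2, n) + 0.3 * (x - x.mean())  # violates E[u|x] = 0
    y = beta0 + beta1 * x + u
    estimates[r] = (np.sum((x - x.mean()) * (y - y.mean()))
                    / np.sum((x - x.mean()) ** 2))

print(estimates.mean())  # roughly 0.8, not 0.5: an upward bias
```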