Lecture 7: Introduction to regression analysis BUEC 333 Professor David Jacks

1
First week focused our attention on a single RV X
and its relation to population quantities like μ.
We have also considered the critical difference
between samples and populations; in particular:
1.) how to compute sample statistics to
estimate population quantities;
2.) the properties of all estimators w.r.t. their
bias, efficiency, and consistency;
3.) how to form and test hypotheses about
the population parameter of interest.
Where we have been and where we are going
2
Second week expanded our attention to two or
more RVs…our baseline case.
We generally care about the relationship between
or among variables; for example, the role of the
VIX index in driving changes in the LIBOR.
We can also use one or more variables to predict a
variable of interest; for example, variables from
monthly purchasing managers’ surveys to forecast
quarterly inventories and, thus, GDP.
Where we have been and where we are going
3
We can get a lot of mileage from studying the
conditional expectation of a variable of interest, Y,
given values of other variables, X.
Rest of the term spent learning about regression
analysis based on E(Y|X)…a very powerful tool
for analyzing economic data.
Where we have been and where we are going
4
Regression analysis is also the foundation for
almost all econometric analysis/techniques used to
measure/explain relationships between variables.
Example: casual observation will reveal that more
educated individuals tend to have higher incomes.
Regression methods can be used to measure the
rate of return of an extra year of education and/or
estimate the relationship between income and
education, gender, labour market experience, etc.
What is regression analysis?
5
A 101 example: theory tells us that if the price of a
good increases, individuals will consume less of it;
that is, demand curves slope down.
But theory does not predict how big the change in
consumption will be for a given price change.
Use regression analysis to measure how much
individuals reduce their consumption in response
to a price increase; in other words, estimate the
price elasticity of demand.
What is regression analysis?
6
Goal of regression analysis is to explain the value
of one variable of interest (dependent variable)
as a function of the values of other variables
(independent or explanatory variables).
Usually, dependent variable denoted as Y and
independent variables are X1, X2, X3 …
Often want to explain “movement” in Y as a
function of “movement” in the X variables
The regression model
7
We always use (at least) one equation to specify
the relationship between Y and the X variables.
This equation is known as the regression model:
E(Y|X1, X2, X3) = f(X1, X2, X3 )
Example from before: Y is income, X1 is years of
education, X2 is gender, X3 is years of experience,
and f is some function...
The regression model
8
The simplest example of a regression model is the
case where the regression function f is linear and
where there is only one X:
E[Y|X] = β0 + β1X
This says that the conditional expected value of Y
is a linear function of the independent variable X.
We call β0 and β1 the regression coefficients.
Simple linear regression
9
β0 is called the intercept or constant term;
tells us the expected value of Y when X is zero.
β1 is the slope coefficient; measures the amount
that the expected value of Y changes for a unit
change in X; that is, simply the slope of the line
relating X and E[Y | X]:
Simple linear regression
10
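A minimal sketch of these interpretations in Python (the coefficient values 10.0 and 2.5 are made up purely for illustration):

```python
# Hypothetical coefficients, chosen for illustration only:
beta0, beta1 = 10.0, 2.5

def cond_mean(x):
    """E[Y|X = x] under the simple linear regression model."""
    return beta0 + beta1 * x

# beta0 is E[Y|X] when X = 0:
intercept = cond_mean(0)                    # -> 10.0
# beta1 is the change in E[Y|X] for a unit change in X:
unit_change = cond_mean(5) - cond_mean(4)   # -> 2.5
print(intercept, unit_change)
```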
There are two kinds of linearity present in the
regression model from the previous slide:
E[Y|X] = β0 + β1X
This regression function is linear in X;
counter-example: E[Y|X] = β0 + β1X²
This regression function is linear in the
coefficients, β0 and β1;
counter-example: E[Y|X] = β0 + X^β1
A word on linearity
11
In general, neither kind of linearity is necessary.
However, we will focus our attention mostly on
what is known as the linear regression model.
The linear regression model requires linearity in
the coefficients, but not linearity in X.
So for our purposes,
OK: E[Y|X] = β0 + β1X²
A word on linearity
12
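A quick sketch of why linearity in the coefficients is what matters: a model with an X² regressor can still be estimated by ordinary least squares. The simulated data and coefficient values (1 and 0.5) below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
# Simulated data with E[Y|X] = 1 + 0.5*X**2 (hypothetical coefficients):
y = 1 + 0.5 * x**2 + rng.normal(0, 1, 200)

# Linear in the coefficients: regress Y on a constant and X**2.
design = np.column_stack([np.ones_like(x), x**2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef)  # roughly [1, 0.5]
```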
A word on linearity
13
Must recognize that the regression function is
never an exact representation of the relationship
between dependent and independent variables.
For example, there is no exact relationship
between income (Y) and education, gender,
experience…because of things like luck.
That is, there is always some variation in Y
The stochastic error term
14
Many possible reasons: important explanatory
variables left out of model; wrong functional
form (f); variables measured with error; or maybe
just some randomness in outcomes…
These are all sources of error.
To reflect these kinds of error, we include a
stochastic (random) error term in the model
The stochastic error term
15
After adding the error term, our simple linear
regression model is:
Y = β0 + β1X + ε
Regression model now has two components:
1.) a deterministic (non-random)
component, β0 + β1X
2.) a stochastic (random) component, ε
More about the error term
16
The right way to think about it: β0 + β1X is the
conditional mean of Y given X.
That is, Y = E(Y|X) + ε where E(Y|X) = β0 + β1X.
Remember “regression function” = E(Y|X).
Gives us another way to think about errors:
ε = Y − E(Y|X)
More about the error term
17
More about the error term
18
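A small simulation sketch of the decomposition Y = E(Y|X) + ε; the coefficient values 2.0 and 3.0 are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1 = 2.0, 3.0            # hypothetical population coefficients
x = rng.uniform(0, 5, 10_000)
eps = rng.normal(0, 1, 10_000)     # stochastic error term
y = beta0 + beta1 * x + eps        # Y = E(Y|X) + eps

# Recover the error as the gap between Y and the regression function:
recovered = y - (beta0 + beta1 * x)
print(recovered.mean())  # close to 0: the errors average out
```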
Think about starting salaries for new university
graduates (Y).
There is a lot of variation in starting salaries
among individuals.
Some of this variation is predictable; starting
salary depends on university, field of study,
industry, firm size...
Call all of this X.
A further example
19
Predictable part of starting salary goes into the
deterministic component of the regression: E(Y|X).
No need to impose that X enters linearly, but will
require E(Y|X) to be linear in β’s.
Choose the specific functional form of E(Y|X)
when we build the model.
A further example
20
Starting salary also depends on luck, nepotism,
interview skill, and a host of other unobservables.
We cannot measure these things, so we cannot
include them in E(Y|X).
Thus, the unpredictable/unobservable part ends up
in the error term, ε.
A further example
21
We need to extend our notation of the regression
function to accurately reflect the number of
observations.
As usual, we will work with an iid random sample
of n observations.
If we use the subscript i to indicate a particular
observation in our sample, our regression function
with one independent variable is:
Yi = β0 + β1Xi + εi
Notes on notation
22
What we really have are n equations, one for each
observation, such that:
Y1 = β0 + β1X1 + ε1
Y2 = β0 + β1X2 + ε2
⋮
Yn = β0 + β1Xn + εn
Note: the coefficients β0 and β1 are the same in
each equation
Notes on notation
23
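The n equations above can be written at once with array arithmetic; a sketch with a made-up sample of n = 4 observations:

```python
import numpy as np

# Hypothetical sample of n = 4 observations, for illustration only:
X = np.array([1.0, 2.0, 3.0, 4.0])
eps = np.array([0.1, -0.2, 0.0, 0.1])
beta0, beta1 = 5.0, 1.5   # the same coefficients appear in every equation

# All n equations Y_i = beta0 + beta1*X_i + eps_i in one line:
Y = beta0 + beta1 * X + eps

# Equivalent check, one equation per observation:
for i in range(len(X)):
    assert np.isclose(Y[i], beta0 + beta1 * X[i] + eps[i])
print(Y)
```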
If we have more (say k) independent variables,
then we need to extend our notation further.
We could use a different letter for each
independent variable (X, Z, W,...); instead, we
usually just introduce another subscript on the X.
Now, we have two subscripts on the X:
one for the variable number (first subscript) and
one for the observation number (second subscript).
Further notes on notation
24
Before, said β1 was the marginal effect of X on Y:
β1 = dY/dX = dE[Y|X]/dX
What do the regression coefficients measure now?
They are partial derivatives. That is,
β1 = ∂Yi/∂X1i, β2 = ∂Yi/∂X2i, ..., βk = ∂Yi/∂Xki
β1 measures the effect on Yi of a one-unit increase
in X1i, holding the other X variables constant
Further notes on notation
25
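A sketch of the partial-derivative interpretation: each coefficient is the change in the conditional mean from a one-unit increase in its own variable, with the others held fixed. The coefficient values are hypothetical:

```python
# Hypothetical multiple-regression coefficients, for illustration:
beta0, beta1, beta2 = 1.0, 2.0, -0.5

def cond_mean(x1, x2):
    """E[Y | X1, X2] = beta0 + beta1*X1 + beta2*X2"""
    return beta0 + beta1 * x1 + beta2 * x2

# beta1: effect of a one-unit increase in X1, holding X2 fixed:
effect_x1 = cond_mean(3, 7) - cond_mean(2, 7)   # -> 2.0
# beta2: effect of a one-unit increase in X2, holding X1 fixed:
effect_x2 = cond_mean(2, 8) - cond_mean(2, 7)   # -> -0.5
print(effect_x1, effect_x2)
```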
Summary of what is known, unknown, and
assumed/hypothesized…
Known: the data, Yi and X1i , X2i , ... , Xki
Unknown: the coefficients and errors,
β0 , β1 , β2 , ... , βk and εi
Hypothesized: functional form of regression,
E(Yi | Xi) = β0 + β1X1i + β2X2i + ⋯ + βkXki
That which is known, unknown, and assumed
26
We want to use what is known to learn about the
unknown using the hypothesized functional form
of the regression function.
We might learn a lot about the β’s because they
are the same for each observation.
We might also learn about the functional form.
That which is known, unknown, and assumed
27
Think of the regression function we have so far
developed as the population regression function.
As always, we (have to) collect a sample of data to
learn about the population.
We do not know the population regression
coefficients (β), so we must estimate them.
The details of how these are estimated is the
subject of next week’s lecture; for now, we need to
develop the intuition of what they really are.
Estimated regression coefficients
28
The goal is to end up with a set of estimated
coefficients:
β̂0, β̂1, β̂2, ..., β̂k
The estimated coefficients are sample statistics
that we compute from our data.
Because they are sample statistics, they are RVs:
a different sample contains different observations,
different observations take different values,
and so the value of the statistic would be different.
Estimated regression coefficients
29
Estimated regression coefficients in hand, we can
calculate the predicted value of Yi.
This predicted value is a sample estimate of the
conditional expectation of Yi given all the X’s:
Ŷi = β̂0 + β̂1X1i + β̂2X2i + ⋯ + β̂kXki
It is our “best guess” of the value of Yi given all
the X’s
Predicted values and residuals
30
Of course, Yi and its predicted value are rarely
equal. 
We call the difference between Yi and its predicted
value the residual, ei = Yi − Ŷi.
Then write the estimated regression function as:
Yi = β̂0 + β̂1X1i + β̂2X2i + ⋯ + β̂kXki + ei
Predicted values and residuals
31
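A sketch of predicted values and residuals on simulated data (the data-generating values 4 and 2 are made up; the estimation details are left to next week's lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 4 + 2 * x + rng.normal(0, 1, 50)   # hypothetical sample

# Estimate beta0-hat and beta1-hat by least squares:
design = np.column_stack([np.ones_like(x), x])
bhat, *_ = np.linalg.lstsq(design, y, rcond=None)

y_hat = design @ bhat     # predicted values Yhat_i
e = y - y_hat             # residuals e_i = Y_i - Yhat_i
print(e.sum())            # least-squares residuals sum to roughly 0
```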
One big lesson: the (unknown) stochastic error
term, εi = Yi – E(Yi|Xi), is a population quantity as
it depends on the population quantities, β.
Residuals are the sample counterpart to εi.
Luckily, they are computable but they are RVs
themselves.
Errors and residuals
32
Predicted values and residuals
33
Here, we expand on the example of the weight
guessing job given in the book (pp. 18-20).
Recap: customers pay $2 each which you get to
keep if you guess their weight within 10 pounds.
If you miss by more than 10 pounds, you return
the $2 and give them a prize which costs $3.
The only information given to you is their height,
but you know there is a positive relationship
between height and weight.
A first example
34
Time to get your regression on…
Initially, you have a collection of the heights of
20 males
A first example
35
Going one step further…
Scatter-plot of 20 (male) observations with weight
in pounds on vertical axis and height above five
feet in inches on horizontal axis.
A first example
36
Ŷi = β̂0 + β̂1X1i
Ŵi = 103.40 + 6.38Hi
Suggests that our best guess of someone’s weight
who is five feet tall is 103.40 pounds.
Likewise, our best guess of someone’s weight who
is six feet tall is 103.40 + 6.38*(72 – 60) = 179.96
pounds.
A first example
37
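These two predictions can be checked directly from the estimated equation Ŵ = 103.40 + 6.38H given on the slide:

```python
# Estimated coefficients from the slide: W-hat = 103.40 + 6.38*H,
# where H is height in inches above five feet.
def predict_weight(h):
    return 103.40 + 6.38 * h

print(predict_weight(0))    # five feet tall -> 103.40 pounds
print(predict_weight(12))   # six feet tall  -> 179.96 pounds
```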
Again, our predicted values lie on the estimated
regression line by construction
A first example
38
When we plot the residuals, the 3 instances where
the estimate is off by more than 10 pounds are clear
A first example
39
Suppose you collected data on the heights and
weights of 20 different male customers and
estimated the following:
Ŵi = 125.1 + 4.03Hi
The coefficients are not the same as before
because the sample is different; when the sample
changes so will the estimates.
A first example
40
First regression has a steeper slope while the
second has a higher intercept; the lines intersect at
only one point, where H = 9.23 inches and
Ŵ = 162.3 pounds.
A first example
41
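The intersection point quoted on the slide follows from setting the two estimated lines equal and solving for H:

```python
# The two estimated lines from the slides:
#   first sample:  W-hat = 103.40 + 6.38*H
#   second sample: W-hat = 125.1  + 4.03*H
# They intersect where 103.40 + 6.38*H = 125.1 + 4.03*H:
h_star = (125.1 - 103.40) / (6.38 - 4.03)
w_star = 103.40 + 6.38 * h_star
print(h_star, w_star)  # about 9.23 inches above five feet, 162.3 pounds
```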
Suppose you could run a regression on the whole
population…
A first example
42
Also remember if an equation has more than one
independent variable, we have to be careful when
interpreting the regression coefficients.
For example, consider the amount of spending per
public school student across Canada.
Any regression for this variable should include at
least two variables, income in a province (since
this is the level of government where school
funding is determined) and enrollment growth.
A further example
43
Si = β0 + β1Yi + β2Gi + εi
where Si = educational dollars spent per public
school student in province i
Yi = per capita income in province i
Gi = the percent growth of public school
enrollment in province i
Should not only think about what the coefficients
tell us (given units of measurement) but also about
their expected signs
A further example
44
First, β1 is the change in dollars spent per public
school student associated with a one-unit increase
in provincial income, holding the percent growth
of provincial public school enrollment constant.
Likewise, β2 is the change in dollars spent per
student associated with a one-unit increase in the
percent growth of provincial public school
enrollment, holding provincial income constant.
A further example
45
And what about their expected signs?
This would be positive for income since the more
income a province has, the more it probably
spends on schools (if enrollment growth is held
constant).
But this would be negative for enrollment growth
since the faster enrollment is growing, the less
there will be to spend on each student (if income is
held constant).
A further example
46
Example: suppose we estimate a regression model
that predicts a student’s grade in BUEC 333 (Y) as
a function of the number of hours per week they
spend studying (X1) and working off-campus (X2).
The regression model is given by:
Yi = β0 + β1X1i + β2X2i + εi, for i = 1, 2, ..., n
Before estimation, consider what your
expectations on X1 and X2 are…
A final example
47
You then estimate this model on a random sample
of 100 BUEC students and obtain the following
estimates:
β̂0 = 45, β̂1 = 2.5, β̂2 = −1
What do these estimates tell us?
1.) The expected grade of a student who studies
zero hours per week and works zero hours per
week off-campus is 45.
A final example
48
2.) A student’s expected grade increases by 2.5
points for each additional hour that they study,
holding constant the number of hours they work
off campus.
3.) A student’s expected grade decreases by 1.0
point for each additional hour that they work
off-campus, holding constant the number of hours
they study.
A final example
49
Given these estimates, we can predict the grade of
a student that studies 12 hours per week and works
10 hours per week off-campus as the following:
Ŷ = 45 + 2.5(12) − 1.0(10) = 65
A final example
50
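The prediction described on the slide can be computed directly from the estimates β̂0 = 45, β̂1 = 2.5, β̂2 = −1:

```python
# Estimates from the slide:
b0, b1, b2 = 45, 2.5, -1

study_hours, work_hours = 12, 10
predicted_grade = b0 + b1 * study_hours + b2 * work_hours
print(predicted_grade)  # 45 + 2.5*12 - 1*10 = 65.0
```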