Lecture 10: Regression analysis in practice BUEC 333 Professor David Jacks

Regression is “easy”
The lectures of the past two weeks may give you
the impression that regression analysis is simply
the mechanical application of a few formulas.
Since computers do the dirty work for us, all you
are required to do is tell them what your model is,
locate some data, and push the button.
Thus, it is tempting to conclude that regression is
“easy” and requires little thought.
At the same time, making money in financial
markets is easy as well: “buy low, sell high”.
In both cases, the hard part is the art or craft.
For regression, it lies in developing a good model to
address an interesting question, finding convincing
data to estimate it with, and providing plausible
interpretations of your results…things to keep in
mind when vegging out in front of a computer.
The six steps
There are six basic steps to regression analysis in
real-world applications:
1.) Review the literature and develop a model to
explain your dependent variable.
2.) Specify the regression equation by choosing
the independent variables and functional form.
3.) Form expectations about the signs of the
coefficients.
4.) Collect and clean the data.
5.) Estimate and evaluate the regression.
6.) Document the results.
Some of these require a little elaboration;
some decidedly do not…
1.) Developing a model
Again, regression is the dominant tool of empirical
work in economics and is used to answer specific
questions.
These questions could be related to measurement;
e.g., what is the price elasticity of demand for
cigarettes or how large is the gender wage gap?
Or these questions could be more subjective;
e.g., is there a role for government in shortening
the duration of unemployment or is buying shares
of Apple a good investment thesis?
In either case, the first step is to educate yourself
about the problem at hand:
a.) Does economic intuition or theory have
anything to say about my question and does it
suggest a particular regression model?
b.) What have other researchers done?
For example, Mincer (1974) borrowed from a
theory of human capital used to explain the
sources of economic growth.
His application of this theory related to the
formation of wages in perfect competition:
$$\ln(W_i) = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2$$
where $W_i$ = wages, $S_i$ = years of education, and
$X_i$ = years of work experience.
This specification is the basis of almost all
studies of education and wages.
2.) Model specification
With any luck, economic theory provides guidance
on an appropriate specification.
In the Mincer model, the dependent variable
should be measured in logarithms, education
should enter linearly, and experience should enter
linearly and as a square.
You need to ensure that all variables that are
important for explaining your dependent variable
are included.
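A minimal Python sketch of how one observation maps into the Mincer functional form. The function name `mincer_regressors` and the example worker are hypothetical, and the wage is assumed to be strictly positive so that its logarithm exists.

```python
import math

def mincer_regressors(wage, schooling, experience):
    """Return (dependent variable, regressor row) for one worker.

    The row lines up with (beta0, beta1, beta2, beta3) in
    ln(W) = beta0 + beta1*S + beta2*X + beta3*X**2.
    """
    if wage <= 0:
        raise ValueError("wage must be positive to take its logarithm")
    y = math.log(wage)                  # dependent variable measured in logs
    row = [1.0,                         # constant (beta0)
           float(schooling),            # education enters linearly (beta1)
           float(experience),           # experience enters linearly (beta2)
           float(experience) ** 2]      # ...and as a square (beta3)
    return y, row

# Hypothetical worker: $25/hour, 16 years of schooling, 10 years of experience
y, row = mincer_regressors(25.0, 16, 10)
print(round(y, 4), row)
```

Stacking one such row per worker gives the data matrix that the regression software works with.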
If you leave out important variables, there is a real
risk that OLS estimates of the β’s will be biased.
The intuition: if the included and omitted variables
are correlated, we may over- or underestimate the
importance of the included variables.
Again, using theory as a guide is helpful, but
intuition can be just as valuable…even without
Mincer’s theory
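The omitted-variable problem can be seen in a small simulation. This is a hypothetical data-generating process chosen only for illustration: the true coefficients are 2 on x1 and 3 on x2, and x2 is built to be correlated with x1. Dropping x2 pushes the estimated coefficient on x1 away from 2, toward roughly 2 + 3 × 0.8 = 4.4 in this particular setup.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ols_two_slopes(x1, x2, y):
    """Slopes on x1 and x2 from a regression of y on x1 and x2 (with an intercept)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2          # solve the 2x2 normal equations
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

random.seed(333)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + random.gauss(0, 0.6) for a in x1]   # x2 is correlated with x1
y = [1 + 2 * a + 3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

b_short = ols_slope(x1, y)              # x2 omitted: biased
b_long, _ = ols_two_slopes(x1, x2, y)   # x2 included: close to 2
print(f"true beta1 = 2, x2 omitted: {b_short:.2f}, x2 included: {b_long:.2f}")
```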
You should also exclude variables that are not
important for explaining your dependent variable.
Remember: we like concise models and will be
penalized for including irrelevant variables.
Again, theory hopefully guides us here, but
intuition is just as important…we need no stinking
theory
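One familiar form of this penalty is adjusted R², which these slides list among the statistics to report. A minimal sketch using the standard formula with hypothetical numbers: a second regressor that raises R² only slightly still lowers adjusted R².

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: an irrelevant second regressor nudges R-squared from 0.50 to 0.51
before = adjusted_r2(0.50, n=30, k=1)
after = adjusted_r2(0.51, n=30, k=2)
print(f"adjusted R2: {before:.4f} -> {after:.4f}")   # falls despite the higher R2
```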
The process of model specification involves
choosing three (separate but related) components:
a.) the independent variables and how they should
be measured;
b.) the functional (mathematical) form of the
variables;
c.) the properties of the stochastic error term.
3.) Forming expectations
Before going any further, you should think about
what kind of results you expect from your model.
In particular, which coefficients do you expect to
be positive or negative?
In the Mincer model, you should expect $\beta_1 > 0$,
$\beta_2 > 0$, and $\beta_3 < 0$.
Whether or not you are surprised by any
coefficient estimates
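Because $\beta_3 < 0$, log wages are concave in experience and peak where $\beta_2 + 2\beta_3 X = 0$, i.e. at $X^* = -\beta_2 / (2\beta_3)$. A minimal sketch that checks estimates against the expected signs and computes that peak; the estimates are hypothetical numbers, not results from any data set.

```python
# Expected signs for the Mincer model: beta1 > 0, beta2 > 0, beta3 < 0
EXPECTED_SIGNS = {"beta1": +1, "beta2": +1, "beta3": -1}

def sign_surprises(estimates):
    """Names of coefficients whose estimated sign is not the expected one (zero counts as a surprise)."""
    return [name for name, sign in EXPECTED_SIGNS.items()
            if estimates[name] * sign <= 0]

def experience_peak(beta2, beta3):
    """Experience level where ln(W) peaks: solve beta2 + 2*beta3*X = 0."""
    return -beta2 / (2 * beta3)

# Hypothetical estimates, made up for illustration only
est = {"beta1": 0.08, "beta2": 0.04, "beta3": -0.0007}
print(sign_surprises(est))                                    # [] means no surprises
print(round(experience_peak(est["beta2"], est["beta3"]), 1))  # years of experience
```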
4.) Collecting and cleaning data
Data comes in two varieties: hand-made and
off-the-shelf.
For most purposes, you will likely collect it
off-the-shelf from sources like StatCan, the TSX,
Datastream, and so on.
In either case (but especially for hand-made), you
must clean your data.
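Cleaning usually starts with mechanical checks for missing or impossible values. A minimal sketch on hypothetical hand-made records using the variable names from Woody's example (the numbers and the rules are invented for illustration); flagged rows still need to be checked by hand.

```python
def find_problems(records):
    """Flag records with missing values or values that cannot be right."""
    problems = []
    for rec in records:
        y, n = rec.get("Y"), rec.get("N")
        if y is None or n is None:
            problems.append((rec["id"], "missing value"))
        elif y < 0:
            problems.append((rec["id"], "negative sales"))
        elif n < 0 or n != int(n):
            problems.append((rec["id"], "competitors must be a non-negative whole number"))
    return problems

# Hypothetical records: Y = gross sales volume, N = number of competitors
records = [
    {"id": 1, "Y": 120000.0, "N": 3},
    {"id": 2, "Y": 98000.0, "N": None},    # missing entry
    {"id": 3, "Y": -95000.0, "N": 4},      # likely sign typo
    {"id": 4, "Y": 110000.0, "N": 2.5},    # impossible count
    {"id": 5, "Y": 101000.0, "N": 5},
]
print(find_problems(records))
```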
[Figure: “Data” inspired by Woody’s example from Chapter 3.2…; Y = gross sales volume, N = number of competitors, …]
A general rule regarding the appropriate sample
size is “the more, the merrier”…provided the
observations are from the same population.
This relates back to the notion of degrees of
freedom; when there are more degrees of freedom:
a.) every positive error is likely to be balanced by
a negative error;
b.) the estimated regression coefficients are
estimated more precisely.
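To see why more degrees of freedom help, here is a minimal simulation sketch under an assumed data-generating process (true slope 2 and standard normal errors, both hypothetical): the spread of the OLS slope across repeated samples shrinks as the sample grows. With one regressor plus an intercept, the degrees of freedom are n - 2.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slope_spread(n, reps=300, seed=1):
    """Standard deviation of the slope estimate across simulated samples of size n."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1 + 2 * a + rng.gauss(0, 1) for a in x]   # hypothetical true slope = 2
        slopes.append(ols_slope(x, y))
    return statistics.stdev(slopes)

for n in (10, 100, 1000):
    print(n, "obs,", n - 2, "degrees of freedom: spread of slope =", round(slope_spread(n), 3))
```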
5.) & 6.) Estimation, evaluation, and documentation
We understand the mechanics of estimation and
know how to let the computer do its thing;
estimation is easy after the first four steps…
but the first four steps can take a long, long time!
Next, we need to decide whether we are happy
with the output:
a.) do the coefficients have the expected sign?
b.) is the R² acceptable?
[Figure: regression line (observed versus predicted values) on top and plot of residuals on bottom, from the height-weight guessing game in Chapter 1.]
Whenever there is a large residual, we call the
particular observation an outlier.
Questions to ask: Are there many outliers (bad) or
few (better)? Can we modify the model to reduce
the number of outliers (i.e., improve the fit)?
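The slides leave “large” to judgment. One common rule of thumb, assumed here rather than taken from the lecture, flags residuals more than two standard errors of the regression away from zero. A minimal sketch on hypothetical data:

```python
import math

def fit_line(x, y):
    """Intercept and slope from OLS of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def flag_outliers(x, y, cutoff=2.0):
    """Indices of observations whose residual exceeds cutoff * s,
    where s is the standard error of the regression."""
    a, b = fit_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (len(x) - 2))   # n - 2 degrees of freedom
    return [i for i, e in enumerate(resid) if abs(e) > cutoff * s]

# Hypothetical data: y = 2x + 1 except at index 4 (x = 5)
x = list(range(1, 11))
y = [3, 5, 7, 9, 20, 13, 15, 17, 19, 21]
print(flag_outliers(x, y))   # [4]
```

Note that the outlier itself inflates s, so a very large residual can partly hide smaller ones.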
When you are happy with a model, always report:
a.) coefficient estimates and their standard errors
b.) sample size, R², and adjusted R²
c.) qualitative information on data & sampling
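For a simple regression, the quantitative items a.) and b.) can be computed directly; item c.) stays in prose. A minimal sketch using the standard simple-regression formulas on hypothetical toy data (the function name `regression_report` is illustrative only):

```python
import math

def regression_report(x, y):
    """Simple OLS of y on x, returning the statistics the slides say to report."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    a0 = my - b * mx
    ssr = sum((c - a0 - b * a) ** 2 for a, c in zip(x, y))   # sum of squared residuals
    sst = sum((c - my) ** 2 for c in y)                       # total sum of squares
    s = math.sqrt(ssr / (n - 2))                              # standard error of the regression
    r2 = 1 - ssr / sst
    return {
        "intercept": (a0, s * math.sqrt(1 / n + mx ** 2 / sxx)),  # (estimate, standard error)
        "slope": (b, s / math.sqrt(sxx)),                          # (estimate, standard error)
        "n": n,
        "R2": r2,
        "adj_R2": 1 - (1 - r2) * (n - 1) / (n - 2),
    }

# Hypothetical toy data
rep = regression_report([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for key, value in rep.items():
    print(key, value)
```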
An extended example
“Modern” portfolio theory (MPT) is the backbone
of a tremendous amount of financial analysis.
Its roots lie in the 1950s and 1960s with the work of
Markowitz and others in defining the efficient
frontier of investment options.
Basic framework: positive correlation between
risk and return.
It is also an example where economic theory
suggests a very specific regression model used
daily in financial institutions worldwide.
This model is called the Capital Asset Pricing
Model (or CAPM).
The driving idea behind CAPM is that any asset’s
risk can be decomposed into two sub-types of risk:
1.) systematic risk
2.) specific risk
Systematic risk is common to all assets:
it is the risk you would face if you held some of
every asset…hence it is also called “market risk”.
Specific risk is particular to an individual asset:
it is that part of an asset’s risk that is unrelated to
the broader risk of the market.
MPT: investors are compensated for taking on
systematic risk, but not specific risk, which can
be diversified away.
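Why diversification removes only specific risk can be seen from a standard single-factor calculation, sketched here under assumptions not stated in the slides: each asset's return is $Z_s = a_s + \beta_s Z_m + e_s$, with specific shocks $e_s$ that are independent of each other and of the market and share a common variance $\sigma_e^2$. For an equally weighted portfolio of $N$ assets, $\text{Var}(Z_p) = \beta_p^2 \text{Var}(Z_m) + \sigma_e^2 / N$, where $\beta_p$ is the average of the $\beta_s$. The variances below are hypothetical.

```python
def portfolio_variance(beta_p, var_market, var_specific, n_assets):
    """Systematic and specific variance of an equally weighted portfolio
    under a single-factor model with independent, equal-variance specific risks."""
    systematic = beta_p ** 2 * var_market    # does not shrink with more assets
    specific = var_specific / n_assets       # shrinks as assets are added
    return systematic, specific

# Hypothetical inputs: market variance 0.04, specific variance 0.09, portfolio beta 1
for n in (1, 10, 100, 1000):
    sys_part, spec_part = portfolio_variance(1.0, 0.04, 0.09, n)
    print(f"{n:>4} assets: systematic {sys_part:.4f}, specific {spec_part:.5f}")
```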
Let: $Z_s$ denote the rate of return on asset s.
$Z_m$ denote the market rate of return.
$Z_f$ denote the risk-free rate of return, the
return on a hypothetically risk-free
asset.
In practice, $Z_m$ is the rate of return on some broad
market index like the S&P 500.
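CAPM theoretically establishes that the expected
rate of return on asset s is:
$$E(Z_s) = Z_f + \beta_s[E(Z_m) - Z_f]$$
Because of the linearity of the expectations
operator, we can aggregate the CAPM model up to
a given portfolio p (a set of assets whose return is
the weighted sum of the returns of each asset):
$$E(Z_p) = Z_f + \beta[E(Z_m) - Z_f]$$
where $\beta$ is the weighted average of the
individual assets’ $\beta_s$, using the portfolio
weights (which sum to one).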
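The aggregation step can be checked numerically. A minimal sketch with hypothetical inputs (the risk-free rate, expected market return, asset betas, and weights are all made up): weighting the assets' CAPM expected returns gives the same answer as plugging the weighted-average $\beta$ into the CAPM formula, because the weights sum to one.

```python
def capm_expected_return(beta, z_f, e_z_m):
    """E(Z) = Z_f + beta * [E(Z_m) - Z_f]."""
    return z_f + beta * (e_z_m - z_f)

# Hypothetical inputs: 2% risk-free rate, 8% expected market return
z_f, e_z_m = 0.02, 0.08
betas = [1.3, 0.7, 1.0]          # hypothetical asset betas
weights = [0.5, 0.3, 0.2]        # portfolio weights summing to one

asset_returns = [capm_expected_return(b, z_f, e_z_m) for b in betas]
portfolio_beta = sum(w * b for w, b in zip(weights, betas))

# Two routes to E(Z_p): weight the asset expectations, or use the portfolio beta
via_assets = sum(w * r for w, r in zip(weights, asset_returns))
via_beta = capm_expected_return(portfolio_beta, z_f, e_z_m)
print(round(via_assets, 4), round(via_beta, 4))
```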
An extended example
24
Thus, β measures the sensitivity of the expected
return of portfolio p to systematic risk (as
measured by the expected market rate of return).
The expected return of the portfolio:
1.) tracks the broader market’s return if β = 1
2.) moves in the same direction as the broader
market’s return but is more volatile if β > 1
3.) moves in the same direction as the broader
market’s return but is less volatile if 0 < β < 1
The CAPM model given before lends itself nicely
to regression analysis because of its linearity.
We rewrite it as $E(Z_p) - Z_f = \beta[E(Z_m) - Z_f]$.
Now, suppose we have the actual (historical) rate
of return on our portfolio over some period t ($Z_{p,t}$)
and similarly for a risk-free asset ($Z_{f,t}$) and some
measure of the market rate of return ($Z_{m,t}$).
Let
1.) $Y_t = Z_{p,t} - Z_{f,t}$ (“excess return” of portfolio p)
2.) $X_t = Z_{m,t} - Z_{f,t}$ (“excess return” of the market)
We can estimate a portfolio’s β (its sensitivity to
systematic risk) from the regression:
$$Y_t = \alpha + \beta X_t + \varepsilon_t$$
All we have done here is add an intercept term α
and a stochastic error term $\varepsilon_t$.
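A minimal sketch of this regression in plain Python, where the slope is $\hat\beta = \sum_t (X_t - \bar X)(Y_t - \bar Y) / \sum_t (X_t - \bar X)^2$. The function names are hypothetical, and the returns are invented numbers in decimal form, not data for any real portfolio; in practice a statistics package would also report standard errors.

```python
def estimate_alpha_beta(excess_portfolio, excess_market):
    """OLS of Y_t = alpha + beta * X_t + e_t; returns (alpha_hat, beta_hat)."""
    n = len(excess_market)
    mx = sum(excess_market) / n
    my = sum(excess_portfolio) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(excess_market, excess_portfolio))
    sxx = sum((x - mx) ** 2 for x in excess_market)
    beta_hat = sxy / sxx
    return my - beta_hat * mx, beta_hat

def excess(returns, risk_free):
    """Subtract the risk-free rate period by period."""
    return [r - f for r, f in zip(returns, risk_free)]

# Hypothetical monthly returns (decimals), not real data
z_p = [0.031, -0.012, 0.024, 0.008, -0.020, 0.040]
z_m = [0.020, -0.010, 0.015, 0.004, -0.015, 0.025]
z_f = [0.002] * 6

alpha_hat, beta_hat = estimate_alpha_beta(excess(z_p, z_f), excess(z_m, z_f))
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.2f}")
```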
Monthly data on returns for 3 stocks for 10 years:
Apple, Rogers, and Shell.
rprogers = $Z_{R,t} - Z_{f,t}$
(“excess return” of Rogers)
rpmarket = $Z_{m,t} - Z_{f,t}$
(“excess return” of the market)
Now, suppose we hold an equally weighted portfolio of
Apple, Rogers, and Shell.
rpportfolio = $(Z_{A,t}/3 + Z_{R,t}/3 + Z_{S,t}/3) - Z_{f,t}$
(“excess return” of portfolio)
rpmarket = $Z_{m,t} - Z_{f,t}$
(“excess return” of the market)
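Because the weights 1/3 sum to one, rpportfolio equals the average of the three stocks' excess returns, and since the OLS slope is linear in the dependent variable, the portfolio's estimated β equals the average of the three individual estimated β's. A minimal sketch with simulated monthly returns; the true betas, volatilities, and seed are hypothetical.

```python
import random

def ols_beta(y, x):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(2024)
months = 120                                   # 10 years of monthly data
rpmarket = [rng.gauss(0.005, 0.04) for _ in range(months)]
# Hypothetical excess returns for three stocks (simulated, not real prices)
stocks = {
    name: [b * m + rng.gauss(0, 0.05) for m in rpmarket]
    for name, b in (("Apple", 1.2), ("Rogers", 0.6), ("Shell", 0.9))
}
# Average of excess returns = (Z_A/3 + Z_R/3 + Z_S/3) - Z_f
rpportfolio = [sum(vals) / 3 for vals in zip(*stocks.values())]

betas = {name: ols_beta(y, rpmarket) for name, y in stocks.items()}
beta_p = ols_beta(rpportfolio, rpmarket)
print({k: round(v, 3) for k, v in betas.items()}, round(beta_p, 3))
```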