Understanding Financial Econometrics

Market Structure, Trading, and Liquidity
FIN 2340
Dr. Michael Pagano, CFA
Econometric Topics
Adapted and Excerpted from Slides by:
Dr. Ian W. Marsh
Cass College, Cambridge U. and CEPR
1
Overview of Key Econometric
Topics
(1) Two-Variable Regression: Estimation & Hypothesis Testing
(2) Extensions of the Two-Variable Model: Functional Form
(3) Estimating Multivariate Regressions
(4) Multivariate Regression Inference Tests & Dummy Variables
2
Introduction
• Introduction to Financial Data and Financial
Econometrics
• Ordinary Least Squares Regression Analysis: What is OLS?
• Ordinary Least Squares Regression Analysis: Testing Hypotheses
• Ordinary Least Squares Regression Analysis: Diagnostic Testing
3
Econometrics
• Literally means “measurement in economics”
• More practically it means “the application of
statistical techniques to problems in economics”
• In this course we focus on problems in financial
economics
• Usually, we will be trying to explain the
behavior of a financial variable
4
Econometric Model Building
1. Understand finance theory
2. Derive estimable model
3. Collect data
4. Estimate model
5. Evaluate estimation results
If satisfactory: interpret model; assess implications for theory
If unsatisfactory: re-estimate model using better techniques; collect better data; reformulate model
5
Financial Data
• What sorts of financial variables do we usually
want to explain?
– Prices - stock prices, stock indices, exchange rates
– Returns - stock returns, index returns, interest rates
– Volatility
– Trading volumes
– Corporate finance variables
• Debt issuance, use of hedging instruments
6
Time Series Data
• Time-series data are data arranged
chronologically, usually at regular intervals
– Examples of Problems that Could be Tackled Using
a Time Series Regression
• How the value of a country’s stock index has varied with
that country’s macroeconomic fundamentals.
• How a company’s stock returns have varied when it
announced the value of its dividend payment.
• The effect on a country’s currency of an increase in its
interest rate
7
Cross Sectional Data
• Cross-sectional data are data on one or more
variables collected at a single point in time
• e.g. A sample of bond credit ratings for UK banks
– Examples of Problems that Could be Tackled
Using a Cross-Sectional Regression
• The relationship between company size and the
return to investing in its shares
• The relationship between a country’s GDP level and
the probability that the government will default on
its sovereign debt.
8
Panel Data
• Panel Data has the dimensions of both time
series and cross-sections
• e.g. the daily prices of a number of blue chip stocks
over two years.
– It is common to denote each observation by the
letter t and the total number of observations by
T for time series data,
– and to denote each observation by the letter i
and the total number of observations by N for
cross-sectional data.
9
Econometrics versus Financial
Econometrics
– Little difference between econometrics and financial
econometrics beyond emphasis
– Data samples
• Economics-based econometrics often suffers from paucity
of data
• Financial economics often suffers from infoglut and
signal-to-noise problems even in short data samples
– Time scales
• Economic data releases often regular calendar events
• Financial data are likely to be real-time or tick-by-tick
10
Economic Data versus Financial Data
• Financial data have some defining
characteristics that shape the econometric
approaches that can be applied
– outliers
– trends
– mean-reversion
– volatility clustering
11
Outliers
12
Trends
13
Mean-Reversion (with Outliers)
14
More Mean-Reversion
15
Volatility Clustering
16
Basic Data Analysis
• All pieces of empirical work should begin
with some basic data analysis
– Eyeball the data
– Summarize the properties of the data series
– Examine the relationship between data series
• Most powerful analytic tools are your eyes
and your common sense
– Computers still suffer from “Garbage in garbage out”
17
Basic Data Analysis
• Eyeballing the data helps establish presence of
– trends versus mean reversion
– volatility clusters
– key observations
• outliers
– data errors?
• turning points
• regime changes
18
Basic Data Analysis
• Summary statistics
– Average level of variable
• Mean, median, mode
– Variability around this central tendency
• Standard deviations, variances, maxima/minima
– Distribution of data
• Skewness, kurtosis
– Number of observations, number of missing
observations
19
Basic Data Analysis
• Since we are usually concerned with explaining
one variable using another
– “trading volume depends positively on volatility”
• relationships between variables are important
– cross-plots, multiple time-series plots
– correlations (covariances)
– multi-collinearity
20
Basic Data Manipulations
• Taking natural logarithms
• Calculating returns
• Seasonally adjusting
• De-meaning
• De-trending
• Lagging and leading
21
The basic story
• y is a function of x
• y depends on x
• y is determined by x
“the spot exchange rate depends on relative price
levels and interest rates…”
22
Terminology
• y is the
– predictand
– regressand
– explained variable
– dependent variable
– endogenous variable
– left hand side variable
• x’s are the
– predictors
– regressors
– explanatory variables
– independent variables
– exogenous variables
– right hand side variables
23
Data
• Suppose we have n observations on y and x:
– cross section: yi = α + βxi + ui,   i = 1, 2, …, n
– time series:  yt = α + βxt + ut,   t = 1, 2, …, n
24
Errors
• Where does the error come from?
– Randomness of (human) nature
• men and markets are not machines
– Omitted variables
• men and markets are more complex than the models
we use to describe them. Everything else is
captured by the error term
– Measurement error in y
• unlikely in financial applications
25
Objectives
• to get good point estimates of α and β given
the data
• to understand how confident we should be in
those estimates
• both will allow us to make statistical
inferences on the true form of the relationship
between y and x (“test the theory”)
26
Simple Regression: An Example
• We have the following data on the excess returns on a
fund manager’s portfolio (“fund XXX”) together with the
excess returns on a market index:
Year, t   Excess return = rXXX,t – rft   Excess return on market index = rmt – rft
1         17.8                           13.7
2         39.0                           23.2
3         12.8                           6.9
4         24.2                           16.8
5         17.2                           12.3
• We want to find whether there is a relationship between x
and y given the data that we have. The first stage would be
to form a scatter plot of the two variables.
27
Graph (Scatter Diagram)
[Scatter plot: excess return on fund XXX (y-axis, 0 to 45) against
excess return on market portfolio (x-axis, 0 to 25)]
28
Finding the Line of Best Fit
• We can use the general equation for a straight line,
y = α + βx
to get the line that best “fits” the data.
• But this equation (y = α + βx) is completely
deterministic.
• Is this realistic? No. So what we do is to add a
random disturbance term, u, into the equation.
yt = α + βxt + ut
where t = 1, 2, 3, 4, 5
29
Determining the Regression
Coefficients
• So how do we determine what α and β are?
• Choose α and β so that the distances from the data
points to the fitted line are minimised (so that the
line fits the data as closely as possible)
• The most common method used to fit a line to the
data is known as OLS (ordinary least squares).
30
Ordinary Least Squares
• What we actually do is
1. take each vertical distance between the data point
and the fitted line
2. square it and
3. minimize the total sum of the squares (hence least
squares).
31
[Scatter plot with fitted OLS line: y = -1.7366 + 1.6417x]
32
Algebra Alert!!!!!
• Tightening up the notation, let
• yt denote the actual data point t
• ŷt denote the fitted value from the regression line
• ût denote the residual, yt - ŷt
33
How OLS Works
• So minimise û1² + û2² + û3² + û4² + û5², or minimise Σt ût²
(t = 1, …, 5). This is known as the residual sum of squares.
• But what was ût? It was the difference between the
actual point and the line, yt - ŷt.
• So minimizing Σt ût² is equivalent to minimizing Σt (yt - ŷt)²
with respect to α̂ and β̂.
34
Coefficient Estimates
RSS = Σt=1..T (yt - ŷt)² = Σt=1..T (yt - α̂ - β̂xt)²

Differentiating RSS with respect to α̂ and β̂ and setting equal to zero
gives the OLS estimates:

Sxx = Σ(xt - x̄)² = Σxt² - Tx̄²
Sxy = Σ(xt - x̄)(yt - ȳ) = Σxtyt - Tx̄ȳ

β̂ = Sxy / Sxx   (the estimated value of β)
α̂ = ȳ - β̂x̄   (the estimated value of α)
35
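A minimal Python sketch (not part of the original slides) that applies the Sxx/Sxy formulas above to the fund XXX data from the earlier example; variable names are illustrative.

```python
import numpy as np

x = np.array([13.7, 23.2, 6.9, 16.8, 12.3])   # excess return on market index
y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])  # excess return on fund XXX
T = len(x)

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum(x**2) - T * x_bar**2            # = sum((x - x_bar)**2)
S_xy = np.sum(x * y) - T * x_bar * y_bar      # = sum((x - x_bar)*(y - y_bar))

beta_hat = S_xy / S_xx                        # slope estimate
alpha_hat = y_bar - beta_hat * x_bar          # intercept estimate

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
# Should reproduce (up to rounding) the fitted line y = -1.74 + 1.64x,
# so the prediction at x = 20 is alpha_hat + beta_hat*20 (about 31).
```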
What do we Use α̂ and β̂ For?
• In the CAPM example used above, optimising would
lead to the estimates
• α̂ = -1.74 and
• β̂ = 1.64.
• We would write the fitted line as:
ŷt = -1.74 + 1.64xt
36
What do we Use α̂ and β̂ For?
• If an analyst tells you that she expects the market
to yield a return 20% higher than the risk-free rate
next year, what would you expect the return on
fund XXX to be?
• Solution: We can say that the expected value of y
= “-1.74 + 1.64 * value of x”, so plug x = 20 into
the equation to get the expected value for y:
ŷ = -1.74 + 1.64 × 20 = 31.06
37
Is Using OLS a Good Idea?
• Yes, since given some assumptions (see
later) least squares is BLUE
– best, linear, unbiased estimator
• OLS is consistent
– as sample size increases, estimated coefficients
tend towards true values
• OLS is unbiased
– Even in small samples, estimated coefficients
are on average equal to true values
38
Is Using OLS a Good Idea? (cont.)
• OLS is efficient
– no other linear estimator has a smaller variance
around the estimated coefficient values
– some non-linear estimators may be more
efficient
39
Testing Hypotheses
• Once you have regression estimates
(assuming the regression is a “good” one)
you take the results to the theory:
“Theory says that the intercept should be zero”
“Theory says that the coefficient on prices should
be unity”
“Theory says that the coefficient on domestic
money should be unity and the coefficient on
foreign money should be minus unity”
40
Testing Hypotheses (cont.)
• Testing these statements is called hypothesis
testing
• This involves comparing the estimated
coefficients with what theory suggests
• In order to say whether the estimates are “too
far” from theory we need some measure of the
precision of the estimated coefficients
41
Standard Errors
• Based on a sample of data, you have
estimated the coefficients α̂ and β̂
• How much are these estimates likely to alter
if different samples are chosen?
• The usual measure of this degree of
uncertainty is the standard error of the
coefficient estimates
42
Standard Errors (cont.)
• Algebraically, given some crucial
assumptions, standard errors can be computed
as follows:
SE(α̂) = s √[ Σxt² / (T(Σxt² - Tx̄²)) ]
SE(β̂) = s √[ 1 / (Σxt² - Tx̄²) ]
43
Error Variance
• σ² is the variance of the error or disturbance
term, u
• this is unobservable
• we approximate it with the variance of the
residual terms, s²:

s² = Σût² / (T - 2)
s = √[ Σût² / (T - 2) ]
44
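A sketch (not from the slides) continuing the fund XXX example and applying the s² and SE formulas above; the resulting numbers are not quoted in the slides, so treat the printed values as illustrative.

```python
import numpy as np

x = np.array([13.7, 23.2, 6.9, 16.8, 12.3])
y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])
T = len(x)
x_bar, y_bar = x.mean(), y.mean()

S_xx = np.sum(x**2) - T * x_bar**2
beta_hat = (np.sum(x * y) - T * x_bar * y_bar) / S_xx
alpha_hat = y_bar - beta_hat * x_bar

u_hat = y - (alpha_hat + beta_hat * x)             # residuals
s2 = np.sum(u_hat**2) / (T - 2)                    # residual variance estimate
s = np.sqrt(s2)

se_alpha = s * np.sqrt(np.sum(x**2) / (T * S_xx))  # SE(alpha_hat)
se_beta = s * np.sqrt(1.0 / S_xx)                  # SE(beta_hat)
print(f"s = {s:.4f}, SE(alpha) = {se_alpha:.4f}, SE(beta) = {se_beta:.4f}")
```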
Standard Errors
• SE are smaller as
– T increases,
• more data makes precision of estimated coefficients
higher
– the variance of x increases,
• more dispersion of the explanatory variable about its mean
makes estimated coefficients more precise
– s decreases
• better the fit of the regression (smaller residuals), the
more precise are estimates
45
Null and Alternative Hypotheses
• So now you have the coefficient estimates
and the associated standard errors.
• You now want to test the theory.
Five-Step Process:
Step 1: Draw up the null hypothesis (H0)
Step 2: Draw up the alternative hypothesis
(H1 or HA)
46
Null Hypothesis
• Usually, the null hypothesis is what theory
suggests:
e.g. testing the ability of fund mangers to
outperform the index
Rjt – Rft = αj + βj(Rmt – Rft) + ujt

(Rjt – Rft is the excess return of fund j at time t; αj is the
expected risk adjusted return)
• EMH suggests αj = 0,
• so, H0: αj = 0 (fund managers earn zero risk
adjusted excess returns)
47
Alternative Hypothesis
• The alternative is more tricky
• Usually the alternative is just that the null is
wrong:
– H1: α ≠ 0 (fund managers earn non-zero risk
adjusted excess returns; fund managers
underperform or out-perform)
• But sometimes it is more specific
– H1: α < 0 (fund managers underperform)
48
Confidence Intervals
• Suppose our point estimate for α is 0.058
for fund XXXX and the associated standard
error is 0.025 based on 20 observations
• Has fund XXXX outperformed?
• Can we be confident that the true α is
different to zero?
Step 3: Choose your level of confidence
Step 4: Calculate confidence interval
49
Confidence Interval (cont.)
• Convention is to use 95% confidence levels
• Confidence interval is then
(α̂ - tcritical × SE(α̂), α̂ + tcritical × SE(α̂))
– tcritical is appropriate percentile (eg 97.5th) of
the t-distribution with T-2 degrees of freedom
• 97.5th percentile since two-sided test
• 2 degrees of freedom were lost in estimating 2
coefficients
50
Confidence Interval (cont.)
• We are now 95% confident that the true
value of alpha lies between
0.058 - 2.1009*0.025 = 0.0055 and
0.058 + 2.1009*0.025 = 0.1105
51
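A sketch of the confidence-interval calculation above, using scipy for the t critical value (20 observations, so T - 2 = 18 degrees of freedom); not part of the original slides.

```python
from scipy import stats

alpha_hat, se, T = 0.058, 0.025, 20
t_crit = stats.t.ppf(0.975, df=T - 2)          # ~2.1009 with 18 d.f.
ci = (alpha_hat - t_crit * se, alpha_hat + t_crit * se)
print(f"t_crit = {t_crit:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
# Zero lies outside the interval, so the null alpha = 0 is rejected.
```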
Making inferences
Step 5: Does the value under the null
hypothesis lie within this interval?
• No (null was that alpha = 0)
– So we can reject the null hypothesis that fund
XXXX earns a zero risk adjusted excess return
– and accept the alternative hypothesis
– we reject the restriction implied by theory
52
Making inferences (cont.)
• Suppose our standard error was 0.03
• The confidence interval would have been
-0.005 to 0.121
• The value under the null is within this range
– We cannot reject the null hypothesis that fund
XXX only earns a zero risk adjusted return
– NOTE we never accept the null - hypothesis
testing is based on the doctrine of falsification
53
Significance Tests
• Instead of calculating confidence intervals
we could calculate a significance test
Step 4: Calculate test statistic
t = (α̂ - α*) / SE(α̂) = (0.058 - 0.0) / 0.025 = 2.32

where α* is the value under the null
Step 5: Compare test statistic to critical value,
tcritical
54
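A sketch of the same t-ratio test in Python (not from the slides), comparing the statistic with the two-sided critical value from the t-distribution with 18 degrees of freedom.

```python
from scipy import stats

alpha_hat, alpha_star, se, df = 0.058, 0.0, 0.025, 18
t_stat = (alpha_hat - alpha_star) / se          # 2.32
t_crit = stats.t.ppf(0.975, df)                 # ~2.1009
p_value = 2 * stats.t.sf(abs(t_stat), df)       # two-sided p-value
print(f"t = {t_stat:.2f}, critical = {t_crit:.4f}, p = {p_value:.3f}")
# |t| exceeds the critical value, so the null alpha = 0 is rejected at the 5% level.
```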
Significance Tests (cont.)
Step 6: Is test statistic in the non-rejection or
acceptance region?
[t-distribution: 95% acceptance region between -2.1009 and +2.1009,
with a 2.5% rejection region in each tail]
55
One-Sided tests (cont.)
• Suppose the alternative hypothesis was
– H1: α < 0
• We then perform one-sided tests
– Confidence interval is (-∞, α̂ + tcritical × SE(α̂))
– Significance test statistic is compared to tcritical
– tcritical is based on 95th percentile (not 97.5th)
56
Type I and Type II Errors
• Where did 95% level of confidence come
from?
– Convention
• What does it mean?
– We are going to reject the null when it is
actually true 5% of the time
– This is a Type I error
57
Type I and Type II Errors (cont.)
– Type II error is when we fail to reject the null
when it was actually false
– To reduce Type I errors, we could use 99%
confidence level
• this would widen our CI, raise the critical value
• making it less likely to reject the null by mistake
• but also making it less likely we correctly reject the
null
• so raises Type II errors
58
Which is Worse?
• This depends on the circumstances
– In Texas, the null hypothesis is that you are
innocent and if the null is rejected you are
killed
• Type I errors are very important (to the accused)
– But if tests are “weak” there is low power to
discriminate and econometrics cannot inform
theory
59
Statistical Significance I
              Coefficients   Standard Error   t Stat     P-value     Lower 95%   Upper 95%
Intercept     2.02593034     0.127404709      15.90153   4.835E-12   1.7582628   2.29359791
X Variable 1  0.48826704     0.044344693      11.01072   1.99E-09    0.3951022   0.58143186
• Can we say with any degree of certainty that the
true coefficient is statistically different from zero?
• t-statistic and P-value
– t-stat is the coefficient estimate/its standard error
– rule-of-thumb is that |t-stat|>2 means we can be 95%
confident that the true coefficient is not equal to zero
– P-value gives the probability of obtaining an estimate at least this
far from zero if the true coefficient were zero
60
Statistical Significance II
              Coefficients   Standard Error   t Stat     P-value      Lower 95%   Upper 95%
Intercept     2.02593034     0.127404709      15.90153   4.8355E-12   1.7582628   2.29359791
X Variable 1  0.48826704     0.044344693      11.01072   1.99E-09     0.3951022   0.58143186
• Can we say with any degree of certainty that the
true coefficient is statistically different from zero?
• Confidence intervals
– Give range within which we can be 95% confident that
the true coefficient lies
• actual coefficients are 2.0 and 0.5
61
Statistical Inference II (cont.)
• t-test coeff = 0.5 is (0.488 - 0.5)/0.044 = -0.26
• t-test coeff = 0.6 is (0.488 - 0.6)/0.044 = -2.52
• critical value of t-test (17 d.f.) is 2.11
– cannot reject null that true coefficient is 0.5
– can reject null that true coefficient is 0.6 in favour of
alternative that true coefficient is different to 0.6
• with 95% confidence
• or at the 5% level of significance
62
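A sketch of the tests on this slide (not from the original deck): the same t-ratio logic with nulls of 0.5 and 0.6, using the estimate and standard error from the Excel output above.

```python
from scipy import stats

beta_hat, se, df = 0.48826704, 0.044344693, 17
t_crit = stats.t.ppf(0.975, df)                 # ~2.11 with 17 d.f.

for beta_null in (0.5, 0.6):
    t_stat = (beta_hat - beta_null) / se
    verdict = "reject" if abs(t_stat) > t_crit else "cannot reject"
    print(f"H0: beta = {beta_null}: t = {t_stat:.2f} -> {verdict} at the 5% level")
```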
Economic Significance
              Coefficients   Standard Error   t Stat     P-value      Lower 95%   Upper 95%
Intercept     2.02593034     0.127404709      15.90153   4.8355E-12   1.7582628   2.29359791
X Variable 1  0.48826704     0.044344693      11.01072   1.99E-09     0.3951022   0.58143186
• As x increases by 1 unit, y on average increases by
0.488 units
• The econometrician has to decide whether this is
economically important
– Depends on magnitudes of x and y and the variability of
x and y
– Very important in finance to check economic
importance of results
63
Some Real Data
• annually from 1800+ to 2001
• spot cable [dollars per pound] (spot)
• consumer price indices (ukp, usp)
• long-term interest rates (uki, usi)
• stock indices (ukeq, useq)
• natural log of each series (l...)
• log differences of each series (d…)
64
Excel Regression Output
Regression Statistics
Multiple R           0.245486731
R Square             0.060263735
Adjusted R Square    0.038906093
Standard Error       0.092479061
Observations         181

            Coefficients   Standard Error   t Stat     P-value      Lower 95%     Upper 95%
Intercept   -0.00790941    0.007344336      -1.07694   0.2829809    -0.02240372   0.00658489
dukp        -0.28954448    0.11451444       -2.52845   0.01233588   -0.51554277   -0.06354619
dusp         0.393294362   0.140052233       2.808198  0.00554476    0.116896338   0.66969239
duki        -0.06224627    0.083163402      -0.74848   0.45516881   -0.22637218    0.10187964
dusi        -0.02335464    0.069677157      -0.33518   0.73788573   -0.16086497    0.11415569
65
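A sketch of running this style of multiple regression in Python with statsmodels. The slides' dspot/dukp/dusp/duki/dusi series are not reproduced here, so simulated placeholder data stand in for them; only the workflow is the point.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 181
X = rng.normal(size=(n, 4))                     # stand-ins for dukp, dusp, duki, dusi
y = -0.01 - 0.3 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.09, size=n)

X = sm.add_constant(X)                          # adds the intercept column
results = sm.OLS(y, X).fit()
print(results.summary())                        # coefficients, SEs, t-stats, p-values, R-squared
print(results.conf_int(alpha=0.05))             # 95% confidence intervals
```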
Assumptions of OLS
• OLS is BLUE if the error term, u, has:
– zero mean: E(ui) = 0 all i
– common variance: var(ui) = σ² all i
– independence: ui and uj are independent
(uncorrelated) for all i ≠ j
– normal: ui are normally distributed for all i
66
Problems with OLS
• What the problem means
• How to detect it
• What it does to our estimates and inference
• How to correct for it
• Key Problems: Multi-Collinearity, Non-Normality,
Heteroskedasticity, Serial Correlation.
67
Multi-collinearity
• What it means
– Regressors are highly intercorrelated
• How to detect it
– If economics of model is good, high R-squared, lots of
individually insignificant (t-stats) but jointly significant
(F-tests) regressors. Also, via high VIF values > 10.
• What it does
– Inference is hard because std errors are blown up
• How to correct for it
– Get more data; sequentially drop regressors.
68
Non-Normality
• What it means
– Over and above assumptions about mean and variance
of regression errors, OLS also assumes they are
normally distributed
• How to detect non-normality
– If normal, skew = 0, kurtosis = 3
– Jarque-Bera test
• J-B = n[S²/6 + (K - 3)²/24]
• distributed χ²(2) so the 5% CV is approx 6.0
– J-B>6.0 => non-normality
69
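A sketch of the Jarque-Bera calculation above, computed by hand and via scipy; the fat-tailed "residuals" are simulated for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
resid = rng.standard_t(df=4, size=500)          # fat-tailed placeholder residuals

n = len(resid)
S = stats.skew(resid)
K = stats.kurtosis(resid, fisher=False)         # kurtosis measured so that normal = 3
jb = n * (S**2 / 6 + (K - 3)**2 / 24)
print(f"skew = {S:.3f}, kurtosis = {K:.3f}, J-B = {jb:.2f}")
print(stats.jarque_bera(resid))                 # same statistic with its chi-square(2) p-value
# J-B above roughly 6 points to non-normality at the 5% level.
```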
Non-Normality (cont.)
• What it does (loosely speaking)
• skewness means coefficient estimates are biased.
• excess kurtosis means standard errors are understated.
• How to correct for it
– skewness can be reduced by transforming the data
• take natural logs
• look at outliers
– kurtosis can be accounted for by adjusting the degrees
of freedom used in standard tests of coefficient on x
• use k(T-2) d.f. instead of (T-2)
• 1/k = 1+[(Ku - 3)(Kx - 3)]/2T
70
Heteroskedasticity
• What it means
– OLS assumes common variance or homoscedasticity
• var(ui) = σ² for all i
– Heteroscedasticity is when the variance varies
• often variance gets larger for larger values of x
71
Detecting Heteroskedasticity
– Plot residuals as time series or against x
– White test
• Regress squared residuals on x’s, x²’s and cross products.
– Reset test
• Regress residuals on fitted ŷ², ŷ³, etc. Significance indicates
heteroscedasticity
– Goldfeld-Quandt test
• Split sample into large x’s and small x’s, fit separate
regressions and test for equality of error variances
– Breusch-Pagan test
• σ² = a + bx + cz … so test b = c = 0
72
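A sketch of the White and Breusch-Pagan tests via statsmodels (not from the slides), run on simulated data whose error variance grows with x so that heteroscedasticity is actually present.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white, het_breuschpagan

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1, 10, size=n)
u = rng.normal(scale=0.5 * x)                   # error variance rises with x
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

lm, lm_p, f, f_p = het_white(results.resid, X)          # squared residuals on x, x^2
print(f"White test: LM = {lm:.2f}, p = {lm_p:.4f}")

lm, lm_p, f, f_p = het_breuschpagan(results.resid, X)   # sigma^2 = a + b*x, test b = 0
print(f"Breusch-Pagan: LM = {lm:.2f}, p = {lm_p:.4f}")
```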
White Test
White Heteroskedasticity Test:
F-statistic      1.497731    Probability   0.192863
Obs*R-squared    7.427565    Probability   0.190734

Dependent Variable: RESID^2
Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            0.006270      0.002497      2.511094     0.0129
DUKP        -0.001079      0.034449     -0.031327     0.9750
DUKP^2      -0.196008      0.282897     -0.692862     0.4893
DUKP*DUSP    0.033795      0.614692      0.054979     0.9562
DUSP         0.034454      0.045411      0.758711     0.4490
DUSP^2       0.694162      0.398780      1.740713     0.0835

R-squared            0.041036    Mean dependent var   0.008361
Log likelihood       397.3308    F-statistic          1.497731
Durbin-Watson stat   1.774378    Prob(F-statistic)    0.192863
73
Implications of Heteroskedasticity
• OLS coefficient estimates are unbiased
• OLS is inefficient
– has higher variance than it should
• OLS estimated standard errors are biased
– if σ² is positively correlated with x² (usual case) then
estimated standard errors are too small
– so inference is wrong
• we become too confident in our estimated coefficients
74
Correcting for Heteroskedasticity
• If we know the nature of the heteroskedasticity it
is best to take this into account in the estimation
– use weighted least squares
– “deflate” the variable by the appropriate measure of
“size”
• Usually, we don’t know the functional form
– so correct the standard errors so that inference is valid
– White standard errors alter the OLS std errors and
asymptotically give reasonable inference properties
75
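A sketch of obtaining heteroskedasticity-consistent (White) standard errors with statsmodels: the OLS coefficients are unchanged, only the standard errors are corrected. The data are simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 181
x = rng.normal(size=n)
y = -0.007 - 0.3 * x + rng.normal(scale=0.05 * (1 + np.abs(x)), size=n)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                       # ordinary standard errors
white = sm.OLS(y, X).fit(cov_type="HC0")       # White (HC0) robust standard errors

print("OLS SEs:  ", ols.bse)
print("White SEs:", white.bse)                 # coefficients identical, SEs differ
```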
White Standard Errors
Dependent Variable: DSPOT
Method: Least Squares
Date: 07/17/03 Time: 21:41
Sample(adjusted): 1821 2001
Included observations: 181 after adjusting endpoints
White Heteroskedasticity-Consistent Standard Errors & Covariance
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -0.007288     0.006605     -1.103411     0.2713
DUKP       -0.310380     0.102816     -3.018788     0.0029
DUSP        0.379539     0.197376      1.922921     0.0561

R-squared            0.055141    Mean dependent var      -0.006391
Adjusted R-squared   0.044525    S.D. dependent var       0.094332
S.E. of regression   0.092208    Akaike info criterion   -1.913097
Sum squared resid    1.513423    Schwarz criterion       -1.860083
Log likelihood       176.1352    F-statistic              5.193965
Durbin-Watson stat   2.115206    Prob(F-statistic)        0.006422
76
Serial Correlation
• OLS assumes no serial correlation
– ui and uj are independent for all i ≠ j
• In cross-section analysis, residuals are likely to be
correlated across individuals
– e.g. common shocks
• In time series analysis, today’s error is likely to be
related to (correlated with) yesterday’s residual
– autocorrelation or serial correlation
– maybe due to autocorrelation in omitted variables
77
Detecting Serial Correlation
• Durbin-Watson test statistic, d
– assumes errors ut and ut-1 have (positive) correlation ρ
– tests for significance of ρ on basis of correlation
between residuals ût and ût-1
– only valid in large samples,
– only tests first order correlation
– only valid if there are no lagged dependent variables
(yt-i) in regression
78
Detecting Serial Correlation (cont.)
• d lies between 0 and 4
– d = 2 implies residuals uncorrelated.
• D-W provide upper and lower bounds for d
– if d < dL then reject null of no serial correlation
– if d > dU then do not reject the null hypothesis of no
serial correlation
– if dL< d < dU then test is inconclusive.
79
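A sketch of computing the Durbin-Watson statistic with statsmodels (not from the slides), using residuals from a regression with AR(1), i.e. serially correlated, errors.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                          # AR(1) errors: u_t = 0.6*u_{t-1} + e_t
    u[t] = 0.6 * u[t - 1] + rng.normal(scale=0.1)
y = 0.5 + 1.5 * x + u

results = sm.OLS(y, sm.add_constant(x)).fit()
d = durbin_watson(results.resid)
print(f"Durbin-Watson d = {d:.2f}")            # well below 2 signals positive autocorrelation
```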
Durbin-Watson
Dependent Variable: DSPOT
Method: Least Squares
Date: 07/17/03 Time: 21:41
Sample(adjusted): 1821 2001
Included observations: 181 after adjusting endpoints
White Heteroskedasticity-Consistent Standard Errors & Covariance
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -0.007288     0.006605     -1.103411     0.2713
DUKP       -0.310380     0.102816     -3.018788     0.0029
DUSP        0.379539     0.197376      1.922921     0.0561

R-squared            0.055141    Mean dependent var      -0.006391
Adjusted R-squared   0.044525    S.D. dependent var       0.094332
S.E. of regression   0.092208    Akaike info criterion   -1.913097
Sum squared resid    1.513423    Schwarz criterion       -1.860083
Log likelihood       176.1352    F-statistic              5.193965
Durbin-Watson stat   2.115206    Prob(F-statistic)        0.006422
80
Implications of Serial Correlation
• With no lagged dependent variable (so d is a valid
test)
– OLS coefficient estimates are unbiased
– but inefficient
– estimated standard errors are biased
• so inference is again wrong
81
Correcting for 1st Order Serial
Correlation
• Rule of thumb:
– if d < R² then estimate model in first difference form
yt = α + β xt + ut
yt-1 = α + β xt-1 + ut-1
yt - yt-1 = β(xt - xt-1) + (ut - ut-1)
– so we can recover the regression coefficients
(but not the intercept).
82
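A sketch of the first-difference correction above (not from the slides): regress the change in y on the change in x with no intercept, which recovers β but not α. The data are simulated with serially correlated errors.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                          # serially correlated errors
    u[t] = 0.7 * u[t - 1] + rng.normal(scale=0.1)
y = 1.0 + 2.0 * x + u

dy, dx = np.diff(y), np.diff(x)                # y_t - y_{t-1}, x_t - x_{t-1}
diff_fit = sm.OLS(dy, dx).fit()                # no constant: the intercept differences out
print(f"beta from differenced regression: {diff_fit.params[0]:.3f}")
```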
Implications of Serial Correlation
• With lagged dependent variables in regression
(when DW test is invalid).
• OLS coefficient estimates are inconsistent
– even as sample size increases, estimated coefficient
does not converge on the true coefficient (i.e. it is
biased)
• So inference is wrong.
83
Using a Dummy Variable to Test
Changes in Slope & Intercept
• What if α and β change over time? So, what we
do is add two additional RHS variables:
yt = α + βxt + γDt + δ(Dt × xt) + ut
where, t = 1, 2, 3, … T.
Dt = 1 for the 2003 - 2004 period, 0 otherwise.
γ measures the change in intercept during 2003 - 2004.
δ measures the change in slope during 2003 - 2004.
84
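A sketch of the dummy-variable regression above (not from the slides): Dt switches on for a sub-period, and Dt × xt lets the slope change in that sub-period. The data and break point are simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 100
x = rng.normal(size=n)
D = np.zeros(n)
D[80:] = 1.0                                    # dummy = 1 in the later sub-period

# simulated intercept and slope shifts in the dummy period
y = 1.0 + 2.0 * x + 0.5 * D + 1.0 * D * x + rng.normal(scale=0.2, size=n)

X = np.column_stack([np.ones(n), x, D, D * x])  # [const, x, D, D*x]
results = sm.OLS(y, X).fit()
print(results.params)                           # estimates of alpha, beta, gamma, delta
print(results.tvalues)                          # t-stats: are the shifts significant?
```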