Classical Linear Regression Analysis

Econometrics 1
Lecture 1
What is Econometrics?
Application of mathematical statistics to economic data
to lend empirical support to economic theory:
a. Economic theory postulates a qualitative relation
b. Mathematical economics expresses economic theory
in equations
c. Economic statistics is concerned with collecting,
processing and presenting economic data
d. Econometricians produce precise numerical
estimates of these relations
Branches of Econometrics
Econometrics
    Theoretical
        Classical
        Bayesian
    Applied
        Classical
        Bayesian
Econometric Methodology
Traditional or Classical Methodology of Econometrics
• Statement of hypothesis
• Specification of the mathematical model
• Specification of the econometric model
• Data collection
• Estimation of parameters of the econometric model
• Hypothesis testing
• Forecasting or prediction
• Using the model for control or policy analysis
Methodology of Bayesian Econometrics
• Bayesian prior
• Sample information
• Posterior information
Assume a simple linear regression model:
$$Y_i = \beta_1 + \beta_2 x_i + e_i$$
The main assumptions about the error term $e_i$ are the following:
• The mean of $e_i$ is zero for every observation of $x_i$: $E(e_i) = 0$
• The variance of $e_i$ is constant: $\mathrm{var}(e_i) = \sigma^2$ for every $i$th observation
• $\mathrm{cov}(e_i, e_j) = 0$ for all $i \neq j$, so there is no autocorrelation; together with the constant variance this means the errors are homoscedastic and uncorrelated with each other
• There is no correlation between $e_i$ and the explanatory variable $x_i$: $E(e_i x_i) = 0$
• The explanatory variable $x_i$ is exogenous, not random
• The variance of the dependent variable is equal to the variance of the error term: $\mathrm{var}(y_i) = \mathrm{var}(e_i) = \sigma^2$
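As a rough illustration of these assumptions, the following Python sketch simulates data from the model with fixed regressor values and independent normal errors. The parameter values (`beta1=1.0`, `beta2=0.5`, `sigma=2.0`), the sample size and the seed are arbitrary choices made for this example, not values from the lecture.

```python
import random


def simulate(beta1, beta2, sigma, xs, seed=0):
    """Draw y_i = beta1 + beta2*x_i + e_i with independent e_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, sigma) for _ in xs]
    ys = [beta1 + beta2 * x + e for x, e in zip(xs, errors)]
    return ys, errors


# Fixed (non-random) regressor values, as the exogeneity assumption requires.
xs = [float(i % 20) for i in range(10000)]
ys, errors = simulate(beta1=1.0, beta2=0.5, sigma=2.0, xs=xs)

n = len(errors)
mean_e = sum(errors) / n
var_e = sum((e - mean_e) ** 2 for e in errors) / n

print(f"sample mean of e:     {mean_e:.3f}")  # should be close to 0
print(f"sample variance of e: {var_e:.3f}")   # should be close to sigma^2 = 4
```

With a large sample the sample mean and variance of the simulated errors should be close to the assumed values of 0 and $\sigma^2$.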




Graphical Illustration of a Simple Linear Regression Model
[Figure: scatter of observations around the fitted line $\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$, with Y on the vertical axis and X on the horizontal axis.]
Each dot in the above graph represents an observation. Some
observations lie above the least squares line $\hat{Y}_i$ and other observations lie
below it.
The least squares line is the line that best fits the data set. The differences between
each observation and the $\hat{Y}_i$ line are represented by the error terms $e_i$. As
some of them are above the line and others below the line, positive errors
cancel out with the negative errors. Note that the least squares line passes
through the average values of the variables X and Y, $(\bar{X}, \bar{Y})$.
These errors represent all sorts of elements missing from this relationship.
Some of them might be due to missing variables, others might be due
to measurement errors, and still others may come from the mis-specification of
the relationship.
Minimisation of the Error Sum of Squares
Errors are given by $e_i = Y_i - \beta_1 - \beta_2 x_i$. Some of them
are positive and some are negative. Since the
mean of these errors is zero, $E(e_i) = 0$, it is customary
to take the sum of squared errors and estimate the unknown
parameters $\beta_1$ and $\beta_2$ that minimise it:
$$S = \sum_i e_i^2 = \sum_i \left( Y_i - \beta_1 - \beta_2 x_i \right)^2 \qquad (1)$$
Every squared error is non-negative: $e_i^2 \ge 0$.
When $e_i \sim N(0, \sigma^2)$ independently, $S/\sigma^2 = \sum_i e_i^2 / \sigma^2 \sim \chi^2_n$,
where the subscript $n$ stands for the
degrees of freedom, which equals the number of terms
in $S$.
The normal equations of the least squares estimator are
obtained by minimising the function $S$ in (1) with respect to
$\beta_1$ and $\beta_2$.
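To make the minimisation concrete, here is a small Python sketch that evaluates the sum of squared errors $S$ on the food expenditure data used in the worked example of this lecture. The function name `sse` and the perturbation sizes are choices made for this illustration; the estimates are computed with the closed-form least squares formulas derived in the following slides.

```python
# Food expenditure (Y) and weekly income (X) from the lecture's worked example.
Y = [4, 6, 7, 8, 11, 15, 18, 22]
X = [5, 8, 10, 12, 14, 17, 20, 25]


def sse(b1, b2, x=X, y=Y):
    """Sum of squared errors S(b1, b2) = sum_i (y_i - b1 - b2*x_i)^2."""
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


# Closed-form least squares estimates.
n = len(X)
sx, sy = sum(X), sum(Y)
sxy = sum(xi * yi for xi, yi in zip(X, Y))
sxx = sum(xi * xi for xi in X)
b2_hat = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b1_hat = sy / n - b2_hat * sx / n

s_min = sse(b1_hat, b2_hat)

# Moving either parameter away from the least squares value increases S.
for d1, d2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.01), (0.0, -0.01)]:
    assert sse(b1_hat + d1, b2_hat + d2) > s_min

print(round(s_min, 3))  # about 5.484
```

The minimised value agrees with the sum of squared predicted errors reported in the example table (5.484111).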
Derivation of Normal Equations
$$\frac{\partial S}{\partial \beta_1} = 2 \sum_i \left( Y_i - \beta_1 - \beta_2 x_i \right)(-1) = 0$$
and
$$\frac{\partial S}{\partial \beta_2} = 2 \sum_i \left( Y_i - \beta_1 - \beta_2 x_i \right)(-x_i) = 0$$
Thus the normal equations are (writing $y_i$ for the observed $Y_i$)
$$\sum_i y_i = N \beta_1 + \beta_2 \sum_i x_i \qquad (2)$$
$$\sum_i x_i y_i = \beta_1 \sum_i x_i + \beta_2 \sum_i x_i^2 \qquad (3)$$
This is a system of two equations, (2) and (3), in
two unknowns $\beta_1$ and $\beta_2$. All other values such as
$\sum_i x_i$, $\sum_i x_i y_i$, $\sum_i x_i^2$, $\sum_i y_i$ and $N$ are known from the
sample information on X and Y. In order to get the value of
$\beta_2$, eliminate $\beta_1$ by multiplying (2) by $\sum_i x_i$ and
(3) by $N$ and taking the difference of the resulting two
equations.
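The elimination step can be checked numerically. The sketch below uses a small made-up data set (the values of `x` and `y` are hypothetical and chosen only for illustration) and exact rational arithmetic, so that both normal equations can be verified without rounding error. It also shows that the normal equations are equivalent to the residuals summing to zero and being orthogonal to $x$.

```python
from fractions import Fraction

# Hypothetical illustrative data, not taken from the lecture.
x = [1, 2, 3, 4]
y = [2, 3, 5, 6]

N = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)

# (2): sy  = N*b1  + b2*sx
# (3): sxy = b1*sx + b2*sxx
# Multiply (2) by sx and (3) by N, then take the difference to eliminate b1.
b2 = Fraction(N * sxy - sx * sy, N * sxx - sx * sx)
b1 = Fraction(sy, N) - b2 * Fraction(sx, N)

# Both normal equations hold exactly at the solution.
assert sy == N * b1 + b2 * sx
assert sxy == b1 * sx + b2 * sxx

# Equivalent statement: residuals sum to zero and are orthogonal to x.
residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
assert sum(residuals) == 0
assert sum(xi * e for xi, e in zip(x, residuals)) == 0

print(b1, b2)  # 1/2 7/5
```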
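OLS Estimators
Multiplying (2) by $\sum_i x_i$ gives (4), and multiplying (3) by $N$ gives (5):
$$\sum_i x_i \sum_i y_i = N \beta_1 \sum_i x_i + \beta_2 \left( \sum_i x_i \right)^2 \qquad (4)$$
$$N \sum_i x_i y_i = N \beta_1 \sum_i x_i + N \beta_2 \sum_i x_i^2 \qquad (5)$$
Now subtracting (4) from (5) and solving, we get the estimator for
$\beta_2$:
$$\hat{\beta}_2 = \frac{N \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N \sum_i x_i^2 - \left( \sum_i x_i \right)^2} \qquad (6)$$
The estimator for $\beta_1$ can be found by dividing both sides
of (2) by $N$ and using the average values $\bar{x}$ and $\bar{y}$:
$$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x} \qquad (7)$$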
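Formulas (6) and (7) translate directly into a few lines of Python. The following is a minimal sketch; the function name `ols_simple` and the input checks are additions made for this illustration, and the test data lie exactly on the line $y = 1 + 2x$ so the expected result is known in advance.

```python
def ols_simple(x, y):
    """Return (b1_hat, b2_hat) for y = b1 + b2*x using formulas (6) and (7)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("x has no variation, so the slope cannot be estimated")
    b2 = (n * sxy - sx * sy) / denom   # equation (6)
    b1 = sy / n - b2 * sx / n          # equation (7)
    return b1, b2


# Points lying exactly on y = 1 + 2x.
b1, b2 = ols_simple([0, 1, 2, 3], [1, 3, 5, 7])
print(b1, b2)  # 1.0 2.0
```

The guard on the denominator reflects the fact that (6) is undefined when all $x_i$ are equal.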
An Example of OLS Estimation
Food expenditure and income: data and prediction
(Y = food expenditure, X = weekly income, Ypred = predicted Y, Sqpredy = squared Ypred, prede = predicted error, sqprede = squared predicted error)

Y     X     Xy     xsquare  ysquare  Ypred      Sqpredy    prede      sqprede
4     5     20     25       16       2.866285   8.21559    1.133715   1.28531
6     8     48     64       36       5.742472   32.97598   0.257528   0.066321
7     10    70     100      49       7.65993    58.67453   -0.65993   0.435508
8     12    96     144      64       9.577388   91.72636   -1.57739   2.488153
11    14    154    196      121      11.49485   132.1315   -0.49485   0.244873
15    17    255    289      225      14.37103   206.5266   0.628967   0.395599
18    20    360    400      324      17.24722   297.4666   0.75278    0.566678
22    25    550    625      484      22.04087   485.7997   -0.04087   0.00167
Sums:
91    111   1553   1843     1319     127.4218   1313.517   -3.9E-05   5.484111

The Ypred column also lists 36.4218, the prediction for an income of 40. The Ypred total of 127.4218 appears to include this value, since the eight in-sample predictions sum to about 91.
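The column totals needed for the estimators can be recomputed from the raw Y and X columns. A short Python sketch follows; the variable names are chosen for this illustration.

```python
# Raw data from the table: food expenditure (Y) and weekly income (X).
Y = [4, 6, 7, 8, 11, 15, 18, 22]
X = [5, 8, 10, 12, 14, 17, 20, 25]

xy = [x * y for x, y in zip(X, Y)]   # the Xy column
x_sq = [x * x for x in X]            # the xsquare column
y_sq = [y * y for y in Y]            # the ysquare column

print(sum(Y), sum(X), sum(xy), sum(x_sq), sum(y_sq))
# 91 111 1553 1843 1319
```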
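Estimates
$$\hat{\beta}_2 = \frac{N \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N \sum_i x_i^2 - \left( \sum_i x_i \right)^2}$$
Or, using the values from the above table:
$$\hat{\beta}_2 = \frac{8(1553) - 111(91)}{8(1843) - (111)^2} = \frac{12424 - 10101}{14744 - 12321} = \frac{2323}{2423} = 0.95873 \qquad (8)$$
$$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x} = \frac{91}{8} - 0.95872 \cdot \frac{111}{8} = 11.375 - 0.95872(13.875) = 11.375 - 13.30224 = -1.92724 \qquad (9)$$
(With the unrounded slope the intercept is $-1.92736$; the small difference is due to rounding.)
The fitted regression line is
$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i = -1.92724 + 0.95873\, x_i \qquad (10)$$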
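The arithmetic can be reproduced exactly from the table sums. The sketch below uses Python's `fractions` module so that no rounding enters until the final conversion to decimals; the variable names are chosen for this illustration.

```python
from fractions import Fraction

# Sums taken from the example table.
N, sum_x, sum_y, sum_xy, sum_xsq = 8, 111, 91, 1553, 1843

# Slope: (N*sum_xy - sum_x*sum_y) / (N*sum_xsq - sum_x^2) = 2323/2423
b2 = Fraction(N * sum_xy - sum_x * sum_y, N * sum_xsq - sum_x ** 2)
# Intercept: y_bar - b2 * x_bar
b1 = Fraction(sum_y, N) - b2 * Fraction(sum_x, N)

print(round(float(b2), 5))  # 0.95873
print(round(float(b1), 5))  # -1.92736
```

Carrying the unrounded slope through gives an intercept of about $-1.92736$, which differs in the fourth decimal place from a hand calculation that uses a slope rounded to five decimals.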
Interpretation and Prediction
Both the slope and the intercept can be given an economic interpretation. In this sample expenditure
on food is determined by the weekly income of an individual: people spend
about 95.9% of each additional unit of weekly income on food. The intercept is negative, so
predicted food expenditure at zero income is about -1.93 per week; this can be read as people
without any income needing an income subsidy of about 1.93 per week.
• Mean prediction
We can use equation (10) to find the predicted values $\hat{Y}_i$ for each
observation on $x_i$. These are reported as Ypred in the above table. If the
weekly income is 40, predicted food expenditure will be 36.422. Error terms
are also estimated using the fact that
$$\hat{e}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i = y_i + 1.92724 - 0.95873\, x_i$$
These predicted errors are reported as prede in the above table. Note that, as
expected, some of the errors are negative and others are positive.
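The predictions and predicted errors can be recomputed as follows. This is a sketch that uses the unrounded estimates (written here as the ratios $-4670/2423$ and $2323/2423$ obtained from the table sums), so the last digits may differ slightly from values computed with rounded coefficients. The function name `predict` is an assumption of this example.

```python
# Data from the example table.
Y = [4, 6, 7, 8, 11, 15, 18, 22]
X = [5, 8, 10, 12, 14, 17, 20, 25]

# Unrounded least squares estimates for this sample.
b1_hat = -4670 / 2423   # intercept, about -1.92736
b2_hat = 2323 / 2423    # slope, about 0.95873


def predict(x):
    """Predicted food expenditure for weekly income x."""
    return b1_hat + b2_hat * x


y_pred = [predict(x) for x in X]
residuals = [y - yh for y, yh in zip(Y, y_pred)]

print(round(predict(40), 3))              # 36.422
print([round(e, 3) for e in residuals])   # mix of positive and negative errors
print(abs(sum(residuals)) < 1e-9)         # True: residuals sum to zero
```

With unrounded coefficients the residuals sum to zero up to floating-point error, which is the first normal equation restated.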
Prediction of Food Expenditure
[Figure: Prediction of food expenditure. Vertical axis: predicted food expenditure (0.00 to 25.00); horizontal axis: income.]
Use of regression estimates to calculate the
elasticities
The elasticity of food expenditure with respect to income, evaluated at the sample means, is given by
$$\eta = \frac{\partial Y / Y}{\partial X / X} = \frac{\partial Y}{\partial X} \cdot \frac{\bar{X}}{\bar{Y}} = 0.95873 \times \frac{13.875}{11.375} = 1.1694$$
This suggests that the expenditure on food is elastic around the mean:
a 1% rise in weekly income is associated with a rise of about 1.17% in
food expenditure.
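The elasticity at the means is a one-line calculation once the slope and the sample means are known. A minimal Python sketch, using the unrounded slope from the table sums (small differences in the last digits relative to hand calculations can arise from rounding of the slope):

```python
# Slope estimate and sample means from the worked example.
b2_hat = 2323 / 2423    # about 0.95873
x_bar = 111 / 8         # mean weekly income, 13.875
y_bar = 91 / 8          # mean food expenditure, 11.375

# Elasticity at the means: (dY/dX) * (x_bar / y_bar)
elasticity = b2_hat * x_bar / y_bar
print(round(elasticity, 4))  # 1.1694
```

Because the elasticity depends on the point at which it is evaluated, a different choice of $X$ and $Y$ would give a different value even though the slope is constant.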
Hints to get into the Shazam program on the
Network
0. Create a Metrics directory on the G: drive.
1. Log in to the network.
2. At Start choose Applications\Economics\Professional Shazam.
Limdep is also available there if you are familiar with it.
3. You have an editor in the Shazam program to write your program. An
econometric program involves the following steps while compiling and
computing the model:
    1. declaring the sample size
    2. reading the data for each variable declared
    3. calculations (checking descriptive statistics such as the mean,
    variance and correlation; make sure that there are no missing
    observations)
    4. using the standard Shazam routines for estimation,
    such as OLS y x; Arima x
    5. interpreting the results and checking whether they make sense according to
    economic theory.
4. Click on Shazam. Now you should be in the Shazam program.
Click on File/New; it will bring you to the Shazam editor.
6. Write a Shazam program similar to the one given in the example
below. Save this Shazam file in your own directory under G:\metrics.
How to Read Data in Shazam?
For small data files, cut your data and paste
it into the Shazam editor.
For large data files you can read the data
directly from the file. The data file should
be in text format. If your data is in
Excel, save it in text format
using the “save as” option, or make a number of
small data files and combine those data
using the Shazam program.
There are more examples under the File/Open/Intro
option in the menu. There is also a demo
which shows you various features of
Shazam. It is worth trying if this is your
first encounter with Shazam.
Getting Around with Shazam
7. When you have your program written,
click on “Run” and then “Run Batch” to
execute your program.
8. If everything is all right, Shazam will
work in the background and
display the output on the screen. You may
save your results file in your directory if
you wish by using “copy” and “paste” in the
“edit” menu.
9. Get more practice on several aspects of
programming with Shazam, such as reading
a data file, transforming variables by
taking logs, lags or squares, plotting one
variable against another using “plot x
y /gnu line only”, and saving a variable to use
later on. Also write a couple of lines to
read regression estimates and diagnostics.
10. Consult the Short Loan Section in the
Library to borrow a hard copy of the Shazam
manual. You can always use the online
manual inside the Shazam program, which you
can get by clicking on Help and the Visit
Shazam Online option while you are in the
Shazam program.
A Simple Example of Shazam
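* Declare the sample size (10 observations)
sample 1 10
* Read the data typed in below: y, x1 and x2
read y x1 x2
3.5 15 16
4.5 20 13
5 30 10
6 42 7
7 50 7
9 54 5
8 65 4
10 72 3
12 85 3.5
14 90 2
* OLS of y on x1 and x2; save the coefficient covariance matrix in b
ols y x1 x2 /cov=b anova
* Confidence intervals for the coefficients on x1 and x2
confid x1 x2
* Same, with t critical value 3.499 (1% level, two-sided, 7 degrees of freedom)
confid x1 x2 / tcrit=3.499
* Square root of the first element of b
gen1 srb=sqrt(b:1)
print b srb
* Re-estimate and save the predicted values in py
ols y x1 x2 /predict=py
* Heteroscedasticity and autocorrelation diagnostics
diagnos / het
diagnos / acf
dim p 10 2
gls y x1 x2 /omega=b
print py
stop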
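Simple Regression in matrix notation
With a column of ones for the intercept and the income data in the second column:
$$X = \begin{bmatrix} 1 & 5 \\ 1 & 8 \\ 1 & 10 \\ 1 & 12 \\ 1 & 14 \\ 1 & 17 \\ 1 & 20 \\ 1 & 25 \end{bmatrix}, \qquad Y = \begin{bmatrix} 4 \\ 6 \\ 7 \\ 8 \\ 11 \\ 15 \\ 18 \\ 22 \end{bmatrix}$$
$$X'X = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 5 & 8 & 10 & 12 & 14 & 17 & 20 & 25 \end{bmatrix} X \;\Rightarrow\; X'X = \begin{bmatrix} 8 & 111 \\ 111 & 1843 \end{bmatrix}$$
$$X'Y = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 5 & 8 & 10 & 12 & 14 & 17 & 20 & 25 \end{bmatrix} Y \;\Rightarrow\; X'Y = \begin{bmatrix} 91 \\ 1553 \end{bmatrix}$$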
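The estimators in terms of matrix notation:
$$\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} N & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{bmatrix} = (X'X)^{-1} X'Y = \begin{bmatrix} 8 & 111 \\ 111 & 1843 \end{bmatrix}^{-1} \begin{bmatrix} 91 \\ 1553 \end{bmatrix}$$
The desired inverse matrix is
$$(X'X)^{-1} = \frac{1}{|X'X|} \mathrm{Adj}(X'X) = \frac{1}{2423} \begin{bmatrix} 1843 & -111 \\ -111 & 8 \end{bmatrix}$$
so that
$$\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \frac{1}{2423} \begin{bmatrix} 1843 & -111 \\ -111 & 8 \end{bmatrix} \begin{bmatrix} 91 \\ 1553 \end{bmatrix} = \frac{1}{2423} \begin{bmatrix} 1843(91) - 111(1553) \\ -111(91) + 8(1553) \end{bmatrix}$$
$$\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \frac{1}{2423} \begin{bmatrix} 167713 - 172383 \\ -10101 + 12424 \end{bmatrix} = \begin{bmatrix} -1.92736 \\ 0.95873 \end{bmatrix}$$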
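The matrix calculation $(X'X)^{-1} X'Y$ can be reproduced with plain Python for this two-parameter case. The sketch below builds $X'X$ and $X'Y$ from the data, inverts the $2 \times 2$ matrix through its determinant and adjugate, and uses exact fractions until the final rounding. The variable names are choices made for this illustration.

```python
from fractions import Fraction

# Data from the worked example; the design matrix has a column of ones and X.
Y = [4, 6, 7, 8, 11, 15, 18, 22]
X = [5, 8, 10, 12, 14, 17, 20, 25]
n = len(X)

# X'X = [[N, sum x], [sum x, sum x^2]] and X'Y = [sum y, sum xy]
xtx = [[n, sum(X)], [sum(X), sum(x * x for x in X)]]
xty = [sum(Y), sum(x * y for x, y in zip(X, Y))]

# Inverse of a 2x2 matrix: adjugate divided by the determinant.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
adj = [[xtx[1][1], -xtx[0][1]],
       [-xtx[1][0], xtx[0][0]]]

# beta_hat = (1/det) * Adj(X'X) * X'Y
beta = [Fraction(adj[r][0] * xty[0] + adj[r][1] * xty[1], det) for r in range(2)]

print(det)                                # 2423
print([round(float(b), 5) for b in beta])  # [-1.92736, 0.95873]
```

For models with more regressors a linear algebra library would normally replace the hand-written inverse.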