Ch.3 The Simple Linear Regression Model

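National Chengchi University (政治大學), Sun Yat-sen Graduate Institute (中山所) common elective, 黃智聰
Course: Quantitative Methods and Statistics for the Social Sciences: Quantitative Methods (Methodology of Social Sciences)
Lecture topic: The Simple Linear Regression Model
Date: October 9, 2003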
Questions:
If one variable changes in a certain way, by how much will another variable change?
If we know the value of one variable, can we forecast or predict the corresponding value of another?
The regression model is based on assumptions.
1. A Model (economic, political, social…)
A sample is selected randomly from a particular population.
y is a random variable with pdf f(y).
If we know the conditional mean E(y|x = 480) = μ(y|x) and the variance σ², then we can calculate probabilities for y.
More interesting is to know the relationship between y and x: its direction, its size, and how to use it for prediction.
Ex:
Build an economic model, then an econometric model.
Linear function (simple regression model, because there is only one explanatory variable):
E(y|x) = μ(y|x) = β1 + β2x
β1 = intercept
β2 = slope
Intuitive explanation of β2: the marginal effect of x on y
The conditional mean E(y|x) = μ(y|x) = β1 + β2x is called a simple regression function because there is only one variable on the right-hand side of the equation.
β2 = ΔE(y|x)/Δx = dE(y|x)/dx
Ex: E(y|x) is the average household expenditure on food and x is household income.
Ex: E(y|x) is the average number of voters and x is electoral spending.
β1 and β2 help us to characterize behavior.
2. An Econometric Model
Notice: 1. random sample; 2. normal distribution
Assumptions:
1. The dispersion of the values of y about their mean is the same for all levels of x, i.e., Var(y|x) = σ² for all values of x (homoskedastic).
2. The values of y are all uncorrelated and have zero covariance, implying that there is no linear association among them:
Cov(yi, yj) = 0
If Var(y|x) is not the same for all values of x, the data are said to be heteroskedastic.
3. Or, as a stronger assumption, the values of y are all statistically independent because of random selection.
4. The variable x is not random and must take at least two different values; otherwise the regression analysis fails.
However, the variable x could be random.
5. (Optional) The values of y are normally distributed about their mean for each value of x:
y ~ N(β1 + β2x, σ²)
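The assumptions above can be made concrete with a small simulation. This is only a sketch: the values of BETA1, BETA2 and SIGMA and the two x values are hypothetical, chosen for illustration, and do not come from the course data. For each fixed x it draws independent values of y from a normal distribution with mean β1 + β2x and the same variance σ².

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA1 = 40.0   # intercept
BETA2 = 0.13   # slope
SIGMA = 30.0   # standard deviation of y about its mean, the same for every x


def simulate_y(x, n, rng):
    """Draw n independent values of y ~ N(BETA1 + BETA2 * x, SIGMA**2) for a fixed x."""
    mean = BETA1 + BETA2 * x
    return [rng.gauss(mean, SIGMA) for _ in range(n)]


rng = random.Random(0)        # fixed seed so the run is repeatable
x_values = [500.0, 1000.0]    # x is fixed and takes at least two different values

for x in x_values:
    ys = simulate_y(x, 20000, rng)
    print(f"x={x:7.1f}  E(y|x)={BETA1 + BETA2 * x:7.2f}  "
          f"sample mean={statistics.fmean(ys):7.2f}  "
          f"sample sd={statistics.stdev(ys):6.2f}")
```

With many draws, the sample mean at each x should sit close to β1 + β2x and the sample standard deviation close to σ at both x values, which is what the homoskedasticity assumption describes.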
2.1 Introducing the Error Term
y: the dependent variable in the regression model. It has two parts:
A systematic component: y's mean E(y), which is not random
A random component: the random error term e
e = y - E(y|x) = y - β1 - β2x
y = β1 + β2x + e
x: the independent or explanatory variable
The assumptions of the regression model in terms of the random error e:
1. The value of y, for each value of x, is
y = β1 + β2x + e
2. The average value of the random error e is
E(e) = 0, since we assume that E(y) = β1 + β2x
3. The variance of the random error e is
Var(e) = σ² = Var(y), since y and e differ only by a constant, which does not change the variance
4. The covariance between any pair of random errors ei and ej is
Cov(ei, ej) = Cov(yi, yj) = 0
5. The variable x is not random and must take at least two different values.
6. (Optional) The values of e are normally distributed about their mean: e ~ N(0, σ²)
y is observable; e is unobservable.
If β1 and β2 were known, then for any value of y we could calculate e = y - β1 - β2x.
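The link between the assumptions on y and the assumptions on e can be written out step by step. The lines below are a worked sketch in LaTeX (it assumes the amsmath package for align* and \operatorname); the only fact used is that β1 + β2x is a constant once x is treated as non-random.

```latex
\begin{align*}
e &= y - E(y \mid x) = y - \beta_1 - \beta_2 x \\
E(e) &= E(y) - \beta_1 - \beta_2 x = 0 \\
\operatorname{Var}(e) &= \operatorname{Var}(y - \beta_1 - \beta_2 x)
      = \operatorname{Var}(y) = \sigma^2 \\
\operatorname{Cov}(e_i, e_j) &= \operatorname{Cov}(y_i, y_j) = 0
\end{align*}
```

Subtracting a constant shifts the mean of y to zero but leaves its variance and covariances unchanged.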
Another interpretation of the error term:
e represents all factors affecting y other than x.
In the food expenditure example, what factors can result in a difference between expenditure y and its mean E(y|x)?
Another example: the election model.
1. We want to include all the important and relevant explanatory variables in the model; e collects the factors that are left out.
2. The error term e captures any approximation error that arises because the linear functional form we have assumed may be only an approximation to reality.
3. The error term captures any element of random behavior that might be present in each individual.
Knowledge of all the variables that influence an individual's food expenditure might not be sufficient to predict expenditure perfectly. An unpredictable random behavioral component might also be contained in e.
3.3 Estimating the Parameters for the Expenditure Relationship
1. Assume that the data in Table 3.1 are observed values of the random variables yt, t = 1, …, 40, that satisfy assumptions SR1-SR5.
2. If we represent the 40 data points as (yt, xt), t = 1, …, 40, and plot them, we obtain a scatter diagram.
3. Estimate the location of the mean expenditure line E(y) = β1 + β2x.
We would expect this line to be somewhere in the middle of all the data points, since it represents the average.
Way 1: to estimate β1 and β2, we could simply draw a freehand line through the middle of the data and then measure the slope and intercept with a ruler.
Problem: different people would draw different lines.
Way 2: draw a line from the smallest income point to the largest income point.
Problem: it ignores information on the exact position of the remaining 38 observations.
3.3.1 The Least Squares Principle
This principle asserts that to fit a line to the data values, we should fit the line so that the sum of the squares of the vertical distances from each point to the line is as small as possible.
The distances are squared to prevent large positive distances from being canceled by large negative distances.
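The principle can be tried out on a few numbers. The sketch below uses the standard closed-form least squares solution for one explanatory variable, b2 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b1 = ȳ - b2·x̄; these formulas are not written out in the slides and are brought in here as the usual textbook result. The data are made up for illustration and are not the Table 3.1 observations. For comparison, the code also builds the line through the smallest-x and largest-x points.

```python
def least_squares(xs, ys):
    """Return (b1, b2) minimising the sum of squared vertical distances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # Matches the assumption that x must take at least two different values.
        raise ValueError("x must take at least two different values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


def sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to a given line."""
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


# Made-up data, sorted by x; NOT the Table 3.1 observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

b1, b2 = least_squares(xs, ys)

# The line through the smallest-x and largest-x points.
slope_ends = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
intercept_ends = ys[0] - slope_ends * xs[0]

print("least squares line:", b1, b2)
print("sum of squares, least squares line:", sum_of_squares(xs, ys, b1, b2))
print("sum of squares, end-point line:   ",
      sum_of_squares(xs, ys, intercept_ends, slope_ends))
```

By construction, no other straight line can give a smaller sum of squares than the least squares line, so the end-point line can at best tie it.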
3.3.2 Estimates for the Food Expenditure Function
3.3.3 Interpreting the Estimates
b2 = 0.1283: if x goes up by 100, y will increase by approximately 12.83.
b1 = 40.7676: the estimated amount of y for a family with zero x.
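The arithmetic behind these two readings can be written out as a short worked example in LaTeX. The symbol ŷ(x) for the fitted value at x is notation assumed here for convenience; the numbers are the estimates quoted above.

```latex
\begin{align*}
\hat{y}(x) &= b_1 + b_2 x = 40.7676 + 0.1283\,x \\
\hat{y}(x + 100) - \hat{y}(x) &= b_2 \cdot 100 = 0.1283 \times 100 = 12.83 \\
\hat{y}(0) &= b_1 = 40.7676
\end{align*}
```

The change of 12.83 does not depend on the starting value of x, because the fitted relationship is a straight line.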
3.3.3a Elasticity
y = b·x^b2, so log y = log b + b2 log x
log y = b1 + b2 log x
Elasticity = (% change in y) / (% change in x) = (Δy/y) / (Δx/x) = (Δy/Δx)·(x/y) = b2·(x/y)
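As a rough numerical illustration, the elasticity of the fitted straight line can be evaluated at a chosen point as b2·x/y, using the fitted value of y at that x. The estimates 40.7676 and 0.1283 are the ones quoted in this section; the value x0 = 700 is hypothetical, and evaluating at the fitted y is an assumption made for this sketch.

```python
B1 = 40.7676  # estimated intercept quoted in the text
B2 = 0.1283   # estimated slope quoted in the text


def elasticity_at(x):
    """Elasticity of the fitted line at x: b2 * x / y_hat, with y_hat = b1 + b2 * x."""
    y_hat = B1 + B2 * x
    return B2 * x / y_hat


x0 = 700.0  # hypothetical value of x, chosen only for illustration
print(f"elasticity at x = {x0}: {elasticity_at(x0):.4f}")
```

Because b1 is positive, this elasticity is below 1 for every positive x and moves closer to 1 as x grows.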
3.3.3b Prediction
The prediction is carried out by substituting a value of x into our estimated equation to obtain the predicted value of y: ŷ = 40.7676 + 0.1283x.
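A minimal sketch of this substitution, using the estimates b1 = 40.7676 and b2 = 0.1283 from this section. The function name predict and the x values in the loop are illustrative choices, not part of the course material.

```python
B1 = 40.7676  # estimated intercept quoted in the text
B2 = 0.1283   # estimated slope quoted in the text


def predict(x):
    """Predicted y from the estimated equation b1 + b2 * x."""
    return B1 + B2 * x


# Hypothetical x values, chosen only to show the substitution.
for x in (250.0, 500.0, 1000.0):
    print(f"x = {x:7.1f}  predicted y = {predict(x):.4f}")
```

Predictions for x values far outside the range of the observed data lean entirely on the straight-line assumption.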