National Chengchi University (政治大學), Graduate Institute of Sun Yat-sen Studies (中山所), joint elective
Course: 社會科學計量方法與統計-計量方法 (Quantitative Methods and Statistics for the Social Sciences: Econometric Methods), Methodology of Social Sciences
Topic: The Simple Linear Regression Model
Date: October 9, 2003
Instructor: 黃智聰

Ch.3 The Simple Linear Regression Model

Questions: If one variable changes in a certain way, by how much will another variable change? If we know the value of one variable, can we forecast or predict the corresponding value of another? The regression model that answers these questions is built on a set of assumptions.

1. An Economic Model (economic, political, social, ...)
We randomly select a sample from a particular population. For a given x, the random variable y has a pdf $f(y \mid x)$. If we know this pdf, including its mean, for example $E(y \mid x = 480) = \mu_{y|x=480}$, and its variance $\sigma^2$, then we can calculate probabilities for y.
More interesting is the relationship between y and x: its direction, its size, and its use for prediction.

Ex: Build an economic model, then an econometric model. A linear function gives the simple regression model
$$E(y \mid x) = \mu_{y|x} = \beta_1 + \beta_2 x$$
where $\beta_1$ is the intercept and $\beta_2$ is the slope. The model is "simple" because it has only one explanatory variable. Intuitive explanation: $\beta_2$ is the marginal effect of x on y.

The conditional mean $E(y \mid x) = \mu_{y|x} = \beta_1 + \beta_2 x$ is called a simple regression function because there is only one variable on the right-hand side of the equation. The slope is
$$\beta_2 = \frac{\Delta E(y \mid x)}{\Delta x} = \frac{dE(y \mid x)}{dx}$$
Ex: $E(y \mid x)$ is the average household expenditure on food, and x is household income.
Ex: $E(y \mid x)$ is the average number of voters, and x is electoral spending.
$\beta_1$ and $\beta_2$ help us to characterize behavior.

2. An Econometric Model
Notice: (1) random sample; (2) normal distribution.
Assumptions:
1. The dispersion of the values of y about their mean is the same for all levels of x, i.e., $\operatorname{var}(y \mid x) = \sigma^2$ for all values of x (homoskedastic). If $\operatorname{var}(y \mid x)$ is not the same $\sigma^2$ for all values of x, the data are said to be heteroskedastic.
2. The values of y are all uncorrelated and have zero covariance, $\operatorname{cov}(y_i, y_j) = 0$, implying that there is no linear association among them.
3. Or, a stronger assumption: the values of y are all statistically independent, because of random selection.
4. The variable x is not random and must take at least two different values; otherwise the regression analysis fails.
However, x could also be random.
5. (Optional) The values of y are normally distributed about their mean for each value of x: $y \sim N(\beta_1 + \beta_2 x, \sigma^2)$.

2.1 Introducing the Error Term
y, the dependent variable in the regression model, has two components: a systematic component, y's mean $E(y \mid x)$, which is not random; and a random component, the random error term
$$e = y - E(y \mid x) = y - \beta_1 - \beta_2 x.$$
Hence $y = \beta_1 + \beta_2 x + e$, where x is the independent (explanatory) variable.

The assumptions of the regression model in terms of the random error e:
1. The value of y, for each value of x, is $y = \beta_1 + \beta_2 x + e$.
2. The average value of the random error e is $E(e) = 0$, since we assume that $E(y) = \beta_1 + \beta_2 x$.
3. The variance of the random error e is $\operatorname{var}(e) = \sigma^2 = \operatorname{var}(y)$, because y and e differ only by a constant, which does not change the variance.
4. The covariance between any pair of random errors $e_i$ and $e_j$ is $\operatorname{cov}(e_i, e_j) = \operatorname{cov}(y_i, y_j) = 0$.
5. The variable x is not random and must take at least two different values.
6. (Optional) The values of e are normally distributed about their mean: $e \sim N(0, \sigma^2)$.

y is observable, but e is unobservable: we could calculate $e = y - \beta_1 - \beta_2 x$ for any value of y only if $\beta_1$ and $\beta_2$ were known, and they are not.

Another interpretation of the error term: e represents all factors affecting y other than x. In the food expenditure example, what factors can cause expenditure y to differ from its mean $E(y \mid x)$? Consider similar examples about elections.
1. We want to include all the important and relevant explanatory variables in the model; relevant factors that are left out end up in e.
2. The error term e captures any approximation error that arises because the linear functional form we have assumed may be only an approximation to reality.
3. The error term captures any element of random behavior that might be present in each individual. Knowledge of all the variables that influence an individual's food expenditure might not be sufficient to predict expenditure perfectly; an unpredictable random behavioral component might also be contained in e.

3.3 Estimating the Parameters for the Expenditure Relationship
1. Assume that the data in Table 3.1 are observed values of the random variables $y_t$, $t = 1, \ldots, 40$, that satisfy assumptions SR1-SR5.
2. We represent the 40 data points as $(y_t, x_t)$, $t = 1, \ldots, 40$; plotting them gives a scatter diagram.
3. We estimate the location of the mean expenditure line $E(y) = \beta_1 + \beta_2 x$. We would expect this line to lie somewhere in the middle of the data points, since it represents an average.

Way 1: To estimate $\beta_1$ and $\beta_2$, we could simply draw a freehand line through the middle of the data and then measure the slope and intercept with a ruler. Problem: different people would draw different lines.
Way 2: Draw a line from the smallest-income point to the largest-income point. Problem: it ignores information on the exact positions of the remaining 38 observations.

3.3.1 The Least Squares Principle
This principle asserts that, to fit a line to the data values, we should choose the line so that the sum of the squares of the vertical distances from each point to the line is as small as possible, that is, choose $\beta_1$ and $\beta_2$ to minimize
$$S(\beta_1, \beta_2) = \sum_{t=1}^{40} (y_t - \beta_1 - \beta_2 x_t)^2.$$
The distances are squared to prevent large positive distances from being canceled by large negative ones.

3.3.2 Estimates for the Food Expenditure Function
Minimizing S gives the least squares estimates
$$b_2 = \frac{\sum_t (x_t - \bar{x})(y_t - \bar{y})}{\sum_t (x_t - \bar{x})^2}, \qquad b_1 = \bar{y} - b_2 \bar{x},$$
and for the food expenditure data the fitted line is $\hat{y} = 40.7676 + 0.1283x$.

3.3.3 Interpreting the Estimates
$b_2 = 0.1283$: if x goes up by 100, y will increase by approximately 12.83.
$b_1 = 40.7676$: the estimated value of y for a family with zero x (zero income).

3.3.3a Elasticity
Elasticity is the percentage change in y divided by the percentage change in x:
$$\eta = \frac{\%\Delta y}{\%\Delta x} = \frac{\Delta y / y}{\Delta x / x} = \frac{\Delta y}{\Delta x}\cdot\frac{x}{y},$$
so for the linear model the estimated elasticity is $\hat{\eta} = b_2 \cdot \frac{x}{y}$, which changes along the line and is commonly evaluated at the sample means $\bar{x}$ and $\bar{y}$.
In the log-log model $y = b_1 x^{b_2}$, taking logs gives $\log y = \log b_1 + b_2 \log x$, written (relabeling the intercept as $b_1$) as $\log y = b_1 + b_2 \log x$; in this form the elasticity is the constant $b_2$.

3.3.3b Prediction
The prediction is carried out by substituting a value $x_0$ into the estimated equation to obtain $\hat{y}_0 = b_1 + b_2 x_0$. For example, at $x_0 = 750$, $\hat{y}_0 = 40.7676 + 0.1283 \times 750 \approx 136.99$.
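The least squares principle, the elasticity formula, and the prediction step above can be tied together in a short sketch. The Python code below is illustrative only: it applies the closed-form least squares estimates to hypothetical data generated from $y = \beta_1 + \beta_2 x + e$ with i.i.d. normal errors (it does not reproduce Table 3.1, which is not included here), the "true" parameter values are made up, and the function names least_squares, elasticity_at_means, and predict are invented for this example. It needs Python 3.8+ for statistics.fmean.

```python
import random
from statistics import fmean


def least_squares(x, y):
    """Return (b1, b2) that minimize sum((y_t - b1 - b2 * x_t) ** 2)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    if sxx == 0:
        # Mirrors the assumption that x takes at least two different values.
        raise ValueError("x must take at least two different values")
    sxy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
    b2 = sxy / sxx            # slope estimate
    b1 = y_bar - b2 * x_bar   # intercept: the fitted line passes through the means
    return b1, b2


def elasticity_at_means(b2, x, y):
    """Elasticity b2 * x / y of the fitted linear model, evaluated at (x_bar, y_bar)."""
    return b2 * fmean(x) / fmean(y)


def predict(b1, b2, x0):
    """Point prediction y_hat0 = b1 + b2 * x0."""
    return b1 + b2 * x0


if __name__ == "__main__":
    # Hypothetical data-generating process, NOT the Table 3.1 data:
    # y_t = beta1 + beta2 * x_t + e_t with e_t i.i.d. N(0, sigma^2).
    rng = random.Random(2003)
    beta1, beta2, sigma = 40.0, 0.13, 10.0
    # x values are drawn once and then held fixed for the estimation.
    x = [rng.uniform(300.0, 1200.0) for _ in range(40)]
    y = [beta1 + beta2 * xt + rng.gauss(0.0, sigma) for xt in x]

    b1, b2 = least_squares(x, y)
    print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
    print(f"elasticity at the means = {elasticity_at_means(b2, x, y):.4f}")
    print(f"prediction at x0 = 750: {predict(b1, b2, 750.0):.2f}")
```

Because the errors are random, the printed $b_1$ and $b_2$ differ from the made-up true values, and a different seed gives a different sample and different estimates; that sampling variability is exactly what the error term e introduces.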