Prediction, Goodness-of-Fit, and Modeling Issues

Prepared by Vera Tabakova, East Carolina University

4.1 Least Squares Prediction
4.2 Measuring Goodness-of-Fit
4.3 Modeling Issues
4.4 Log-Linear Models

Principles of Econometrics, 3rd Edition

\[ y_0 = \beta_1 + \beta_2 x_0 + e_0 \tag{4.1} \]

where $e_0$ is a random error. We assume that $E(y_0) = \beta_1 + \beta_2 x_0$ and $E(e_0) = 0$. We also assume that $\operatorname{var}(e_0) = \sigma^2$ and $\operatorname{cov}(e_0, e_i) = 0$ for $i = 1, 2, \ldots, N$.

\[ \hat{y}_0 = b_1 + b_2 x_0 \tag{4.2} \]

Figure 4.1 A point prediction

\[ f = y_0 - \hat{y}_0 = (\beta_1 + \beta_2 x_0 + e_0) - (b_1 + b_2 x_0) \tag{4.3} \]

\[ E(f) = \beta_1 + \beta_2 x_0 + E(e_0) - \left[ E(b_1) + E(b_2) x_0 \right] = \beta_1 + \beta_2 x_0 + 0 - \left[ \beta_1 + \beta_2 x_0 \right] = 0 \]

\[ \operatorname{var}(f) = \sigma^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] \tag{4.4} \]

The variance of the forecast error is smaller when
i. the overall uncertainty in the model is smaller, as measured by the variance of the random errors $\sigma^2$;
ii. the sample size N is larger;
iii. the variation in the explanatory variable is larger; and
iv. the value of $(x_0 - \bar{x})^2$ is small.

\[ \widehat{\operatorname{var}}(f) = \hat{\sigma}^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] \]

\[ \operatorname{se}(f) = \sqrt{\widehat{\operatorname{var}}(f)} \tag{4.5} \]

\[ \hat{y}_0 \pm t_c \operatorname{se}(f) \tag{4.6} \]

Figure 4.2 Point and interval prediction

\[ \hat{y}_0 = b_1 + b_2 x_0 = 83.4160 + 10.2096(20) = 287.6089 \]

\[ \widehat{\operatorname{var}}(f) = \hat{\sigma}^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] = \hat{\sigma}^2 + \frac{\hat{\sigma}^2}{N} + (x_0 - \bar{x})^2 \frac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2} = \hat{\sigma}^2 + \frac{\hat{\sigma}^2}{N} + (x_0 - \bar{x})^2 \, \widehat{\operatorname{var}}(b_2) \]

\[ \hat{y}_0 \pm t_c \operatorname{se}(f) = 287.6089 \pm 2.0244(90.6328) = [104.1323,\ 471.0854] \]

\[ y_i = \beta_1 + \beta_2 x_i + e_i \tag{4.7} \]

\[ y_i = E(y_i) + e_i \tag{4.8} \]

\[ y_i = \hat{y}_i + \hat{e}_i \tag{4.9} \]

\[ y_i - \bar{y} = (\hat{y}_i - \bar{y}) + \hat{e}_i \tag{4.10} \]

Figure 4.3 Explained and unexplained components of $y_i$

\[ \hat{\sigma}_y^2 = \frac{\sum (y_i - \bar{y})^2}{N - 1} \]

\[ \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum \hat{e}_i^2 \tag{4.11} \]

$\sum (y_i - \bar{y})^2$ = total sum of squares = SST: a measure of total variation in y about the sample mean.
$\sum (\hat{y}_i - \bar{y})^2$ = sum of squares due to the regression = SSR: that part of total variation in y, about the sample mean, that is explained by, or due to, the regression. Also known as the “explained sum of squares.”

$\sum \hat{e}_i^2$ = sum of squares due to error = SSE: that part of total variation in y about its mean that is not explained by the regression. Also known as the unexplained sum of squares, the residual sum of squares, or the sum of squared errors.

SST = SSR + SSE

\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \tag{4.12} \]

The closer $R^2$ is to one, the closer the sample values $y_i$ are to the fitted regression equation $\hat{y}_i = b_1 + b_2 x_i$. If $R^2 = 1$, then all the sample data fall exactly on the fitted least squares line, so SSE = 0, and the model fits the data “perfectly.” If the sample data for y and x are uncorrelated and show no linear association, then the least squares fitted line is “horizontal,” so that SSR = 0 and $R^2 = 0$.

\[ \rho_{xy} = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)} \sqrt{\operatorname{var}(y)}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \tag{4.13} \]

\[ r_{xy} = \frac{\hat{\sigma}_{xy}}{\hat{\sigma}_x \hat{\sigma}_y} \tag{4.14} \]

\[ \hat{\sigma}_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}, \qquad \hat{\sigma}_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}}, \qquad \hat{\sigma}_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{N - 1}} \tag{4.15} \]

\[ r_{xy}^2 = R^2 \]

\[ R^2 = r_{y\hat{y}}^2 \]

$R^2$ measures the linear association, or goodness-of-fit, between the sample data and their predicted values.
Consequently $R^2$ is sometimes called a measure of “goodness-of-fit.”

\[ SST = \sum (y_i - \bar{y})^2 = 495132.160 \]

\[ SSE = \sum (y_i - \hat{y}_i)^2 = \sum \hat{e}_i^2 = 304505.176 \]

\[ R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{304505.176}{495132.160} = .385 \]

\[ r_{xy} = \frac{\hat{\sigma}_{xy}}{\hat{\sigma}_x \hat{\sigma}_y} = \frac{478.75}{(6.848)(112.675)} = .62 \]

Figure 4.4 Plot of predicted y, $\hat{y}$, against y

FOOD_EXP = weekly food expenditure by a household of size 3, in dollars
INCOME = weekly household income, in $100 units

\[ \widehat{\mathit{FOOD\_EXP}} = 83.42 + 10.21\, \mathit{INCOME} \qquad R^2 = .385 \]
\[ (\text{se}) \qquad (43.41)^{*} \qquad (2.09)^{***} \]

* indicates significant at the 10% level
** indicates significant at the 5% level
*** indicates significant at the 1% level

4.3.1 The Effects of Scaling the Data

Changing the scale of x:

\[ y = \beta_1 + \beta_2 x + e = \beta_1 + (c\beta_2)(x/c) + e = \beta_1 + \beta_2^* x^* + e \]

where $\beta_2^* = c\beta_2$ and $x^* = x/c$.

Changing the scale of y:

\[ y/c = (\beta_1/c) + (\beta_2/c)\, x + (e/c) \quad \text{or} \quad y^* = \beta_1^* + \beta_2^* x + e^* \]

Variable transformations:
Power: if x is a variable then $x^p$ means raising the variable to the power p; examples are quadratic ($x^2$) and cubic ($x^3$) transformations.
The natural logarithm: if x is a variable then its natural logarithm is ln(x).
The reciprocal: if x is a variable then its reciprocal is 1/x.

Figure 4.5 A nonlinear relationship between food expenditure and income

The log-log model

\[ \ln(y) = \beta_1 + \beta_2 \ln(x) \]

The parameter $\beta_2$ is the elasticity of y with respect to x.

The log-linear model

\[ \ln(y_i) = \beta_1 + \beta_2 x_i \]

A one-unit increase in x leads to (approximately) a $100 \times \beta_2$ percent change in y.

The linear-log model

\[ y = \beta_1 + \beta_2 \ln(x) \quad \text{or} \quad \frac{\Delta y}{100\,(\Delta x / x)} = \frac{\beta_2}{100} \]

A 1% increase in x leads to a $\beta_2/100$ unit change in y.
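As a rough numerical check of the three interpretations above, the following sketch compares each rule of thumb with the exact change implied by the corresponding model. The function names and the slope values are hypothetical, chosen only for illustration; they are not estimates from the text.

```python
import math

def log_log_pct_change(beta2, pct_x=1.0):
    """Exact % change in y for a pct_x % rise in x when ln(y) = b1 + b2*ln(x)."""
    return 100.0 * ((1.0 + pct_x / 100.0) ** beta2 - 1.0)

def log_linear_pct_change(beta2, dx=1.0):
    """Exact % change in y for a dx-unit rise in x when ln(y) = b1 + b2*x."""
    return 100.0 * (math.exp(beta2 * dx) - 1.0)

def linear_log_unit_change(beta2, pct_x=1.0):
    """Exact unit change in y for a pct_x % rise in x when y = b1 + b2*ln(x)."""
    return beta2 * math.log(1.0 + pct_x / 100.0)

# Hypothetical slope values, for illustration only.
print(log_log_pct_change(0.5))        # rule of thumb: about 0.5 percent
print(log_linear_pct_change(0.05))    # rule of thumb: about 5 percent
print(linear_log_unit_change(120.0))  # rule of thumb: about 1.2 units
```

The rules of thumb are close for small changes; the gap in the log-linear case widens as the slope grows, which is why the text qualifies that interpretation with "approximately."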
The reciprocal model is

\[ \mathit{FOOD\_EXP} = \beta_1 + \beta_2 \frac{1}{\mathit{INCOME}} + e \]

The linear-log model is

\[ \mathit{FOOD\_EXP} = \beta_1 + \beta_2 \ln(\mathit{INCOME}) + e \]

Remark: Given this array of models that involve different transformations of the dependent and independent variables, some of which have similar shapes, what are some guidelines for choosing a functional form?

1. Choose a shape that is consistent with what economic theory tells us about the relationship.
2. Choose a shape that is sufficiently flexible to “fit” the data.
3. Choose a shape so that assumptions SR1-SR6 are satisfied, ensuring that the least squares estimators have the desirable properties described in Chapters 2 and 3.

Figure 4.6 EViews output: residuals histogram and summary statistics for food expenditure example

The Jarque-Bera statistic is given by

\[ JB = \frac{N}{6} \left( S^2 + \frac{(K - 3)^2}{4} \right) \]

where N is the sample size, S is skewness, and K is kurtosis.
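A minimal sketch of this calculation, assuming the formula exactly as stated above. The function name is hypothetical. The inputs are the sample size, skewness, and kurtosis reported for the food expenditure example that follows; the skewness is entered as negative, an assumption that does not affect the result because the term is squared.

```python
def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic: (N/6) * (S^2 + (K - 3)^2 / 4)."""
    return (n / 6.0) * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

# Food expenditure residuals: N = 40, S = -0.097 (sign assumed), K = 2.99.
jb = jarque_bera(40, -0.097, 2.99)
print(round(jb, 3))
```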
In the food expenditure example

\[ JB = \frac{40}{6} \left( (-.097)^2 + \frac{(2.99 - 3)^2}{4} \right) = .063 \]

Figure 4.7 Scatter plot of wheat yield over time

\[ \mathit{YIELD}_t = \beta_1 + \beta_2 \mathit{TIME}_t + e_t \]

\[ \widehat{\mathit{YIELD}}_t = .638 + .0210\, \mathit{TIME}_t \qquad R^2 = .649 \]
\[ (\text{se}) \qquad (.064) \qquad (.0022) \]

Figure 4.8 Predicted, actual and residual values from straight line

Figure 4.9 Bar chart of residuals from straight line

\[ \mathit{YIELD}_t = \beta_1 + \beta_2 \mathit{TIME}_t^3 + e_t \]

\[ \mathit{TIMECUBE} = \mathit{TIME}^3 / 1000000 \]

\[ \widehat{\mathit{YIELD}}_t = 0.874 + 9.68\, \mathit{TIMECUBE}_t \qquad R^2 = 0.751 \]
\[ (\text{se}) \qquad (.036) \qquad (.082) \]

Figure 4.10 Fitted, actual and residual values from equation with cubic term

\[ \mathit{YIELD}_t = \mathit{YIELD}_0 (1 + g)^t \]

\[ \ln(\mathit{YIELD}_t) = \ln(\mathit{YIELD}_0) + \left[ \ln(1 + g) \right] t = \beta_1 + \beta_2 t \]

\[ \widehat{\ln(\mathit{YIELD}_t)} = -.3434 + .0178\, t \]
\[ (\text{se}) \qquad (.0584) \qquad (.0021) \]

4.4.2 A Wage Equation

\[ \ln(\mathit{WAGE}) = \ln(\mathit{WAGE}_0) + \left[ \ln(1 + r) \right] \mathit{EDUC} = \beta_1 + \beta_2 \mathit{EDUC} \]

\[ \widehat{\ln(\mathit{WAGE})} = .7884 + .1038 \times \mathit{EDUC} \]
\[ (\text{se}) \qquad (.0849) \qquad (.0063) \]

4.4.3 Prediction in the Log-Linear Model

\[ \hat{y}_n = \exp\left( \widehat{\ln(y)} \right) = \exp(b_1 + b_2 x) \]

\[ \hat{y}_c = \widehat{E(y)} = \exp\left( b_1 + b_2 x + \hat{\sigma}^2 / 2 \right) = \hat{y}_n e^{\hat{\sigma}^2 / 2} \]

\[ \widehat{\ln(\mathit{WAGE})} = .7884 + .1038 \times \mathit{EDUC} = .7884 + .1038 \times 12 = 2.0335 \]

\[ \hat{y}_c = \widehat{E(y)} = \hat{y}_n e^{\hat{\sigma}^2 / 2} = 7.6408 \times 1.1276 = 8.6161 \]

4.4.4 A Generalized $R^2$ Measure

\[ R_g^2 = \left[ \operatorname{corr}(y, \hat{y}) \right]^2 = r_{y, \hat{y}}^2 \]

\[ R_g^2 = \left[ \operatorname{corr}(y, \hat{y}_c) \right]^2 = .4739^2 = .2246 \]

$R^2$ values tend to be small with microeconomic, cross-sectional data, because the variations in individual behavior are difficult to fully explain.
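The wage prediction of Section 4.4.3 and the generalized $R^2$ of Section 4.4.4 can be retraced with a few lines of arithmetic. This is only a sketch: it starts from the rounded figures reported above (the predicted log wage, the correction factor, and the correlation), so its results may differ from the slide values in the last digits. The variable names are hypothetical.

```python
import math

ln_wage_hat = 2.0335   # predicted ln(WAGE) at EDUC = 12, as reported above
correction = 1.1276    # exp(sigma_hat^2 / 2), as reported above
corr_y_yc = 0.4739     # corr(y, y_hat_c), as reported above

y_natural = math.exp(ln_wage_hat)      # natural predictor
y_corrected = y_natural * correction   # corrected predictor

# Error variance estimate implied by the reported correction factor.
sigma2_hat_implied = 2.0 * math.log(correction)

# Generalized R^2 is the squared correlation between y and its prediction.
r2_generalized = corr_y_yc ** 2

print(round(y_natural, 4), round(y_corrected, 4))
print(round(sigma2_hat_implied, 4), round(r2_generalized, 4))
```

The correction factor always exceeds one, so the corrected predictor is always larger than the natural predictor.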
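4.4.5 Prediction Intervals in the Log-Linear Model

\[ \left[ \exp\left( \widehat{\ln(y)} - t_c \operatorname{se}(f) \right),\ \exp\left( \widehat{\ln(y)} + t_c \operatorname{se}(f) \right) \right] \]

\[ \left[ \exp(2.0335 - 1.96 \times .4905),\ \exp(2.0335 + 1.96 \times .4905) \right] = [2.9184,\ 20.0046] \]

Keywords: coefficient of determination, correlation, data scale, forecast error, forecast standard error, functional form, goodness-of-fit, growth model, Jarque-Bera test, kurtosis, least squares predictor, linear model, linear relationship, linear-log model, log-linear model, log-log model, log-normal distribution, prediction, prediction interval, $R^2$, residual, skewness

\[ f = y_0 - \hat{y}_0 = (\beta_1 + \beta_2 x_0 + e_0) - (b_1 + b_2 x_0) \]

\[ \operatorname{var}(\hat{y}_0) = \operatorname{var}(b_1 + b_2 x_0) = \operatorname{var}(b_1) + x_0^2 \operatorname{var}(b_2) + 2 x_0 \operatorname{cov}(b_1, b_2) \]

\[ = \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2} + x_0^2 \frac{\sigma^2}{\sum (x_i - \bar{x})^2} + 2 x_0 \sigma^2 \frac{-\bar{x}}{\sum (x_i - \bar{x})^2} \]

\[ \operatorname{var}(\hat{y}_0) = \left[ \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2} - \frac{\sigma^2 N \bar{x}^2}{N \sum (x_i - \bar{x})^2} \right] + \left[ \frac{\sigma^2 x_0^2}{\sum (x_i - \bar{x})^2} + \frac{\sigma^2 (-2 x_0 \bar{x})}{\sum (x_i - \bar{x})^2} + \frac{\sigma^2 N \bar{x}^2}{N \sum (x_i - \bar{x})^2} \right] \]

\[ = \sigma^2 \left[ \frac{\sum x_i^2 - N \bar{x}^2}{N \sum (x_i - \bar{x})^2} + \frac{x_0^2 - 2 x_0 \bar{x} + \bar{x}^2}{\sum (x_i - \bar{x})^2} \right] \]

\[ = \sigma^2 \left[ \frac{\sum (x_i - \bar{x})^2}{N \sum (x_i - \bar{x})^2} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] \]

\[ = \sigma^2 \left[ \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] \]

\[ \frac{f}{\sqrt{\operatorname{var}(f)}} \sim N(0, 1) \]

\[ \widehat{\operatorname{var}}(f) = \hat{\sigma}^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] \]

\[ \frac{f}{\sqrt{\widehat{\operatorname{var}}(f)}} = \frac{y_0 - \hat{y}_0}{\operatorname{se}(f)} \sim t_{(N-2)} \tag{4A.1} \]

\[ P(-t_c \le t \le t_c) = 1 - \alpha \tag{4A.2} \]

\[ P\left[ -t_c \le \frac{y_0 - \hat{y}_0}{\operatorname{se}(f)} \le t_c \right] = 1 - \alpha \]

\[ P\left[ \hat{y}_0 - t_c \operatorname{se}(f) \le y_0 \le \hat{y}_0 + t_c \operatorname{se}(f) \right] = 1 - \alpha \]

\[ (y_i - \bar{y})^2 = \left[ (\hat{y}_i - \bar{y}) + \hat{e}_i \right]^2 = (\hat{y}_i - \bar{y})^2 + \hat{e}_i^2 + 2 (\hat{y}_i - \bar{y}) \hat{e}_i \]

\[ \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum \hat{e}_i^2 + 2 \sum (\hat{y}_i - \bar{y}) \hat{e}_i \]

\[ \sum (\hat{y}_i - \bar{y}) \hat{e}_i = \sum \hat{y}_i \hat{e}_i - \bar{y} \sum \hat{e}_i = \sum (b_1 + b_2 x_i) \hat{e}_i - \bar{y} \sum \hat{e}_i = b_1 \sum \hat{e}_i + b_2 \sum x_i \hat{e}_i - \bar{y} \sum \hat{e}_i \]

\[ \sum \hat{e}_i = \sum (y_i - b_1 - b_2 x_i) = \sum y_i - N b_1 - b_2 \sum x_i = 0 \]

\[ \sum x_i \hat{e}_i = \sum x_i (y_i - b_1 - b_2 x_i) = \sum x_i y_i - b_1 \sum x_i - b_2 \sum x_i^2 = 0 \]

\[ \sum (\hat{y}_i - \bar{y}) \hat{e}_i = 0 \]

If the model contains an intercept it is guaranteed that SST = SSR + SSE. If, however, the model does not contain an intercept, then $\sum \hat{e}_i \ne 0$ and SST ≠ SSR + SSE.

Suppose that the variable y has a normal distribution, with mean $\mu$ and variance $\sigma^2$.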
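If we consider $w = e^y$ then $y = \ln(w) \sim N(\mu, \sigma^2)$, and w is said to have a log-normal distribution.

\[ E(w) = e^{\mu + \sigma^2 / 2} \]

\[ \operatorname{var}(w) = e^{2\mu + \sigma^2} \left( e^{\sigma^2} - 1 \right) \]

Given the log-linear model

\[ \ln(y) = \beta_1 + \beta_2 x + e \]

If we assume that $e \sim N(0, \sigma^2)$:

\[ E(y_i) = E\left( e^{\beta_1 + \beta_2 x_i + e_i} \right) = E\left( e^{\beta_1 + \beta_2 x_i} e^{e_i} \right) = e^{\beta_1 + \beta_2 x_i} E\left( e^{e_i} \right) = e^{\beta_1 + \beta_2 x_i} e^{\sigma^2 / 2} = e^{\beta_1 + \beta_2 x_i + \sigma^2 / 2} \]

\[ \widehat{E(y_i)} = e^{b_1 + b_2 x_i + \hat{\sigma}^2 / 2} \]

The growth and wage equations:

\[ \beta_2 = \ln(1 + r) \quad \text{and} \quad r = e^{\beta_2} - 1 \]

\[ b_2 \sim N\left( \beta_2,\ \operatorname{var}(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2} \right) \]

\[ E\left[ e^{b_2} \right] = e^{\beta_2 + \operatorname{var}(b_2) / 2} \]

\[ \hat{r} = e^{b_2 - \widehat{\operatorname{var}}(b_2) / 2} - 1, \qquad \widehat{\operatorname{var}}(b_2) = \frac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2} \]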
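To see how much the variance correction matters in practice, the sketch below applies the rate formula to the slope estimates and standard errors reported earlier for the wheat yield growth model and the wage equation. The function name is hypothetical, and the estimated variance of $b_2$ is taken to be the square of its reported standard error.

```python
import math

def rate_estimates(b2, se_b2):
    """Return (natural, corrected) estimates of r = exp(beta2) - 1."""
    var_b2 = se_b2 ** 2                          # estimated var(b2)
    natural = math.exp(b2) - 1.0                 # ignores the correction
    corrected = math.exp(b2 - var_b2 / 2.0) - 1.0
    return natural, corrected

# Wheat yield growth model: b2 = .0178 with standard error .0021.
g_nat, g_corr = rate_estimates(0.0178, 0.0021)

# Wage equation: b2 = .1038 with standard error .0063.
r_nat, r_corr = rate_estimates(0.1038, 0.0063)

print(g_nat, g_corr)
print(r_nat, r_corr)
```

In both cases the corrected estimate is slightly below the natural one, and the difference is very small because the estimated variance of $b_2$ is small.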