2.1 Week 2: Two-Variable Regression Analysis

2.2 Purpose of Regression Analysis
1. Estimate a relationship among economic variables, such as Y = f(X).
2. Forecast or predict the value of one variable, Y, based on the value of another variable, X.

2.3 Weekly Food Expenditures
Y = dollars spent each week on food items.
X = consumer's weekly family income.
The relationship between X and the expected value of Y, given X, might be linear:
E(Y|Xi) = f(Xi) = β1 + β2 Xi
Each conditional mean E(Y|Xi) is a function of Xi; this equation is known as the population regression function (PRF).

2.4 Elucidating "conditional distribution"
The two-variable case is easy to draw, yet it illustrates the basic concepts (as we saw in the Galton–Pearson example).
Galton (1886): although there was a tendency for tall parents to have tall children and for short parents to have short children, the average height of children born of parents of a given height tended to move, or "regress," toward the average height in the population as a whole.
Pearson (1903) collected more than 1,000 records and confirmed the claim (calling it "regression to mediocrity").
Example: a population of 60 families and the relationship between weekly consumption and family income.
Table 2.1 shows the conditional distribution of consumption given family income (one income level per column).
Conditional mean: E(Y|X=80) = 65 (Table 2.2).
The population regression line plots the conditional mean as a function of X (Figures 2.1 & 2.2).

2.5 Table 2.1 (and 2.2): weekly family consumption expenditure Y (in dollars), by weekly family income X

income X ->   80   100   120   140   160   180   200   220   240   260
              55    65    79    80   102   110   120   135   137   150
              60    70    84    93   107   115   136   137   145   152
              65    74    90    95   110   120   140   140   155   175
              70    80    94   103   116   130   144   152   165   178
              75    85    98   108   118   135   145   157   175   180
               -    88     -   113   125   140     -   160   189   185
               -     -     -   115     -     -     -   162     -   191
Total        325   462   445   707   678   750   685  1043   966  1211
Cond.
mean          65    77    89   101   113   125   137   149   161   173

2.6 [Figure: the conditional probability distribution f(Y|X=80) of food expenditure given income X = $80; a second panel adds f(Y|X=100), the conditional distribution given X = $100.]

2.7–2.8 [Figure: the conditional means E(Y|Xi) lie on the population regression line (PRF); the distributions of Y given X = 80, 140, 220 are centered at 65, 101, 149 respectively.]

2.9 The Econometric Model: a linear relationship between average consumption and income.
E(Y|X) = β1 + β2X, where β1 is the intercept and β2 = ΔE(Y|X)/ΔX is the slope.
PRF (population regression function): the conditional mean is a function of X, E(Y|Xi) = f(Xi), which can take any functional form.
Linear PRF: E(Y|Xi) = β1 + β2Xi.
The parameters β1 and β2 are not known.
For each value of Xi, Y has a distribution as shown in Figure 2.2, with a conditional mean and variance.

2.10–2.11 PRF for the population of 60 families
[Figure 2.1: weekly consumption plotted against weekly income (80–260); the PRF is E(Y|X) = 17 + 0.6X.]

2.12 Linearity
Linear in the variables:
1. E(Y|Xi) = β1 + β2Xi is linear in the variables.
2. E(Y|Xi) = β1 + β2Xi² is not linear in the variables.
Linear in the parameters:
2. E(Y|Xi) = β1 + β2Xi² is linear in the parameters.
3. E(Y|Xi) = β1 + β2Xi + β1β2Zi is not linear in the parameters.
Linearity in the parameters is what matters: only case 3 cannot be handled by the linear regression model (LRM). Case 3 is an example of the nonlinear regression model (NLRM).

2.13 Stochastic Specification of the PRF
Given any income level Xi, an individual family's consumption is clustered around the average of all families at that Xi, that is, around its conditional expectation E(Y|Xi). The deviation of any individual Yi is:
ui = Yi − E(Y|Xi)
or Yi = E(Y|Xi) + ui
or Yi = β1 + β2Xi + ui
where ui is the stochastic error or stochastic disturbance.

The Error Term
Y is a random variable composed of two parts: I.
Systematic component: the mean of Y, E(Y) = β1 + β2X.
II. Random component: u = Y − E(Y) = Y − β1 − β2X. This is called the random or stochastic error.
Together E(Y) and u form the model: Y = β1 + β2X + u.

2.14 For example, given X = $80, the individual consumption values are:
Y1 = 55 = β1 + β2(80) + u1
Y2 = 60 = β1 + β2(80) + u2
Y3 = 65 = β1 + β2(80) + u3
Y4 = 70 = β1 + β2(80) + u4
Y5 = 75 = β1 + β2(80) + u5
Estimated average: Ŷi = 65 = β̂1 + β̂2(80) for i = 1, …, 5.

2.15–2.16 The reasons for the stochastic disturbance:
• Vagueness of theory
• Unavailability of data
• Core variables vs. peripheral variables (direct vs. indirect effects)
• Intrinsic randomness in human behaviour
• Poor proxy variables
• Principle of parsimony
• Wrong functional form

2.17 Unobservable Nature of the Error Term
• Unspecified factors/explanatory variables left out of the model may end up in the error term. For example, a final exam score depends not only on class attendance but also on unobserved factors such as student ability, maths background, and effort.
• Approximation error enters the error term if the relationship between Y and X is not exactly linear.
• Strictly unpredictable random behavior that may be unique to an observation is also in the error term.
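The error-term decomposition above can be checked numerically for the X = $80 families of Table 2.1, using the slide's PRF E(Y|X) = 17 + 0.6X (so E(Y|80) = 65). A minimal sketch:

```python
# Slide 2.14's decomposition Yi = E(Y|Xi) + ui at X = 80,
# using the PRF E(Y|X) = 17 + 0.6X from Figure 2.1.
X = 80
consumption = [55, 60, 65, 70, 75]     # the five families at X = 80 (Table 2.1)
EY = 17 + 0.6 * X                      # conditional mean E(Y|80) = 65.0
u = [y - EY for y in consumption]      # stochastic disturbances ui = Yi - E(Y|Xi)
print([round(v, 1) for v in u])        # [-10.0, -5.0, 0.0, 5.0, 10.0]
print(round(sum(u), 10))               # 0.0 -- deviations from the conditional mean sum to zero
```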
2.18 SRF: Sample Regression Function
In practice we do not observe the entire population; instead we collect a sample to estimate the PRF.
The estimate may be inaccurate because of sampling fluctuations (sampling error).
To illustrate, suppose we take one family at random from each income level in Table 2.1 and use the 10 observations to derive a sample regression function, as illustrated in Figure 2.3:
SRF: Ŷ = b1 + b2X (note the "hat" over the Y), where b1 and b2 are estimators of β1 and β2.
For Sample 1: Ŷ = 24 + 0.51X.

2.19 Table 2.4: Sample 1, drawn from Table 2.1 (one observation per income level, shown in green on the slide).

2.20 [Figure: the relationship among Yi, ui, and the true regression line. The PRF E(Y|x) = β1 + β2x and the SRF Ŷ = β̂1 + β̂2x are plotted together; at x2, the error u2 measures the deviation of Y2 from the PRF and the residual û2 its deviation from the SRF.]

2.21 [Figure: two sample regression functions, SRF1 and SRF2 of the form Ŷ = β̂1 + β̂2x, each fitted from a different sample, with residuals û1, …, û4. Different samples will have different SRFs.]

2.22 [Figure: Sample 1 — the PRF E(Y|X) = 17 + 0.6X and the SRF Ŷ = 24 + 0.51X plotted against weekly income.]
Notation summary:
SRF: Ŷi = β̂1 + β̂2Xi, or Yi = β̂1 + β̂2Xi + ûi, or Yi = b1 + b2Xi + ei
PRF: E(Y|Xi) = β1 + β2Xi, and Yi = β1 + β2Xi + ui
Ŷi = estimator of E(Y|Xi); β̂i (or bi) = estimator of βi.
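The PRF/SRF distinction above can be sketched numerically with the Table 2.1 data. Fitting OLS to the full 60-family population recovers the PRF E(Y|X) = 17 + 0.6X exactly (every conditional mean lies on that line), while one family drawn per income level gives an SRF that differs by sampling error. The seed and the use of `random.choice` to mimic "one family per income level" are illustrative assumptions:

```python
import random

# Table 2.1: weekly consumption Y of the 60-family population, by income X.
table_2_1 = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# Table 2.2: the conditional means E(Y|X) are the column averages.
cond_means = {x: sum(ys) / len(ys) for x, ys in table_2_1.items()}
print([cond_means[x] for x in sorted(cond_means)])
# [65.0, 77.0, 89.0, 101.0, 113.0, 125.0, 137.0, 149.0, 161.0, 173.0]

def ols(pairs):
    """Closed-form two-variable OLS: returns (intercept, slope)."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    b2 = (sum((x - xbar) * (y - ybar) for x, y in pairs)
          / sum((x - xbar) ** 2 for x, _ in pairs))
    return ybar - b2 * xbar, b2

# PRF: fitting all 60 population observations recovers E(Y|X) = 17 + 0.6X.
population = [(x, y) for x, ys in table_2_1.items() for y in ys]
b1_pop, b2_pop = ols(population)
print(round(b1_pop, 6), round(b2_pop, 6))   # 17.0 0.6

# SRF: one family drawn at random per income level (slide 2.18); the fitted
# line differs from the PRF, and a different seed gives a different SRF.
random.seed(1)                               # arbitrary illustrative seed
sample = [(x, random.choice(ys)) for x, ys in table_2_1.items()]
b1_smp, b2_smp = ols(sample)
```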
2.23 ûi (or ei) is called the residual; ui is the error term or disturbance.

2.24 Least Squares Method
[Figure: two candidate lines, SRF1: Ŷ = b1 + b2X and SRF2: Ŷ = a1 + a2X, fitted to the same five points.]
SRF1 residuals: 1, −1, −1, 1, −1.5
  Σ|û| = 1 + 1 + 1 + 1 + 1.5 = 5.5
  Σû² = 1² + 1² + 1² + 1² + 1.5² = 6.25 (smaller)
SRF2 residuals: 2, 0, −1/2, 1, −2
  Σ|û| = 2 + 0 + 1/2 + 1 + 2 = 5.5
  Σû² = 2² + 0² + (1/2)² + 1² + 2² = 9.25
Both lines give the same sum of absolute residuals, but SRF1 has the smaller sum of squared residuals.

2.25 Ordinary Least Squares (OLS) Method
Yi = β1 + β2Xi + ui, so ui = Yi − β1 − β2Xi.
Minimize the error sum of squared deviations (sums run over i = 1, …, n):
Σ ui² = Σ (Yi − β1 − β2Xi)² = f(β1, β2)

2.26 Minimize f(β1, β2) with respect to β1 and β2:
∂f/∂β1 = −2 Σ (Yi − β1 − β2Xi)
∂f/∂β2 = −2 Σ Xi(Yi − β1 − β2Xi)
Set each of these two derivatives equal to zero and solve the two equations for the two unknowns β1 and β2.

2.27 Setting the derivatives to zero:
−2 Σ (Yi − β̂1 − β̂2Xi) = 0
−2 Σ Xi(Yi − β̂1 − β̂2Xi) = 0
When these two terms are set to zero, β1 and β2 become β̂1 and β̂2, because they no longer represent just any values of β1 and β2 but the special values that correspond to the minimum of f(·).

2.28 Expanding gives:
Σ Yi − nβ̂1 − β̂2 Σ Xi = 0
Σ XiYi − β̂1 Σ Xi − β̂2 Σ Xi² = 0
which rearrange into the normal equations:
nβ̂1 + β̂2 Σ Xi = Σ Yi
β̂1 Σ Xi + β̂2 Σ Xi² = Σ XiYi

2.29 Example: Maddala (p. 66)

Month   Sales (Y)   Advertising Exp. (X)    X²     XY      ûi
1           3               1                1      3     0.8
2           4               2                4      8     0.6
3           2               3                9      6    −2.6
4           6               4               16     24     0.2
5           8               5               25     40     1.0
Total      23              15               55     81     0

Normal equations: 5β̂1 + 15β̂2 = 23 and 15β̂1 + 55β̂2 = 81.
Solving for the two unknowns: Ŷi = 1.0 + 1.2Xi, with ûi = Yi − 1.0 − 1.2Xi.

2.30 Solving the normal equations in general:
β̂2 = (n Σ XiYi − Σ Xi Σ Yi) / (n Σ Xi² − (Σ Xi)²)
    = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²
    = Σ xiyi / Σ xi², where xi = Xi − X̄ and yi = Yi − Ȳ
β̂1 = Ȳ − β̂2X̄

2.31 [Figure: the OLS line Ŷ = β̂1 + β̂2X and an arbitrary alternative line Ŷ* = β̂1* + β̂2*X, with residuals û1*, …, û4* measured from the alternative line.]
Why is the SRF the best one? The sum of squared residuals from any other line Ŷ* is larger.

2.32 Assumptions of Simple Regression
1. The linear regression model is linear in the parameters: Y = β1 + β2X + u.
2.
X values are fixed in repeated sampling (X is nonstochastic).
3. Zero mean value of the error term (disturbance ui): E(ui|xi) = 0.
4. Homoscedasticity, or equal variance of ui: the conditional variances of ui are identical, i.e., var(ui|xi) = σ².

2.33 Homoscedasticity Case
[Figure: the probability density functions of Yi at two levels of family income, x1 = 80 and x2 = 100, are identical.]

2.34 Heteroscedasticity Case
[Figure: the variance of Yi increases as family income xi increases.]

2.35 Assumptions of the model (continued)
5. No autocorrelation between the disturbances: cov(ui, uj | xi, xj) = 0 for i ≠ j.
6. Zero covariance between ui and xi: cov(ui, xi) = E(uixi) = 0.
7. The number of observations n must be greater than the number of parameters k to be estimated: n > k.

2.36 Assumptions of the model (continued)
8. Variability in X values: the X values in a given sample must not all be the same; at least two must differ.
9. No specification bias or error: the regression model is correctly specified.
10. No perfect multicollinearity: no perfect linear relationship among the independent variables, i.e., Xk is not an exact linear function of Xm.

2.37 One more assumption that is often used in practice but is not required for least squares (optional): the values of Y are normally distributed about their mean for each value of X:
Y ~ N(β1 + β2X, σ²)

2.38 The Error Term Assumptions
1. The value of Y, for each value of X, is Y = β1 + β2X + u.
2. The average value of the random error u is E(u) = 0.
3. The variance of the random error u is var(u) = σ² = var(Y).
4. The covariance between any pair of u's is cov(ui, uj) = cov(Yi, Yj) = 0.
5. u is normally distributed with mean 0 and variance σ²: u ~ N(0, σ²).

2.39 Prediction
Estimated regression equation: Ŷi = 4 + 1.5Xi
Xi = years of experience; Ŷi = predicted wage rate.
If Xi = 2 years, then Ŷi = $7.00 per hour.
If Xi = 3 years, then Ŷi = $8.50 per hour.
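The role of the assumptions above can be illustrated with a small simulation. All values here (β1 = 17, β2 = 0.6, σ = 10, the income grid, the replication count) are made-up illustration choices: with X fixed in repeated samples, E(u) = 0, and constant variance σ², the OLS slope estimates from many repeated samples center on the true β2.

```python
import random

random.seed(0)                          # illustrative seed
b1, b2, sigma = 17.0, 0.6, 10.0         # made-up "true" parameters
X = list(range(80, 261, 20))            # fixed regressor values (Table 2.1 income levels)
Xbar = sum(X) / len(X)
Sxx = sum((x - Xbar) ** 2 for x in X)

slopes = []
for _ in range(2000):                   # 2000 repeated samples with the same X values
    # u drawn with E(u) = 0 and constant variance (assumptions 3 and 4)
    Y = [b1 + b2 * x + random.gauss(0, sigma) for x in X]
    Ybar = sum(Y) / len(Y)
    slopes.append(sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) / Sxx)

mean_b2 = sum(slopes) / len(slopes)
print(abs(mean_b2 - b2) < 0.01)         # True: the estimates center on 0.6
```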
2.40 Mean prediction: Ŷ = β̂1 + β̂2X
Prediction with Ŷ = 24.454 + 0.5091X at X = 100:
Ŷ = 24.454 + 0.5091(100) = 75.364 (estimated result)

2.41 "Ex post" and "ex ante" forecasting
For example, suppose you have data on Y and X for 1947–1999, and the consumption expenditure regression estimated over 1947–1995 is
Ŷt = 238.4 + 0.87Xt, where t = time, e.g., t = 1947, 1948, …
Given the observed values X96 = 10,419; X97 = 10,625; …; X99 = 11,286, the calculated predictions, or "ex post" forecasts, are:
1996: Ŷ96 = 238.4 + 0.87(10,419) = 9,302.9
1997: Ŷ97 = 238.4 + 0.87(10,625) = 9,482.2
…
1999: Ŷ99 = 238.4 + 0.87(11,286) = 10,057.2
The calculated prediction, or "ex ante" forecast, based on the assumed value X2000 = 12,000 is:
2000: Ŷ2000 = 238.4 + 0.87(12,000) = 10,678.4

2.42 Forecasting with the two-variable regression model
[Figure: timeline showing ex-post forecasts (1996–1999, X observed) and ex-ante forecasts (1999–2003, X assumed).]
Estimated regression function in a time-series context: Ŷt = β̂1 + β̂2Xt
The forecast for period t+τ is Ŷ(t+τ) = β̂1 + β̂2X(t+τ), where τ is the number of periods into the future and X(t+τ) is an observed or control value of the future X.
Forecast error: û(t+τ) = Y(t+τ) − Ŷ(t+τ)

2.43 Comparison of Forecasts
Mean squared error: MSE = Σ (Yi − Ŷi)² / (n − k)
Root mean squared error: RMSE = √[ Σ (Yi − Ŷi)² / (n − k) ]
Mean absolute percentage error: MAPE = Σ |(Yi − Ŷi)/Yi| / (n − k)
where n = # of forecasts and k = # of parameters estimated in the model.
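The three forecast-accuracy measures above can be sketched as follows. The actual/forecast series and k = 2 are made-up illustration values, and the MAPE here follows the slide's n − k divisor (MAPE is more often defined with a 100/n factor):

```python
import math

Y    = [10.0, 20.0, 30.0, 40.0]    # actual values (made-up)
Yhat = [12.0, 18.0, 33.0, 37.0]    # forecasts (made-up)
n, k = len(Y), 2                   # n forecasts, k estimated parameters

mse  = sum((y - f) ** 2 for y, f in zip(Y, Yhat)) / (n - k)
rmse = math.sqrt(mse)
mape = sum(abs((y - f) / y) for y, f in zip(Y, Yhat)) / (n - k)

print(mse)              # 13.0
print(round(rmse, 4))   # 3.6056
print(round(mape, 4))   # 0.2375
```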