Lecture Notes #6
LN6—Applications in Econometrics

AN APPLICATION OF OPTIMIZATION AND MATRIX ALGEBRA: REGRESSION

Regression analysis, as applied in econometrics, is a basic tool of economics. Here we will study how the concepts we learned in the discussion of matrix algebra and derivatives are applied in regression analysis. We will start with simple regression and then extend the discussion to multiple regression.

1. Simple Regression

In simple regression we want to find out about the relationship between two variables: whether the variations in the values of one variable are associated with variations in the values of another variable. We designate one variable as the dependent variable and the other as the independent variable. We will use the following simple example to see if, and to what extent, the variation in the quantity of a good sold by a firm (the dependent variable) is related to the firm's annual advertising expenditure (the independent variable).

    Year              1     2     3     4     5     6     7     8     9    10    11    12
    Sales y          38    48    52    35    30    56    63    46    61    68    72    65
    Ad Expenditure x 56.7  63.9  60.7  59.7  55.9  68.7  69.2  65.5  72.5  73.4  74.1  76.2

A scatter diagram of the data is shown in Figure 1.

    Figure 1  Scatter Diagram of Sales and Advertising Data

As the first step, the scatter diagram indicates that there is a direct relationship between sales volume and advertising expenditure. We want, however, to obtain a mathematical relationship between x and y. The mathematical relationship is represented by the regression equation, which is the equation of a line that is fitted to the scatter diagram. This fitted line is shown in Figure 2.

    Figure 2  The Fitted Regression Line

The general format of the regression line is as follows:

    ŷ = b₁ + b₂x

Note the symbol "^" above y. The symbol ŷ (y-hat) represents the y value that lies on the regression line, which is obtained by plugging in values for x after we determine the values of the slope of the line (b₂) and the vertical intercept (b₁).
ŷ is called the predicted value. The "plain", hatless, y represents the value of the dependent variable observed in the data set, called the observed value. For now, the equation of the fitted regression line is:

    ŷ = −67.4299 + 1.8119x

To show the difference between the observed and predicted values of y, consider, for example, year 7: the observed sales volume is y = 63 million units, which corresponds to an advertising expenditure of x = $69.2 thousand. The predicted sales volume is:

    ŷ = −67.4299 + 1.8119(69.2) = 57.95

The difference between the observed and predicted values of y is called the prediction error, or the residual, and is denoted by e:

    e = y − ŷ

The prediction error provides the theoretical basis for the mathematical process used to obtain the values of the slope and the vertical intercept (the coefficients) of the regression equation. The mathematical process is called the least squares method.

Note that for each value of x in the data set there is an observed y value and a predicted y value. Thus, given the number of observations in the data set, n, there are n residuals. All predicted values lie along the fitted regression line. There is only one line that fits the scatter diagram best. The best-fitting line has the property that the residuals sum to zero; that is, the combined positive prediction errors exactly balance the combined negative errors:

    Σᵢ eᵢ = Σᵢ (yᵢ − ŷᵢ) = 0,    i = 1, …, n

Many lines satisfy this zero-sum property, however, so it cannot serve as the criterion by itself. The least squares criterion selects, among all candidate lines, the one for which the sum of squared residuals is the minimum, or the least. Thus, we need a method to obtain the coefficients of the regression line by minimizing the sum of squared errors (SSE):

    SSE = Σeᵢ² = Σ(yᵢ − ŷᵢ)²

2. Least Squares Method

The objective here is to find the formulas to compute the values for b₁ and b₂. We start with the SSE.
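Before working through the algebra, the least squares idea can be previewed numerically. The following sketch (plain Python, using the sales/advertising data and the fitted coefficients reported above; an illustration, not part of the original notes) computes the SSE at the fitted coefficients and shows that nearby coefficients produce a larger SSE:

```python
# Sales/advertising data from the example above
x = [56.7, 63.9, 60.7, 59.7, 55.9, 68.7, 69.2, 65.5, 72.5, 73.4, 74.1, 76.2]
y = [38, 48, 52, 35, 30, 56, 63, 46, 61, 68, 72, 65]

def sse(b1, b2):
    """Sum of squared residuals for the line y-hat = b1 + b2*x."""
    return sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(x, y))

best = sse(-67.4299, 1.8119)       # SSE at the reported least squares coefficients
print(round(best, 2))

# Any other coefficients give a larger SSE:
print(sse(-67.4299, 1.9) > best)   # perturbing the slope raises the SSE
print(sse(-60.0, 1.8119) > best)   # perturbing the intercept raises the SSE
```

Trying other perturbations of b₁ and b₂ always raises the SSE, which is exactly what "least squares" means.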
    Σeᵢ² = Σ(yᵢ − ŷᵢ)²

Substituting ŷᵢ = b₁ + b₂xᵢ on the right-hand side, we have:

    Σeᵢ² = Σ(yᵢ − b₁ − b₂xᵢ)²

Since we are interested in determining the values of the two coefficients b₁ and b₂ such that Σeᵢ² is minimized, we take the partial derivatives of the right-hand side with respect to b₁ and b₂ and set them equal to zero:

    ∂(Σe²)/∂b₁ = −2Σ(y − b₁ − b₂x)
    ∂(Σe²)/∂b₂ = −2Σx(y − b₁ − b₂x)

Rewriting the right-hand sides and setting them equal to zero, we have:

    Σy − nb₁ − b₂Σx = 0
    Σxy − b₁Σx − b₂Σx² = 0

which lead to:

    nb₁ + (Σx)b₂ = Σy
    (Σx)b₁ + (Σx²)b₂ = Σxy

These are called the normal equations of the regression. Here we have two equations in the two unknowns b₁ and b₂. We can solve for b₁ and b₂ in two ways. Using matrix notation, the equation system is written as Xb = k, where

    X = [ n    Σx  ]      b = [ b₁ ]      k = [ Σy  ]
        [ Σx   Σx² ]          [ b₂ ]          [ Σxy ]

    "Coefficient" matrix   "Variable" matrix   "Constant" matrix

Thus,

    [ n    Σx  ] [ b₁ ]   [ Σy  ]
    [ Σx   Σx² ] [ b₂ ] = [ Σxy ]

The solution for b₁ and b₂ is then found by finding the inverse of the coefficient matrix and post-multiplying the inverse by the constant matrix:

    b = X⁻¹k

To find X⁻¹, first find the determinant of X:

    |X| = nΣx² − (Σx)² = nΣx² − (nx̄)²        (substituting Σx = nx̄)

    |X| = n(Σx² − nx̄²)

Next find the cofactor matrix:

    [C] = [ Σx²   −Σx ]
          [ −Σx    n  ]

Since the coefficient matrix is symmetric about the principal diagonal, the adjoint matrix, which is the transpose of the cofactor matrix, is the same as [C].
The inverse matrix X⁻¹ is then:

    X⁻¹ = (1/|X|) [ Σx²   −Σx ]
                  [ −Σx    n  ]

    X⁻¹ = (1/(n(Σx² − nx̄²))) [ Σx²   −Σx ]
                             [ −Σx    n  ]

    X⁻¹ = [ Σx²/(n(Σx² − nx̄²))    −x̄/(Σx² − nx̄²) ]
          [ −x̄/(Σx² − nx̄²)        1/(Σx² − nx̄²)  ]

Thus,

    [ b₁ ]   [ Σx²/(n(Σx² − nx̄²))    −x̄/(Σx² − nx̄²) ] [ Σy  ]
    [ b₂ ] = [ −x̄/(Σx² − nx̄²)        1/(Σx² − nx̄²)  ] [ Σxy ]

Using the matrix multiplication rule, first solve for b₂ (substituting Σy = nȳ):

    b₂ = −nx̄ȳ/(Σx² − nx̄²) + Σxy/(Σx² − nx̄²)

    b₂ = (Σxy − nx̄ȳ)/(Σx² − nx̄²)

Next, solve for b₁:

    b₁ = ȳΣx²/(Σx² − nx̄²) − x̄Σxy/(Σx² − nx̄²)

    b₁ = (ȳΣx² − x̄Σxy)/(Σx² − nx̄²)

Adding and subtracting nx̄²ȳ in the numerator,

    b₁ = (ȳΣx² − nx̄²ȳ − x̄Σxy + nx̄²ȳ)/(Σx² − nx̄²)

    b₁ = (ȳ(Σx² − nx̄²) − (Σxy − nx̄ȳ)x̄)/(Σx² − nx̄²)

    b₁ = ȳ − b₂x̄

Alternatively, dividing both sides of the first normal equation by n:

    b₁ + b₂(Σx/n) = Σy/n

    b₁ + b₂x̄ = ȳ

    b₁ = ȳ − b₂x̄

Now substitute for b₁ in the second normal equation (using Σx = nx̄):

    (ȳ − b₂x̄)Σx + b₂Σx² = Σxy

    n(ȳ − b₂x̄)x̄ + b₂Σx² = Σxy

    nx̄ȳ − nb₂x̄² + b₂Σx² = Σxy

    b₂(Σx² − nx̄²) = Σxy − nx̄ȳ

    b₂ = (Σxy − nx̄ȳ)/(Σx² − nx̄²)

From the data compute the following:

    x̄ = 66.375      ȳ = 52.833      Σxy = 43066.4      Σx² = 53411.13

    b₂ = (43066.4 − 12(66.375)(52.833)) / (53411.13 − 12(66.375)²) = 1.8119

    b₁ = 52.833 − 1.8119(66.375) = −67.4299

Alternatively, compute the quantities in the matrix notation:

    [ n    Σx  ] [ b₁ ]   [ Σy  ]
    [ Σx   Σx² ] [ b₂ ] = [ Σxy ]

The solution for b₁ and b₂ is then found by finding the inverse of the coefficient matrix and post-multiplying the inverse by the constant matrix.
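The matrix route can also be carried out with a short script. This sketch (an illustration using NumPy, not part of the original notes) builds the normal-equations system from the data and evaluates b = X⁻¹k:

```python
import numpy as np

# Advertising data from the example above
x = np.array([56.7, 63.9, 60.7, 59.7, 55.9, 68.7, 69.2, 65.5, 72.5, 73.4, 74.1, 76.2])
y = np.array([38, 48, 52, 35, 30, 56, 63, 46, 61, 68, 72, 65], dtype=float)
n = len(x)

# Normal-equations system X b = k
X = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
k = np.array([y.sum(), (x * y).sum()])

b = np.linalg.inv(X) @ k   # b = X^(-1) k
print(b.round(4))          # [b1, b2]
```

In practice `np.linalg.solve(X, k)` is preferred over forming the inverse explicitly, but the explicit inverse matches the derivation in the notes.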
    X = [ n    Σx  ]      b = [ b₁ ]      k = [ Σy  ]
        [ Σx   Σx² ]          [ b₂ ]          [ Σxy ]

    b = X⁻¹k

From the data compute the following:

    Σx = 796.5      Σy = 634      Σx² = 53411.13      Σxy = 43066.4

    [ 12      796.5    ] [ b₁ ]   [ 634.0   ]
    [ 796.5   53411.13 ] [ b₂ ] = [ 43066.4 ]

Using Excel, find:

    X⁻¹ = [ 8.1902    −0.1221 ]
          [ −0.1221    0.0018 ]

    [ b₁ ]   [ 8.1902    −0.1221 ] [ 634     ]
    [ b₂ ] = [ −0.1221    0.0018 ] [ 43066.4 ]

    [ b₁ ]   [ −67.4299 ]
    [ b₂ ] = [ 1.8119   ]

3. Multiple Regression

In multiple regression there are two or more independent variables. With two independent variables, the regression equation is written as:

    ŷ = b₁ + b₂x₂ + b₃x₃

To determine the coefficients of a multiple regression the only practical approach is matrix algebra. Here, again, we start with finding the normal equations.

    Σe² = Σ(y − ŷ)²

    Σe² = Σ(y − b₁ − b₂x₂ − b₃x₃)²

    ∂(Σe²)/∂b₁ = −2Σ(y − b₁ − b₂x₂ − b₃x₃) = 0
    ∂(Σe²)/∂b₂ = −2Σx₂(y − b₁ − b₂x₂ − b₃x₃) = 0
    ∂(Σe²)/∂b₃ = −2Σx₃(y − b₁ − b₂x₂ − b₃x₃) = 0

The normal equations are:

    nb₁ + (Σx₂)b₂ + (Σx₃)b₃ = Σy
    (Σx₂)b₁ + (Σx₂²)b₂ + (Σx₂x₃)b₃ = Σx₂y
    (Σx₃)b₁ + (Σx₂x₃)b₂ + (Σx₃²)b₃ = Σx₃y

In matrix format, we have:

    [ n     Σx₂     Σx₃   ] [ b₁ ]   [ Σy   ]
    [ Σx₂   Σx₂²    Σx₂x₃ ] [ b₂ ] = [ Σx₂y ]
    [ Σx₃   Σx₂x₃   Σx₃²  ] [ b₃ ]   [ Σx₃y ]

Labeling the three matrices as before, the values of the regression coefficients are found from:

    b = X⁻¹k

Example

The following data show the price of houses as the dependent variable, and the size (in square feet) and the age (in years) as the independent variables. We want to determine to what extent the price varies with the size and the age of the house.
    PRICE y    SQFT x₂    AGE x₃
    89950      917        39
    138950     1684       10
    87000      1800       20
    165000     1900       13
    210000     2000       20
    108000     1050       40
    89000      1057       45
    79000      954        6
    124500     1350       47
    135000     2134       13
    105500     1313       12
    133650     1671       37
    83500      1200       18
    101000     1314       45
    151500     1877       10
    88500      1132       38
    198000     2198       2
    135000     1525       10
    79500      1208       29
    135050     1450       35
    175000     2000       9
    71000      1267       18
    76000      1088       12
    99250      1159       35
    98500      1255       4
    117500     1386       30
    97000      1400       63
    125000     1442       12
    115000     1477       23
    145000     1566       6

The following are the quantities needed to generate the X and k matrices:

    n = 30          Σy = 3556850
    Σx₂ = 43774     Σx₂² = 67540602     Σx₂y = 5494952850
    Σx₃ = 701       Σx₃² = 23573        Σx₂x₃ = 955209      Σx₃y = 77587100

    X = [ 30      43774      701    ]      k = [ 3556850    ]
        [ 43774   67540602   955209 ]          [ 5494952850 ]
        [ 701     955209     23573  ]          [ 77587100   ]

    b = X⁻¹k

    [ b₁ ]   [ 1.0387523    −0.0005537   −0.0084550 ] [ 3556850    ]
    [ b₂ ] = [ −0.0005537    0.0000003    0.0000031 ] [ 5494952850 ]
    [ b₃ ]   [ −0.0084550    0.0000031    0.0001682 ] [ 77587100   ]

    [ b₁ ]   [ −3609.450 ]
    [ b₂ ] = [ 83.459    ]
    [ b₃ ]   [ 16.802    ]

The estimated regression equation is then:

    ŷ = −3609.450 + 83.459x₂ + 16.802x₃

Thus, for example, the price of a 2,000-square-foot house that is 10 years old is predicted to be:

    ŷ = −3609.450 + 83.459(2000) + 16.802(10) = $163,476.57

4. Functional Forms in Econometrics

In the simple linear regression model the population parameters β₁ (the intercept) and β₂ (the slope) are linear; that is, they are not expressed as, say, β₂², 1/β₂, or any form other than β₂. Also, the impact of changes in the independent variable on y works directly through x rather than through expressions such as, say, x², 1/x, or ln x. In many cases, even though the parameters are held as linear, the variables may take on forms other than linear. In many economic models the relationship between the dependent and independent variables is not a straight-line relationship; that is, the change in y does not follow the same pattern for all values of x.
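Returning to the house-price example of Section 3, the solution b = X⁻¹k can be reproduced with a short NumPy sketch (an illustration, not part of the original notes), building the sums directly from the data:

```python
import numpy as np

# House-price data: price (y), size in sq ft (x2), age in years (x3)
price = np.array([89950, 138950, 87000, 165000, 210000, 108000, 89000, 79000,
                  124500, 135000, 105500, 133650, 83500, 101000, 151500, 88500,
                  198000, 135000, 79500, 135050, 175000, 71000, 76000, 99250,
                  98500, 117500, 97000, 125000, 115000, 145000], dtype=float)
sqft = np.array([917, 1684, 1800, 1900, 2000, 1050, 1057, 954, 1350, 2134,
                 1313, 1671, 1200, 1314, 1877, 1132, 2198, 1525, 1208, 1450,
                 2000, 1267, 1088, 1159, 1255, 1386, 1400, 1442, 1477, 1566], dtype=float)
age = np.array([39, 10, 20, 13, 20, 40, 45, 6, 47, 13, 12, 37, 18, 45, 10, 38,
                2, 10, 29, 35, 9, 18, 12, 35, 4, 30, 63, 12, 23, 6], dtype=float)
n = len(price)

# Normal-equations system X b = k for two independent variables
X = np.array([[n,          sqft.sum(),         age.sum()],
              [sqft.sum(), (sqft ** 2).sum(),  (sqft * age).sum()],
              [age.sum(),  (sqft * age).sum(), (age ** 2).sum()]])
k = np.array([price.sum(), (sqft * price).sum(), (age * price).sum()])

b = np.linalg.solve(X, k)   # equivalent to X^(-1) k, numerically more stable
print(b.round(3))           # [b1, b2, b3]

# Predicted price of a 2,000 sq ft, 10-year-old house
pred = b[0] + b[1] * 2000 + b[2] * 10
print(round(pred, 2))
```

The coefficients agree with the values computed above, as does the predicted price of the 2,000-square-foot, 10-year-old house.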
Consider, for example, an economic model explaining the relationship between expenditure on food (or housing) and income. As income rises, we do expect expenditure on food to rise, but not at a constant rate. In fact, we should expect the rate of increase in expenditure on food to decrease as income rises. Therefore the relationship between income and food expenditure is not a straight-line relationship.

The following is an outline of various functional forms encountered in regression analysis. First we start with the straight-line (linear) relationship between x and y and then point out various non-linear functional relationships.

4.1. Linear Functional Form

The linear functional form is the familiar:

    y = β₁ + β₂x

The slope of the function is:

    dy/dx = β₂

The slope represents the change in y per unit change in x. The elasticity of the function shows the proportional, or percentage, change in y relative to a percentage change in x:

    ε = d(ln y)/d(ln x) = β₂(x/y)

To show this, let u = ln y and v = ln x. But, since v = ln x, then x = eᵛ. Therefore,

    d(ln y)/d(ln x) = du/dv = (du/dy)(dy/dx)(dx/dv) = (1/y)(β₂)(eᵛ) = β₂(x/y)

Note that to obtain the elasticity, simply multiply the slope by x/y:

    ε = (dy/dx)(x/y) = β₂(x/y)

4.2. Reciprocal Functional Form

The reciprocal functional form is:

    y = β₁ + β₂(1/x)

The slope is:

    dy/dx = −β₂/x²

and the elasticity is:

    ε = d(ln y)/d(ln x) = −β₂/(xy)

Note that, as usual, using the general definition of elasticity:

    ε = (dy/dx)(x/y) = (−β₂/x²)(x/y) = −β₂/(xy)

4.3. Log-Log Functional Form

Many relationships between variables are naturally expressed in percentages. Logarithms convert changes in variables into percentage changes.
The log-log functional form is:

    ln y = β₁ + β₂ ln x

The slope and elasticity of the function are:

    (1/y)(dy/dx) = β₂(1/x)        dy/dx = β₂(y/x)

    ε = (dy/dx)(x/y) = β₂(y/x)(x/y) = β₂

Consider, for example, the function:

    ln y = 0.5 + 2 ln x

For x₀ = 1.5,

    ln y = 0.5 + 2 ln 1.5 = 0.5 + 2(0.4055) = 1.3109

    y₀ = exp(1.3109) = 3.7096

The slope of the function at the point x₀ = 1.5 is then:

    dy/dx = β₂(y/x) = 2(3.7096/1.5) = 4.9462

This means that for each small unit increase in x, y increases by 4.9462.

Now let's compute the elasticity. First, suppose x increases by 1 percent, from x₀ = 1.5 to x₁ = 1.5 × 1.01 = 1.515. Then,

    ln y₁ = 0.5 + 2 ln 1.515 = 1.3308

    y₁ = exp(1.3308) = 3.7842

Thus, the percentage change in y when x increases by 1 percent is:

    (3.7842 − 3.7096)/3.7096 = 0.0201, or 2.01%

The elasticity is then ε = 0.0201/0.01 = 2.01. For a very small percentage change in x, the elasticity approaches β₂ = 2, as shown in the following table.

    "Percent" change in x (1)    x₁ (2)       y₁ (3)     Δy/y₀ (4)    Elasticity (4)/(1)
    0.01                         1.515        3.7842     0.0201       2.01
    0.005                        1.5075       3.7468     0.010025     2.005
    0.0025                       1.50375      3.7282     0.005006     2.0025
    0.001                        1.5015       3.7170     0.002001     2.001
    0.0001                       1.50015      3.7104     0.0002       2.0001
    0.00001                      1.500015     3.7097     0.00002      2.00001

The following will be added:
    Linear-log (semi-log) model
    Log-inverse model
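The table above can be reproduced with a short script. This sketch (plain Python, an illustration rather than part of the notes) recomputes the arc elasticity of ln y = 0.5 + 2 ln x at x₀ = 1.5 for shrinking percentage changes in x, showing the values approaching β₂ = 2:

```python
import math

beta1, beta2 = 0.5, 2.0
x0 = 1.5
y0 = math.exp(beta1 + beta2 * math.log(x0))   # y at x0, about 3.7096

for pct in [0.01, 0.005, 0.0025, 0.001, 0.0001, 0.00001]:
    x1 = x0 * (1 + pct)                        # x after a "pct" proportional increase
    y1 = math.exp(beta1 + beta2 * math.log(x1))
    elasticity = ((y1 - y0) / y0) / pct        # arc elasticity: %-change in y over %-change in x
    print(pct, round(elasticity, 5))
```

Each printed elasticity equals 2 plus the percent change itself (e.g. 2.01 for a 1 percent change), so the limit as the change shrinks is the point elasticity β₂ = 2.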