QBA ASSIGNMENT PROJECT
Semester 1, 2011. Lecturer: Dr Otto Konstandatos

This assignment is in two parts of unequal value.

Part 1 is technical, and is designed to help you understand the connection between Mathematics (specifically multivariate optimisation using calculus) and econometric regression analysis, by guiding you through the derivation of the least squares estimators which are used in multiple regression. This part may be typed or neatly hand-written in the space provided in the printed document.

Part 2 is applied, using multiple regression analysis to explore a real-world problem, namely the relationship between a car’s ‘size’ and its fuel efficiency. Part 2 is designed to test your econometric modelling skills using multiple regression and EViews. This part requires you to cut and paste your EViews output in the spaces below before printing your solution.

This assignment may be done in groups of at most five students who either are in the same formal tutorial group as you, or who have the same tutor but are from another tutorial group. When you hand in your assignment you must include this cover sheet. No names can be added onto the group lists apart from the names that appear below.

Due Time/Date: 1:00pm Monday 30-05-2011
It must be deposited into your assigned tutor’s collection box on Level 3, Building 5, Discipline of Finance. Ku-ring-gai students may submit their scripts after the lecture.

Name: Johanan Ottensooser, Jeremy Raymond, Fariba Razi, Kristina Coffey
Student Number: 10873305, 10596854, 10449977, 10837944
Tutor’s Name:
Tutorial Day and Time:
Date stamp or tutor’s signature and date

Quantitative Business Analysis

Part 1:

Question 1:
1. Summing the mean $\bar{x}$ over all $n$ observations gives the same total as summing the $x_i$ themselves: that is, $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \bar{x}$, since $\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} = 0$.
2. We are required to prove that
\[ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i (y_i - \bar{y}). \]
3. Expanding the product:
\[ \text{LHS} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (x_i y_i - x_i \bar{y} - \bar{x} y_i + \bar{x}\bar{y}) \]
4.
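Applying the rule of summation:
\[ \text{LHS} = \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \bar{y} - \sum_{i=1}^{n} \bar{x} y_i + \sum_{i=1}^{n} \bar{x}\bar{y} \]
5. A constant may be moved outside of the summation, thus:
\[ \text{LHS} = \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} \bar{y} x_i - \bar{x} \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} \bar{x}\bar{y} \]
6. The sum of a constant is n multiplied by the constant, thus:
\[ \text{LHS} = \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} \bar{y} x_i - \bar{x} \sum_{i=1}^{n} y_i + n\bar{x}\bar{y} \]
7. Applying the rule from point 1 to $y$ (so that $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \bar{y} = n\bar{y}$):
\[ \text{LHS} = \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} \bar{y} x_i - \bar{x} \sum_{i=1}^{n} \bar{y} + n\bar{x}\bar{y} = \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} \bar{y} x_i - n\bar{x}\bar{y} + n\bar{x}\bar{y} \]
8. Thus, collecting like terms:
\[ \text{LHS} = \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} \bar{y} x_i \]
9. Applying the rules of summation:
\[ \text{LHS} = \sum_{i=1}^{n} (x_i y_i - \bar{y} x_i) \]
10. Factorizing:
\[ \text{LHS} = \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \text{RHS} \]

Question 2:
1. Recall from Question 1 that $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i (y_i - \bar{y})$.
2. We are required to prove that
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i (x_i - \bar{x}) \]
3. \[ \text{LHS} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) = \sum_{i=1}^{n} (x_i^2 - x_i \bar{x} - x_i \bar{x} + \bar{x}^2) \]
4. \[ \text{LHS} = \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2 \]
5. Since $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \bar{x}$:
\[ \text{LHS} = \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} \bar{x} + \sum_{i=1}^{n} \bar{x}^2 \]
6. Since $\sum_{i=1}^{n} k = nk$:
\[ \text{LHS} = \sum_{i=1}^{n} x_i^2 - 2\bar{x} \cdot n\bar{x} + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \]
7. Applying the rules of summation, and the rule from Q1 point 1:
\[ \text{LHS} = \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} \bar{x}^2 = \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} \bar{x} \cdot \bar{x} = \sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i \]
8. Applying the rules of summation:
\[ \text{LHS} = \sum_{i=1}^{n} (x_i^2 - \bar{x} x_i) \]
9. Factorizing:
\[ \text{LHS} = \sum_{i=1}^{n} x_i (x_i - \bar{x}) = \text{RHS} \]

Question 3:
Part 1:
1. Show that
\[ \frac{\partial}{\partial b_0} f(b_0, b_1) = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) \]
2. \[ f(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \]
3. Applying the chain rule $\frac{\partial}{\partial x} (f(x))^n = n (f(x))^{n-1} \cdot f'(x)$:
\[ \frac{\partial}{\partial b_0} f(b_0, b_1) = 2 \sum_{i=1}^{n} (-1)(y_i - b_0 - b_1 x_i) \]
4. And simplifying:
\[ \frac{\partial}{\partial b_0} f(b_0, b_1) = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i), \text{ as required.} \]
Part 2:
1. Then show that $\frac{\partial}{\partial b_0} f(b_0, b_1) = 0$ when $b_0 \equiv \hat{B}_0 = \bar{y} - b_1 \bar{x}$.
2. Substituting for $b_0$:
\[ \frac{\partial}{\partial b_0} f(b_0, b_1) = -2 \sum_{i=1}^{n} (y_i - (\bar{y} - b_1 \bar{x}) - b_1 x_i) = -2 \sum_{i=1}^{n} (y_i - \bar{y} + b_1 \bar{x} - b_1 x_i) \]
3.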
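Applying the rules of summation:
\[ = -2 \left[ \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{y} + \sum_{i=1}^{n} b_1 \bar{x} - \sum_{i=1}^{n} b_1 x_i \right] = -2 \left[ \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{y} + b_1 \left( \sum_{i=1}^{n} \bar{x} - \sum_{i=1}^{n} x_i \right) \right] \]
4. Since $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \bar{x}$ (and, by the same argument, $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \bar{y}$):
\[ = -2 \left[ \sum_{i=1}^{n} \bar{y} - \sum_{i=1}^{n} \bar{y} + b_1 \left( \sum_{i=1}^{n} \bar{x} - \sum_{i=1}^{n} \bar{x} \right) \right] \]
5. Collecting like terms:
\[ = -2 \left[ 0 + b_1 (0) \right] = 0 \]
6. Thus, when $b_0 \equiv \hat{B}_0 = \bar{y} - b_1 \bar{x}$, $\frac{\partial}{\partial b_0} f(b_0, b_1) = 0$, as required.

Question 4:
Define $\frac{\partial}{\partial b_1} f(b_0, b_1)$:
1. \[ f(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \]
2. Applying the chain rule $\frac{\partial}{\partial x} (f(x))^n = n (f(x))^{n-1} \cdot f'(x)$:
\[ \frac{\partial}{\partial b_1} f(b_0, b_1) = 2 \sum_{i=1}^{n} (-x_i)(y_i - b_0 - b_1 x_i) \]

Question 5:
Setting $\frac{\partial}{\partial b_1} f(b_0, b_1) = 0$, show that
\[ b_1 \equiv \hat{B}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
1. \[ \frac{\partial}{\partial b_1} f(b_0, b_1) = 0 = 2 \sum_{i=1}^{n} (-x_i)(y_i - b_0 - b_1 x_i) \]
2. Expanding, then dividing both sides by 2:
\[ 0 = 2 \sum_{i=1}^{n} (-x_i y_i + b_0 x_i + b_1 x_i^2), \quad \text{so} \quad 0 = -\sum_{i=1}^{n} x_i y_i + b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 \]
3. \[ \sum_{i=1}^{n} x_i y_i - b_0 \sum_{i=1}^{n} x_i = b_1 \sum_{i=1}^{n} x_i^2 \]
4. \[ b_1 = \frac{\sum_{i=1}^{n} x_i y_i - b_0 \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2} \]
5. Now, substitute $b_0$ with $\bar{y} - b_1 \bar{x}$:
\[ b_1 = \frac{\sum_{i=1}^{n} x_i y_i - (\bar{y} - b_1 \bar{x}) \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2} \]
6. \[ b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i + b_1 \bar{x} \sum_{i=1}^{n} x_i \]
7. Moving the $b_1$ term to the left-hand side:
\[ b_1 \left( \sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i \right) = \sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i \]
8. \[ b_1 \left( \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \bar{x} \right) = \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \bar{y} \]
9. \[ b_1 \sum_{i=1}^{n} (x_i^2 - x_i \bar{x}) = \sum_{i=1}^{n} (x_i y_i - x_i \bar{y}) \]
10. \[ b_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} x_i (y_i - \bar{y}) \]
11. Dividing both sides by $\sum_{i=1}^{n} x_i (x_i - \bar{x})$:
\[ \frac{b_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})} = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})} \]
12. \[ b_1 = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}, \text{ when } b_0 = \bar{y} - b_1 \bar{x} \]
13. Substitute (as proved above) $\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ and $\sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2$.
14. Thus,
\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \text{ as required.} \]

Question 6
Use the appropriate second-order test from Mathematics to show that the above choices for $(b_0, b_1)$ give a minimum for $f(b_0, b_1)$.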
By differentiating this two-variable function twice (with respect to each variable, and the cross-partial) and applying the second-derivative test with $D = AC - B^2$, it is possible to prove that the stationary point is a local minimum if $D > 0$ and $A > 0$.

The function is:
1. \[ f(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \]
The first derivatives are:
2. \[ \frac{\partial}{\partial b_0} f(b_0, b_1) = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = -2 \sum_{i=1}^{n} y_i + 2n b_0 + 2 b_1 \sum_{i=1}^{n} x_i \]
3. \[ \frac{\partial}{\partial b_1} f(b_0, b_1) = 2 \sum_{i=1}^{n} (-x_i)(y_i - b_0 - b_1 x_i) = -2 \sum_{i=1}^{n} x_i y_i + 2 b_0 \sum_{i=1}^{n} x_i + 2 b_1 \sum_{i=1}^{n} x_i^2 \]
The second derivatives are:
4. \[ \frac{\partial^2}{\partial b_0^2} f(b_0, b_1) = 2n = A \]
5. \[ \frac{\partial^2}{\partial b_0 \partial b_1} f(b_0, b_1) = 2 \sum_{i=1}^{n} x_i = B \]
6. \[ \frac{\partial^2}{\partial b_1^2} f(b_0, b_1) = 2 \sum_{i=1}^{n} x_i^2 = C \]
Applying the test $D = AC - B^2$:
7. \[ D = (2n) \left( 2 \sum_{i=1}^{n} x_i^2 \right) - \left( 2 \sum_{i=1}^{n} x_i \right)^2 \]
8. \[ = 4 \left[ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right] \]
9. Since $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \bar{x} = n\bar{x}$:
\[ = 4 \left[ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} \bar{x} \right)^2 \right] = 4 \left[ n \sum_{i=1}^{n} x_i^2 - (n\bar{x})(n\bar{x}) \right] = 4 \left[ n \sum_{i=1}^{n} x_i^2 - n^2 \bar{x}^2 \right] = 4n \left[ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right] \]
10. \[ = 4n \left[ \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} \bar{x}^2 \right] \]
By Question 2 (step 6), $\sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$, so $D = 4n \sum_{i=1}^{n} (x_i - \bar{x})^2$, which is positive provided the $x_i$ are not all equal. Since $A = 2n > 0$ as well, a local minimum exists. $(b_0, b_1)$, thus, gives a minimum for $f(b_0, b_1)$.

Part 2: Econometrics

Question 1 (1 mark)
Read the given data file cars.xls into EViews, and run a regression of KMLIT against the rest of the variables assuming homoskedastic errors. Copy and paste the EViews output into the space below, and report the estimated equation with the standard errors below the coefficients.

Regression output
Dependent Variable: KMLIT
Method: Least Squares
Date: 05/23/11 Time: 12:35
Sample: 1 392
Included observations: 392

Variable    Coefficient   Std. Error   t-Statistic   Prob.
CYL         -0.160554     0.146039     -1.099391     0.2723
ENGCM3      2.72E-05      0.000196     0.139041      0.8895
HP          -0.015504     0.004587     -3.379960     0.0008
WTKG        -0.004089     0.000561     -7.286719     0.0000
C           16.23109      0.540492     30.03019      0.0000

R-squared             0.704063     Mean dependent var      8.297429
Adjusted R-squared    0.701004     S.D. dependent var      2.765078
S.E. of regression    1.511960     Akaike info criterion   3.677364
Sum squared resid     884.6908     Schwarz criterion       3.728017
Log likelihood        -715.7633    Hannan-Quinn criter.    3.697439
F-statistic           230.1772     Durbin-Watson stat      0.887708
Prob(F-statistic)     0.000000

Equation
Y = B0 + B1(X1) + B2(X2) + B3(X3) + B4(X4) + U
KMLIT = 16.23109 - 0.160554(CYL) + 2.72E-05(ENGCM3) - 0.015504(HP) - 0.004089(WTKG)
Std. Errors: (0.540492) (0.146039) (0.000196) (0.004587) (0.000561)

Question 2 (1 mark)
Comment on the sign of each estimated coefficient in turn, and state whether this is what you expect. Ignore significance at this stage.

Ans
The coefficient on “CYL”, or number of cylinders, has a negative sign: as the number of cylinders increases, the fuel efficiency decreases. This is to be expected, since, ceteris paribus, an increase in the number of cylinders without an increase in horsepower, weight or engine capacity would increase the friction in the motor, decreasing efficiency and reducing the number of kilometres that you are able to drive per litre.

The coefficient on “ENGCM3”, or the capacity of the engine in cubic centimetres, has a positive sign: increasing engine capacity increases fuel efficiency. Whilst an engine with a higher capacity may run at lower revs, the positive sign is still unexpected, since more petrol is used.

The coefficient on “HP”, or the power of the engine, has a negative sign: as power increases, the fuel efficiency decreases. Whilst this is to be expected at the higher range (increasing power above a certain point would decrease efficiency), it is not to be expected at the lower range: where there is not enough power, you would need to use more throttle to maintain speed, using more petrol. However, this is a relatively rare case, and in most cases the negative relationship is to be expected.

The coefficient on “WTKG”, or weight in kilos, has a negative sign: the heavier the car, the less fuel efficient.
This is to be expected since more power will need to be used to move the increased weight.

Question 3 (1 mark)
Interpret the estimated effect of the engine power (HP) on kilometers travelled.

Answer
Holding the other variables constant, an increase of 1 hp is associated with a decrease of about 0.0155 kilometres per litre in fuel efficiency. Further, this coefficient is statistically significant at the 1% level (p-value = 0.0008).

Question 4 (1 mark)
Test whether the data supports the hypothesis that Engine size does affect a car’s mileage (i.e. how far it can travel per litre). Formulate and carry out an appropriate hypothesis test using the t-statistic approach at the 5% significance level.

Answer
To show that engine size (ENGCM3) affects mileage (KMLIT), the coefficient on ENGCM3, $\beta_2$, must differ from zero.
1. $H_0: \beta_2 = 0$
2. $H_1: \beta_2 \neq 0$
3. The level of significance is 5%; this is a two-tailed test, so 2.5% per tail.
   a. $t_{crit} = \pm 1.96$
4. $t_{act} = \dfrac{\hat{\beta}_2 - \beta_{2,0}}{SE(\hat{\beta}_2)}$
   a. $= \dfrac{(2.72 \times 10^{-5}) - 0}{0.000196}$
   b. $\approx 0.1388$ (4 d.p.)
5. Since $|t_{act}| < |t_{crit}|$, we do not reject the null hypothesis: thus, engine size does not statistically significantly affect mileage at the 5% level.

Question 5 (1 mark)
Test whether the number of cylinders affects a car’s mileage. Formulate and carry out an appropriate hypothesis test using the p-values approach, at the 5% level.

Answer
1. $H_0: \beta_1 = 0$, where $\beta_1$ is the coefficient on CYL
2. $H_1: \beta_1 \neq 0$
3. The level of significance is 5%; this is a two-tailed test.
   a. Thus, if the p-value is less than 0.05, we reject $H_0$ at the 5% level.
4. $t_{act} = \dfrac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$
   a. $= \dfrac{(-0.160554) - 0}{0.146039}$
   b. $= -1.099391 \approx -1.10$
5. p-value $= 2\Phi(-|t_{act}|)$
   a. $= 2\Phi(-1.10)$
   b. $= 2(0.1357)$
   c. $= 0.2714 = 27.14\%$
6. Since the p-value is greater than 5%, we do not reject $H_0$: the number of cylinders is not a statistically significant variable in determining mileage.

Question 6 (2 marks)
Test the following hypotheses about the coefficients on CYL (B1) and ENGCM3 (B2). Clearly specify the rejection region if you are using critical values, and clearly state your conclusions. When using p-values, calculate and compare your p-values to the test size then state your conclusion.
(Hint, assume the Central Limit Theorem holds)
(a) H0: B1 = 0, H1: B1 > 0, with α=0.05 using the critical-value approach.
(b) H0: B1 = 0, H1: B1 < 0, with α=0.05 using the critical-value approach.
(c) H0: B2 = 0, H1: B2 > 0, with α=0.05 using the p-value approach.
(d) H0: B2 = 0, H1: B2 < 0, with α=0.05 using the p-value approach.

Answer

Part A
To show that B > 0, we must reject H0: B = 0 in the right tail of the distribution, which requires $t_{act} > t_{crit}$. With the whole 5% in the right tail, and using the normal approximation suggested by the hint, $t_{crit} = 1.645$, so the rejection region is $t_{act} > 1.645$.
CYL (B1): $t_{act} = -1.10$ (from the EViews output). This is to the left of $t_{crit}$; thus we cannot reject H0, and we cannot show that B1 is statistically significantly greater than 0.
ENGCM3 (B2): $t_{act} = 0.14$ (from the EViews output). This is to the left of $t_{crit}$; thus we cannot reject H0, and we cannot show that B2 is statistically significantly greater than 0.

Part B
To show that B < 0, we must reject H0: B = 0 in the left tail of the distribution, which requires $t_{act} < t_{crit}$. With the whole 5% in the left tail, $t_{crit} = -1.645$, so the rejection region is $t_{act} < -1.645$.
CYL (B1): $t_{act} = -1.10$ (from the EViews output). This is to the right of $t_{crit}$; thus we cannot reject H0, and we cannot show that B1 is statistically significantly less than 0.
ENGCM3 (B2): $t_{act} = 0.14$ (from the EViews output). This is to the right of $t_{crit}$; thus we cannot reject H0, and we cannot show that B2 is statistically significantly less than 0.

Part C
To show that B > 0 using p-values, we reject H0: B = 0 if the one-sided p-value, $P(T > t_{act})$, is less than 0.05. EViews reports two-sided p-values (0.27 for CYL and 0.89 for ENGCM3), so the one-sided p-value is half the two-sided value when $t_{act}$ is positive, and one minus half the two-sided value when $t_{act}$ is negative.
CYL (B1): $t_{act}$ is negative, so the p-value is $1 - 0.2723/2 \approx 0.864$. This is greater than 0.05; thus we cannot reject H0, and we cannot show that B1 is statistically significantly greater than 0.
ENGCM3 (B2): $t_{act}$ is positive, so the p-value is $0.8895/2 \approx 0.445$. This is greater than 0.05; thus we cannot reject H0, and we cannot show that B2 is statistically significantly greater than 0.

Part D
To show that B < 0 using p-values, we reject H0: B = 0 if the one-sided p-value, $P(T < t_{act})$, is less than 0.05.
CYL (B1): $t_{act}$ is negative, so the p-value is $0.2723/2 \approx 0.136$. This is greater than 0.05; thus we cannot reject H0, and we cannot show that B1 is statistically significantly less than 0.
ENGCM3 (B2): $t_{act}$ is positive, so the p-value is $1 - 0.8895/2 \approx 0.555$. This is greater than 0.05; thus we cannot reject H0, and we cannot show that B2 is statistically significantly less than 0.

Question 7 (1 mark)
Formulate a hypothesis test to test whether a unit increase in a car’s weight (mass) has a greater detrimental effect on fuel efficiency than a unit increase in the power of the car’s engine, rather than the same effect. Use reparameterization to convert the model to allow you to test this hypothesis using a simple t-test.

Answer
We want to test whether weight has a greater detrimental effect on efficiency than engine power, that is, whether the coefficient on WTKG is more negative than the coefficient on HP ($\beta_4 < \beta_3$). The original regression is of the form
\[ Y_i = \beta_0 + \beta_1(\text{CYL}) + \beta_2(\text{ENGCM3}) + \beta_3(\text{HP}) + \beta_4(\text{WTKG}) + u_i \]
However, it is not possible to carry out this test with a simple t-test on the original regression: thus, we use re-parameterization, defining $\theta = \beta_4 - \beta_3$, so that $\beta_4 = \theta + \beta_3$. Thus:
\[ Y_i = \beta_0 + \beta_1(\text{CYL}) + \beta_2(\text{ENGCM3}) + \beta_3(\text{HP}) + (\theta + \beta_3)(\text{WTKG}) + u_i \]
\[ Y_i = \beta_0 + \beta_1(\text{CYL}) + \beta_2(\text{ENGCM3}) + \beta_3(\text{HP} + \text{WTKG}) + \theta(\text{WTKG}) + u_i \]
Now, we substitute $\tilde{X} = \text{HP} + \text{WTKG}$:
\[ Y_i = \beta_0 + \beta_1(\text{CYL}) + \beta_2(\text{ENGCM3}) + \beta_3(\tilde{X}) + \theta(\text{WTKG}) + u_i \]
Now, we can do a hypothesis test on θ to solve the problem. H0: θ=0; H1: θ<0. If we can reject H0, then θ<0 and $\beta_4 < \beta_3$ (weight has a statistically significantly greater detrimental effect than hp).

Dependent Variable: KMLIT
Method: Least Squares
Date: 05/26/11 Time: 14:59
Sample: 1 392
Included observations: 392

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           16.23109      0.540492     30.03019      0.0000
CYL         -0.160554     0.146039     -1.099391     0.2723
ENGCM3      2.72E-05      0.000196     0.139041      0.8895
HP+WTKG     -0.015504     0.004587     -3.379960     0.0008
WTKG        0.011415      0.004721     2.418110      0.0161

R-squared             0.704063     Mean dependent var      8.297429
Adjusted R-squared    0.701004     S.D. dependent var      2.765078
S.E. of regression    1.511960     Akaike info criterion   3.677364
Sum squared resid     884.6908     Schwarz criterion       3.728017
Log likelihood        -715.7633    Hannan-Quinn criter.    3.697439
F-statistic           230.1772     Durbin-Watson stat      0.887708
Prob(F-statistic)     0.000000

1. H0: θ=0; H1: θ<0.
2. The rejection region is the left tail of the distribution: we reject H0 if the one-sided p-value is below 0.05.
3. The estimate of θ is 0.011415 with a t-statistic of 2.418; the two-sided p-value reported by EViews is 0.0161. Because the estimate is positive, it lies in the opposite tail to the rejection region, and the one-sided p-value is $1 - 0.0161/2 \approx 0.992$.
4. Thus, we cannot reject H0 in favour of H1: θ<0. The data do not support the claim that weight has a greater per-unit detrimental effect on mileage than hp. If anything, θ is significantly positive in a two-sided test: the WTKG coefficient (-0.004089) is less negative than the HP coefficient (-0.015504), so a one-unit increase in hp reduces mileage by more than a one-kilogram increase in weight.

Question 8 (1 mark)
Verify that the “OLS Wonder Equation” gives a standard error for the CYL coefficient close to 0.145. You will need to run a regression of CYL on all the other independent variables, and you must include this regression output below. (Remember, the OLS Wonder Equation gives an estimate of the homoskedasticity consistent standard error).

Dependent Variable: CYL
Method: Least Squares
Date: 05/26/11 Time: 13:33
Sample: 1 392
Included observations: 392

Variable    Coefficient   Std. Error   t-Statistic   Prob.
ENGCM3      0.000915      4.97E-05     18.42238      0.0000
HP          -0.002908     0.001588     -1.831399     0.0678
WTKG        0.000426      0.000194     2.195681      0.0287
C           2.285539      0.147783     15.46555      0.0000

R-squared             0.905786     Mean dependent var      5.471939
Adjusted R-squared    0.905057     S.D. dependent var      1.705783
S.E. of regression    0.525600     Akaike info criterion   1.561598
Sum squared resid     107.1870     Schwarz criterion       1.602121
Log likelihood        -302.0733    Hannan-Quinn criter.    1.577659
F-statistic           1243.421     Durbin-Watson stat      1.479510
Prob(F-statistic)     0.000000

1. $SE(\text{CYL}) \approx \dfrac{S_{\hat{u}}}{S_{x_i}} \times \dfrac{1}{\sqrt{n \left(1 - R^2_{x_i \text{ on other } x}\right)}}$, where $S_{\hat{u}}$ is the S.E. of regression from Question 1 and $S_{x_i}$ is the standard deviation of CYL.
2. $SE(\text{CYL}) \approx \dfrac{1.511960}{1.705783} \times \dfrac{1}{\sqrt{392(1 - 0.905786)}}$
3. $SE(\text{CYL}) \approx 0.1459$, which is close to the 0.146039 reported by EViews in Question 1.

Question 9 (1 mark)
Test the following joint hypothesis about the coefficients on CYL (B1) and ENGCM3 (B2): H0: B1 = 0 and B2 = 0, H1: B1 ≠ 0 or B2 ≠ 0, with α=0.05. Along with the previous results, what do you conclude about B1 and B2? Is this consistent with your intuition?

Answer
The unrestricted regression is as follows:
\[ Y_i = \beta_0 + \beta_1(\text{CYL}) + \beta_2(\text{ENGCM3}) + \beta_3(\text{HP}) + \beta_4(\text{WTKG}) + u_i \]
To create the restricted regression, we assume the null hypothesis (H0: B1=B2=0) is true. Thus, the following regression is formed:
\[ Y_i = \beta_0 + \beta_3(\text{HP}) + \beta_4(\text{WTKG}) + u_i \]
This has two restrictions and 392 observations; thus, the large-sample F-crit value is (at 5% significance) 3.00.

Dependent Variable: KMLIT
Method: Least Squares
Date: 05/26/11 Time: 14:21
Sample: 1 392
Included observations: 392

Variable    Coefficient   Std. Error   t-Statistic   Prob.
HP          -0.016962     0.003947     -4.297243     0.0000
WTKG        -0.004488     0.000394     -11.38779     0.0000
C           16.13025      0.282533     57.09155      0.0000

R-squared             0.702602     Mean dependent var      8.297429
Adjusted R-squared    0.701073     S.D. dependent var      2.765078
S.E. of regression    1.511786     Akaike info criterion   3.672084
Sum squared resid     889.0585     Schwarz criterion       3.702477
Log likelihood        -716.7285    Hannan-Quinn criter.    3.684130
F-statistic           459.5047     Durbin-Watson stat      0.882036
Prob(F-statistic)     0.000000

1. $F = \dfrac{(R^2_{UR} - R^2_{R})/q}{(1 - R^2_{UR})/(n - k - 1)}$, where $q = 2$ is the number of restrictions and $k = 4$ is the number of regressors in the unrestricted model.
2. $F = \dfrac{(0.704063 - 0.702602)/2}{(1 - 0.704063)/(392 - 4 - 1)}$
3. $F \approx 0.955$

Since the F statistic is smaller than the F critical value, we cannot reject the null hypothesis: thus, the restriction cannot be rejected, and CYL (B1) and ENGCM3 (B2) are jointly statistically insignificant. This is consistent with the tests conducted above, and with our expectations for the values.

Question 10 (2 marks)
Can you explain any conflict between the implications of the results obtained about B1 and B2 and your expectations? (Hint: Run auxiliary regressions for the explanatory variables in question against the others, and compute the correlations between all the explanatory variables. What do you notice?).

Answer
The results above suggest that CYL and ENGCM3 are insignificant variables. Whilst the R-squared of the regression of KMLIT on all the variables is slightly higher than that of the restricted regression (KMLIT against HP and WTKG), this is to be expected whenever variables are added. Whilst this marginally improves the fit for KMLIT, it does not allow us to further understand causation, since the standard errors are higher. Further, the regression of KMLIT against HP and WTKG alone gives a slightly better adjusted R-squared (0.701073 against 0.701004): as such, the variables are more explanatory in such a model.

Thus, we believe that HP and WTKG correlate with CYL and ENGCM3. Whilst leaving CYL and ENGCM3 out of the regression could cause omitted variable bias, this would be slight compared to the imprecision in each coefficient estimate caused by the correlation. This correlation is shown below, and it indicates that our expectations about CYL and ENGCM3 are not invalidated but, rather, that their effect is absorbed by the effect of WTKG and HP on KMLIT.
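To give a rough sense of how much this correlation inflates the standard error on CYL, the auxiliary R-squared from Question 8 (0.905786, CYL regressed on ENGCM3, HP and WTKG) can be plugged into the 1/(1 - R-squared) term of the OLS Wonder Equation. The sketch below is illustrative only: the helper name is our own, and the quantity it returns is what is commonly called a variance inflation factor, a label the assignment itself does not use.

```python
import math


def variance_inflation(r2_aux: float) -> float:
    """Return 1 / (1 - R^2) for an auxiliary-regression R^2.

    Hypothetical helper written for this illustration only.
    """
    if not 0.0 <= r2_aux < 1.0:
        raise ValueError("auxiliary R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2_aux)


# Auxiliary R^2 for CYL on ENGCM3, HP and WTKG, taken from the Question 8 output.
R2_CYL_AUX = 0.905786

vif_cyl = variance_inflation(R2_CYL_AUX)
# The standard error scales with the square root of the variance factor.
se_multiplier = math.sqrt(vif_cyl)

print(f"variance inflation for CYL: {vif_cyl:.2f}")
print(f"standard error multiplier:  {se_multiplier:.2f}")
```

On these figures the variance of the CYL coefficient is roughly 10.6 times, and its standard error roughly 3.3 times, what it would be if CYL were uncorrelated with the other regressors, which is consistent with CYL appearing insignificant despite the expected sign.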
CYL is correlated with WTKG and HP
CYL is correlated with both WTKG and HP, as shown in the following scatter plots (an increase in CYL, generally, is correlated with an increase in HP and WTKG):

[Figure: scatter plots of HP against CYL and of WTKG against CYL]

Further, an auxiliary regression shows that WTKG and HP are excellent predictors of CYL (with an R-squared of greater than 0.8).

Dependent Variable: CYL
Method: Least Squares
Date: 05/27/11 Time: 13:41
Sample: 1 392
Included observations: 392

Variable    Coefficient   Std. Error   t-Statistic   Prob.
HP          0.011212      0.001856     6.042161      0.0000
WTKG        0.003171      0.000147     21.55655      0.0000

R-squared             0.821785     Mean dependent var      5.471939
Adjusted R-squared    0.821328     S.D. dependent var      1.705783
S.E. of regression    0.721029     Akaike info criterion   2.188814
Sum squared resid     202.7542     Schwarz criterion       2.209075
Log likelihood        -427.0075    Hannan-Quinn criter.    2.196844
Durbin-Watson stat    1.584778

ENGCM3 is correlated with WTKG and HP
ENGCM3 is correlated with both WTKG and HP, as shown in the following scatter plots (an increase in ENGCM3, generally, is correlated with an increase in HP and WTKG):

[Figure: scatter plots of HP against ENGCM3 and of WTKG against ENGCM3]

Further, an auxiliary regression shows that WTKG and HP are excellent predictors of ENGCM3 (with an R-squared of greater than 0.75).

Dependent Variable: ENGCM3
Method: Least Squares
Date: 05/27/11 Time: 13:47
Sample: 1 392
Included observations: 392

Variable    Coefficient   Std. Error   t-Statistic   Prob.
HP          21.30413      2.074520     10.26942      0.0000
WTKG        0.831055      0.164440     5.053839      0.0000

R-squared             0.779611     Mean dependent var      3185.650
Adjusted R-squared    0.779046     S.D. dependent var      1714.836
S.E. of regression    806.0712     Akaike info criterion   16.22731
Sum squared resid     2.53E+08     Schwarz criterion       16.24757
Log likelihood        -3178.553    Hannan-Quinn criter.    16.23534
Durbin-Watson stat    0.812699

Correlations computed
Using EViews, we computed the following correlations:

          CYL     ENGCM3   HP      WTKG
CYL       1.00    0.95     0.84    0.90
ENGCM3    0.95    1.00     0.90    0.93
HP        0.84    0.90     1.00    0.86
WTKG      0.90    0.93     0.86    1.00

Conclusion
Both CYL and ENGCM3 are closely correlated with (and well explained by) HP and WTKG. Thus, the effects of CYL and ENGCM3 on KMLIT are largely captured via HP and WTKG, and their effect on KMLIT beyond this is too small to be significant.
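The joint insignificance of CYL and ENGCM3 can be cross-checked from the sums of squared residuals reported in the Question 1 and Question 9 outputs. The sketch below assumes the standard SSR form of the F statistic, with q = 2 restrictions and k = 4 regressors in the unrestricted model; the function and variable names are our own, and the 3.00 critical value is the one quoted in Question 9.

```python
def f_statistic(ssr_restricted: float, ssr_unrestricted: float,
                q: int, n: int, k: int) -> float:
    """F statistic for q linear restrictions, in sum-of-squared-residuals form.

    n is the sample size and k the number of regressors (excluding the
    intercept) in the unrestricted model. Hypothetical helper for illustration.
    """
    df_resid = n - k - 1
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_resid
    return numerator / denominator


# Values copied from the EViews outputs in Question 1 and Question 9.
SSR_UNRESTRICTED = 884.6908  # KMLIT on CYL, ENGCM3, HP, WTKG
SSR_RESTRICTED = 889.0585    # KMLIT on HP, WTKG only
F_CRIT_5PCT = 3.00           # critical value quoted in Question 9

f_stat = f_statistic(SSR_RESTRICTED, SSR_UNRESTRICTED, q=2, n=392, k=4)
reject_h0 = f_stat > F_CRIT_5PCT

print(f"F statistic: {f_stat:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The statistic falls well below the critical value, so the null hypothesis that both coefficients are zero is not rejected, in line with the conclusion above.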