Faculty of Economy
International Business and Development
November 24, 2011
3rd TEST (Type AB)
Economic Statistics
Duration: 50 minutes
Examination Aids: Calculator

                EXERCISE 1   EXERCISE 2   Total
Point Value         8            2         10
Point Earned

In the calculations use no more than two decimal places.
Always remember to comment on the results obtained.

EXERCISE 1 - Type AB (eight points)

A consumer products company wants to measure the effectiveness of different types of advertising media in the promotion of its products. Specifically, the company is interested in the effectiveness of radio advertising (RADIO) and newspaper advertising (NEWS). A sample of 22 cities with approximately equal populations is selected for study during a test period of one month. Each city is allocated a specific expenditure level both for radio advertising and for newspaper advertising. The SALES of the product (in thousands of dollars) and the levels of media expenditure (in thousands of dollars) during the test month are recorded, with the results stored in a table. Using Gretl, we obtained the following results.

Correlation coefficients, using the observations 1 - 22

   SALES     RADIO      NEWS
  1.0000    0.6966    0.5021   SALES
            1.0000   -0.0921   RADIO
                      1.0000   NEWS

Model 1: OLS, using observations 1-22
Dependent variable: SALES

          Coefficient   Std. Error   t-ratio   VIF
const     156.43        126.758
RADIO     13.0807       1.75937                1.009
NEWS      16.7953       2.96338                1.009

Mean dependent var   1225.136    S.D. dependent var   345.5701
Sum squared resid    479760      S.E. of regression   158.9041
R-squared
F                    40.15823    P-value(F)           1.50e-07

ANOVA
Analysis of Variance:

                    Sum of squares   df   Mean square
Regression          2028030           2
Residual (error)    479760           19
Total               2507790          21

(A) [1 point] State the multiple regression equation in conventional terms and interpret the meaning of the slopes, b2 and b3, in this problem.
(B) [1 point] First, predict the mean Sales for an expenditure of $65,000 in Radio Advertising and $35,000 in Newspaper Advertising, and then evaluate the residual.
(C) [2 points] Which type of advertising is more effective? Explain.
(D) [1 point] Determine whether there is a significant relationship between Sales and the two independent variables (radio advertising and newspaper advertising) at the 0.05 level of significance. Interpret the meaning of the p-value.
(E) [2 points] At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.
(F) [1 point] Show how to obtain R squared (R2) from the sums of squares in the ANOVA table. Interpret it.

SOLUTION

a) [1 point] State the multiple regression equation in conventional terms and interpret the meaning of the slopes, b2 and b3, in this problem.

Sales_hat = 156 + 13.1*Radio + 16.8*Newspaper
           (127)  (1.76)       (2.96)
R-squared = 0.809 (standard errors in parentheses)

In this model, the regression coefficients are interpreted as follows:
1) Holding spending on newspaper advertising constant, for each increase of 1.0 thousand dollars in radio advertising, Sales is estimated to increase by 13.1 thousand dollars (i.e., $13,100).
2) Holding spending on radio advertising constant, for each increase of 1.0 thousand dollars in newspaper advertising, Sales is estimated to increase by 16.8 thousand dollars (i.e., $16,800).
3) The sample Y intercept (b1 = 156) estimates the value of Sales when no money is spent on radio advertising or newspaper advertising. Because zero expenditure is outside the range of RADIO and NEWS values used in this market study, and makes little sense in practice, the value of b1 has little or no practical interpretation.

b) First, predict the mean Sales for an expenditure of $65,000 in Radio Advertising and $35,000 in Newspaper Advertising, and then evaluate the residual.

Sales_hat = 156 + 13.1*65 + 16.8*35 = 156 + 851.5 + 588 = 1595.5 (thousands of dollars).

The residual is the observed Sales for a city with these expenditure levels minus this predicted value: e = Sales - Sales_hat.

c) Which type of advertising is more effective?
Explain.

Holding the other independent variable constant, newspaper advertising seems to be more effective because its slope is greater. However, if the variability of the two independent variables differs, standardized versions of the regression coefficients provide a more meaningful comparison. In our case we do not know the variability, so it is better to compute the standardized partial coefficients from the correlation coefficients:

$$\mathrm{beta}_2 = \frac{r_{y2} - r_{y3}\, r_{23}}{1 - r_{23}^2} = \frac{0.6966 - 0.5021 \times (-0.0921)}{1 - (-0.0921)^2} = 0.7492$$

$$\mathrm{beta}_3 = \frac{r_{y3} - r_{y2}\, r_{23}}{1 - r_{23}^2} = \frac{0.5021 - 0.6966 \times (-0.0921)}{1 - (-0.0921)^2} = 0.5711$$

Since beta2 > beta3, the more effective type of advertising is radio advertising.

d) Determine whether there is a significant relationship between Sales and the two independent variables (radio advertising and newspaper advertising) at the 0.05 level of significance. Interpret the meaning of the p-value.

Our next task is to test the "significance" of this model based on the F-ratio, using the standard five-step hypothesis testing procedure.

Hypotheses: H0: β2 = β3 = 0 (all slope coefficients are zero); H1: at least one of them is different from 0.
Critical value: an F-value based on (k - 1) numerator df and (n - k) denominator df gives F(2, 19) at 0.05 = 3.52.
Calculated value: from the output above, the F-ratio is 40.16.
Compare: F-calc > F-crit, and thus we reject H0.
Conclusion: this model has explanatory power with respect to Y. In other words, the set of X variables in this model helps us explain or predict the Y variable. This model is SIGNIFICANT.

The p-value associated with F-calc is 1.50e-07, which is much less than α = 0.05. It is the probability of obtaining an F-ratio at least as large as 40.16 if H0 were true. Put another way, the value of F-calc falls in the rejection region of the null hypothesis.

e) [2 points] At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.

Our next step is to test the significance of the individual coefficients in the equation.
We will conduct a t-test for each b associated with an X variable. Mechanically, the test statistic is the value of bi divided by its standard error SEbi, compared with a t-critical value with (n - k) df (the Error df from the ANOVA table). Alternatively, we can use the p-values to decide whether or not to reject H0. The H0 being tested is βi = 0, which means the variable is not related to Y. We consider each variable separately and thus must conduct as many t-tests as there are X variables.

Hypotheses: H0: βi = 0 (this variable is unrelated to the dependent variable) against H1: βi ≠ 0, at alpha = 0.05.

With the actual values of the b's and the SEb's, we obtain one t-value for each X variable:

tRADIO = 13.0807/1.75937 = 7.435
tNewspaper = 16.7953/2.96338 = 5.6676

We compare them with the t-critical value (it is the same for each t-test within a single model) to decide whether or not to reject the H0 associated with each X. The critical value from the t-table is t = 2.093 with 19 degrees of freedom.

At the 0.05 significance level, reject H0 if t ≥ 2.093 or t ≤ -2.093. Do not reject H0 if -2.093 < t < 2.093.

Both t statistics (7.435 and 5.6676) exceed the critical value, so X2 and X3 are significant independent variables.

Conclusion: variables X2 (RADIO) and X3 (Newspaper) are significant and contribute to the model's explanatory power, so both should be included in the model.

f) [1 point] Show how to obtain R squared (R2) from the sums of squares in the ANOVA table. Interpret it.

R2 = ESS/TSS or 1 - RSS/TSS
R2 = 2028030/2507790 = 0.81

81% of the variation in Sales can be explained by variation in the amount of Radio Advertising and Newspaper Advertising.

EXERCISE 2 - Type AB (two points)

A) [1 point] Standardized multiple regression coefficient: definition, interpretation and use

The sizes of regression coefficients in multiple regression models depend on the units of measurement for the variables.
To compare the relative effects of two explanatory variables, it is appropriate to compare their coefficients only if the variables have the same units. Otherwise, standardized versions of the regression coefficients provide more meaningful comparisons.

The standardized regression coefficient for an explanatory variable represents the change in Y, in Y standard deviations, for a one standard deviation increase in that variable, controlling for the other explanatory variables in the model. We denote them by beta2, beta3, and so on. If |beta3| > |beta2|, for example, then a standard deviation increase in X3 has a greater partial effect on Y than does a standard deviation increase in X2.

We standardize the partial regression coefficients by adjusting for the differing standard deviations of Y and each Xj. Let sY denote the sample standard deviation of Y, and let sX2, sX3, ..., sXk denote the sample standard deviations of the explanatory variables. The estimates of the standardized regression coefficients are

$$\mathrm{beta}_2 = b_2 \frac{s_{X_2}}{s_Y}, \quad \mathrm{beta}_3 = b_3 \frac{s_{X_3}}{s_Y}, \quad \ldots, \quad \mathrm{beta}_k = b_k \frac{s_{X_k}}{s_Y}$$

B) [1 point] Define the following terms: R2 and adjusted R2

R2 is the proportion of the variance in Y explained by the set of X independent variables. It is expressed as a percentage and thus takes values from 0 to 100% (or 0 to 1 when expressed in decimal form). Adjusted R2 is "adjusted" for the number of X variables (k - 1 in the formula) and the sample size (n in the formula). Both R2 and adjusted R2 are easily calculated. R2 is ESS/TSS, and these can be taken directly from the ANOVA table. The adjusted R2 formula is

$$R^2_{Adj} = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}$$

Again, both of these can be calculated from the ANOVA table, and they are usually provided as part of the computer output.
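
As a rough check of the two definitions above, the following minimal Python sketch computes R2, adjusted R2 and the F-ratio from the ANOVA sums of squares reported in Exercise 1. The function name `anova_summary` and its argument names are illustrative assumptions, not part of the Gretl output; k counts all estimated coefficients, including the intercept.

```python
def anova_summary(ess, rss, n, k):
    """Return R2, adjusted R2 and F computed from ANOVA sums of squares.

    ess: regression (explained) sum of squares
    rss: residual (error) sum of squares
    n:   number of observations
    k:   number of estimated coefficients, intercept included
    (Hypothetical helper written for illustration only.)
    """
    tss = ess + rss                                  # total sum of squares
    r2 = ess / tss                                   # R2 = ESS/TSS
    adj_r2 = 1 - (rss / (n - k)) / (tss / (n - 1))   # adjusted R2
    f_stat = (ess / (k - 1)) / (rss / (n - k))       # F = MS regression / MS residual
    return {"r2": r2, "adj_r2": adj_r2, "f": f_stat}


# Values taken from the ANOVA table of Exercise 1 (22 cities, 3 coefficients).
summary = anova_summary(ess=2028030, rss=479760, n=22, k=3)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

With the Exercise 1 figures this gives an R2 of about 0.81 and an F-ratio of about 40.16, in line with the solution above, and an adjusted R2 of about 0.79, slightly below R2 because of the penalty for the number of explanatory variables.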