DSCI 3710 Excel Assignment #4 Food Lion Inc. would like to investigate the feasibility and future prospects of setting up stores in Denton County. Food Lion has provided you with a sample from a database of household financial variables. They would like you to use regression techniques to predict the monthly expenditure on groceries, of families who either rent or own their homes in Denton. The sample contains 100 records. The various fields in the sample are: Income1 Annual income of head of household or primary wage earner Income2 Annual income of secondary wage earner Famlsize Size of family (number of people permanently residing in the household) Ownorent 1 if household is owned; 0 if it is rented Autodebt Automobile related debt pending for wage earners in the household Hpayrent Household mortgage payment or rent per month Groc Monthly expenditure on groceries Loc 1 = East Denton (E); 0 = West Denton(W); -1 = North Denton (N); 2 = South Denton (S) For this assignment you will be given minimal instructions since, having completed the first three assignments, you should now be quite familiar with Excel functions and pull down menus. 1. The sample data are contained in a file named Assgt#4.xls to be found in the folder Excel Assignments for All Sections. Download the files from the folder onto a disk. 2. Import the data into the first nine columns (A-I) of your spreadsheet. The first row contains labels for the variables in the order outlined in the data description. The table below illustrates how the first 10 data records should look, after this step is completed. Obs 1 2 3 4 5 6 7 8 9 10 Income1 36557 27045 38878 41448 33136 44308 31997 43437 41625 40140 Income2 Famlsize Ownorent Autodebt Hpayrent 20610 4 0 15290 1339 25490 5 0 14676 1175 0 8 0 10317 1108 0 2 1 9504 729 25300 6 1 17802 875 24559 2 1 20537 1282 25419 4 1 11725 919 0 6 1 12084 970 22802 4 1 14863 945 32158 5 0 15708 1470 Groc 278 220 456 253 344 326 308 311 305 432 Loc 0 0 0 0 0 -1 1 2 0 2 3. Insert three columns prior to the column labeled “Groc”. In these columns, create three logical 1 (dummy) variables to represent the four locations. You may use the IF function from the pull down menu, to do this. Please follow the illustration below, where: ED=1, if record is for East Denton; WD=1, if record is for West Denton; and SD=1, if the record is for South Denton (thus North Denton representing the base). After the creation of the dummy variables, the first 11 records should look as shown below. Obs Income1 1 2 3 4 5 6 7 8 9 10 11 36557 27045 38878 41448 33136 44308 31997 43437 41625 40140 31448 Income2 Famlsize 20610 25490 0 0 25300 24559 25419 0 22802 32158 0 4 5 8 2 6 2 4 6 4 5 4 Ownrent 0 0 0 1 1 1 1 1 1 0 0 Autodebt Hpayrent 15290 14676 10317 9504 17802 20537 11725 12084 14863 15708 6160 1339 1175 1108 729 875 1282 919 970 945 1470 985 ED W SD Groc D 0 1 0 278 0 1 0 220 0 1 0 456 0 1 0 253 0 1 0 344 0 0 0 326 1 0 0 308 0 0 1 311 0 1 0 305 0 0 1 432 1 0 0 259 4. Conduct the regression analysis with Groc as the dependent variable (Y) and Income1 (X1), Income2 (X2), Ownorent (X3), Autodebt (X4), Famlsize (X5), and Location variables (X6 through X8) as the independent variables at 1% level of significance. Use the Regression tool accessed from the Data Tab/ Data Analysis (in 2003 use Tools/Data Analysis) pull-down menu. Check only the labels box and specify that you want the output in cell M1. (Make sure that you enter “Confidence Level = 99%”). Also, make sure to check the Standardized residuals box to obtain outlier information. A partial output is shown below for your guidance. SUMMARY OUTPUT Regression Statistics Multiple R 0.779966 R Square xxxx Adjusted R Square 0.5739159 Standard Error xxxx Observations 100 ANOVA Regression Residual Total df 8 91 99 SS MS 760733.044 95091.6 xxxx xxxx xxxx 2 F xxxx Significance F xxxx Coefficients Intercept Income1 Income2 Ownrent Autodebt Famlsize ED WD SD Standard Error 59.847981 43.87639859 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx t Stat PLower 95% Upper Lower Upper 99.0% value 95% 99.0% 1.364 0.176 -27.307103 147.0031 -55.58863 175.284587 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx 4. By substituting appropriate values directly into the sample regression equation given in your output generate a point estimate for (Alternatively, you may use the TREND function from the pull down menu, and follow directions given in Excel or use the KPK macro to run regression and for prediction as shown in class): a) the monthly Grocery Bill for a family of 4 living in a rented home in South Denton, whose primary income is $42,457 and with the secondary wage earner having an income of $10000, and Autodebt is $6,000 b) the monthly Grocery Bill for a family of size 7 living in a home in the North Denton that is owned, whose primary income is $35,000 with the secondary wage earner having an income of $25,000, and Autodebt is $10,000. Details on the trend function can be found by clicking the fx icon on the toolbar then selecting statistical and trend. The new values for Income1, Income2, Ownorent, Autodebt, Famlsize, and Location variables can be placed in convenient cells in the two rows below your regression output and the point estimates for Groc evaluated in the adjacent cells. Use the output from the regression analysis and trend function to answer the following questions. A. Write one to three sentences to interpret the meaning of your model’s R-square value. B. Conduct an F test for the regression model containing the eight independent variables, at the 1% significance level. State the null and alternative hypotheses, the decision, reason for the decision, and a conclusion. C. Conduct a t-test, at the 1% significance level, for the usefulness of the dummy (indicator) variable Ownorent. State the null and alternative hypotheses, the decision, reason for the decision, and a conclusion. D. Conduct a t-test, at the 1% significance level, for the usefulness of the variable Income2. State the null and alternative hypotheses, the decision, reason for the decision, and a conclusion. 3 E. State which of the eight X (independent) variables are statistically significant and which ones are not, at the 1% significance level. F. Write one or two sentences on the interpretation of the coefficient of each of the six independent variables, making specific reference to the value of each coefficient given by your model. Experiential Exercise As you work this assignment think about the following questions. Then form a team of 3 to 5 and discuss each of the following. You can engage in this discussion by meeting or your group can use a Wiki to engage in an online discussion. Instructions for setting up a Wiki are provided on the excel assignment page of our course web site. 1. What wording tells you the alternative and null hypothesis? 2. What wording tells you the type of statistical test to perform? For example, is a z or t statistic appropriate? 3. What wording tells you that this is a one or two tail hypothesis test? 4. What were the steps you used to obtain the calculated value of the test statistic? 5. What were the steps you used to obtain the critical value of the test statistic? 6. How do you use the calculated and critical value to make a statistical decision about this test? 7. How do you obtain the p value for the test statistic? 8. How do you use the p value and the level of significance to make a statistical decision about this test? 9. How does the result of your test relate to the statistical significance of your findings? 10. What managerial implications can you conclude from the results of your test? To be ready for the “Excel Quiz 43” HLS Web Test you should prepare the following: 1. A printout of the regression analysis 2. The results of the prediction/estimation of Groc (computed directly or by using the trend function) in the two rows immediately below the regression output 3. Answers for questions A-F 4. Clearly labeled or highlighted parts of the output that pertain to each answer for ques. A-F. The Excel assignments are each graded via a short Excel Quiz in HLS Web Test that is open for about 48 hours as listed in the syllabus and in your HLS progress report. You are expected to use your output and written answers to complete the quiz. You are not required to turn in the output. The questions below are much like the quiz you will have in WEBTEST. If you can answer these, you should have no difficulty with those that will be asked. However, these questions are not the exact questions that you will have to answer. 4 SAMPLE WEBTEST QUIZ: The correct answers to the sample questions are highlighted. 1. What is the p-value of the test statistic for the global F test for the regression model? A. 0.444 B. 0.0000 C. 0.9650 D. 50 E. 2.0696 2. What is the calculated value of the test statistic to conduct the test for the usefulness of the Autodebt variable? A. 118.98 B. 0.9650 C. 0.234 D. -1.314 E. 0.0079 3. The best interpretation of R-square for this regression analysis is A. 60.8 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X1-X8 B. 77.80 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X1-X8 C. 57.4 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X2-X8 D. 72.84 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X2-X8 E. 95 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X1-X8 4. The result of the t-test for the usefulness of the Income2 (X2) variable, at = .05 is to A. F.T.R. Ho, since the p-value is greater than .05. Conclude X2 is not useful for predicting Y B. F.T.R. Ho, since the p-value is greater than .05. Conclude X2 is useful for predicting Y C. Reject Ho, since the p-value is less than .05. Conclude X1 is not useful for predicting Y D. F.T.R. Ho, since the p-value is less than .05. Conclude X1 is not useful for predicting Y E. Reject Ho, since the p-value is less than .05. Conclude X1 is useful for predicting Y 5. What is the coefficient for Income2 in the sample regression equation? A. 1. 20.10 B. -1.2504 C. 0.00104 D. 118.97 E. 0.0115 5