DSCI 3710 Excel Assignment #4

Food Lion Inc. would like to investigate the feasibility and future prospects of setting up stores in
Denton County. Food Lion has provided you with a sample from a database of household
financial variables. They would like you to use regression techniques to predict the monthly
expenditure on groceries of families who either rent or own their homes in Denton. The
sample contains 100 records. The various fields in the sample are:
Income1  Annual income of head of household or primary wage earner
Income2  Annual income of secondary wage earner
Famlsize  Size of family (number of people permanently residing in the household)
Ownorent  1 if household is owned; 0 if it is rented
Autodebt  Automobile related debt pending for wage earners in the household
Hpayrent  Household mortgage payment or rent per month
Groc  Monthly expenditure on groceries
Loc  1 = East Denton (E); 0 = West Denton(W); -1 = North Denton (N); 2 = South Denton
(S)
For this assignment you will be given minimal instructions since, having completed the first three
assignments, you should now be quite familiar with Excel functions and pull down menus.
1. The sample data are contained in a file named Assgt#4.xls to be found in the folder Excel
Assignments for All Sections. Download the files from the folder onto a disk.
2. Import the data into the first nine columns (A-I) of your spreadsheet. The first row contains
labels for the variables in the order outlined in the data description. The table below illustrates
how the first 10 data records should look, after this step is completed.
Obs  Income1  Income2  Famlsize  Ownorent  Autodebt  Hpayrent  Groc  Loc
  1    36557    20610         4         0     15290      1339   278    0
  2    27045    25490         5         0     14676      1175   220    0
  3    38878        0         8         0     10317      1108   456    0
  4    41448        0         2         1      9504       729   253    0
  5    33136    25300         6         1     17802       875   344    0
  6    44308    24559         2         1     20537      1282   326   -1
  7    31997    25419         4         1     11725       919   308    1
  8    43437        0         6         1     12084       970   311    2
  9    41625    22802         4         1     14863       945   305    0
 10    40140    32158         5         0     15708      1470   432    2
3. Insert three columns prior to the column labeled “Groc”. In these columns, create three logical
(dummy) variables to represent the four locations. You may use the IF function from the pull
down menu, to do this. Please follow the illustration below, where: ED=1, if record is for East
Denton; WD=1, if record is for West Denton; and SD=1, if the record is for South Denton (thus
North Denton representing the base). After the creation of the dummy variables, the first 11
records should look as shown below.
Obs  Income1  Income2  Famlsize  Ownrent  Autodebt  Hpayrent  ED  WD  SD  Groc
  1    36557    20610         4        0     15290      1339   0   1   0   278
  2    27045    25490         5        0     14676      1175   0   1   0   220
  3    38878        0         8        0     10317      1108   0   1   0   456
  4    41448        0         2        1      9504       729   0   1   0   253
  5    33136    25300         6        1     17802       875   0   1   0   344
  6    44308    24559         2        1     20537      1282   0   0   0   326
  7    31997    25419         4        1     11725       919   1   0   0   308
  8    43437        0         6        1     12084       970   0   0   1   311
  9    41625    22802         4        1     14863       945   0   1   0   305
 10    40140    32158         5        0     15708      1470   0   0   1   432
 11    31448        0         4        0      6160       985   1   0   0   259
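The dummy coding described in step 3 can be sketched outside Excel as follows. This is a minimal illustration in Python, not part of the assignment itself; the function name location_dummies and the constant names are made up for the example, and the Loc values are those of the first 10 records in the step 2 table.

```python
# Location codes as defined in the data description (Loc column).
EAST, WEST, NORTH, SOUTH = 1, 0, -1, 2


def location_dummies(loc):
    """Return (ED, WD, SD) for one Loc code.

    North Denton is the base category, so it maps to (0, 0, 0).
    """
    if loc not in (EAST, WEST, NORTH, SOUTH):
        raise ValueError(f"unexpected Loc code: {loc}")
    return (int(loc == EAST), int(loc == WEST), int(loc == SOUTH))


# Loc values of the first 10 records shown in the step 2 table.
loc_column = [0, 0, 0, 0, 0, -1, 1, 2, 0, 2]
dummies = [location_dummies(loc) for loc in loc_column]

for obs, (ed, wd, sd) in enumerate(dummies, start=1):
    print(obs, ed, wd, sd)
```

In Excel the same logic is one IF per new column. Assuming Loc ends up in column L once the three columns are inserted before Groc, the formulas for row 2 would be =IF(L2=1,1,0) for ED, =IF(L2=0,1,0) for WD, and =IF(L2=2,1,0) for SD.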
4. Conduct the regression analysis with Groc as the dependent variable (Y) and Income1 (X1),
Income2 (X2), Ownorent (X3), Autodebt (X4), Famlsize (X5), and the Location variables (X6 through
X8) as the independent variables, at the 1% level of significance. Use the Regression tool accessed
from the Data tab/Data Analysis pull-down menu (in Excel 2003, use Tools/Data Analysis). Check
only the Labels box and specify that you want the output in cell M1. (Make sure that you enter
“Confidence Level = 99%”.) Also, make sure to check the Standardized Residuals box to obtain
outlier information. A partial output is shown below for your guidance.
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.779966
R Square            xxxx
Adjusted R Square   0.5739159
Standard Error      xxxx
Observations        100

ANOVA
            df  SS          MS       F     Significance F
Regression   8  760733.044  95091.6  xxxx  xxxx
Residual    91  xxxx        xxxx
Total       99  xxxx

           Coefficients  Standard Error  t Stat  P-value  Lower 95%   Upper 95%  Lower 99.0%  Upper 99.0%
Intercept  59.847981     43.87639859     1.364   0.176    -27.307103  147.0031   -55.58863    175.284587
Income1    xxxx          xxxx            xxxx    xxxx     xxxx        xxxx       xxxx         xxxx
Income2    xxxx          xxxx            xxxx    xxxx     xxxx        xxxx       xxxx         xxxx
Ownrent    xxxx          xxxx            xxxx    xxxx     xxxx        xxxx       xxxx         xxxx
Autodebt   xxxx          xxxx            xxxx    xxxx     xxxx        xxxx       xxxx         xxxx
Famlsize   xxxx          xxxx            xxxx    xxxx     xxxx        xxxx       xxxx         xxxx
ED         xxxx          xxxx            xxxx    xxxx     xxxx        xxxx       xxxx         xxxx
WD         xxxx          xxxx            xxxx    xxxx     xxxx        xxxx       xxxx         xxxx
SD         xxxx          xxxx            xxxx    xxxx     xxxx        xxxx       xxxx         xxxx
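The quantities in the Regression Statistics and ANOVA blocks are tied together by standard identities, which gives a way to sanity-check your own output. The sketch below, in Python, starts from the three values printed in the partial output (Multiple R, the regression sum of squares, and the degrees of freedom) and derives the rest. Because Multiple R is printed to only six decimals, the derived numbers are approximations and will not match Excel digit for digit; the variable names are chosen for the example.

```python
import math

# Values printed in the partial output.
multiple_r = 0.779966
ss_regression = 760733.044
n = 100   # observations
k = 8     # independent variables (regression df)

# R Square is the square of Multiple R.
r_square = multiple_r ** 2

# Adjusted R Square penalizes for the number of predictors.
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# R Square = SS(Regression) / SS(Total), so SS(Total) can be backed out.
ss_total = ss_regression / r_square
ss_residual = ss_total - ss_regression

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)

# Global F statistic and the standard error of the estimate.
f_stat = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)

print(f"R Square           {r_square:.4f}")
print(f"Adjusted R Square  {adj_r_square:.4f}")
print(f"MS Regression      {ms_regression:.1f}")
print(f"F                  {f_stat:.2f}")
print(f"Standard Error     {standard_error:.2f}")
```

The computed Adjusted R Square and MS Regression should agree with the 0.5739159 and 95091.6 shown in the partial output, which confirms that the identities are being applied with the right degrees of freedom (8 and 91).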
5. By substituting appropriate values directly into the sample regression equation given in your
output, generate a point estimate for each of the cases below. (Alternatively, you may use the
TREND function from the pull-down menu and follow the directions given in Excel, or use the KPK
macro to run the regression and the prediction as shown in class.)
a) the monthly Grocery Bill for a family of 4 living in a rented home in South Denton,
whose primary income is $42,457, with the secondary wage earner having an income of
$10,000 and Autodebt of $6,000.
b) the monthly Grocery Bill for a family of 7 living in an owned home in North Denton,
whose primary income is $35,000, with the secondary wage earner having an
income of $25,000 and Autodebt of $10,000.
Details on the TREND function can be found by clicking the fx icon on the toolbar and then selecting
Statistical and TREND. The new values for Income1, Income2, Ownorent, Autodebt, Famlsize, and the
Location variables can be placed in convenient cells in the two rows below your regression output,
and the point estimates for Groc evaluated in the adjacent cells.
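Substituting into the sample regression equation amounts to multiplying each coefficient by the corresponding X value and adding the intercept. The Python sketch below shows how cases (a) and (b) translate into X values, including the dummy coding for location and ownership. Only the intercept comes from the partial output; the slope values are PLACEHOLDERS set to zero because the output masks them as xxxx, so the printed numbers are not answers. The names point_estimate, case_a, and case_b are made up for the example.

```python
# Predictor order follows the Coefficients table in the regression output.
PREDICTORS = ["Income1", "Income2", "Ownrent", "Autodebt",
              "Famlsize", "ED", "WD", "SD"]


def point_estimate(intercept, slopes, household):
    """Evaluate y-hat = b0 + b1*x1 + ... + b8*x8 for one household."""
    return intercept + sum(slopes[name] * household[name] for name in PREDICTORS)


# Case (a): family of 4, rented home (Ownrent = 0), South Denton (SD = 1).
case_a = {"Income1": 42457, "Income2": 10000, "Ownrent": 0, "Autodebt": 6000,
          "Famlsize": 4, "ED": 0, "WD": 0, "SD": 1}

# Case (b): family of 7, owned home (Ownrent = 1), North Denton (base: all dummies 0).
case_b = {"Income1": 35000, "Income2": 25000, "Ownrent": 1, "Autodebt": 10000,
          "Famlsize": 7, "ED": 0, "WD": 0, "SD": 0}

intercept = 59.847981  # from the partial output

# PLACEHOLDER slopes: replace each 0.0 with the coefficient from your own output.
slopes = {name: 0.0 for name in PREDICTORS}

print("case (a):", point_estimate(intercept, slopes, case_a))
print("case (b):", point_estimate(intercept, slopes, case_b))
```

Note that Hpayrent does not appear because it is not one of the independent variables in the model, and that North Denton needs no dummy of its own since it is the base location.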
Use the output from the regression analysis and trend function to answer the following
questions.
A. Write one to three sentences to interpret the meaning of your model’s R-square value.
B. Conduct an F test for the regression model containing the eight independent variables, at the
1% significance level. State the null and alternative hypotheses, the decision, reason for the
decision, and a conclusion.
C. Conduct a t-test, at the 1% significance level, for the usefulness of the dummy (indicator)
variable Ownorent. State the null and alternative hypotheses, the decision, reason for the decision,
and a conclusion.
D. Conduct a t-test, at the 1% significance level, for the usefulness of the variable Income2. State
the null and alternative hypotheses, the decision, reason for the decision, and a conclusion.
E. State which of the eight X (independent) variables are statistically significant and which ones are
not, at the 1% significance level.
F. Write one or two sentences on the interpretation of the coefficient of each of the six
independent variables, making specific reference to the value of each coefficient given by your
model.
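The mechanics behind the t-tests in questions C and D can be sketched with the one coefficient row that the partial output shows in full, the intercept. The rows for Ownorent and Income2 are masked, so this Python sketch is only an illustration of the arithmetic and the decision rule, not an answer to either question; the function name decision is made up for the example.

```python
# Intercept row of the partial output.
coef = 59.847981
std_error = 43.87639859
p_value = 0.176
lower_99, upper_99 = -55.58863, 175.284587
alpha = 0.01

# t Stat is the coefficient divided by its standard error.
t_stat = coef / std_error

# The 99% interval is coef +/- t_critical * std_error, so the two-tailed
# critical value (91 residual df) can be backed out from the interval.
t_critical = (upper_99 - coef) / std_error


def decision(p_value, alpha):
    """p-value rule: reject H0 (coefficient = 0) when p-value < alpha."""
    return "Reject H0" if p_value < alpha else "Fail to reject H0"


print(f"t Stat      {t_stat:.3f}")
print(f"t critical  {t_critical:.3f}")
print(decision(p_value, alpha))

# Equivalent checks: |t Stat| against the critical value, and whether
# the 99% confidence interval contains zero.
print(abs(t_stat) > t_critical, not (lower_99 <= 0 <= upper_99))
```

All three rules (p-value versus alpha, t Stat versus critical value, and the confidence interval containing zero) lead to the same decision, so any one of them can serve as the stated reason.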
Experiential Exercise
As you work this assignment think about the following questions. Then form a team of 3 to 5
and discuss each of the following. You can engage in this discussion by meeting or your group
can use a Wiki to engage in an online discussion. Instructions for setting up a Wiki are provided
on the excel assignment page of our course web site.
1. What wording tells you the alternative and null hypothesis?
2. What wording tells you the type of statistical test to perform? For example, is a z or t
statistic appropriate?
3. What wording tells you that this is a one or two tail hypothesis test?
4. What were the steps you used to obtain the calculated value of the test statistic?
5. What were the steps you used to obtain the critical value of the test statistic?
6. How do you use the calculated and critical value to make a statistical decision about this
test?
7. How do you obtain the p value for the test statistic?
8. How do you use the p value and the level of significance to make a statistical decision
about this test?
9. How does the result of your test relate to the statistical significance of your findings?
10. What managerial implications can you conclude from the results of your test?
To be ready for the “Excel Quiz 43” HLS Web Test you should prepare the following:
1. A printout of the regression analysis
2. The results of the prediction/estimation of Groc (computed directly or by using the trend
function) in the two rows immediately below the regression output
3. Answers for questions A-F
4. Clearly labeled or highlighted parts of the output that pertain to each answer for questions A-F.
The Excel assignments are each graded via a short Excel Quiz in HLS Web Test that is open for
about 48 hours as listed in the syllabus and in your HLS progress report. You are expected to use
your output and written answers to complete the quiz. You are not required to turn in the output.
The questions below are much like the quiz you will have in WEBTEST. If you can answer these,
you should have no difficulty with those that will be asked. However, these questions are not the
exact questions that you will have to answer.
SAMPLE WEBTEST QUIZ: The correct answers to the sample questions are highlighted.
1. What is the p-value of the test statistic for the global F test for the regression model?
A. 0.444
B. 0.0000
C. 0.9650
D. 50
E. 2.0696
2. What is the calculated value of the test statistic to conduct the test for the usefulness of the
Autodebt variable?
A. 118.98
B. 0.9650
C. 0.234
D. -1.314
E. 0.0079
3. The best interpretation of R-square for this regression analysis is
A. 60.8 percent of the total variation in a family's expenditure on groceries is explained by
regression on the variables X1-X8
B. 77.80 percent of the total variation in a family's expenditure on groceries is explained by
regression on the variables X1-X8
C. 57.4 percent of the total variation in a family's expenditure on groceries is explained by
regression on the variables X2-X8
D. 72.84 percent of the total variation in a family's expenditure on groceries is explained by
regression on the variables X2-X8
E. 95 percent of the total variation in a family's expenditure on groceries is explained by
regression on the variables X1-X8
4. The result of the t-test for the usefulness of the Income2 (X2) variable, at α = .05 is to
A. F.T.R. Ho, since the p-value is greater than .05. Conclude X2 is not useful for predicting
Y
B. F.T.R. Ho, since the p-value is greater than .05. Conclude X2 is useful for predicting Y
C. Reject Ho, since the p-value is less than .05. Conclude X1 is not useful for predicting Y
D. F.T.R. Ho, since the p-value is less than .05. Conclude X1 is not useful for predicting Y
E. Reject Ho, since the p-value is less than .05. Conclude X1 is useful for predicting Y
5. What is the coefficient for Income2 in the sample regression equation?
A. 1. 20.10
B. -1.2504
C. 0.00104
D. 118.97
E. 0.0115