Answers

Exam 1 answers
Data is available as listed below on several financial and demographic variables for households
sampled from a large city in the Western United States.
INCOME1    Principal wage earner income ($)
INCOME2    Secondary wage earner income ($)
FAMLSIZE   Family size
OWNORENT   1 = own home, 0 = rent home
TOTLDEBT   Total debt (excluding home mortgage) ($)
HPAYRENT   Home mortgage payment or rent ($)
LOCATION   Location of residence (NE=1, NW=2, SW=3, SE=4)
INCOME     Total family income (sum of INCOME1 and INCOME2)
For each of the following short scenarios, describe briefly and specifically how you would
analyze the data:
1. a. You wish to test whether population mean total debt is less than $20000. Set up appropriate
hypotheses. Also explain any assumption that you would need to make and how you would
check.
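H0: µ ≥ 20,000 vs. H1: µ < 20,000. The t-test assumes that total debt is normally distributed,
which you would check with a histogram or normal probability plot of the sample. If the
sample size is at least 30, the data can be non-normal and the results will still be reasonably accurate.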
b. You wish to test whether more than 60% of households own their home. Set up appropriate
hypotheses.
H0: π ≤ 0.6 vs. H1: π > 0.6
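As a rough illustration of the test in 1b, the sketch below computes the one-sample z statistic for a proportion. It assumes, purely for illustration, that the 38 owners and 23 renters counted in Appendix A make up the sample; the function name `proportion_z_test` is hypothetical and not part of the exam.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, pi0):
    """Upper-tailed z test of H0: pi <= pi0 vs. H1: pi > pi0."""
    p_hat = successes / n
    se = sqrt(pi0 * (1 - pi0) / n)      # standard error computed under H0
    z = (p_hat - pi0) / se
    p_value = 1 - NormalDist().cdf(z)   # area in the upper tail
    return p_hat, z, p_value


# Assumption: Appendix A counts (38 owners out of 61 households) are the sample.
p_hat, z, p_value = proportion_z_test(38, 61, 0.6)
print(round(p_hat, 3), round(z, 2), round(p_value, 2))
```

With these counts the sample proportion is only slightly above 0.60, so the z statistic is small and the p-value is well above .05.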
2. You feel that total family income of a household depends to a great degree on the location of
residence. State the specific hypotheses for this type of problem.
H0: Means are equal for the 4 locations vs H1: Not all means are equal
3. You wish to know how much family size influences total debt. Specifically you wish to be 95%
confident about how much total debt changes as family size increases. What would you use to address
this concern?
A 95% confidence interval for the population slope given as Lower and Upper 95% values.
4. In using family size to predict total debt, you have a direct relationship. You notice that prediction
errors were much larger for large families than for small families. What type of problem does this cause
and how would you confirm by using a graph that this type of problem really exists?
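This is a problem of non-constant variability (heteroscedasticity), which violates the regression
assumption of constant error variance and makes the standard errors and intervals unreliable.
Plot residuals against predicted values to check for constant variability; a fan shape that
widens as the predicted value increases confirms the problem.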
Refer to Appendix A. The analysis refers to the amount of total debt a household has
depending upon whether the head of household owns or rents the home.
5. Check the assumptions for this test by using Appendix A.
Normality is not OK for owners (skewness 1.38), but the sample size of 38 is larger than 30, which
makes up for that. The data look normal for renters (skewness -0.04). The F-test for equality of
variance (p-value .3696) allows us to accept the assumption of equal variability.
6. Given your check of assumptions what test should you use?
t-test pooled variance
7. Given the results of checking the assumptions, set up appropriate hypotheses.
Use a level of significance of .05. Write up a conclusion stating your confidence.
H0: µ1 = µ2 vs H1: µ1 ≠ µ2
We can be 98.58% confident of a difference between the population means (two-tailed p-value
.0142, which is less than .05) and so can conclude that they are different. Sample data show
owners have higher total debt ($18,620 vs. $15,509).
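The test statistics behind questions 5 to 7 can be reproduced from the summary statistics in Appendix A. The following is a minimal sketch using only the standard library; the variable names are illustrative, and the p-value is copied from the appendix rather than computed.

```python
from math import sqrt

# Summary statistics copied from Appendix A.
n_own, mean_own, sd_own = 38, 18620.29, 4356.16
n_rent, mean_rent, sd_rent = 23, 15509.09, 5133.12

# F ratio for equality of variances (larger sample variance on top).
f_ratio = max(sd_own, sd_rent) ** 2 / min(sd_own, sd_rent) ** 2

# Pooled-variance t statistic with df = n1 + n2 - 2.
df = n_own + n_rent - 2
pooled_var = ((n_own - 1) * sd_own**2 + (n_rent - 1) * sd_rent**2) / df
t_pooled = (mean_own - mean_rent) / sqrt(pooled_var * (1 / n_own + 1 / n_rent))

# Unequal-variance t statistic, shown for comparison only.
t_unequal = (mean_own - mean_rent) / sqrt(sd_own**2 / n_own + sd_rent**2 / n_rent)

# Confidence as stated in the answer: 1 minus the reported two-tailed p-value.
confidence = 1 - 0.0142

print(round(f_ratio, 2), round(t_pooled, 3), round(t_unequal, 3), round(confidence, 4))
```

The results should agree with the F, t and p-value entries in Appendix A up to rounding of the reported standard deviations.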
For this last set of questions, refer to Appendix B. The dependent variable is Total Debt ($),
and the independent variables are the income of the household ($) and a dummy variable for
whether the family owns (1) or rents (0).
8. Use the regression stats table to interpret clearly the meaning of the measures Adjusted R2 and
Std. Error.
The prediction equation using income and Own indicates a linear relationship that explains
79.1 percent of the variability in Total Debt after adjusting for the number of variables in the
equation. This relationship can predict Total Debt with a typical error of $2,218.27.
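Both measures in question 8 follow directly from the ANOVA table in Appendix B. The sketch below recomputes them under the usual definitions, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) and standard error = sqrt(SS residual / (n - k - 1)); the variable names are illustrative.

```python
from math import sqrt

# Values copied from the ANOVA table in Appendix B.
ss_regression = 550_181_195.9025
ss_residual = 132_859_807.5642
ss_total = 683_041_003.4667
n, k = 30, 2  # number of observations and number of predictors

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
std_error = sqrt(ss_residual / (n - k - 1))  # typical prediction error in dollars

print(round(r_squared, 3), round(adj_r_squared, 3), round(std_error, 3))
```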
9. Interpret the number 2,167.337 in the coefficients column
Total Debt is $2,167.337 higher on average for families that own their home than for those
who rent, holding income constant.
10. Interpret clearly the meaning of the 95% lower and upper limits across from the number .252.
For each additional dollar of income, holding home ownership constant, total debt increases
by $0.252 on average (or we could say a hundred extra dollars of income is associated with a
$25.20 increase in total debt). We can be 95% confident that this increase is somewhere between
$0.20 and $0.30 per dollar of income.
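The interval limits in question 10 can be checked as coefficient ± t × standard error. This is a minimal sketch; the critical value 2.052 is an assumption taken from a standard t table for 27 degrees of freedom, not something printed in Appendix B.

```python
# Values copied from the INCOME row of the regression output in Appendix B.
slope = 0.2520
slope_std_error = 0.0249

# Assumption: 2.052 is the tabled two-tailed 95% critical value of t with 27 df.
t_crit = 2.052

margin = t_crit * slope_std_error
lower, upper = slope - margin, slope + margin
print(round(lower, 3), round(upper, 3))

# The same limits scaled to a $100 change in income, as in the answer.
print(round(100 * lower, 2), round(100 * upper, 2))
```

Small differences from the appendix limits come from rounding of the coefficient and its standard error.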
11. Set up specific hypotheses to test for a relationship between total debt and income and write
up a sentence detailing what you conclude and how confident you can be. You may use a
significance level of .05. What specifically is meant by a relationship in this case?
H0: β1 = 0 (or Total debt and income are not related)
H1: β1 ≠ 0 (or Total Debt and income are related)
We can be almost 100% confident that a relationship exists and so we can conclude a
relationship at the 95% level. A relationship in this case means that increased income is
associated with increased total debt.
12. Although none of the Cook’s D values give any real concern the largest value was observation
23. Discuss why that might have the largest value by looking at leverage and studentized
residuals.
Cook’s D is high when high leverage and a large residual occur together. In this case
both obs 23 and 24 have large studentized residuals (2.103 and -2.121), but obs 23 has higher
leverage than obs 24 (0.146 vs. 0.061), so its Cook’s D is larger (0.251 vs. 0.098).
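The way leverage and residual size combine can be made concrete with the usual formula D = (r² / p) × h / (1 - h), where r is the studentized residual, h the leverage and p the number of estimated coefficients. The sketch below assumes the "Studentized Residual" column in Appendix B is the internally studentized residual; `cooks_d` is a hypothetical helper name.

```python
def cooks_d(studentized_residual, leverage, n_params):
    """Cook's D from an internally studentized residual and a leverage value."""
    return (studentized_residual**2 / n_params) * leverage / (1 - leverage)


n_params = 3  # intercept, INCOME, Own

# (studentized residual, leverage) pairs copied from Appendix B.
d23 = cooks_d(2.103, 0.146, n_params)
d24 = cooks_d(-2.121, 0.061, n_params)
print(round(d23, 3), round(d24, 3))
```

The two residuals are nearly the same size, so the difference in Cook's D comes almost entirely from the leverage term h / (1 - h).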
13. Interpret the meaning of the 95% Prediction Interval for when Income =$69,483 and the family
owns their home.
We can be 95% confident that when family income = $69,483 and the family owns their home
that total debt will fall between $14,453 and $23,778.
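The prediction interval is wider than the confidence interval for the mean because it adds the variability of an individual household. A sketch of both half-widths follows, using values from Appendix B; the critical value 2.052 is an assumption from a standard t table for 27 degrees of freedom, and the leverage is only given to three decimals, so the limits match the appendix approximately.

```python
from math import sqrt

# Values copied from Appendix B for INCOME = 69,483 and Own = 1.
predicted = 19115.790
std_error = 2218.273  # regression standard error
leverage = 0.049      # rounded to three decimals in the output

# Assumption: 2.052 is the tabled two-tailed 95% critical value of t with 27 df.
t_crit = 2.052

# Mean response uses sqrt(h); a single new household uses sqrt(1 + h).
ci_half = t_crit * std_error * sqrt(leverage)
pi_half = t_crit * std_error * sqrt(1 + leverage)

ci_lower, ci_upper = predicted - ci_half, predicted + ci_half
pi_lower, pi_upper = predicted - pi_half, predicted + pi_half
print(round(ci_lower), round(ci_upper))
print(round(pi_lower), round(pi_upper))
```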
14. Use the information given in Appendix B to address any potential problems with the assumptions
for using a simple linear regression. For each assumption: A. State the assumption B. How
did you check the assumption? (What chart did you use?) C. What do you conclude?
Constant variability: checked with the Residuals by Predicted plot. This assumption looks OK
since there is no evidence of increasing variability as the predicted value increases.
Linearity: checked with the Residuals by INCOME plot. Linearity looks OK since no rainbow or
smile patterns are seen.
Normality: checked with the Normal Probability Plot of Residuals (the last plot). Seeing no real
departure from a straight line, it looks like normality is OK.
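As a numeric companion to the normal probability plot, the sketch below pairs the sorted residuals from Appendix B with normal scores and computes their correlation. The plotting positions (i - 0.375) / (n + 0.25) are an assumption; the software that produced the appendix may use a different convention.

```python
from math import sqrt
from statistics import NormalDist, mean

# Residuals copied from Appendix B (observations 1 to 30).
residuals = [
    -1048.9, -628.3, 71.3, -1675.0, 2065.8, 3521.3, -3257.5, 1461.2,
    -2468.9, -2146.0, -1507.3, 1715.3, 1132.7, 1310.8, 658.1, 1367.8,
    -151.6, -76.3, 1547.4, 329.9, 3093.3, -1992.9, 4312.7, -4559.3,
    -3653.0, 11.3, -323.6, 2328.4, -1850.1, 411.4,
]

n = len(residuals)
ordered = sorted(residuals)

# Assumed plotting positions, converted to standard normal scores.
scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = correlation(scores, ordered)
print(round(r, 3))
```

A correlation close to 1 is consistent with points that follow a straight line on the plot; it supplements the visual check rather than replacing it.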
15. Do you need to check the assumption of independence for this problem? Why?
No. The data are not a time series, so the independence assumption is not a concern here.
Appendix A
Descriptive statistics

        count   mean        sample standard deviation   skewness   kurtosis
Own     38      18,620.29   4,356.16                     1.38      -0.90
Rent    23      15,509.09   5,133.12                    -0.04      -0.83

Hypothesis Test: Independent Groups (t-test, pooled variance)

Own          Rent
18,620.29    15,509.09    mean
 4,356.16     5,133.12    std. dev.
38           23           n

2.527    t
.0142    p-value (two-tailed)

F-test for equality of variance

26,348,878    variance: Rent
18,976,156    variance: Own
1.39          F
.3696         p-value

Hypothesis Test: Independent Groups (t-test, unequal variance)

Own          Rent
18,620.29    15,509.09    mean
 4,356.16     5,133.12    std. dev.
38           23           n

2.426    t
.0199    p-value (two-tailed)

Wilcoxon - Mann/Whitney Test

n     sum of ranks
38    1334            Own
23     557            Rent
61    1891            total

1178.000    expected value
  67.199    standard deviation
   2.321    z
   .0203    p-value (two-tailed)
Appendix B
Regression Analysis

R²            0.805       n           30
Adjusted R²   0.791       k           2
R             0.897       Dep. Var.   TOTLDEBT
Std. Error    2218.273

ANOVA table

Source       SS                  df   MS                  F       p-value
Regression   550,181,195.9025     2   275,090,597.9512    55.90   2.52E-10
Residual     132,859,807.5642    27     4,920,733.6135
Total        683,041,003.4667    29

Regression output                                          confidence interval

variables   coefficients   std. error   t (df=27)   p-value    95% lower   95% upper
Intercept      -559.7520
INCOME            0.2520       0.0249      10.108   1.13E-10      0.2008      0.3031
Own           2,167.3373     885.6143       2.447   .0212       350.2069  3,984.4678

                                                                   Studentized
                                                    Studentized    Deleted
Observation   TOTLDEBT   Predicted   Residual   Leverage   Residual   Residual   Cook's D
 1            19,795.0   20,843.9    -1,048.9   0.062      -0.488     -0.481     0.005
 2            19,539.0   20,167.3      -628.3   0.056      -0.291     -0.286     0.002
 3            14,045.0   13,973.7        71.3   0.083       0.034      0.033     0.000
 4            12,943.0   14,618.0    -1,675.0   0.073      -0.784     -0.778     0.016
 5            23,658.0   21,592.2     2,065.8   0.071       0.966      0.965     0.024
 6            27,532.0   24,010.7     3,521.3   0.115       1.688      1.751     0.124
 7            16,056.0   19,313.5    -3,257.5   0.141      -1.585     -1.633     0.138
 8            16,537.0   15,075.8     1,461.2   0.067       0.682      0.675     0.011
 9            20,321.0   22,789.9    -2,468.9   0.090      -1.167     -1.175     0.045
10            21,492.0   23,638.0    -2,146.0   0.245      -1.114     -1.119     0.134
11             8,416.0    9,923.3    -1,507.3   0.171      -0.746     -0.740     0.038
12            22,661.0   20,945.7     1,715.3   0.063       0.799      0.793     0.014
13            10,616.0    9,483.3     1,132.7   0.181       0.564      0.557     0.024
14            20,938.0   19,627.2     1,310.8   0.146       0.639      0.632     0.023
15            18,933.0   18,274.9       658.1   0.048       0.304      0.299     0.002
16            14,331.0   12,963.2     1,367.8   0.102       0.650      0.643     0.016
17            15,726.0   15,877.6      -151.6   0.058      -0.070     -0.069     0.000
18            22,097.0   22,173.3       -76.3   0.080      -0.036     -0.035     0.000
19            26,116.0   24,568.6     1,547.4   0.129       0.747      0.741     0.028
20            17,548.0   17,218.1       329.9   0.049       0.153      0.150     0.000
21            21,306.0   18,212.7     3,093.3   0.048       1.429      1.458     0.034
22            10,195.0   12,187.9    -1,992.9   0.119      -0.957     -0.955     0.041
23            15,572.0   11,259.3     4,312.7   0.146       2.103      2.257     0.251
24            10,988.0   15,547.3    -4,559.3   0.061      -2.121     -2.280     0.098
25            13,923.0   17,576.0    -3,653.0   0.048      -1.688     -1.751     0.048
26            19,876.0   19,864.7        11.3   0.053       0.005      0.005     0.000
27            10,594.0   10,917.6      -323.6   0.152      -0.158     -0.155     0.001
28            18,246.0   15,917.6     2,328.4   0.112       1.114      1.119     0.052
29            16,904.0   18,754.1    -1,850.1   0.133      -0.896     -0.892     0.041
30            13,480.0   13,068.6       411.4   0.099       0.195      0.192     0.001

Predicted values for: TOTLDEBT

                                95% Confidence Intervals   95% Prediction Intervals
INCOME   Own   Predicted        lower        upper         lower        upper         Leverage
69,483   1     19,115.790       18,104.146   20,127.434    14,453.199   23,778.381    0.049
42,329   1     12,273.573       10,718.389   13,828.756     7,463.695   17,083.450    0.117
APPENDIX C
[Figure: Residuals by Predicted. Residual (gridlines = std. error) plotted against Predicted.]
[Figure: Residuals by INCOME. Residual (gridlines = std. error) plotted against INCOME.]
[Figure: Normal Probability Plot of Residuals. Residual plotted against Normal Score.]