Class 26 Assignment answers

ASSIGNMENT 26
Regression
1. For assignment 14, we used summary statistics for the length of MLB games in 2002 and 2003 to test
the one-tailed alternative that mean length was smaller in 2003 (due to rule changes).
                    2002 game length   2003 game length
sample mean         172.1              165.9
s                   12.2               13.7
n                   61                 51

pooled variance     166.4991   =(60*12.2^2+50*13.7^2)/(60+50)
Numerator of t      6.2        =172.1 - 165.9
Denominator of t    2.4483     =166.5^.5*(1/61+1/51)^.5
t-stat              2.5324
one-tailed pvalue   0.0064     =t.dist.rt(2.53,110)
The difference in average minutes per game is statistically significant.
We reject H0.
As usual, Bo marches to the beat of a different drummer. Bo found the original data for the minutes of
each of the 112 games. Bo created two columns of data: minutes and D2003, where D2003 was 1 for a
game in 2003 and 0 for a game in 2002. Bo regressed minutes on D2003. Fill in each blank. (If it is not
possible to know the exact answer, simply write “NP” for not possible without the data.) [25 points]
Bo’s estimated intercept will be __172.1, the sample mean for 2002__
Bo’s estimated coefficient of D2003 will be __-6.2, the difference in sample means__
The standard error of Bo’s coefficient of D2003 will be __2.45, the denominator of the t-stat__
The t-stat associated with Bo’s coefficient of D2003 will be __-2.5324__
The p-value associated with Bo’s coefficient of D2003 will be __0.0064 for Bo’s one-tailed test, but
0.0128 (the two-tailed p-value) on the regression output__
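These answers can be checked from the summary statistics alone. The sketch below (Python, standard library only) relies on the standard result that regressing on a single 0/1 dummy reproduces the pooled two-sample t-test. The helper t_sf is a hypothetical stand-in for Excel's t.dist.rt, approximated by numerical integration, and the variable names are illustrative.

```python
import math

def t_sf(t, df, steps=2000):
    """Approximate upper-tail probability P(T > t) for Student's t (Simpson's rule)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    upper = 0.5 - total * h / 3          # 0.5 minus P(0 < T < |t|)
    return upper if t >= 0 else 1.0 - upper

# Summary statistics from the table in question 1
mean_2002, s_2002, n_2002 = 172.1, 12.2, 61
mean_2003, s_2003, n_2003 = 165.9, 13.7, 51

df = n_2002 + n_2003 - 2                                  # 110
pooled_var = ((n_2002 - 1) * s_2002**2 + (n_2003 - 1) * s_2003**2) / df
std_error = math.sqrt(pooled_var * (1 / n_2002 + 1 / n_2003))

intercept = mean_2002                 # mean of the D2003 = 0 group
slope = mean_2003 - mean_2002         # coefficient of D2003
t_stat = slope / std_error
p_one_tailed = t_sf(abs(t_stat), df)  # Bo's one-tailed test
p_two_tailed = 2 * p_one_tailed       # what regression output reports

print(intercept, round(slope, 1), round(std_error, 4), round(t_stat, 4))
print(round(p_one_tailed, 4), round(p_two_tailed, 4))
```

Because s is rounded to one decimal in the table, a regression on the raw data could differ from these figures in the later decimal places.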
2. (EMBS Problem 42, page 448. Assigned as one of Class 19 practice problems).
A study claimed that self-employed individuals do not experience greater job satisfaction than
individuals who are not self-employed. Job satisfaction was measured using 18 questions with answers
ranging from 1 to 5. The total score was the measure of job satisfaction. Scores for individuals in four
separate professions are given below and in the spreadsheet “Class 26 Assignment data”.
Lawyer   Physical Therapist   Cabinetmaker   Systems Analyst
44       55                   54             44
42       78                   65             73
74       80                   79             71
42       86                   69             60
53       60                   79             64
50       59                   64             66
45       62                   59             41
48       52                   78             55
64       55                   84             76
38       50                   60             62
Are the differences in sample mean job satisfaction scores across the four professions statistically
significant? PLEASE USE REGRESSION TO ANSWER THIS QUESTION. Show the relevant regression output
to demonstrate that you used regression to do the required calculations. [25 points]
Stacking the data, creating 3 columns of dummy variables (Dlawyer, DPT, and
Dcabinet, with Systems Analyst as the omitted base group), and regressing the
satisfaction scores on the three dummies produced the following output:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.537136224
R Square             0.288515323
Adjusted R Square    0.229224933
Standard Error       11.52605744
Observations         40

ANOVA
             df    SS       MS            F             Significance F
Regression   3     1939.4   646.4666667   4.866139757   0.006080513
Residual     36    4782.6   132.85
Total        39    6722

             Coefficients   Standard Error   t Stat         P-value
Intercept    61.2           3.644859394      16.79077116    1.30888E-18
Dlawyer      -11.2          5.154609588      -2.172812472   0.036455868
DPT          2.5            5.154609588      0.485002784    0.63061271
Dcabinet     7.9            5.154609588      1.532608797    0.134114132
The null hypothesis is that mean satisfaction is equal for the four groups. This is
the same as saying that satisfaction and profession are independent, which is the
same as saying that all three b’s in the above model are equal to zero. The
alternative is that the four means are not all equal, the variables are not
independent, and the three b’s are not all equal to zero. The p-value (Significance F)
is 0.006. We reject H0 in favor of Ha.
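The coefficients, sums of squares, and F statistic can be reproduced from the scores listed in the question. The sketch below uses only the Python standard library. The ols helper is a hypothetical, minimal normal-equations solver written for this illustration, not the routine Excel uses, and treating Systems Analyst as the base group is inferred from the intercept of 61.2.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

scores = {
    "lawyer":  [44, 42, 74, 42, 53, 50, 45, 48, 64, 38],
    "pt":      [55, 78, 80, 86, 60, 59, 62, 52, 55, 50],
    "cabinet": [54, 65, 79, 69, 79, 64, 59, 78, 84, 60],
    "analyst": [44, 73, 71, 60, 64, 66, 41, 55, 76, 62],
}
dummies = ["lawyer", "pt", "cabinet"]  # Systems Analyst is the omitted base group

# Stack the data: one row per person, intercept column plus three dummies
X, y = [], []
for group, values in scores.items():
    for v in values:
        X.append([1.0] + [1.0 if group == d else 0.0 for d in dummies])
        y.append(float(v))

b = ols(X, y)
fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
y_bar = sum(y) / len(y)

ss_reg = sum((f - y_bar) ** 2 for f in fitted)
ss_res = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
df_reg, df_res = len(dummies), len(y) - len(b)   # 3 and 36
f_stat = (ss_reg / df_reg) / (ss_res / df_res)

print([round(c, 4) for c in b])
print(round(ss_reg, 1), round(ss_res, 1), round(f_stat, 4))
```

The p-value for this F with 3 and 36 degrees of freedom is the Significance F in the output; in Excel it could be obtained with F.DIST.RT.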
3. (EMBS case problem 1, page 606)
Consumer Research, Inc. investigated consumer characteristics that predict the amount charged by
credit card users. Data were collected on annual income, household size, and annual credit card charges
for a random sample of 50 consumers. The complete data are available in a spreadsheet “Class 26
Assignment data”.
Income ($1000s)   Household Size   Amount Charged ($)
54                3                4,016
30                2                3,159
32                4                5,100
.                 .                .
.                 .                .
22                4                3,074
46                5                4,820
66                4                5,149
3a. Which is the better predictor of Amount Charged: Income or Household Size? Why? (Assume you
can use one or the other but not both.) [10 points]
The regression of Amount on Income has a standard error of 731.7. The
regression of Amount on Household size has a standard error of 620.8. Thus,
household size is the better predictor. (Note, household size will also have the
higher adjusted r-square and the lower p-value. When comparing simple models fit
to the same data, these three statistics tell an identical story.)
We can also answer this question using the multiple regression results from 3c.
Household size has the lower p-value (higher magnitude t-value), and thus is the
better predictor variable.
3b. Regress Amount Charged on Household Size. Report and interpret BRIEFLY the coefficient of
household size. (Tell us what the coefficient means and measures. Tell us why it turned out
positive/negative.) [10 points]
                 Coefficients   Standard Error   t Stat        P-value
Intercept        2581.941       195.26258        13.2229177    1.28E-17
Household Size   404.1284       50.99787         7.92441641    2.86E-10
The coefficient of 404.13 is the average rate at which amount charged changes
with size of household. In the data, larger households charged more, on average,
at a rate of $404 for each extra person.
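As a small illustration of what the slope measures, the sketch below plugs a few household sizes into the fitted 3b equation. The function name predicted_charge is hypothetical; the two coefficients are the ones reported in the output above.

```python
INTERCEPT = 2581.941   # 3b intercept ($)
SLOPE = 404.1284       # 3b coefficient of household size ($ per person)

def predicted_charge(household_size):
    """Point forecast of annual charges ($) from the 3b simple regression."""
    return INTERCEPT + SLOPE * household_size

for size in (2, 3, 4):
    print(size, round(predicted_charge(size), 2))

# Each extra person raises the forecast by exactly the slope
step = predicted_charge(3) - predicted_charge(2)
```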
3c. Run a multiple regression of Amount Charged on both Household Size and Income. Report and
interpret BRIEFLY the coefficient of household size. (Tell us what the coefficient means and measures.
Tell us why it turned out higher/lower than the coefficient from 3b. ) [15 points]
                  Coefficients   Standard Error   t Stat        P-value
Intercept         1304.905       197.65484        6.60193679    3.29E-08
Income ($1000s)   33.13301       3.9679058        8.35025085    7.68E-11
Household Size    356.2959       33.20089         10.7315164    3.12E-14
The coefficient of 356.3 is the average rate at which amount charged changes with
size of household for households with comparable incomes. In order to use this
multiple regression equation, one must know both income and household size.
Once one “plugs in” income, the forecast goes up by $356.3 for each extra person.
The fact that this rate is lower than that in 3b indicates that household size and
income are positively correlated in the data. Amount charged goes up by $356.3
per person for any given income, and income also goes up with household size.
Thus the $404 from 3b includes both the $356 from 3c AND the fact that larger
households indicate larger incomes which, in turn, led to even higher charge
amounts.
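The gap between the two household-size coefficients can be made concrete. For least-squares fits on the same data, the simple-regression slope equals the multiple-regression slope plus the income coefficient times the slope from regressing income on household size. The raw data are not reproduced here, so the sketch below backs that last slope out of the reported coefficients rather than estimating it; the variable names are illustrative.

```python
b_size_simple = 404.1284    # 3b: Amount on Household Size alone
b_size_multi = 356.2959     # 3c: Household Size, holding Income fixed
b_income_multi = 33.13301   # 3c: Income ($1000s), holding Household Size fixed

# Implied slope of income ($1000s) on household size, from
# b_size_simple = b_size_multi + b_income_multi * income_per_person
income_per_person = (b_size_simple - b_size_multi) / b_income_multi

# Part of the $404 that works through income rather than size itself
indirect_effect = b_income_multi * income_per_person

print(round(income_per_person, 4), round(indirect_effect, 2))
```

On these figures, each extra person goes with roughly $1,400 more annual income in the sample, which accounts for the roughly $48 difference between the two coefficients.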
3d. Will a household of 2 with annual income of $60,000 charge more or less than $4,500 on their credit
card? (Assume the household in question will be a random sample of the population of consumers used
in the study.) [15 points]
Since both income and size are significant predictors (they both have p-values
much less than 0.05), I will use the 3c model to answer this question. The point
forecast is
                  Coefficients   Value plugged in
Intercept         1304.905       1
Income ($1000s)   33.13301       60
Household Size    356.2959       2
Point forecast    4005.477
The uncertainty in this new charge amount is measured by 398.1, the reported
“standard error” from the multiple regression.
Using the “better” method, the probability that the charge amount is less than $4,500 is
calculated using the t-distribution with 47 degrees of freedom (n-2 for a simple regression,
n-p for a multiple regression, where p is the number of terms in the model, including
the intercept; here 50-3 = 47). The t is (4,500 - 4,005.5)/398.1 = 1.24. The Pr(charge amount <
$4,500) = 1 - t.dist.rt(1.24,47) = 0.89. The household will probably charge less than $4,500.
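The forecast and probability can be checked with the sketch below (Python, standard library only). It follows the method used in this answer, taking the regression's standard error of 398.1 as the uncertainty of the new observation. The helper t_sf is a hypothetical stand-in for Excel's t.dist.rt, approximated by numerical integration.

```python
import math

def t_sf(t, df, steps=2000):
    """Approximate upper-tail probability P(T > t) for Student's t (Simpson's rule)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    upper = 0.5 - total * h / 3          # 0.5 minus P(0 < T < |t|)
    return upper if t >= 0 else 1.0 - upper

# Coefficients from the 3c multiple regression
b_intercept, b_income, b_size = 1304.905, 33.13301, 356.2959
std_error = 398.1          # standard error of the 3c regression
n, p = 50, 3               # observations; terms in the model, including the intercept

income, size, threshold = 60, 2, 4500   # income in $1000s
forecast = b_intercept + b_income * income + b_size * size
t_value = (threshold - forecast) / std_error
prob_below = 1 - t_sf(t_value, n - p)   # mirrors 1 - t.dist.rt(t, 47)

print(round(forecast, 3), round(t_value, 2), round(prob_below, 2))
```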