Homework 12 Key

Homework 12
1. Amos wants to test if increasing the volume on his speakers will decrease a computer’s bit rate.
He has pieces of the ANOVA table from his regression. (Assume the assumptions for regression
have been met.) Fill in the six missing values in the table.
Source   df   SS         MS       F       P-value
Model    1    5.8644     5.8644   3.240   0.075
Error    96   173.76     1.81
Total    97   179.6244
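As a check on the table, here is a minimal Python sketch (not part of the original key) that rebuilds the missing entries from SST = SSR + SSE, MS = SS/df, and F = MSR/MSE. Which entries were originally given is an assumption, and `regression_anova` is a hypothetical helper name. The p-value is not recomputed because that needs an F-distribution CDF.

```python
def regression_anova(df_model, df_error, ss_model, ss_total):
    """Fill in a one-predictor regression ANOVA table (hypothetical helper).

    Assumes df_model, df_error, SS_model, and SS_total are the known entries.
    """
    ss_error = ss_total - ss_model   # SST = SSR + SSE
    df_total = df_model + df_error   # degrees of freedom add the same way
    ms_model = ss_model / df_model   # each mean square is SS / df
    ms_error = ss_error / df_error
    return {
        "SS_error": ss_error,
        "df_total": df_total,
        "MS_model": ms_model,
        "MS_error": ms_error,
        "F": ms_model / ms_error,    # F = MSR / MSE
    }


table = regression_anova(df_model=1, df_error=96, ss_model=5.8644, ss_total=179.6244)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```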
2. Ross thinks the four people who sit next to him in class spend the same amount of money on their
haircuts. To find out, he keeps track of how much their haircuts cost for the next couple of months.
Assuming the cost of a haircut is random and that the sample sizes are large enough to assume
normality, and that the variances are the same for all four people, test if the average cost is equal
for all four people (the tip is included in the price).
Use all 7 steps of the hypothesis procedure and use α=0.05.
Name       Number of haircuts   Average cost   Standard deviation for one haircut
Tomson     33                   18             11
Lawrence   35                   11             10
Julien     41                   13             8
Ty         32                   14             12

ANOVA
Source     df   SS        MS   F   P-value
Group                              Missing
Residual
Total           15178.6
For your information, the pooled standard deviation is 10.215. The table below shows values from the F
distribution (in other words, the critical F with 0.05 in the right tail; it can be used instead of a p-value in step 5 to decide whether to reject). The first subscript is the degrees of freedom for
groups, the second is the degrees of freedom for error, and the third is alpha (0.05).
F1,1,.05= 161.4
F1,31,.05= 4.16
F1,137,.05= 3.91
F2,1,.05= 199.5
F2,31,.05= 3.305
F2,137,.05= 3.062
F3,1,.05= 215.7
F3,31,.05= 2.911
F3,137,.05= 2.671
F4,1,.05= 224.6
F4,31,.05= 2.679
F4,137,.05= 2.438
H0: µtomson= µlawrence= µjulien= µty
HA: Not all the means are the same
(note: µtomson ≠ µlawrence ≠ µjulien ≠ µty would not be an acceptable alternative hypothesis, since, for example,
it’s possible that only one of the means is different)
α=0.05
G=4
N = 141
To get the SSE:
Method 1:
The MSE is the pooled variance: 10.215² = 104.35
The MSE is found by dividing the SSE by df = N - G = 141 - 4 = 137
So SSE = 104.35 * 137 = 14296 (there may be a little rounding error)
Method 2:
$$SSE = (33 - 1)11^2 + (35 - 1)10^2 + (41 - 1)8^2 + (32 - 1)12^2 = 14296$$
To get the SSG:
Method 1:
Since the SSE is 14296, subtract it from the SST of 15178.6 to get 882.6
Method 2:
First find the overall average:
$$\bar{x} = \frac{33(18) + 35(11) + 41(13) + 32(14)}{33 + 35 + 41 + 32} = 13.9$$
$$SSG = 33(18 - 13.9)^2 + 35(11 - 13.9)^2 + 41(13 - 13.9)^2 + 32(14 - 13.9)^2 = 882.6$$
ANOVA
Source     df    SS        MS       F      P-value
Group      3     882.6     294.2    2.82   Missing
Residual   137   14296     104.35
Total      140   15178.6
We don’t have a p-value, but we do have a critical F value. With (3, 137) degrees of freedom,
the critical F we need is 2.671.
Since 2.82 is greater than 2.671, the test statistic is further out in the tail. That means we are in the rejection
region, so we reject the null hypothesis.
Conclude that Ross’ friends do not all spend the same average amount on their haircuts.
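The SSE/SSG arithmetic above can be sketched in Python directly from the group summaries. This is an illustration, not the course's required method: `anova_from_summary` is a hypothetical helper, and the critical value 2.671 is copied from the F table given in the problem (no p-value is computed).

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA from group sizes, means, and SDs (hypothetical helper)."""
    N, G = sum(ns), len(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / N           # weighted overall average
    ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))  # between groups
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))             # pooled within groups
    msg, mse = ssg / (G - 1), sse / (N - G)
    return {"grand_mean": grand_mean, "SSG": ssg, "SSE": sse, "SST": ssg + sse,
            "df": (G - 1, N - G), "MSG": msg, "MSE": mse, "F": msg / mse}


# Tomson, Lawrence, Julien, Ty
result = anova_from_summary(ns=[33, 35, 41, 32], means=[18, 11, 13, 14], sds=[11, 10, 8, 12])
F_CRIT = 2.671  # F(3, 137) critical value at alpha = 0.05, from the table in the problem

print(f"SSG = {result['SSG']:.1f}, SSE = {result['SSE']:.0f}, F = {result['F']:.2f}")
print("Reject H0" if result["F"] > F_CRIT else "Fail to reject H0")
```

Computing the SSE from the (n - 1)s² sum avoids the small rounding error that comes from squaring the rounded pooled standard deviation.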
3. Ablative spray paint is used on government buildings because it helps make the walls stronger to
resist terrorist attacks. It has been suggested in recent literature that it may be a fire hazard. To
test these claims, civil engineers are going to select 1000 walls and randomly choose some to be
covered with ablative spray paint. They will throw grenades at each wall and record whether the
wall catches fire.
Assume you are asked to select an alpha other than 0.05. Choose your alpha and explain why.
H0: The proportion that catch fire with the paint = proportion that catch fire without the paint
HA: The proportion that catch fire with the paint is greater (the paint is a fire hazard)
Type 1 error: The paint is not a fire hazard but we claim it is
Type 2 error: The paint is a fire hazard and we say it is safe
Students who choose a low alpha should mention the need to protect government buildings.
Students who choose a high alpha should mention the danger of creating a fire hazard.
4. Kyle surveys 500 people and records their gender and if they are a vegetarian.
His data is shown below. Test whether gender is related to being a vegetarian.
                   Male   Female   Total
Vegetarian         22     28       50
Not a vegetarian   218    232      450
Total              240    260      500
H0: Gender is not related to whether you are a vegetarian
HA: Gender is related to whether you are a vegetarian
α = 0.05
Method 1:
$$\hat{p}_p = \frac{22 + 28}{240 + 260} = 0.1$$
$$Z = \frac{\frac{22}{240} - \frac{28}{260}}{\sqrt{\frac{0.1(1 - 0.1)}{240} + \frac{0.1(1 - 0.1)}{260}}} = -0.60$$
p-value = 0.2743 * 2 = 0.5486
Method 2:
$$\hat{p}_p = \frac{22 + 218}{50 + 450} = 0.48$$
$$Z = \frac{\frac{22}{50} - \frac{218}{450}}{\sqrt{\frac{0.48(1 - 0.48)}{50} + \frac{0.48(1 - 0.48)}{450}}} = -0.60$$
p-value = 0.2743 * 2 = 0.5486
Method 3:
Expected counts (E):
                   Male   Female   Total
Vegetarian         24     26       50
Not a vegetarian   216    234      450
Total              240    260      500
Chi-square contributions, (O - E)²/E:
                   Male     Female
Vegetarian         0.1667   0.1538
Not a vegetarian   0.0185   0.0171
Sum: 0.3561
χ²₁ = 0.3561
p-value > 0.25 (using a computer, the p-value is about 0.551)
p-value > α
Fail to Reject
There is not enough evidence to conclude that being a vegetarian is related to a person’s
gender.
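Here is a small Python sketch of the three methods, assuming only the standard library. The helper names are hypothetical. It illustrates that both z statistics are the same and that z² equals the chi-square statistic for a 2x2 table, so the two-sided z p-value is also the chi-square (1 df) p-value.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic (hypothetical helper)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


def chi_square(observed):
    """Pearson chi-square statistic for a table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


observed = [[22, 28], [218, 232]]        # rows: vegetarian, not; columns: male, female
z1 = two_prop_z(22, 240, 28, 260)        # Method 1: vegetarians among males vs. females
z2 = two_prop_z(22, 50, 218, 450)        # Method 2: males among vegetarians vs. non-vegetarians
chi2 = chi_square(observed)              # Method 3
p_value = 2 * (1 - normal_cdf(abs(z1)))  # two-sided z p-value = chi-square(1 df) p-value

print(f"z = {z1:.4f}, chi2 = {chi2:.4f}, p = {p_value:.4f}")
```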
5. The TAMU admissions board believes the score you get on the SAT in high school can help
predict your college GPA. Below is a regression model using the SAT scores and GPA for
100 college graduates. Calculate a 98% Confidence Interval for the slope of the
regression line.
(Note: The SAT scores have been divided
by 100 just to make the numbers nicer.)
Simple linear regression results:
Dependent Variable: GPA
Independent Variable: SAT/100
GPA = 1.3186783 + 0.11072124 [SAT/100]
Sample size: 100
R (correlation coefficient) = 0.7751
R-sq = 0.6007698
Estimate of error standard deviation: 0.1911169
Parameter estimates:
Parameter   Estimate     Std. Err.     DF   T-Stat      P-Value
Intercept   1.3186783    0.16711824    98   7.8906903   <0.0001
Slope       0.11072124   .091174900    98   1.2143823   0.1244
t80 = 2.374, 0.1107 ± 2.374 (0.091174) = (-0.10574, 0.32714)
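The interval above is just estimate ± t* · SE. A minimal Python sketch, where `slope_ci` is a hypothetical helper and t* = 2.374 is taken from the key (the t table value with 80 df and 0.01 in each tail of a 98% interval), not computed here:

```python
def slope_ci(b1, se_b1, t_crit):
    """Confidence interval b1 +/- t* * SE(b1) for a regression slope (hypothetical helper)."""
    margin = t_crit * se_b1
    return b1 - margin, b1 + margin


# Slope and standard error from the output; t* = 2.374 as used in the key.
lower, upper = slope_ci(b1=0.1107, se_b1=0.091174, t_crit=2.374)
print(f"98% CI for the slope: ({lower:.5f}, {upper:.5f})")
```

Since the interval contains 0, these numbers alone do not rule out a slope of zero at the 98% level.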
6. Santa wants to use regression to learn how sleigh weight affects reindeer speed. He uses 20
different weights and measures the reindeer speed at each weight. The output from his
regression is below. Santa also calculated the SSE, SSR, and SST by hand, using n = 20.
SUMMARY OUTPUT
Regression Statistics
Multiple R                             0.759706825
R Square                               0.577154460
Adjusted R Square                      0.553663041
Standard Error                         6.350124954
Average Speed (in kilometers/second)   34.75
Observations                           20

                       Coefficients   Std Error   t Stat     P-value
Intercept              48.76533       3.164064    15.41225   8.18E-12
Weight (in kilotons)   -0.94029       0.189702    -4.95669   0.000102
Then Santa’s statistical elf pointed out that Santa forgot to include the last data point when he did the
SSR, SSE, and SST by hand (although he did remember to use n = 20). What would the correct values of
SSR, SSE, and SST be if the forgotten data point was a speed of 41 kilometers/second at 19 kilotons?
First, the notation:
yi = 41
ybar = 34.75
yhat = 48.76533 - 0.94029*19 = 30.8998
SSR = sum[ (yhat - ybar)² ] = 975.89 + (30.8998 - 34.75)² = 990.71
SSE = sum[ (yi - yhat)² ] = 623.82 + (41 - 30.8998)² = 725.83
SST = sum[ (yi - ybar)² ] = 1677.48 + (41 - 34.75)² = 1716.55
An alternative way:
SST = sum[ (yi - ybar)² ] = 1677.48 + (41 - 34.75)² = 1716.55
R² = SSR/SST, so 0.57715 = SSR/1716.55 and SSR = 990.71
SSR + SSE = SST, so SSE = 1716.55 - 990.71 = 725.83
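The first method amounts to adding one observation's terms to each running sum. A minimal Python sketch, where `add_point` is a hypothetical helper and the partial sums 975.89, 623.82, and 1677.48 are the ones used in the key:

```python
def add_point(ssr, sse, sst, y, y_hat, y_bar):
    """Add one observation's terms to running sums of squares (hypothetical helper)."""
    return (ssr + (y_hat - y_bar) ** 2,   # SSR term: fitted value vs. mean
            sse + (y - y_hat) ** 2,       # SSE term: residual
            sst + (y - y_bar) ** 2)       # SST term: observation vs. mean


b0, b1 = 48.76533, -0.94029   # intercept and slope from the output
y_bar = 34.75                 # average speed from the output
y_hat = b0 + b1 * 19          # fitted speed at 19 kilotons

ssr, sse, sst = add_point(975.89, 623.82, 1677.48, y=41, y_hat=y_hat, y_bar=y_bar)
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}, R^2 = {ssr / sst:.4f}")
```

As a cross-check, the corrected SSE divided by n - 2 = 18 reproduces the squared Standard Error in the output (6.350² ≈ 40.32).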
7. Data was collected for the average number of traffic citations per month given by Officer Smith
and Officer Jones of the Highway Patrol. The last five months were looked at for both officers.
Officer Smith had an average of 94 tickets with a standard deviation of 8.7 tickets. Officer Jones
had an average of 98.6 tickets with a standard deviation of 6.4 tickets. The standard deviation of
the differences in each month was 1.52 tickets. Using the .05 significance level, test the claim that
there is a mean difference in the number of citations given by the two officers.
Cannot do
8. Monica believes that the number of musicals a person sees depends on their gender. Houston says it
depends on their age. To find out what really matters, they randomly sample 80 old women, 90 young women,
75 old men, and 36 young men. Assume the standard deviation for each group should be 2.10,
and the overall average number of musicals is 7.82. Calculate the test statistic given that, over all
281 data points,
$$\sum_{i=1}^{281} (x_i - 7.82)^2 = 12247$$
This is categorical by numerical, so this is ANOVA, and the test statistic is F.
The standard deviation is 2.1, so the MSE = 2.1² = 4.41.
The sum of the squared distances from each data point to the overall average is the SST = 12247.
Source   DF    SS         MS        F
Group    3     11025.43   3675.14   833.3
Error    277   1221.57    4.41
Total    280   12247
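The table can be built from just SST, the common standard deviation, and the group sizes. A minimal Python sketch, with `anova_from_sst` as a hypothetical helper:

```python
def anova_from_sst(sst, pooled_sd, group_sizes):
    """Build a one-way ANOVA table from SST and a known common SD (hypothetical helper)."""
    N, G = sum(group_sizes), len(group_sizes)
    mse = pooled_sd ** 2         # MSE is the pooled variance
    sse = mse * (N - G)          # SSE = MSE * error df
    ssg = sst - sse              # SSG = SST - SSE
    msg = ssg / (G - 1)
    return {"df": (G - 1, N - G, N - 1), "SSG": ssg, "SSE": sse,
            "MSG": msg, "MSE": mse, "F": msg / mse}


# old women, young women, old men, young men
table = anova_from_sst(sst=12247, pooled_sd=2.10, group_sizes=[80, 90, 75, 36])
print(table)
```

Unrounded, F comes out near 833.37; the key reports it truncated to 833.3.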
9. Santa wants to know what type of food will make his reindeer calves gain weight. He tries four
different types of food and gets an overall average of 24.93 with a pooled standard deviation of
5.17. Note the p-value was nearly zero. Santa decides to keep feeding his reindeer hay.
Food        Number of calves   Standard deviation (for each calf)   Mean growth weight (in pounds)
Hay         4                  4.6                                  41.228
Dog Food    4                  5.2                                  34.281
Cat Food    6                  4.8                                  41.228
Fish Food   6                  5.8                                  -8.471
Based on the data, show what Santa’s hypothesis test should have looked like.
H0: Food type is independent of growth weight
Ha: Growth weight depends on food type
Alpha = 0.05
SSG = 4*(41.228-24.93)^2+4*(34.281-24.93)^2+6*(41.228-24.93)^2+6*(-8.471-24.93)^2 = 9699.77
SSE = sum[ (n-1) s² ] = (4-1)*4.6² + (4-1)*5.2² + (6-1)*4.8² + (6-1)*5.8² = 428
Or
5.17² = SSE/(20 - 4), so SSE = 5.17² * (20 - 4) = 428
ANOVA
Source     df   SS         MS        F        P-value
Group      3    9699.77    3233.26   120.86   0
Residual   16   428        26.75
Total      19   10127.77
Reject
Conclude that the type of food really does matter (duh: fish food? Really, Santa?).
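The SSG and SSE computations can be checked with a short Python sketch built from the summary table. It also shows that the overall average 24.93 is the sample-size-weighted mean of the group means, and that the pooled-SD route to the SSE agrees up to rounding in 5.17.

```python
ns = [4, 4, 6, 6]                          # hay, dog food, cat food, fish food
means = [41.228, 34.281, 41.228, -8.471]
sds = [4.6, 5.2, 4.8, 5.8]
N, G = sum(ns), len(ns)

grand_mean = sum(n * m for n, m in zip(ns, means)) / N       # weighted average, about 24.93
ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))          # sum of (n - 1) s^2
sse_from_pooled = 5.17 ** 2 * (N - G)                         # pooled-SD route, slightly rounded
f_stat = (ssg / (G - 1)) / (sse / (N - G))

print(f"grand mean = {grand_mean:.2f}, SSG = {ssg:.2f}, SSE = {sse:.2f}, F = {f_stat:.2f}")
```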
10. You plan to fly from New York to Chicago and have a choice of two flights. You are able to find out
how many minutes late each flight was for a random sample of 25 days over the past few years.
(You have data for BOTH flights on the same 25 days.) For Scairline the average delay is 31
minutes with a standard deviation of 12 minutes. For PilotAirOr the average delay is 48 minutes
with a standard deviation of 20 minutes. The matched pairs standard deviation is 10.1 minutes.
Test whether either flight has a higher average delay assuming normality.
H0: µd=0
Ha: µd≠0
α=0.05
t24=(31-48)/(10.1/sqrt(25))=-8.41
p-value off the chart ≈ 0
Reject
Our data show that one of the airlines (PilotAirOr) has a significantly higher average delay than the other.
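The matched-pairs statistic only needs the mean difference, the standard deviation of the differences, and n. A minimal Python sketch, with `paired_t` as a hypothetical helper:

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """Matched-pairs t statistic and its degrees of freedom (hypothetical helper)."""
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Scairline minus PilotAirOr; only the SD of the differences (10.1) enters the statistic.
t_stat, df = paired_t(mean_diff=31 - 48, sd_diff=10.1, n=25)
print(f"t_{df} = {t_stat:.2f}")
```

The separate standard deviations (12 and 20 minutes) are not used, because the same 25 days are measured for both flights.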
11. Michael is testing 7 different insecticides on fire ants. For each insecticide he sprays a group of 49
fire ants and notes the time it takes for the ants to die. Assume that ant deaths are normally
distributed and each group of ants should have a different standard deviation. Michael then did
an ANOVA test with a 5% significance level and got an F value of 2.02. The computer gave a right
tailed area of 0.062. What conclusion do you think Michael should get from his results?
No conclusion: the assumption of equal variances is not met, so the ANOVA F test is not valid here.
12. Thomas knows horses live longer than pigs, so he doesn’t need to test it, but he does want a 99%
confidence interval for how much longer they live. Assume the variances for both groups are not
the same. A random sample of each type of animal is shown below. Calculate the 99% confidence
interval.
Horses:
Sample size: 81 horses
Sample average: 15 years
Sample standard deviation: 3.5 years
Pigs:
Sample size: 81 pigs
Sample average: 12 years
Sample standard deviation: 1.2 years
Pooled variance: 6.93
Matched Pairs variance: 2.82
Weighted variance: 5.52
15  12  2.639
3.52 1.2 2

 1.915,4.0849
81
81
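The interval uses separate (unpooled) variances for the two groups, so the pooled, matched-pairs, and weighted variances listed with the data are not needed. A minimal Python sketch, where `unpooled_ci` is a hypothetical helper and t* = 2.639 is the value the key uses (t table, 80 df, 0.005 in each tail):

```python
import math


def unpooled_ci(m1, s1, n1, m2, s2, n2, t_crit):
    """CI for mu1 - mu2 with unpooled (separate) variances (hypothetical helper)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = m1 - m2
    return diff - t_crit * se, diff + t_crit * se


# Horses vs. pigs; t* = 2.639 as used in the key.
lower, upper = unpooled_ci(15, 3.5, 81, 12, 1.2, 81, t_crit=2.639)
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
```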
13. Brittany has developed a cure for dogs poisoned with antifreeze, but it doesn’t always work.
One experiment had 19 out of 1000 dogs survive antifreeze poisoning, but now only 16 out of
100 die. Find a 94% confidence interval (not a hypothesis test) for the difference in the
percentage of dogs that die from antifreeze poisoning without and with Brittany’s new medicine.
Note: It doesn’t matter that one sample is 10 times as large as the other, and the
percentage of dogs who die without the cure is (1000-19)/1000 = 981/1000.
n1π1 = 981, n1(1-π1) = 19, n2π2 = 16, n2(1-π2) = 84, so the assumptions for normality are met.
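$$(p_1 - p_2) \pm Z\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
$$\left(\frac{981}{1000} - \frac{16}{100}\right) \pm 1.88\sqrt{\frac{\frac{981}{1000}\left(1 - \frac{981}{1000}\right)}{1000} + \frac{\frac{16}{100}\left(1 - \frac{16}{100}\right)}{100}}$$
= (0.7516, 0.8904)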
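A Python sketch of the same interval, including the success/failure counts used to check normality. `prop_diff_ci` is a hypothetical helper. `statistics.NormalDist().inv_cdf` (standard library, Python 3.8+) shows where the 1.88 comes from, although the calculation uses the key's rounded value.

```python
import math
from statistics import NormalDist


def prop_diff_ci(x1, n1, x2, n2, z):
    """Unpooled CI for p1 - p2, plus the success/failure counts (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return counts, (diff - z * se, diff + z * se)


z_94 = NormalDist().inv_cdf(1 - 0.06 / 2)   # about 1.881; the key rounds to 1.88
counts, (lower, upper) = prop_diff_ci(981, 1000, 16, 100, z=1.88)
print([round(c) for c in counts], f"({lower:.4f}, {upper:.4f})")
```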
14. Jazz wonders if the height of an engineer determines how many complaints they get. She
surveys the height of engineers. Assume complaints are normally distributed, and that
the variances should be pooled. The engineers are categorized according to whether they
get few, some, or many complaints. The data is shown below, along with some math
that may be helpful.
Test Jazz’s hypothesis that the average height is different according to the number of
complaints using all 7 steps of a hypothesis test.
Data
Few:  6.2, 5.8              Average 6.00   Std dev 0.283
Some: 5.9, 6.1, 5.9, 5.5    Average 5.85   Std dev 0.251
Many: 5.1, 5.4, 6.1         Average 5.53   Std dev 0.513
Overall average: 5.77
P-value = 0.3845
$$(2 - 1)(0.283)^2 + (4 - 1)(0.251)^2 + (3 - 1)(0.513)^2 = 0.796$$
$$(6.2 - 5.77)^2 + (5.8 - 5.77)^2 + (5.9 - 5.77)^2 + (6.1 - 5.77)^2 + (5.9 - 5.77)^2 + (5.5 - 5.77)^2 + (5.1 - 5.77)^2 + (5.4 - 5.77)^2 + (6.1 - 5.77)^2 = 1.096$$
There are three groups, the data is normal, and the variances should be pooled.
This is an ANOVA test. The equation that sums the squared distances from the overall average (1.096) is the SST, and
the (n - 1)s² equation (0.796) is the SSE. The SSG is either the difference (1.096 - 0.796) = 0.300, or you can use the
equation (6 - 5.77)²(2) + (5.85 - 5.77)²(4) + (5.53 - 5.77)²(3) = 0.304 (the difference is
from rounding)
Source   DF   SS      MS      F
Groups   2    0.300   0.150   1.13
Error    6    0.796   0.133
Total    8    1.096
H0: μ1= μ2= μ3
HA: At least one mean is different
α=0.05
F=1.13
p-value=0.3845 (this was given in the problem)
Fail to Reject
There is not sufficient evidence to suggest that the average height of engineers is
different depending on the number of complaints they get.
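Working from the raw heights with unrounded means shows where the 0.300 vs. 0.304 gap comes from. A minimal Python sketch using only the standard `statistics` module:

```python
from statistics import mean, variance

# Heights grouped by number of complaints (data from the problem)
groups = {"Few": [6.2, 5.8], "Some": [5.9, 6.1, 5.9, 5.5], "Many": [5.1, 5.4, 6.1]}
values = [x for g in groups.values() for x in g]
grand = mean(values)   # 5.777..., which the key rounds down to 5.77

sst = sum((x - grand) ** 2 for x in values)
sse = sum((len(g) - 1) * variance(g) for g in groups.values())   # sum of (n - 1) s^2
ssg_direct = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
ssg_by_difference = sst - sse

df_groups, df_error = len(groups) - 1, len(values) - len(groups)
f_stat = (ssg_direct / df_groups) / (sse / df_error)
print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSG = {ssg_direct:.4f}, F = {f_stat:.3f}")
```

With unrounded means, both SSG routes give about 0.299, so the gap between 0.300 and 0.304 is purely rounding, and F still rounds to 1.13.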
15. An experiment was run to compare four types of metal plating on the resistance of a sword to
rust. The sample sizes were 25, 15, 181 and 22, the means were 50, 55, 48, 53, and the
corresponding estimated standard deviations were 17, 24, 8 and 19. The overall average was
49, and the total sample size was 243. Test whether metal plating changes the resistance, given that the p-value
is 0. The following equations may or may not be helpful:
$$25(50 - 49)^2 + 15(55 - 49)^2 + 181(48 - 49)^2 + 22(53 - 49)^2 = 1098$$
$$(25 - 1)17^2 + (15 - 1)24^2 + (181 - 1)8^2 + (22 - 1)19^2 = 34101$$
$$(25 - 17)^2 + (50 - 55)^2 + (22 - 19)^2 + (48 - 53)^2 = 123$$
H0: metal plating all the same
Ha: metal plating not all the same
Alpha:0.05
Source   DF    SS      MS       F
Groups   3     1098    366      2.565
Error    239   34101   142.68
Total    242   35199
p-value=0
Reject
The different metal platings do not all have the same average
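Only the first two given equations are ANOVA sums of squares (SSG and SSE). The third one (value 123) mixes sample sizes, means, and standard deviations and does not correspond to any term in the table. A minimal Python sketch, using the overall average of 49 as stated in the problem:

```python
ns = [25, 15, 181, 22]
means = [50, 55, 48, 53]
sds = [17, 24, 8, 19]
grand_mean = 49   # overall average as stated in the problem

N, G = sum(ns), len(ns)
ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))   # first equation
sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))                # second equation
msg, mse = ssg / (G - 1), sse / (N - G)
print(f"df = ({G - 1}, {N - G}), SSG = {ssg}, SSE = {sse}, F = {msg / mse:.3f}")
```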
16. There are three companies (A, B, and C) trying to bid for construction of a preschool. The
preschool is worried about getting a bad company. They randomly select months (3 from
company A, 4 from company B, and 5 from company C) and find the number of complaints the
company had for those months. The data is shown below. Test with 5% significance whether it
matters which company they choose. (Assume normality and equal variances. Use all 7 steps of
a hypothesis test. As a hint, the p-value is 0.10.)
Company A: 0, 19, 38
Company B: 0, 16, 24, 48
Company C: 16, 39, 64, 68, 68
H0: The means are the same
Ha: The means are not the same
Alpha:0.05
Source   df   SS         MS         F          P-value
Groups   2    2690.667   1345.333   2.998514   0.100477
Error    9    4038       448.6667
Total    11   6728.667
Fail to Reject
We cannot conclude that the average number of complaints differs among the companies.
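The output table can be reproduced from the raw complaint counts. The sketch below is an illustration, and its helper names are hypothetical. For the p-value it uses the exact closed form P(F > f) = (1 + 2f/d2)^(-d2/2), which holds only when the numerator df is 2 (as it is here). This keeps the example within the standard library.

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA table from raw data (hypothetical helper)."""
    values = [x for g in groups for x in g]
    N, G = len(values), len(groups)
    grand = mean(values)
    ssg = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)   # between groups
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)     # within groups
    df_g, df_e = G - 1, N - G
    f_stat = (ssg / df_g) / (sse / df_e)
    return ssg, sse, df_g, df_e, f_stat


def f_pvalue_df1_2(f_stat, df_e):
    """Right-tail area of F(2, df_e); this closed form only holds when df1 = 2."""
    return (1 + 2 * f_stat / df_e) ** (-df_e / 2)


companies = [[0, 19, 38], [0, 16, 24, 48], [16, 39, 64, 68, 68]]   # A, B, C
ssg, sse, df_g, df_e, f_stat = one_way_anova(companies)
p_value = f_pvalue_df1_2(f_stat, df_e)
print(f"SSG = {ssg:.3f}, SSE = {sse:.0f}, F = {f_stat:.6f}, p = {p_value:.6f}")
```

Since p ≈ 0.100 > 0.05, the sketch agrees with the decision to fail to reject.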