Homework 12

1. Amos wants to test whether increasing the volume on his speakers will decrease a computer’s bit rate. He has pieces of the ANOVA table from his regression. (Assume the assumptions for regression have been met.) Fill in the six missing values in the table.

          df    SS          MS        F        P-value
Model      1      5.8644    5.8644    3.240    0.075
Error     96    173.76      1.81
Total     97    179.6244

2. Ross thinks the four people who sit next to him in class spend the same amount of money on their haircuts. To find out, he keeps track of how much their haircuts cost for the next couple of months. Assuming the cost of a haircut is random, that the sample sizes are large enough to assume normality, and that the variances are the same for all four people, test whether the average cost is equal for all four people (the tip is included in the price). Use all 7 steps of the hypothesis procedure and use α = 0.05.

Name        Number of Haircuts    Average Cost    Standard deviation for one haircut
Tomson      33                    18              11
Lawrence    35                    11              10
Julien      41                    13               8
Ty          32                    14              12

ANOVA       df    SS         MS    F    P-value
Group                                   Missing
Residual
Total             15178.6

For your information, the pooled standard deviation is 10.215. The table below shows critical values of the F distribution (in other words, the F value that has 0.05 in the right tail). A critical F can be used instead of a p-value in step 5 to decide whether to reject or not. The first subscript is the degrees of freedom for groups, the second is the degrees of freedom for error, and the third is 0.05 for alpha.
F1,1,.05 = 161.4    F1,31,.05 = 4.16     F1,137,.05 = 3.91
F2,1,.05 = 199.5    F2,31,.05 = 3.305    F2,137,.05 = 3.062
F3,1,.05 = 215.7    F3,31,.05 = 2.911    F3,137,.05 = 2.671
F4,1,.05 = 224.6    F4,31,.05 = 2.679    F4,137,.05 = 2.438

H0: µtomson = µlawrence = µjulien = µty
HA: Not all the means are the same (note: µtomson ≠ µlawrence ≠ µjulien ≠ µty would not be an acceptable alternative hypothesis, since, for example, it’s possible that only one of the means is different)
α = 0.05
G = 4, N = 141

To get the SSE:
Method 1: The MSE is the pooled variance: 10.215^2 = 104.35. The MSE is the SSE divided by df = N - G = 141 - 4 = 137, so SSE = 104.35 * 137 = 14296 (there may be a little rounding error).
Method 2: SSE = (33-1)*11^2 + (35-1)*10^2 + (41-1)*8^2 + (32-1)*12^2 = 14296

To get the SSG:
Method 1: After finding that the SSE is 14296, subtract it from the SST of 15178.6 to get 882.6.
Method 2: First find the overall average:
Average = (33*18 + 35*11 + 41*13 + 32*14) / (33 + 35 + 41 + 32) = 13.9
SSG = 33*(18-13.9)^2 + 35*(11-13.9)^2 + 41*(13-13.9)^2 + 32*(14-13.9)^2 = 882.6

ANOVA       df     SS         MS        F       P-value
Group         3      882.6    294.2     2.82    Missing
Residual    137    14296      104.35
Total       140    15178.6

We don’t have a p-value, but we do have a critical F value. With degrees of freedom (3, 137), the critical F we are looking for is 2.671. Since 2.82 is more than 2.671, we are further out in the tail. That means we are in the rejection region, so we reject the null hypothesis. Conclude that Ross’ friends do not all spend the same amount on their haircuts.

3. Ablative spray paint is used on government buildings because it helps make the walls stronger to resist terrorist attacks. It has been suggested in recent literature that it may be a fire hazard. To test these claims, civil engineers are going to select 1000 walls and randomly choose some to be covered with ablative spray paint. They will throw grenades at each wall and record whether the wall catches fire. Assume you are asked to select an alpha other than 0.05. Choose your alpha and explain why.
H0: The proportion that catch fire with the paint = the proportion that catch fire without the paint
HA: The proportion that catch fire with the paint is greater (the paint is a fire hazard)
Type 1 error: The paint is not a fire hazard but we claim it is
Type 2 error: The paint is a fire hazard and we say it is safe
Students who choose a low alpha should mention the need to protect government buildings.
Students who choose a high alpha should mention the danger of creating a fire hazard.

4. Kyle surveys 500 people and records their gender and whether they are a vegetarian. His data is shown below. Test whether gender is related to being a vegetarian.

                    Male    Female    Total
Vegetarian           22      28        50
Not a vegetarian    218     232       450
Total               240     260       500

H0: Gender is not related to whether you are a vegetarian
HA: Gender is related to whether you are a vegetarian
α = 0.05

Method 1 (compare the proportion of vegetarians between males and females):
Pooled p = (22 + 28) / (240 + 260) = 0.1
Z = (22/240 - 28/260) / sqrt( 0.1(1 - 0.1)/240 + 0.1(1 - 0.1)/260 ) = -0.60
p-value = 0.2743 * 2 = 0.5486

Method 2 (compare the proportion of males between vegetarians and non-vegetarians):
Pooled p = (22 + 218) / (50 + 450) = 0.48
Z = (22/50 - 218/450) / sqrt( 0.48(1 - 0.48)/50 + 0.48(1 - 0.48)/450 ) = -0.60
p-value = 0.2743 * 2 = 0.5486

Method 3 (chi-square test):
Expected counts (E):
                    Male    Female    Total
Vegetarian           24      26        50
Not a vegetarian    216     234       450
Total               240     260       500

Chi-square contributions, (observed - expected)^2 / expected:
                    Male      Female
Vegetarian          0.1667    0.1538
Not a vegetarian    0.0185    0.0171
Sum = 0.3561

Χ2 with 1 degree of freedom = 0.3561
p-value > 0.25 (the p-value happens to be 0.5486 if you use a computer)
p-value > α
Fail to Reject
There is not enough evidence to conclude that the decision to be a vegetarian is related to a person’s gender.

5. The TAMU admissions board believes the score you get on the SAT in high school can help predict your college GPA. Below is a regression model using the SAT scores and GPA for 100 college graduates. Calculate a 98% Confidence Interval for the slope of the regression line.
(Note: The SAT scores have been divided by 100 just to make the numbers nicer.)

Simple linear regression results:
Dependent Variable: GPA
Independent Variable: SAT/100
GPA = 1.3186783 + 0.11072124 [SAT/100]
Sample size: 100
R (correlation coefficient) = 0.7751
R-sq = 0.6007698
Estimate of error standard deviation: 0.1911169

Parameter estimates:
Parameter    Estimate      Std. Err.      DF    T-Stat       P-Value
Intercept    1.3186783     0.16711824     98    7.8906903    <0.0001
Slope        0.11072124    0.091174900    98    1.2143823    0.1244

t80 = 2.374 (df = 80 is the closest table row below df = 98)
0.1107 ± 2.374 (0.091174) = (-0.10574, 0.32714)

6. Santa wants to use regression to learn how sleigh weight affects reindeer speed. He uses 20 different weights and measures the reindeer speed at each weight. The output from his regression is below. When he did SSE, SSR, and SST, Santa calculated those with n=20 by hand.

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.759706825
R Square             0.577154460
Adjusted R Square    0.553663041
Standard Error       6.350124954
Average Speed        34.75 (in kilometers/second)
Observations         20

                        Coefficients    Std Error    t Stat      P-value
Intercept               48.76533        3.164064     15.41225    8.18E-12
Weight (in kilotons)    -0.94029        0.189702     -4.95669    0.000102

Then Santa’s statistical elf pointed out that Santa forgot to include the last data point when he did the SSR, SSE, and SST by hand (although he did remember to use n=20). What would the correct values be for SSR, SSE, and SST if the data point that Santa forgot was: at 19 kilotons, the speed was 41 kilometers/second?

First the notation:
yi = 41
ybar = 34.75
yhat = 48.76533 - 0.94029*19 = 30.8998

SSR = sum[ (yhat - ybar)^2 ] = 975.89 + (30.8998 - 34.75)^2 = 990.71
SSE = sum[ (yi - yhat)^2 ] = 623.82 + (41 - 30.8998)^2 = 725.83
SST = sum[ (yi - ybar)^2 ] = 1677.48 + (41 - 34.75)^2 = 1716.55

An alternative way:
SST = sum[ (yi - ybar)^2 ] = 1677.48 + (41 - 34.75)^2 = 1716.55
R^2 = SSR/SST, so 0.57715 = SSR/1716.55, so SSR = 990.71
SSR + SSE = SST, so SSE = 1716.55 - 990.71 = 725.83

7.
Data was collected for the average number of traffic citations per month given by Officer Smith and Officer Jones of the Highway Patrol. The last five months were looked at for both officers. Officer Smith had an average of 94 tickets with a standard deviation of 8.7 tickets. Officer Jones had an average of 98.6 tickets with a standard deviation of 6.4 tickets. The standard deviation of the differences in each month was 1.52 tickets. Using the .05 significance level, test the claim that there is a mean difference in the number of citations given by the two officers.

Cannot do

8. Monica believes that the number of musicals a person sees depends on their gender. Houston says it depends on their age. To find out what really matters, they randomly get 80 old women, 90 young women, 75 old men, and 36 young men. Assume the value of the standard deviation for each group should be 2.10, and the overall average number of musicals is 7.82. Calculate the test statistic if you know that, summing over all data points,

sum[ (xi - 7.82)^2 ] for i = 1 to 281 = 12247

This is categorical by numerical, so this is ANOVA, and the test statistic is F.
The standard deviation is 2.1, so the MSE = 2.1^2 = 4.41.
The sum of the squared distances from each data point to the overall average is the SST.

         DF     SS          MS         F
Group      3    11025.43    3675.14    833.3
Error    277     1221.57       4.41
Total    280    12247

9. Santa wants to know what type of food will make his reindeer calves gain weight. He tries four different types of food and gets an overall average of 24.93 with a pooled standard deviation of 5.17. Note the p-value was nearly zero. Santa decides to keep feeding his reindeer hay.

                                      Hay       Dog Food    Cat Food    Fish Food
Number of Calves                      4         4           6           6
Standard deviation (for each calf)    4.6       5.2         4.8         5.8
Mean growth weight (in pounds)        41.228    34.281      41.228      -8.471

Based on the data, show what Santa’s hypothesis test should have looked like.
H0: Food type is independent of growth weight
Ha: Food type depends on growth weight
Alpha = 0.05

SSG = 4*(41.228-24.93)^2 + 4*(34.281-24.93)^2 + 6*(41.228-24.93)^2 + 6*(-8.471-24.93)^2 = 9699.77
SSE = sum[ (n-1) s^2 ] = (4-1)*4.6^2 + (4-1)*5.2^2 + (6-1)*4.8^2 + (6-1)*5.8^2 = 428
Or 5.17^2 = SSE/(20 - 4), so SSE = 5.17^2 * (20-4) = 428

ANOVA       df    SS          MS         F         P-value
Group        3     9699.77    3233.26    120.86    0
Residual    16      428         26.75
Total       19    10127.77

Reject
Conclude the type of food really does matter (duh, fish food? Really, Santa?).

10. You plan to fly from New York to Chicago and have a choice of two flights. You are able to find out how many minutes late each flight was for a random sample of 25 days over the past few years. (You have data for BOTH flights on the same 25 days.) For Scairline the average delay is 31 minutes with a standard deviation of 12 minutes. For PilotAirOr the average delay is 48 minutes with a standard deviation of 20 minutes. The matched pairs standard deviation is 10.1 minutes. Test whether either flight has a higher average delay, assuming normality.

H0: µd = 0
Ha: µd ≠ 0
α = 0.05
t24 = (31-48)/(10.1/sqrt(25)) = -8.41
p-value off the chart ≈ 0
Reject
Our data shows one of the airlines (PilotAirOr) has a significantly higher average delay than the other.

11. Michael is testing 7 different insecticides on fire ants. For each insecticide he sprays a group of 49 fire ants and notes the time it takes for the ants to die. Assume that ant deaths are normally distributed and each group of ants should have a different standard deviation. Michael then did an ANOVA test with a 5% significance level and got an F value of 2.02. The computer gave a right-tailed area of 0.062. What conclusion do you think Michael should get from his results?

No conclusion: the assumption of equal variances is not met.

12. Thomas knows horses live longer than pigs, so he doesn’t need to test it, but he does want a 99% confidence interval for how much longer they live. Assume the variances for both groups are not the same.
A random sample of each type of animal is shown below. Calculate the 99% confidence interval.

Horses: Sample size: 81 horses, Sample average: 15 years, Sample standard deviation: 3.5 years
Pigs: Sample size: 81 pigs, Sample average: 12 years, Sample standard deviation: 1.2 years
Pooled variance: 6.93
Matched Pairs variance: 2.82
Weighted variance: 5.52

(15 - 12) ± 2.639 * sqrt( 3.5^2/81 + 1.2^2/81 ) = (1.915, 4.0849)

13. Brittany has developed a cure for dogs poisoned with antifreeze, but it doesn’t always work. One experiment had 19 dogs out of 1000 survive antifreeze poisoning without the cure, but with the cure only 16 out of 100 die. Find a 94% confidence interval (not a hypothesis test) for the difference in the percent of dogs that die from antifreeze poisoning with Brittany’s new medicine.

Note: It doesn’t matter that one sample has 10 times as many dogs as the other sample. The percentage of dogs who die without the cure is (1000-19)/1000 = 981/1000.
n1π1 = 981, n1(1-π1) = 19, n2π2 = 16, n2(1-π2) = 84
So the assumptions for normality are met.

(p1 - p2) ± Z * sqrt( p1(1 - p1)/n1 + p2(1 - p2)/n2 )
= (981/1000 - 16/100) ± 1.88 * sqrt( (981/1000)(1 - 981/1000)/1000 + (16/100)(1 - 16/100)/100 )
= (0.7516, 0.8904)

14. Jazz wonders if the height of an engineer determines how many complaints they get. She surveys the height of engineers. Assume complaints are normally distributed and that the variances should be pooled. The engineers are categorized according to whether they get few, some, or many complaints. The data is shown below, along with some math that may be helpful. Test Jazz’s hypothesis that the average height is different according to the number of complaints, using all 7 steps of a hypothesis test.

Group    Data                  Average    Std dev
Few      6.2, 5.8              6.00       0.283
Some     5.9, 6.1, 5.9, 5.5    5.85       0.251
Many     5.1, 5.4, 6.1         5.53       0.513

(6.2-5.77)^2 + (5.8-5.77)^2 + (5.9-5.77)^2 + (6.1-5.77)^2 + (5.9-5.77)^2 + (5.5-5.77)^2 + (5.1-5.77)^2 + (5.4-5.77)^2 + (6.1-5.77)^2 = 1.096

(2-1)*0.283^2 + (4-1)*0.251^2 + (3-1)*0.513^2 = 0.796

There are three groups, the data is normal, and the variances should be pooled. This is an ANOVA test.
The first equation describes the SST; the second equation is the SSE. The SSG is either the difference (1.096 - 0.796) = 0.300, or you can use the equation

2*(6 - 5.77)^2 + 4*(5.85 - 5.77)^2 + 3*(5.53 - 5.77)^2 = 0.304

(the difference is from rounding in the standard deviations).

Overall average: 5.77
P-value = 0.3845

          DF    SS       MS       F
Groups     2    0.300    0.150    1.13
Error      6    0.796    0.133
Total      8    1.096

H0: μ1 = μ2 = μ3
HA: At least one mean is different
α = 0.05
F = 1.13
p-value = 0.3845 (this was given in the problem)
Fail to Reject
There is not sufficient evidence to suggest that the average height of an engineer differs according to the number of complaints they get.

15. An experiment was run to compare four types of metal plating on the resistance of a sword to rust. The sample sizes were 25, 15, 181 and 22, the means were 50, 55, 48 and 53, and the corresponding estimated standard deviations were 17, 24, 8 and 19. The overall average was 49, and the total sample size was 243. Test whether metal plating changes the resistance, given that the p-value is 0. The following equations may or may not be helpful:

25*(50-49)^2 + 15*(55-49)^2 + 181*(48-49)^2 + 22*(53-49)^2 = 1098
(25-1)*17^2 + (15-1)*24^2 + (181-1)*8^2 + (22-1)*19^2 = 34101
(25-17)^2 + (50-55)^2 + (22-19)^2 + (48-53)^2 = 123

H0: The metal platings all have the same mean resistance
Ha: The metal platings do not all have the same mean resistance
Alpha: 0.05

          DF     SS       MS        F
Groups      3     1098    366       2.565
Error     239    34101    142.68
Total     242    35199

p-value = 0
Reject
The different metal platings do not all have the same average.

16. There are three companies (A, B, and C) trying to bid for construction of a preschool. The preschool is worried about getting a bad company. They randomly select months (3 from company A, 4 from company B, and 5 from company C) and find the number of complaints the company had for those months. The data is shown below. Test with 5% significance whether it matters which company they choose. (Assume normality and equal variances. Use all 7 steps of a hypothesis test.
As a hint, the p-value is 0.10.)

Company A: 0, 19, 38
Company B: 0, 16, 24, 48
Company C: 16, 39, 64, 68, 68

H0: The means are the same
Ha: The means are not the same
Alpha: 0.05

          DF    SS          MS          F           P-value
Groups     2    2690.667    1345.333    2.998514    0.100477
Error      9    4038         448.6667
Total     11    6728.667

Fail to Reject
We cannot say that some companies have more complaints than others.
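
The ANOVA numbers for problem 16 can be checked directly from the raw complaint counts. The sketch below is only an illustration, not part of the original answer key: the function names `one_way_anova` and `f_right_tail_two_group_df` are made up for this example, and the p-value uses a closed form that holds only when the group degrees of freedom equal 2 (as they do here); a general F distribution would need a statistics library.

```python
from statistics import mean

# Complaint counts per sampled month, copied from problem 16.
companies = {
    "A": [0, 19, 38],
    "B": [0, 16, 24, 48],
    "C": [16, 39, 64, 68, 68],
}


def one_way_anova(samples):
    """Return (df_group, df_error, ssg, sse, f_stat) for a list of samples."""
    all_values = [x for sample in samples for x in sample]
    grand_mean = mean(all_values)
    # SSG: squared distance of each group mean from the overall mean, weighted by n.
    ssg = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # SSE: squared distance of each data point from its own group mean.
    sse = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_group = len(samples) - 1
    df_error = len(all_values) - len(samples)
    f_stat = (ssg / df_group) / (sse / df_error)
    return df_group, df_error, ssg, sse, f_stat


def f_right_tail_two_group_df(f_stat, df_error):
    """Right-tail area of an F(2, df_error) distribution.

    Assumption: this closed form is valid ONLY when the group df is exactly 2.
    """
    return (1 + 2 * f_stat / df_error) ** (-df_error / 2)


df_group, df_error, ssg, sse, f_stat = one_way_anova(list(companies.values()))
p_value = f_right_tail_two_group_df(f_stat, df_error)

print(f"Groups  df={df_group}  SS={ssg:.3f}  MS={ssg / df_group:.3f}  F={f_stat:.4f}  p={p_value:.4f}")
print(f"Error   df={df_error}  SS={sse:.3f}  MS={sse / df_error:.4f}")
print(f"Total   df={df_group + df_error}  SS={ssg + sse:.3f}")
```

The same `one_way_anova` helper works for any of the raw-data ANOVA problems above (for example, the heights in problem 14), but the p-value helper should not be reused when there are more or fewer than three groups.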