Test2Review_Solutions

Updated Fall 2011
Intro to Stats, Test 2 Review
Solutions in Red Below
Section A: Which Test to Run?
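For each of the following research scenarios, select the appropriate statistical test. The answer choices include Z-Interval, T-Interval, Z-Test, T-Test, T-Test (Dep. Samples), T-Test (Ind. Samples) and ANOVA.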
1. Researchers are wondering if hypertension and smoking are related. They divide their 500
participants into 3 groups: non-smokers, moderate smokers and heavy smokers. They
measure the blood pressure of the participants in each category.
3 Groups/Samples => ANOVA compares 3 or more groups on the same variable (blood pressure).
2. The following data represents the running time of recent movies from two top motion-picture
companies. Test the hypothesis that company 2’s movies have on average a longer running
time.
Company   Running Time (Minutes)
1         102 86 98 109 92
2         81 165 97 134 92 87 114
2 Samples => Either Ind. or Dep. samples T-Test. We are comparing sub-population averages (t-procedures), so it is an Independent Samples t-test. (Also, it can’t be Dep. Samples because the sample sizes
are different.)
3. A college infirmary conducted an experiment to determine the degree of relief provided for
each of three cold remedies: NyQuil, Robitussin and Triaminic. They had 30 students with
colds. Ten students tried each remedy and reported the level of relief they experienced on a
scale of 1 – 10 with 10 being perfect relief and 1 being no relief.
3 Groups/Samples => ANOVA compares 3 or more groups on the same variable (relief experienced).
4. To determine if a new serum will arrest leukemia, 9 mice with advanced leukemia are
selected, 5 of which receive the serum, 4 of which do not. Survival times (in months) are
recorded.
2 Samples => This is treatment vs. control, the prototype for Ind. Samples t-tests. We infer that the survival times
will be averaged, and in any case the sample size is too tiny for z-procedures.
5. A manufacturer of rice cereal (baby food) claims the average fat content does not exceed 1.5
milligrams per serving. A consumer group purchases 40 jars of the food and tests the
average fat content. They are concerned that the food has a higher fat content than
advertised.
1 Sample => T-Test, comparing a subgroup (sample) to the overall population average.
6. Joe and Moe are two Marine Corps drill instructors at Quantico, VA. Joe doesn’t think Moe
works his Officer Candidate School (OCS) classes hard enough. Moe gives all 25 of his
OCS candidates a physical fitness test when they arrive. The same test is repeated after 4 weeks
of Moe’s gentle persuasions.
2 Samples => Ind. vs. Dep. samples T-Test. The pretest/posttest design is the dep. samples prototype. Again,
the fact that scores will be averaged is understood, not stated, but there is no “dependent samples” z-procedure.
7. Dr. Sinn queries his students about their flip flop ownership. He is trying to estimate the
percentage of his students who have more than 5 pairs of flip flops.
Estimating => confidence interval. Percentage => z-procedures. Z-Interval (1 sample).
8. A large automobile manufacturer is deciding between two brands of tires to purchase for its
newly designed SUV. They test 50 sets of each type by putting sets of tires on the SUVs
and calculate the average driving time before the tires are worn out.
2 Samples, comparing averages => 2 sample t-test. Even though n1=n2, this is like a demographics
comparison, with tires from different manufacturers. Ind. Samples t-test.
9. A researcher surveys males and females about their preferences for corrective lenses. She
finds that 49 of 72 females who need corrective lenses prefer contacts, while 36 of 58 males
prefer contacts. She tests for a gender difference in preference for contacts at the .05 level.
2 Samples, but we are asked about percentages, so it’s a Z-Test for Population Proportion (2
samples).
10. Researchers are comparing the effectiveness of the financial aid offices at 5 Georgia schools.
They take a random sample of 150 students from each university and calculate the average
amount of financial aid each student has. They test for a difference at the .01 level.
5 samples, comparing averages => ANOVA
11. Jack Erwin, professional fisherman extraordinaire, catches a large trout and nearly has the
doggone thing reeled in. He caught a 26” trout earlier that day that weighed in at a hefty 7.5
lbs, and this one looks even bigger. But then, doggone it, his 15 lb. test fishing line breaks,
and he suffers the agony of watching one get away. He purchases 10 more packages of
fishing line to test his hypothesis that the mean breaking strength of the fishing line is less
than the advertised 15 lbs. of force.
1 Sample => T-Test, comparing a subgroup (sample) to the overall population average.
12. Researchers were trying to determine if the material in a college physics course was better
understood by students when the course had an accompanying lab section. Of the 28
participants, 11 were randomly selected for the lab course and 17 for the course without a
lab. The exact same end-of-course test was administered to all 28.
2 Samples => This is treatment vs. control, the prototype for ind. samples. Different sample sizes mean it
could NOT be dep. samples.
13. Joe and Moe, over beers in the NCO, decide to estimate the average fitness levels of
incoming OCS candidates. They give 100 future Marine Corps officers a physical fitness test
and estimate with a 95% level of confidence.
ESTIMATE and AVERAGE are the key words here. We know this is a confidence interval (also
thanks to the last word in the problem statement), and we’re dealing with averages, not percentages. So T-Interval.
14. Researchers in Boulder, CO, believe that (unlike for younger women) running
increases the resting metabolic rate (RMR) in older women. The average RMR of 30 elderly
women runners was higher than the average RMR of 30 sedentary elderly women.
2 Samples => This is treatment vs. control, the prototype for ind. samples.
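The rules of thumb used in the Section A solutions (estimate => interval, percentage => z, 3 or more groups => ANOVA, pretest/posttest => dependent samples) can be summarized as a small decision function. This is a hypothetical study aid, not part of the course materials: the name `choose_procedure` and its arguments are made up for illustration, and it only covers the procedures in this review.

```python
def choose_procedure(groups, parameter, goal="test", paired=False):
    """Hypothetical study aid encoding the Section A rules of thumb.

    groups:    number of samples/groups involved (1, 2, 3 or more)
    parameter: "mean" (averages) or "proportion" (percentages)
    goal:      "test" (hypothesis test) or "estimate" (confidence interval)
    paired:    True for pretest/posttest designs on the same subjects
    """
    if goal == "estimate":
        # Estimating => confidence interval; percentages => z, averages => t
        return "Z-Interval" if parameter == "proportion" else "T-Interval"
    if groups >= 3:
        if parameter != "mean":
            raise ValueError("comparing 3+ proportions is not covered in this review")
        return "ANOVA"  # 3 or more groups compared on the same variable
    if parameter == "proportion":
        return "Z-Test (1 Proportion)" if groups == 1 else "Z-Test (2 Proportions)"
    if groups == 1:
        return "T-Test"  # one sample compared to a claimed population mean
    return "T-Test (Dep. Samples)" if paired else "T-Test (Ind. Samples)"


# A few of the scenarios above
print(choose_procedure(3, "mean"))                    # Problem 1
print(choose_procedure(2, "mean", paired=True))       # Problem 6
print(choose_procedure(1, "proportion", "estimate"))  # Problem 7
print(choose_procedure(2, "proportion"))              # Problem 9
```

Note that it ignores sample size, which the solutions sometimes use as a tiebreaker (unequal sample sizes rule out dependent samples).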
Section B: Symbolic Hypotheses, Error Rates and Choosing α
For each of the following research scenarios, (a) set up the null and alternate hypotheses in
correct mathematical symbols, (b) describe both Type I and II errors in words and (c)
choose an appropriate level for α.
15. Environmental engineering students at Georgia Tech have found a new way to cure concrete.
The old method (developed in Athens, Georgia) generated concrete with an average strength
of 5000 kg/cm². They cure and test 42 samples. Statistically test the hypothesis that the new
curing method generates stronger cement.
H0 : μ = 5000
Ha : μ > 5000
Type I: Falsely claim Tech concrete is better than UGA.
Type II: Falsely claim there is no difference between UGA and Tech concrete.
If concrete is NOT better (or, perhaps, worse), we could be putting people at risk by building
with it. Minimize Type I. Set α low, say α = .01.
16. Pharmaceutical researchers are testing allergy medications. Their new drug SuperCureAll
has some rather nasty side effects. They compare 31 adults with allergies who take
SuperCureAll to 22 adults with allergies who take a placebo. The allergy sufferers rate their
levels of relief from 1 – 10, 10 being complete relief. The placebo group average is 3.9.
H0 : μt = μc
Ha : μt > μc
(T = Treatment, C = Control; a higher score, closer to 10, means MORE allergy relief)
Type I: Falsely claim treatment works.
Type II: Falsely claim there is no difference between treatment and control.
If treatment is NOT significantly better, allergy sufferers take medication (and get nasty side
effects) for no reason. Minimize Type I. Set α low, say α = .01.
17. A consumer advocacy group is testing the shock absorbency of an infant car seat they believe
may be defective. They purchase 12 of the seats whose mean absorbency is rated at 1000
lbs.
H0 : μ = 1000
Ha : μ < 1000
Comparing a sub-group to population – 1 Sample t-Test
Type I: Falsely claim car seat is defective (absorbs less than 1000 lbs.)
Type II: Falsely claim car seat is OK (absorbs 1000 lbs.)
It should be clear that making a Type II error could cost lives. Minimize Type II. Set α high,
say α = .1.
****************************
By the way, this hypothesis can be tested in the opposite direction:
H0 : μ = 1000
Ha : μ > 1000
If you set it up this way, your Type I and Type II error statements are the reverse of those typed
above, and you would set α low, say α = .01 or even α = .001.
*****************************
Section C: Hypothesis Testing
Conduct all relevant hypothesis testing steps for each of the following scenarios.
18. Dr. Olsen is concerned about her pH meter. She finds a neutral substance which should give
meter readings of 7.0 on the pH scale. She conducts 10 sample measurements (data given
below) of the substance on her balky meter. At the α = .1 level, test her hypothesis that the
meter is faulty. Assume normality.
7.07 7.0 7.1 6.97 7.0 7.03 7.01 7.01 6.98 7.08
H0 : μ = 7
Ha : μ ≠ 7
Comparing a sub-group to population
1 Sample t-Test
Type I: Falsely claim the meter is faulty.
Type II: Falsely claim the meter is OK.
Since α = .1 is given, no Type I/Type II error analysis is
necessary. (Note: I performed error analysis to provide you
with additional study material – you would NOT do this step
on the test!!)
Normality is assumed, so no graphics checks (box plot or
histogram) are needed.
Run Test (see graphics for steps)
Since p = .1062 > .1 = α, we FAIL TO REJECT THE NULL.
In real world terms, this means we have no evidence that the meter is faulty.
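If you want to double-check the calculator output without a TI, here is a minimal Python sketch of the same one-sample t-test. It uses only the standard library; `t_sf` is a hypothetical helper that approximates the upper-tail probability of the t distribution by integrating its density with Simpson's rule (an assumption standing in for the calculator's built-in t distribution).

```python
import math
import statistics


def t_sf(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom, via Simpson's rule."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps is even, as Simpson's rule requires
    area = pdf(0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 - area * h / 3  # area * h / 3 is the integral of the density from 0 to t


readings = [7.07, 7.0, 7.1, 6.97, 7.0, 7.03, 7.01, 7.01, 6.98, 7.08]
mu0 = 7.0  # H0: mu = 7 (neutral substance)
n = len(readings)
t_stat = (statistics.mean(readings) - mu0) / (statistics.stdev(readings) / math.sqrt(n))
df = n - 1
p_two_sided = 2 * t_sf(abs(t_stat), df)  # Ha: mu != 7, so both tails
print(f"t = {t_stat:.4f}, df = {df}, p = {p_two_sided:.4f}")
```

It should report t of about 1.795 with 9 degrees of freedom and a two-sided p-value of about .106, in line with the calculator's .1062.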
19. You are a consultant for a manufacturer who asks you to
compare the abrasive wear for two types of lamination. For
Laminate X, 12 pieces of material are tested and earn an
average rating of 85 on a scale of 0 - 100 where 100 indicates “no visible wear” and 0
indicates “completely destroyed” (s.d. = 4). For Laminate Z, 10 pieces are tested and
average an 81 (s.d. = 5). Given that the two laminates cost roughly the same to produce, test
the hypothesis that X is significantly better than Z at an appropriate level. Assume
normality.
H0 : μX = μZ
Ha : μX > μZ
Comparing two samples
2 Sample t-Test Independent
Type I: Falsely claim X “wears” better.
Type II: Falsely claim X and Z are not different.
Because production costs are similar, the company can sell
a better product (X) and have better customer satisfaction. Hence, they want to minimize
Type II. Set α high, for example, α = .1.
Normality is assumed, so no graphics checks (box plot or
histogram) are needed.
Run Test (see graphics for steps)
Since p = .0284 < .1 = α, we REJECT THE NULL.
In real world terms, this means we believe X is a better
product (both for customers and the company) than Z.
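With only summary statistics given, the test statistic can be computed by hand. The sketch below assumes the unpooled (Welch) version of the two-sample t-test, which appears to be what produced p = .0284 (a pooled test gives a slightly smaller p-value). `welch_t` and `t_sf` are hypothetical helper names, and `t_sf` integrates the t density numerically with Simpson's rule.

```python
import math


def t_sf(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom, via Simpson's rule."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps is even, as Simpson's rule requires
    area = pdf(0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 - area * h / 3  # area * h / 3 is the integral of the density from 0 to t


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Laminate X: n = 12, mean 85, s.d. 4; Laminate Z: n = 10, mean 81, s.d. 5
t_stat, df = welch_t(85, 4, 12, 81, 5, 10)
p = t_sf(t_stat, df)  # Ha: muX > muZ, so the upper tail
print(f"t = {t_stat:.4f}, df = {df:.2f}, p = {p:.4f}")
```

The Welch degrees of freedom (about 17.17) need not be a whole number, which is why the helper uses `math.lgamma` rather than factorials.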
20. After Halloween, twins Jacob and Katie decide to see whose favorite color appears more
often in their favorite flavor. Katie thinks Green Peanut Butter M&M’s are the bomb, but
Jacob likes the classic M&M’s in the color Red. In a sample of 342 Peanut Butter M&M’s,
Katie finds 52 Green ones. In a sample of 519 classic M&M’s, Jacob finds 95 Reds. Test the
hypothesis that Red M&M’s appear more often (in classic M&M packages) than Green
Peanut Butter M&M’s do. Test at the .05 level.
H0 : pJ = pK
Ha : pJ > pK
Comparing two sample proportions
2 Proportion Z-Test
No verification is needed here.
Run Test (see graphics for steps) using the Plus 4 Method, which adds 1 success and 1 failure to each sample:
p̂J = (95 + 1)/(519 + 2) = 96/521
p̂K = (52 + 1)/(342 + 2) = 53/344
I am working on getting the screen shots inserted….
Z = 1.1509
Since p = .1249 > .05 = α, we FAIL TO REJECT THE NULL.
In real world terms, this means we have no evidence that the red M&M’s appear more
frequently (in classic M&M’s) than green M&M’s do (in Peanut Butter M&M’s).
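The plus-four counts plug into the usual pooled two-proportion z statistic. Here is a minimal sketch using Python's `statistics.NormalDist` for the normal tail; `two_prop_z` is a hypothetical helper name, and the pooled standard error is an assumption about how the calculator computes the test.

```python
import math
from statistics import NormalDist


def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Plus-four counts: add 1 success and 1 failure to each sample
x_jacob, n_jacob = 95 + 1, 519 + 2  # red, classic M&M's
x_katie, n_katie = 52 + 1, 342 + 2  # green, Peanut Butter M&M's
z = two_prop_z(x_jacob, n_jacob, x_katie, n_katie)
p = 1 - NormalDist().cdf(z)  # Ha: pJ > pK, so the upper tail
print(f"z = {z:.4f}, p = {p:.4f}")
```

This reproduces Z of about 1.1509 and p of about .1249.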
21. Researchers are wondering if hypnosis will influence mathematics test anxiety. They give 9
students a mathematics test un-hypnotized. Three weeks later, the participants take the same
test after being hypnotized by a trained professional (do NOT try this at home!). Their scores
are given below. Test their hypothesis about hypnosis at the α = .05 level assuming
normality.
Test 1 58 67 79 59 66 71 70 52 65
Test 2 59 70 78 66 68 66 81 71 71
H0 : μd = 0
Ha : μd > 0
Pretest/posttest (remember we make 3rd list: L3 = L2 – L1)
2 Sample t-Test, Dependent
I can see an argument here for a ≠ symbol in the alternate hypothesis, but the idea is for
hypnosis to relax the anxious students and reduce test anxiety.
Type I: Falsely claim hypnosis helps.
Type II: Falsely claim hypnosis does not help.
Since α = .05 is given, no Type I/Type II error analysis is necessary. (Note: I performed
error analysis to provide you with additional study material – you would NOT do this step on
the test!!)
Recall that, for dependent samples, we need to construct the “Difference List.” The steps are
shown below on your TI graphing calculator. (L1 and L2 are switched below due to an
earlier error, so the graph below is backwards.)
Normality is assumed, so no graphics checks (box plot or histogram) are needed.
Run Test (see graphics for steps)
Since p = .0388 < .05 = α, we REJECT NULL.
In real world terms, this means we believe that hypnosis helps alleviate test anxiety.
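The “Difference List” approach means a dependent-samples t-test is just a one-sample t-test on L3 = L2 – L1 against μd = 0. A minimal sketch under the same assumptions as before (standard library only, with a hypothetical `t_sf` helper that integrates the t density by Simpson's rule):

```python
import math
import statistics


def t_sf(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom, via Simpson's rule."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps is even, as Simpson's rule requires
    area = pdf(0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 - area * h / 3  # area * h / 3 is the integral of the density from 0 to t


test1 = [58, 67, 79, 59, 66, 71, 70, 52, 65]  # un-hypnotized (L1)
test2 = [59, 70, 78, 66, 68, 66, 81, 71, 71]  # hypnotized (L2)
diffs = [after - before for before, after in zip(test1, test2)]  # L3 = L2 - L1

n = len(diffs)
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
df = n - 1
p = t_sf(t_stat, df)  # Ha: mu_d > 0, so the upper tail
print(f"t = {t_stat:.4f}, df = {df}, p = {p:.4f}")
```

Computing the differences in the L2 – L1 order matches the “>” alternative; reversing the lists would flip the sign of t and call for a lower-tail p-value instead.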
22. The following data represents the running time of recent movies from two top motion-picture
companies. Test the hypothesis that Company 2’s movies have on average a longer running
time at the α = .025 level. Assume normality.
Company   Running Time (Minutes)
1         102 86 98 109 92
2         81 165 97 134 92 87 114
H0 : μ1 = μ2
Ha : μ1 < μ2
Comparing two samples
2 Sample t-Test Independent
Again, α is given, so there is NO NEED to perform error
analysis. This is just me working extra hard so you can
study more thoroughly!!
Type I: Falsely claim there is a difference in running times.
Type II: Falsely claim there is no difference in running times.
Given: α = .025
Normality is assumed, so no graphics checks (box plot or
histogram) are needed.
Run Test (see graphics for steps, except that the graphic uses the incorrect alternative hypothesis: it
should be “less than” rather than “not equal to.”)
Since p = .1652 > .025 = α, we FAIL TO REJECT NULL.
(again, graphic is wrong, but the p-value should be .1652)
In real world terms, this means we have no evidence that
Company 2’s movies run longer, on average, than Company 1’s.
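Here is the same calculation starting from the raw running times, with the lower-tail p-value that matches the “less than” alternative. As before, `t_sf` is a hypothetical helper that approximates the t tail numerically, and the unpooled (Welch) degrees of freedom are an assumption about how the p-value of .1652 was obtained.

```python
import math
import statistics


def t_sf(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom, via Simpson's rule."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps is even, as Simpson's rule requires
    area = pdf(0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 - area * h / 3  # area * h / 3 is the integral of the density from 0 to t


company1 = [102, 86, 98, 109, 92]
company2 = [81, 165, 97, 134, 92, 87, 114]
n1, n2 = len(company1), len(company2)

# Squared standard errors of each sample mean (unpooled)
v1 = statistics.variance(company1) / n1
v2 = statistics.variance(company2) / n2
t_stat = (statistics.mean(company1) - statistics.mean(company2)) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
p = t_sf(-t_stat, df)  # Ha: mu1 < mu2, so P(T < t), which equals P(T > -t) by symmetry
print(f"t = {t_stat:.4f}, df = {df:.2f}, p = {p:.4f}")
```

The large spread in Company 2's times (s.d. about 30 minutes versus about 9 for Company 1) is what keeps the t statistic small even though the means differ by 12.6 minutes.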
23. Archaeology. Samples of head breadths were obtained by measuring skulls of Egyptian
males from three different epochs. Changes in head shape over time suggest inbreeding with
immigrant populations. Use a 0.05 level of significance to test the claim that head breadths
were different over time. Assume normality.
4000 B.C. 131 138 125 129 132 135 132 134 138
1850 B.C. 129 134 136 137 137 129 136 138 134
150 A.D. 128 138 136 139 141 142 137 145 137
H0 : μ4000 = μ1850 = μ150
Ha : at least one of μ4000, μ1850, μ150 differs from the others
Comparing three samples
ANOVA
Normality assumed, α = .05 (given)
Since p = .0305 < .05 = α, we REJECT NULL. Evidence suggests a difference in skull
breadth between epochs and therefore that Egyptians are a bunch of inbreds (lol jk).
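One-way ANOVA splits the variability into between-epoch and within-epoch sums of squares. Here is a minimal standard-library sketch. With exactly three groups the upper tail of the F distribution has the closed form (1 + 2F/df_within)^(-df_within/2); the code relies on that shortcut, which does not hold for other numbers of groups.

```python
import statistics

epochs = {
    "4000 B.C.": [131, 138, 125, 129, 132, 135, 132, 134, 138],
    "1850 B.C.": [129, 134, 136, 137, 137, 129, 136, 138, 134],
    "150 A.D.": [128, 138, 136, 139, 141, 142, 137, 145, 137],
}
samples = list(epochs.values())
k = len(samples)                        # number of groups
n_total = sum(len(g) for g in samples)  # total observations
grand_mean = statistics.mean([x for g in samples for x in g])

# Between-groups and within-groups sums of squares
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in samples)
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in samples)
df_between, df_within = k - 1, n_total - k
F = (ss_between / df_between) / (ss_within / df_within)

# Closed-form F tail, valid ONLY when df_between == 2 (three groups)
if df_between != 2:
    raise ValueError("closed-form p-value only applies to three groups")
p = (1 + 2 * F / df_within) ** (-df_within / 2)
print(f"F = {F:.4f}, df = ({df_between}, {df_within}), p = {p:.4f}")
```

This gives F of about 4.05 on (2, 24) degrees of freedom and p of about .0305, matching the value above.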