Data Analysis 1016-319

AP Statistics
Hypothesis Testing
Comparing 2 Populations or Treatments
Name: ___________________________
Date: ________________
Block: ___________
Independent OR Dependent?
For each of the situations presented below, determine if the samples are independent or dependent
(matched pairs). EXPLAIN your choice.
A. Which coating (A or B) produces higher strength? A sample of 5 CDs is treated with coating A,
while a second sample of 5 CDs is treated with coating B. Each CD is then tested for strength.
B. Does environment affect intelligence? Researchers identified 25 sets of identical twins in which one
twin was raised by biological parents and the other twin was raised by adoptive parents. Each twin
was given an IQ test.
C. Which drug (A or B) relieves severe headaches faster? A sample of 10 people who suffer from severe
headaches is given drug A for their pain one month and drug B another month. Time to headache
relief is measured.
D. A total of 20 people enter a study to determine the benefits of a new drug designed to reduce
cholesterol. The new drug is given to 10 people, while a placebo is given to the other 10 people. After a
period of three months, the reduction in cholesterol is measured for each person.
Adapted from: Introduction to Statistics & Data Analysis
Chapter 11 Activities Worksheets
Section 13.1 – Comparing Population Means: Independent Samples CI
You are creating an interval that will estimate the true difference in the means of both populations.
You will use calculator procedures to help.
For the 2-sample T-Interval, you have:
Formula:
$$(\bar{X}_1 - \bar{X}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Conditions:
SRS, Independence, and Normality for both sampling distributions
Researchers at Rochester Institute of Technology investigated the use of isolation time-out with 155
emotionally disturbed students enrolled in a special education facility (Exceptional Children, Feb., 1995).
The students were randomly assigned to one of two types of classrooms – Type A classrooms (with a
maximum of 12 students) and Type B classrooms (with a maximum of 6 students). Over the academic
year the number of incidents resulting in an isolation time-out was recorded for each student.
Summary statistics for the two groups are shown in the following table.
Classroom    # of Students    Mean # Timeouts    Std Dev
Type A       100              78.67              59.08
Type B       55               102.87             69.33
1. Explain why time-outs for the students in Type A classrooms are independent of time-outs for the
students in Type B classrooms.
2. Create a 95% confidence interval for the difference in the mean number of time-outs for Type A versus
Type B classrooms. Remember to use steps A – E (Population, Statistical Method, Sample, Statistical
Results, and Conclusion).
3. Based on your CI from part 2, would you agree that “On average, students in Type A classrooms had
significantly fewer time-out incidents than students in Type B classrooms”? EXPLAIN.
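The interval formula above can be sketched in Python from the summary table. This is a minimal sketch, not the calculator's own routine: the function names (`t_cdf`, `t_critical`, `two_sample_t_interval`) are made up for illustration, the t critical value is found numerically with only the standard library, and the degrees of freedom are assumed to follow the Welch-Satterthwaite approximation used by unpooled two-sample procedures.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps                      # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3               # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """t* that leaves `confidence` in the middle of the distribution (bisection)."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_sample_t_interval(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Unpooled interval for mu1 - mu2 from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)            # standard error of the difference
    # Welch-Satterthwaite degrees of freedom (assumed to match the calculator)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    margin = t_critical(confidence, df) * se
    diff = mean1 - mean2
    return diff - margin, diff + margin, df

# Type A minus Type B, using the summary statistics from the table
lower, upper, df = two_sample_t_interval(78.67, 59.08, 100, 102.87, 69.33, 55)
print(f"df = {df:.2f}, 95% CI for mu_A - mu_B: ({lower:.2f}, {upper:.2f})")
```

If both endpoints fall on the same side of zero, the interval supports a difference in mean time-outs at the 95% confidence level; compare the printed values with the calculator output before writing the conclusion.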
Section 13.1 – Comparing Population Means: Independent Samples Test
Your goal is to determine if there is a significant difference in the means of two populations.
2-Sample T-Test
Hypotheses:
H0: $\mu_1 = \mu_2$ ($\mu_1 - \mu_2 = 0$: no difference in population means)
Ha: $\mu_1 \neq \mu_2$, $\mu_1 < \mu_2$, or $\mu_1 > \mu_2$
Conditions:
SRS, Independence, and Normality of Both Populations
Test Statistic:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Use df, as reported by your calculator. Do not use the pooled option!!
According to WebMD (www.webmd.com), “normal” body temperature is an average. Not only is
body temperature different for different people, it also changes during the day and is very sensitive to
hormone levels.
The table below summarizes body temperature data from the Journal of Statistics Education Data
Archive (Shoemaker, 1996).
Body Temperature (°F)
Gender    n     Mean      StDev
Male      65    98.105    0.699
Female    65    98.394    0.743
Does the data suggest that there is a significant difference in average body temperature for men and
women? Perform a hypothesis test to answer this question. Remember to use steps A – E (Population,
Statistical Method, Sample, Statistical Results, and Conclusion).
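As a check on the calculator output, the test statistic can be computed directly from the summary table. The sketch below is illustrative only: `two_sample_t_statistic` is a made-up name, the male group is arbitrarily treated as sample 1, and the degrees of freedom are assumed to follow the Welch-Satterthwaite approximation that goes with the unpooled option.

```python
import math

def two_sample_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and approximate df from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    # Numerator is (difference in sample means) minus the hypothesized difference, 0
    t = ((mean1 - mean2) - 0) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom (assumed to match the calculator)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Male (sample 1) versus female (sample 2) body temperatures
t_stat, df = two_sample_t_statistic(98.105, 0.699, 65, 98.394, 0.743, 65)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

For the two-sided alternative, the p-value is twice the tail area beyond |t| under the t distribution with this df, which the calculator reports alongside the statistic.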
Section 12.1 – Comparing Population Means: Dependent Samples CI
(Paired T-test)
In a paired T-test, you are working with dependent data. As a result, we do not use the 2-Sample
procedures. Instead, we calculate the difference within each pair of values in our dependent samples.
We then use the one-sample t procedures on those differences.
1-Sample T-Interval (Paired)
Hypotheses:
H0: $\mu_D = 0$ (no difference)
Ha: $\mu_D \neq 0$, $\mu_D < 0$, or $\mu_D > 0$
Conditions:
SRS, Independence, and Normality of the population of differences
Confidence Interval:
$$\bar{X}_D \pm (t \text{ critical value}) \left( \frac{s_D}{\sqrt{n}} \right)$$
Test Statistic:
$$t = \frac{\bar{X}_D}{s_D / \sqrt{n}}, \quad df = n - 1$$
Jane usually takes her dry cleaning to Lilac Cleaners in Rochester, NY, but is considering changing to
Leary’s Cleaners. According to an article in the local newspaper, Leary’s uses a dry-wet cleaning
process which is more environmentally friendly. Jane expects to pay more for this service, but wants to
know how much more before taking her business there – see the data in the table below.
Item                     Leary’s    Lilac     Difference
Suit                     $12.25     $10.25    ______
Shirt/Blouse             $5.99      $5.25     ______
Slacks                   $5.89      $5.25     ______
Skirt                    $5.89      $5.25     ______
Sweater                  $5.39      $5.25     ______
Winter coat              $13.49     $11.95    ______
Comforter (full-size)    $14.25     $15.00    ______
Raincoat                 $12.99     $11.95    ______
Prices found in “The Bottom Line: A monthly comparison of goods and services”,
Rochester Democrat and Chronicle, Oct 16, 2005.
1. Explain why the prices in the table above are considered dependent samples.
2. Compute the differences (Leary’s – Lilac) and write them in the column labeled “Difference”.
3. Create a 90% confidence interval for the mean difference in price for the two dry cleaners.
Remember to use steps A – E (Population, Statistical Method, Sample, Statistical Results, and
Conclusion).
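Parts 2 and 3 can be sketched in Python as a way to check hand or calculator work. This is a minimal sketch under stated assumptions: the variable names are made up for illustration, and the critical value 1.895 is taken from a standard t table for df = 7 and 90% confidence rather than computed.

```python
import math
import statistics

# Prices from the table, in the same item order (Suit ... Raincoat)
learys = [12.25, 5.99, 5.89, 5.89, 5.39, 13.49, 14.25, 12.99]
lilac = [10.25, 5.25, 5.25, 5.25, 5.25, 11.95, 15.00, 11.95]

# Part 2: paired differences, Leary's minus Lilac, rounded to whole cents
differences = [round(a - b, 2) for a, b in zip(learys, lilac)]

# Part 3: one-sample t interval on the differences
n = len(differences)
mean_d = statistics.mean(differences)
sd_d = statistics.stdev(differences)   # sample standard deviation (divides by n - 1)
T_STAR = 1.895                          # assumed table value: df = n - 1 = 7, 90% confidence
margin = T_STAR * sd_d / math.sqrt(n)
lower, upper = mean_d - margin, mean_d + margin

print("differences:", differences)
print(f"mean = {mean_d:.3f}, sd = {sd_d:.3f}, 90% CI: ({lower:.2f}, {upper:.2f})")
```

Working with the single list of differences, rather than the two price lists separately, is what makes this a paired procedure.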
Section 12.1 – Comparing Population Means: Dependent Samples Test
A 1998 article in Measurement in Physical Education & Exercise Science (Erdmann, Dolgener, and
Hensley) examined the post-exercise heart rates of sixty-three middle school-age boys. The boys were
instructed in self-pulse counting using the carotid artery on either side of the neck, and given the
opportunity to briefly practice. Each boy was then connected to a telemetry system (to measure heart
rate) and told to walk as fast as possible over a 720-meter course. Postwalk heart rates were
simultaneously measured by the boys and via telemetry.
Postwalk Measurement                                            Mean     Std Dev
Telemetry Heart Rate                                            165.1    22.0
Self-Reported Heart Rate                                        143.6    31.3
Paired Difference in Heart Rate (Telemetry – Self-Reported)     21.6     23.3
1. What information do the paired differences provide here? Why are they more informative than the
separate sets of self-reported and telemetry heart rates?
2. Does the data provide sufficient evidence to conclude that, on average, all middle school-age boys
are underreporting their post-exercise heart rates? Perform a hypothesis test to answer this question.
Remember to use steps A – E (Population, Statistical Method, Sample, Statistical Results, and
Conclusion).
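As a sketch of how the paired test statistic applies to this summary table, assume the hypotheses are H0: $\mu_D = 0$ versus Ha: $\mu_D > 0$. The direction is an assumption that follows from defining the difference as Telemetry – Self-Reported, so that underreporting makes the differences positive. With n = 63 boys:

```latex
t = \frac{\bar{X}_D}{s_D / \sqrt{n}}
  = \frac{21.6}{23.3 / \sqrt{63}}
  \approx \frac{21.6}{2.936}
  \approx 7.36,
\qquad df = 63 - 1 = 62
```

The p-value is the area to the right of this t value under the t distribution with 62 degrees of freedom, which the calculator reports.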
Section 13.1 – Estimating the Difference Between Two Proportions
You are creating an interval that will estimate the true difference in the proportions of both populations. You will use
calculator procedures to help.
For the 2-proportion Z-Interval, you have:
Formula:
$$(\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value}) \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$$
Conditions:
SRS, Independence, and Normality for both sampling distributions. Normality is now verified as
long as:
$$n_1 \hat{p}_1 \geq 5, \quad n_1 (1 - \hat{p}_1) \geq 5$$
$$n_2 \hat{p}_2 \geq 5, \quad n_2 (1 - \hat{p}_2) \geq 5$$
The National Highway Traffic Safety Administration published the Motor Vehicle Occupant Safety Survey in March 2000.
Their survey of seat belt use asked 3569 male and 3893 female drivers, “When driving this [vehicle], how often do you wear
your [lap/shoulder] belt?” In response, 74% of the male drivers and 84% of the female drivers answered “All of the time”.
1. How many of the male drivers answered “All of the time”? How many of the female drivers?
2. Estimate, with 95% confidence, the difference between the proportion of all male drivers and the proportion of all
female drivers who would answer that they wear their seat belt “all of the time”. Remember to use steps A – E (Population,
Statistical Method, Sample, Statistical Results, and Conclusion).
3. Based on your CI result, how much more likely are women to wear their seat belt than men?
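The counts in part 1 and the interval in part 2 can be sketched in Python using only the standard library. The names below (`two_proportion_z_interval`, `conditions_ok`, and so on) are made up for illustration, and because the survey percentages are reported as rounded values, the counts are approximate.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_interval(p1, n1, p2, n2, confidence=0.95):
    """Interval for p1 - p2 using the unpooled standard error."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)   # z critical value
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

n_male, n_female = 3569, 3893
p_male, p_female = 0.74, 0.84

# Part 1: approximate counts answering "All of the time"
male_yes = round(p_male * n_male)
female_yes = round(p_female * n_female)

# Normality check: every expected count must be at least 5
conditions_ok = all(
    count >= 5
    for count in (n_male * p_male, n_male * (1 - p_male),
                  n_female * p_female, n_female * (1 - p_female))
)

# Part 2: 95% interval for p_male - p_female
lower, upper = two_proportion_z_interval(p_male, n_male, p_female, n_female)
print(male_yes, female_yes, conditions_ok)
print(f"95% CI for p_male - p_female: ({lower:.4f}, {upper:.4f})")
```

Because the difference is taken as male minus female, a fully negative interval corresponds to a higher proportion among female drivers.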
Section 13.1 – Testing for a Difference Between Two Proportions
You are testing to determine if there is a significant difference in the proportions of both populations. You will use
calculator procedures to help.
For the 2-proportion Z-Test, you have:
Hypotheses:
H0: $p_1 = p_2$ ($p_1 - p_2 = 0$: no difference in population proportions)
Ha: $p_1 \neq p_2$, $p_1 < p_2$, or $p_1 > p_2$
Conditions:
SRS, Independence, and Normality for both sampling distributions
Normality is now verified as long as:
$$n \hat{p}_c \geq 5, \quad n (1 - \hat{p}_c) \geq 5 \quad \text{for each sample size } n = n_1, n_2$$
Test Statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c (1 - \hat{p}_c)}{n_1} + \frac{\hat{p}_c (1 - \hat{p}_c)}{n_2}}}, \quad \text{where} \quad \hat{p}_c = \frac{\text{count of successes in both samples combined}}{\text{count of individuals in both samples combined}}$$
In 1954 an experiment was conducted to test the effectiveness of the Salk vaccine as protection against the devastating
effects of polio. With their parents’ consent, 200,745 children were injected with the vaccine, while 201,229 other children
were injected with an ineffective saline solution. The experiment was “double blind” because the children being injected
didn’t know whether they were given the real vaccine or the placebo, and the doctors giving the injections didn’t know
either. The children were monitored for a period of years to determine if they developed paralytic polio. Thirty-three of the
200,745 vaccinated children later developed paralytic polio, whereas 115 of the 201,229 injected with the saline solution later
developed paralytic polio ("An Evaluation of the 1954 Poliomyelitis Vaccine Trials," American Journal of Public Health, 1955).
Does the data provide sufficient evidence to conclude that the Salk vaccine is effective at lowering the risk of developing
polio? Perform a hypothesis test to answer this question. Remember to use steps A – E (Population, Statistical Method,
Sample, Statistical Results, and Conclusion).
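The pooled test statistic above can be sketched in Python with the counts from the trial. This is an illustrative sketch, not the calculator's routine: `two_proportion_z_test` is a made-up name, the vaccinated group is treated as sample 1, and the alternative is assumed to be Ha: p1 < p2 (a lower polio rate with the vaccine), so the p-value is a lower-tail area.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z statistic from success counts and sample sizes."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)       # combined (pooled) sample proportion
    se = sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    return (p1 - p2) / se, p_c

# Sample 1: vaccinated children; sample 2: children given the saline placebo
n_vaccine, n_placebo = 200_745, 201_229
z, p_c = two_proportion_z_test(33, n_vaccine, 115, n_placebo)

# Normality check with the combined proportion, for each sample size
conditions_ok = all(
    count >= 5
    for n in (n_vaccine, n_placebo)
    for count in (n * p_c, n * (1 - p_c))
)

# Lower-tail p-value for the assumed alternative Ha: p1 < p2
p_value = NormalDist().cdf(z)
print(f"z = {z:.2f}, p-value = {p_value:.2e}, conditions met: {conditions_ok}")
```

A p-value below the chosen significance level would support the claim that the vaccinated group had a lower rate of paralytic polio.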