Assumptions

Handout 13.5
Mixed Inference
1. Some people think that chemists are more likely than other parents to have female children.
(Perhaps chemists are exposed to something in their laboratories that affects the sex of their children.)
The Washington State Department of Health lists the parents’ occupations on birth certificates. Between
1980 and 1990, 555 children were born to fathers who were chemists. Of these births, 273 were girls.
During this period, 48.8% of all births in Washington State were girls. Is there evidence that the
proportion of girls born to chemists is higher than the state proportion?
A:
$H_0: p = 0.488$ The proportion of girls born to fathers who are chemists is 48.8%.
$H_a: p > 0.488$ The proportion of girls born to fathers who are chemists is greater than 48.8%.
We do not have an SRS, and this may constitute an assumption violation. We do not know how large
the population of chemists who have fathered children is, so we do not know if it is more than 5550.
$np_0 = 555(0.488) \ge 10$ and $n(1 - p_0) = 555(1 - 0.488) \ge 10$
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{\dfrac{273}{555} - 0.488}{\sqrt{\dfrac{0.488(1 - 0.488)}{555}}} = 0.1834$$
$P(z \ge 0.1834) = 0.427$
Fail to reject $H_0$ at $\alpha = 0.05$; a value this extreme may occur by chance alone about 43% of the time.
We lack evidence that chemists have a higher proportion of daughters than the general population.
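The one-proportion z-test above can be checked with a short Python sketch. It assumes only the standard library; the function name `one_proportion_z_test` and its `tail` parameter are illustrative choices, not part of the handout.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, tail="greater"):
    """Return (z, p_value) for a one-proportion z-test of H0: p = p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    cdf = NormalDist().cdf(z)     # standard normal P(Z <= z)
    if tail == "greater":
        p_value = 1 - cdf
    elif tail == "less":
        p_value = cdf
    else:                         # two-sided alternative
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# Chemists' children: 273 girls out of 555 births, state proportion 0.488.
z, p_value = one_proportion_z_test(273, 555, 0.488, tail="greater")
print(f"z = {z:.4f}, p-value = {p_value:.3f}")
```

Passing `tail="less"` gives the lower-tail version of the same test.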
2. During 14 years of follow-up to the 1976 Nurses Health Study, the relationship between nut
consumption (true nuts, not peanuts) and risk of coronary heart disease was examined in a group of
86,016 female nurses aged 34 to 59 years without a prior diagnosis of coronary heart disease.
The data for 1255 of the nurses are given in the following table:
Frequency of nut      Fatal coronary    Non-fatal myocardial         Total cases of coronary
consumption           heart disease     infarction (heart attack)    heart disease
Almost never          197               345                          542
Once a week           161               423                          584
2-4 times per week    22                63                           85
>4 times per week     14                30                           44
Total                 394               861                          1255
Do the data give evidence that coronary heart disease is independent of nut consumption?
A:
H0: Nut consumption and heart disease are independent of one another.
Ha: Nut consumption and heart disease are not independent of one another.
or
H0: There is no relationship between nut consumption and heart disease.
Ha: There is a relationship between nut consumption and heart disease.
We have no evidence of a random sample so our results may not be representative. All expected counts
are greater than 5.
$$\text{obs} = \begin{bmatrix} 197 & 345 \\ 161 & 423 \\ 22 & 63 \\ 14 & 30 \end{bmatrix} \qquad \text{exp} = \begin{bmatrix} 170 & 372 \\ 183 & 401 \\ 27 & 58 \\ 14 & 30 \end{bmatrix}$$
$$\chi^2 = \sum \frac{(o - e)^2}{e} = 11.344, \quad df = 3$$
$P(\chi^2 \ge 11.344) = 0.010004$
Reject $H_0$ at $\alpha = 0.05$; a value this extreme may occur by chance alone only about 1% of the time.
We have evidence of a relationship between nut consumption and the occurrence of fatal and nonfatal
heart disease, but recall that there was an assumption violation.
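As a rough check on the chi-square test of independence above, the following Python sketch recomputes the expected counts, the test statistic, and the p-value from the observed table. It uses only the standard library. The helper `chi_sq_upper_tail_df3` is a hypothetical name, and it relies on a closed-form tail probability that holds only for 3 degrees of freedom, so it is not a general chi-square routine.

```python
from math import erfc, exp, pi, sqrt

# Observed counts: rows are nut-consumption groups,
# columns are (fatal CHD, non-fatal myocardial infarction).
observed = [
    [197, 345],  # almost never
    [161, 423],  # once a week
    [22, 63],    # 2-4 times per week
    [14, 30],    # >4 times per week
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi_sq_upper_tail_df3(x):
    """P(chi-square >= x) for exactly 3 degrees of freedom (closed form)."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi_sq_upper_tail_df3(chi_sq)
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.4f}")
```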
3. Researchers at the National Cancer Institute released the results of a study that examined the effect
of weed-killing herbicides on house pets. The following data is compatible with summary values given
in the report. Dogs, some of whom were from homes where the herbicide was used on a regular basis,
were examined for the presence of malignant lymphoma. Below are the data:
Group        Sample Size    # with Lymphoma    p̂
Exposed      827            473                0.572
Unexposed    130            19                 0.146
Estimate the difference by which the proportion of exposed dogs that develop lymphoma exceeds that for
unexposed dogs.
Answer: 2-proportion Z-interval
We are uncertain of having SRSs. The samples can reasonably be expected to be independent.
Let p1=proportion of dogs exposed to herbicide that develop lymphoma.
Let p2=proportion of dogs not exposed to herbicide that develop lymphoma.
The populations of dogs exposed to herbicide and those not exposed to herbicide are each well over 10
times the sample sizes.
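$n_1\hat{p}_1 = 827(0.572) \ge 5$ and $n_2\hat{p}_2 = 130(0.146) \ge 5$
$n_1(1 - \hat{p}_1) = 827(1 - 0.572) \ge 5$ and $n_2(1 - \hat{p}_2) = 130(1 - 0.146) \ge 5$
$$\hat{p}_1 - \hat{p}_2 \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
$$0.572 - 0.146 \pm 1.96 \sqrt{\frac{0.572(1 - 0.572)}{827} + \frac{0.146(1 - 0.146)}{130}}$$
Using the unrounded sample proportions, this gives (0.3563, 0.4953).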
We are 95% confident that the proportion of exposed dogs that develop lymphoma is between about 36 and
50 percentage points higher than the proportion for dogs not exposed to herbicide. In repeated random
sampling this method captures the true difference in proportions 95% of the time.
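The two-proportion z-interval can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the solution (large, independent samples); the function name `two_proportion_z_interval` is illustrative, and the critical value comes from the standard library's `NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (low, high) for a large-sample interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Critical value z* leaving (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Exposed dogs: 473 of 827 with lymphoma; unexposed dogs: 19 of 130.
low, high = two_proportion_z_interval(473, 827, 19, 130)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

Because the function works from the raw counts, it uses unrounded sample proportions rather than 0.572 and 0.146.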
4. A distributor of raisins claims that the average box contains 36 raisins. The stem-and-leaf plot
displays the number of raisins found in 30 randomly selected 1/2 oz. boxes. Test the claim that the mean
number of raisins is actually less than 36.
Number of raisins
2 | 679
3 | 13334444
3 | 555566777788899
4 | 0013
A:
$H_0: \mu = 36$ The mean number of raisins in a box is 36.
$H_a: \mu < 36$ The mean number of raisins in a box is less than 36.
We are given an SRS. Our sample is large ($n = 30$), so by the Central Limit Theorem the normal
approximation is useful.
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{35.467 - 36}{3.8661 / \sqrt{30}} = -0.7556, \quad df = n - 1 = 29$$
$P(t \le -0.7556) = 0.228$
Fail to reject $H_0$; a value this extreme may occur by chance alone about 23% of the
time. We lack strong evidence that the mean number of raisins per box is less than 36.
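To see where the sample mean, standard deviation, and t statistic come from, the stem-and-leaf plot can be expanded into raw counts. The sketch below assumes each stem is the tens digit and each leaf the ones digit; the variable names are illustrative. The standard library has no t distribution, so the p-value is still left to a table or calculator.

```python
from math import sqrt
from statistics import mean, stdev

# Stem-and-leaf plot: stem = tens digit, each leaf character = ones digit.
stemplot = [(2, "679"), (3, "13334444"), (3, "555566777788899"), (4, "0013")]
raisins = [10 * stem + int(leaf) for stem, leaves in stemplot for leaf in leaves]

n = len(raisins)
x_bar = mean(raisins)
s = stdev(raisins)  # sample standard deviation (divides by n - 1)
mu_0 = 36           # claimed mean under H0

t = (x_bar - mu_0) / (s / sqrt(n))
df = n - 1
print(f"n = {n}, mean = {x_bar:.3f}, s = {s:.4f}, t = {t:.4f}, df = {df}")
```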
5. An experiment on the side effects of pain relievers assigned arthritis patients to one of several over-the-counter pain medications. Of the 440 patients who took one brand of pain reliever, 23 suffered some
“adverse symptom.” Does the experiment provide strong evidence that fewer than 10% of patients who
take this medication have adverse symptoms?
$H_0: p = 0.10$ The proportion of patients who suffer adverse symptoms when taking the medicine is 0.10.
$H_a: p < 0.10$ The proportion of patients who suffer adverse symptoms when taking the medicine is less than 0.10.
The data came from an experiment, and presumably the patients were randomly assigned to treatments.
$np_0 = 440(0.1) = 44 \ge 10$ and $n(1 - p_0) = 440(0.9) = 396 \ge 10$
$10n = 4400$: the population of patients taking the medicine is likely to be greater than 4400.
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{\dfrac{23}{440} - 0.1}{\sqrt{\dfrac{0.1(0.9)}{440}}} = -3.337$$
$P(Z < -3.337) = 4.2 \times 10^{-4}$
Reject $H_0$: the p-value $0.00042 < \alpha = 0.01$; a test statistic this small may occur by chance alone well under 1% of the
time.
We have strong evidence that the true proportion of adverse symptoms is less than 10%.
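The z-test above relies on the normal approximation to the binomial. As an optional cross-check that is not part of the handout's solution, the lower-tail probability can also be summed exactly from the binomial distribution. The sketch below is a swapped-in technique (an exact binomial tail, not the z-test), and the function name `binomial_lower_tail` is hypothetical.

```python
from math import comb


def binomial_lower_tail(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# 23 or fewer adverse symptoms among 440 patients if the true rate were 10%.
exact_p_value = binomial_lower_tail(23, 440, 0.10)
print(f"exact lower-tail p-value = {exact_p_value:.2e}")
```

The exact tail need not equal the normal-approximation p-value, but it should also fall far below $\alpha = 0.01$ if the conclusion is sound.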
6. A department store stocks blue jeans that are identical except for color. A random sample of 32
sales showed the following purchases:
Color                     Number sold    Color                Number sold
Faded blue denim          13             Darker blue denim    8
Traditional blue denim    6              Black                5
Does this data indicate that one color of jeans is preferred over the others, or are consumers buying the
jeans in equal proportions?
A:
H0 : Consumers show equal preference for the various jean colors.
Ha : Consumers show unequal preference for the various jean colors.
We are given a random sample. All expected cell counts are greater than 5. Expected counts are 8,8,8,8.
$$\chi^2 = \sum \frac{(o - e)^2}{e} = 4.75, \quad df = 3$$
$P(\chi^2 \ge 4.75) = 0.191$
Fail to reject $H_0$ at $\alpha = 0.05$; a value this extreme may occur by chance alone about
19% of the time.
We lack evidence that consumers have different preferences for various jean colors.
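The goodness-of-fit calculation is short enough to verify directly. The following Python sketch assumes equal expected counts under $H_0$ and uses a closed-form tail probability that is valid only for 3 degrees of freedom; the dictionary layout and variable names are illustrative.

```python
from math import erfc, exp, pi, sqrt

observed = {
    "Faded blue denim": 13,
    "Darker blue denim": 8,
    "Traditional blue denim": 6,
    "Black": 5,
}

n = sum(observed.values())
expected = n / len(observed)  # equal preference: same expected count per color

chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

# Closed form for P(chi-square >= x); holds only when df == 3.
p_value = erfc(sqrt(chi_sq / 2)) + sqrt(2 * chi_sq / pi) * exp(-chi_sq / 2)
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
```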
7. Poisoning by the pesticide DDT causes tremors and convulsions. In a study of DDT poisoning,
researchers fed several rats a measured amount of DDT. They then made measurements on the rats’
nervous systems that might explain how DDT poisoning causes tremors. One important variable was
the “absolute refractory period,” the time required for a nerve to recover after a stimulus. This period
varies normally. Measurements on ten rats gave the data below (in milliseconds).
1.5   2.0   1.7   1.9   1.8   1.6   2.15   1.75   1.50   3.01
(a) Give a 90% confidence interval for the mean absolute refractory period for all rats of this
strain when subjected to the same treatment.
A: 1-sample t-interval
We are uncertain that this is an SRS. We are told that the refractory period varies normally.
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 1.891 \pm t^* \frac{0.4455}{\sqrt{10}}, \quad df = 9, \quad t^* = 1.833$$
(1.63, 2.15)
We are 90% confident that the mean refractory period is between 1.633 and 2.149 ms. In
repeated random samples this method captures the true mean approximately 90% of
the time.
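The interval can be recomputed from the ten measurements. In this sketch the critical value `t_star = 1.833` is an assumption taken from a standard t table (90% confidence, 9 degrees of freedom), because the Python standard library does not provide t quantiles.

```python
from math import sqrt
from statistics import mean, stdev

# Absolute refractory periods for the ten rats, in milliseconds.
periods_ms = [1.5, 2.0, 1.7, 1.9, 1.8, 1.6, 2.15, 1.75, 1.50, 3.01]

n = len(periods_ms)
x_bar = mean(periods_ms)
s = stdev(periods_ms)  # sample standard deviation (divides by n - 1)

t_star = 1.833         # assumed table value: 90% confidence, df = n - 1 = 9
margin = t_star * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(f"mean = {x_bar:.3f}, s = {s:.4f}, 90% CI = ({low:.3f}, {high:.3f})")
```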
(b) Does this differ significantly from the published value of 1.88 ms for rats of this strain?
A:
$H_0: \mu = 1.88$ The mean refractory period is 1.88 ms.
$H_a: \mu \ne 1.88$ The mean refractory period is not 1.88 ms.
Because 1.88 lies inside the 90% confidence interval already constructed, we fail to reject $H_0$ at $\alpha = 0.10$. Otherwise, we
calculate the test statistic and p-value.
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{1.891 - 1.88}{0.4455 / \sqrt{10}} = 0.0781, \quad df = 9$$
$P(t \le -0.0781 \text{ or } t \ge 0.0781) = 0.939$
8. An educator believes that new reading activities in the classroom will help elementary school pupils
improve their reading ability. She arranges for a third-grade class of 21 students to follow these
activities for an 8-week period. A control classroom of 23 third graders follows the same curriculum
without the activities. At the end of the 8 weeks, all students are given the Degree of Reading Power
(DRP) test, which measures the aspects of reading ability that the treatment is designed to improve.
Here are the data:
Treatment
24 43 58 71 43 57
49 61 44 67 49 43
53 56 59 52 62 46
54 57 33
Control
42 43 55 26 62 48
37 33 41 19 54 28
20 85 46 10 17 55
60 53 48 37 42
Is there good evidence that the new activities improve the mean DRP score?
A: Let 1 = Treatment, 2 = Control
$H_0: \mu_1 = \mu_2$ The mean DRP score is the same for both treatment and control.
$H_a: \mu_1 > \mu_2$ The mean DRP score is greater for the treatment.
We do not have SRSs. The samples are independent.
This normal probability plot of the treatment data is roughly linear suggesting a
normal model.
This normal probability plot of the control data is roughly linear suggesting a
normal model.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{51.476 - 41.782}{\sqrt{\dfrac{11.007^2}{21} + \dfrac{17.201^2}{23}}} = 2.245, \quad df = 37 \text{ (calculator, round down)}$$
$P(t \ge 2.245) = 0.0153$
Reject $H_0$ at $\alpha = 0.05$; a value this extreme may occur by chance alone about 1.5% of the
time. We have strong evidence that the mean treatment DRP score is higher than
the mean control score. It is noted that our samples may not have been SRSs, so
our results may be in question.
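The two-sample t statistic and the calculator's degrees of freedom can be reproduced from the raw scores. The sketch below assumes the unpooled (Welch) procedure, which is consistent with the non-integer df being rounded down to 37; the function name `welch_t` is illustrative. The p-value itself still needs a t table or calculator.

```python
from math import floor, sqrt
from statistics import mean, variance

treatment = [24, 43, 58, 71, 43, 57, 49, 61, 44, 67, 49,
             43, 53, 56, 59, 52, 62, 46, 54, 57, 33]
control = [42, 43, 55, 26, 62, 48, 37, 33, 41, 19, 54, 28,
           20, 85, 46, 10, 17, 55, 60, 53, 48, 37, 42]


def welch_t(sample1, sample2):
    """Return (t, df) for the unpooled two-sample t procedure."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # s1^2 / n1
    v2 = variance(sample2) / n2  # s2^2 / n2
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t(treatment, control)
print(f"t = {t:.3f}, df = {df:.2f} (round down to {floor(df)})")
```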
9. The National Assessment of Educational Progress (NAEP) Young Adult Literacy Assessment
Survey interviewed a random sample of 1917 people 21 to 25 years old. The sample contained 840
men, of whom 775 were fully employed. There were 1077 women, and 680 of them were fully
employed.
(a) Use a 99% confidence interval to describe the difference between the proportions of young men
and young women who are fully employed. Is the difference statistically significant at the 1%
significance level?
A: Let p1=proportion of young men who are fully employed.
Let p2= proportion of young women who are fully employed
We are given two independent SRSs. The populations are large.
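$n_1\hat{p}_1 = 840\left(\dfrac{775}{840}\right) \ge 5$ and $n_2\hat{p}_2 = 1077\left(\dfrac{680}{1077}\right) \ge 5$
$$\hat{p}_1 - \hat{p}_2 \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
$$0.9226 - 0.6314 \pm z^* \sqrt{\frac{0.9226(1 - 0.9226)}{840} + \frac{0.6314(1 - 0.6314)}{1077}}, \quad z^* = 2.576$$
(0.2465, 0.3359)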
We are 99% confident that the proportion of young men who are fully employed exceeds the proportion of
young women who are fully employed by between 25 and 34 percentage points. In repeated random sampling,
this method captures the true difference in proportions about 99% of the time.
To answer the question of whether the difference is significant at the 1% level, we use our 99%
confidence interval instead of starting over with a test. The null hypothesis is $H_0: p_1 - p_2 = 0$ and
the alternative hypothesis is $H_a: p_1 - p_2 \ne 0$. Zero is not in the interval, so we reject the
null hypothesis. We have strong evidence that the
proportions of men and women who are fully employed are not the same.
(b) The mean and standard deviation of scores on the NAEP’s test of quantitative skills were
$\bar{x}_1 = 272.40$ and $s_1 = 59.2$ for the men in the sample. For the women, the results were
$\bar{x}_2 = 274.73$ and $s_2 = 57.5$. Is the difference between the mean scores for men and women
significant at the 1% level?
A: Let 1 = Men, 2 = Women
$H_0: \mu_1 = \mu_2$ The mean NAEP score is the same for men and women.
$H_a: \mu_1 \ne \mu_2$ The mean NAEP score is not the same for men and women.
We are given independent SRSs. We do not know if the NAEP scores vary normally and lack data to
investigate, but with samples this large (840 and 1077) the t procedures remain reliable.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{272.40 - 274.73}{\sqrt{\dfrac{59.2^2}{840} + \dfrac{57.5^2}{1077}}} = -0.866, \quad df = 1777 \text{ (calculator, round down)}$$
$P(t \le -0.866 \text{ or } t \ge 0.866) = 0.387$
Fail to reject $H_0$ at $\alpha = 0.01$; a value this extreme may occur by chance alone about
39% of the time. We lack strong evidence that men and women have different
mean NAEP scores.