Topic 08

Topic 8 - Comparing two samples
• Confidence intervals/hypothesis tests for
two means - pages 246 - 261
• Hypothesis test for two variances - pages
272 – 275
Comparing two populations
• Sometimes we want to compare two
populations rather than making decisions about
a single population.
• For example, we might want to compare two
population means or two population
proportions to see if they are equal.
• Is the expected drying time for one type of
paint lower than that of another type of
paint?
• Is the proportion of Republicans who favor
withdrawing from Iraq higher than the
proportion of Democrats who favor
withdrawal?
Comparing two population means
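• Suppose we have two independent samples,
$X_1, \dots, X_m$ and $Y_1, \dots, Y_n$, from two separate
populations.
• A natural statistic for comparing the two
population means, $\mu_X$ and $\mu_Y$, is $\bar{X} - \bar{Y}$.
• $E(\bar{X} - \bar{Y}) = \mu_X - \mu_Y$
• $\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma_X^2/m + \sigma_Y^2/n$, where $\sigma_X^2$ and $\sigma_Y^2$ are the population variances
• The distribution of $\bar{X} - \bar{Y}$ is also approximately Normal for
m and n both large.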
Large samples test for comparing population means
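To test $H_0: \mu_X - \mu_Y = D_0$, use the test statistic
$$Z = \frac{\bar{X} - \bar{Y} - D_0}{\sqrt{s_X^2/m + s_Y^2/n}}$$
$H_A$                          Reject $H_0$ if
$\mu_X - \mu_Y < D_0$          $Z < -z_{\alpha}$
$\mu_X - \mu_Y > D_0$          $Z > z_{\alpha}$
$\mu_X - \mu_Y \neq D_0$       $|Z| > z_{\alpha/2}$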
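The rejection rules above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `two_sample_z` and `reject_h0` and the summary statistics are hypothetical and are not taken from the home sales data.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, var_x, var_y, m, n, d0=0.0):
    """Large-sample Z statistic for H0: mu_X - mu_Y = d0."""
    standard_error = sqrt(var_x / m + var_y / n)
    return (xbar - ybar - d0) / standard_error


def reject_h0(z, alpha, alternative):
    """Apply the rejection rule for 'less', 'greater' or 'two-sided'."""
    std_normal = NormalDist()  # standard Normal, mean 0 and sd 1
    if alternative == "less":
        return z < -std_normal.inv_cdf(1 - alpha)
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Hypothetical summary statistics (sample means, sample variances, sizes).
z = two_sample_z(xbar=105.0, ybar=100.0, var_x=64.0, var_y=81.0, m=40, n=50)
print(round(z, 3), reject_h0(z, alpha=0.01, alternative="greater"))
```

Here `inv_cdf(1 - alpha)` plays the role of the upper critical value z_alpha, so the three branches mirror the three rows of the rejection table.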
Home sales data
A realtor in Albuquerque wants to argue that houses in the
Northeast are more expensive on average than those in the
rest of town. The data below contain sale prices (in hundreds of dollars)
for homes in the city. NE = 1 indicates a home was in the
Northeast. NE = 0 indicates a home was not in the
Northeast. Test the appropriate hypotheses with $\alpha = 0.01$.
Large samples confidence interval for the
difference between two population means
• A large sample $(1-\alpha)100\%$ confidence interval for
$\mu_X - \mu_Y$ is
$$\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{s_X^2/m + s_Y^2/n}$$
• For the home sales data, what is a 99%
confidence interval for the difference between
sale prices in the Northeast and the rest of town?
• Home sales data
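As a rough sketch, the large-sample interval can be computed from summary statistics as follows. The function name `large_sample_ci` and the numbers passed to it are hypothetical placeholders, not values from the home sales data.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(xbar, ybar, var_x, var_y, m, n, alpha):
    """(1 - alpha)100% large-sample interval for mu_X - mu_Y."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    half_width = z_crit * sqrt(var_x / m + var_y / n)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


# Hypothetical summary statistics; alpha = 0.01 gives a 99% interval.
lower, upper = large_sample_ci(105.0, 100.0, 64.0, 81.0, 40, 50, alpha=0.01)
print(round(lower, 3), round(upper, 3))
```

If the interval excludes a hypothesized difference D0, the two-sided test at the same alpha would reject H0 for that D0.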
Equal population variances
• Suppose we assume that the two populations
have a common variance $\sigma^2$.
• $\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma^2 (1/m + 1/n)$
• We can then estimate this common variance
using the pooled sample variance:
$$s_p^2 = \frac{(m-1)s_X^2 + (n-1)s_Y^2}{n+m-2}$$
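A small sketch of the pooled sample variance, assuming raw samples are available as Python lists. The helper name `pooled_variance` and the two tiny samples are made up for illustration.

```python
from statistics import variance


def pooled_variance(x, y):
    """Weighted average of the two sample variances, weights m-1 and n-1."""
    m, n = len(x), len(y)
    return ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)


# Hypothetical samples, chosen only so the arithmetic is easy to follow.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0]
print(pooled_variance(x, y))
```

`statistics.variance` uses the n - 1 divisor, which matches the sample variances in the pooled formula.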
Small samples test for comparing population means
from Normal distributions with equal variances
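To test $H_0: \mu_X - \mu_Y = D_0$, use the test statistic
$$T = \frac{\bar{X} - \bar{Y} - D_0}{s_p\sqrt{1/m + 1/n}}$$
$H_A$                          Reject $H_0$ if
$\mu_X - \mu_Y < D_0$          $T < -t_{\alpha,\,n+m-2}$
$\mu_X - \mu_Y > D_0$          $T > t_{\alpha,\,n+m-2}$
$\mu_X - \mu_Y \neq D_0$       $|T| > t_{\alpha/2,\,n+m-2}$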
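The pooled T statistic and its degrees of freedom can be sketched as below. The function name `pooled_t` and the data are hypothetical; the critical value is assumed to come from a t table or a statistics package, since the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y, d0=0.0):
    """Return (T, degrees of freedom) for H0: mu_X - mu_Y = d0, equal variances."""
    m, n = len(x), len(y)
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    t = (mean(x) - mean(y) - d0) / sqrt(sp2 * (1 / m + 1 / n))
    return t, m + n - 2


# Hypothetical samples, not the THC data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0]
t_stat, df = pooled_t(x, y)
print(round(t_stat, 3), df)  # compare |T| with the tabled t critical value
```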
THC example with equal variances
The active component in marijuana is THC. An
experiment was conducted to compare two slightly
different configurations of this substance. The THC data
set contains the time until the effect was perceived for 6
subjects exposed to each configuration. Is there any
evidence that the mean time to perception is different
between the two configurations using $\alpha = 0.01$?
Small samples confidence interval for the
difference between two population means
• Assuming equal variances, a small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$$\bar{X} - \bar{Y} \pm t_{\alpha/2,\,n+m-2}\, s_p\sqrt{1/m + 1/n}$$
• For the THC data, what is a 99% confidence
interval for the mean difference between the
detection times for the two configurations?
• THC data set
Unequal population variances
• The pooled procedures we have discussed
previously are fairly robust to violations of the
assumption of equal variances.
• In other words if the two population
variances are relatively close, the procedures
perform well:
– The level of significance for the hypothesis
test is close to what it should be
– The coverage probability for the
confidence interval is close to what it
should be
• If the variances are quite different, then we
need a different procedure.
Small samples test for comparing population means
from Normal distributions with unequal variances
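To test $H_0: \mu_X - \mu_Y = D_0$, use the test statistic
$$T = \frac{\bar{X} - \bar{Y} - D_0}{\sqrt{s_X^2/m + s_Y^2/n}}$$
with degrees of freedom
$$\nu = \frac{(s_X^2/m + s_Y^2/n)^2}{\dfrac{(s_X^2/m)^2}{m-1} + \dfrac{(s_Y^2/n)^2}{n-1}}$$
$H_A$                          Reject $H_0$ if
$\mu_X - \mu_Y < D_0$          $T < -t_{\alpha,\nu}$
$\mu_X - \mu_Y > D_0$          $T > t_{\alpha,\nu}$
$\mu_X - \mu_Y \neq D_0$       $|T| > t_{\alpha/2,\nu}$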
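The degrees-of-freedom formula is easy to get wrong by hand, so a short sketch may help. The function name `unequal_var_t` and the samples are hypothetical; the resulting degrees of freedom are generally not an integer.

```python
from math import sqrt
from statistics import mean, variance


def unequal_var_t(x, y, d0=0.0):
    """Return (T, nu) for H0: mu_X - mu_Y = d0 without assuming equal variances."""
    m, n = len(x), len(y)
    a = variance(x) / m  # s_X^2 / m
    b = variance(y) / n  # s_Y^2 / n
    t = (mean(x) - mean(y) - d0) / sqrt(a + b)
    nu = (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))
    return t, nu


# Hypothetical samples, not the THC data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0]
t_stat, nu = unequal_var_t(x, y)
print(round(t_stat, 3), round(nu, 2))
```

The computed degrees of freedom always fall between the smaller of m - 1 and n - 1 and the pooled value n + m - 2, which is a quick sanity check on a hand calculation.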
THC example with unequal variances
Small samples confidence interval for the
difference between two population means
• Assuming unequal variances, a small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$$\bar{X} - \bar{Y} \pm t_{\alpha/2,\nu}\sqrt{s_X^2/m + s_Y^2/n}$$
• For the THC data, what is a 99% confidence
interval for the mean difference between the
detection times for the two configurations?
• THC data set
Paired data
• Sometimes we have a third variable that
connects elements from the X and Y samples.
• In this case, the assumption of independence
between the two samples may be violated.
• Is there any evidence that the first twin and
the second twin have different average weights
among boy-boy twins?
• In this case, the twins are clearly connected
by the mother.
• It might be better to base our test on the n
pairwise differences, $D_i = X_i - Y_i$.
Paired test for comparing population means
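To test $H_0: \mu_X - \mu_Y = D_0$, use the test statistic
$$T = \frac{\bar{D} - D_0}{s_D/\sqrt{n}}$$
$H_A$                          Reject $H_0$ if
$\mu_X - \mu_Y < D_0$          $T < -t_{\alpha,\,n-1}$
$\mu_X - \mu_Y > D_0$          $T > t_{\alpha,\,n-1}$
$\mu_X - \mu_Y \neq D_0$       $|T| > t_{\alpha/2,\,n-1}$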
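A minimal sketch of the paired statistic, which reduces the two samples to one sample of differences. The function name `paired_t` and the four pairs are invented for illustration and are not the twins data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y, d0=0.0):
    """Return (T, degrees of freedom) based on the differences D_i = X_i - Y_i."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    t = (mean(d) - d0) / (stdev(d) / sqrt(n))
    return t, n - 1


# Hypothetical pairs; each position is one matched pair.
x = [5.0, 6.0, 7.0, 8.0]
y = [4.0, 6.0, 5.0, 7.0]
t_stat, df = paired_t(x, y)
print(round(t_stat, 3), df)
```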
Twins example
• Load the Twins data from StatCrunch sample data
sets. Is there any evidence that Twin A and Twin B
have different average weights among boy-boy
twins with $\alpha = 0.1$?
• StatCrunch
Paired confidence interval for the difference
between two population means
• A small sample $(1-\alpha)100\%$ confidence interval for
$\mu_X - \mu_Y$ is
$$\bar{D} \pm t_{\alpha/2,\,n-1}\, s_D/\sqrt{n}$$
• For the twins data, what is a 90% confidence
interval for the mean difference between the twin A
and twin B weights?
• StatCrunch
Comparing two population proportions
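• A natural statistic for comparing the two
population proportions, $p_X$ and $p_Y$, is $\hat{p}_X - \hat{p}_Y$.
• $E(\hat{p}_X - \hat{p}_Y) = p_X - p_Y$
• $\mathrm{Var}(\hat{p}_X - \hat{p}_Y) = p_X(1-p_X)/m + p_Y(1-p_Y)/n$
• The distribution of $\hat{p}_X - \hat{p}_Y$ is also approximately Normal for
m and n both large.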
Large samples test for comparing population
proportions
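To test $H_0: p_X - p_Y = 0$, use the test statistic
$$Z = \frac{\hat{p}_X - \hat{p}_Y - 0}{\sqrt{\hat{p}(1-\hat{p})(1/m + 1/n)}}$$
where
$$\hat{p} = \frac{m}{m+n}\hat{p}_X + \frac{n}{m+n}\hat{p}_Y$$
$H_A$                  Reject $H_0$ if
$p_X - p_Y < 0$        $Z < -z_{\alpha}$
$p_X - p_Y > 0$        $Z > z_{\alpha}$
$p_X - p_Y \neq 0$     $|Z| > z_{\alpha/2}$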
Polio example
• The following table summarizes a study of the
efficacy of the Salk vaccine.
Treatment    Total patients    Cases of polio
Vaccine      201,229           33
Placebo      200,745           110
• Was the vaccine effective? Test at $\alpha = 0.05$.
• StatCrunch
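For readers without StatCrunch, the pooled Z statistic for the polio table can be sketched in Python. The counts come from the table above. The function name `two_proportion_z` is hypothetical, and treating the vaccine group as X with a one-sided alternative (vaccine proportion lower) is an assumption about how the question is meant to be set up.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(cases_x, m, cases_y, n):
    """Pooled Z statistic for H0: p_X - p_Y = 0."""
    p_x = cases_x / m
    p_y = cases_y / n
    # Pooled estimate: total cases over total patients, which equals
    # (m * p_x + n * p_y) / (m + n).
    p_pool = (cases_x + cases_y) / (m + n)
    se = sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
    return (p_x - p_y) / se


# X = vaccine group, Y = placebo group (counts from the table).
z = two_proportion_z(33, 201229, 110, 200745)
z_crit = NormalDist().inv_cdf(1 - 0.05)  # upper 0.05 critical value
reject = z < -z_crit  # one-sided rule for H_A: p_X - p_Y < 0
print(f"Z = {z:.2f}, reject H0: {reject}")
```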
Large samples confidence interval for the
difference between two population proportions
• A large sample $(1-\alpha)100\%$ confidence interval for $p_X - p_Y$ is
$$\hat{p}_X - \hat{p}_Y \pm z_{\alpha/2}\sqrt{\hat{p}_X(1-\hat{p}_X)/m + \hat{p}_Y(1-\hat{p}_Y)/n}$$
• For the Polio data, what is a 95% confidence
interval for the difference between the proportion
who contract the disease under each treatment?
• StatCrunch
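The interval for the polio data can be sketched the same way. Unlike the test, the interval uses the unpooled standard error. The function name `two_proportion_ci` is hypothetical, and the vaccine group is again assumed to play the role of X.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(cases_x, m, cases_y, n, alpha):
    """(1 - alpha)100% large-sample interval for p_X - p_Y."""
    p_x = cases_x / m
    p_y = cases_y / n
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * sqrt(p_x * (1 - p_x) / m + p_y * (1 - p_y) / n)
    diff = p_x - p_y
    return diff - half_width, diff + half_width


# Vaccine (X) versus placebo (Y), counts from the polio table; 95% interval.
lower, upper = two_proportion_ci(33, 201229, 110, 200745, alpha=0.05)
print(f"({lower:.6f}, {upper:.6f})")
```

An interval that lies entirely below zero is consistent with a lower polio rate in the vaccine group.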
Comparing two population variances
• Suppose two chemical companies can supply a
raw material, but we suspect the variability in
concentration may differ between the two.
• The standard deviation of concentration in a
random sample of 15 batches from company 1
was found to be 4.7 g/l. A sample of 21
batches from company 2 yielded a standard
deviation of 5.8 g/l.
• Is there sufficient evidence to conclude that the
variability in concentration differs for the two
companies?
Test for comparing population variances from
Normal distributions
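To test $H_0: \sigma_X^2 = \sigma_Y^2$, use the test statistic
$$F = \frac{s_X^2}{s_Y^2}$$
$H_A$                              Reject $H_0$ if
$\sigma_X^2 > \sigma_Y^2$          $F > F_{\alpha,\,m-1,\,n-1}$
$\sigma_X^2 < \sigma_Y^2$          $F < F_{1-\alpha,\,m-1,\,n-1}$
$\sigma_X^2 \neq \sigma_Y^2$       $F > F_{\alpha/2,\,m-1,\,n-1}$ or $F < F_{1-\alpha/2,\,m-1,\,n-1}$
F calculator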
Chemical example
• Is there sufficient evidence to conclude that
the variability in concentration differs for
the two companies with $\alpha = 0.05$?
• F Calculator
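As a starting point for the chemical example, the F statistic and its degrees of freedom can be computed from the reported standard deviations. The function name `f_statistic` is hypothetical, and company 1 is assumed to play the role of X. The critical values still have to come from the F calculator or a table, since the Python standard library has no F quantile function.

```python
def f_statistic(sd_x, m, sd_y, n):
    """Return (F, numerator df, denominator df) for H0: equal variances."""
    f = sd_x ** 2 / sd_y ** 2  # ratio of sample variances
    return f, m - 1, n - 1


# Company 1: 15 batches, s = 4.7 g/l. Company 2: 21 batches, s = 5.8 g/l.
f, df_num, df_den = f_statistic(4.7, 15, 5.8, 21)
print(round(f, 4), df_num, df_den)
```

For the two-sided alternative, this F is compared with both the upper and lower alpha/2 critical values for 14 and 20 degrees of freedom.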
Confidence interval for the ratio of two Normal
population variances
• Assuming Normal populations, a $(1-\alpha)100\%$ confidence interval for
$\sigma_X^2/\sigma_Y^2$ is
$$\left( \frac{s_X^2/s_Y^2}{F_{\alpha/2,\,m-1,\,n-1}},\ \frac{s_X^2/s_Y^2}{F_{1-\alpha/2,\,m-1,\,n-1}} \right)$$
• For the THC example, what is a 95% confidence
interval for the ratio of concentration variances?
• THC data set