Topic 8 - Comparing two samples
• Confidence intervals/hypothesis tests for two means - pages 246-261
• Hypothesis test for two variances - pages 272-275

Comparing two populations
• Sometimes we want to compare two populations rather than make decisions about a single population.
• For example, we might want to compare two population means or two population proportions to see if they are equal.
• Is the expected drying time for one type of paint lower than that of another type of paint?
• Is the proportion of Republicans who favor withdrawing from Iraq higher than the proportion of Democrats who favor withdrawal?

Comparing two population means
• Suppose we have two independent samples, \( X_1, \ldots, X_m \) and \( Y_1, \ldots, Y_n \), from two separate populations.
• A natural statistic for comparing the two population means, \( \mu_X \) and \( \mu_Y \), is \( \bar{X} - \bar{Y} \).
• \( E(\bar{X} - \bar{Y}) = \mu_X - \mu_Y \)
• \( \mathrm{Var}(\bar{X} - \bar{Y}) = \sigma_X^2/m + \sigma_Y^2/n \), where \( \sigma_X^2 \) and \( \sigma_Y^2 \) are the population variances.
• The distribution of \( \bar{X} - \bar{Y} \) is also approximately Normal for m and n both large.

Large samples test for comparing population means
To test \( H_0: \mu_X - \mu_Y = \Delta_0 \), use the test statistic
\[ Z = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{s_X^2/m + s_Y^2/n}} \]
where \( s_X^2 \) and \( s_Y^2 \) are the sample variances.

    H_A                                   Reject H_0 if
    \( \mu_X - \mu_Y < \Delta_0 \)        \( Z < -z_{\alpha} \)
    \( \mu_X - \mu_Y > \Delta_0 \)        \( Z > z_{\alpha} \)
    \( \mu_X - \mu_Y \neq \Delta_0 \)     \( |Z| > z_{\alpha/2} \)

Home sales data
A realtor in Albuquerque wants to argue that houses in the Northeast are more expensive on average than those in the rest of town. The data below contain sale prices (in $100s) for homes in the city. NE = 1 indicates a home was in the Northeast. NE = 0 indicates a home was not in the Northeast. Test the appropriate hypotheses with \( \alpha = 0.01 \).

Large samples confidence interval for the difference between two population means
• A large sample \( (1-\alpha)100\% \) confidence interval for \( \mu_X - \mu_Y \) is
\[ \bar{X} - \bar{Y} \pm z_{\alpha/2} \sqrt{s_X^2/m + s_Y^2/n} \]
• For the home sales data, what is a 99% confidence interval for the difference between sale prices in the Northeast and the rest of town?
• Home sales data

Equal population variances
• Suppose we assume that the two populations have a common variance \( \sigma^2 \).
• \( \mathrm{Var}(\bar{X} - \bar{Y}) = \sigma^2 (1/m + 1/n) \)
• We can then estimate this common variance using the pooled sample variance:
\[ s_p^2 = \frac{(m-1) s_X^2 + (n-1) s_Y^2}{n + m - 2} \]

Small samples test for comparing population means from Normal distributions with equal variances
To test \( H_0: \mu_X - \mu_Y = \Delta_0 \), use the test statistic
\[ T = \frac{\bar{X} - \bar{Y} - \Delta_0}{s_p \sqrt{1/m + 1/n}} \]

    H_A                                   Reject H_0 if
    \( \mu_X - \mu_Y < \Delta_0 \)        \( T < -t_{\alpha, n+m-2} \)
    \( \mu_X - \mu_Y > \Delta_0 \)        \( T > t_{\alpha, n+m-2} \)
    \( \mu_X - \mu_Y \neq \Delta_0 \)     \( |T| > t_{\alpha/2, n+m-2} \)

THC example with equal variances
The active component in marijuana is THC. An experiment was conducted to compare two slightly different configurations of this substance. The THC data set contains the time until the effect was perceived for 6 subjects exposed to each configuration. Using \( \alpha = 0.01 \), is there any evidence that the mean time to perception differs between the two configurations?

Small samples confidence interval for the difference between two population means
• Assuming equal variances, a small sample \( (1-\alpha)100\% \) confidence interval for \( \mu_X - \mu_Y \) is
\[ \bar{X} - \bar{Y} \pm t_{\alpha/2, n+m-2} \, s_p \sqrt{1/m + 1/n} \]
• For the THC data, what is a 99% confidence interval for the mean difference between the detection times for the two configurations?
• THC data set

Unequal population variances
• The pooled procedures discussed previously are fairly robust to mild departures from the assumption of equal variances.
• In other words, if the two population variances are relatively close, the procedures perform well:
  – The level of significance for the hypothesis test is close to what it should be
  – The coverage probability for the confidence interval is close to what it should be
• If the variances are quite different, then we need a different procedure.
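
The pooled test statistic from the equal-variance slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name pooled_t and the sample values are made up for the example (they are not the THC data), and the critical value \( t_{\alpha/2, n+m-2} \) still has to be looked up in a t table or a statistics package, since the Python standard library has no t distribution.

```python
import math
from statistics import mean, variance

def pooled_t(x, y, delta0=0.0):
    """Return (T, degrees of freedom, pooled variance) for H0: mu_X - mu_Y = delta0.

    Hypothetical helper for illustration; assumes two independent samples
    from Normal populations with a common variance.
    """
    m, n = len(x), len(y)
    # statistics.variance uses the (size - 1) denominator, i.e. the sample variance
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    # Estimated standard error of the difference in sample means
    se = math.sqrt(sp2 * (1 / m + 1 / n))
    t = (mean(x) - mean(y) - delta0) / se
    return t, m + n - 2, sp2

# Made-up illustrative numbers, NOT the THC data set
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
t_stat, df, sp2 = pooled_t(x, y)
print(t_stat, df, sp2)
```

For a two-sided alternative, the returned T would be compared in absolute value with the tabulated \( t_{\alpha/2, n+m-2} \), using the returned degrees of freedom.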

Small samples test for comparing population means from Normal distributions with unequal variances
To test \( H_0: \mu_X - \mu_Y = \Delta_0 \), use the test statistic
\[ T = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{s_X^2/m + s_Y^2/n}} \]
with degrees of freedom
\[ v = \frac{(s_X^2/m + s_Y^2/n)^2}{\dfrac{(s_X^2/m)^2}{m-1} + \dfrac{(s_Y^2/n)^2}{n-1}} \]

    H_A                                   Reject H_0 if
    \( \mu_X - \mu_Y < \Delta_0 \)        \( T < -t_{\alpha, v} \)
    \( \mu_X - \mu_Y > \Delta_0 \)        \( T > t_{\alpha, v} \)
    \( \mu_X - \mu_Y \neq \Delta_0 \)     \( |T| > t_{\alpha/2, v} \)

THC example with unequal variances

Small samples confidence interval for the difference between two population means
• Assuming unequal variances, a small sample \( (1-\alpha)100\% \) confidence interval for \( \mu_X - \mu_Y \) is
\[ \bar{X} - \bar{Y} \pm t_{\alpha/2, v} \sqrt{s_X^2/m + s_Y^2/n} \]
• For the THC data, what is a 99% confidence interval for the mean difference between the detection times for the two configurations?
• THC data set

Paired data
• Sometimes we have a third variable that connects elements from the X and Y samples.
• In this case, the assumption of independence between the two samples may be violated.
• Is there any evidence that the first twin and the second twin have different average weights among boy-boy twins?
• In this case, the twins are clearly connected by the mother.
• It might be better to base our test on the n pairwise differences, \( D_i = X_i - Y_i \).

Paired test for comparing population means
To test \( H_0: \mu_X - \mu_Y = \Delta_0 \), use the test statistic
\[ T = \frac{\bar{D} - \Delta_0}{s_D / \sqrt{n}} \]

    H_A                                   Reject H_0 if
    \( \mu_X - \mu_Y < \Delta_0 \)        \( T < -t_{\alpha, n-1} \)
    \( \mu_X - \mu_Y > \Delta_0 \)        \( T > t_{\alpha, n-1} \)
    \( \mu_X - \mu_Y \neq \Delta_0 \)     \( |T| > t_{\alpha/2, n-1} \)

Twins example
• Load the Twins data from StatCrunch sample data sets. Is there any evidence that Twin A and Twin B have different average weights among boy-boy twins with \( \alpha = 0.1 \)?
• StatCrunch

Paired confidence interval for the difference between two population means
• A small sample \( (1-\alpha)100\% \) confidence interval for \( \mu_X - \mu_Y \) is
\[ \bar{D} \pm t_{\alpha/2, n-1} \, s_D / \sqrt{n} \]
• For the twins data, what is a 90% confidence interval for the mean difference between the twin A and twin B weights?
• StatCrunch

Comparing two population proportions
• A natural statistic for comparing the two population proportions, \( p_X \) and \( p_Y \), is \( \hat{p}_X - \hat{p}_Y \).
• \( E(\hat{p}_X - \hat{p}_Y) = p_X - p_Y \)
• \( \mathrm{Var}(\hat{p}_X - \hat{p}_Y) = p_X(1 - p_X)/m + p_Y(1 - p_Y)/n \)
• The distribution of \( \hat{p}_X - \hat{p}_Y \) is also approximately Normal for m and n both large.

Large samples test for comparing population proportions
To test \( H_0: p_X - p_Y = 0 \), use the test statistic
\[ Z = \frac{\hat{p}_X - \hat{p}_Y - 0}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}} \]
where
\[ \hat{p} = \frac{m}{m+n} \hat{p}_X + \frac{n}{m+n} \hat{p}_Y \]

    H_A                        Reject H_0 if
    \( p_X - p_Y < 0 \)        \( Z < -z_{\alpha} \)
    \( p_X - p_Y > 0 \)        \( Z > z_{\alpha} \)
    \( p_X - p_Y \neq 0 \)     \( |Z| > z_{\alpha/2} \)

Polio example
• The following table summarizes a study of the efficacy of the Salk vaccine.

    Treatment    Total patients    Cases of polio
    Vaccine      201,229           33
    Placebo      200,745           110

• Was the vaccine effective? Test at \( \alpha = 0.05 \).
• StatCrunch

Large samples confidence interval for the difference between two population proportions
• A large sample \( (1-\alpha)100\% \) confidence interval for \( p_X - p_Y \) is
\[ \hat{p}_X - \hat{p}_Y \pm z_{\alpha/2} \sqrt{\hat{p}_X(1 - \hat{p}_X)/m + \hat{p}_Y(1 - \hat{p}_Y)/n} \]
• For the Polio data, what is a 95% confidence interval for the difference between the proportions who contract the disease under each treatment?
• StatCrunch

Comparing two population variances
• Suppose two chemical companies can supply a raw material, but we suspect the variability in concentration may differ between the two.
• The standard deviation of concentration in a random sample of 15 batches from company 1 was found to be 4.7 g/l. A sample of 21 batches from company 2 yielded a standard deviation of 5.8 g/l.
• Is there sufficient evidence to conclude that the variability in concentration differs for the two companies?

Test for comparing population variances from Normal distributions
To test \( H_0: \sigma_X^2 = \sigma_Y^2 \), use the test statistic
\[ F = \frac{s_X^2}{s_Y^2} \]

    H_A                                    Reject H_0 if
    \( \sigma_X^2 > \sigma_Y^2 \)          \( F > F_{\alpha, m-1, n-1} \)
    \( \sigma_X^2 < \sigma_Y^2 \)          \( F < F_{1-\alpha, m-1, n-1} \)
    \( \sigma_X^2 \neq \sigma_Y^2 \)       \( F > F_{\alpha/2, m-1, n-1} \) or \( F < F_{1-\alpha/2, m-1, n-1} \)

• F calculator

Chemical example
• Is there sufficient evidence to conclude that the variability in concentration differs for the two companies with \( \alpha = 0.05 \)?
• F Calculator

Confidence interval for the ratio of two Normal population variances
• A \( (1-\alpha)100\% \) confidence interval for \( \sigma_X^2/\sigma_Y^2 \) is
\[ \left( \frac{s_X^2/s_Y^2}{F_{\alpha/2, m-1, n-1}}, \; \frac{s_X^2/s_Y^2}{F_{1-\alpha/2, m-1, n-1}} \right) \]
• For the THC example, what is a 95% confidence interval for the ratio of concentration variances?
• THC data set
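
As a rough sketch, the F statistic for the chemical example and the interval for the variance ratio can be computed in Python as below. The function names variance_ratio and ratio_interval are hypothetical, chosen only for this illustration. The two F critical values are left as inputs because the Python standard library has no F distribution; they would come from an F table or the F calculator.

```python
def variance_ratio(s_x, s_y):
    """F statistic s_X^2 / s_Y^2 computed from two sample standard deviations."""
    return s_x ** 2 / s_y ** 2

def ratio_interval(f_stat, f_upper, f_lower):
    """Interval (F / F_{alpha/2, m-1, n-1}, F / F_{1-alpha/2, m-1, n-1}).

    f_upper is the upper critical value F_{alpha/2, m-1, n-1} and f_lower is
    the lower critical value F_{1-alpha/2, m-1, n-1}; both are assumed to be
    looked up by the caller, they are not computed here.
    """
    return f_stat / f_upper, f_stat / f_lower

# Chemical example from the slides:
# company 1: s = 4.7 g/l from m = 15 batches; company 2: s = 5.8 g/l from n = 21 batches
f_stat = variance_ratio(4.7, 5.8)   # roughly 0.657
df_num, df_den = 15 - 1, 21 - 1     # numerator and denominator degrees of freedom
print(f_stat, df_num, df_den)
```

For the two-sided test, f_stat would be compared with the tabulated critical values at (df_num, df_den) degrees of freedom, and the same two critical values would be passed to ratio_interval to obtain the interval.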