AP Statistics Section 13.1 A

Which of two popular drugs, Lipitor or Pravachol, helps lower bad cholesterol more? 4000 people with heart disease were randomly assigned to two treatment groups: Lipitor or Pravachol. At the end of the study, researchers compared the mean "bad cholesterol levels" for each group. This is a question about comparing two means. The researchers also compared the proportion of subjects who died, had a heart attack, or suffered other serious consequences in the first two years. This is a question about comparing two proportions.

Two-sample problems can arise from a randomized comparative experiment that randomly divides the subjects into two groups and exposes each group to a different treatment. Unlike the matched pairs design studied earlier, there is no matching of the units in the two samples, and the samples can even be of different sizes. Two-sample problems also arise when comparing two different samples randomly selected from two populations.

Conditions for Comparing Two Means

SRS: We have two SRSs from two distinct populations. This allows us to generalize our findings. We measure the same variable for both groups.

Normality: Both populations are Normally distributed. In practice, it is enough that the distributions have similar shapes and that the data have no strong outliers. More on this at the end of the notes.

Independence: The samples are independent. That is, one sample has no influence on the other. Paired observations violate independence, for example. When sampling without replacement from two distinct populations, each population must be at least 10 times as large as the corresponding sample size.

We want to compare the two population means, either by giving a confidence interval for their difference $\mu_1 - \mu_2$ or by testing the hypothesis of no difference:
$H_0: \mu_1 - \mu_2 = 0$

To do inference about the difference between the means of the two populations, we start with the difference between the means of the two samples, $\bar{x}_1 - \bar{x}_2$.

The Two-Sample z Statistic

Here are the facts about the sampling distribution of the difference between the two sample means of independent SRSs.

1. The mean of $\bar{x}_1 - \bar{x}_2$ equals $\mu_1 - \mu_2$ (i.e., the difference of sample means is an unbiased estimator of the difference of population means).
2. The variance of the difference is the sum of the variances of $\bar{x}_1$ and $\bar{x}_2$, which is $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$. Note: the variances add because the samples are independent. The standard deviations do not.
3. If the two population distributions are both Normal, then the distribution of $\bar{x}_1 - \bar{x}_2$ is also Normal.

Two-sample z statistic (for use when $\sigma_1$ and $\sigma_2$ are known)

Suppose that $\bar{x}_1$ is the mean of an SRS of size $n_1$ drawn from a Normally distributed population with mean $\mu_1$ and standard deviation $\sigma_1$, and that $\bar{x}_2$ is the mean of an SRS of size $n_2$ drawn from a Normally distributed population with mean $\mu_2$ and standard deviation $\sigma_2$. Then the two-sample z statistic

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

has the standard Normal distribution.

It is very unlikely that both population standard deviations are known, so let's consider the more useful t procedures.

The Two-Sample t Procedures

Because we don't know the population standard deviations, we estimate them by the standard deviations $s_1$ and $s_2$ from our two samples. The resulting estimate of the standard deviation of $\bar{x}_1 - \bar{x}_2$ is, recall, called the standard error:

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

We standardize our estimate $\bar{x}_1 - \bar{x}_2$ using the two-sample t statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

The level C confidence interval for $\mu_1 - \mu_2$ is given by the formula:

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

The degrees of freedom will equal the smaller of $n_1 - 1$ and $n_2 - 1$.

Note: The two-sample t statistic has approximately a t distribution. It does not have exactly a t distribution even if the populations are both exactly Normal.
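The two-sample t formulas above can be sketched in Python, working from summary statistics. This is a minimal illustration, not part of the notes: the function names and the numbers in the usage lines are hypothetical, and the critical value t* still has to be looked up in a t table for the conservative degrees of freedom.

```python
import math

def two_sample_t(mean1, s1, n1, mean2, s2, n2, null_diff=0.0):
    """Return (standard error, t statistic, conservative df) from summary statistics."""
    # Standard error: sqrt(s1^2/n1 + s2^2/n2)
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    # Standardize the observed difference against the hypothesized difference
    t = ((mean1 - mean2) - null_diff) / se
    # Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1
    df = min(n1 - 1, n2 - 1)
    return se, t, df

def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """Return (low, high) for the level C interval, given t* from a t table."""
    se, _, _ = two_sample_t(mean1, s1, n1, mean2, s2, n2)
    diff = mean1 - mean2
    return diff - t_star * se, diff + t_star * se

# Hypothetical summary statistics, chosen only so the arithmetic is easy to follow
se, t, df = two_sample_t(12.0, 3.0, 9, 10.0, 4.0, 16)
# t* = 1.860 is the 90% table value for df = 8
low, high = two_sample_t_interval(12.0, 3.0, 9, 10.0, 4.0, 16, t_star=1.860)
print(se, t, df, low, high)
```

Keeping the critical value as an input mirrors how the procedure is done by hand: compute the statistic, find the conservative df, then read t* from the table.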
Example 13.2-3: Does increasing the amount of calcium in our diet reduce blood pressure?

Examination of a large sample of people revealed a relationship between calcium intake and blood pressure. The relationship was strongest for black men. Such observational studies do not establish causation. Researchers therefore designed a randomized comparative experiment. The subjects in part of the experiment were 21 healthy black men. A randomly chosen group of 10 of the men received a calcium supplement for 12 weeks. The control group of 11 men received a placebo pill that looked identical. The experiment was double-blind. The response variable is the decrease in systolic blood pressure for a subject after 12 weeks, in mm of Hg. An increase appears as a negative response.

Group 1 (calcium): $n_1 = 10$, $\bar{x}_1 = 5$, $s_1 = 8.743$
Group 2 (placebo): $n_2 = 11$, $\bar{x}_2 = -0.273$, $s_2 = 5.901$

Hypothesis: The population of interest is healthy black men. We wish to test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 > \mu_2$, where
$\mu_1$ = mean decrease in the systolic blood pressure of the calcium group
$\mu_2$ = mean decrease in the systolic blood pressure of the control group

Conditions:
SRS: While the randomization in the experiment helps, the subjects are volunteers and not an SRS, so results may not generalize to the population.
Normality of $\bar{x}$: A boxplot of the data shows no outliers in either group, and both distributions are approximately Normal. So I will assume the population distributions are approximately Normal.
Independence: Because of the randomization, I will assume the two groups are independent samples. For each group, $N \ge 10n$, since we are sampling without replacement.

Calculations:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{5 - (-0.273)}{\sqrt{\frac{8.743^2}{10} + \frac{5.901^2}{11}}} = 1.604$$

Degrees of freedom = 10 - 1 = 9
From the t table, the p-value is between .05 and .1; p-value = .072.
TI-83/84: STAT, TESTS, 4: 2-SampTTest. Choose NO to the pooled question.

Interpretation: My p-value of .072 is greater than the commonly accepted significance level of .05, so I fail to reject $H_0$. My conclusion is that the experiment failed to show that calcium reduces blood pressure.
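The calculation in this example can be checked with a short Python sketch that uses only the standard library. The helper names `t_pdf` and `t_upper_tail` are illustrative, not from the notes, and the tail area is approximated by Simpson's rule on the t density rather than read from a table. It uses the conservative df = 9, so a calculator or software that chooses the degrees of freedom differently may report a somewhat different p-value.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area under the density between 0 and |t|
    tail = 0.5 - area             # the density is symmetric about 0
    return tail if t >= 0 else 1 - tail

# Summary statistics from the calcium example
n1, mean1, s1 = 10, 5.0, 8.743     # calcium group
n2, mean2, s2 = 11, -0.273, 5.901  # placebo group

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_stat = (mean1 - mean2) / se       # H0: mu1 - mu2 = 0
df = min(n1 - 1, n2 - 1)            # conservative degrees of freedom
p_value = t_upper_tail(t_stat, df)  # one-sided, Ha: mu1 > mu2

print(f"t = {t_stat:.3f}, df = {df}, one-sided p-value = {p_value:.3f}")
```

The printed p-value should fall between .05 and .1, in line with the t-table bracket used in the example.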
Example: Construct and interpret a 90% confidence interval for the previous example.

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (5 - (-0.273)) \pm 1.833 \sqrt{\frac{8.743^2}{10} + \frac{5.901^2}{11}} = (-0.754, 11.300)$$

I am 90% confident that the difference between the mean decreases in systolic blood pressure of the calcium group and the control group is between -0.754 and 11.300 mm of Hg.
TI-83/84: STAT, TESTS, 0: 2-SampTInt.

We know that sample size does influence the P-value of a test. A result that fails to be significant at a specified level in a small sample may be significant in a larger sample. Subsequent analysis of data from an experiment with more subjects resulted in a P-value of 0.008.

Robustness Again

The two-sample t procedures are more robust than the one-sample t methods, particularly when the distributions are not symmetric. When the sizes of the two samples are equal and the two populations being compared have distributions with similar shape, probability values from the t table are quite accurate for a broad range of distributions, even when the sample sizes are as small as 5. As a guide, $n_1 + n_2$ should be greater than or equal to 10, with both $n_1 \ge 5$ and $n_2 \ge 5$.

In planning a two-sample study, choose equal sample sizes if you can. The two-sample t procedures are most robust against non-Normality in this case, and the conservative P-values are most accurate.
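The robustness claim can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not a result from the notes: it draws two equal-size samples (n = 5 each) from the same population, so the null hypothesis is true, and counts how often the conservative two-sided test at the 5% level rejects. The exponential population stands in for a strongly skewed distribution, the function names are hypothetical, and 2.776 is the two-sided 5% table value for 4 degrees of freedom.

```python
import math
import random
import statistics

T_STAR_DF4 = 2.776  # two-sided 5% critical value for t with 4 df (t table)

def two_sample_t(sample1, sample2):
    """Two-sample t statistic for H0: mu1 - mu2 = 0 (unpooled)."""
    n1, n2 = len(sample1), len(sample2)
    se = math.sqrt(statistics.variance(sample1) / n1 +
                   statistics.variance(sample2) / n2)
    return (statistics.mean(sample1) - statistics.mean(sample2)) / se

def false_rejection_rate(draw, n=5, reps=5000, seed=1):
    """Fraction of simulated studies that reject a true H0 at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [draw(rng) for _ in range(n)]
        b = [draw(rng) for _ in range(n)]
        if abs(two_sample_t(a, b)) > T_STAR_DF4:
            rejections += 1
    return rejections / reps

def skewed(rng):
    return rng.expovariate(1.0)   # right-skewed population

def normal(rng):
    return rng.gauss(0.0, 1.0)    # Normal population, for comparison

rate_skewed = false_rejection_rate(skewed)
rate_normal = false_rejection_rate(normal)
print(f"skewed: {rate_skewed:.3f}, Normal: {rate_normal:.3f}")
```

If the procedure is behaving as the notes describe, both rates should stay at or below roughly the nominal 5%, even for the skewed population. Changing `n` to unequal sizes would require a different critical value and a second sample-size argument, which this sketch does not attempt.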