Pharmaceutical Statistics Lecture 14 Hypothesis Testing: The Difference Between Two Population Means Hypothesis Testing: The Difference Between Two Population Means • • Hypothesis testing involving the difference between two population means is most frequently employed to determine whether or not it is reasonable to conclude that the population means are not equal. Using the same methodology, it is possible to test the hypothesis that the difference is equal to, greater than or equal to, or less than or equal to some value other than zero. In such cases, one of the following hypotheses may be formulated: H0: μ1 – μ2 = 0, HA: μ1 – μ2 ≠ 0 H0: μ1 – μ2 ≥ 0, HA: μ1 – μ2 0 H0: μ1 – μ2 ≤ 0, HA: μ1 – μ2 0 A) When we withdraw two samples from two populations that are normally distributed with known σ1 and σ2 • Test statistic for the null hypothesis of equal population means (H0: μ1 – μ2 = 0, HA: μ1 – μ2 ≠ 0) is: z x1 x 2 (x1 x 2 ) (1 2 ) H 12 22 n1 n2 • The subscript H indicates that the difference is a hypothesized parameter (in this case (μ1-μ2)H=0). Example Researchers are interested to know if there is any difference in the mean uric acid levels between normal individuals and individuals with Down’s syndrome. They collected samples from each population and uric acid level were determined in each samples. Analysis results are summarized below: Normal individuals With Down’s Syndrome Number of volunteers 15 12 Uric acid sample mean (mg/100 mL) 4.5 3.4 Assume that samples were drawn from normally distributed population with variance equal to 1 for the Down’s syndrome population and 1.5 for the normal population. A) Construct the proper hypothesis to test if there is significant difference between the two population means; B) Test your constructed hypothesis using the experimental data. C) Decide on your hypothesis and D) Conclude on the difference in mean uric acid levels between the two studied populations E) Calculate the P-value Example • We are interested in any significant difference, so we consider the null hypothesis with equal means H0: μ1 – μ2 = 0, HA: μ1 – μ2 ≠ 0 An alternative way of stating the hypothesis is as follows H0: μ1 = μ2, HA: μ1 ≠ μ2 • The test statistic is: z x1 x 2 (x1 x 2 ) (1 2 ) H 12 22 n1 • Calculation of the test statistic: z X N X D n2 (x N x D ) (N D ) H 2N nN D2 nD (4.5 3.4) 0 1.1 2.57 1 1.5 0.4282 12 15 Example • Decision rule: We use the z-stat. Let α=0.05, the critical values of z are 1.96 and -1.96. Reject H0 unless -1.96 z 1.96 • Statistical decision to reject H0 since +2.57 +1.96 • Conclude that on the basis of the data, there is an indication that the two population means are not equal. • P-value: 0.0102 (the area to the right of 2.57 and the left of 2.57) B) When we withdraw two samples from two populations that are NOT normally distributed (σ1 and σ2 are unknown) and the sample size is large If we have no idea about the normality of the parent distributions and their parameters, we use the CLT to justify our use of the z-table to find the reliability coefficient. We use the samples standard deviations to calculate the standard error of the difference. This is only true for large samples • Test statistic for the null hypothesis of equal population means (H0: μ1 – μ2 = 0, HA: μ1 – μ2 ≠ 0) is: z (x1 x 2 ) (1 2 ) H s12 s22 n1 n 2 X 1 X 2 s X 1 X [(s12/ n 1) (s22 / n2)] 2 • The subscript H indicates that the difference is a hypothesized parameter (in this case (μ1-μ2)H=0). Example • A study was designed to test the effect of disability on the beneficiary effects of health promotion. The researchers developed a scale for testing this effect (BHADP), the scale was administered to a sample of 132 disabled (D) and 137 nondisabled (ND) subjects with the following results: Sample Mean Score Standard Deviation D 31.83 7.93 ND 25.07 4.80 • The authors wish to know if they may conclude on the basis of these results that, in general, disabled persons, on the average, score higher on the BHADP scale. Construct the proper hypothesis and test it. Note: use a significance level of 1%. Example • The statistics were computed from two independent samples that behave as simple random samples from a normally distributed population of disabled persons and a population of nondisabled persons. • Since the population variances are unknown, we will use sample variances in the calculation of the test statistic. • Since we have large samples, the central limit theorem allows us to use z as a test statistic. z • Hypotheses: (x1 x 2 ) (1 2 ) H s12 s22 n1 n 2 H0: μD – μND ≤ 0 HA: μD – μND > 0 or alternatively: H0: μD ≤ μND HA: μD > μND Disabled scores are higher!! Disabled scores are higher!! • Decision rule: Let α=0.01. this is a one-tailed (right) test with critical value of z equal to 2.33 (from SND tables, Z0.99). • Reject H0 if zcomputed≥2.33 • Calculation of the test statistic: z (31.83 25.07) 0 2 (7.93) (4.80) 132 137 8.42 2 • Reject H0 since zcomputed = 8.42 > 2.33 • The data indicate that, on average, disabled persons score higher on the BHADP scale than do nondisabled person. • For this test, p 0.001 (very highly significant) C) When we withdraw two samples from two populations that are normally distributed (σ1 and σ2 are unknown) and the sample size is small • Parent pops: Normally distributed • Variance for pops: Unknown • Sample size: Small We can not use the z-distribution in this case. We need to use the t-table to find the critical values and we need to use standard deviations of the two samples to find the standard error in the t-score calculation. Important note in this case: The calculation of the standard error from s1,n1 and s2, n2 depend on the equality of the parent populations variances C.1) If they are equal: we use tdistribution C.2) If they are not equal: we use the t’-distribution C.1) When we withdraw two samples from two populations that are normally distributed (σ1 and σ2 are unknown and equal) and the sample size is small • • In this case, we need to consider the samples variances + to use t-table. Since the sample variance is dependent on the sample size, we need to take this into consideration for samples with different sizes (n) to calculate the pooled estimate of the common variance [This formula pools both sample 2 2 s 2 p • (n 1)s (n 1)s 1 1 2 n1 n 2 2 variances with their corresponding weight that based on the sample size] 2 NOTE: If the sample size for both independent samples is equal, we take directly the arithmetic mean of the two samples variances (simple average) . The standard error of the estimate will be: s 2p s 2p X 1 X • s 2 X 1 X 2 n1 n2 The test statistic for testing H0: μ1 = μ2 is given by: t (x1 x 2 ) (1 2 ) H s2p s2p n1 n 2 For critical values: We use the t-table with D.F= n1+n2-2 Example In a study to investigate the lung destruction in cigarette smokers, a lung destructive index was measured in a sample of lifelong nonsmokers and smokers. A larger score indicates greater lung damage. The data is summarized below: Non-smokers smokers Number of volunteers 9 16 Average score 12.4 17.5 Standard deviation 4.8492 4.4711 We wish to know if we may conclude that smokers, in general, have greater lung damage measured by this destructive index than do nonsmokers? Assuming that the lung destructive index scores in both populations are approximately normally distributed with equal variances, construct the proper hypothesis and test it. Example • Hypotheses: H0: μS ≤ μNS, HA: μS μNS • Test statistic (t-test with pooled variance): nS 9,x S 17.5, SS 4.4711............. nNS 16, x NS 12.4, S NS 4.8492 2 2 2 2 (n 1)s (n 1)s 8(4.8492) 15(4.4711) S S NS NS s 2p 21.2165 nS nNS 2 23 t (x S xNS ) (S NS )0 s 2p nS s 2p n NS (17.5 12.4) 0 2.6573 21.2165 21.2165 9 16 Example • Critical values and Decision rule: let α=0.05 (right-tailed test). the critical values of t is +1.714 (from table with D.F=23, cumm. prop=95%). Reject if tcaclulated > 1.714 Confidence=95% AUC-∞t0.95 α=5% (0.05) t0.95=1.714 for DF:23 Example • Statistical decision: we reject H0 because 2.6573>1.714 (falls in the rejection zone). • Conclusion: we conclude that smokers may have greater lung damage than nonsmokers. α=5% (0.05) tcalc=2.6573 t0.95=1.714 for DF:23 • p value: 0.01>P>0.005, since 2.500 2.65732.8073 tcalc=2.6573 C.2) When we withdraw two samples from two populations that are normally distributed (σ1 and σ2 are unknown and NOT equal) and the sample size is small • We can not use the t-table to find the critical values for D.F= n1+n2-2. • Solution: Instead of finding the critical values from the tables, we need to compute it taking into consideration the critical values for each sampling distributions and the weight of each sample. How to compute the critical values for two-tailed test? ' t1 /2 w1 w1t1 w 2 t 2 w1 w 2 2 1 s n1 How to compute the critical values for one-tailed test? ' t1 w1 w1t1 w 2 t 2 w1 w 2 2 1 s n1 s22 w2 n2 s22 w2 n2 t1 t1 / 2 ....... for: n1 1 t1 t1 ....... for: n1 1 t2 t1 / 2 ....... for: n2 1 t2 t1 ....... for: n2 1 And we use S1 and S2 in the formula of t-test _ _ (x x ) (1 2) 0 t s12 s22 n1 n 2 1 2 ' t1 /2 For a two sided test, reject H0 if the computed value of t is either greater than or equal to the critical values t`(1-α/2) or less than or equal to the negative of that value. w1t1 w 2 t 2 w1 w 2 s12 w1 n1 s22 w2 n2 α/2=2.5% (0.025) t1 t1 / 2 ....... for: n1 1 t2 t1 / 2 ....... for: n2 1 t1-α/2=t97.5%=t0.975 t' 1 w1t1 w 2 t 2 w1 w 2 α=5% (0.05) s12 w1 n1 s2 w2 2 n2 t1-α=t95%=t0.95 For a one-sided test with the rejection region in the right tail of the sampling distribution, reject H0 if the computed t is equal to or greater than the critical t`1-α. t1 t1 ....... for: n1 1 For a one-sided test with the rejection region in the left tail of the sampling distribution, t2 t1 ....... for: n2 1 reject H if the computed tis equal to or smaller than the negative of the critical t` 0 1-α computed. Example Researchers wish to know if two populations differ with respect to the mean value of the total serum complement activity (CH50). The data consist of CH50 determinations on apparently normal subjects (n2=20 ) and subjects with disease (n1=10 ). The sample means and standard deviations are: _ x1 62.6 s1 33.8 _ x 2 47.2 s 2 101.1 The research team want to know if they can conclude that the two population means are different. Assuming that both populations are approximately normally distributed, construct the proper hypothesis and test it?. The populations are normally distributed with unknown variances that are unequal. With this in mind we will use t’-stat. Example • Hypothesis: H0: μ1 – μ2 = 0 HA: μ1 – μ2 ≠ 0 • Test statistic: _ t _ (x 1 x 2 ) (1 2 )0 s12 s22 n1 n2 • We obtain the critical value by the equation: w1t1 w2t2 t` (1 ) w1 w2 2 Two-tailed test!! Example t1' /2 w1t1 w 2 t 2 114.244(2.2622) 5.1005(2.0930) 2.255 w1 w 114.244 5.1005 2 s12 33.8 2 w1 114.244 n1 10 s22 10.12 w2 5.1005 n2 20 t1 t1 / 2 ....... for : n1 1........... 2.2622 t2 t1 / 2 ....... for : n 2 1.......... 2.0930 Using t-table with DF=9 and cumm prop=0.975 (let α=0.05) Using t-table with DF=19 and cumm prop=0.975 (let α=0.05) • Since we found the t`(1-α/2) to be equal to 2.255, our critical values will be ±2.255 (two-tailed test) • Our decision rule is reject H0 if the computed t is either ≥2.255 or ≤2.255. • Calculation of the test statistic: t (62.6 47.2) 0 15.4 1.41 10.92 ( 3 3.8) 2 (1 0.1) 2 10 20 • Statistical decision: since -2.255 1.41 2.255, we can not reject the H0. • On the basis of these results we can not conclude that the two population means are different. • The p value of this test 0.05 Paired Comparisons • Previously we discussed the difference between two population means assuming that the samples were independent. • Some times we may want to assess the effectiveness of a treatment or experimental procedure making use of observations resulting from nonindependent samples. • A hypothesis test based on this type of data is called a paired comparison test. Why do we need paired test?? The objective in paired comparison tests is to eliminate maximum number of sources of extraneous variations by making the pairs similar with respect to as many variables as possible. Example Not Paired laser Paired No laser No laser To study the effect of gold nanoparticles in treating tumors, we induce tumor and then target it with gold nanoparticles which serve as “nanoheaters” upon radiation with laser. In “Not Paired” case, we use two groups that one receives gold nanoparticles and the other does not (control). The difference between treated and control may be due simply a difference in external characteristics between the mice in both groups. To eliminate this difference, we can use one group of mice and induce two identical tumors in the same mouse as you see in the picture to the right. Or we can use the same mice with before/after strategy. laser Why do we need paired test?? • Related or paired observations may be obtained in a number of ways: – The same subjects may be measured before and after receiving some treatment. – Same subjects with part of their bodies (treatment/control, previous slide) – In comparing two methods of analysis, the material to be analyzed may be divided equally so that one half is analyzed by one method and one half is analyzed by another. Paired Test • Instead of performing the analysis with individual observations, we use di, the difference between pairs of observations as the variable of interest. • When the n sample differences computed from the n pairs of measurements constitute a simple random sample from a normally distributed population of differences, the test statistic for testing hypothesis about the population mean difference μd is: d d 0 t Sd / n where: d is the mean for sample differences d is the hypothesized population mean differe 0 Sd is the standard deviation of the sample differe n is the number of sample differences Paired Test • The t statistic is distributed as Student’s t with n-1 degrees of freedom (this is the general case but not always!!). • We do not have to worry about the equality of variances in paired comparisons, since our variable is the difference in the reading of the same subject or object. • If the population variance of the difference is known, we can use z-stat (this is not applicable always) • If the assumption of normality for the distribution of the differences can not be made, we can use large n and thus use the CLT to justify our use of the z-stat (we approximate σ by the use of sample S) Example • In a study to evaluate the effect of very low calorie diet (VLCD) on the weight of 9 subjects, the following data was collected before (B) and after (A) treatment: B (Kg) 117.3 111.4 98.6 104.3 105.4 100.4 81.7 89.5 78.2 A (Kg) 83.3 75.8 82.9 62.7 69 63.9 85.9 82.3 77.7 • The researchers wish to know if these data provide sufficient evidence to allow them to conclude that the treatment is effective in causing weight reduction in those individuals. Assuming that differences between A&B are approximately normally distributed, construct the proper hypothesis and test it (you need to know that this is paired t-test!!!!!)?. Example • We may obtain the differences in one of two ways: by subtracting the before weights from the after weights (A – B) or by subtracting the after weights from the before weights (B – A). • If we choose (di=A – B), the differences are: -34, -25.5, -22.8, -21.4, -23.1, -22.7, -19, -20.5, -14.3 • Assumptions: the observed differences constitute a simple random sample from a normally distributed population of differences with unknown variance (t-test). • Hypotheses: H0: μd ≥0 (left-tailed test) HA: μd 0 (indicating weight reduction) Example • Hypotheses (for (di=A – B)): H0: μd ≥ 0 (left-tailed test) HA: μd 0 (indicating weight reduction) NOTE: If we had obtained the differences by subtracting the after weights from the before weights (B – A) our hypotheses would have been: H0: μd ≤ 0 (right-tailed test) HA: μd > 0 (indicating weight reduction) • If the question had been such that a two-tailed test was indicated (any difference/change), the hypotheses would have been: H0: μd = 0 HA: μd ≠ 0 Example • The test statistic: t d d 0 Sd / n • Decision rule: Let α=0.05, the critical value of t is -1.860, reject H0 if the computed t is less than or equal to the critical value. di 2 03.3 H0: μd ≥ 0 d 22.5889 n 9 (di d) 2 2 α=5% (0.05) sd 28.2961 n 1 22.588 9 0 t 12.74 28.2961 -1.860=-t1-α=-t0.95 (n-1=8) t'calc=-12.74 9 Reject H0, since -12.7395 is in the rejection reg We may conclude that the diet program is effect P-value<0.001 since 12.74<-5.041 see the table