Introduction to Biostatistics for Clinical Researchers
University of Kansas Department of Biostatistics & University of Kansas Medical Center Department of Internal Medicine

Schedule
• 5th lecture, TBD

Materials
• PowerPoint files can be downloaded from the Department of Biostatistics website at http://biostatistics.kumc.edu
• A link to the recorded lectures will be posted in the same location

Topics
• Comparing Two (or More) Population Means (continued)
• Simple Linear Regression
• Comparing Two (or More) Independent Proportions

Comparing Two (or More) Population Means (Continued)

Sampling Distribution Detail
• What exactly is the sampling distribution of the difference in sample means?
• A Student's t distribution is used with $n_1 + n_2 - 2$ degrees of freedom (total sample size minus two)

Two-Sample t-test
• In a randomized design, 23 patients with hyperlipidemia were randomized to either treatment A or treatment B for 12 weeks
  • 12 to A
  • 11 to B
• LDL cholesterol levels (mmol/L) were measured on each subject at baseline and at 12 weeks
• The 12-week change in LDL cholesterol was computed for each subject

                                              Treatment Group
                                              A         B
  N                                           12        11
  Mean LDL change (mmol/L)                    -1.41     -0.32
  Standard deviation of LDL changes (mmol/L)  0.55      0.65

Two-Sample t-test
• Is there a difference in LDL change between the two treatment groups?
• Methods of inference
  • CI for the difference in mean LDL cholesterol change between the two groups
  • Statistical hypothesis test

95% CI for Difference in Means
• Using the summary statistics in the table above:
  $(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha,\,n_1+n_2-2} \times SE(\bar{x}_1 - \bar{x}_2)$
  $(-1.41 - (-0.32)) \pm t_{1-\alpha,\,n_1+n_2-2} \times \sqrt{\frac{0.55^2}{12} + \frac{0.65^2}{11}}$
  $-1.09 \pm t_{1-\alpha,\,n_1+n_2-2} \times 0.25$

95% CI for Difference in Means
• How many standard errors to add and subtract (i.e., what is the correct multiplier)?
• The number we need comes from a t distribution with 12 + 11 - 2 = 21 degrees of freedom
• From a t table or Excel, this value is 2.08
• The 95% CI for the true mean difference in change in LDL cholesterol, drug A to drug B, is:
  $-1.09 \pm 2.08 \times 0.25 = (-1.61, -0.57)$

Hypothesis Test to Compare Two Independent Groups
• Two-sample (unpaired) t-test: getting a p-value
• Is the change in LDL cholesterol the same in the two treatment groups?
  $H_0: \mu_1 = \mu_2$, equivalently $H_0: \mu_1 - \mu_2 = 0$
  $H_A: \mu_1 \neq \mu_2$, equivalently $H_A: \mu_1 - \mu_2 \neq 0$
• Recall the general "recipe" for hypothesis testing:
  1. Assume $H_0$ is true
  2. Measure the distance of the sample result from the hypothesized result (here, it's 0)
  3. Compare the test statistic (distance) to the appropriate distribution to get the p-value
  $t = \frac{\text{observed difference} - \text{null difference}}{SE(\text{observed difference})}$
  $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{SE(\bar{x}_1 - \bar{x}_2)} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Hyperlipidemia Example
• In the hyperlipidemia study, recall:
  $\bar{x}_1 - \bar{x}_2 = -1.09$, $SE(\bar{x}_1 - \bar{x}_2) = 0.25$
• In this study:
  $t = \frac{-1.09}{0.25} = -4.4$
• This study result was 4.4 standard errors below the null mean of 0

How are p-values Calculated?
• Is a result 4.4 standard errors below 0 unusual?
  • It depends on what kind of distribution we are dealing with
• The p-value is the probability of getting a result as extreme or more extreme than what was observed (-4.4) by chance, if the null hypothesis were true
• The p-value comes from the sampling distribution of the difference in two sample means
• What is the sampling distribution of the difference in sample means?
  $t_{12+11-2} = t_{21}$

Hyperlipidemia Example
• To compute a p-value, we need to compute the probability of being 4.4 or more SEs away from 0 on the t distribution with 21 degrees of freedom
• P = 0.0003

Summary: Hyperlipidemia Example (Statistical Methods)
• Twenty-three patients with hyperlipidemia were randomly assigned to one of two treatment groups: A or B
  • 12 patients were assigned to receive A
  • 11 patients were assigned to receive B
• Baseline LDL cholesterol measurements were taken on each subject, and LDL was measured again after 12 weeks of treatment
• The change in LDL cholesterol was computed for each subject
• The mean LDL changes in the two treatment groups were compared using an unpaired t-test, and a 95% confidence interval was constructed for the difference in mean LDL changes

Summary: Hyperlipidemia Example (Result)
• Patients on treatment A showed a mean decrease in LDL cholesterol of 1.41 mmol/L and patients on treatment B showed a mean decrease of 0.32 mmol/L (a difference of 1.09 mmol/L, 95% CI: 0.57 to 1.61 mmol/L)
• The difference in LDL changes was statistically significant (p < 0.001)

FYI: Equal Variances Assumption
• The "traditional" t-test assumes equal variances in the two groups
• This can be formally tested using another hypothesis test
• But why not just compare the observed values of $s_1$ and $s_2$?
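The hyperlipidemia calculations above can be reproduced from the summary statistics alone. The following is a minimal sketch, not the lecture's own code: the function and variable names are illustrative, and the multiplier 2.08 is taken from the slides rather than computed. Because the sketch does not round the standard error to 0.25 first, its t statistic comes out near -4.3 rather than the -4.4 quoted on the slides.

```python
import math

# Summary statistics from the hyperlipidemia example (12-week LDL change, mmol/L).
N_A, MEAN_A, SD_A = 12, -1.41, 0.55
N_B, MEAN_B, SD_B = 11, -0.32, 0.65

# Multiplier quoted in the lecture for a t distribution with 12 + 11 - 2 = 21 df.
T_MULTIPLIER = 2.08


def two_sample_summary(n1, mean1, sd1, n2, mean2, sd2, multiplier):
    """Return (difference, SE, t statistic, CI) for two independent groups."""
    diff = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    t_stat = (diff - 0) / se  # distance from the null difference of 0, in SEs
    ci = (diff - multiplier * se, diff + multiplier * se)
    return diff, se, t_stat, ci


diff, se, t_stat, ci = two_sample_summary(
    N_A, MEAN_A, SD_A, N_B, MEAN_B, SD_B, T_MULTIPLIER
)
print(f"difference = {diff:.2f}, SE = {se:.2f}, t = {t_stat:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The p-value itself requires the t distribution with 21 degrees of freedom, which a t table or a statistics package supplies; it is left out here to keep the sketch dependency-free.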
• There is a slight modification to allow for unequal variances: it adjusts the degrees of freedom for the test and uses a slightly different SE computation
• If you want to be truly 'safe', it is more conservative to use the test that allows for unequal variances
• Makes little to no difference in large samples

FYI: Equal Variances Assumption
• If the underlying population-level standard deviations are equal, both approaches give valid confidence intervals, but intervals assuming unequal standard deviations are slightly wider (p-values slightly larger)
• If the underlying population-level standard deviations are unequal, the approach assuming equal variances does not give valid confidence intervals and can severely under-cover the goal of 95%

Non-Parametric Analogue to the Two-Sample t

Alternative to the Two-Sample t-test
• "Non-parametric" refers to a class of tests that do not assume anything about the distribution of the data
• Nonparametric tests for comparing two groups
  • Mann-Whitney Rank-Sum test (Wilcoxon Rank-Sum test)
  • Also called the Wilcoxon-Mann-Whitney test
• Attempts to answer: "Are the two population distributions different?"
• Advantages: does not assume the populations being compared are normally distributed, uses only ranks, and is not sensitive to outliers

Alternative to the Two-Sample t-test
• Disadvantages:
  • often less sensitive (powerful) for finding true differences, because it throws away information (by using only ranks rather than the raw data)
  • needs the full data set, not just summary statistics
  • results do not include any CI quantifying the range of possibility for the true difference between populations

Health Education Study
• Evaluate an intervention to educate high school students about health and lifestyle over a two-month period
• 10 students randomized to the intervention or control group
• X = post-test score - pre-test score
• Compare between the two groups

Health Education Study
• Only five individuals in each sample
• We want to compare the control and intervention groups to assess whether the 'improvement' in scores is different, taking random sampling error into account

  Intervention   5   0   7   2   19
  Control       -5  -6   1   4    6

• With such a small sample size, we need to be sure score improvements are normally distributed if we want to use the t-test (BIG assumption)
• Possible approach: Wilcoxon-Mann-Whitney test

Health Education Study
• Step 1: rank the pooled data, ignoring groups
• Step 2: reattach group status

  Scores   Intervention   5   0   7   2   19
           Control       -5  -6   1   4    6
  Ranks    Intervention   7   3   9   5   10
           Control        2   1   4   6    8

• Step 3: find the average rank in each of the two groups
  Intervention: $\frac{3 + 5 + 7 + 9 + 10}{5} = 6.8$
  Control: $\frac{1 + 2 + 4 + 6 + 8}{5} = 4.2$

Health Education Study
• Statisticians have developed formulas and tables to determine the probability of observing such an extreme discrepancy in ranks (6.8 versus 4.2) by chance alone (p)
• The p-value here is 0.17
• The interpretation is that the Mann-Whitney test did not show any significant difference in test score 'improvement' between the intervention and control groups (p = 0.17)
• The two-sample t-test would give a different answer (p = 0.14)
• Different statistical methods give different p-values
• If the largest observation were changed, the Mann-Whitney p-value would not change but the t-test p-value would

Notes
• The t or the nonparametric test?
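The three ranking steps from the health education example can be sketched as follows. This is an illustrative sketch rather than the lecture's method: the helper name `pooled_ranks` is made up, ties are not handled, and the p-value uses the large-sample normal approximation without a continuity correction. That choice is an assumption; it happens to reproduce the quoted 0.17, whereas exact tables for five observations per group would give a somewhat different value.

```python
import math
from statistics import NormalDist, mean

intervention = [5, 0, 7, 2, 19]
control = [-5, -6, 1, 4, 6]


def pooled_ranks(group_a, group_b):
    """Rank the pooled data (1 = smallest) and reattach group status.

    Assumes no tied values, which holds for the data above.
    """
    pooled = sorted(group_a + group_b)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle ties")
    rank_of = {value: position + 1 for position, value in enumerate(pooled)}
    return [rank_of[v] for v in group_a], [rank_of[v] for v in group_b]


ranks_int, ranks_ctl = pooled_ranks(intervention, control)
mean_rank_int, mean_rank_ctl = mean(ranks_int), mean(ranks_ctl)

# Normal approximation for the rank-sum statistic (assumption: no continuity
# correction; exact tables are preferable for samples this small).
n1, n2 = len(intervention), len(control)
u_stat = sum(ranks_int) - n1 * (n1 + 1) / 2
u_mean = n1 * n2 / 2
u_sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u_stat - u_mean) / u_sd
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"mean ranks: {mean_rank_int:.1f} vs {mean_rank_ctl:.1f}")
print(f"approximate two-sided p = {p_value:.2f}")
```

Replacing the largest score (19) with any larger value leaves every rank, and therefore this p-value, unchanged, which is the robustness to outliers noted above.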
• Statisticians will not always agree, but there are some guidelines
• Use the nonparametric test if:
  • The sample size is small and you have no reason to believe the data are 'well-behaved' (normally distributed)
  • Only ranks are available

Summary: Educational Intervention Example (Statistical Methods)
• 10 high school students were randomized to either receive a two-month health and lifestyle education program or no program
• Each student was administered a test regarding health and lifestyle issues prior to randomization and after the two-month period
• Differences in the two test scores were computed for each student
• Mean and median test score changes were computed for each of the two study groups
• A Mann-Whitney rank-sum test was used to determine if there was a statistically significant difference in test score change between the intervention and control groups at the end of the two-month study period

Summary: Educational Intervention Example (Results)
• Participants randomized to the educational intervention scored a median five points higher on the test given at the end of the two-month study period, as compared to the test administered prior to the intervention
• Participants randomized to receive no educational intervention scored a median one point higher on the test given at the end of the two-month study period
• The difference in test score improvements between the intervention and control groups was not statistically significant (p = 0.17)

Comparing Means between More than Two Independent Populations

Motivating Example
• Suppose you are interested in the relationship between smoking and mid-expiratory flow (FEF), a measure of pulmonary health
• Suppose you recruit study subjects and classify them into one of six smoking categories
  • Nonsmokers (NS)
  • Passive smokers (PS)
  • Non-inhaling smokers (NI)
  • Light smokers (LS)
  • Moderate smokers (MS)
  • Heavy smokers (HS)

Motivating Example
• You are interested in whether differences exist in mean FEF among the six groups
• The main outcome variable is FEF in liters per second

Motivating Example
• One strategy is to perform lots of two-sample t-tests (one for each possible two-group comparison)
• In this example, there would be 15 comparisons you would need to do: NS-PS, NS-NI, ..., MS-HS
• It would be nice to have one "catch-all" test
  • Something that would tell you whether there were any differences among the six groups
  • If so, you could then do group-to-group comparisons to look for specific differences

Extension of the Two-Sample t-test
• Analysis of Variance (ANOVA)
  • The t-test compares means in two populations
  • ANOVA compares means among more than two populations with one test
• The p-value from ANOVA answers the question: "Are there any differences in the means among the populations?"

Extension of the Two-Sample t-test
• General idea behind ANOVA, comparing means for k > 2 groups:
  $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
  $H_A$: at least one $\mu_j$ is different

Example: Smoking and FEF (Forced Mid-Expiratory Flow Rate) [1]
• A sample of over 3,000 persons was classified into one of the six smoking categories listed above (NS, PS, NI, LS, MS, HS) based on responses to smoking-related questions
• From each smoking group, a random sample of 200 men was drawn (except for the non-inhalers, as there were only 50 male non-inhalers in the entire sample of 3,000)
• FEF measurements were taken on each of the subjects

[1] White, J.R., Froeb, H.F. (1980). Small-airways dysfunction in non-smokers chronically exposed to tobacco smoke, NEJM 302: 13.

Data Summary
• Based on a one-way analysis of variance, there are statistically significant differences in FEF levels among the six smoking groups (p < 0.001)

What's the Rationale?
• In the simplest case, the variation in subject responses is broken down into parts:
  • variation in response attributed to the treatment (group/sample)
  • variation attributed to error (subject characteristics + everything else not controlled for)
• The variation in the treatment (group/sample) means is compared to the variation within a treatment (group/sample)
• If the between-treatment variation is a lot bigger than the within-treatment variation, that suggests there are some different effects among the treatments

Example: Scenarios
[Figure: boxplots for Scenarios 1, 2, and 3]
• There is an obvious difference between scenarios 1 and 2. What is it?
• Just looking at the boxplots, which of the two scenarios (1 or 2) do you think would provide more evidence that at least one of the populations is different from the others? Why?

F Distribution
• Properties, $F(df_{num}, df_{den})$
  • The total area under the curve is one
  • The distribution is skewed to the right
  • The values are non-negative, start at zero and extend to the right, approaching but never touching the horizontal axis
  • The distribution of F changes as the degrees of freedom change

F Statistic
  $F = \frac{\text{Variation between the sample means}}{\text{Natural variation within the samples}}$
• Case A: If all the sample means were exactly the same, what would be the value of the numerator of the F statistic?
• Case B: If all the sample means were spread out and very different, how would the variation between sample means compare to the value in A?

F Statistic
• So what values could the F statistic take on?
• Could you get an F that is negative? Why not?
• What type of values of F would support the alternative hypothesis?
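The questions above can be explored numerically. The sketch below computes the one-way ANOVA F statistic as the between-group mean square divided by the within-group mean square. The two small data sets are hypothetical, chosen only so that Case A has identical group means and Case B has well-separated ones; they are not from the lecture.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical data, for illustration only.
case_a = [[64, 65, 66], [63, 65, 67], [64, 65, 66]]  # all sample means equal 65
case_b = [[59, 60, 61], [63, 65, 67], [69, 70, 71]]  # sample means 60, 65, 70

print(f_statistic(case_a))  # numerator is 0, so F is 0
print(f_statistic(case_b))  # same within-group spread, much larger F
```

Both mean squares are built from squared deviations, so F cannot be negative; large values of F are the ones that favor the alternative hypothesis.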
Example: F Statistic
• Three independent random samples
  • Scenario 1: means 60, 65, 70; s = 1.5
  • Scenario 2: means 60, 65, 70; s = 3
  • Scenario 3: means 65, 65, 65; s = 3

  Scenario            F      P
  1: HA is true       129    0
  2: HA is true       45     0
  3: H0 is true       0      0.48

Summary: Smoking and FEF (Statistical Methods)
• 200 men were randomly selected from each of five smoking classification groups (non-smokers, passive smokers, light smokers, moderate smokers, and heavy smokers), as well as 50 men classified as non-inhaling smokers, for a study designed to analyze the relationship between smoking and respiratory function
• Analysis of variance was used to test for any differences in FEF levels among the six groups of men
• Individual group comparisons were performed with a series of two-sample t-tests, and 95% confidence intervals were constructed for the mean difference in FEF between each combination of groups

Summary: Smoking and FEF (Results)
• Analysis of variance showed statistically significant (p < 0.001) differences in FEF between the six groups of smokers
• Non-smokers had the highest mean FEF value (3.78 L/s), and this was statistically significantly larger than the means of the five other smoking-classification groups
• The mean FEF value for non-smokers was 1.19 L/s higher than the mean FEF for heavy smokers (95% CI: 1.03-1.35 L/s), the largest mean difference between any two smoking groups
• Confidence intervals for all smoking group FEF comparisons are in Table 1

Example: FEV1 and Three Medical Centers [1]
• Data were collected on 63 patients with coronary artery disease at 3 different medical centers (Johns Hopkins, Rancho Los Amigos Medical Center, St. Louis University School of Medicine)
• The purpose of the study was to investigate the effects of carbon monoxide exposure on these patients
• Prior to analyzing the CO effects data, researchers wished to compare the respiratory health of these patients across the three medical centers

[1] Pagano, M., Gauvreau, K. (2000). Principles of Biostatistics. Duxbury Press

[Figure: boxplots of the data]

ANOVA Table

  Source of Variation   Sum of Squares (SS)   Degrees of Freedom (df)   Mean Square (MS)   F      P
  Between Groups        1.5828                2                         0.791418           3.12   0.052
  Within Groups         14.48                 57                        0.254
  Total                 16.063                59                        0.2723

Simple Linear Regression

The Equation of a Line
• Recall (from algebra) that there are two values which uniquely define any line
  • Y-intercept: where the line crosses the y-axis (when x = 0)
  • Slope: the "rise over run", i.e., how much y changes for every unit change in x
• The equation of a line is given by $y = mx + b$, where m is the slope and b is the y-intercept

The Equation of a Line
• Statisticians have their own notation:
  $y = \beta_0 + \beta_1 x$
  • $\beta_0$ = y-intercept
  • $\beta_1$ = slope

The Intercept, β0
• The intercept, $\beta_0$, is the value of y when x = 0
• It is the point on the graph where the line crosses the y-axis, at the coordinate $(0, \beta_0)$

The Slope, β1
• The slope, $\beta_1$, is the change in y corresponding to a unit increase in x
• This change is the same across the entire line
• All information about the difference in the y-value for two differing values of x is contained in the slope
• For example: two values of x three units apart will have a difference in y values of $3\beta_1$

The Slope, β1
• The slope is the change in y corresponding to a unit increase in x: it is the difference in y-values for x + 1 compared to x
• If $\beta_1 = 0$, there is no association between x and y (i.e., the values of y are the same regardless of the values of x)
• If $\beta_1 > 0$, there is a positive association between x and y (i.e., the values of y increase with increasing values of x)
• If $\beta_1 < 0$, there is a negative association between x and y (i.e., the values of y decrease with increasing values of x)

The Equation of a Line
• In linear regression, points don't fit exactly to a line
[Figures: scatterplots of y = 2x + 1, of y = 2x + 1 + error, and of y = 2x + 1 + more error]

Linear Regression
• Deterministic model: a model for an exact relationship between variables (y = Ax)
  • For example: Centimeters = 2.54 · Inches
• Probabilistic model: a model that accounts for unexplained variation in the relationship between two or more variables
  • General form: y = [Deterministic component] + [Random error]
• We estimate a line that relates the mean of an outcome y to a predictor x:
  $E[y] = \hat{\beta}_0 + \hat{\beta}_1 x$
  where $E[y]$ is the expected (mean) value of y, and $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimated y-intercept and slope, respectively

The Equation of a Line
• $\hat{\beta}_0$ and $\hat{\beta}_1$ are estimated using the data
• The resulting estimated line is the one that "best fits the data"

Example: Arm Circumference and Height
• Data on anthropometric measures from a random sample of 150 Nepali children up to 12 months old
• What is the relationship between average arm circumference and height?
• Data:
  • Arm circumference: mean = 12.4 cm; s = 1.5 cm; min = 7.3 cm; max = 15.6 cm
  • Height: mean = 61.6 cm; s = 6.3 cm; min = 40.9 cm; max = 73.3 cm

Approach 1: t-Test
• Dichotomize height at the median, compare mean arm circumference with a t-test and 95% CI

Approach 1: t-Test
• Potential advantages
  • Gives a single summary measure for quantifying the arm circumference/height association (a sample mean difference)
• Potential disadvantages
  • Throws away a lot of valuable information in the height data, which was originally continuous
  • Only allows for a single comparison between two crudely (and arbitrarily?) defined height categories

Approach 2: ANOVA
• Categorize height into four categories by quartile, compare mean arm circumferences with ANOVA and 95% CIs

Approach 2: ANOVA
• Potential advantages
  • Uses a less crude categorization of height than the previous example
• Potential disadvantages
  • Still throws away a lot of information in the height data, which was originally measured as continuous
  • Requires multiple summary measures to quantify the arm circumference/height relationship
  • Does not exploit the structure we see in the previous boxplot: as height increases, so does arm circumference

Approach 3: Linear Regression
• Treat height as continuous when estimating the relationship
• Linear regression is a potential option: it allows us to associate a continuous outcome with a continuous predictor via a line
  • The line estimates the mean value of the outcome for each continuous value of height in the sample used
• Makes a lot of sense, but only if a line reasonably describes the relationship

Visualizing the Relationship
[Figure: scatterplot]
• Does a line reasonably describe the general shape of the relationship?
• We can estimate a line using a statistical software package
• The line we estimate will be of the form:
  $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
• Here, $\hat{y}$ is the average arm circumference for a group of children all of the same height, x

Arm Circumference and Height
  $\hat{y} = 2.7 + 0.16x$
• Here, $\hat{y}$ is the estimated average arm circumference, x = height, $\hat{\beta}_0 = 2.7$ and $\hat{\beta}_1 = 0.16$
• This is the estimated line from the sample of 150 Nepali children

Arm Circumference and Height
• How do we interpret the estimated slope?
  • The average change in arm circumference for a one-unit (1 cm) increase in height
  • The mean difference in arm circumference for two groups of children who differ by one unit (1 cm) in height
• These results estimate that the mean difference in arm circumference for a one-centimeter difference in height is 0.16 cm, with taller children having greater average arm circumference

Arm Circumference and Height
• This mean difference estimate is constant across the entire height range in the sample

Arm Circumference and Height
• What is the estimated mean difference in arm circumference for:
  • 60 versus 59 cm?
  • 25 versus 24 cm?
  • 72 versus 71 cm?
• Answer: 0.16 cm

Arm Circumference and Height
• What is the estimated mean difference in arm circumference for children 60 cm versus 50 cm tall?

Arm Circumference and Height
• What is the estimated mean difference in arm circumference for:
  • 90 versus 89 cm?
  • 34 versus 33 cm?
  • 110 versus 109 cm?
• Answer: We don't know!

Arm Circumference and Height
• Our regression results only apply to the range of observed data

Arm Circumference and Height
• How do we interpret the estimated intercept?
  • The estimated y when x = 0: the estimated mean arm circumference for children 0 cm tall
• Does this make sense given our sample?
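One way to make the "range of observed data" point concrete is to wrap the fitted line in a function that refuses to extrapolate. This is a sketch under stated assumptions: the coefficients (2.7 and 0.16) and the observed height range (40.9 cm to 73.3 cm) are the values reported in the lecture, while the function names and the decision to raise an error outside the range are illustrative choices.

```python
INTERCEPT = 2.7   # estimated intercept (cm), as reported in the lecture
SLOPE = 0.16      # estimated slope (cm of arm circumference per cm of height)
HEIGHT_MIN, HEIGHT_MAX = 40.9, 73.3  # observed height range in the sample (cm)


def in_observed_range(height_cm):
    """True if the height lies within the range of the sample data."""
    return HEIGHT_MIN <= height_cm <= HEIGHT_MAX


def mean_arm_circumference(height_cm):
    """Estimated mean arm circumference (cm) for children of a given height."""
    if not in_observed_range(height_cm):
        raise ValueError(f"{height_cm} cm is outside the observed height range")
    return INTERCEPT + SLOPE * height_cm


def mean_difference(height_a, height_b):
    """Estimated mean difference in arm circumference, height_a versus height_b."""
    return mean_arm_circumference(height_a) - mean_arm_circumference(height_b)


print(round(mean_difference(60, 59), 2))  # one slope: 0.16
print(round(mean_difference(60, 50), 2))  # ten slopes: 1.6
print(in_observed_range(0))               # the intercept's x = 0 is not in the data
```

Asking for 90 versus 89 cm raises a `ValueError`, mirroring the "We don't know!" answer, and x = 0 fails the same check, which is why the intercept has no direct scientific interpretation in this sample.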
• Frequently, the scientific interpretation of the intercept is meaningless
• It is necessary for fully specifying the equation of a line

Arm Circumference and Height
[Figure: scatterplot with fitted line; x = 0 isn't even on the graph]

Notes
• Linear regression performed with a single predictor (one x) is called simple linear regression
• Linear regression with more than one predictor is called multiple linear regression

Example: Arm Circumference and Gender
• Data on anthropometric measures from a random sample of 150 Nepali children up to 12 months old
• What is the relationship between average arm circumference and sex of a child?

Visualizing the Relationship
[Figures: scatterplot display and boxplot display]

Arm Circumference and Gender
• Here, y is arm circumference (continuous) and x is gender (binary)
• How do we handle gender as a predictor in regression?
  • One possibility is to let x = 0 for male children and x = 1 for female children
• How would we interpret the regression coefficients?
  $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

Arm Circumference and Gender
• The resulting equation is $\hat{y} = 12.5 - 0.13x$
• $\hat{\beta}_1 = -0.13$: the estimated mean difference in arm circumference for female children compared to male children is -0.13 cm; female children have lower arm circumference by 0.13 cm on average
• $\hat{\beta}_0 = 12.5$: the estimated mean arm circumference for male children is 12.5 cm

Estimating the Regression Equation
• How do we estimate the regression equation?
• There must be some algorithm that will always yield the same results for the same data

Estimating the Regression Equation
• The algorithm used to estimate the equation of the line is called least squares estimation
• The idea is to find the line that gets closest to all of the points in the sample
• How do we define "closeness" to multiple points?
• In regression, it is the cumulative squared distance between each point's y-value and the corresponding value of $\hat{y}$ for that point's x

Estimating the Regression Equation
• A distance is computed for each data point in the sample

Estimating the Regression Equation
• The values chosen for $\hat{\beta}_0$ and $\hat{\beta}_1$ are the values that minimize the cumulative squared distance:
  $\min \sum_{i=1}^{n} \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2$

Estimating the Regression Equation
• The values are just estimates based on a single sample
• If you were to take a different random sample of 150 Nepali children from the same population of children under 12 months old, the resulting estimates would likely be different (i.e., the values that minimize the cumulative squared distance for this second sample of points would likely be different)
• As such, all regression coefficients have an associated standard error that can be used to make statements about the true relationship between mean y and x based on a single sample

Arm Circumference and Height
• The estimated regression equation relating arm circumference to height, using a random sample of 150 Nepali children less than 12 months old, was
  $\hat{y} = 2.7 + 0.16x$
  $\hat{\beta}_0 = 2.70$, $SE(\hat{\beta}_0) = 0.88$
  $\hat{\beta}_1 = 0.16$, $SE(\hat{\beta}_1) = 0.014$

Arm Circumference and Height
• The random sampling behavior of estimated regression coefficients is approximately normal for large samples, centered at the true, unknown population values
• We can use the same ideas to create 95% CIs and get p-values

Arm Circumference and Height
• With $\hat{\beta}_1 = 0.16$ and $SE(\hat{\beta}_1) = 0.014$, the 95% CI for $\beta_1$ is
  $\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1) = 0.16 \pm 1.96 \times 0.014 = (0.13, 0.19)$

Arm Circumference and Height
• P-value for testing the hypotheses:
  $H_0: \beta_1 = 0$
  $H_A: \beta_1 \neq 0$
• Assume the null is true and calculate the standardized "distance" of $\hat{\beta}_1$ from 0:
  $t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.16}{0.014} = 11.4$
• The p-value is the probability of being 11.4 or more standard errors away from a mean of 0 on a normal curve
• P < 0.001

Summarizing Findings: Circumference/Height
• This research used simple linear regression to estimate the magnitude of the association between arm circumference and height in Nepali children less than 12 months old, using data on a random sample of 150
• A statistically significant positive association was found (p < 0.001)
• The results estimate that two groups of such children who differ by 1 cm in height will differ on average by 0.16 cm in arm circumference (95% CI 0.13 cm to 0.19 cm)
• In Excel, "SLOPE" returns the estimate of the slope
• In Excel, "INTERCEPT" returns the estimate of the intercept

Arm Circumference and Height
• Estimate and 95% CI for the mean difference in arm circumference for children 60 cm tall compared to children 50 cm tall:
  $(60 - 50) \times \hat{\beta}_1 = 10 \times 0.16 = 1.6$ cm
• What about the standard error?
  $SE(10 \times \hat{\beta}_1) = 10 \times SE(\hat{\beta}_1) = 10 \times 0.014 = 0.14$
• 95% CI:
  $10 \times \hat{\beta}_1 \pm 1.96 \times SE(10 \times \hat{\beta}_1) = 1.6 \pm 1.96 \times 0.14$

Notes
• For smaller samples, a slight change analogous to what we did with means is required
• The sampling distribution of the regression coefficients is a Student's t distribution with n - 2 degrees of freedom, and approaches the standard normal distribution as the size of the sample increases
• 95% CI for $\beta_1$: $\hat{\beta}_1 \pm t_{.95,\,n-2} \times SE(\hat{\beta}_1)$

Arm Circumference Data Modified
• Suppose instead of 150 children, we have sampled only 21
• P-value for testing the hypotheses:
  $H_0: \beta_1 = 0$
  $H_A: \beta_1 \neq 0$
• Assume the null is true and calculate the standardized "distance" of $\hat{\beta}_1$ from 0:
  $t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{0.16}{0.014} = 11.4$
• The p-value is the probability of being 11.4 or more standard errors away from a mean of 0 on a t(19) distribution
• P < 0.001

Intercept?
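The least squares idea and the large-sample inference for the slope can be sketched together. Two caveats apply. First, the five height and arm circumference pairs below are hypothetical and exist only to exercise the closed-form least squares solution (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x); they are not the Nepali data. Second, the inference part uses only the estimate and standard error reported in the lecture, with a normal multiplier, so it applies to the large-sample case (n = 150) and not to the n = 21 variant. All function names are illustrative.

```python
from statistics import NormalDist, mean


def least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared vertical distances."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sum_squared_distance(xs, ys, intercept, slope):
    """Cumulative squared distance between each y and the line's value at x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def confidence_interval(estimate, se, multiplier):
    return estimate - multiplier * se, estimate + multiplier * se


# Hypothetical heights (cm) and arm circumferences (cm), for illustration only.
heights = [50, 55, 60, 65, 70]
arm_circ = [10.9, 11.3, 12.4, 13.0, 14.0]
b0, b1 = least_squares(heights, arm_circ)

# Estimates reported in the lecture for the sample of 150 children.
SLOPE, SE_SLOPE = 0.16, 0.014
multiplier = NormalDist().inv_cdf(0.975)  # about 1.96

slope_ci = confidence_interval(SLOPE, SE_SLOPE, multiplier)
t_stat = (SLOPE - 0) / SE_SLOPE
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

# A 10 cm height difference scales both the estimate and its SE by 10.
diff_10cm_ci = confidence_interval(10 * SLOPE, 10 * SE_SLOPE, multiplier)

print(f"slope CI: ({slope_ci[0]:.2f}, {slope_ci[1]:.2f}), t = {t_stat:.1f}")
print(f"10 cm difference CI: ({diff_10cm_ci[0]:.2f}, {diff_10cm_ci[1]:.2f}) cm")
```

Nudging either fitted coefficient away from `b0` or `b1` and calling `sum_squared_distance` again gives a larger total, which is what "best fits the data" means here.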
• All the previous examples have confidence intervals for the slope (or multiples of the slope)
• We can also create CIs and p-values for the intercept in the same manner
• However, many times the intercept is just a placeholder and does not describe a useful quantity

Comparing Proportions Between Two Independent Populations

In This Section
• CIs for difference in proportions between two independent populations
• Large sample methods for comparing proportions
  • Normal approximation method
  • Chi-square test
• Fisher's Exact Test
• Relative Risk

Comparing Two Proportions
• Pediatric AIDS Clinical Trial Group (ACTG) Protocol 076 Study Group [1]
• Study design
  • "We conducted a randomized, double-blinded, placebo-controlled trial of the efficacy and safety of zidovudine (AZT) in reducing the risk of maternal-infant HIV transmission"
  • 363 HIV-infected pregnant women were randomized to AZT or placebo

[1] Connor, E., et al. (1994). Reduction of maternal-infant transmission of human immunodeficiency virus type 1 with zidovudine treatment, NEJM, 331:18

Comparing Two Proportions
• Results
  • Of the 180 women randomized to AZT, 13 gave birth to children who tested positive for HIV within 18 months of birth
  • Of the 183 women randomized to placebo, 40 gave birth to children who tested positive for HIV within 18 months of birth

Notes
• Random assignment of treatment
  • Helps ensure the two groups are comparable
  • Patient and physician could not request a particular treatment
• Double-blind
  • Patient and physician did not know the treatment assignment

Observed HIV Transmission Rates
• AZT: $\hat{p}_{AZT} = \frac{13}{180} = 0.07 = 7\%$
• Placebo: $\hat{p}_{PLA} = \frac{40}{183} = 0.22 = 22\%$

Notes
• Is the difference significant, or can it be explained by chance?
• CI on the difference in proportions?
• P-value?
Sampling Distribution of Difference in Sample Proportions
• Since we have large samples, we can be assured that the sampling distributions of the sample proportions in both groups are approximately normal (CLT)
• It turns out that the difference of two quantities which are approximately normally distributed is also approximately normally distributed

Sampling Distribution of Difference in Sample Proportions
• The sampling distribution of the difference of two sample proportions (based on large samples) approximates a normal distribution
• This distribution is centered at the true, unknown population difference $p_1 - p_2$
[Figure: sampling distributions for the AZT group, the placebo group, and the AZT - PLA difference]

95% CIs for Difference in Proportions
• General formula: best estimate ± multiplier × SE(best estimate)
• The best estimate of a population difference in proportions is the difference in sample proportions: $\hat{p}_1 - \hat{p}_2$
• Here, $\hat{p}_1$ may represent the sample proportion of HIV-positive infants among the 180 infants in the AZT group and $\hat{p}_2$ may represent the same for the 183 infants in the placebo group

95% CI: AZT Study
  $(\hat{p}_1 - \hat{p}_2) \pm \text{multiplier} \times SE(\hat{p}_1 - \hat{p}_2)$
• Statisticians have developed formulas for the standard error of the difference
• These formulas depend on both sample sizes and sample proportions
  $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$, estimated by $\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

HIV/AZT Study
• Recall the data:

  Group               AZT     PLA
  N                   180     183
  Sample proportion   0.07    0.22

  $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{0.07 \times 0.93}{180} + \frac{0.22 \times 0.78}{183}} = 0.036$

95% CI: AZT Study
• The 95% CI for the true difference in proportions between the AZT and PLA groups is:
  $-0.15 \pm 1.96 \times SE(\hat{p}_1 - \hat{p}_2) = -0.15 \pm 1.96 \times 0.036 = (-0.22, -0.08)$

Summary Results
• The proportion of infants who tested positive for HIV within 18 months of birth was seven percent (95% CI 4-12%) in the AZT group and twenty-two percent (95% CI 16-28%) in the placebo group
• The study results estimate the absolute decrease in the proportion of HIV-positive infants born to HIV-positive mothers associated with AZT to be as low as 8% and as high as 22%

Two-sample z-test: Getting a p-value
• Are the proportions of infants contracting HIV within 18 months of birth equivalent at the population level for those whose mothers are treated with AZT versus untreated?
  $H_0: p_1 = p_2$
  $H_A: p_1 \neq p_2$
• In other words, is the expected difference in proportions zero?
  $H_0: p_1 - p_2 = 0$
  $H_A: p_1 - p_2 \neq 0$

Hypothesis Test to Compare Two Independent Proportions
• Recipe:
  • Assume $H_0$ is true
  • Measure the distance of our sample result from the null value (here, it's 0)
  • Compare the distance (test statistic) to the appropriate distribution to get the p-value
  $z = \frac{\text{observed difference} - \text{null difference}}{SE(\text{observed difference})} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$

HIV/AZT Study
• Recall: $\hat{p}_1 - \hat{p}_2 = -0.15$ and $SE(\hat{p}_1 - \hat{p}_2) = 0.036$
• So in this study, $z = \frac{-0.15}{0.036} = -4.2$
• This result was 4.2 SEs below the null mean of 0

P-values
• Is a result 4.2 standard errors below 0 unusual?
  • It depends on the distribution we're dealing with
• The p-value is the probability of getting a test statistic as extreme as or more extreme than what we observed (-4.2) by chance, if the null hypothesis were true
• The p-value comes from the sampling distribution of the difference in two sample proportions
• What is the sampling distribution of the difference in sample proportions?
  • If both groups are large, it is approximately normal
  • It is centered at the true difference
  • Under the null, this true difference is 0

HIV/AZT Study
• To compute a p-value, we need to compute the probability of being 4.2 or more SEs away from 0 on a standard normal curve
• If we were to look this up in a normal table, we would find a very small p (p < 0.001)
• This method is essentially equivalent to the chi-square ($\chi^2$) method
  • Gives about the same answer
  • More detail will follow

Displaying 2X2 Data
• Data of this sort can be displayed using a contingency table

The Chi-Square Test
• Testing equality of two population proportions using data from two samples
  $H_0: p_1 = p_2$
  $H_A: p_1 \neq p_2$
• In other words, is the expected difference in proportions zero?
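The confidence interval and z-test for the AZT study can be reproduced from the raw counts. This is a minimal sketch with illustrative names, not code from the lecture. It keeps the proportions unrounded, so the difference is about -0.146 and z is about -4.05, whereas the slides round the proportions to 0.07 and 0.22 first and report -0.15 and -4.2. The confidence interval and the conclusion are the same either way.

```python
import math
from statistics import NormalDist

# Counts from the ACTG 076 results: HIV-positive infants / women randomized.
X_AZT, N_AZT = 13, 180
X_PLA, N_PLA = 40, 183


def compare_proportions(x1, n1, x2, n2, confidence=0.95):
    """Return (difference, SE, CI, z, two-sided p) for two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    multiplier = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
    ci = (diff - multiplier * se, diff + multiplier * se)
    z = (diff - 0) / se  # distance from the null difference of 0, in SEs
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, ci, z, p_value


diff, se, ci, z, p_value = compare_proportions(X_AZT, N_AZT, X_PLA, N_PLA)
print(f"difference = {diff:.3f}, SE = {se:.3f}, z = {z:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), p < 0.001: {p_value < 0.001}")
```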
$H_0: p_1 - p_2 = 0$, $H_A: p_1 - p_2 \neq 0$
In the context of a 2X2 table, this is testing whether there is a relationship between the row variable (HIV status) and the column variable (treatment type)

Chi-Square Test
Pearson's Chi-Square Test ($\chi^2$) can easily be done by hand
Works well for "big" sample sizes; it is an approximate method
Gives essentially the same p as the z-test for comparing two proportions
Unlike the z-test, it can be extended to compare proportions between more than two independent groups in one test

The Chi-Square Method
Looks at discrepancies between observed and expected cell counts (under the null hypothesis) in a 2X2 table
O = observed
E = expected = (row total × column total)/grand total
"Expected" refers to the values for the cell counts that would be expected if the null hypothesis is true (the expected cell counts if the underlying population proportions are equal)

The Chi-Square Method
Recipe:
1. Start by assuming the null hypothesis is true
2. Measure the distance of the sample result from the null value
3. Compare the test statistic (distance) to the appropriate distribution to get the p-value
$\chi^2 = \sum \frac{(O - E)^2}{E}$
The sampling distribution of this statistic when the null is true is a chi-square distribution with one degree of freedom
[Figure: Chi-Square (1) distribution]

Contingency Table
The observed value for cell one is 13
We have to calculate the expected count:
$E = \frac{R \times C}{N} = \frac{53 \times 180}{363} = 26.3$

Expected Values
We could do the same for the other three cells
Now we must compute the "distance" (test statistic), $\chi^2$:
$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(13 - 26.3)^2}{26.3} + \frac{(40 - 26.7)^2}{26.7} + \frac{(167 - 153.7)^2}{153.7} + \frac{(143 - 156.3)^2}{156.3} = 15.6$

Sampling Distribution
P = 0.0001

Extending Chi-Square Test
The chi-square test can be extended to test for differences in proportions across more than two independent populations
Analogous to ANOVA with binary outcomes
Example: Health care indicators by immigrant status (1)
(1) Huang, Z, et al. (2006). Health status and health service access and use among children in US immigrant families, Am J Public Health 96:4.

Next Time
More on Proportions
Fisher's Exact Test
Measures of Association: risk difference, relative risk, odds ratio
Survival Analysis
Study Design Considerations

References and Citations
Lectures modified from notes provided by John McGready and Johns Hopkins Bloomberg School of Public Health, accessible from the World Wide Web: http://ocw.jhsph.edu/courses/introbiostats/schedule.cfm
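
Worked check of the HIV/AZT calculations
As a check on the arithmetic in this lecture, the following minimal Python sketch recomputes the standard error, 95% CI, z statistic, and chi-square statistic for the HIV/AZT study. It is an illustration under stated assumptions, not the course's own software: the function names are hypothetical, the z calculation uses the rounded sample proportions (0.07 and 0.22) quoted in the lecture, and the 2X2 layout (rows = HIV positive/negative, columns = AZT/placebo) is inferred from the observed counts in the chi-square sum. Only the Python standard library is used.

```python
import math


def two_proportion_z(p1, n1, p2, n2, multiplier=1.96):
    """Difference in proportions with unpooled SE, CI, z statistic, two-sided p-value."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - multiplier * se, diff + multiplier * se)
    z = diff / se  # the null difference is 0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail area
    return diff, se, ci, z, p_value


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E = (row total * column total) / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    # With 1 df, P(chi-square > x) equals the two-sided normal tail beyond sqrt(x).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rounded sample proportions and group sizes as quoted in the lecture.
diff, se, ci, z, p_z = two_proportion_z(0.07, 180, 0.22, 183)
print(f"difference = {diff:.2f}, SE = {se:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), z = {z:.1f}, p = {p_z:.5f}")

# Assumed layout: rows = HIV positive / HIV negative, columns = AZT / placebo.
observed_counts = [[13, 40], [167, 143]]
chi2, p_chi2 = chi_square_2x2(observed_counts)
print(f"chi-square = {chi2:.1f}, p = {p_chi2:.4f}")
```

Because the sketch keeps unrounded expected counts, its chi-square value can differ slightly in the second decimal place from the hand calculation, which rounds each expected count to one decimal.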