Introduction to sample size and power calculations

How much chance do we have to reject the null hypothesis when the alternative is in fact true? That is, what is the probability of detecting a real effect? Can we quantify how much power we have for given sample sizes?

Study 1: 263 cases, 1241 controls
Null distribution: difference = 0, with a standard error for the difference of 3.3.
Rejection region: any value >= 6.5 (= 0 + 3.3 × 1.96). For a 5% significance level, each one-tail area is 2.5% (Z_{α/2} = 1.96).
Clinically relevant alternative: difference = 10%.
Power = the chance of being in the rejection region if the alternative is true = the area to the right of the critical value under the alternative distribution:

  P(Z > (6.5 − 10)/3.3) = P(Z > −1.06) ≈ 85%

Study 1, smaller sample: 50 cases, 50 controls
Standard error of the difference = 10, so the critical value = 0 + 10 × 1.96 ≈ 20 (again Z_{α/2} = 1.96, 2.5% in each tail). Power is closer to 15% now.

Study 2: 18 treated, 72 controls, standard deviation = 2
Critical value = 0 + 0.52 × 1.96 ≈ 1. Clinically relevant alternative: difference = 4 points. Power is nearly 100%!

Study 2: 18 treated, 72 controls, standard deviation = 10
Critical value = 0 + 2.58 × 1.96 ≈ 5. Power is about 40%.

Study 2: 18 treated, 72 controls, effect size = 1.0
Critical value = 0 + 0.52 × 1.96 ≈ 1. Clinically relevant alternative: difference = 1 point. Power is about 50%.

Factors Affecting Power
1. Size of the effect: a bigger difference from the null mean gives more power.
2. Standard deviation of the characteristic: a bigger standard deviation gives less power.
3. Sample size: a bigger sample gives more power.
4. Significance level desired: a higher significance level gives more power.

[The original slides illustrate each factor with pairs of sampling distributions of average weight from samples of 100, showing the null, the clinically relevant alternative, and the rejection region.]
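These single-study power calculations (the area to the right of the critical value under the alternative distribution) can be sketched with the Python standard library. The function name and interface here are my own, not from the slides:

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def power_two_sided(se, alt_diff, alpha=0.05):
    """Approximate power of a two-sided Z test for a difference:
    the area beyond the upper critical value under the alternative
    distribution (the lower rejection tail is negligible here,
    as in the slides)."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)     # 1.96 for alpha = .05
    critical_value = 0 + z_crit * se       # upper edge of rejection region
    z = (critical_value - alt_diff) / se   # standardize under the alternative
    return 1 - _Z.cdf(z)                   # area to the right of z

# Study 1: s.e. = 3.3, alternative difference = 10
print(round(power_two_sided(3.3, 10), 2))   # 0.86 (slides: ~85%)
# Study 2, effect size 1.0: s.e. = 0.52, alternative difference = 1
print(round(power_two_sided(0.52, 1), 2))   # 0.49 (slides: ~50%)
```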
Sample size calculations

Based on these elements, you can write a formal mathematical equation that relates power, sample size, effect size, standard deviation, and significance level. **WE WILL DERIVE THESE FORMULAS FORMALLY SHORTLY**

Simple formula for a difference in means:

  n = 2σ²(Z_power + Z_{α/2})² / difference²

where:
  n = sample size in each group (assumes equal-sized groups)
  σ = standard deviation of the outcome variable
  difference = effect size (the difference in means)
  Z_power represents the desired power (typically .84 for 80% power)
  Z_{α/2} represents the desired level of statistical significance (typically 1.96)

Simple formula for a difference in proportions:

  n = 2 p̄(1 − p̄)(Z_power + Z_{α/2})² / (p1 − p2)²

where:
  n = sample size in each group (assumes equal-sized groups)
  p̄(1 − p̄) = a measure of variability (similar to standard deviation)
  p1 − p2 = effect size (the difference in proportions)
  Z_power represents the desired power (typically .84 for 80% power)
  Z_{α/2} represents the desired level of statistical significance (typically 1.96)

Derivation of sample size formula…

Recall Study 2: 18 treated, 72 controls, effect size = 1.0; critical value = 0 + 0.52 × 1.96 ≈ 1; power was close to 50%.

SAMPLE SIZE AND POWER FORMULAS

Critical value = 0 + standard error(difference) × Z_{α/2}

Power = area to the right of

  Z = (critical value − alternative difference) / standard error(diff)

e.g., here: Z = (1 − 1)/0.52 = 0, so power = 50%.

Substituting the critical value:

  Z = (Z_{α/2} × standard error(diff) − difference) / standard error(diff)
    = Z_{α/2} − difference/standard error(diff)

Power is the area to the right of Z, OR power is the area to the left of −Z:

  −Z = difference/standard error(diff) − Z_{α/2}

Since normal charts give us the area to the left by convention, we need to use −Z to get the correct value. Most textbooks just call this "Z_β"; I'll use the term Z_power to avoid confusion.
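The two "simple formulas" translate directly into code; a minimal sketch (function names are mine), using exact normal quantiles rather than the rounded .84 and 1.96, so answers can come out one subject higher than the slides' arithmetic:

```python
from math import ceil
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def n_per_group_means(sigma, difference, power=0.80, alpha=0.05):
    """Per-group n for a difference in means (equal groups):
    n = 2*sigma^2*(Z_power + Z_alpha/2)^2 / difference^2."""
    z = _Z.inv_cdf(power) + _Z.inv_cdf(1 - alpha / 2)  # ~ .84 + 1.96
    return ceil(2 * sigma**2 * z**2 / difference**2)

def n_per_group_props(p_bar, difference, power=0.80, alpha=0.05):
    """Per-group n for a difference in proportions (equal groups):
    n = 2*p(1-p)*(Z_power + Z_alpha/2)^2 / (p1-p2)^2."""
    z = _Z.inv_cdf(power) + _Z.inv_cdf(1 - alpha / 2)
    return ceil(2 * p_bar * (1 - p_bar) * z**2 / difference**2)

print(n_per_group_means(10, 3))      # 175 (slides' rounded arithmetic: 174)
print(n_per_group_props(0.5, 0.10))  # 393 (slides' rounded arithmetic: 392)
```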
So power = the area to the left of Z_power = the area to the right of −Z_power.

All-purpose power formula…

  Z_power = difference/standard error(difference) − Z_{α/2}

Derivation of a sample size formula…

Sample size is embedded in the standard error:

  s.e.(diff) = √(σ²/n1 + σ²/n2)

If the ratio of group 2 to group 1 is r (so n2 = r·n1):

  s.e.(diff) = √(σ²/n1 + σ²/(r·n1)) = √(σ²(r + 1)/(r·n1))

Algebra…

  Z_power = difference/√(σ²(r + 1)/(r·n1)) − Z_{α/2}
  Z_power + Z_{α/2} = difference/√(σ²(r + 1)/(r·n1))
  (Z_power + Z_{α/2})² = r·n1·difference² / (σ²(r + 1))
  n1 = (r + 1)σ²(Z_power + Z_{α/2})² / (r·difference²)

If r = 1 (equal groups), then

  n = 2σ²(Z_power + Z_{α/2})² / difference²

Sample size formula for a difference in means:

  n1 = [(r + 1)/r] σ²(Z_power + Z_{α/2})² / difference²

where:
  n1 = size of smaller group
  r = ratio of larger group to smaller group
  σ = standard deviation of the characteristic
  difference = clinically meaningful difference in means of the outcome
  Z_power corresponds to power (.84 for 80% power)
  Z_{α/2} corresponds to the two-tailed significance level (1.96 for α = .05)

Examples

Example 1: You want to calculate how much power you will have to see a difference of 3.0 IQ points between two groups: 30 male doctors and 30 female doctors. If you expect the standard deviation to be about 10 on an IQ test for both groups, then the standard error for the difference will be about:

  √(10²/30 + 10²/30) ≈ 2.58

Power formula…

  Z_power = d*/s.e.(d*) − Z_{α/2} = 3/2.58 − 1.96 ≈ −0.79

P(Z ≤ −0.79) = .21; only 21% power to see a difference of 3 IQ points.

Example 2: How many people would you need to sample in each group to achieve power of 80% (corresponds to Z_power = .84)?

  n = 2σ²(Z_power + Z_{α/2})² / (d*)² = 2(100)(.84 + 1.96)² / (3)² ≈ 174

174/group; 348 altogether.

Sample size needed for comparing two proportions. Example: I am going to run a case-control study to determine if pancreatic cancer is linked to drinking coffee.
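Example 1 can be checked numerically with the all-purpose power formula; a standard-library sketch (function name is mine):

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def power_two_means(sigma, n1, n2, difference, alpha=0.05):
    """All-purpose power formula for a difference in means:
    Z_power = difference / s.e.(diff) - Z_alpha/2,
    and power is the area to the left of Z_power."""
    se = sqrt(sigma**2 / n1 + sigma**2 / n2)  # s.e. of the difference
    z_power = difference / se - _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(z_power)

# Example 1: 30 male vs. 30 female doctors, sigma = 10, difference = 3
print(round(power_two_means(10, 30, 30, 3), 2))  # 0.21
```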
If I want 80% power to detect a 10% difference in the proportion of coffee drinkers among cases vs. controls (if coffee drinking and pancreatic cancer are linked, we would expect that a higher proportion of cases would be coffee drinkers than controls), how many cases and controls should I sample? About half the population drinks coffee.

Derivation of a sample size formula: the standard error of the difference of two proportions is:

  √(p1(1 − p1)/n1 + p2(1 − p2)/n2)

Here, if we assume equal sample sizes and that, under the null hypothesis, the proportion of coffee drinkers is .5 in both cases and controls, then:

  s.e.(diff) = √(.5(1 − .5)/n + .5(1 − .5)/n) = √(.5/n)

  Z_power = test statistic/s.e.(test statistic) − Z_{α/2} = .10/√(.5/n) − 1.96

For 80% power…

  .84 = .10/√(.5/n) − 1.96
  .84 + 1.96 = .10/√(.5/n)

(There is 80% area to the left of a Z-score of .84 on a standard normal curve; therefore, there is 80% area to the right of −.84.)

  (.10)² n = (.84 + 1.96)²(.5)
  n = .5(.84 + 1.96)²/(.10)² = 392

It would take 392 cases and 392 controls to have 80% power! Total = 784.

Question 2: How many total cases and controls would I have to sample to get 80% power for the same study, if I sample 2 controls for every case? Ask yourself: what changes here? The standard error of the difference, because the group sizes differ:

  s.e.(diff) = √(p(1 − p)/(2n) + p(1 − p)/n) = √(.25/(2n) + .25/n) = √(.25/(2n) + .5/(2n)) = √(.75/(2n))

  .84 = .10/√(.75/(2n)) − 1.96
  .84 + 1.96 = .10/√(.75/(2n))
  (.10)²(2n) = (.84 + 1.96)²(.75)
  n = .75(.84 + 1.96)²/(2 × .10²) = 294

Need: 294 cases and 2 × 294 = 588 controls. 882 total.

Note: you get the best power for the lowest sample size if you keep both groups equal (882 > 784). You would only want to make the groups unequal if there were an obvious difference in the cost or ease of collecting data on one group. E.g., cases of pancreatic cancer are rare and take time to find.
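Both answers follow from one unequal-allocation formula for proportions; a sketch with my own function name, using exact quantiles (hence 393/295 rather than the slides' rounded 392/294):

```python
from math import ceil
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def n_smaller_group_props(p_bar, difference, r=1.0, power=0.80, alpha=0.05):
    """Smaller-group size for comparing two proportions when the larger
    group is r times the smaller:
    n = (r+1)/r * p(1-p) * (Z_power + Z_alpha/2)^2 / (p1-p2)^2."""
    z = _Z.inv_cdf(power) + _Z.inv_cdf(1 - alpha / 2)
    return ceil((r + 1) / r * p_bar * (1 - p_bar) * z**2 / difference**2)

equal = n_smaller_group_props(0.5, 0.10)         # 1:1 cases to controls
unequal = n_smaller_group_props(0.5, 0.10, r=2)  # 2 controls per case
# Totals confirm that equal allocation needs fewer subjects overall
print(2 * equal, 3 * unequal)  # 786 885
```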
General sample size formula

With unequal groups (ratio r), the standard error for a difference in proportions is:

  s.e.(diff) = √(p(1 − p)/(r·n) + p(1 − p)/n) = √((p(1 − p) + r·p(1 − p))/(r·n)) = √((r + 1)p(1 − p)/(r·n))

General sample size needs when the outcome is binary:

  n = [(r + 1)/r] p̄(1 − p̄)(Z_power + Z_{α/2})² / (p1 − p2)²

where:
  n = size of smaller group
  r = ratio of larger group to smaller group
  p1 − p2 = clinically meaningful difference in proportions of the outcome
  Z_power corresponds to power (.84 for 80% power)
  Z_{α/2} corresponds to the two-tailed significance level (1.96 for α = .05)

Compare with when the outcome is continuous:

  n1 = [(r + 1)/r] σ²(Z_power + Z_{α/2})² / difference²

where:
  n1 = size of smaller group
  r = ratio of larger group to smaller group
  σ = standard deviation of the characteristic
  difference = clinically meaningful difference in means of the outcome
  Z_power corresponds to power (.84 for 80% power)
  Z_{α/2} corresponds to the two-tailed significance level (1.96 for α = .05)

Question: How many subjects would we need to sample to have 80% power to detect an average increase in MCAT biology score of 1 point, if the average change without instruction (just due to chance) is plus or minus 3 points (= the standard deviation of the change)?

Standard error here = σ_change/√n = 3/√n

  Z_power = D̄/s.e.(D̄) − Z_{α/2}

where D = change from test 1 to test 2.
Solving for n:

  Z_power + Z_{α/2} = D̄√n/σ_D
  (Z_power + Z_{α/2})² = n·D̄²/σ_D²
  n = σ_D²(Z_power + Z_{α/2})²/D̄²

Therefore, we need (9)(1.96 + .84)²/1 ≈ 70 people total.

Sample size for paired data:

  n = σ_d²(Z_power + Z_{α/2})² / difference²

where:
  n = sample size
  σ_d = standard deviation of the within-pair difference
  difference = clinically meaningful difference
  Z_power corresponds to power (.84 for 80% power)
  Z_{α/2} corresponds to the two-tailed significance level (1.96 for α = .05)

Paired data, difference in proportions; sample size:

  n = p̄(1 − p̄)(Z_power + Z_{α/2})² / (p1 − p2)²

where:
  n = sample size for one group
  p1 − p2 = clinically meaningful difference in dependent proportions
  Z_power corresponds to power (.84 for 80% power)
  Z_{α/2} corresponds to the two-tailed significance level (1.96 for α = .05)
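The paired-data formula in code (a sketch, name mine). With exact quantiles the arithmetic gives 70.64, so rounding up yields 71 rather than the slides' 70 (which truncates 70.56):

```python
from math import ceil
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def n_paired(sigma_d, difference, power=0.80, alpha=0.05):
    """Paired-data sample size:
    n = sigma_d^2 * (Z_power + Z_alpha/2)^2 / difference^2."""
    z = _Z.inv_cdf(power) + _Z.inv_cdf(1 - alpha / 2)
    return ceil(sigma_d**2 * z**2 / difference**2)

# MCAT question: detect a 1-point average change, sigma of change = 3
print(n_paired(3, 1))  # 71
```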