Handy Reference II HANDY REFERENCE SHEET 2 – HRP 259 Calculation Formula’s for Sample Data: Univariate: n i 1 Sample proportion: pˆ 1 if success xi 0 if failure n n x Sample mean: x = i i 1 n n Sum of squares of x: SS x ( xi x ) 2 [to ease computation: SS x i 1 n (x SS x Sample variance: s x2 = = n 1 i n x 2 i nx 2 ] i 1 x)2 i 1 n 1 n SS x Sample standard deviation: s x = = n 1 (x x)2 i i 1 n 1 n (x x)2 i i 1 sx Standard error of the sample mean: n 1 n = n 2. Bivariate n Sum of squares of xy: SS xy ( xi x )( y i y ) [to ease computation: SS xy i 1 n Sample Covariance: 2 s xy = SS xy n 1 (x = i Sample Correlation: rˆ s x2 s 2y = x y i i nx y ] i 1 x )( y i y ) i 1 n 1 n 2 s xy n SS xy SS x SS y (x i x )( y i y ) i 1 n i 1 ( xi x ) 2 n (y i y) 2 i 1 Variance rules for correlated random variables: Var (x+y)=Var(x)+Var(y)+2Cov(x,y); Var (x-y)=Var(x)+Var(y)-2Cov(x,y) vii Handy Reference II Hypothesis Testing The Steps: 1. Define your hypotheses (null, alternative) 2. Specify your null distribution 3. Do an experiment 4. Calculate the p-value of what you observed 5. Reject or fail to reject (~accept) the null hypothesis The Errors Your Statistical Decision Reject H0 Do not reject H0 True state of null hypothesis H0 True H0 False Type I error () Correct Correct Type II Error ( ) Power=1- viii Handy Reference II Confidence intervals (estimation) For a mean (σ2 unknown): x t n 1, / 2 sx [if variance known or large sample size t df , / 2 n Z / 2 ] For a paired difference (σ2 unknown): d t n 1, / 2 sd [where n di = the within-pair difference] For a difference in means, 2 independent samples (σ2’s unknown but roughly equal): ( x y ) t n 2, / 2 s 2p nx s 2p ny s 2p = SS x SS y or n2 (n x 1) s x2 (n y 1) s 2y n2 For a proportion: ( pˆ )(1 pˆ ) n pˆ Z / 2 For a difference in proportions, 2 independent samples: ( pˆ 1 pˆ 2 ) Z / 2 ( pˆ 1 )(1 pˆ 1 ) ( pˆ 2 )(1 pˆ 2 ) n1 n2 For a correlation coefficient rˆ t n 2, / 2 * 1 rˆ 2 n2 For a regression coefficient: n ˆ t n 2, / 2 s2 * SS x Common values of t and Z Confidence t10, / 2 level 90% 1.81 95% 2.23 99% 3.17 [ ˆ SS xy SS x ;s2 (y i yˆ i ) 2 i 1 n2 t 20, / 2 t 30, / 2 t 50, / 2 t100, / 2 Z / 2 1.73 2.09 2.85 1.70 2.04 2.75 1.68 2.01 2.68 1.66 1.98 2.63 1.64 1.96 2.58 ] ix Handy Reference II For an odds ratio: 95% confidence limits: OR * exp 1 1 1 1 1.96 a b c d , OR * exp 1 1 1 1 1.96 a b c d For a risk ratio: 95% confidence limits: RR * exp 1 a /( a b ) 1c /(c d ) 1.96 a c , RR * exp 1 a /(a b ) 1c /(c d ) 1.96 a c x Handy Reference II Corresponding hypothesis tests Test for Ho: μ= μo (σ2 unknown): t n 1 x 0 sx n Test for Ho: μd = 0 (σ2 unknown): t n 1 d 0 sd n Test for Ho: μx- μy = 0 (σ2 unknown, but roughly equal): t n2 (x y) 0 s 2p nx s 2p ny Test for Ho: p = po: Z pˆ p 0 ( p 0 )(1 p 0 ) n Test for Ho: p1- p2= 0: Z ( pˆ 1 pˆ 2 ) 0 ( p )(1 p ) ( p )(1 p ) n1 n2 ;p n1 pˆ 1 n2 pˆ 2 n1 n2 Test for Ho: r = 0: t n2 rˆ 0 1 rˆ 2 n2 Test for: Ho: β = 0 t n2 ˆ 0 s2 SS x xi Handy Reference II Corresponding sample size/power Sample size required to test Ho: μd = 0 (paired difference ttest): n d2 ( Z power Z / 2 ) 2 d 2 Corresponding power for a given n: Z power d d n Z / 2 Smaller group sample size required to test Ho: μx – μy = 0 (two sample ttest): (where r=ratio of larger group to smaller group) n smaller 2 2 (r 1) ( Z power Z / 2 ) r ( x y ) 2 Corresponding power for a given n: Z power x y nr Z / 2 r 1 Smaller group sample size required to test Ho: p1 – p2 = 0 (difference in two proportions): (where r=ratio of larger group to smaller group) 2 (r 1) p (1 p )(Z power Z / 2 ) n smaller r ( p1 p 2 ) 2 Corresponding power for a given n: Z power p1 p 2 p (1 p ) nr Z / 2 r 1 Sample size required to test Ho: r = 0 (correlation/equivalent to simple linear regression): (where r=ratio of larger group to smaller group) n (1 r ) 2 ( Z power Z / 2 ) 2 r2 2 Corresponding power for a given n: Z power r 1 r2 Common values of Zpower Zpower: .25 Power: 60% n 2 Z / 2 .52 70% .84 80% 1.28 90% 1.64 95% 2.33 99% xii Handy Reference II Linear regression Source of variation Model Sum of squares d.f. k-1 SSM n (y i 1 (k levels of X) Error k N-k SSE y) 2 SSM F-statistic SSM k 1 SSE N (y i i Mean Sum of Squares yˆ i ) 2 ij s 2 SSE p-value Go to k 1 Fk-1,N-k chart N k N k j 1 Total variation N-1 n TSS= SS y ( y i y ) 2 i 1 Assumptions of Linear Regression Linear regression assumes that… 1. The relationship between X and Y is linear 2. Y is distributed normally at each value of X 3. The variance of Y at every value of X is the same (homogeneity of variances) ANOVA TABLE Source of variation Between d.f. k-1 Sum of squares SSB n (y i y)2 i 1 (k groups) Within k nk-k SSW k Mean Sum of Squares SSB k 1 F-statistic SSB SSW n ( y ij y i ) 2 s 2 SSW k 1 nk k p-value Go to Fk-1,nk-k chart nk k i 1 j 1 Total variation nk-1 k n TSS= SS y ( y ij y ) 2 i 1 j 1 Coefficient of Determination: r 2 R 2 variation explained by the predictor SSB 1 SSW = total variation in the outcome TSS TSS xiii Handy Reference II ANOVA TABLE FOR linear regression (more general) case Coefficient of Determination: r 2 R 2 variation explained by the predictor total variation in the outcome SSM 1 SSE TSS TSS xiv Handy Reference II Probability distributions often used in statistics: T-distribution Given n independent observations x i , t x s/ n The Chi-Square Distribution n n Z 2 ; where Z~ Normal(0,1) i 1 E(χn) = n Var(χn) = 2n The F- Distribution n Fn,m= m n m xv Handy Reference II Summary of common statistical tests for epidemiology/clinical research: Choice of appropriate statistical test or measure of association for various types of data by study design. Types of variables to be analyzed Predictor (independent) variable/s Outcome (dependent) variable Statistical procedure or measure of association Cross-sectional/case-control studies Binary Continuous T-test* Categorical Continuous ANOVA* Continuous Continuous Simple linear regression Multivariate (categorical and continuous) Continuous Multiple linear regression Categorical Categorical Chi-square test§ Binary Binary Odds ratio, Mantel-Haenszel OR Multivariate (categorical and continuous) Binary Logistic regression Cohort Studies/Clinical Trials Binary Binary Relative risk Categorical Time-to-event Kaplan-Meier curve/ log-rank test Multivariate (categorical and continuous) Time-to-event Cox-proportional hazards model Categorical Continuous—repeated Repeated-measures ANOVA Multivariate (categorical and continuous) Continuous—repeated Mixed models for repeated measures *Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small. § Fisher’s exact test is used when the expected cells contain less than 5 subjects. 16 Handy Reference II Course coverage in the HRP statistics sequence: Choice of appropriate statistical test or measure of association for various types of data by study design. Types of variables to be analyzed Predictor (independent) variable/s Outcome (dependent) variable Statistical procedure or measure of association Cross-sectional/case-control studies Binary Continuous T-test* Categorical Continuous ANOVA* Continuous Continuous Simple linear regression Multivariate (categorical and continuous) Continuous Multiple linear regression Categorical Categorical Chi-square test§ Binary Binary Odds ratio, Mantel-Haenszel OR Multivariate (categorical and continuous) Binary HRP259 Logistic regression HRP261 Cohort Studies/Clinical Trials Binary Binary Risk ratio Categorical Time-to-event Kaplan-Meier curve/ log-rank test Multivariate (categorical and continuous) Time-to-event Cox-proportional hazards model (hazard ratios) Categorical Continuous—repeated Repeated-measures ANOVA Multivariate (categorical and continuous) Continuous—repeated Mixed models for repeated measures HRP262 *Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small. § Fisher’s exact test is used when the expected cells contain less than 5 subjects. 17 Handy Reference II Corresponding SAS PROCs: Choice of appropriate statistical test or measure of association for various types of data by study design. Types of variables to be analyzed Statistical procedure or measure of association Predictor SAS PROC Outcome Cross-sectional/case-control studies Binary Continuous T-test* PROC TTEST Categorical Continuous ANOVA* PROC ANOVA Continuous Continuous Simple linear regression PROC REG Multivariate (categorical /continuous) Continuous Categorical Categorical Chi-square test§ PROC FREQ Binary Binary Odds ratio, Mantel-Haenszel OR PROC FREQ Multivariate (categorical/ continuous) Binary Logistic regression PROC LOGISTIC Multiple linear regression PROC GLM Cohort Studies/Clinical Trials Binary Binary Risk ratio PROC FREQ Categorical Time-to-event Kaplan-Meier curve/ log-rank test PROC LIFETEST Cox-proportional hazards model (hazard ratios) PROC PHREG Multivariate (categorical and Time-to-event continuous) Categorical Continuous— repeated Multivariate Continuous— (categorical and repeated continuous) Repeated-measures ANOVA PROC GLM Mixed models for repeated measures PROC MIXED *Non-parametric equivalents: PROC NPAR1WAY; §Fisher’s exact test: PROC FREQ, option: exact 18