STATISTICS
Analysis of Variance
  Review
  Preview: ANOVA
    F test
    One-way ANOVA
    Multiple comparison
    Two-way ANOVA

Review: Standard normal distribution
  Z value: (Observed - Expected) in terms of UNITS of SD
  $x \sim N(\mu, \sigma^2)$
  $Z = \dfrac{x - \mu}{\sigma}$

Review: Central Limit Theorem
  For large n, $\bar{x} \sim N(\mu, \sigma^2 / n)$
  The beauty of CLT: easy to calculate
  The ugliness of CLT: hard to explain

Review: Sampling distribution of $(\bar{x}_1 - \bar{x}_2)$
  $\bar{x}_1 \sim N(\mu_1, \sigma_1^2 / n_1)$
  $\bar{x}_2 \sim N(\mu_2, \sigma_2^2 / n_2)$
  $\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$
  $\sigma^2_{(\bar{x}_1 - \bar{x}_2)} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2}$

Review: Population & Sampling Distribution
  Quantity            | Population parameters known                            | Population parameters unknown
  Mean                | $\mu = \sum x_i / N$                                   | $\bar{x} = \sum x_i / n$
  SD                  | $\sigma = \sqrt{\sum (x_i - \mu)^2 / N}$               | $S = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$
  SE of $\bar{x}$     | $SE_{\bar{x}} = \sigma / \sqrt{n}$                     | $SE_{\bar{x}} = S / \sqrt{n}$
  Score for $x_i$     | $z = (x_i - \mu) / \sigma$                             | $t = (x_i - \bar{x}) / S$
  Score for $\bar{x}$ | $Z_{\bar{x}} = (\bar{x} - \mu) / (\sigma / \sqrt{n})$  | $t_{\bar{x}} = (\bar{x} - \mu) / (S / \sqrt{n})$
  Please add yourself: the same quantities for $(\bar{x}_1 - \bar{x}_2)$

Review: Flowchart of 2G MD test (two-group mean difference test)
  Abbreviations: 1-s t = one-sample t test; 2-s t = two-sample t test; ND = normally distributed; WRS = Wilcoxon rank-sum; WSR = Wilcoxon signed-rank
  Rule: if Yes, go up; if No, go down
  No. of groups
    1 group
      N > 30? Yes: 1-s t
      ND? Yes: 1-s t
      No: 1. transform for t; 2. sign test
    2 groups, independent
      N > 30? Yes: 2-s t
      ND? Yes: equal variance? Yes: 2-s t
               No: equal N? Yes: 2-s t
                   No: 1. transform for t; 2. WRS test
      No: 1. transform for t; 2. WRS test
    2 groups, paired
      N > 30? Yes: paired t
      ND? Yes: paired t
      No: 1. transform for t; 2. WSR test

ANOVA
Analysis of Variance
  The logic of ANOVA
  Partition of sum of squares
  F test
  One-way ANOVA
  Multiple comparison
  Two-way ANOVA
  Interaction and confounding

Eyeball test for 3-sample means
  [Figure: panels A and B, each showing the means of groups 1, 2, 3 with 95% confidence limits]
  Using 95% confidence limits
    A: non-significant
    B: significant
  Why?
    Between-group variation
    Within-group variation
  Why not do the 2-s t test 3 times?
    Alpha error is inflated
    Ex: 7 groups need 21 pairwise MD comparisons; 1/21 < 0.05, so about one of them is expected to come out "significant" by chance alone!
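The alpha-inflation argument on the eyeball-test slide can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up here, and the familywise error formula assumes the comparisons are independent, which pairwise tests that share groups are not, so the result is a rough guide rather than an exact rate.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct pairwise comparisons among k group means."""
    return comb(k, 2)


def familywise_error(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive in m tests at level alpha.

    Assumption: the m tests are independent (only approximately true
    for pairwise comparisons, which reuse the same groups).
    """
    return 1.0 - (1.0 - alpha) ** m


for k in (3, 7):
    m = n_pairwise(k)
    print(f"{k} groups: {m} comparisons, "
          f"familywise error ~ {familywise_error(m):.2f}")
```

Under that independence assumption, 7 groups give 21 comparisons and a chance of roughly 0.66 of at least one false positive, which is why a single overall F test is done first.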
Data sheet: k groups MD comparison
  Tx | Subject | Observed | Group mean | Group effect | Grand mean | Error | Total difference
  A  | 1       | X1       | Ma         | Ma-M         | M          | X1-Ma | X1-M
     | 2       | X2       |            |              |            | X2-Ma | X2-M
     | 3       | X3       |            |              |            | X3-Ma | X3-M
  B  | 4       | X4       | Mb         | Mb-M         |            | …     | …
     | 5       | X5       |            |              |            | …     | …
  …  | …       | …        | …          | …            |            | …     | …
  K  | n       | Xn       | Mk         | Mk-M         |            | …     | …

The Logic of one-way ANOVA
  Total difference is divided into two parts:
    (observed - group mean) + (group mean - grand mean)
    $X_{ij} - \bar{X} = (X_{ij} - \bar{X}_{.j}) + (\bar{X}_{.j} - \bar{X})$
  Total sum of squares is divided into two parts:
    SS Total = SS Between + SS Within (or Error)
    SST = SSB + SSE
    $\sum_j \sum_i (X_{ij} - \bar{X})^2 = \sum_j \sum_i [(X_{ij} - \bar{X}_{.j}) + (\bar{X}_{.j} - \bar{X})]^2 = \sum_j \sum_i (X_{ij} - \bar{X}_{.j})^2 + \sum_j \sum_i (\bar{X}_{.j} - \bar{X})^2$
    The first sum on the right is SSE and the second is SSB; the cross-product term sums to zero.
  Model of one-way ANOVA:
    $X_{ij} = \mu + \alpha_j + e_{ij}$
  [Figure: partition of total difference (TD) and total sum of squares (TSS) for groups A, B, C]

Assumptions in ANOVA
  Normal distribution: Y values in each group
    Not very important, especially for large n
    If not ND and n is small: Kruskal-Wallis nonparametric test
  Equal variance: homogeneity
    If not: data transformation, or ask for help
  Random & independent sample

F test: variance ratio test
  Review: F test for equal variance in the 2-s t test
    F test: F = V1/V2
    The larger V is divided by the smaller V
    If the two variances are about equal, the ratio is about 1
    The critical value of the F distribution depends on the DFs
  ANOVA for mean difference, k groups
    Null hypothesis: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
    F = variance between / variance within
    If F is about 1, the grouping is meaningless

F test: named after Fisher
  Characteristics
    A sickly child with poor eyesight
    His teacher taught him without paper or pencil
    Very strong intuition for geometry
    Mathematicians took years to prove his formulas
  Persistence
    Calculating the ANOVA tables took Fisher 8 months at 8 h/day to finish!!
  Reference: Salsburg, The Lady Tasting Tea, 2001 (Chinese edition: 「統計,改變了世界」 [Statistics Changed the World], 天下, 2001)
  Sir Ronald Aylmer Fisher, 1890-1962

One-way ANOVA

One-way ANOVA table
  Source of variation   | SS  | DF  | Mean SS | F ratio
  Between k groups      | SSB | k-1 | MSB     | MSB/MSE
  Error (within groups) | SSE | n-k | MSE     |
  Total                 | SST | n-1 |         |
  F test: $F = \dfrac{MS_B}{MS_E} = \dfrac{SSB / (k - 1)}{SSE / (n - k)}$

Multiple Comparison
  Definition:
    Contrast between 2 means: $\mu_1 - \mu_2$
    More than 2 means is OK: $[(\mu_1 + \mu_2)/2] - \mu_c$
      Compares the overall effect of the drug with that of placebo
    Contrast coefficients: add to 0
  Orthogonal
    Two contrasts are orthogonal if they don't use the same information
    Ex: $(\mu_1 - \mu_2)$ and $(\mu_3 - \mu_4)$, i.e. the questions asked are INDEPENDENT
  Types of MC: before or after ANOVA
    A priori (planned) comparisons
    Post hoc (a posteriori) comparisons

Example 1: one-way ANOVA
  Research problem:
    Life events, depressive symptoms, and immune function. Irwin M. Am J Psychiatry, 1987; 144:437-441
  Subjects: women whose husbands
    were being treated for lung Ca.
    died of lung Ca. in the preceding 1-6 months
    were in good health
  X: grouping by scores for major life events
    Measurement: Social Readjustment Rating Scale score
  Y: immune system function
    NK cell activity: lytic units

Printout: Box plot & Error bar plot
  [Figure: error bar plot and box plot of CELL by GROUP (1, 2, 3)]

Printout: ANOVA table
  Analysis of Variance Table
  Source Term      | DF | Sum of Squares | Mean Square | F-Ratio | Prob Level | Power (Alpha=0.05)
  A: GROUP         | 2  | 4654.156       | 2327.078    | 8.35    | 0.001125*  | 0.947488
  S(A)             | 34 | 9479.396       | 278.8058    |         |            |
  Total (Adjusted) | 36 | 14133.55       |             |         |            |
  Total            | 37 |                |             |         |            |

Printout: Nonparametric ANOVA
  Kruskal-Wallis One-Way ANOVA on Ranks
  Test Results
  Method                 | DF | Chi-Sq (H) | Prob. Level | Decision (0.05)
  Not Corrected for Ties | 2  | 11.16963   | 0.003754    | Reject Ho
  Corrected for Ties     | 2  | 11.17095   | 0.003752    | Reject Ho
  Group Detail
  Group | Count | Sum of Ranks | Mean Rank | Z-Value | Median
  1     | 13    | 351.00       | 27.00     | 3.3087  | 37
  2     | 12    | 163.50       | 13.63     | -2.0927 | 14.5
  3     | 12    | 188.50       | 15.71     | -1.2815 | 14.05

MC: A priori comparisons
  t test for orthogonal comparisons
    t statistic: $t = \dfrac{\bar{x}_i - \bar{x}_j}{\sqrt{2\,MS_E / n}}$; uses MSE, not the pooled SD
    DF: $(n_1 + n_2 - j)$; $n = n_1 = n_2$
    Adjusting $\alpha$ downward: $\alpha$ / (number of comparisons)
    Ex: 4 comparisons, $\alpha = 0.05/4 = 0.0125$
  Bonferroni t procedure
    Applicable to both orthogonal & non-orthogonal comparisons
    t statistic: multiplier $\times \sqrt{2\,MS_E / n}$
    Multiplier table: indexed by no. of comparisons & DF for MSE
    Able to find CI for mean difference

MC: Posteriori comparisons
  Tukey's HSD (honestly significant difference)
    $HSD = \text{multiplier} \times \sqrt{MS_E / n}$
    Like Bonferroni, an HSD multiplier table is needed (P176, table 7-7)
    Able to find CI for mean difference
    Ex: $HSD = 4.42 \times \sqrt{278.82 / 12} = 21.31$
    Mean differences: LOW - MOD = 24.63; LOW - HIGH = 22.17; HIGH - MOD = 2.46 (LOW n=13, MOD n=12, HIGH n=12)

MC: Posteriori comparisons
  Scheffé's procedure
    S statistic: $S = \sqrt{(j - 1)\,F_{\alpha,df}}\ \sqrt{MS_E \sum_j \dfrac{C_j^2}{n_j}}$
    j: no. of groups; C: contrast coefficient; $(\alpha, df_1, df_2) = (0.01, 2, 34)$
    Most versatile (not only pair-wise) & most conservative
    Ex: Low vs (Moderate & High) combined: $\sum \dfrac{C_j^2}{n_j} = \dfrac{1^2}{12} + \dfrac{(-0.5)^2}{12} + \dfrac{(-0.5)^2}{12} = 0.125$
        Low vs Moderate: $\sum \dfrac{C_j^2}{n_j} = \dfrac{1^2}{12} + \dfrac{(-1)^2}{12} = 0.167$
        $S = \sqrt{(3 - 1) \times 5.31}\ \sqrt{278.82 \times 0.167} = 22.24$
    Note: MD between L & H is not significant
    Able to find CI for mean difference

MC: Posteriori comparisons
  Newman-Keuls procedure
    NK statistic: multiplier $\times \sqrt{MS_E / n}$
    [Figure: ordered means; adjacent means are 2 steps apart, the lowest and highest are 3 steps apart]
    Multiplier table is needed
    Less conservative than Tukey's HSD
    Unable to find CI for mean difference
    Ex: 2 steps: $NK = 3.87 \times 4.82 = 18.65$; 3 steps: $NK = 4.42 \times 4.82 = 21.31$, same as HSD

MC: Posteriori comparisons
  Dunnett's procedure
    Dunnett's statistic: multiplier $\times \sqrt{2\,MS_E / n}$
    Only used to compare several Tx means with a single CTL mean
    Relatively low critical value
    Ex: $D = 2.71 \times 6.82 = 18.48$
      About 2 units lower than the HSD value; about 4 units lower than the Scheffé value

Other posteriori comparisons
  Duncan's new multiple-range test
    Same principle as the NK test, but with a smaller multiplier
  Least significant difference, LSD
    Uses the t distribution corresponding to the no. of DF for MSE
    $\alpha$ levels are inflated
    Proposed by Fisher
  The above two procedures are NOT recommended by statisticians for medical research.

Summary of Multiple Comparisons
  Don't worry about the formulas
  Which procedure is better? It depends on you!
    Pairwise comparisons: Tukey's test is the first choice; Newman-Keuls test is the second choice
    Several Txs with a single CTL: Dunnett's is the best
    Non-pairwise comparisons: Scheffé is the best
    When $\alpha$ larger than 0.05 is OK for you, e.g. drug screening: LSD and Duncan's new multiple-range test are OK
    LSD and Duncan's new multiple-range test are not recommended by the authors

Printout: Multiple comparisons
  Response: CELL; Term A: GROUP; DF=34; MSE=278.8058
  Newman-Keuls Multiple-Comparison Test
  Group | Count | Mean     | Different From Groups
  2     | 12    | 15.60000 | 1
  3     | 12    | 18.05833 | 1
  1     | 13    | 40.23077 | 2, 3
  Scheffe's Multiple-Comparison Test (Critical Value=2.5596)
  Group | Count | Mean     | Different From Groups
  2     | 12    | 15.60000 | 1
  3     | 12    | 18.05833 | 1
  1     | 13    | 40.23077 | 2, 3

Two-way ANOVA

The Logic of two-way ANOVA
  SST is divided into 3 or 4 parts
    SST = SSR + SSC + SSE
    SST = SSR + SSC + SS(RC) + SSE
  Models of two-way ANOVA
    Without interaction: $X_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$
    With interaction: $X_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ij}$

Simpson's Paradox: Miss Chen buys a hat
  Day 1: the two counters are kept separate; Day 2: both counters are combined
               | Counter 1 (adults) | Counter 2 (children) | Both counters combined
               | Red    | Black     | Red     | Black      | Red     | Black
  Fits         | 9      | 17        | 3       | 1          | 12      | 18
  Does not fit | 1      | 3         | 17      | 9          | 18      | 12
  Total        | 10     | 20        | 20      | 10         | 30      | 30
  % that fit   | 90%    | 85%       | 15%     | 10%        | 40%     | 60%

Statistical Interaction & confounding
  Model: $\mu_{Y|T,C} = \beta_0 + \beta_1 T + \beta_2 C + \beta_3 TC$
  Interaction: 2 lines with different slopes
    $H_1: \hat{\beta}_3 \neq 0$
  Confounding: 2 parallel lines
    $H_1: \hat{\beta}_{1|c} \neq \hat{\beta}_1$
  How to test: ANOVA
  [Figure: Y plotted against T (T0, T1), one line for C0 and one for C1]

Confounding factors
  Mixing of the effect of X2 with X1 & Y
  [Figure: Obesity, Cholesterol, MI]
  Definition:
    Associated with the disease of interest in the absence of exposure
      (on its own it is associated with the disease; it is itself a risk factor)
    Associated with the exposure
      (associated with the risk factor)
    Not a result of being exposed
      A confounder cannot be an intervening variable
      Intervening variable: X1 → X2 → Y
      Example: S/S of diseases

Interaction & confounding
  Interaction:
    The effect of X1 varies with the level of X2
    A phenomenon you have to present
    Main effects of X1, X2: not meaningful anymore
    Ex: X1 (sex), X2 (teaching method) & Y (language score)
  Confounding:
    Given condition: no interaction
    A condition you have to control (or adjust) for

Two-way ANOVA table
  n = number of observations per cell
  Source of variation | SS     | DF         | Mean SS | F ratio
  Among rows          | SSR    | r-1        | MSR     | MSR/MSE
  Among columns       | SSC    | c-1        | MSC     | MSC/MSE
  Interaction         | SS(RC) | (r-1)(c-1) | MS(RC)  | MS(RC)/MSE
  Error               | SSE    | rc(n-1)    | MSE     |
  Total               | SST    | rcn-1      |         |

Example 2: two-way ANOVA
  Research problem:
    Glucose tolerance, insulin secretion, insulin sensitivity and glucose effectiveness in normal and overweight hyperthyroid women. Gonzalo MA. Clin Endocrinol, 1996;45:689-697
  X1: BMI; X2: thyroid function
    Both are categorical variables
    BMI: 2 levels; thyroid function: 2 levels
  Y: insulin sensitivity
    Continuous variable

Printout: Box plot & Error bar plot, ex 2
  [Figure: means of IS and error bar plot of IS by BMI2 (0, 1), with separate lines for HT = 0 (normal thyroid) and HT = 1 (hyperthyroid)]

Printout: Descriptive statistics, ex 2
  Means and Standard Errors of IS
  Term        | Level | Count | Mean      | SE
  All         |       | 33    | 0.4647917 |
  A: BMI2     | 0     | 19    | 0.615     | 5.786324E-02
              | 1     | 14    | 0.3145833 | 6.740864E-02
  B: HT       | 0     | 19    | 0.57375   | 5.786324E-02
              | 1     | 14    | 0.3558333 | 6.740864E-02
  AB: BMI2,HT | 0,0   | 11    | 0.68      | 0.0760472
              | 0,1   | 8     | 0.55      | 8.917324E-02
              | 1,0   | 8     | 0.4675    | 8.917324E-02
              | 1,1   | 6     | 0.1616667 | 0.1029684

Printout: 2-way ANOVA table, ex 2
  Analysis of Variance Table for IS (alpha = 0.05)
  Source       | DF | SS           | MSS          | F-Ratio | Prob.     | Power
  A: BMI2      | 1  | 0.7112253    | 0.7112253    | 11.18   | 0.002293* | 0.898154
  B: HT        | 1  | 0.3742312    | 0.3742312    | 5.88    | 0.021745* | 0.649738
  AB           | 1  | 6.091182E-02 | 6.091182E-02 | 0.96    | 0.335909  | 0.157220
  S            | 29 | 1.844833     | 6.361494E-02 |         |           |
  Total (Adj.) | 32 | 2.916255     |              |         |           |
  Total        | 33 |              |              |         |           |

Summary: Flowchart of 3G MD test (mean difference test for 3 or more groups)
  Rule: if Yes, go up; if No, go down
  3 or more groups
    Independent
      ND? Yes: No. of factors
                 1 factor: one-way ANOVA
                 2 or more factors: two-way ANOVA or other
          No: Kruskal-Wallis for 1 factor
    Repeated
      ND? Yes: repeated ANOVA
          No: Friedman

QUIZ
  Q: Can I use ANOVA to test 2G MD?
  A: Yes, you can.
  Q: What is the relationship between ANOVA & the 2-s t test?
  A: The 2-s t test is a special case of ANOVA
  F, t & Z table:
    (1) $F_{\alpha,(1,\,n-1)} = t^2_{1-\alpha/2,\,(n-1)}$
    (2) As $df_2 \to \infty$: $F_{\alpha,(1,\,\infty)} = Z^2_{1-\alpha/2}$

Home Work
  Chapter 7, exercise 7 (table 7-20, p187)
  Analysis of phenotypic variation in psoriasis as a function of age at onset and family history. Arch. Dermatol. Res. 2002;294:207-213
  Answer the following questions:
    Is there a difference in %TBSA (percent of total body surface area affected) related to age at onset?
    Is there a difference in %TBSA related to type of psoriasis (familial vs. sporadic)?
    Is the interaction significant?
    What is your conclusion?
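The quiz's claim that the 2-s t test is a special case of ANOVA can be checked numerically. The sketch below is a minimal illustration, not the deck's own procedure: the two samples are hypothetical numbers made up for the demonstration, the helper names are invented here, and the t statistic uses the pooled (equal-variance) form. It computes the one-way ANOVA F ratio from SSB and SSE exactly as in the one-way ANOVA table, then compares it with the square of the pooled t statistic.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # SSB: group-mean deviations from the grand mean, weighted by group size
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: deviations of each observation from its own group mean
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msb = ssb / (k - 1)
    mse = sse / (n - k)
    return msb / mse, k - 1, n - k


def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) SD."""
    na, nb = len(a), len(b)
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se


# Hypothetical data, made up for illustration only.
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [18.0, 17.0, 21.0, 16.0, 19.0, 20.0]

f_ratio, df1, df2 = one_way_anova_f([group_a, group_b])
t_stat = pooled_t(group_a, group_b)
print(f"F({df1}, {df2}) = {f_ratio:.4f}")
print(f"t^2 = {t_stat ** 2:.4f}")
```

With two groups, MSE equals the pooled variance and SSB reduces to the squared mean difference scaled by the group sizes, so the two printed values agree up to rounding error.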