How Statistics Can Empower Your Research? Part II

Xiayu (Stacy) Huang
Bioinformatics Shared Resource
Sanford | Burnham Medical Research Institute

OUTLINE
• Summary of Previous Talk
  • Descriptive & inferential statistics
  • Student's t test, one-way ANOVA
• More common statistical tests and applications
  • Repeated measures one-way ANOVA
  • Two-way ANOVA
• Power analysis
• Common data transformation methods

SUMMARY OF PREVIOUS TALK
• Descriptive statistics
  • Measures of central tendency, dispersion, etc.
• Inferential statistics
  • Hypothesis, errors, p-value, power
• Three statistical tests and their applications
  • Two-sample unpaired t test, paired t test, and one-way ANOVA
PowerPoint presentation at http://bsrweb.burnham.org

ONE-WAY ANOVA EXAMPLE
• Goal: studying the effect of mouse genotype on learning skills on the rotarod.
• Dependent variable: number of seconds staying on a rotarod

Group1  Group2  Group3  Group4
170     116     30      114
214     102     60      24
122     120     136     72
44      82      126     42
80      90      56      20
130     54      6       32

DECISION TREE

DECISION TREE: ONE-WAY ANOVA

ASSUMPTION CHECK IN GRAPHPAD PRISM

DATA ANALYSIS IN GRAPHPAD PRISM
Variance check

REPEATED MEASURES ONE-WAY ANOVA
• Compares the means of 3 or more groups
• Repeated measurements on the same group of subjects
• Assumptions:
  • Sampling should be independent and randomized.
  • Equal sample size per group preferred.
  • Sphericity or homogeneity of covariance
  • Data are normally distributed.

APPLICATION OF REPEATED MEASURES ONE-WAY ANOVA IN BIOLOGY
[Figure: x-axis labeled "Days"]

REPEATED MEASURES ONE-WAY ANOVA EXAMPLE
• Goal: studying the effect of practice on maze learning for rats.
• Independent variable: days
• Dependent variable: number of errors made each day

Rat ID  Day 1  Day 2  Day 3  Day 4
Rat_1   3      1      0      0
Rat_2   3      2      2      1
Rat_3   6      3      1      2

DECISION TREE: ONE-WAY REPEATED ANOVA

TABLE FORMAT IN GRAPHPAD PRISM: REPEATED MEASURES ONE-WAY ANOVA

DATA FORMAT AND CHOOSING ANALYSIS METHODS

DATA ANALYSIS IN GRAPHPAD PRISM

ANALYSIS RESULT

ONE-WAY REPEATED ANOVA COMPARED WITH REGULAR ONE-WAY ANOVA

TWO-WAY ANOVA
• One dependent variable and two independent variables or factors
• Assumptions
  • Samples are normally or approximately normally distributed
  • The samples from each treatment group must be independent
  • The variances of the populations must be equal
  • Equal sample size per treatment group preferred
• Treatment group: all possible combinations of the two factors

Treatment  Gender
Placebo    Female
Placebo    Male
Drug       Female
Drug       Male

TWO-WAY ANOVA
• Main effect: effect of an individual factor
• Interaction effect: effect of one factor on the other
• Hypotheses
  • The population means of the first factor A are equal
  • The population means of the second factor B are equal
  • There is no interaction between the two factors
• Test
  • F test: mean square for each main effect and the interaction effect divided by the within variance

MAIN EFFECTS
[Figure: four panels of pain score at the 1st and 2nd hour for Aspirin and Ibuprofen; factor A is time, factor B is treatment]
I. No main effect of time or treatment
II. Main effect of treatment only
III. Main effect of time only
IV. Main effects of time and treatment

MAIN EFFECT AND INTERACTION EFFECT
[Figure: four panels of pain score at the 1st and 2nd hour for Aspirin and Ibuprofen]
V. Interaction effect only
VI. Main effect of time only and interaction effect
VII. Main effect of treatment only and interaction effect
VIII. Main effects of time and treatment, and interaction effect

TWO-WAY ANOVA EXPERIMENTAL DESIGN
I.
Balanced design with equal replication (Best)

        Control  Treated
Time 0  4        4
Time 2  4        4
Time 4  4        4
Time 8  4        4

II. Proportional design replication (Acceptable)

        Control  Treated
Time 0  3        4
Time 2  6        8
Time 4  3        4
Time 8  9        12

III. One replication only (Not recommended)

        Control  Treated
Time 0  1        1
Time 2  1        1
Time 4  1        1
Time 8  1        1

IV. Disproportional design (Bad)

        Control  Treated
Time 0  4        3
Time 2  2        2
Time 4  2        2
Time 8  3        4

APPLICATION OF TWO-WAY ANOVA IN BIOLOGY
[Figure: microarray time-dose relationship at 0 mM, 50 mM, and 75 mM]

TWO-WAY ANOVA WITH REPLICATION EXAMPLE
Study the effect of gender and anti-cancer drugs on tumor growth

Tumor size
Drug    cisplatin       vinblastine     5-fluorouracil
Gender  Female  Male    Female  Male    Female  Male
        65      50      70      45      55      35
        70      55      65      60      65      40
        60      80      60      85      70      35
        60      65      70      65      55      55
        60      70      65      70      55      35
        55      75      60      70      60      40
        60      75      60      80      50      45
        50      65      50      60      50      40

DECISION TREE: FACTORIAL ANOVA

TABLE FORMAT IN PRISM: TWO-WAY ANOVA

DATA FORMAT AND CHOOSING ANALYSIS METHODS

CHOOSING MODEL

ANALYSIS RESULT

TWO-WAY REPEATED MEASURES ANOVA EXAMPLE
• Goal: investigating the effect of gender and caffeine consumption on memory
• Independent variables: gender and caffeine consumption
• Dependent variable: memory score

Subject  Sex     Lowcaff  Medcaff  Highcaff
1        Male    10       15       17
2        Male    9        12       11
3        Male    11       14       15
4        Male    13       11       12
5        Male    11       10       16
6        Male    12       6        12
7        Female  10       14       14
8        Female  12       21       22
9        Female  21       18       23
10       Female  9        18       22
11       Female  12       16       20
12       Female  15       17       26

DECISION TREE: TWO-WAY REPEATED ANOVA

TABLE FORMAT: TWO-WAY REPEATED MEASURES ANOVA

DATA FORMAT AND ANALYSIS METHODS

CHOOSING MODEL

ANALYSIS RESULT
Matching not effective???
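
The F tests described for regular two-way ANOVA (mean square for each main effect and for the interaction, divided by the within mean square) can be sketched by hand for a balanced design. This is a minimal illustration and not the GraphPad Prism procedure: the function name `two_way_anova` and the nested-list layout are assumptions made for the example, and the tumor-size values are read from the example table on the assumption that each row lists the six drug-by-gender columns in order. Only F statistics and degrees of freedom are computed; p-values would need an F distribution from a statistics package.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def two_way_anova(cells):
    """Balanced two-way ANOVA with replication.

    cells[i][j] holds the replicate observations for level i of factor A
    and level j of factor B (hypothetical layout chosen for this sketch).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    if n < 2 or any(len(row) != b or any(len(c) != n for c in row) for row in cells):
        raise ValueError("need a balanced design with at least 2 replicates per cell")

    grand = _mean(x for row in cells for c in row for x in c)
    a_means = [_mean(x for c in row for x in c) for row in cells]
    b_means = [_mean(x for row in cells for x in row[j]) for j in range(b)]
    cell_means = [[_mean(c) for c in row] for row in cells]

    # Sums of squares for each main effect, the interaction, and within cells.
    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in a_means),
        "B": a * n * sum((m - grand) ** 2 for m in b_means),
        "AB": n * sum(
            (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
            for i in range(a) for j in range(b)
        ),
        "within": sum(
            (x - cell_means[i][j]) ** 2
            for i in range(a) for j in range(b) for x in cells[i][j]
        ),
    }
    ss["total"] = sum((x - grand) ** 2 for row in cells for c in row for x in c)

    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "within": a * b * (n - 1)}
    ms_within = ss["within"] / df["within"]
    # F = mean square of the effect divided by the within mean square.
    f_stats = {k: (ss[k] / df[k]) / ms_within for k in ("A", "B", "AB")}
    return {"ss": ss, "df": df, "F": f_stats}


# Factor A = drug, factor B = gender (Female, Male); column order is assumed.
tumor = [
    [[65, 70, 60, 60, 60, 55, 60, 50], [50, 55, 80, 65, 70, 75, 75, 65]],  # cisplatin
    [[70, 65, 60, 70, 65, 60, 60, 50], [45, 60, 85, 65, 70, 70, 80, 60]],  # vinblastine
    [[55, 65, 70, 55, 55, 60, 50, 50], [35, 40, 35, 55, 35, 40, 45, 40]],  # 5-fluorouracil
]

result = two_way_anova(tumor)
for effect, f_value in result["F"].items():
    print(effect, "F =", round(f_value, 3), "df =", result["df"][effect], result["df"]["within"])
```

The sketch requires equal replication in every cell, which matches the balanced design recommended as best in the experimental design slide; proportional or disproportional designs need a different sum-of-squares decomposition.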
RECONSIDERING REGULAR TWO-WAY ANOVA

POWER ANALYSIS
• Power depends on:
  • Sample size (n)
  • Effect size
    • Standard deviation (σ or s)
    • Minimal detectable difference (δ)
  • False positive rate (α)
• Power analysis includes:
  • Sample size required
  • Effect size or minimal detectable difference
  • Power of the test

POWER ANALYSIS SOFTWARE/PACKAGES
• G*Power (free!!!)
• Optimal Design (free!!!)
• SPSS SamplePower
• PASS
• SAS PROC POWER, Stata sampsi, etc.
• Mplus for more advanced/complicated analysis
• Many free online programs
  http://www.stat.uiowa.edu/~rlenth/Power/

TWO INDEPENDENT SAMPLES POWER ANALYSIS: INPUT AND OUTPUT PARAMETERS IN G*POWER
Sample size required
• Input parameters
  • Effect size (f)
  • False positive rate (α)
  • Minimum power (1 − β)
  • Ratio of two sample sizes
• Output parameters
  • Noncentrality parameter
  • Critical t
  • Degrees of freedom
  • Sample size for each group
  • Total sample size
  • Actual power

TWO INDEPENDENT SAMPLES POWER ANALYSIS: INPUT AND OUTPUT PARAMETERS IN G*POWER
Effect size
• Input parameters
  • False positive rate
  • Minimum power
  • Sample size for each group
• Output parameters
  • Noncentrality parameter
  • Critical t
  • Degrees of freedom
  • Effect size
  • Minimal detectable difference

COMPUTE SAMPLE SIZE: TWO INDEPENDENT SAMPLES

DETERMINING EFFECT SIZE: TWO INDEPENDENT SAMPLES

ANALYSIS RESULTS: TWO INDEPENDENT SAMPLES

COMPUTE EFFECT SIZE: TWO INDEPENDENT SAMPLES

X-Y PLOT FOR A RANGE OF VALUES

FACTORS AFFECTING POWER: TWO INDEPENDENT SAMPLES
• Power increases as total sample size increases
• Power increases as effect size increases
• Power increases as significance level increases

ONE-WAY ANOVA POWER ANALYSIS: INPUT AND OUTPUT PARAMETERS IN G*POWER
Sample size required
• Input parameters
  • Effect size (f)
  • False positive rate (α)
  • Minimum power (1 − β)
  • Number of groups
• Output parameters
  • Noncentrality parameter
  • Critical F
  • Degrees of freedom
  • Total sample size
  • Actual power

ONE-WAY ANOVA POWER ANALYSIS: INPUT AND OUTPUT PARAMETERS IN G*POWER
Effect size
• Input parameters
  • False positive rate
  • Minimum power
  • Total sample size
  • Number of groups
• Output parameters
  • Noncentrality parameter
  • Critical F
  • Numerator and denominator degrees of freedom
  • Effect size
  • Minimal detectable difference

COMPUTE SAMPLE SIZE: ONE-WAY ANOVA

COMPUTE EFFECT SIZE: ONE-WAY ANOVA

FACTORS AFFECTING POWER: ONE-WAY ANOVA
• Power increases as total sample size increases
• Power increases as effect size increases
• Power increases as significance level increases

DATA TRANSFORMATION
• Why?
  • Many biological variables do not follow a normal distribution
• How?
  • Applying a mathematical function to each observation
  • Performing statistical tests using the transformed data
  • Interpreting results using back transformation
• Common data transformation methods in biology
  • Log transformation
  • Square root transformation
  • Arcsine transformation
  • Reciprocal transformation

LOG TRANSFORMATION
• Usage
  • Converts a positively skewed distribution into a symmetrical one
  • Applicable when there is heteroscedasticity and standard deviations are proportional to the means
• Mathematical function: $x' = \log_2(x + 1)$
  • Logarithms in any base are satisfactory
• Back transformation: $x = 2^{x'} - 1$

SQUARE ROOT TRANSFORMATION
• Usage
  • Applicable when the group variances are proportional to the means
  • Samples taken from a Poisson distribution, such as count data
• Mathematical function: $x' = \sqrt{x + 0.5}$
• Back transformation: $x = x'^2 - 0.5$

ARCSINE TRANSFORMATION
• Usage
  • Applicable when data (proportions or percentages) were taken from a binomial distribution
• Mathematical function: $p' = \arcsin\sqrt{p}$
• Back transformation: $p = (\sin p')^2$
• Shortcoming
  • Not good at the ends of the range (near 0 and 100%)
  • Adjustment needed when p is near 0 or 100%

CHOOSING TRANSFORMATION BASED ON DATA DISTRIBUTION

Shape                Figure  Transformation
Reverse J            A       1/X
Severe skew right    B       log(X)
Moderate skew right  C       sqrt(X)

CHOOSING TRANSFORMATION BASED ON DATA DISTRIBUTION

Shape               Figure  Transformation
Moderate skew left  D       1/sqrt(X)
Severe skew left    E       -1/log(X)
J-shaped            F       -1/X

LOG TRANSFORMATION

Untransformed  Square-root transformed  Log transformed
38             6.164                    1.580
1              1.000                    0.000
13             3.606                    1.114
2              1.414                    0.301
13             3.606                    1.114
20             4.472                    1.301
50             7.071                    1.699
9              3.000                    0.954
28             5.292                    1.447
6              2.449                    0.778
4              2.000                    0.602
43             6.557                    1.633

SUMMARY
• ANOVA
  • One-way ANOVA
    • With or without repeated measures
  • Two-way ANOVA
    • Regular two-way ANOVA
    • Two-way repeated ANOVA
• Power analysis
  • Two independent samples
  • One-way ANOVA
• Data transformations
  • Log transformation
  • Square root transformation
  • Arcsine transformation

BASIC STATISTICS TOOLS
Statistics software and packages:
1. GraphPad Prism, SPSS, and Excel add-ins
2. G*Power, Optimal Design, etc.
3. SAS, R, Stata, etc.
Basic statistics books:
1. Intro Stats, SDSU, 2nd edition, De Veaux, Velleman, Bock
2. Choosing and Using Statistics: A Biologist's Guide
3. Biostatistical Analysis, Jerrold H. Zar
4. Biostatistics: The Bare Essentials, Norman and Streiner
5. Handbook of Biological Statistics

Thank You All for Coming!!!
Questions???
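
As a companion to the data transformation slides, the log, square root, and arcsine transformations and their back transformations can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the offsets (1 for the log, 0.5 for the square root) follow the formulas on the slides. The worked table in the talk appears to use the plain forms sqrt(x) and log10(x) without offsets, so the last lines reproduce it that way; that reading is inferred from the tabulated values.

```python
import math


def log_transform(x, base=2.0, offset=1.0):
    """x' = log_base(x + offset); any base is satisfactory."""
    return math.log(x + offset, base)


def log_back(x_t, base=2.0, offset=1.0):
    """x = base^x' - offset."""
    return base ** x_t - offset


def sqrt_transform(x, offset=0.5):
    """x' = sqrt(x + offset), for count data with variance proportional to the mean."""
    return math.sqrt(x + offset)


def sqrt_back(x_t, offset=0.5):
    """x = x'^2 - offset."""
    return x_t ** 2 - offset


def arcsine_transform(p):
    """p' = arcsin(sqrt(p)) for a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))


def arcsine_back(p_t):
    """p = (sin p')^2."""
    return math.sin(p_t) ** 2


# Untransformed values from the example table.
counts = [38, 1, 13, 2, 13, 20, 50, 9, 28, 6, 4, 43]

# Assumption: the table columns are sqrt(x) and log10(x), rounded to 3 decimals.
table = [(x, round(math.sqrt(x), 3), round(math.log10(x), 3)) for x in counts]
for row in table:
    print(row)
```

Results computed on the transformed scale are reported by applying the matching back transformation, for example `log_back(mean_of_transformed_values)` for a log-transformed variable.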