Analysis of Variance
RESEARCH STATISTICS
Jobayer Hossain, Larry Holmes, Jr. November 20, 2008

Experimental Design Terminology
An Experimental Unit is the entity on which a measurement or an observation is made. For example, subjects are the experimental units in most clinical studies.
Homogeneous Experimental Units: units that are as uniform as possible on all characteristics that could affect the response.
A Block is a group of homogeneous experimental units. For example, if an investigator had reason to believe that age might be a significant factor in the effect of a given medication, he might first divide the experimental subjects into age groups, such as under 30 years old, 30-60 years old, and over 60 years old.

Experimental Design Terminology
A Factor is a controllable independent variable that is being investigated to determine its effect on a response. E.g., treatment group is a factor.
Factors can be fixed or random:
– Fixed: the factor can take on a discrete number of values, and these are the only values of interest.
– Random: the factor can take on a wide range of values, and one wants to generalize from the specific values to all possible values.
Each specific value of a factor is called a level.

Experimental Design Terminology
A Covariate is an independent variable that is not manipulated by the experimenter but still affects the response. E.g., in many clinical experiments, demographic variables such as race, gender, and age may influence the response variable significantly even though they are not the variables of interest of the study; such variables are termed covariates.
An Effect is the change in the average response between two factor levels. That is, factor effect = average response at one level – average response at a second level.

Experimental Design Terminology
An Interaction is a joint factor effect in which the effect of one factor depends on the levels of the other factors.
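The definitions of factor effect and interaction above can be illustrated with a small numerical sketch in Python (the 2×2 cell means below are hypothetical, made up for illustration; they are not from the slides):

```python
# Hypothetical cell means for two factors: A (levels a1, a2) and B (levels b1, b2).
# means[a][b] is the average response in that cell.
means = {
    "a1": {"b1": 10.0, "b2": 14.0},
    "a2": {"b1": 13.0, "b2": 17.0},
}

# Factor effect of A at each level of B:
# average response at level a2 minus average response at level a1.
effect_A_at_b1 = means["a2"]["b1"] - means["a1"]["b1"]  # 3.0
effect_A_at_b2 = means["a2"]["b2"] - means["a1"]["b2"]  # 3.0

# Interaction: does the effect of A change across the levels of B?
# Here the difference of differences is zero, so A and B do not interact.
interaction = effect_A_at_b2 - effect_A_at_b1  # 0.0

print(effect_A_at_b1, effect_A_at_b2, interaction)
```

Changing any one cell mean (say, raising means["a2"]["b2"]) would make the two effects of A unequal, which is exactly what a nonzero interaction means.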
[Figures: response profiles showing no interaction effect of factors A and B vs. an interaction effect of factors A and B]

Experimental Design Terminology
Randomization is the process of assigning experimental units randomly to different experimental groups. It is the most reliable method of creating homogeneous treatment groups without introducing potential biases or judgments.

Experimental Design Terminology
A Replication is the repetition of an entire experiment, or a portion of an experiment, under two or more sets of conditions.
– Although randomization helps to ensure that treatment groups are as similar as possible, the results of a single experiment, applied to a small number of experimental units, will not impress or convince anyone of the effectiveness of the treatment.
– To establish the significance of an experimental result, replication, the repetition of an experiment on a large group of subjects, is required.
– If a treatment is truly effective, the average effect over the replicated experimental units will reflect it. If it is not effective, then the few experimental units that happened to react to the treatment will be outweighed by the large number of subjects who were unaffected by it.
– Replication reduces the variability of experimental results and increases the significance and the confidence level with which a researcher can draw conclusions about an experimental factor.

Experimental Design Terminology
A Design (layout) of the experiment includes the choice of factors and factor levels, the number of replications, blocking, randomization, and the assignment of factor-level combinations to experimental units.
Sum of Squares (SS): Let x1, …, xn be n observations. The sum of squares of these n observations is x1² + x2² + … + xn², in notation ∑xi². In corrected form this sum of squares is written as ∑(xi - x̄)².
Degrees of freedom (df): the number of quantities of a given form minus the number of restrictions on them. For example, in the corrected SS above we have n quantities of the form (xi - x̄).
There is one constraint, ∑(xi - x̄) = 0, so the df for this SS is n – 1.
Mean Sum of Squares (MSS): the SS divided by its df.

Experimental Design Terminology
The analysis of variance (ANOVA) is a technique for decomposing the total variability of a response variable into:
– variability due to the experimental factor(s), and
– variability due to error (i.e., factors that are not accounted for in the experimental design).
The basic purpose of ANOVA is to test the equality of several means.
A fixed effect model includes only fixed factors, a random effect model includes only random factors, and a mixed effect model includes both fixed and random factors.

The basic ANOVA situation
Two types of variables: a quantitative response and categorical predictors (factors).
Main question: do the means of the quantitative variable depend on which group (given by the categorical variable) the individual is in?
If there is only one categorical variable with only 2 levels (groups), use the 2-sample t-test. ANOVA allows for 3 or more groups.

One-way Analysis of Variance
One factor of k levels or groups, e.g., 3 treatment groups in a drug study. The main objective is to examine the equality of the means of the different groups.
The total variation of the observations (SST) can be split into two components: variation between groups (SSG) and variation within groups (SSE).
Variation between groups is due to the differences between the groups, e.g., different treatment groups or different doses of the same treatment. Variation within groups is the inherent variation among the observations within each group.
The completely randomized design (CRD) is an example of one-way analysis of variance.

One-way Analysis of Variance
Consider the layout of a study with 16 subjects, intended to compare 4 treatment groups (G1-G4). Each group contains four subjects.
     G1   G2   G3   G4
S1   Y11  Y21  Y31  Y41
S2   Y12  Y22  Y32  Y42
S3   Y13  Y23  Y33  Y43
S4   Y14  Y24  Y34  Y44

One-way Analysis: Informal Investigation
Graphical investigation:
• side-by-side box plots
• multiple histograms
Whether the differences between the groups are significant depends on:
• the differences in the means
• the standard deviation of each group
• the sample sizes
ANOVA determines the P-value from the F statistic.

One-way Analysis: Side-by-Side Boxplots
[Figure: side-by-side boxplots of the groups]

One-way Analysis of Variance
Model: yij = μ + αi + eij, where yij is the jth observation of the ith group, μ is the general mean, αi is the effect of the ith group, and eij is the error.
Assumptions:
– The observations yij are independent.
– The eij are normally distributed with mean zero and constant standard deviation.
– The second assumption implies that the response variable for each group is normal (check using a normal quantile plot, a histogram, or any test for normality) and that the standard deviations of all groups are equal (rule of thumb: the ratio of the largest to the smallest is no more than about 2:1).

One-way Analysis of Variance
Hypotheses:
Ho: The means of all groups are equal.
Ha: At least one mean is not equal to the others.
– The alternative doesn't say how or which ones differ.
– Can follow up with "multiple comparisons."

Analysis of variance (ANOVA) table for one-way classified data
Sources of Variation  Sum of Squares  df   Mean Sum of Squares  F-Ratio
Group                 SSG             k-1  MSG = SSG/(k-1)      F = MSG/MSE
Error                 SSE             n-k  MSE = SSE/(n-k)
Total                 SST             n-1
A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.

Pooled estimate for variance
The pooled variance of all groups can be estimated as the weighted average of the variances of the groups:
sp² = [(n1 – 1)s1² + (n2 – 1)s2² + … + (nI – 1)sI²] / (n – I)
    = [(df1)s1² + (df2)s2² + … + (dfI)sI²] / (df1 + df2 + … + dfI),
so MSE = SSE/DFE is the pooled estimate of variance: sp² = MSE.

Multiple comparisons
If the F test in the ANOVA table is significant, we then want to find which pairs of groups are significantly different. The following are commonly used procedures:
– Fisher's Least Significant Difference (LSD)
– Tukey's HSD method
– Bonferroni's method
– Scheffe's method
– Dunn's multiple-comparison procedure
– Dunnett's procedure

One-way ANOVA - Demo
MS Excel:
– Put the response data (hgt) for the groups (grp) in side-by-side columns (see the next slides).
– Select Tools/Data Analysis and select Anova: Single Factor from the Analysis Tools list. Click OK.
– Select the Input Range (for our example A1:C21), check Grouped By: Columns, and check Labels in first row.
– Select an output range and then click OK.

One-way ANOVA MS-Excel Data Layout
grp1      grp2      grp3
52.50647  47.83167  43.82211
43.14426  43.69665  40.41359
55.91853  40.73333  57.65748
45.68187  46.56424  41.96207
54.76593  53.03273  49.08051
45.27999  56.41127  55.97797
41.95513  43.69635  47.71419
43.67319  50.92119  41.75912
44.01857  40.36097  46.21859
44.54295  51.95264  53.61966
46.1765   57.1638   49.63484
40.02826  50.98321  57.68229
58.09449  49.23148  49.08471
43.25757  48.01014  40.59866
42.07507  45.98231  38.99055
46.80839  55.32514  54.74286
43.80479  47.21837  53.74624
57.60508  40.30104  44.82507
42.47022  50.56327  45.85581
57.4945   55.55145  41.36863

One-way ANOVA
MS-Excel output: height on treatment groups
Anova: Single Factor

SUMMARY
Groups  Count  Sum       Average   Variance
grp1    20     949.3017  47.46509  36.88689
grp2    20     975.5313  48.77656  28.10855
grp3    20     954.7549  47.73775  37.50739

ANOVA
Source of Variation  SS        df  MS        F         P-value   F crit
Between Groups       19.15632  2   9.578159  0.280329  0.756571  3.158846
Within Groups        1947.554  57  34.16761
Total                1966.71   59

One-way ANOVA - Demo
SPSS:
– Select Analyze > Compare Means > One-Way ANOVA
– Select variables as Dependent List: response (hgt) and Factor: grp, and then make selections as follows: click on Post Hoc and
select Multiple comparisons (LSD, Tukey, Bonferroni, or Scheffe), click Options and select the Homogeneity of variance test, click Continue, and then OK.

One-way ANOVA
SPSS output: height on treatment groups
ANOVA
hgt
                Sum of Squares  df  Mean Square  F     Sig.
Between Groups  19.156          2   9.578        .280  .757
Within Groups   1947.554       57   34.168
Total           1966.710       59

Analysis of variance of factorial experiments (two or more factors)
Factorial experiment:
– The effects of two or more factors, including their interactions, are investigated simultaneously.
– For example, consider two factors A and B. The total variation of the response is then split into variation for A, variation for B, variation for their interaction AB, and variation due to error.

Analysis of variance of factorial experiments (two or more factors)
Model with two factors (A, B) and their interaction:
yijk = μ + αi + βj + (αβ)ij + eijk, where
μ is the general mean,
αi is the effect of the ith level of factor A,
βj is the effect of the jth level of factor B,
(αβ)ij is the interaction effect of the ith level of A and the jth level of B, and
eijk is the error.
Assumptions: the same as in one-way ANOVA.

Analysis of variance of factorial experiments (two or more factors)
Null hypotheses:
Hoa: The means of all groups of factor A are equal.
Hob: The means of all groups of factor B are equal.
Hoab: (αβ)ij = 0, i.e., the two factors A and B do not interact.

Analysis of variance of factorial experiments (two or more factors)
ANOVA for two factors A and B with their interaction AB:
Sources of Variation   Sum of Squares  df          Mean Sum of Squares       F-Ratio
Main Effect A          SSA             k-1         MSA = SSA/(k-1)           MSA/MSE
Main Effect B          SSB             p-1         MSB = SSB/(p-1)           MSB/MSE
Interaction Effect AB  SSAB            (k-1)(p-1)  MSAB = SSAB/[(k-1)(p-1)]  MSAB/MSE
Error                  SSE             kp(r-1)     MSE = SSE/[kp(r-1)]
Total                  SST             kpr-1

Two-factor with replication - Demo
MS Excel:
– Put the response data for the two factors in a layout like that on the next page.
– Select Tools/Data Analysis and select Anova: Two-Factor With Replication from the Analysis Tools list. Click OK.
– Select the Input Range and enter Rows per sample: the number of replications per level of the row factor (Excel needs equal replications for all levels). It is 10 for the data on the next page.
– Select an output range and then click OK.

Two-factor ANOVA MS-Excel Data Layout
        grp1      grp2      grp3
shade1  52.50647  47.83167  43.82211
shade1  43.14426  43.69665  40.41359
shade1  55.91853  40.73333  57.65748
shade1  45.68187  46.56424  41.96207
shade1  54.76593  53.03273  49.08051
shade1  45.27999  56.41127  55.97797
shade1  41.95513  43.69635  47.71419
shade1  43.67319  50.92119  41.75912
shade1  44.01857  40.36097  46.21859
shade1  44.54295  51.95264  53.61966
shade2  46.1765   57.1638   49.63484
shade2  40.02826  50.98321  57.68229
shade2  58.09449  49.23148  49.08471
shade2  43.25757  48.01014  40.59866
shade2  42.07507  45.98231  38.99055
shade2  46.80839  55.32514  54.74286
shade2  43.80479  47.21837  53.74624
shade2  57.60508  40.30104  44.82507
shade2  42.47022  50.56327  45.85581
shade2  57.4945   55.55145  41.36863

Two-factor ANOVA
MS-Excel output: height on treatment group, shades, and their interaction
Anova: Two-Factor With Replication

SUMMARY     grp1      grp2      grp3      Total
shade1
  Count     10        10        10        30
  Sum       471.4869  475.201   478.2253  1424.913
  Average   47.14869  47.5201   47.82253  47.49711
  Variance  26.773    29.80231  38.11298  29.46458
shade2
  Count     10        10        10        30
  Sum       477.8149  500.3302  476.5297  1454.675
  Average   47.78149  50.03302  47.65297  48.48916
  Variance  50.87686  26.02977  41.05331  37.84396
Total
  Count     20        20        20
  Sum       949.3017  975.5313  954.7549
  Average   47.46509  48.77656  47.73775
  Variance  36.88689  28.10855  37.50739

ANOVA
Source of Variation  SS        df  MS        F         P-value   F crit
Sample               14.76245  1   14.76245  0.416532  0.521406  4.01954
Columns              19.15632  2   9.578159  0.270254  0.764212  3.168246
Interaction          18.9572   2   9.478602  0.267445  0.766341  3.168246
Within               1913.834  54  35.44137
Total                1966.71   59

Two-factor ANOVA - Demo
SPSS:
– Select Analyze > General Linear Model > Univariate
– Make the selection of variables, e.g., Dependent Variable: response (hgt), and Fixed Factors: grp and shades.
– Make other selections as follows: click on Post Hoc and select Multiple comparisons (LSD, Tukey, Bonferroni, or Scheffe), click Options and select the Homogeneity of variance test, click Continue, and then OK.

Two-factor ANOVA
SPSS output: height on treatment group, shades, and their interaction
Between-Subjects Factors
           N
grp     1  20
        2  20
        3  20
Shades  1  30
        2  30

Tests of Between-Subjects Effects
Dependent Variable: hgt
Source           Type III Sum of Squares  df  Mean Square  F         Sig.
Corrected Model  52.876(a)                5   10.575       .298      .912
Intercept        138200.445               1   138200.445   3899.410  .000
grp              19.156                   2   9.578        .270      .764
Shades           14.762                   1   14.762       .417      .521
grp * Shades     18.957                   2   9.479        .267      .766
Error            1913.834                 54  35.441
Total            140167.155               60
Corrected Total  1966.710                 59
a. R Squared = .027 (Adjusted R Squared = -.063)

Repeated Measures
The term repeated measures refers to data sets with multiple measurements of a response variable on the same experimental unit or subject. In repeated measures designs, we are often concerned with two types of variability:
– Between-subjects: variability associated with different groups of subjects who are treated differently (equivalent to the between-groups effect in one-way ANOVA).
– Within-subjects: variability associated with measurements made on an individual subject.

Repeated Measures
Examples of repeated measures designs:
A. Two groups of subjects treated with different drugs, with responses measured at six-hour increments for 24 hours. Here, DRUG treatment is the between-subjects factor and TIME is the within-subjects factor.
B. Students in three different statistics classes (taught by different instructors) are given a test with five problems, and the score on each problem is recorded separately. Here CLASS is a between-subjects factor and PROBLEM is a within-subjects factor.
C. Volunteer subjects from a linguistics class each listen to 12 sentences produced by 3 text-to-speech synthesis systems and rate the naturalness of each sentence.
This is a completely within-subjects design with factors SENTENCE (1-12) and SYNTHESIZER.

Repeated Measures
When measures are made over time, as in example A, we want to assess:
– how the dependent measure changes over time independent of treatment (i.e., the main effect of time),
– how treatments differ independent of time (i.e., the main effect of treatment), and
– how treatment effects differ at different times (i.e., the treatment by time interaction).
Repeated measures require special treatment because:
– Observations made on the same subject are not independent of each other.
– Adjacent observations in time are likely to be more correlated than non-adjacent observations.

Repeated Measures
[Figure: response plotted over time for each subject]

Repeated Measures
Methods of repeated measures ANOVA:
– Univariate: uses a single outcome measure.
– Multivariate: uses multiple outcome measures.
– Mixed Model Analysis: one or more factors (other than subject) are random effects.
We will discuss only the univariate approach.

Repeated Measures
Assumptions:
– Subjects are independent.
– The repeated observations for each subject follow a multivariate normal distribution.
– The correlations between all pairs of within-subject levels are equal. This assumption is known as sphericity.

Repeated Measures
Test for sphericity:
– Mauchly's test
Violation of the sphericity assumption leads to inflated F statistics and hence an inflated type I error rate.
Three common corrections for violation of sphericity:
– Greenhouse-Geisser correction
– Huynh-Feldt correction
– Lower-bound correction
All three methods adjust the degrees of freedom using a correction factor called epsilon. Epsilon lies between 1/(k-1) and 1, where k is the number of levels of the within-subjects factor.

Repeated Measures - SPSS Demo
Analyze > General Linear Model > Repeated Measures
Within-Subject Factor Name: e.g., time. Number of Levels: the number of measures of the factor; e.g., we have two measurements, PLUC.pre and PLUC.post, so the number of levels is 2 for our example.
> Click on Add.
Click on Define and select the Within-Subjects Variables (time): e.g., PLUC.pre(1) and PLUC.post(2).
Select the Between-Subjects Factor(s): e.g., grp.

Repeated Measures ANOVA
SPSS output:
Within-Subjects Factors
Measure: MEASURE_1
time  Dependent Variable
1     PLUC.pre
2     PLUC.post

Between-Subjects Factors
        N
grp  1  20
     2  20
     3  20

Mauchly's Test of Sphericity(b)
Measure: MEASURE_1
Within Subjects  Mauchly's  Approx.              Epsilon(a)
Effect           W          Chi-Square  df  Sig. Greenhouse-Geisser  Huynh-Feldt  Lower-bound
time             1.000      .000        0   .    1.000               1.000        1.000
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept+grp. Within Subjects Design: time

Repeated measures ANOVA SPSS output
Tests of Within-Subjects Effects
Measure: MEASURE_1
Source                           Type III Sum of Squares  df      Mean Square  F       Sig.
time         Sphericity Assumed  172.952                  1       172.952      33.868  .000
             Greenhouse-Geisser  172.952                  1.000   172.952      33.868  .000
             Huynh-Feldt         172.952                  1.000   172.952      33.868  .000
             Lower-bound         172.952                  1.000   172.952      33.868  .000
time * grp   Sphericity Assumed  46.818                   2       23.409       4.584   .014
             Greenhouse-Geisser  46.818                   2.000   23.409       4.584   .014
             Huynh-Feldt         46.818                   2.000   23.409       4.584   .014
             Lower-bound         46.818                   2.000   23.409       4.584   .014
Error(time)  Sphericity Assumed  291.083                  57      5.107
             Greenhouse-Geisser  291.083                  57.000  5.107
             Huynh-Feldt         291.083                  57.000  5.107
             Lower-bound         291.083                  57.000  5.107

Tests of Within-Subjects Contrasts
Measure: MEASURE_1
Source       time    Type III Sum of Squares  df  Mean Square  F       Sig.
time         Linear  172.952                  1   172.952      33.868  .000
time * grp   Linear  46.818                   2   23.409       4.584   .014
Error(time)  Linear  291.083                  57  5.107

Tests of Between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average
Source     Type III Sum of Squares  df  Mean Square  F         Sig.
Intercept  14371.484                1   14371.484    3858.329  .000
grp        121.640                  2   60.820       16.328    .000
Error      212.313                  57  3.725

Questions
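As a closing illustration, the one-way ANOVA arithmetic described in these slides (SST = SSG + SSE, F = MSG/MSE) can be sketched in plain Python. This is a minimal sketch on made-up numbers, not the height data from the demos:

```python
# Made-up data: k = 3 small groups (a stand-in for, e.g., three treatment groups).
groups = [
    [4.0, 6.0, 5.0, 5.0],
    [7.0, 9.0, 8.0, 8.0],
    [5.0, 7.0, 6.0, 6.0],
]

n = sum(len(g) for g in groups)   # total number of observations
k = len(groups)                   # number of groups
grand_mean = sum(sum(g) for g in groups) / n

# Between-group sum of squares: SSG = sum over groups of n_i * (mean_i - grand mean)^2
SSG = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within-group (error) sum of squares: SSE = sum of squared deviations from group means
SSE = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)

# Total sum of squares; it decomposes as SST = SSG + SSE
SST = sum((y - grand_mean) ** 2 for g in groups for y in g)

MSG = SSG / (k - 1)   # mean square between groups
MSE = SSE / (n - k)   # pooled (within-group) variance estimate
F = MSG / MSE         # a large F is evidence against H0

print(SSG, SSE, SST, F)
```

With these numbers the group means are 5, 8, and 6, giving SSG ≈ 18.67, SSE = 6, and F = 14; in practice the F statistic would then be compared against the F distribution with (k-1, n-k) degrees of freedom to obtain the P-value, as Excel and SPSS do above.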