STA305 Week 9

The Mixed Effects Model - Introduction
• In many situations, one of the factors of interest will have its levels chosen because they are of specific interest to the researcher.
• On the other hand, there may be a second factor of interest for which it is important to generalize to all possible levels; in this case the levels of the second factor might be chosen at random.
• This type of experiment is referred to as a mixed study design because one of the factors has fixed levels while the other has random levels.

The Mixed Effects Model
• Suppose that in a given study, a levels of factor A have been chosen because they are of particular interest. Further, suppose that b levels of factor B have been chosen at random from all possible levels of this factor.
• A total of abr experimental units will be used to conduct the study; r units will be randomly allocated to each of the ab experimental conditions.
• The form of the statistical model that we will study is identical to that for 2-factor studies with either fixed or random effects; the difference is in the assumptions about the factor levels and the interactions.
• The model equation is
  \[ Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \quad i = 1, \dots, a, \; j = 1, \dots, b, \; k = 1, \dots, r \]
• As with the fixed and random effects models, we parameterize the model in such a way that μ is the overall mean of all of the responses, i.e. $E(\bar{Y}_{\cdot\cdot\cdot}) = \mu$.

Assumptions of the Mixed Effects Model
• As in the fixed effects model, factor A has fixed effects and we therefore require that
  \[ \sum_{i=1}^{a} \alpha_i = 0 \]
• Since the levels of factor B have been chosen at random, we require instead that $\beta_j \sim N(0, \sigma_B^2)$.
• Since one of the factors is random, the interactions must be random as well. However, since one of the factors is fixed, the sum of the interactions over the levels of that factor will be 0. Together these yield 2 constraints on the $\gamma_{ij}$:
  \[ \gamma_{ij} \sim N\!\left(0, \frac{a-1}{a}\,\sigma_{A\times B}^2\right) \quad \text{and} \quad \sum_{i=1}^{a} \gamma_{ij} = 0 \ \text{for each } j \]
• The factor (a-1)/a is for convenience in expressing the expected mean squares (EMS) only.
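The two constraints on the interaction effects can be checked numerically. The Python sketch below is only an illustration and is not taken from the slides: it assumes one particular construction (draw a independent normal values for each random level j, then subtract their mean), and the function name, seed, sizes and standard deviation are all hypothetical. Under that assumption the centred values sum to zero over the fixed index i and have variance (a-1)/a times the variance of the raw draws, which is where a factor of this form can come from.

```python
import random
import statistics


def draw_interactions(a, b, sigma_ab, rng):
    """Return gamma[i][j] whose sum over the fixed index i is zero for every j.

    Assumed construction (not from the slides): draw a independent
    N(0, sigma_ab^2) values for each random level j, then subtract their mean.
    """
    gamma = [[0.0] * b for _ in range(a)]
    for j in range(b):
        raw = [rng.gauss(0.0, sigma_ab) for _ in range(a)]
        centre = sum(raw) / a
        for i in range(a):
            gamma[i][j] = raw[i] - centre
    return gamma


rng = random.Random(305)          # fixed seed so the run is repeatable
a, b, sigma_ab = 4, 20000, 2.0    # hypothetical sizes and standard deviation
gamma = draw_interactions(a, b, sigma_ab, rng)

# Largest absolute sum over i: zero up to floating-point rounding.
max_column_sum = max(abs(sum(gamma[i][j] for i in range(a))) for j in range(b))

# Variance of gamma_1j across the b random levels, using the known mean 0.
empirical_var = statistics.pvariance(gamma[0], mu=0.0)
theoretical_var = (a - 1) / a * sigma_ab ** 2

print(f"largest |sum over i of gamma_ij|: {max_column_sum:.2e}")
print(f"empirical variance:   {empirical_var:.3f}")
print(f"(a-1)/a * sigma_AB^2: {theoretical_var:.3f}")
```

With a large number of random levels b, the empirical variance should settle close to the theoretical value; the exact figures depend on the seed.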
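
Sums of Squares
• The observed variation in the data is measured in the same manner as for the fixed effects and the random effects cases.
• In other words, the total variation in the data is measured by
  \[ SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{r} \left( Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot} \right)^2 \]
• The sums of squares and the degrees of freedom for the other sources of variation are also the same as in the 2-factor study with the fixed or random effects model.
• The only difference is in the expected mean squares.

Expected Mean Squares
• The expected mean squares are as follows:
  \[ E(MS_A) = \sigma^2 + r\,\sigma_{A\times B}^2 + \frac{br}{a-1} \sum_{i=1}^{a} \alpha_i^2 \]
  \[ E(MS_B) = \sigma^2 + ar\,\sigma_B^2 \]
  \[ E(MS_{A\times B}) = \sigma^2 + r\,\sigma_{A\times B}^2 \]
  \[ E(MS_E) = \sigma^2 \]

Hypothesis Testing
• As in all of the other experimental designs that we have looked at, the motivation for the test statistics is derived from the EMS: each mean square is compared with the mean square whose expectation matches its own under the null hypothesis.
• As in the case of both the fixed and random effects models, the test for interactions will be made by comparing $MS_{A\times B}$ to $MS_E$.
• The test for the fixed factor, factor A, will be made by comparing $MS_A$ to $MS_{A\times B}$.
• The test for the random effect, factor B, will be made by comparing $MS_B$ to $MS_E$.

The ANOVA Table
• It is useful to add the expected mean squares to the table in order to remember which ratios to form for the F-tests.
• The ANOVA table is given below:
  \[
  \begin{array}{lllll}
  \text{Source} & \text{df} & \text{MS} & \text{EMS} & F \\
  \hline
  A & a-1 & MS_A & \sigma^2 + r\,\sigma_{A\times B}^2 + \frac{br}{a-1}\sum_{i=1}^{a}\alpha_i^2 & MS_A / MS_{A\times B} \\
  B & b-1 & MS_B & \sigma^2 + ar\,\sigma_B^2 & MS_B / MS_E \\
  A\times B & (a-1)(b-1) & MS_{A\times B} & \sigma^2 + r\,\sigma_{A\times B}^2 & MS_{A\times B} / MS_E \\
  \text{Error} & ab(r-1) & MS_E & \sigma^2 & \\
  \text{Total} & abr-1 & & &
  \end{array}
  \]

Estimating the Model Parameters
• The effect for the levels of the fixed factor can be estimated as in the fixed effects model. That is,
  \[ \hat{\alpha}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot} \]
• In the mixed model, however, confidence intervals for the effects of the levels of the fixed factor are constructed using $MS_{A\times B}$ as the variance estimate. That is, a CI for the effect of the ith level of factor A is
  \[ \hat{\alpha}_i \pm t_{\alpha/2;\,(a-1)(b-1)} \sqrt{\frac{MS_{A\times B}}{br}} \]
  where the α in the subscript of t is the significance level, not a factor effect.
• Orthogonal contrasts can also be used to make inferences about the levels of factor A.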
• The mixed effects model also contains components of variation, and these can be estimated as follows:
  \[ \hat{\sigma}_B^2 = \frac{MS_B - MS_E}{ar}, \qquad \hat{\sigma}_{A\times B}^2 = \frac{MS_{A\times B} - MS_E}{r} \]

Random & Mixed Effects Using SAS – Example
• Background: the goal of this study is to investigate the capacity of a measurement system.
• Design: 10 parts are randomly selected; 2 operators are randomly selected, and each operator measures each part 3 times.
• The statements required to conduct the analysis in SAS are as follows:

    proc glm data = measurement ;
      class part operator ;
      /* part | operator expands to part, operator and part*operator */
      model measure = part | operator ;
      /* both factors are random, so the interaction is random as well */
      random part operator part*operator ;
      /* test each main effect against the interaction mean square */
      test h=part e=part*operator ;
      test h=operator e=part*operator ;
    run ;

• Suppose that parts were fixed and operators were random.
• The SAS code would be as follows:

    proc glm data = measurement ;
      class part operator ;
      model measure = part | operator ;
      /* only operator and the interaction are random; part is fixed */
      random operator part*operator ;
      /* test the fixed factor against the interaction mean square */
      test h=part e=part*operator ;
    run ;

• The ANOVA table would look the same as above.
• The fixed factor “part” would be tested against the interaction.
• The random factor “operator” and the interaction “part×operator” would be tested against the error term; these tests can be read from the ANOVA table.

Three-Factor Fixed Effects Design
• Suppose that in a particular experiment, there are 3 factors that are of interest to the researcher.
• Assume that there are a levels of Factor A, b levels of Factor B, and c levels of Factor C.
• In this case, the researcher must also be concerned with the interactions among the 3 factors: A×B, A×C, B×C, and A×B×C.
• The model that we will use in this case is
  \[ Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl} \]
• In this notation, the interaction terms are denoted by, for instance, $(\alpha\beta)_{ij}$.
• This notation is used to avoid introducing more Greek letters, and does not mean that the interaction between $\alpha_i$ and $\beta_j$ is the product $\alpha_i\beta_j$.

Model Assumptions
• The assumptions about the parameters are similar to those for the 2-factor fixed effects model.
• We assume the following:
  \[ \sum_{i} \alpha_i = \sum_{j} \beta_j = \sum_{k} \gamma_k = 0 \]
  \[ \sum_{i} (\alpha\beta)_{ij} = \sum_{j} (\alpha\beta)_{ij} = 0, \quad \text{and similarly for } (\alpha\gamma)_{ik} \text{ and } (\beta\gamma)_{jk} \]
  \[ \sum_{i} (\alpha\beta\gamma)_{ijk} = \sum_{j} (\alpha\beta\gamma)_{ijk} = \sum_{k} (\alpha\beta\gamma)_{ijk} = 0 \]
  \[ \varepsilon_{ijkl} \sim N(0, \sigma^2), \ \text{independent of each other} \]

Sums of Squares and ANOVA Table
[Table: sums of squares and ANOVA table for the three-factor fixed effects design]

Blocking - Introduction
• In general, the goal of experimental design is to minimize haphazard variability so that differences between treatments can be seen.
• In some situations, a variable might have an impact on the response. However, this variable is not the focus of the study, and we generally wish to exclude it from the design.
• Such variables are called nuisance factors.
• The purpose of randomization is to average out the impact of these nuisance factors.
• In some cases, the nuisance factors may be both unknown and uncontrollable, in which case randomization is especially useful.
• In other cases, factors which influence the response might be known, but possibly uncontrollable.
• Although such factors cannot be included in the design, we can at least observe their values.
• The analysis can then be adjusted to compensate for the effect of these variables using an analysis of covariance (to be discussed later in the course).
• In other situations, a nuisance factor may be both known and controllable, in which case we can reduce the risk of haphazard error by including this factor in the design of the experiment.
• These types of designs, called blocked designs, can be used to reduce the variability of experimental error in such cases.

Example
• A fleet manager wishes to compare 4 brands of tires to determine which has the least tread wear after 20,000 miles.
• Since there are 4 brands of tires to test, the study should ideally include at least 4 cars.
• Denote the tire brands by T1, T2, T3, T4 and the cars by C1, C2, C3, C4.
• One possible way to design the study is to randomly decide which car gets which brand of tire.
• Each car would then have 4 tires of the same brand.
• However, if there is a difference between cars with respect to the wear they cause on the tires, then this design will not allow us to detect a difference between brands, because brand differences cannot be separated from car differences.
• Although differences between cars are not of primary interest, they need to be taken into account.
• One possible way around this is to randomly assign the 16 tires (4 of each brand) to the 4 cars.
• The following allocation of tires to cars might result from such a randomization:
  [Table: a completely randomized allocation of the 16 tires to the 4 cars]
• However, the goal of the design was to eliminate the confounding of tire effects with car differences, and this goal has not been met here.
• For example, brand T1 is not used on car C3, brand T2 is not used on car C1, and brand T4 is not used on car C2.
• So we need to ensure that there is no confounding and that random error does not include differences between cars.
• This could be accomplished by restricting the randomization so that each car must have one tire of each brand. That is, randomize the location of the tires within each car.
• An example of such a randomization scheme is as follows:
  [Table: an allocation with one tire of each brand on each car, positions randomized within each car]
• This design is known as a randomized complete block design.

Randomized Complete Block Design
• A randomized complete block design is a restricted randomization.
• Experimental units are first organized into homogeneous groups called blocks.
• Treatments are then randomly allocated within each block.
• In the example above, cars were contributing to variation but were not of primary interest.
• The fact that each car requires 4 tires means that the 4 tires on one car form a natural blocking unit.
• The purpose of blocking is to ensure that experimental units within a block are as homogeneous as possible with respect to the response variable.
• Units in different blocks are more heterogeneous.
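The restricted randomization used for the tire example can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the course: the function name, the seed and the wheel-position labels are hypothetical, and the only property it is meant to show is that every block (car) receives each treatment (brand) exactly once, in an independently shuffled order.

```python
import random


def randomize_rcbd(treatments, blocks, rng):
    """Return {block: treatments in random order}, each treatment once per block."""
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)        # independent randomization within each block
        layout[block] = order
    return layout


brands = ["T1", "T2", "T3", "T4"]
cars = ["C1", "C2", "C3", "C4"]
# Hypothetical position labels; the slides only say tire location is randomized.
positions = ["front-left", "front-right", "rear-left", "rear-right"]

layout = randomize_rcbd(brands, cars, random.Random(2024))
for car in cars:
    print(car, dict(zip(positions, layout[car])))
```

Because the shuffle is done separately inside each car, no brand can be missing from any car, which is exactly what the completely randomized allocation failed to guarantee.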

Advantages & Disadvantages
• Using blocks allows us to control a factor not of primary interest.
• However, it requires that there be enough experimental units to ensure that each treatment can be used within each block.
• Further, it requires the researcher to assume that there is no interaction between blocks and treatments.
• Since block effects must be estimated in addition to treatment effects, the degrees of freedom available for estimating error are reduced.

Special Case: Paired t-test
• The simplest example of a randomized complete block design is a paired t-test.
• In this case there are 2 treatments to be studied, and both treatments are applied within each block (a matched pair).
• For example, the two twins in a pair might be randomly allocated to the 2 treatments.
• Or 2 treatments might be randomly allocated to left and right eyes, lungs, kidneys, hands, etc.

General Case: Two or More Treatments
• Consider the case where there is one factor which will be studied for its effect on the response variable. Suppose that the number of levels of that factor is a.
• Further, suppose that it is known that there is a nuisance factor which can be controlled, and that this factor will be used to form blocks.
• Let b denote the number of blocks to be used in the experiment.
• The order in which the treatments will be allocated within blocks is randomized.
• The total number of experimental units required to conduct this experiment is N = ab.

The Model
• We will use the following statistical model to express the response in terms of the treatment and block effects:
  \[ Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \quad i = 1, \dots, a, \; j = 1, \dots, b \]
• Where:
  μ is the overall mean,
  τi is the effect of the i-th treatment,
  βj is the effect of the j-th block, and
  εij is the residual or random error term.
• It is possible that either or both of the treatments and blocks could be randomly chosen. For now, we assume that both are fixed.

Assumptions
• As before, we will assume that $\varepsilon_{ij} \sim N(0, \sigma^2)$ and that the $\varepsilon_{ij}$ are independent of each other.
• Treatment and block effects are defined as deviations from the overall mean.
• Therefore, we require that
  \[ \sum_{i=1}^{a} \tau_i = 0 \quad \text{and} \quad \sum_{j=1}^{b} \beta_j = 0 \]

Sources of Variation
• When considered as a whole, the data from all treatment groups and all blocks will contain a certain amount of variability.
• Some of the variability might be due to the fact that the treatments have different effects on the response.
• Similarly, some of the variability might be due to the fact that blocks are quite heterogeneous with regard to the response.
• Finally, even if there were no treatment or block differences, there would still be chance variation.
• The total sum of squares is a measure of the overall variability in the sample, and it can be decomposed to allow us to determine how much variability is due to each source…
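The decomposition mentioned in the last bullet can be checked numerically. The Python sketch below assumes the standard identity for a randomized complete block design with one observation per treatment-block cell (total = treatment + block + error); that identity is not spelled out in this part of the notes, and the function name and the tread-wear readings are hypothetical, made up purely for illustration.

```python
def rcbd_sums_of_squares(y):
    """Split the total sum of squares for y[i][j] (treatment i, block j)."""
    a, b = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (a * b)
    trt_mean = [sum(row) / b for row in y]
    blk_mean = [sum(y[i][j] for i in range(a)) / a for j in range(b)]

    ss_total = sum((y[i][j] - grand) ** 2 for i in range(a) for j in range(b))
    ss_trt = b * sum((m - grand) ** 2 for m in trt_mean)
    ss_blk = a * sum((m - grand) ** 2 for m in blk_mean)
    # Residual: what is left after removing the treatment and block means.
    ss_err = sum(
        (y[i][j] - trt_mean[i] - blk_mean[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    return {"total": ss_total, "treatment": ss_trt, "block": ss_blk, "error": ss_err}


# Hypothetical readings: rows are brands T1..T4, columns are cars C1..C4.
wear = [
    [12.0, 14.0, 10.0, 12.0],
    [11.0, 13.0, 9.0, 11.0],
    [14.0, 15.0, 11.0, 12.0],
    [9.0, 12.0, 8.0, 11.0],
]
ss = rcbd_sums_of_squares(wear)
for source in ("treatment", "block", "error", "total"):
    print(f"{source:>9}: {ss[source]:.4f}")
```

The three component sums of squares printed by the loop should add up to the total, up to floating-point rounding.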