Ron Heck, Fall 2011
EDEP 768E: Seminar on Multilevel Modeling
Week 2: Notes
From Single-Level to Multilevel Models

Multilevel models necessitate some changes in the way we specify our models. We are usually trying to investigate a set of theoretical relations that are thought to exist in the population. Decisions about data analysis are embedded in research questions, designs, and the data structures themselves. We talked last week about how MLMs attend to the sampling schemes of many large-scale studies as well as the specification of processes that exist at multiple levels of an educational system.

Often the first step is determining whether a multilevel analysis is indeed necessary. Typically we first partition the variance in an outcome into its between-group and within-group parts and form the intraclass correlation (ICC):

$$\rho = \frac{\sigma^2_B}{\sigma^2_B + \sigma^2_W}$$

The intraclass correlation can also be understood as the correlation between two randomly chosen individuals in the same group.

Suppose the variance in the first situation is the following:

Between = 20
Within = 60
ICC = 20/80 = .25

Suppose in the second it is:

Between = 40
Within = 40
ICC = 40/80 = .50

In the second case the groups are more homogeneous; that is, people within each group are more alike.

If the ICC is zero (or close to it), there is little reason to conduct a multilevel analysis. You would just analyze individuals (as randomly selected and independent from each other). If the ICC is very high (approaching 1), there is also no need to do a multilevel analysis, since individuals within each group are nearly identical. In that case you would just conduct the analysis at the group level.

Let’s look at a simple regression analysis. The model is typically described like this: $\mathbf{Y} = \mathbf{B}\mathbf{X}$, where the bold indicates a vector or matrix (the coefficient vector is typically $(p + 1) \times 1$ for $p$ predictors, to account for the intercept). Suppose we wish to explain students’ math test scores from their SES background (lowSES, coded 1 = participates in free/reduced lunch, 0 = otherwise) and gender (female coded 1, male coded 0). Sometimes people will add the subscript $i$ to refer to individuals.
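As a brief aside before the regression example continues, the intraclass correlation arithmetic described earlier in these notes can be sketched in a few lines of Python. This is only an illustration: the function name `intraclass_correlation` is hypothetical, and the inputs are the between-group and within-group variances from the two situations in the notes.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Return rho = between variance / (between variance + within variance)."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# First situation from the notes: Between = 20, Within = 60
first = intraclass_correlation(20, 60)

# Second situation from the notes: Between = 40, Within = 40
second = intraclass_correlation(40, 40)

print(f"first ICC  = {first:.2f}")   # 20 / 80
print(f"second ICC = {second:.2f}")  # 40 / 80
```

The larger the share of total variance that lies between groups, the closer the ICC is to 1.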
We have the following model:

$$Y_i = \beta_0 + \beta_1 \mathrm{SES}_i + \beta_2 \mathrm{female}_i + \varepsilon_i,$$

where $\beta_0$ is the intercept, $\beta_1$ is the unstandardized beta for SES, $\beta_2$ is the unstandardized beta for female, and $\varepsilon_i$ represents errors in predicting values of $Y_i$.

We can plug the set of estimates into the equation as follows:

$$Y_i = 650.6 - 19.091(\mathrm{lowSES}_i) + 5.491(\mathrm{female}_i) + \varepsilon_i$$

The intercept can be interpreted as the estimate for an individual whose status is 0 on the other variables (i.e., lowSES = 0 and female = 0). Hence, an individual who is not low SES (i.e., not participating in free/reduced lunch) and male would be expected to score 650.6 on the math test. Holding SES constant at lowSES = 0, females would be expected to score the following:

650.6 + 5.491(1) = 656.091

The key part of the single-level regression analysis is that the estimates for lowSES and female are fixed; that is, each coefficient takes a single value that applies to every individual in the sample. Moreover, the errors are assumed to be independent with mean = 0 and some variance.

Now, suppose we believe that the relationship between lowSES and math might vary across schools; that is, it might be stronger in some schools and weaker in others. If we allow either the level of the outcome (math scores) or a slope to vary across groups, it is called a “random” effect, since it can take on different values in different groups. We might then conduct a multilevel analysis, and we can devise a series of steps for building it.

The proposed model might look something like this at the moment:

You can see that it takes into account the nesting of individuals within schools; that is, there is a within-school portion of the model and a between-school portion. At this point there are no school variables, but they could be added subsequently.

1.
Unconditional model (partition variance components within and between schools)

At level 1, we can define students’ average achievement:

$$Y_{ij} = \beta_{0j} + \varepsilon_{ij}$$

At level 2 (school level), we can allow the average achievement intercept ($\beta_{0j}$) to vary randomly across schools. The random component is the school-level random effect $u_{0j}$, whose variance is the level-2 variance component:

$$\beta_{0j} = \gamma_{00} + u_{0j}$$

Through substitution, we can arrive at the single combined equation:

$$Y_{ij} = \gamma_{00} + u_{0j} + \varepsilon_{ij}.$$

This suggests there are three parameters to estimate. They include the fixed intercept ($\gamma_{00}$), the variance of the random effect (i.e., the randomly varying intercept), and the variance of the level-1 residual. We can confirm that in the table.

We can also examine the variance components.

How would we calculate the variance components for the math variable?

2. Within-School Model

Now let’s look at the same analysis with two predictors at the student level, but this time we are adjusting the estimates for the nesting of individual students within schools. At level 1 we have the following model:

$$Y_{ij} = \beta_{0j} + \beta_{1j}\mathrm{lowses}_{ij} + \beta_{2j}\mathrm{female}_{ij} + \varepsilon_{ij}.$$

At level 2 (between schools) the intercept model remains the same:

$$\beta_{0j} = \gamma_{00} + u_{0j}$$

We can declare the slope coefficients for lowses and female to be fixed (not varying randomly) across schools:

$$\beta_{1j} = \gamma_{10}$$
$$\beta_{2j} = \gamma_{20}.$$

We can tell the size of the coefficients is not proposed to vary across schools since there are no random components ($u_{1j}$ and $u_{2j}$, respectively). We can substitute the school-level model into the level-1 model:

$$Y_{ij} = \gamma_{00} + \gamma_{10}\mathrm{lowses}_{ij} + \gamma_{20}\mathrm{female}_{ij} + u_{0j} + \varepsilon_{ij}.$$

We can then count up the fixed and random effects and compare them to the model dimension table.

Let’s look at the fixed effects. We can see there are differences from the single-level estimates in the intercept, the low SES effect, and the female effect (since the estimates are now adjusted for the nesting of students within schools).

Now we look at the random effects:

We can also examine the amount of variance accounted for at each level.
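One way to make the variance partition in the unconditional model concrete is a hand calculation on balanced toy data. The sketch below uses the classical one-way ANOVA (method-of-moments) estimator, which is a stand-in technique and not the likelihood-based estimation that mixed-model software performs; with unbalanced data the two would generally give different numbers. The function name `anova_variance_components` and the three small "schools" of scores are hypothetical and are not taken from the course data.

```python
from statistics import mean


def anova_variance_components(groups):
    """Method-of-moments variance components for a balanced one-way design.

    groups: list of equal-length lists of scores (one inner list per school).
    Returns (between_var, within_var).
    """
    k = len(groups)        # number of schools
    n = len(groups[0])     # students per school (assumed equal)
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 groups, all of the same size >= 2")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Mean square between schools and mean square within schools
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))

    # E[MS_between] = within_var + n * between_var, so solve for between_var;
    # truncate at zero because a variance cannot be negative.
    between_var = max((ms_between - ms_within) / n, 0.0)
    return between_var, ms_within


# Hypothetical toy scores for three schools (illustration only)
toy_schools = [
    [640, 650, 660],
    [650, 660, 670],
    [660, 670, 680],
]
between, within = anova_variance_components(toy_schools)
icc = between / (between + within)
print(f"between = {between:.2f}, within = {within:.2f}, ICC = {icc:.2f}")
```

The same two numbers feed the ICC formula, and comparing them before and after adding predictors is one way to gauge how much variance is accounted for at each level.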
3. Specifying a Random Slope

We can also estimate a random slope. We specify the slope for lowSES to vary randomly. We only have to make one change:

$$\beta_{1j} = \gamma_{10} + u_{1j}.$$

When we create the combined model, we have the following “cross-level” effect for the level-1 slope at level 2:

$$Y_{ij} = \gamma_{00} + \gamma_{10}\mathrm{lowses}_{ij} + \gamma_{20}\mathrm{female}_{ij} + u_{1j}\mathrm{lowses}_{ij} + u_{0j} + \varepsilon_{ij}.$$

We can see there are now two random effects (the slope for lowses and the intercept).

Notice the fixed effects are different from the previous model.

Also we can examine the variance components. We can see the slope variance is significant; that is, the lowses slope varies significantly across schools.

Numbering for equations

1. Typically, we refer to level-1 coefficients with the Greek letter beta (β). We refer to the intercept as $\beta_0$ and to the predictors at level 1 as X variables and number them from 1 to q. We use the subscript i to refer to individuals. The subscript j refers to groups.

2. At level 2, we typically refer to the coefficients as gamma (γ). We refer to the intercept as $\gamma_{00}$ and refer to the level-2 predictors as W (or Z), with coefficients numbered from $\gamma_{01}$ to $\gamma_{0q}$.

3. Level-1 variables that are referred to at level 2 keep their number from level 1 but add a zero behind. For example, $\beta_1$ becomes $\gamma_{10}$ (note that if $\beta_1$ is randomly varying, the coefficients for the predictors explaining the random slope will be numbered $\gamma_{11}$, $\gamma_{12}$, etc.).
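To see the numbering rules applied in one place, here is a sketch of a generic two-level model with a single level-1 predictor $X_1$ and a single level-2 predictor $W_1$. The predictor $W_1$ is hypothetical; no school-level variable appears in the models in these notes, so this only illustrates where $\gamma_{01}$ and $\gamma_{11}$ would come from.

```latex
\begin{align*}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j}X_{1ij} + \varepsilon_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01}W_{1j} + u_{0j} \\
                     & \beta_{1j} = \gamma_{10} + \gamma_{11}W_{1j} + u_{1j} \\
\text{Combined:}\quad & Y_{ij} = \gamma_{00} + \gamma_{01}W_{1j} + \gamma_{10}X_{1ij}
                        + \gamma_{11}W_{1j}X_{1ij} + u_{0j} + u_{1j}X_{1ij} + \varepsilon_{ij}
\end{align*}
```

The first subscript on each γ matches the level-1 coefficient it explains (0 for the intercept, 1 for the slope of $X_1$), and the second subscript counts the level-2 predictors, with 0 reserved for the level-2 intercept.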