Multilevel Modeling: Why, When and How?
Frank Dong
1-9-2013

Outline
• Why do we need multilevel modeling?
• When do we need multilevel modeling?
• How can we conduct a multilevel modeling analysis (live demo)?

Background
• Everyone knows about ordinary least squares regression, aka linear regression
• The formula is $y = \alpha + x\beta + \varepsilon$
• We typically assume the error term $\varepsilon$ has a normal distribution $N(0, \sigma_e^2)$
• Everyone knows how to do it in SPSS

Problems
• Ordinary least squares analysis does not solve everything
• Data often have a hierarchical structure
• For example, students' test scores may depend on the students themselves, but also on their schools
• School effects are often ignored

Purpose of this presentation
• To introduce the idea of multilevel modeling
• Not everything can be done with linear regression
• Live demonstration of how to conduct multilevel analysis in SPSS

An example
• This example is from the book Multilevel Statistical Models, 4th Edition, by Harvey Goldstein
• We have data on 728 elementary students
• N = 50 schools
• We are interested in the following question: Does a student's 8-year math score predict the 11-year math score?
• Y = 11-year math score
• X = 8-year math score

Some data points

| 11-year Math Score | 8-year Math Score | School ID | Gender (Boy=1, Girl=0) | Social class (Manual=1, Non-manual=0) |
|---|---|---|---|---|
| 39 | 36 | 1 | 1 | 0 |
| 11 | 19 | 1 | 0 | 1 |
| 32 | 31 | 1 | 0 | 1 |
| 27 | 23 | 1 | 0 | 0 |
| 36 | 39 | 1 | 0 | 0 |

Inappropriate Analysis
• For each school, $y_i = \alpha + \beta x_i + \varepsilon$
• The overall model becomes $y_{ij} = \alpha_j + \beta_j x_{ij} + e_{ij}$, for student $i$ in school $j$
• We have 50 pairs of $\alpha_j$, $\beta_j$ to estimate, one for each school
• We also have a variance term, $\delta^2$, to estimate

Issues
• Too many unknown parameters (2 × 50 + 1 = 101)
• Unable to compare school performance if we wish to do so
• Some schools have fewer students than other schools

Solutions
• Multilevel modeling
• Instead of estimating 2 × 50 + 1 = 101 unknown parameters, we simplify the model
• Original model: $y_{ij} = \alpha_j + \beta_j x_{ij} + e_{ij}$
• More importantly, $\alpha_j$ and $\beta_j$ are also treated as random variables
• They are assumed to have a normal distribution with a certain mean and standard deviation

Final Solution
• Writing $\alpha_j = \alpha_0 + u_{0j}$ and $\beta_j = \beta_0 + u_{1j}$, the final model becomes
• $y_{ij} = \alpha_0 + \beta_0 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{0ij}$
• The unknown parameters are $\alpha_0$, $\beta_0$, the variances of $u_{0j}$, $u_{1j}$, and $e_{0ij}$, and the covariance between $u_{0j}$ and $u_{1j}$
• We reduced the number of parameters from 101 to 6

Results

| Parameter | Multilevel Modeling Estimate (s.e.) | OLS Estimate (s.e.) |
|---|---|---|
| Fixed | | |
| Intercept | 13.9 | 13.8 |
| 8-year Math Score | 0.65 (0.025) | 0.65 (0.026) |
| Random Effect | | |
| Between School Variance | 3.28 | |
| Between Students Variance | 19.8 | 23.34 |
| Variance Partition Coefficient | 0.14 | |

Research Question 2
• We also have gender (1 = boy, 0 = girl) and social class (1 = manual, 0 = non-manual). Do these two variables affect the 11-year math score?
• Is gender significant?
• Is social class significant?

| Parameter | Multilevel Modeling Estimate (s.e.) | OLS Modeling Estimate (s.e.) |
|---|---|---|
| Fixed Effects | | |
| Intercept | 14.88 | 14.79 |
| 8-year Math Score | 0.638 (0.025) | 0.638 (0.026) |
| Gender (boy vs girl) | -0.357 (0.340) | -0.363 (0.358) |
| Social Class (manual vs non-manual) | -0.720 (0.387) | -0.697 (0.397) |
| Random Effect | | |
| Between School Variance | 3.312 | |
| Between Students Variance | 19.728 | 49.36 |
| Variance Partition Coefficient | 0.144 | |

How to conduct a Multilevel Modeling
• You do not need to do the calculations yourself
• You do need to be aware that multilevel modeling exists
• The benefit is improved accuracy of the estimates
• Here is how to do it in SPSS (live demo)

Summary
• Ordinary least squares regression is not a cure-all
• When the data have a clear hierarchical structure, multilevel modeling is useful
• Multilevel modeling can also be used to compare the performance of hospitals
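
The variance partition coefficient in the two results tables can be checked by hand. The sketch below assumes the usual random-intercept definition, VPC = between-school variance / (between-school variance + between-students variance). The slides do not state this formula, so treat it as an assumption. The function names are illustrative and are not part of SPSS or any library. The same snippet also reproduces the parameter count for the separate-regressions approach (one intercept and one slope per school, plus one variance term).

```python
def variance_partition_coefficient(between_school_var, between_student_var):
    """Share of the total variance that lies between schools.

    Assumes a random-intercept model, so the total variance is simply
    the sum of the two variance components.
    """
    total = between_school_var + between_student_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_school_var / total


def separate_regressions_param_count(n_schools):
    """One intercept and one slope per school, plus one variance term."""
    return 2 * n_schools + 1


# Variance components as reported in the two results tables.
vpc_model_1 = variance_partition_coefficient(3.28, 19.8)      # 8-year score only
vpc_model_2 = variance_partition_coefficient(3.312, 19.728)   # plus gender, social class

print(f"VPC, model 1: {vpc_model_1:.4f}")
print(f"VPC, model 2: {vpc_model_2:.4f}")
print("Parameters with 50 separate regressions:",
      separate_regressions_param_count(50))
```

Both computed values agree with the reported 0.14 and 0.144 after rounding. In other words, roughly 14% of the variation in 11-year math scores that the fixed effects leave unexplained is attributable to differences between schools.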