Practical 5: Introduction to Multilevel Models and MLwiN In this short practical you will have your first experience of using MLwiN. You will fit the variance components model discussed in the lecture to an educational dataset and look at certain features of this model. We will firstly give a little background on the dataset The tutorial data set The data in the worksheet we use have been selected from a very much larger data set of examination results from six inner London Education Authorities (school boards). A key aim of the original analysis was to establish whether some schools were more ‘effective’ than others in promoting students’ learning and development, taking account of variations in the characteristics of students when they started Secondary school. Below are a list of all the variables in the dataset. We will here however only focus on fitting models to the normexam variable and consider the standlrt variable as a predictor. The data set contains the following variables: School Student Normexam Cons Standlrt girl Schgend Avslrt Schav Vrband Numeric school identifier Numeric student identifier Student’s exam score at age 16, normalised to have approximately a standard Normal distribution. (Note that the normalisation was carried out on a larger sample, so the mean in this sample is not exactly equal to 0 and the variance is not exactly equal to 1.) A column of one’s. If included as an explanatory variable in a regression model, its coefficient is the intercept. See Chapter 7. Student’s score at age 11 on the London Reading Test, standardised using Z-scores. 1=girl, 0=boy School’s gender (1=mixed school, 2=boys’ school, 3=girls’ school) Average LRT score in school Average LRT score in school, coded into 3 categories (1=bottom 25%, 2=middle 50%, 3=top 25%) Student’s score in test of verbal reasoning at age 11, coded into 3 categories (1=top 25%, 2=middle 50%, 3=bottom 25%) Opening the worksheet and looking at the data When you start MLwiN the main window appears. Immediately below the MLwiN title bar are the menu bar and below it the tool bar as shown: P5-1 Below the tool bar is a blank workspace into which you will open windows. These windows form the rest of the ‘graphical user interface’ which you use to specify tasks in MLwiN. Below the workspace is the status bar, which monitors the progress of the iterative estimation procedure. Open the tutorial worksheet as follows: Select Open worksheet from the File menu. Select tutorial.ws from the list of files. Click on the Open button. When this operation is complete the filename will appear in the title bar of the main window and the status bar will be initialised. The Names window will also appear, giving a summary of the data in the worksheet: The MLwiN worksheet holds the data and other information in a series of columns, as on a spreadsheet. These are initially named c1, c2, …, but we recommend that they be given meaningful names to show what their contents relate to. This has already been done in the tutorial worksheet that you have loaded. Each line in the body of the Names window summarises a column of data. In the present case only the first 10 of the 400 columns of the worksheet contain data. Each column contains 4059 values, one for each student represented in the data set. There are no missing values, and the minimum and maximum value in each column are shown. Note P5-2 the Help button on the Windows tool bar. The remaining items on the tool bar of this window are for attaching a name to a column. We shall use these later. You can view individual values in the data using the Data window as follows: Select View or edit data from the Data Manipulation menu. When this window is first opened it always shows the first three columns in the worksheet. The exact number of values shown depends on the space available on your screen. You can view any selection of columns, spreadsheet fashion, as follows: Click the View button Select columns to view Click OK You can select a block of adjacent columns either by pointing and dragging or by selecting the column at one end of the block and holding down ‘Shift’ while you select the column at the other end. You can add to an existing selection by holding down ‘Ctrl’ while you select new columns or blocks. Use the scroll bars of the Data window to move horizontally and vertically through the data, and move or resize the window if you wish. You can go straight to line 1035, for example, by typing 1035 in the goto line box, and you can highlight a particular cell by pointing and clicking. This provides a means to edit data: see the Help system for more details. Having viewed your data you will typically wish to tabulate and plot selected variables, and derive other summary statistics, before proceeding to actual modelling. Tabulation and other basic statistical operations are available on the Basic Statistics menu. These operations are described in the Help system. P5-3 We will here consider a plot of our response normexam variable against the standlrt predictor variable. Select Customised graph from the Graphs menu In the dropdown list labelled y select normexam In the drop down list labelled x select standlrt Click on the Apply button The following graph will appear: The plot shows, as might be expected, a positive correlation where pupils with higher intake scores tend to have higher outcome scores. This suggests a regression relationship between normexam and standlrt and we can fit a linear regression model in MLwiN but we will here go straight into fitting a random intercept model. Setting up a random intercepts model in the Equations window In MLwiN we set up models in the Equations window. Select Equations from the Model menu. The Equations window will then appear: P5-4 We now have to tell the program the structure of our model and which columns hold the data for our response and predictor variables. We will firstly define our response (y) variable to do this: Click on y (either of the y symbols shown will do). In the y list, select ‘normexam’. We will next set up the structure of the model. The model is set up as follows In the N levels list, select ‘2-ij’. In the level 2 (j) list, select ‘school’. In the level 1 (i) list, select ‘student’. Click on the Done button. In the Equations window the red y has changed to a black yij to indicate that the response and the first and second level indicators have been defined. We now need to set up the predictors for the linear regression model. Click on the red x0. In the drop-down list, select ‘cons’. Note that ‘cons’ is a column containing the value 1 for every student and will hence be used for the intercept term. The fixed parameter tick box is checked by default and so we have added to our model a fixed intercept term. We also need to set up residuals and school effects so that the two sides of the equation balance. To do this: Check the box labelled ‘i(student)’. Check the box labelled ‘j(school)’. Click on the Done button. P5-5 We have now set up our intercept and residual terms but to produce the random intercepts regression model we also need to include the fixed slope (‘standlrt’) term. To do this we need to add a term to our model as follows: Click the Add Term button on the tool bar. Select ‘standlrt’ from the variable list. Click on the Done button. Click on the Estimates button on the tool bar. Note that this adds a fixed effect only for the ‘standlrt’ variable. The equations window should now look as follows: We now need to run this model using the IGLS algorithm and so to do this and see the estimates we do the following Click on the Start button. Click on the Estimates button on the tool bar. Having run the model the estimates will turn from blue to green to denote convergence of the IGLS method. The Equations window should now look as follows: P5-6 Here we see from the coefficient β1 that there is a strong positive effect of prior attainment (as measured by the London reading test) on the exam performance at age 16. We also see that the unexplained variance has been split into two parts, the between schools variance σ2u and the residual variance σ2e. It is often of interest to find how important the schools are in explaining the variation and this is captured by a statistic called the Variance Partition Coefficient (VPC) which measures the percentage of variation explained by a classification. Here the VPC = 0.092/(0.566+0.092)= 0.140, in other words after accounting for prior attainment schools explain 14% of the remaining variation. Note that you can calculate the VPC in MLwiN yourself by doing the following: Select Command Interface window from the Data Manipulation menu. Type calc 0.092/(0.566+0.092) in the Command Interface window Predictions We will probably also be interested in seeing a picture of the fitted school lines for our random intercepts model. This can be done via the predictions and graph windows. Select the Predictions window from the Model menu. At the bottom of the window click on variable and from the options that appear Select Include all explanatory variables. Click on the e0ij to remove level 1 residuals from the prediction. The Predictions window look follows: Select column c11 from should the dropnow down listasnext to output from prediction to P5-7 If you click on the Calc button then the fitted predictions will be stored in the column c11. We now wish to see these displayed graphically so we need to do the following: Select Customised graph(s) from the Graphs menu. Select D1 from the drop down list in the top left of this window. Select ‘c11’ from the y drop down list. Select ‘standlrt’ from the x drop down list. Select ‘school’ from the group drop down list. Select ‘line’ from the plot type drop down list. The window should be as follows: If we now click on the Apply button we will see the 65 parallel school lines that we have fitted. P5-8 What you should have learnt from this exercise How to set up variance components models in MLwiN How to calculate the VPC for a multilevel model How to graph predictions in MLwiN. P5-9