PRELIMINARY RESEARCH ON STUDENT PERFORMANCE ON VIRGINIA’S SOL TESTS
Patricia Campbell
University of Maryland

Model Building

The primary research question for these preliminary analyses was whether math coaches increase student math achievement scores. We use data on roughly 17,000 student test scores in 33 schools over two school years. To fit the nested nature of the data (students nested within classrooms, nested within schools) we used Hierarchical Linear Modeling to analyze school-level coaching effects on students’ SOL test scores.1 For these preliminary models we separately analyzed third-, fourth-, and fifth-grade students’ scores across two school years (2005-2006 and 2006-2007).

1 In the future we will use a three-level model and examine students within teachers within schools. However, at this time the teacher-level data are still being cleaned, so we can only examine the effects coaches have at the school level.

For each grade level we ran parallel models with six dependent variables. The primary dependent variable was the overall Standards of Learning (SOL) Mathematics scale score on Virginia’s statewide standardized assessment as required by No Child Left Behind. We also examined the five component subscale scores in that assessment: Number and Number Sense; Computation and Estimation; Measurement and Geometry; Probability and Statistics; and Patterns, Functions and Algebra.

Before running the complete models we ran three fully unconditional models (two-level models that partition the variance of the primary dependent variable, the overall scale score, with no other variables added) to determine the intraclass correlation coefficient (ICC). The ICC is the proportion of variance in student scores that can be attributed to school rather than individual differences. Put simply, the ICC indicates how much of the variation in students’ scores is due to the schools they attend. The ICC indicates that 12.4% of the variance in the third-grade scores was due to schools, 11.2% in fourth grade, and 9.0% in fifth grade. (See Table 1.) The ICC tells us that on average, roughly 11% of the difference in student scores on the SOL in mathematics within our sample is due to the schools they attended.

Table 1. Intraclass Correlations by Grade

                              ICC     Percent Variability Due to School Differences
Third Grade Overall Scale     0.124   12.4%
Fourth Grade Overall Scale    0.112   11.2%
Fifth Grade Overall Scale     0.090   9.0%

We created a parallel model so the results of each test would be comparable within each grade level. Six independent variables were included in the model. Our primary independent variable was a school-level variable indicating whether the school had the services of a mathematics coach. Eleven of the 33 schools in the current sample2 were randomly assigned a coach and therefore identified as treatment schools, meaning a coach served the school the year of the test. All of the other variables are used as controls, allowing us to determine the effects of coaches more clearly while holding the control variables constant.

Two school-level independent variables (level-two variables in a two-level model) were used to control for the school’s impact on student scores beyond the effect of math coaches. One binary variable flagged high-poverty schools, those in which at least 70% of students received free or reduced-price meals. A second binary variable indicated whether the school had a relatively high proportion of students with special education needs (≥ 20%). Both of these variables are rather crude proxies for poverty and special education needs, but they are important because they “clean up” the statistical model and allow us to determine the effects of coaches more accurately.
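As a rough check on the ICC figures in Table 1, the sketch below shows the usual way an ICC is formed from the two variance components of an unconditional two-level model (between-school variance divided by total variance) and averages the three reported values. The function name `icc` and the variance components passed to it are illustrative assumptions; the report gives only the resulting ICCs, not the underlying variances.

```python
def icc(between_var, within_var):
    """Share of total score variance that lies between schools."""
    return between_var / (between_var + within_var)


# ICCs as reported in Table 1 (overall scale score, by grade).
reported_icc = {
    "Third Grade": 0.124,
    "Fourth Grade": 0.112,
    "Fifth Grade": 0.090,
}

# Simple average across the three grades.
mean_icc = sum(reported_icc.values()) / len(reported_icc)
print(f"Average ICC across grades: {mean_icc:.3f}")  # 0.109, roughly 11%

# Hypothetical variance components (NOT from the report), chosen only so
# that the formula reproduces the third-grade ICC of 0.124.
example_icc = icc(between_var=12.4, within_var=87.6)
print(f"Example ICC from hypothetical components: {example_icc:.3f}")
```

The average of about 0.109 is consistent with the statement that roughly 11% of the variation in scores lies between schools.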
Three student characteristics (level-one variables in a two-level model) were used to control for basic differences between students. These three variables were binary controls for gender, poverty, and special education status. A measure for race/ethnicity is conspicuously absent from our model. Data on students’ race and ethnicity are not yet available and were not included in this preliminary model. Applying such a measure in the future will make our models more accurate. Nonetheless, these preliminary models are sufficient to show some significant results.

One of the student-level variables, special education status, had different effects in different schools (a random slope). Put simply, the effect of being a special education student was not the same in all the schools in our sample. We modeled this variation with an additional student-level binary variable identifying whether the school the special education student attended had a relatively high proportion of students with special education needs. This school characteristic was consistently significant (positive impact on special education students) and helps the model capture the effects of coaches more accurately, but as a control variable in a preliminary analysis it does not warrant particular attention.

2 In the future we will have one more school in this treatment group and two more schools in the control group, but the data for the one treatment school are currently unavailable, necessitating removal of its control pair.

Findings

For each grade the results are presented in a table with models presented in the far left column. These tables (Tables 2, 3, and 4) present the coefficients (providing a metric of raw point gains on the SOL assessment) and p-values for each model using robust standard errors. Normal standard errors generally generate a more conservative test for significance while robust standard errors produce more sensitive tests.
While these models were also assessed in a parallel analysis for each grade using normal standard errors, those results are not presented here.

Using robust standard errors, coaching had a statistically significant effect on our primary dependent variable, the overall SOL Math scale score, in the fourth and fifth grades only. In this model the coefficient for having a coach in a school, holding all else constant, was 24.74 points (p = 0.01) in the fourth grade, or a little over a quarter of a standard deviation. In the fifth grade, the impact of a coach, holding all else constant in the model, was 18.81 points (p = 0.04), slightly more than 0.2 standard deviations. In the third grade, the effects were slightly smaller and not significant at the 0.05 level. Note that the coaching effects in all three grades have fairly substantial coefficients that are all in the same direction. While the lack of significance keeps us from making inferences based on the third-grade model, the results of these relatively crude preliminary models are similar enough to suggest that more accurate future models are warranted. Further analysis will allow us to capture and model any coaching effects more clearly. See Tables 2, 3, and 4 below.

The subscale analyses reflect the findings of the overall scale models. Coaching had no significant effect on any of the dependent variables in the third-grade model, but using robust standard errors it had significant positive effects on all five of the fourth-grade subscales and two of the fifth-grade subscales, with magnitudes comparable to the overall scale effects (roughly a fifth to a quarter of a standard deviation). A noticeable pattern is that the Number and Number Sense subscale yielded lower coefficients and higher p-values than most of the other dependent subscale variables in the fourth- and fifth-grade models.

In addition to the primary independent variable, the controls included in the models are relatively stable across all models.
Such stability provides evidence that the findings are not due to chance but reflect patterns of influence on test scores. Those school characteristics that were used as control variables (level-two variables) had few effects. The effects of attending a high-poverty school or a school with a high proportion of students with special education needs were not significant on any dependent variable in any grade.

The student (level-one) control variables were significant in most models. Gender had a significant but substantively negligible negative effect in the fifth grade and a more pronounced influence in the fourth grade. The individual effects of poverty and special education status were negative, as expected, and consistent across models. Generally, the effect of poverty was a decrease of slightly less than half a standard deviation, and the effect of special education status was a decrease of slightly more than half a standard deviation in scale scores.

Discussion

We believe these preliminary results offer convincing evidence that coaches do have an effect on the schools in which they work. Three factors suggest there is much more to be learned about how coaches affect student achievement.

First, it is important to note that these are crude preliminary models. All the predictors in the model are simple binary dummy control variables, and historically powerful predictors such as race have yet to be included. If these crude preliminary models reveal significant findings, it is likely that more well-developed models will have more to say about how coaching affects schools.

Second, these crude preliminary models show consistent, substantial and reliable results. While one statistically significant positive coaching effect would be suggestive, the multiple significant results across both subscales and grades constitute substantial evidence.
The fact that all the coefficients are in the same positive direction and have similar magnitudes further reinforces this conclusion. Additionally, by comparing the effects of coaches to the effects of typically powerful predictors such as student socio-economic status and special education needs, it is clear that coaches can have a substantial effect on student scores. Our preliminary results show coaches have an effect that is approximately three-fourths (Grade 4) to half (Grade 5) the size of the effect of students’ socio-economic status on mathematics achievement, and half (Grade 4) to one-fourth (Grade 5) the size of the effect of their special education status on mathematics achievement.

Third, these data were also examined with identical models using normal standard errors, rather than robust standard errors. The similarity of the p-values rendered using normal and robust standard errors is further evidence of the stability of these models. If the data were less reliable, the differences in the p-values rendered by the two types of standard errors would be greater.

Finally, it is important to note that these school effects are averaged across all teachers in the schools. It is reasonable to believe that the effect of having a coach would be different for different teachers. When data on coaches’ interactions with teachers (PDA data) and the data linking students to teachers, permitting analysis of the nesting of students with teachers, become available, it is likely that the effects of coaches in specific classrooms and with specific teachers will be more pronounced than they are here.

Table 2.
Coaches’ Effects on Third Grade SOL Math Tests and Subscales (using Robust Standard Errors)

Dependent Variable (coefficient, p-value): Overall Scale Score | Number & Number Sense | Computation & Estimation | Measurement & Geometry | Probability & Statistics
Intercept 487.08 0.00 39.12 0.00 38.73 0.00 40.00 0.00 40.32 0.00
High Poverty School -4.29 0.77 -0.68 0.65 -0.76 0.62 -0.42 0.73 -0.34 0.81
High Proportion of Sp. Ed. Students 10.21 0.62 0.84 0.67 1.97 0.41 0.66 0.71 1.24 0.54
Coach in school 11.68 0.22 0.97 0.25 1.00 0.28 1.28 0.18 1.03 0.29
0.06 0.01 Female -0.35 0.40 -0.09 0.13 -0.03 0.60 -0.06 0.18
Low SES -32.82 0.00 -3.09 0.00 -3.14 0.00 -3.33 0.00 -3.51 0.00
Special Education -45.44 0.00 -4.67 0.00 -3.74 0.00
High Proportion of 30.06 0.05 3.54 0.03 Sp. Ed. Students 2.06 0.13
Patterns, Functions & Algebra 40.86 0.00 -1.64 0.24 1.00 0.66 0.00 -2.41 0.61 0.46 0.99 0.00 -5.13 0.00 -5.51 0.00 -5.12 0.00 3.41 0.04 3.44 0.04 2.14 0.12
Bold black coefficients and p-values: p < 0.05. Black text: p < 0.10. Grey text: p > 0.10.

Table 3. Coaches’ Effects on Fourth Grade SOL Math Tests and Subscales (using Robust Standard Errors)

Dependent Variable (coefficient, p-value): Overall Scale Score | Number & Number Sense | Computation & Estimation | Measurement & Geometry | Probability & Statistics
Intercept 459.82 0.00 36.80 0.00 35.56 0.00 35.56 0.00 34.95 0.00
High Poverty School 10.83 0.29 0.92 0.48 1.66 0.10 1.10 0.37 0.87 0.26
High Proportion of Sp. Ed. Students 8.75 0.45 -0.22 0.87 0.78 0.54 1.19 0.34 1.62 0.08
Coach in school 24.74 0.01 1.74 0.05 2.47 0.02 2.35 0.01 1.53 0.05
-6.77 0.00 -1.40 0.00 -0.70 0.00 -0.94 0.00 Female 0.10 0.57
Low SES -33.78 0.00 -3.35 0.00 -2.63 0.00 -2.92 0.00 -3.55 0.00
Special Education -50.69 0.00 -5.83 0.00 -3.42
Patterns, Functions & Algebra 36.09 0.00 0.02 0.98 0.14 1.93 -0.30 -3.31 0.88 0.01 0.27 0.00 0.00 -4.76 0.00 -4.76 0.00 -4.76 0.00
High Proportion of 3.47 0.03 Sp. Ed. Students 28.40 0.08 1.49 0.20
3.58 0.04 2.20 0.24 1.90 0.25
Bold black coefficients and p-values: p < 0.05. Black text: p < 0.10. Grey text: p > 0.10.

Table 4. Coaches’ Effects on Fifth Grade SOL Math Tests and Subscales (using Robust Standard Errors)

Dependent Variable (coefficient, p-value): Overall Scale Score | Number & Number Sense | Computation & Estimation | Measurement & Geometry | Probability & Statistics
Intercept 490.32 0.00 39.73 0.00 38.53 0.00 37.64 0.00 38.88 0.00
High Poverty School 13.64 0.23 0.66 0.67 2.00 0.05 1.43 0.12 0.81 0.38
High Proportion of 2.11 0.04 Sp. Ed. Students 22.65 0.13 2.80 0.08 0.91 0.57 1.82 0.15
18.81 0.04 1.82 0.02 1.69 0.02 Coach in school 1.50 0.14 1.67 0.08
0.41 0.02 Female 0.01 0.92 -0.03 0.27 0.05 0.26 0.07 0.05
Low SES -35.11 0.00 -3.08 0.00 -2.83 0.00 -2.89 0.00 -3.00 0.00
Special Education -65.76 0.00 -6.72 0.00 -4.39 0.00 -5.73 0.00 -7.19 0.00
High Proportion of Sp. Ed. Students 57.34 0.00 4.93 0.00 5.78 0.00 3.26 0.00 5.32 0.00
Patterns, Functions & Algebra 38.62 0.00 1.36 0.08 1.58 1.48 0.05 -3.02 0.18 0.07 0.02 0.00 -6.27 0.00 4.79 0.00
Bold black coefficients and p-values: p < 0.05. Black text: p < 0.10. Grey text: p > 0.10.
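The comparison drawn in the Discussion between the coaching effect and the effects of poverty and special education status can be sketched as a simple ratio of coefficients. The values below are read from the Overall Scale Score columns of Tables 3 and 4; the dictionary layout and the helper name `relative_size` are illustrative assumptions, not part of the original analysis.

```python
# Overall Scale Score coefficients, as read from Tables 3 and 4.
overall = {
    "Grade 4": {"coach": 24.74, "low_ses": -33.78, "special_ed": -50.69},
    "Grade 5": {"coach": 18.81, "low_ses": -35.11, "special_ed": -65.76},
}


def relative_size(coach_coef, other_coef):
    """Magnitude of the coaching coefficient relative to another coefficient."""
    return abs(coach_coef) / abs(other_coef)


# Ratio of the coaching effect to each control's effect, by grade.
ratios = {
    grade: {
        "low_ses": relative_size(c["coach"], c["low_ses"]),
        "special_ed": relative_size(c["coach"], c["special_ed"]),
    }
    for grade, c in overall.items()
}

for grade, r in ratios.items():
    print(
        f"{grade}: coach / low SES = {r['low_ses']:.2f}, "
        f"coach / special education = {r['special_ed']:.2f}"
    )
# Grade 4: coach / low SES = 0.73, coach / special education = 0.49
# Grade 5: coach / low SES = 0.54, coach / special education = 0.29
```

These ratios line up with the description in the Discussion: about three-fourths and one-half of the socio-economic effect, and about one-half and one-fourth of the special education effect, for Grades 4 and 5 respectively.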