The Appropriate Use of NAPLAN Data
National Symposium, 23 July 2010
Margaret Wu
University of Melbourne
m.wu@unimelb.edu.au

NAPLAN Tests
Conducted once a year
About 40 test questions per subject area
Test scores are used to infer
  ◦ the achievement levels of students
How reliably can NAPLAN test scores reflect
  ◦ Student achievement level?
  ◦ School performance?

Margin of error in measuring student performance
David was a Grade 5 student in 2008. His reading score was 25 out of 40.
David’s reading test score could vary between 20 and 30, out of 40,
  ◦ if similar tests were administered (e.g., the 2009 or 2010 tests).
One test collects only a small sample of performance.
This variation in scores is called Measurement Error.

How big an error size is acceptable?
The answer is
  ◦ It depends.
An example
  ◦ Effectiveness of a weight loss program
  ◦ Expect a loss of 0.5 kg after one week.
  ◦ Measurement scale is accurate to 1 kg.
  ◦ Not good enough for measuring individual change
  ◦ OK for a group change, if group size is ‘large’.

On the NAPLAN scale…
[Figure: NAPLAN 2008 reading scores. Vertical axis: scale score, 200 to 800. Horizontal axis: grade 3, grade 5, grade 7, grade 9. Series: 2.5th percentile, mean, 97.5th percentile.]

Measuring Growth
[Figure: NAPLAN 2008 reading scores, same axes and series as above.]
Expected growth is 50 points.
Growth measure?
Margin of error of growth measure: ± 76 points

Class mean scores
Average score for a class
  ◦ The effect of measurement error is reduced.
New source of error
  ◦ Sampling error
The cohort of students changes from year to year.
The class mean score varies because of the sample of students in a class.
Class mean ± 20 points
  ◦ (1 year’s growth)

Teacher effect
A high-performing teacher can raise student standards by one more year of growth than a low-performing teacher.
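The error arithmetic behind the weight-loss example and the growth margin on the preceding slides can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the talk: errors are treated as independent, "accurate to 1 kg" is read as a margin of ± 1 kg, the group size of 100 and class size of 25 are hypothetical, and the helper names `margin_of_difference` and `margin_of_mean` are invented for this sketch. Sampling error, which the class-mean slide adds on top, is not modelled.

```python
import math


def margin_of_difference(margin_a: float, margin_b: float) -> float:
    """Margin of error of (a - b), assuming the two errors are independent."""
    return math.sqrt(margin_a ** 2 + margin_b ** 2)


def margin_of_mean(margin_single: float, n: int) -> float:
    """Margin of error of the mean of n independent measurements."""
    return margin_single / math.sqrt(n)


# Weight-loss example from the slides.
SCALE_MARGIN_KG = 1.0    # assumption: "accurate to 1 kg" read as +/- 1 kg
EXPECTED_LOSS_KG = 0.5   # from the slides
GROUP_SIZE = 100         # hypothetical 'large' group

# A change is the difference of two weighings, so both errors contribute.
individual_change_margin = margin_of_difference(SCALE_MARGIN_KG, SCALE_MARGIN_KG)
group_change_margin = margin_of_mean(individual_change_margin, GROUP_SIZE)

# NAPLAN growth figures from the slides.
EXPECTED_GROWTH = 50.0   # points
GROWTH_MARGIN = 76.0     # points, for one student's growth measure
CLASS_SIZE = 25          # hypothetical class size

# If growth is the difference of two equally precise scores, this is the
# single-score margin that the +/- 76 figure would imply.
implied_single_score_margin = GROWTH_MARGIN / math.sqrt(2)

# Measurement error alone, averaged over a class (sampling error excluded).
class_growth_margin = margin_of_mean(GROWTH_MARGIN, CLASS_SIZE)

print(f"individual weight change: +/- {individual_change_margin:.2f} kg")
print(f"group of {GROUP_SIZE} weight change: +/- {group_change_margin:.2f} kg")
print(f"implied single-score margin: +/- {implied_single_score_margin:.1f} points")
print(f"class of {CLASS_SIZE} growth (measurement error only): +/- {class_growth_margin:.1f} points")
```

Under these assumptions the individual margin exceeds the expected 0.5 kg loss while the group margin falls well below it, which mirrors the point of the slides: the same instrument can be too coarse for one person and adequate for a large group.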
[Figure: NAPLAN 2008 reading scores. Vertical axis: scale score, 200 to 800. Horizontal axis: grade 3, grade 5, grade 7, grade 9. Series: 2.5th percentile, mean, 97.5th percentile. Annotations: excellent teacher, average teacher, poor teacher, 50 points. Caption: margin of error of teacher effect based on two testing occasions (the ± value is fused with an axis label in the source and cannot be recovered).]

MySchool Website
It is a league table.
  ◦ It compares and ranks schools.
It is the worst kind of league table
  ◦ because it is claimed that the red bars reflect “underperforming schools”.
  ◦ Simple league tables do not make this claim.

Summary - 1
NAPLAN results are NOT suitable for measuring
Student achievement level
  ◦ beyond rough “lower”, “average”, “higher” groups
Student progress
Teacher effect
School performance

Summary - 2
NAPLAN results are for the systems, e.g.
  ◦ Compare girls and boys
  ◦ Compare rural and urban
  ◦ Trends, if equating design is improved
NAPLAN results should NEVER be published.
Parents/caregivers should not be encouraged to use the results to judge schools.

Finally…
Conflicting advice from different experts?
An easy way to check:
Ask proponents of the MySchool website to publicly name one underperforming school.

References
Wu, M. L. (2010). Measurement, sampling and equating errors in large-scale assessments. Educational Measurement: Issues and Practice, 29(4) (in press).
Nye, B., Konstantopoulos, S., & Hedges, L. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237-257.
Leigh, A. (2009). Estimating teacher effectiveness from two-year changes in students’ test scores. Economics of Education Review.
Byrne, Coventry, Olson, Wadsworth, Samuelsson, Petrill, Willcutt, & Corley (2009). Teacher effects in early literacy development: Evidence from a study of twins. Journal of Educational Psychology.