Improving Education through Large Scale Testing?
April 5th, 2012
Society for the Advancement of Education
Presenter: Salaeya Butt
Based on the study “Improving Education through Large Scale Testing: ?”
Authors: Abbas Rashid, Ayesha Awan, Irfan Muzzaffar & Salaeya Butt

Overview
• Assessments in Pakistan
  ◦ Why assess students?
  ◦ What kinds of assessments are conducted?
  ◦ What does assessment data tell us about student performance?
• Key findings of PEC Exams Study
  ◦ Findings
  ◦ Recommendations
• Way Forward
  ◦ Making decisions on what kinds of assessments are needed
  ◦ How data can be more effectively used

Introduction
• If quality education is about imparting a measure of agreed-upon knowledge and a set of skills to children, how are we to determine that such a process is indeed in place?
• How much do children in our schools really know?
• More than most, it is the teachers who need to know the answers, so that they can concentrate their time and attention where it is most needed.

Why Assess Students?
• To ensure conformity to minimal standards of education
• To improve the examination system, which also has the potential to influence teaching and learning downstream
• To enable critical thinking, analytical and reflective skills in students
• To measure student performance on learning outcomes in order to identify needs as well as design policies and interventions

Examinations in Pakistan
Pakistan Overview: Examinations

Level | Province | Designed and Administered | Framework
Grade 5 & Grade 8 | Punjab only | Punjab Examination Commission (PEC) | Items test curriculum-based student learning outcomes (SLOs) and different difficulty levels in all subjects
Grade 9, Grade 10 (SSC), Grade 12 (HSC) | All provinces | Boards of Secondary & Intermediate Education (BISE), several in each province | Items gauge textbook-based knowledge tested in all subjects

Government Supported Assessments
Pakistan Overview: Assessment Systems

Type | Level | Province | Designed and Administered | Framework
Large Scale | Grades 3, 4, 5; monthly tests | Punjab only | Directorate of Staff Development (in-service professional development institution) | Curriculum-based SLOs in all subjects
Sample Based (provincially representative) | Grade 4 & Grade 8; once every two years | Punjab & Sindh | Provincial Education Assessment Center (PEACE) | Curriculum-based SLOs in selected subjects (language, math, science, social studies)
Sample Based (nationally representative) | Grade 4 & Grade 8; once every two years | All provinces | National Education Assessment System (NEAS) | Curriculum-based SLOs in selected subjects (language, math, science, social studies)

Assessment Surveys, Pakistan

Type | Level | Province | Designed and Administered | Framework
Sample Based | Grade 3 | 122 rural villages in Punjab | World Bank / LEAPS | Curriculum-based items (English, Urdu, Maths)
Sample Based | Grade 1 to 10 | National survey (rural and urban) | Idara Taleem-o-Agahi | Curriculum-based items

What does assessment tell us about performance?
• The average student in public schools in Punjab is performing below acceptable levels of proficiency (PEAS)
• In the 2010 Class V PEC Math exam, one in every five students scored less than 20.
• One single-digit addition & subtraction
• Curriculum Standard 2: Children in Grades 1 & 2 should be able to add and subtract up to 3-digit numbers
• Curriculum Standard 3: In Grades 3 to 5, children should be able to multiply and divide up to 6-digit numbers by 2- and 3-digit numbers. (19%)

PEC EXAMS STUDY (SAHE)
KEY FINDINGS

Good Exam Design?
Reliable assessment tools require compliance with international standards, which are:
• Adequate coverage of content and the taxonomical objectives implicit in SLOs
• Technically equivalent: Comparability of scores across different versions of the exam and from year to year
• Reliable: Test items should behave the same way with different populations of test-takers
• Valid: Test items actually measure what they purport to measure
• Pilot: Items must be piloted to develop empirical data and to conduct subsequent relevant statistical analysis
• Technical review & content review

PEC Findings
• Item developers receive training from PEC, but greater in-house capacity is still needed.
• Government school teachers have little training in the technical aspects of paper setting (content and other types of validity) and marking (reliability).
• PEC is developing documented criteria for acceptable coverage of SLOs this year. However, there is no item bank.
• A panel consisting of content experts at the specific grade level, as well as psychometric experts, is required to review the item writers' work.
• Item writers are required to indicate desired psychometric properties and difficulty levels. But difficulty levels cannot be established by the writer’s judgment alone; a pilot is essential.

Why Focus on Exam Conduct & Marking?
• PEC examinations are conducted on a large scale, with thousands of exam centers and over 2.5 million students across the Punjab.
• To ensure quality in results it is important that:
  ◦ Exams are conducted in an efficient and transparent manner
  ◦ Open-ended exam questions are marked accurately
  ◦ The marking and scoring scheme does not fluctuate beyond an agreed percentage point

PEC Findings
• Conduct
  ◦ Schools designated as exam centres often lack the minimum requisite facilities
  ◦ Existing students are displaced to make room for exam candidates
  ◦ Schools have trouble ensuring an adequate number of supervisory staff
  ◦ More than one paper is held per day
• Paper Marking
  ◦ Lack of adequate subject specialization among examiners
• The provincial government adjusts the pass percentage to prevent a large number of students from failing.

Data Interpretation & Dissemination
• One of the primary purposes of this examination system is to inform the teaching and learning process and improve the delivery of quality education.
• This is only possible when the results of the exam are widely disseminated and shared in a meaningful manner with a variety of stakeholders.

PEC Findings
• Information on student scores is disseminated through:
  ◦ Gazette (aggregate score)
  ◦ Student score cards (detailed marks), which arrive late (2-3 months later) and in some instances not at all
  ◦ Website, which is not fully functional and shuts down
• Dissemination is more efficient in urban districts and less efficient in smaller rural districts.
• There is no widely disseminated district-based or SLO-based analysis to inform teaching.
• The only use of exam results is for admission into the next grade.

Recommendation - Develop Proficiency Ranges
  ◦ Detailed feedback should be provided to all stakeholders about the quality of student performance beyond the pass/fail and grade categories.
  ◦ The scores will be most useful if they provide information about the proficiency of students in relation to the SLOs.
  ◦ Consider developing ranges to provide information about student proficiency levels.

Performance Level | Scaled Score Range
1-Advanced | 929-1056
2-Proficient | 900-928
3-Partially Proficient | 881-899
4-Not Proficient | 773-880

Recommendation - Delineating Role & Ensuring Ownership
• Decision-making, at least with regard to the conduct of the exam, should be devolved to the district level.
• Developing capacity for item development
• An examination as carried out by the PEC should then ideally be a validation exercise.
  ◦ In principle, assessments are best carried out on a continuous basis and by the teacher who, in effect, ensures that learning takes place for it to be meaningfully assessed.
• Given the very large number of exam centers, district and local-level decision-making in this regard should be encouraged.
• The frequency and regularity of testing should be kept in mind, from the point of view of the expenditures as well as the burden these tests impose.

Recommendation - Communication Strategy Examples
• Province & District
  ◦ Develop and disseminate an overall report of findings for province and district performance.
  ◦ Hold dialogue with relevant district officials on uses of data
• Schools & Teachers
  ◦ Report on student performance findings and their relevance for teaching according to key SLOs
• Teacher Education Institutions
  ◦ Feedback relevant to teacher training based on SLOs
  ◦ Session on the current year’s exam data in annual training
• Curriculum & Textbook Authorities
  ◦ Feedback on performance according to relevant SLOs

IMPLICATIONS

Adequately gauging learning achievements?
• There are technical challenges related to determining whether the assessment actually measures the learning outcome.
  ◦ Technically sound tests are required
• Exam boards in Pakistan spend a disproportionate amount of money on invigilation to control malpractice and a fraction of that on improving the quality of examinations or their management.
  ◦ Even if the grades are the same, it is difficult to establish that they actually reflect similar standards of student achievement
• Consequently, there is an underlying decline in standards of student learning and competence.

Appropriate apparatus for policy reform?
• A process of deliberation is needed to ensure that the diverse assessment-related streams are meaningfully brought together to inform policy. It should address the following questions:
  ◦ How can achievement data be used to develop policies and to compare different groups of students with varied characteristics?
  ◦ What combination of formative assessment tests and summative exams is actually required at different levels of schooling?
  ◦ What kind of assessment is needed at different levels of school education?
• At the provincial level, a coordinating arrangement or forum is needed to ensure that assessment needs have been met.

Issues to be addressed?

Conclusion
• When assessment drives instruction instead of serving learning, the central objective of the education system is defeated.
• Examinations have multiple audiences and multiple purposes, but they must essentially be seen as serving curriculum and instruction, not driving them.
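
As a rough illustration of the proficiency ranges recommended earlier, the sketch below maps a scaled score to one of the four performance levels and counts how many students fall into each. The cut scores are copied from the example table in the slides. The function names and the sample scores are hypothetical, invented for this example, and nothing here represents an actual PEC reporting procedure.

```python
from collections import Counter

# Cut scores copied from the example table in the slides.
# They are illustrative only and are not official PEC values.
PROFICIENCY_RANGES = [
    (929, 1056, "1-Advanced"),
    (900, 928, "2-Proficient"),
    (881, 899, "3-Partially Proficient"),
    (773, 880, "4-Not Proficient"),
]


def performance_level(scaled_score):
    """Return the performance level whose range contains the scaled score."""
    for low, high, label in PROFICIENCY_RANGES:
        if low <= scaled_score <= high:
            return label
    raise ValueError(f"scaled score {scaled_score} is outside the 773-1056 scale")


def level_counts(scaled_scores):
    """Count how many students fall into each performance level."""
    return Counter(performance_level(score) for score in scaled_scores)


# Hypothetical scaled scores for one school, invented for this example.
sample_scores = [1001, 915, 890, 880, 800, 929]
summary = level_counts(sample_scores)
for _, _, label in PROFICIENCY_RANGES:
    print(label, summary[label])
```

A report built this way tells a school how many students sit at each level rather than only how many passed, which is the kind of feedback beyond pass/fail that the recommendation calls for.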