RAMA
The National Authority for Measurement & Evaluation in Education
State of Israel, Ministry of Education

Assessment in the Service of Learning: Theory and Practice

Professor Michal Beller
Director-General, RAMA
March 2013

Bet Avgad, 5 Jabotinsky Road, Ramat Gan, 2nd Floor, 5252006, ISRAEL
Tel: +972-3-5205555 ▪ Fax: +972-3-5205509 ▪ e-mail: rama@education.gov.il ▪ http://rama.education.gov.il

Contents

Introduction
Large-Scale Tests in Educational Systems and their Frequency
Updating the Measurement and Evaluation Format: Integrating External and Internal Assessment
School-Based External Assessment
    The Meitzav
    What Can be Learned from the Meitzav?
    Trends over Time and Comparison of Assessment Scores
Sample-Based National Assessment
    International Studies
    National, Sample-Based Assessment – Mashov Artzi
    National Sample-Based Monitoring of School Violence Level
School-Based Internal Assessment
    Internal Meitzav
    Off-the-Shelf Tests, Formative Tests and Banks of Performance Tasks
The School-Based Assessment Coordinator
Evaluation of Teaching Staff and Investigation of Teaching Practice
    Evaluation of teaching staff
    International Teacher Survey - TALIS
The Use of Test and Survey Data for Research and for Program and Project Evaluation
Assessment in the Service of Learning – Summary
Bibliography

Introduction

A review of the goals of education ministers in Israel over the years reveals three metagoals of the education system: realization of the potential of every student (scholastic, creative and development of values), narrowing education gaps and maintaining a safe learning environment. Assuming wide-spread agreement with regard to these goals, the question arises as to how we know whether these goals have been achieved.
How are parents to know whether the education system has provided their children with the tools necessary to function successfully as active citizens in society? How is each of the partners in the educational process (teachers, principals, and position holders at various levels of the education system) to know whether they have satisfactorily fulfilled their role and whether the needs of students from different backgrounds have been appropriately addressed? How are we to detect educational gaps and how are we to know whether they have been narrowed? How is the public to know that the future generation of Israeli children has been adequately prepared to face the challenges of the 21st century? How can the public be sure that the extensive resources made available to the education system are being used judiciously and have the planned effect? How can the benefit of increasing the State’s investment in education, even at the expense of other important needs, be proven to policymakers?

To examine the extent to which these goals have been achieved, professionally designed, valid measurement and evaluation tools are needed. Measurement and assessment are complex issues in all sectors: in the business sector, and even more so in the public sector. In the education sphere, the implementation of measurement and assessment is that much more difficult: learning and teaching processes are inordinately complex, diversity among students is immense, there are different pedagogical approaches to achieving educational goals, many programs are implemented in the education system, and more often than not the outcomes and results are realized only after years of investment.

In education there is no single, "one size fits all" answer suited for all needs, nor is there a single formula for implementing measurement and assessment processes. Different pedagogical approaches and programs require, accordingly, different measurement and assessment models.
Therefore, the optimal education process must be accompanied by measurement and assessment processes whose results guide educators and assist them in deciding what is more and what is less suited for their students, what they should change and improve and what is better maintained as is.

The National Authority for Measurement and Evaluation in Education (known by its Hebrew acronym, RAMA) was founded in 2005 to address the need for professional measurement, evaluation and assessment in the education system. The ideology underlying RAMA’s activities rests on two principles: (a) assessment in the service of learning, and (b) provision of professional solutions that effectively integrate different measurement and evaluation components (for additional information about RAMA, see http://rama.education.gov.il).

Large-Scale Tests in Educational Systems and their Frequency

Over the last decades national tests have been administered to large numbers of students in educational institutions in various countries throughout the world, including Israel. The significance and importance of these tests is growing, leaving its mark on all parties in the learning and educational process.

Large-scale assessments and professional surveys are vital instruments for monitoring and tracking student achievements as well as the extent to which the education system has been successful in imparting knowledge and values to all learners within the system. Through objective and professional analysis of assessment tests it is possible to identify gaps which need to be rectified and to highlight areas that may have been overlooked and in which even greater resources should be invested. Assessment tests may also spur learning, foster responsibility and accountability on the part of those in charge of teaching, and enhance the congruence between teaching and the Education Ministry policy as reflected in its curricula and in frameworks for teacher training and professional development.
Alongside the many benefits inherent in the use of these test systems, it has been acknowledged that over time they may be accompanied by negative effects on the education system and on the quality of pedagogical processes in schools. These negative effects intensify as the tests become more central and important in the eyes of those at all levels of the system, and particularly as they are perceived as "high-stakes" [1] in the eyes of principals, teachers and students.

[1] A test system is “high stakes” when any of the entities in the system feels threatened by the results and perceives himself or herself as someone who may be hurt if results are low or benefit when results are high. The perceived stakes rise when school supervision is seen as threatening schools ahead of the tests, and when it tends to use test results to reprimand schools or to impose sanctions on them.

Professor Don Campbell (Campbell, 1979), one of the greatest scholars in the social sciences, wrote about this tendency and its ramifications:

“The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor… achievement tests may well be valuable indicators of general school achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.”

The negative effects have been documented for quite some time in the research literature (for example, Campbell, 1979; Hamilton, 2008; Koretz, 2005; Koretz & Hamilton, 2006; Nichols & Berliner, 2007) and reported by RAMA in other publications. Among the adverse impacts of improper implementation of wide-scale tests are:

• Diverting teaching resources from subjects that are not included in the national assessments in favor of subjects that are included.
• Focusing on test preparation, through intensive test-oriented study. This type of study is often based on memorization and repetition and involves fewer of the higher-order thinking skills that are critical for comprehension, for long-term mastery of the study material and for generalization to additional fields of knowledge. Furthermore, this type of study may bore students and erode their joy of learning, curiosity and motivation.
• In extreme cases, as a result of the pressure felt by some schools and their desire to raise their achievements at any cost, some may resort to illegitimate actions that harm test integrity (for example, keeping weak students away from school on the test day, attempting to obtain in advance information related to test topics and questions in order to teach them in class, helping students during the test, etc.). Even worse, these actions send an undesirable educational message to students.

Besides damaging the quality of the pedagogical processes, tests perceived as "high-stakes" may also impair the validity of test results and the ability to draw conclusions that will serve the system and promote its improvement. Thus, improved test results achieved through intensive preparation in the sample schools tested in a given year do not necessarily indicate improvement in the education system as a whole, as they do not represent an increase in the knowledge level of all students. This improved achievement, even if it appears to enhance the public image of the education system or parts of it, is in many ways only cosmetic and worthless to policymakers and decision makers who strive to create real and sustainable change in the system as a whole over time. Only test results collected under “true conditions”, and without special preparation, can testify to the state of the system.
This is the only way decision makers at different hierarchy levels can become cognizant of strengths and weaknesses and take action to improve the system.

The education system, in cooperation with RAMA, must act to minimize these negative phenomena, primarily through a cultural change which upholds “measurement in the service of learning”, whereby measurement and assessment are intended to serve learning and not vice versa. For the system to improve, it must ensure that its measurement and assessment tools provide data that are as valid as possible, i.e., that their results correctly and accurately reflect the condition of the system and do not stem from unique and targeted efforts designed solely to raise test grades. Steps should be taken to eliminate negative phenomena by sending the correct message to the field and by reducing the pressure and the threat that external test results will serve as the sole evidence of the quality of school pedagogical processes.

Updating the Measurement and Evaluation Format: Integrating External and Internal Assessment

The perpetual dilemma facing the education system is the choice between independent, internal measurement carried out by the school (partially free of the pressures described above, and more suited to students and the material studied in each educational institution) and external measurement which is standardized, professional and centralized. In other words, there is a constant tension between decentralized and centralized measurement and assessment. Some maintain that independent internal assessment is less intrusive than external assessment and empowers school principals and the teaching staff.
However, considerations of responsibility, accountability, transparency, professionalism, viability, and mainly the ability to make valid comparisons between schools (or between sectors, countries or other groups), including multi-year comparisons, require that part of the assessment be centralized, external and carried out by a professional entity responsible for educational measurement and assessment.

In order to integrate the two approaches, which separately cannot address all needs, and to enjoy the benefits of both, the format of Israel’s national assessment was updated in 2007. The new format was designed by RAMA in collaboration with various entities in the Ministry of Education and in consultation with school principals and many teachers. The format is intended to provide a professional solution for educational measurement and assessment to all stakeholders in the education system: schools and various external entities.

This new format was designed to improve the then existing assessment system from which it was derived and to address its shortcomings, and was based on the following principles:

• Implementation of a culture of “assessment in the service of learning” in which measurement is intended to support continued learning improvement through congruency with learning goals and school vision, and which is based on the understanding that tests are not a goal in and of themselves, but rather a tool in the service of learning.
• Informed integration of internal and external assessment, and of formative assessment (assessment in the process of and for the purpose of learning) and summative assessment (assessment of learning products).
• Maximum decentralization of assessment while ensuring use of professional tools provided to schools by RAMA.
• Empowerment of school principals and teachers.
• Reduced pressure and frequency of external tests.
• Preference for external tests administered to a representative sample of students over external tests encompassing all students.
The new format established by RAMA combines three elements: sample external assessment (international and national), independent school-based assessment using standardized and external tools, and internal school-based assessment. In order to maintain a balance between the various elements, significant reinforcement of internal school-based assessment is required, and to this end more extensive teacher training in this field is needed. The internal Meitzav (detailed below), established concurrently with the reduced frequency of external Meitzav tests, is an important element of the new format. It serves teachers and management and its results are not reported.

School-Based External Assessment

The Meitzav

The Meitzav (Hebrew acronym for “School Growth and Efficiency Measures”) is a system that includes student achievement tests as well as questionnaires designed to glean information about the school climate and pedagogical environment (administered to principals, teachers and students).

The purpose of the Meitzav at the school level is to provide school principals and teaching staff with a tool for planning and utilizing resources for realizing student potential, improving the pedagogical climate and enhancing instruction in school. At the system level, the Meitzav is intended to provide a snapshot of the mastery level of Israel’s students in the curricula of four core subjects, and to serve professional entities and decision makers in the Education Ministry in setting policy on various educational issues, including climate and pedagogical environment.

The Meitzav achievement tests focus on four core subjects: Native Language (Hebrew/Arabic), Mathematics, English, and Science and Technology. Tests are administered to students at two grade levels, five and eight, and the native language test (Hebrew/Arabic) is administered in grade two as well.
The achievement tests are designed in full congruence with the curricula in each of the subjects and are intended to examine the extent to which elementary and junior-high school students meet the expected level required of them according to these curricula. Examples of these tests can be found on the RAMA website in the “School Assessment” tab on the topic of “Off the Shelf Assessments”. Each school belongs to one of four "Meitzav Clusters" – four equal and representative groups of elementary and junior-high schools in Israel (Clusters A, B, C, and D). Each cluster of schools is selected such that it will represent all schools nationwide. Meitzav tests, in the format set by RAMA when it was established, are administered in each of the four core subjects in a four-year cycle. Schools are tested once every two years in external tests (external Meitzav) in only two subjects: Mathematics and Native Language (Hebrew/Arabic) or English and Science and Technology, and two years later are tested in the two other subjects. In a year in which the school is not tested on an external test in a given subject it is tested on it through an internal test (internal Meitzav) which is the same test administered that year on the external Meitzav (for details see the chapter on Internal Meitzav). Thus each school is tested in an external Meitzav test in each subject (in the relevant grades) once every four years and in the internal Meitzav test in the same subject in the following three years. For further information see the RAMA website, “Meitzav” tab on the subject “General Background – External and Internal Meitzav”. 
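The rotation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say which cluster is tested externally in which year, so the offsets below (cluster A starting in year 0, B in year 1, and so on) and the name `meitzav_mode` are hypothetical. What the sketch does reproduce is the rule from the text: each school has an external test in one subject pair every two years, alternating pairs, and an internal test in every other case.

```python
# Hypothetical sketch of the Meitzav rotation; cluster-to-year offsets are assumed.
CLUSTERS = ("A", "B", "C", "D")
SUBJECT_PAIRS = (
    ("Mathematics", "Native Language"),
    ("English", "Science and Technology"),
)


def meitzav_mode(cluster: str, subject: str, year_index: int) -> str:
    """Return 'External' or 'Internal' for a cluster, subject and year.

    year_index counts school years from an arbitrary starting year (0, 1, 2, ...).
    """
    if not any(subject in pair for pair in SUBJECT_PAIRS):
        raise ValueError(f"unknown subject: {subject}")
    # Position of this cluster within its own four-year cycle (assumed offsets).
    phase = (year_index - CLUSTERS.index(cluster)) % 4
    if phase == 0:
        external = SUBJECT_PAIRS[0]   # first pair tested externally
    elif phase == 2:
        external = SUBJECT_PAIRS[1]   # two years later: the other pair
    else:
        external = ()                 # no external test this year
    return "External" if subject in external else "Internal"


if __name__ == "__main__":
    # Print an eight-year schedule for cluster A.
    for pair in SUBJECT_PAIRS:
        for subject in pair:
            row = [meitzav_mode("A", subject, y) for y in range(8)]
            print(f"{subject:<24}", " ".join(r[:3] for r in row))
```

Under these assumed offsets, every subject is tested externally in exactly one cluster each year, which is consistent with national norms being available annually for schools that administer the same test internally.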
Table Number 1 – Cycles of Meitzav clusters by subject and year
(Rows: school clusters A, B, C and D, each listed by knowledge field: Science & Tech, English, Math, Native Language. Columns: school years 2006/7 through 2013/14, with Cycle 1 and Cycle 2 marked. Each cell records whether that year's test is External or Internal.)

Internal tests are accompanied by pedagogical material for teachers. The school can use internal Meitzav grades as it sees fit for improvement and internal learning and as part of the annual assessment provided to students.
The process of teachers' grading internal Meitzav tests enhances their professional development, as it exposes them to professional indicators/rubrics that define expectations from students and enables them to learn from students' answers about their knowledge and comprehension levels. Internal Meitzav grades serve, as noted, only the school staff, and the school is not required to report them to an external entity (see expanded discussion on internal Meitzav below).

National norms, derived from the results of the external Meitzav tests, are reported to the schools that administer internal Meitzav tests. These norms help the school principal and the teaching staff interpret data obtained from the same tests administered internally that year.

The school climate and pedagogical environment surveys in the Meitzav are designed to provide a detailed picture of the school climate and pedagogical processes as revealed through student questionnaires and interviews with teachers. The questionnaires provide comprehensive and relevant information about important dimensions in this area, including: student motivation level; the relationship between teachers and students; violent events and students’ sense of safety and protection; team work among teachers, and more. These dimensions are based on insights gathered from several sources: focus groups of teachers and principals, discussions with Education Ministry officials, consultation with academic scholars and reviews of the current literature.

The questionnaires are administered to fifth through ninth graders and to elementary and junior-high school teachers. In the 2008/9 school year a pedagogic and school climate questionnaire was administered for the first time to high school students. A school survey for high school teachers is in advanced development stages. For more information see the RAMA website, “Climate and pedagogical setting surveys” tab.
At the end of the external Meitzav administering process, RAMA produces a comprehensive school Meitzav report (with respect to school achievements and climate), as well as a detailed national Meitzav report.

What Can be Learned from the Meitzav?

The importance of using the Meitzav as a working tool stems from the need to obtain an updated, diagnostic picture of the level of implementation and fulfillment of various system goals (at the student, class, school, and overall education system level) in order to realize the potential for continued improvement of schools and the education system.

At the school level: the process in which insights are gleaned from detailed school reports enables the school staff to examine itself and to view the school as a holistic system from various aspects: achievements, climate and pedagogical setting. The findings presented in the reports enable the school staff to identify strengths and weaknesses in the subjects tested, to identify topics or abilities that were not stressed, to propose hypotheses with respect to the findings, to learn what additional data should be collected to confirm or reject hypotheses, to examine the reasons for difficulties found (for example, why students have difficulty in writing) and to design long-term programs that will address these difficulties. Effective use of Meitzav findings can help schools design mechanisms to improve school-based processes and plan long-term steps to sustain improvement over time.

At the system level: the external Meitzav tests provide data about the education system and schools across different cross-sections that, similar to school-level Meitzav tests, help identify difficulties and gaps that need to be addressed in order to improve the system. Thus, for example, achievement data is compared by socio-economic level and across different sub-groups (for example by sector or gender).
Comparisons of this kind provide decision makers with information about gaps in the education system that require intervention. Since 2008, valid comparisons of Meitzav grades across years (see below) have been possible, improving the ability to monitor system quality over time. Moreover, Meitzav data can serve to evaluate the effectiveness of national educational programs implemented in the education system. The annual reports are published on the RAMA website, “RAMA Publications” tab on the topic of “System and Local Government Meitzav Data”.

Trends over Time and Comparison of Assessment Scores

Two of the more common though not necessarily informed uses of national external test grades are the comparison of achievements over time and by grade level (“league tables”). However, not every comparison is valid or valuable; for example, does a grade of 85 on an easy test in a particular year indicate an improvement compared to a grade of 75 on a more difficult test the previous year? Not necessarily. Unprofessional and irresponsible rankings of schools based on such comparisons may therefore cause serious damage. RAMA is working with all partners to the educational process and the general public to instill the understanding that rankings have significance, if at all, only among relevant populations, and that comparisons over years have value only when grade scales are calibrated.

Calibration is designed to neutralize the effect of differing levels of tests administered in different years in the same subjects, allowing for valid comparisons of test achievements. The need to calibrate scores from year to year stems from the fact that test formulations differ every year with respect to the difficulty level of test questions. The grades on the raw grade scale for each year are the total points accumulated by examinees for their answers on a given test formulation.
It is impossible to determine, for example, whether a rise in the raw score average results from an easier test formulation that particular year, from a rise in achievement among students, or from both. In order to solve this problem and to examine Meitzav grade trends over time, in 2008 RAMA established a statistical calibration scheme of test grades that translates test grades for each year to a new comparative scale: the multi-year Meitzav scale.

The new calibrated scale is designed to allow valid comparisons of Meitzav test scores over time. The calibration scheme takes into account differences in test difficulty levels in different years, "positioning” grades on a new measurement scale and allowing for multi-year comparisons. The multi-year Meitzav scale was designed so that in the base year, 2008, the average grade was 500 and the standard deviation was 100. The calibration procedure and implementation of a multi-year scale is standard practice in various national testing systems (for example in the Israeli psychometric exam and the NAEP in the U.S.A.) and in tests administered as part of international studies. However, it should be stressed that grades on the new Meitzav scale cannot be translated into grades on other test systems; they only allow for multi-year comparison of Meitzav achievements within each subject and grade level.

A comparison of Meitzav results for the years 2008 and 2012 points to a general trend of improvement in the core subjects: in Native Language, Science and Technology, and English in grades five and eight, and in Mathematics in grade five, in both the Hebrew-speaking and Arabic-speaking sectors. For grade five, a comparison with the 2007 test results was also possible.

In grade five, a moderate to large cumulative increase was recorded in the four subjects over the six years 2007-2012.
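To make the multi-year scale concrete, the sketch below shows only the final rescaling step implied by "average 500, standard deviation 100 in the base year": scaled = 500 + 100 × (x − μ) / σ, where μ and σ are the base-year mean and standard deviation. It assumes that scores from later years have already been placed on the base-year metric by the calibration procedure, which the text does not detail. The function names and the sample scores are hypothetical, not RAMA data.

```python
from statistics import mean, pstdev

BASE_MEAN = 500.0   # average grade fixed for the base year (2008)
BASE_SD = 100.0     # standard deviation fixed for the base year


def make_scale(base_year_scores):
    """Build a function mapping calibrated scores onto a 500/100 scale.

    base_year_scores: scores of the base-year population (hypothetical input).
    """
    mu = mean(base_year_scores)
    sigma = pstdev(base_year_scores)
    if sigma == 0:
        raise ValueError("base-year scores must vary")

    def to_scale(score):
        # Linear transformation: base-year mean -> 500, one base-year SD -> 100 points.
        return BASE_MEAN + BASE_SD * (score - mu) / sigma

    return to_scale


def gain_in_sd_units(points):
    """Express a gain in scale points as a fraction of a base-year SD."""
    return points / BASE_SD


if __name__ == "__main__":
    base_2008 = [52, 61, 70, 79, 88]           # made-up base-year scores
    to_scale = make_scale(base_2008)
    print([round(to_scale(x)) for x in base_2008])
    print(gain_in_sd_units(50))                # a 50-point gain in SD units
```

Read this way, a gain of a given number of points on the multi-year scale is that many hundredths of a base-year standard deviation, which is one way to judge whether a reported increase is slight, moderate or large.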
The grade-five increase varies across knowledge areas and language sectors, ranging from 30 to 60 points on the multi-year scale.

In grade eight, a cumulative increase was recorded in three subjects over the five years 2008-2012: a slight increase in English (about 10 points) and a large increase in Native Language and in Science and Technology (about 50 points). Student achievements on the 2012 Mathematics test are similar to the achievements recorded in 2008.

The upward trend in achievements over the years is also reflected separately for each of the three socio-economic groups (low, medium and high socio-economic background) for each of the subjects. For most subjects, no changes (narrowing or widening) were recorded over time in the achievement gaps favoring students from a higher socio-economic background. The exceptions are the Hebrew and English tests for grade five: in 2012 the achievement gap on these tests among students in Hebrew-speaking schools narrowed.

The Meitzav also includes an examination of school climate and pedagogical environment. In this area the findings indicate stability over time with several positive trends (including an increase in students' reported sense of safety and protection, an increase in reported appropriate behavior among students in class, a slight decline in reported violence in elementary school and an increase in reported use of computer-mediated communication for learning purposes).

Sample-Based National Assessment

International Studies

Among its roles, RAMA is responsible for conducting the international studies in Israel. These studies make possible the comparison of student achievements across many countries in several subjects as well as the study of other educational issues. Furthermore, results of these studies enable comparison between different sectors and different population groups within each participating country.
The tests are administered in a fixed cycle once every few years, and allow for the study of trends over time (calibration is structured into these tests as well).

The international organizations that develop the tests are among the leaders in the field of evaluation and measurement. The tests, translated into different languages, are meticulously designed and have high levels of reliability and validity. Each of the tests and questionnaires is designed according to a detailed and rigorous theoretical framework, drafted by experts from around the world in the tested subject and in pedagogy. In each country the tests and questionnaires are administered to a representative sample of the population, and scores are not reported at the class or school level, only at the country level.

Israel has participated in a series of international studies in recent years, including: PISA (Reading Literacy, Mathematics and Science for age 15+), TIMSS (Mathematics and Science for grade eight) and PIRLS (Reading Literacy for grade four). These studies provide reliable information about the Israeli education system from an international perspective. This information may be of great importance to policymakers who strive to obtain an accurate picture of the weaknesses and strengths of the education system in Israel through objective comparison with other education systems in the world.

Furthermore, participation in international studies enables Israel to learn about new and contemporary approaches in the subjects tested and to examine its curricula in relation to curricula in other countries in the world. Thus, for example, the need to strengthen literacy in Science, Mathematics and Language was identified through the PISA study.
The studies also enable participants to learn from models of successful education systems in different countries, through comparisons of high and low achieving countries, of countries that differ in education gaps between and within schools, and through an examination of the relationship between student achievement and different background variables (such as parents’ education, socio-economic status, student attitudes towards the subject, etc.). For example, this is how the world learned about Finland's successful education system, which recorded very high achievements on the PISA study. The success of the Finnish education system has been attributed mainly to its social and cultural norms accompanied by a reform based on the empowerment of school teachers and principals.

Media and policymakers often tend to emphasize the rank achieved by each country on the country ranking, comparing it to the position achieved on previous tests, and presenting the ranking as the main finding of international studies. However, the main value of these tests lies in the opportunity they offer to conduct comparisons and examinations within each country separately, addressing objectives reached in each subject and the relationship between achievement and background variables and attitudes related to the various study topics. In Israel, for example, study results allow for comparison between student achievements in the Arabic-speaking sector and the Hebrew-speaking sector, and also for comparison between other sub-groups (boys and girls, for example). Above all, through these studies the achievements of the State of Israel can be examined over time, owing to the cyclical pattern of these tests, which are conducted in a similar format once every three to five years while maintaining calibration of the grade scale from one cycle to the next.
The PISA 2009 results published in December 2010 showed a significant improvement in student achievements in Israel in Reading Literacy (and stability in students’ achievements in Mathematics and Science Literacy). The results of PISA 2012 are expected to be published in December 2013. The results recently published for PIRLS 2011 and TIMSS 2011 indicated significant improvement in the achievements of students in Israel. We are currently preparing for participation in the PISA 2015 study, which will be computerized in its entirety. Further details about each of these tests can be found on the specific websites of each of the international studies and on the RAMA website, under the “International Tests” tab.

National, Sample-Based Assessment – Mashov Artzi

RAMA is currently launching a sample-based, national assessment, known as the Mashov Artzi, that will focus on a different subject each time, and will provide system-level information about educational achievements in the education system in Israel and also information about the specific pedagogical context. The Mashov Artzi will enable policymakers to examine various subjects, over and above those tested on the Meitzav. Similar national systems are implemented in several countries, among them the National Assessment of Educational Progress (NAEP), which has been administered for decades in the USA and is the best known and most advanced.

For each subject that will be tested on the Mashov Artzi, a representative sample of schools will be selected to participate. The Mashov Artzi for each subject will include achievement tests and questionnaires intended for students, teachers and school principals. The Mashov Artzi will provide information with regard to learning outcomes related to a wide range of content, skills and thinking strategies relevant to each subject, and will be based on a relatively small sample of students.
The multiplicity of test formats in each subject will allow for a wide and in-depth coverage of the subject and will provide reliable information with regard to the sub-topics included in it. The questionnaires will allow for the collection of information with regard to various variables for describing and characterizing the context in which the subject is taught (e.g., prevalent teaching practices, professional development of subject teachers, school policy with regard to the subject, etc.). This information will then contribute to interpreting and explaining the learning outcomes. The Mashov Artzi will enable policymakers to learn about the success of different instructional and educational methods by examining gaps found between sectors, and to examine trends over time.

In light of the study goals, the findings on the Mashov Artzi will be analyzed and reported at the national system level only. Findings at the student, class or school level will not be analyzed or reported. The Mashov Artzi will be conducted cyclically once every few years, and each knowledge field will be tested on its own test cycle, in order to track indicative trends over time.

The national Mashov will be administered for the first time in the 2014 school year, and the first knowledge field to be examined will be Geography. The Mashov Artzi will be administered in ninth grade, towards the end of mandatory studies in this subject. Study tools (achievement tests and questionnaires) will be administered in a computerized setting. Computerized tests in Geography will assess, among other things, geographical skills in using technological tools (such as interactive maps). In this way there will be an attempt to collect information about certain skills that are included among those required of a citizen of the 21st Century.
Further details about the national Mashov in Geography, including the framework document, can be found on the RAMA website, “National Tests” tab, on the subject “Sample-Based National Assessment/Mashov”. Additional knowledge fields in which the national Mashov will be conducted have yet to be determined.

National Sample-Based Monitoring of School Violence Level

School violence is one of the central issues on the public agenda with respect to the education system in Israel. In light of the extensive discourse on this topic, and in order to identify trends related to violence in the education system, a need has arisen to monitor the level of school violence. In 2009 and 2011 RAMA conducted a large-scale survey for monitoring violence among a national, representative sample of students in grades four through eleven in both the Hebrew-speaking sector and the Arabic-speaking sector. The aim is to continue to monitor levels of violence using similar questionnaires to be administered once every two years among a representative sample of students. The third monitoring survey is being conducted in 2013.

The questionnaires administered to students examined a series of violent and dangerous behaviors among students, which have been summarized in the following indices: severe violence, moderate violence, social violence, violence using digital media, verbal violence, violent gangs and bullying, sexual violence, alcohol and drug abuse, violence by and towards the school staff, cold weapons in school, violence on school buses to and from school, absenteeism due to fear of injury, students' feelings of safety and protection in school, and school efforts to prevent violence.
Questionnaire development was the result of the work of a steering committee comprised of personnel from RAMA, entities from the Education Ministry that deal with school violence (the Psychological and Counseling Services Division - SHEFI, the Youth and Society Department and the two age divisions – elementary and secondary) and experts from academia.

For the majority of indices a trend of improvement was evident between 2009 and 2011, at the different age levels and in the two language sectors. Stability was recorded on the remaining indices for these years. Improvement (in other words, a decline in the rate of student reporting) was especially evident for the following indices: severe violence, social violence, violence by and towards school staff, sexual violence, violence on school buses and alcohol abuse. Improvement characterizes both language sectors, and is especially evident in Arabic-speaking schools, and also particularly in 4th-6th and 7th-8th grades. In reports of 10th-11th grade students in Hebrew-speaking schools, stability was recorded for the most part during these years. In the reports of their counterparts in these grades in Arabic-speaking schools, a trend of improvement was recorded for the most part.

In each of the years 2009 and 2011: The older the students, the fewer reports of most types of violence – except for violence towards school staff, bringing cold weapons to school, and alcohol and drug abuse.

In each of the years 2009 and 2011: Among students in Arabic-speaking schools the reporting rate of most of the negative behaviors examined by the questionnaire was higher in comparison to Hebrew-speaking schools. This holds true except for verbal violence and alcohol consumption, in which the gap is in the opposite direction.

Further information can be found on the RAMA website in the “Research/Studies and Project Assessments” tab, under the topic “Violence Monitoring”.
School-Based Internal Assessment

Internal assessment is carried out continuously, and by definition is performed by and under the initiative of the school staff. The main goal of school-based internal assessment is to promote student learning. Information collected from the internal assessment helps teachers and school management identify students’ strengths and weaknesses with respect to expected achievements, and guides them regarding the needs of the students and the adaptation of instruction to these needs.

School-based internal assessment is based on an approach of assessment for learning (Halel – Hebrew acronym), which is intended to provide an answer to two key questions: Where is the student on the way to achieving learning objectives? What steps are required to promote learning and realize its objectives? The assessment process is based on gathering information and evidence from a variety of sources and through a range of tools (tests, performance tasks, assessment assignments), together with an integrative interpretation of the evidence. Interpretation and conclusions then constitute a foundation on which to design appropriate intervention aimed at achieving learning objectives.

RAMA contributes to the reinforcement of school-based internal assessment by providing professional assessment tools which also include pedagogical materials (for example, detailed indicators/rubrics, explanations about mapping questions and definition of tested abilities, examples of common mistakes and suggestions for further instruction activities, etc.). Various tools available to teachers for school-based internal assessment are as follows:

Internal Meitzav

RAMA provides schools with the external Meitzav tests developed professionally for use in the context of school-based internal assessment. The internal Meitzav tests are intended to be included as an integral part of a school-based internal assessment routine and to complement the other internal assessment tools.
In the Bulletin of the Director-General, it has been emphasized that the purpose of the internal Meitzav grades is to serve the school staff exclusively, and therefore there is no requirement to report results to any external entity. For further information see the RAMA website, “School Assessment” tab, on the subject of “Internal Meitzav”.

The internal Meitzav test is based on the following principles:
An objective, external, national test, with psychometric qualities of reliability and validity, developed by RAMA in collaboration with professional committees.
The test reflects the curriculum and requirements expected from students in each core subject and at given grade levels, in terms of knowledge and skills.
Internal examination scored by school staff (with the help of indicators/rubrics and scoring tools), so that individual and group assessments of students’ proficiency in every subject can be produced quickly.
Enables comparison of student achievements to external norms (national, district, sector) gleaned from the external Meitzav test.

The benefits for schools from the internal Meitzav test include the following:
Enhanced school-based internal assessment processes. Schools can gain a snapshot of their condition that combines information based on external assessment sources and can be adapted to the school context.
Data-based decision making. School administration and teaching staff can gain insights from the test grading process and the results that will assist them in focusing on appropriate educational and learning goals in alignment with the school vision.
Adapted to school needs. The school can use internal Meitzav grades as it sees fit, as part of annual student assessment.
Reduction of the negative phenomena that often accompany the external Meitzav, for example diversion of resources (study time and teaching cadre) at the expense of other subjects, distancing weak students from school, and reduced motivation of some students to take a test that “does not count” towards the individual grade.

Off-the-Shelf Tests, Formative Tests and Banks of Performance Tasks

RAMA provides a wide variety of class-based internal assessment tools: previous versions of achievement tests, formative exams and tests, specifications and test rubrics, and banks of performance tasks in different subjects. Information gathered from these tools can serve as the basis for developing intervention programs adapted to student needs.

The performance tasks designed and provided to schools by RAMA are intended to assess learning processes and products as well as complex learning abilities. These include: a system perspective, problem solving, taking a position, critical thinking, drawing conclusions, planning and identifying connections.

Following are additional tools available to schools for internal assessment:

Hebrew reading and writing test for grade one: The purpose is to track the proficiency of students in Hebrew reading and writing skills. Test results serve teachers in designing appropriate intervention. The test is comprised of a series of tasks which are administered individually (teacher-student) during the school year.

Arabic reading and writing test for grade one: The purpose is to track the proficiency of students in Arabic reading and writing skills. Test results serve teachers in designing appropriate intervention. The test is comprised of a series of tasks which are administered individually (teacher-student) during the school year.
Kit for assessing beginning reading in English for grade five: The purpose is to identify students with difficulty in beginning reading in English and to identify particular difficulties that may be obstacles to reading acquisition. Test results serve teachers in designing appropriate intervention. The test is comprised of two components: one is a screening test administered by the teacher to all students, the other is a diagnostic test administered individually.

Amit test in native language for grade seven (Hebrew and Arabic): The purpose is to assess the proficiency of students in reading comprehension and writing (Hebrew/Arabic as first language). Test results serve teachers in designing appropriate intervention. The test is comprised of three parts. Each part includes one reading text and test items relevant to reading comprehension, grammar and writing. The texts cover different genres, and questions relate to skills such as locating information with a low, medium and high access level, linguistic structures, and vocabulary. All students participate in the first part, and then proceed to a second part which is suited to their level of performance (as indicated by results on the first part).

Kit for assessing spoken language in Hebrew in grade eight: The purpose is to assess the proficiency of students in speaking Hebrew. The kit was originally developed as part of the internal Meitzav for native speakers of Hebrew in grade eight. The kit covers three aspects of spoken language: reading aloud, reporting and group discussion. It includes assessment tasks, rubrics for assessing student performance and a teacher's guide.

Kit for assessing spoken language in English for junior high school: The purpose is to assess the proficiency of students in speaking English. The kit includes five units that relate to various aspects of oral social interaction and presentation.
Each of the units is comprised of assessment tasks, rubrics for teachers and students, suggestions for teaching activities that promote development of spoken language, and a teacher's guide.

Kit for assessing Hebrew among immigrant students entering grades three through nine: The purpose is to examine when and how immigrant students can be successfully integrated into homeroom classes, among their Hebrew native speaking contemporaries, and participate in classes in the various subjects, while receiving assistance. The kit includes five sections: discourse, reading aloud, listening comprehension, reading comprehension and writing. The kit is accompanied by a teacher's guide including very detailed guidelines regarding the structure, goals, administration dates, administration mode, duration and scoring.

Bank of performance tasks in "Culture and Heritage of Israel" for grades six through eight: The purpose is to assess achievement in this school subject. Student performance on these tasks serves as the basis for planning and improving teaching processes and is a basis on which the teacher can provide effective feedback that promotes learning. The kit includes 12 tasks that reflect four main themes: Jewish literature; Jewish calendar and life cycle; the affinity of the Jewish people to the Land of Israel; the image of the State of Israel as the State of the Jewish people. There are four tasks for each grade level, each deriving from one of the four themes. The kit is accompanied by a teacher's guide including very detailed guidelines as to the administration and scoring of the tasks, as well as recommendations for instruction.

Use of off-the-shelf tests and tasks can contribute to integrated assessment in the learning processes, improved planning of classroom instruction and increased effectiveness of decision-making processes in class and in school.
Analysis of the results of the various tools enables teachers to plan intervention activities in line with the various needs of their students, such as: creating learning groups according to difficulties detected, selecting appropriate learning material, reinforcing topics not learned properly, addressing mistakes and common misconceptions, etc. Schools can set priorities regarding resource allocation and professional development of teaching staff in subjects in which many students encountered difficulties.

The use of formative measurement and assessment tools and strategies requires a cultural change at the school and class level, in favour of cultivating a culture that focuses not only on grades, but also on the learning process itself. Within such a culture students receive feedback, support and assistance based on the assessment process. Most importantly, this culture ensures that student assessment is used effectively.

The School-Based Assessment Coordinator

RAMA attaches great importance to the development of school-based assessment processes that assist schools in defining the information they need to function optimally and in reaping the utmost from internal and external assessment. To assist schools in collecting valid data and making informed decisions based on these data, RAMA has acted to define a new position in the education system – the school-based assessment coordinator. Within the framework of two agreements, the “Ofek Hadash” and the "Oz Le'Tmura", between the Ministry of Education and the Teachers’ Unions (Histadrut Ha'Morim and Irgun Ha'Morim), the role of assessment coordinator is included in the list of role holders entitled to remuneration. According to the agreements, all schools can appoint a school-based assessment coordinator, provided that he or she has teaching experience and a Master’s degree in measurement and assessment, or alternatively a Master’s degree in another field together with a completed academic specialization in measurement and assessment.
The role of the school-based assessment coordinator is to take the lead in incorporating a school-based assessment culture in collaboration with other position holders on the school staff, and under the supervision of the school principal. A school-based assessment culture assumes that the school community is a learning one that views assessment as a central component of the teaching-learning-assessment process and uses different assessment tools through mechanisms that foster school norms and values as an integral part of its work and/or its learning. The position entails dealing with assessment topics common to all schools, but mainly with unique topics that address the needs of the specific school and are congruent with its educational objectives, characteristics, world view and culture.

The implementation of a school-based assessment culture begins with the design of an organizational system and school-based mechanisms that allow for cooperation, alongside the development and management of a repository of internal assessment tools and the building of a school-based database. It is believed that the assessment coordinator will set systematic and continuous change in motion within the school, and thus the school staff will make the connection between the curricula and the goals measured in each of the subjects tested. This information will help interpret the test grades in a meaningful way that allows for the identification of student strengths and weaknesses and achievement gaps between learners. Utilizing this information will help monitor progress, examine changes in student performance, and identify topics and fields in the curricula that require strengthening, reinforcement, or improvement.

The assessment coordinator advises school staff in all matters regarding the assessment of student achievements and their progress using varied and innovative assessment methods.
The coordinator acts to foster the perception of “assessment for learning”, which stresses improvement and streamlining of school teaching and learning methods based on information gleaned from measurement data. It is also the coordinator’s responsibility to help in interpreting data from school-based assessment reports, articles, and other data, and to extract implications at the specific school level. As part of the assessment culture fostered by the assessment coordinator, systematic information can be collected about the myriad educational projects in which the school is involved, which in turn will lead to system level discussion and conclusions.

Evaluation of Teaching Staff and Investigation of Teaching Practice

Evaluation of teaching staff

Teacher [2] and principal evaluation is an important component in the process of promoting teaching and learning quality (Isore, 2009). For years principals have evaluated teachers and supervisors have evaluated principals, each in their own way: at different times, with different tools and with respect to different aspects. As part of the “Ofek Hadash” reform, a new promotion scale for teachers and other educationalists [3], vice principals and principals was introduced, including a number of junctures at which summative evaluation is required. The need for summative evaluation as part of the career path of teachers and principals creates in effect a continuous, organized and uniform evaluation procedure for the whole system for promoting teachers, other educationalists, vice principals and principals at different stages.

[2] This article will not expand on the teacher evaluation model at the high school level which was addressed in the “Oz LeTemura” reform agreement, as it is still in its infancy.
[3] Other educationalists: kindergarten teachers, counselors, health professions, interns.

The teacher evaluation tool was developed by RAMA in 2010, in collaboration with representatives of the Ministry of Education and its districts.
Tool development was accompanied by many focus groups comprised of supervisors, principals and teachers, and drew on existing tools found in Israel and other countries. The tool was designed to reflect the complexity of the teacher’s work and to create a common language among all Ministry of Education entities (supervisors, principals, teachers and Ministry personnel) in relation to all aspects of teacher performance.

The teacher evaluation tool is based on the following four meta-indicators:
Meta-indicator 1: Role perception and professional ethics refers to aspects related to identification with the teaching and educational role and commitment to the organization and the system.
Meta-indicator 2: Subject knowledge refers to knowledge of the subject and its teaching.
Meta-indicator 3: Educational and learning processes refers to aspects related to lesson design and organization, teaching methods, learning and assessment and a supportive learning environment.
Meta-indicator 4: Partnership in a professional community refers to aspects relating to the teacher’s participation in the professional community of the school and that of the subject.

One of the important achievements deriving from the tool development process for teacher evaluation is system-wide agreement that these four meta-indicators provide a uniform and structured answer to the question: “Who is a good teacher?”

Teacher evaluation is carried out by principals based on systematic data collection, including documented observations of teachers and gathering of other relevant material. Additionally, for evaluation purposes teachers are requested to complete a self-evaluation questionnaire based on the meta-indicators. The principal and teacher then meet for a feedback meeting aimed at discussing the gaps in their evaluations. The entire evaluation process is carried out on an online system specifically developed for this purpose.
For further information see the RAMA website in the “Evaluation of Educationalists and Administrators”, on the topic of “The Teacher Evaluation Tool” and "Evaluation of Other Educationalists".

The school principal evaluation tool was designed by RAMA based on the perception of the principal’s role as outlined by the Israeli Institute for School-Based Leadership - “Avnei Rosha” - that includes the following four meta-indicators:
Meta-indicator 1: Formulating a vision and leading school-based policy refers to the following aspects of leadership: formulating an education-based vision, teaching and learning in a socio-environmental context, designing a school-based work plan and tracking its implementation.
Meta-indicator 2: Improving teaching, learning and education refers to the following aspects of leadership: planning learning in school; institutionalization of school-based assessment and learning; school culture and climate that support learning and learners; and accountability.
Meta-indicator 3: Leading and professional development of school staff refers to the following aspects of leadership: management of the professional development of school staff; promoting a school-based professional community and cultivating school-based leadership; and professional ethics.
Meta-indicator 4: Mutual relations with the community refers to maintaining mutual relations with the community of parents and the community at large.

For further information see the RAMA website in the “Teacher and Management Personnel Evaluation”, on the topic of “The Tool for Evaluating Principals and Vice-Principals”.

These two tools, the teacher evaluation tool and the school principal evaluation tool, include descriptions of teacher/principal behaviour in each of the four meta-indicators, at different performance levels which represent a professional development scale.
The teacher/principal evaluation is determined based on the performance level on each of the detailed components of the tool in each of the meta-indicators. As such, these tools provide a framework for the professional development of all educators, and allow for the identification of needs at the individual, school and system level. The other evaluation tools (the tool for evaluating vice-principals, the tool for evaluating kindergarten teachers, the tool for evaluating interns, the tool for evaluating counsellors, the tool for evaluating health professions) were developed based on the same principles that have been described here, but the dimensions are suited to each particular population.

International Teacher Survey - TALIS

Israel participates in the Teaching and Learning International Survey (TALIS) conducted within the framework of the OECD Education Administration. The survey is designed to help decision makers formulate policy that defines suitable conditions for promoting effective teaching and effective schools. Participation in the study also allows for learning about differences in educational policy between countries, and about the influence of this policy on the school environment. The study is based on a set of indicators that provide a solid and coherent description of a “healthy” school system, and on measurable variables of the functioning of education systems and their performance. The set of indicators is founded on a conceptual framework of “teaching at its best” and on the school-based conditions that make it possible. Thus, for example, the study provides information about: teachers’ teaching practices, their beliefs and attitudes, the functioning and actions of school leadership, evaluation and feedback that teachers receive regarding their work, classroom and school climate, and teachers’ sense of self-efficacy and their work satisfaction.
The Use of Test and Survey Data for Research and for Program and Project Evaluation

The desire to integrate measurement and assessment as an integral part of educational programs reflects a growing trend in recent years, a trend that attaches importance to research-based educational interventions and to findings about their efficacy, which have become a prerequisite for program implementation. In a policy formulated by the United States government at the beginning of the millennium (as it appeared in the No Child Left Behind Act, 2001), educational interventions were required to prove that they were evidence-based and predicated on meticulous and systematic research that produced valid findings before receiving funding for implementation.

However, the demand for a research base does not in itself guarantee the validity of the findings. The quality of the findings depends on the research method. The most valid findings are produced from controlled experimental studies based on randomized controlled trials (RCT). Such studies are not common in education and are complicated to perform due to the variance between schools, the complexity of schools as organizations, the difficulty of implementing them uniformly and of controlling experiment conditions, as well as ethical problems involved in conducting such experiments (providing certain programs to certain students and withholding them from others). In light of these difficulties, there are researchers who claim that the education field differs from clinical fields to such an extent that it is impossible to conduct randomized and controlled experimental studies. On the other hand, there are those who claim that the importance of the education field demands that a special effort be made in order to produce research-based findings, as otherwise choosing policy, interventions or educational practices will be a matter of taste or fashion forced on schools without examining the cost and benefit.
Though the debate between these approaches is growing, the development in the field cannot be stopped. The desire to produce research-based data that will allow for informed decision making can also be realized by other research methods, more accessible and simpler to perform, that provide findings at a reasonable level of validity. Among these methods, the best one is quasi-experimental research that is based on comparison groups (comparable in terms of characteristics). At lower levels of validity are studies conducted according to the pre-post test method; after them come studies based on correlations and those based on case studies. At the bottom of the list are anecdotal studies, through which generalization is not possible and whose findings are therefore not considered to be valid according to accepted standards.

RAMA strives to produce the most valid data, and as such its studies are based on randomized controlled trial (RCT) research methods, quasi-experiments, pre-post studies, correlations and sometimes also on case studies. In this sense, RAMA’s operating theory is closer to the perception according to which education policy must be based on valid findings as much as possible, despite the uniqueness of the field and the difference compared to clinical fields.

The qualitative ranking of the various research methods is reflected in practice in comparison databases of education studies – databases that evaluate the quality of existing knowledge in different areas of the educational endeavour and summarize it from a critical perspective. One of the important databases is the What Works Clearinghouse of the U.S. Department of Education's Institute of Education Sciences. From this and similar databases one may learn about different programs for which there are valid findings that attest to their level of efficacy. These databases evaluate the quality of programs based on various studies.
RAMA’s activities are slightly different, since it deals mainly with applied rather than theoretical educational research. The purpose of assessment is to provide information about the results of programs or policy as a means for their improvement, in other words, information that will be meaningful to decision makers in real time (Weiss, 1998). Assessment focuses on findings that can be implemented in the field, while research aims to reach data that can be generalized and used to advance science. Research focuses on the possible theoretical contribution and derives the research question and method from it, whereas assessment derives its questions and goals from the needs of the field and of policy makers, and therefore gives priority to immediate, practical and specific uses. Despite the differing goals, the research methods are similar, and RAMA bases its work on controlled assessment schemes that produce valid findings.

The program or project evaluation process comprises several milestones. At the first stage, expectations are coordinated with the evaluation requestor, including the definition of evaluation goals as distinct from project goals. At the second stage, a research team suited to the nature of the required study is established (the study usually combines qualitative and quantitative methods). In the case of large-scale projects, a steering committee is also appointed to operate in conjunction with the study team. The study team formulates an evaluation proposal that includes a schedule, a budget and a literature review of the field. The third and final stage includes performing the evaluation and preparing a summary report. The report is submitted for review and, following corrections, is published. At the end of the process a discussion is held with the participation of relevant entities in the Education Ministry to study the findings in order to learn and draw conclusions.
The programs and projects that RAMA evaluates are varied, both in content (educational, social, values-related, institutional and others) and in scope and complexity. Requests for evaluation are received first and foremost from the Education Ministry administration and its various divisions. Nonetheless, when a study scheme is planned, an attempt is made to collect knowledge and information that will be relevant to a wider circle than the original requestor, and findings are also published on the RAMA website.

Each program is evaluated according to a set framework. This framework addresses the context, inputs, processes and outputs of the program (Stufflebeam, 1983), while adapting the research method to the program and to the requestor’s needs. Examining the context of a program begins by identifying needs and mapping the target audience and the environment in which the program will be conducted. This is followed by learning about the goals of the program and how well it is suited to the context. Sometimes evaluation even helps program developers to articulate for themselves the underlying principles, goals and perceptions of the program, as it requires them to contend with structured questions about the nature of the project. Inputs are reviewed by examining the program rationale (with the help of experts in the specific field and a literature review) and the suitability of the inputs to the needs. Process review includes analysis of program implementation in terms of implementation methods, target audience, operating entities, resource utilization, etc. (through surveys, observations and interviews). Outputs are examined by measuring the effect on the target audience over different time periods.

Educational program evaluation provides both formative evaluation, which collects information in real time to improve implementation, and summative evaluation, which examines project products.
In summative evaluation the main goal is to evaluate program or policy effectiveness. Effectiveness evaluation investigates how a program is implemented and what results it achieves. Sometimes evaluation requests deviate from the accepted distinction between formative and summative evaluation. One example is the request to monitor and control various programs operated by Education Ministry divisions. It is important to stress that collecting monitoring and control data is not RAMA’s responsibility and should be performed by program leaders. Nonetheless, monitoring and control data serve RAMA as part of the information required for project assessment.

Many tools and data are used in evaluations: structured interviews, observations, focus groups, achievement data (based on Meitzav, matriculation and other tests), data about school climate and pedagogical environment, data from the evaluation of teaching staff and international studies. RAMA adapts varied research tools to the changing needs of different types of projects and to the type of evaluation required.

As stated, RAMA evaluates a wide range of programs for varied reasons. Some are evaluated because of their broad scope and their importance to the system (the evaluation of the “Ofek Hadash” reform, for example). Others are evaluated despite their limited scope, for example programs that have the potential to influence the entire system if they are successful (as with the program based on the personal educational model operating in the city of Bat Yam, or the evaluation of the centers for adults completing their secondary education). Other examples are programs that are the focus of attention of the requesting entity (for example, the Immigrant Absorption Division’s program for student dropout prevention, or the Amirim excellence program).
Another type of program is evaluated as part of an effort to pool resources from a system perspective. Several programs are often operated by different entities even though what they have in common outweighs the differences between them, since they all have the same goal and often even the same pedagogical method (as with programs for increasing matriculation eligibility, programs for encouraging reading at the kindergarten level, etc.). In these instances the purpose of evaluation is to map the field, describe the variety of programs and, finally, examine the effect of the programs taken together.

In evaluating programs RAMA gives priority to a research scheme that will produce the most valid results. Accordingly, when possible, a controlled experimental study is used. However, implementing such a scheme involves many difficulties, as described above. A good example is the program for integrating computer-mediated communication technologies in elementary schools that was operated by the Education Ministry. In light of resource limitations the Ministry sought to implement the program in peripheral areas of the country. The pedagogical logic underlying the decision is clear, yet in research terms it does not allow for a controlled experimental study. Nevertheless, an attempt was made to create an experimental scheme that would answer the question of whether schools that received advanced computer-mediated communication resources improved their teaching and learning processes and student achievement. For the experimental group, 60 schools that took part in the program were randomly sampled; for the control group, 60 schools that did not participate in the program and whose level of computer-mediated communication was low were sampled. The experimental group was invited to a conference at which it was explained that the attending schools had been selected for a study, and they received tools for implementing the plan.
Schools selected for the control group were not told anything. The problem in the study arose, of course, when schools in the control group began to obtain technological equipment on their own or with the help of entities outside the program (local government, for example), since, after all, their progress cannot be halted. In this sense the difference between a clinical medical study and a study in education is obvious: in the latter the effect of a placebo pill cannot be simulated, and the control group cannot be controlled or overseen in the way the experiment requires.

Other difficulties in performing evaluations include: the lack of organized databases in many programs; late requests to RAMA to add an evaluation scheme to a project that is already at an advanced stage of implementation; the large number of studies, tests and questionnaires with which schools must contend; the tension between complex processing of data and preserving the relevance of the knowledge produced from them; the difficulty of attaining meaningful data (compared to the relative ease of gathering data of limited significance, such as satisfaction surveys); lack of clarity regarding program goals (even for program developers); and political changes that cause frequent shifts in the Ministry's interest in certain programs. Despite these challenges, RAMA makes an effort to integrate advanced evaluation models and data processing methods, with the intention of expanding the knowledge gathered through project evaluation and making it even more meaningful in decision-making processes.

The various research and evaluation reports prepared by RAMA are presented on the RAMA website, under the “Studies/Research and Project Assessment” tab.
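The two-group sampling described above for the computer-mediated communication study can be illustrated with a minimal sketch. It is not RAMA's actual procedure: the record fields (`id`, `in_program`, `ict_level`), the threshold that defines a "low" level, the seed and the synthetic school list are all assumptions made for illustration. Note that the sketch samples at random within each group; it does not randomly assign schools to the program, which is why such a design falls short of a true randomized controlled trial.

```python
import random


def sample_study_groups(schools, group_size=60, low_ict_threshold=2, seed=2013):
    """Draw an experimental group and a comparison group of schools.

    `schools` is a list of dicts with hypothetical keys:
      "id"         - school identifier
      "in_program" - True if the school takes part in the program
      "ict_level"  - integer score, higher means better equipped (assumed scale)
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced

    # Experimental pool: every school that takes part in the program.
    participants = [s for s in schools if s["in_program"]]

    # Comparison pool: non-participating schools with a low ICT level.
    candidates = [
        s for s in schools
        if not s["in_program"] and s["ict_level"] <= low_ict_threshold
    ]

    if len(participants) < group_size or len(candidates) < group_size:
        raise ValueError("not enough schools to draw groups of the requested size")

    experimental = rng.sample(participants, group_size)
    control = rng.sample(candidates, group_size)
    return experimental, control


# Synthetic population for illustration only: 200 schools, the first 80 in the program.
schools = [
    {"id": i, "in_program": i < 80, "ict_level": i % 5}
    for i in range(200)
]

experimental, control = sample_study_groups(schools)
print(len(experimental), len(control))
```

A sketch of this kind cannot address the difficulty described above: once comparison schools obtain equipment on their own, their `ict_level` changes after the draw, and the two groups are no longer distinguished by the program alone.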
Assessment in the Service of Learning – Summary

Effective use of measurement and evaluation, at the school level and at the system level, depends on the continued existence of a learning culture that views test and survey findings and internal assessment findings as having the potential to create genuine rather than apparent improvement. This is the essence of “measurement in the service of learning”. The natural tendency of anyone being assessed is to try to improve his or her standing on the assessment index in any way possible, especially when the assessment is high-stakes, and this occurs in many and varied fields. For example, in the field of medicine the demand to publish hospital mortality data seems to have led some hospitals to avoid admitting patients whose condition is serious. Publishing the number of operations per physician in the United States led hospital physicians to increase the number of operations performed unjustifiably and disproportionately, even when the operations were deemed unnecessary.

In August 2012 the High Court of Justice in Israel published its judgment (1245/12) accepting the demand of the Movement for Freedom of Information in Israel to publish Meitzav grades at the school level, and rejecting the position of the Ministry of Education and of RAMA that the damage expected from publication exceeds the benefit. This judgment may have a significant effect on the Meitzav tests and perhaps even on their administration in their current format. For details see the RAMA website, under the “Meitzav” – “General Background – external and internal Meitzav” tab, at the link “Policy for Publishing Meitzav Results”.

RAMA sees its main purpose as assisting the education system to improve through informed use of measurement and assessment. The concept of “measurement in the service of learning” is not only a slogan; RAMA views it as the justification for its existence.
RAMA believes in combining external and internal assessment. The two types of assessment are related to differing approaches to generating change processes in organizations in general and in educational organizations in particular. External assessment deals with overall change that takes place in education systems and is executed at the national level by the government. This approach reflects top-down change. Internal assessment deals with bottom-up change and is led by the initiative of teachers or of the school as an organization; they are the ones who determine activity patterns. The education literature recommends an effective combination of “bottom up” and “top down” change. RAMA also believes that only dialogue and a process that acts simultaneously in both modes can create the desired change. Creating a dialogue that will lead to consensus (at the school, Education Ministry and societal levels) is the desired path (Levin, Glaze & Fullan, 2008).

This assessment model, which includes large-scale external assessment alongside professional internal assessment, strives to limit the role of large-scale assessment as the sole determining assessment and, alongside it, sets standards for beneficial internal assessment. Class assessment is still far from providing quality information and from gaining recognition and trust. However, it must be recognized that large-scale assessment cannot fill these needs, and efforts must be directed at creating a proper balance between the two. To this end there is a need for professional training and development in the field of measurement and assessment. The more professional the school assessment processes, the smaller the influence of external large-scale tests.
In summary, effective measurement and assessment require the following conditions: cooperation among all stakeholders, and between them and the measuring entity; agreement as to educational goals and objectives; agreement about measurement goals and their refinement; a good understanding of the series of tests and of the interpretation of their findings; access to findings and transparency of results; continuity and consistency between measurement cycles; fairness towards those assessed, and moderation of threatening aspects; professionalism and integrity. Only in this way will we have “assessment in the service of learning”.

Bibliography

Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2, 67-90.

Hamilton, L. S. (2008). High-stakes testing. In N. J. Salkind (Ed.), Encyclopedia of Educational Psychology, Vol. 1 (pp. 465-470). Thousand Oaks, CA: Sage.

Isoré, M. (2009). Teacher Evaluation: Current Practices in OECD Countries and a Literature Review. OECD Working Papers No. 23, OECD Publishing.

Koretz, D. (2005). Alignment, high stakes, and the inflation of test scores. In J. Herman & E. Haertel (Eds.), Uses and misuses of data in accountability testing. Yearbook of the National Society for the Study of Education, Vol. 104, Part 2, 99-118. Malden, MA: Blackwell Publishing.

Koretz, D., & Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 531-578). Westport, CT: Praeger.

Levin, B., Glaze, A., & Fullan, M. (2008). Results without Rancor or Ranking: Ontario’s Success Story. Phi Delta Kappan, 90 (4), 273-280.

Nichols, S. N., & Berliner, D. C. (2007). Collateral Damage: The effects of high-stakes testing on America’s schools. Cambridge, MA: Harvard Education Press.

Stufflebeam, D. L. (1983). The CIPP Model for Program Evaluation. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation Models: Viewpoints on Educational and Human Services Evaluation. Boston: Kluwer-Nijhoff.

Weiss, C. (1998). Evaluation: Methods for studying programs and policies. New Jersey: Prentice Hall.