March 2013

RAMA
State of Israel
The National Authority for
Measurement & Evaluation in
Education
Ministry of Education
Assessment in the Service of Learning:
Theory and Practice
Professor Michal Beller
Director-General RAMA
March 2013
Bet Avgad, 5 Jabotinsky Road, Ramat Gan, 2nd Floor, 5252006, ISRAEL
Tel: +972-3-5205555 ▪ Fax: +972-3-5205509 ▪ e-mail: rama@education.gov.il ▪
http://rama.education.gov.il
Contents

Introduction
Large-Scale Tests in Educational Systems and their Frequency
Updating the Measurement and Evaluation Format: Integrating External and Internal Assessment
School-Based External Assessment
    The Meitzav
    What Can be Learned from the Meitzav?
    Trends over Time and Comparison of Assessment Scores
Sample-Based National Assessment
    International Studies
    National, Sample-Based Assessment – Mashov Artzi
    National Sample-Based Monitoring of School Violence Level
School-Based Internal Assessment
    Internal Meitzav
    Off-the-Shelf Tests, Formative Tests and Banks of Performance Tasks
The School-Based Assessment Coordinator
Evaluation of Teaching Staff and Investigation of Teaching Practice
    Evaluation of teaching staff
    International Teacher Survey - TALIS
The Use of Test and Survey Data for Research and for Program and Project Evaluation
Assessment in the Service of Learning – Summary
Bibliography
Assessment in the Service of Learning
Professor Michal Beller, Director-General RAMA
Introduction
A review of the goals of education ministers in Israel over the years reveals three meta-goals of the education system: realization of every student's potential (scholastic, creative and values-related), narrowing education gaps, and maintaining a safe
learning environment.
Assuming wide-spread agreement with regard to these goals, the question arises as to how
we know whether these goals have been achieved. How are parents to know whether the
education system has provided their children with the tools necessary to successfully
function as active citizens in society? How are each of the partners in the educational
process (teachers, principals, and position holders at various levels of the education system)
to know whether they have satisfactorily fulfilled their role and whether the needs of
students from different backgrounds have been appropriately addressed? How are we to
detect educational gaps and how are we to know whether they have been narrowed? How
is the public to know that the future generation of Israeli children has been adequately
prepared to face the challenges of the 21st Century? How can the public be sure that the
extensive resources made available to the education system are being used judiciously and
have the planned effect? How can the benefit of increasing the State’s investment in
education, even at the expense of other important needs, be proven to policymakers? To
examine the extent to which these goals have been achieved, professionally-designed, valid
measurement and evaluation tools are needed.
Measurement and assessment are complex issues in all sectors – in the business sector, and
even more so in the public sector. In the education sphere, the implementation of
measurement and assessment is that much more difficult: learning and teaching processes
are inordinately complex, diversity among students is immense, there are different
pedagogical approaches to achieving educational goals, many programs are implemented in
the education system and more often than not the outcomes and results are realized only
after years of investment.
In education there is no single, "one size fits all" answer suited for all needs, nor is there a
single formula for implementing measurement and assessment processes. Different
pedagogical approaches and programs require, accordingly, different measurement and
assessment models. Therefore, the optimal education process must be accompanied by
measurement and assessment processes whose results guide educators and assist them in
deciding what is more and what is less suited for their students, what they should change
and improve and what is better maintained as is.
The National Authority for Measurement and Evaluation in Education (known by its
Hebrew acronym, RAMA) was founded in 2005 to address the need for professional
measurement, evaluation and assessment in the education system. The ideology underlying
RAMA’s activities rests on two principles: (a) assessment in the service of learning, and (b)
provision of professional solutions that effectively integrate different measurement and evaluation components (for additional information about RAMA, see http://rama.education.gov.il).
Large-Scale Tests in Educational Systems and their Frequency
Over recent decades, national tests have been administered to large numbers of students in
educational institutions in various countries throughout the world, including Israel. The
significance and importance of these tests are growing, leaving their mark on all parties in the
learning and educational process.
Large-scale assessments and professional surveys are vital instruments for monitoring and
tracking student achievements as well as the extent to which the education system has been
successful in imparting knowledge and values to all learners within the system. Through
objective and professional analysis of assessment tests it is possible to identify gaps which
need to be rectified and to highlight areas that may have been overlooked and in which
even greater resources should be invested. Assessment tests may also spur learning, foster
responsibility and accountability on the part of those in charge of teaching, and enhance the
congruence between teaching and the Education Ministry policy as reflected in its curricula
and in frameworks for teacher training and professional development.
Alongside the many benefits inherent in the use of these test systems, it has been
acknowledged that over time they may be accompanied by negative effects on the
education system and on the quality of pedagogical processes in schools. These negative
effects intensify as the tests become more central and important in the eyes of those at all
levels of the system, and particularly as they are perceived as "high-stakes"1 in the eyes of
principals, teachers and students. Professor Donald Campbell (Campbell, 1979), one of the
greatest scholars in the social sciences, wrote about this tendency and its ramifications:
“The more any quantitative social indicator is used for social decision making, the more
subject it will be to corruption pressures and the more apt it will be to distort and corrupt
the social processes it is intended to monitor… achievement tests may well be valuable
indicators of general school achievement under conditions of normal teaching aimed at
general competence. But when test scores become the goal of the teaching process, they
both lose their value as indicators of educational status and distort the educational process
in undesirable ways.”
The negative effects have been documented for quite some time in the research literature
(for example, Campbell, 1979; Hamilton, 2008; Koretz, 2005; Koretz & Hamilton, 2006;
Nichols & Berliner, 2007) and reported by RAMA in other publications.
Among the adverse impacts of improper implementation of large-scale tests are:

• Diverting teaching resources from subjects that are not included in the national assessments in favor of subjects that are included.

• Focusing on test preparation through intensive, test-oriented study. This type of study is often based on memorization and repetition, and involves fewer of the higher-order thinking skills critical for comprehension and long-term mastery of the study material and for generalization to additional fields of knowledge. Furthermore, this type of study may bore students and erode their joy of learning, curiosity and motivation.
• In extreme cases, as a result of the pressure felt by some schools and their desire to raise their achievements at any cost, some may resort to illegitimate actions that harm test integrity (for example, keeping weak students away from school on the test day, attempting to obtain advance information about test topics and questions in order to teach them in class, helping students during the test, etc.). Even worse, these actions send an undesirable educational message to students.

1 A test system is "high-stakes" when any party in the system feels threatened by the results, perceiving that they may be hurt if results are low or may benefit if results are high. The stakes increase when school supervision is perceived as threatening schools prior to the tests, and when it tends to use test results to reprimand schools or impose sanctions on them.
Besides damage to the quality of pedagogical processes, tests perceived as "high-stakes" may also impair the validity of test results and the ability to draw conclusions that will serve the system and promote its improvement. Thus, improved test results achieved through intensive preparation in the schools tested in a given year do not necessarily indicate improvement in the education system as a whole, as they do not represent an increase in the knowledge level of all students. This improved achievement, even if it appears to enhance
the public image of the education system or parts of it, is in many ways only cosmetic and
worthless to policymakers and decision makers who strive to create real and sustainable
change in the system as a whole over time. Only test results collected under “true
conditions”, and without special preparation, can testify to the state of the system. This is
the only way decision makers at different hierarchy levels can become cognizant of
strengths and weaknesses and take action to improve the system.
The education system, in cooperation with RAMA, must act to minimize these negative
phenomena, primarily through a cultural change which upholds “measurement in the
service of learning”, whereby measurement and assessment are intended to serve
learning and not vice versa.
For the system to improve, it must ensure that its measurement and assessment tools provide data that are as valid as possible, i.e., that the results correctly and accurately reflect the condition of the system and do not stem from unique and targeted efforts designed solely to raise test
grades. Steps should be taken to eliminate negative phenomena by sending the correct
message to the field and by reducing the pressure and the threat that external test results
serve as the sole evidence as to the quality of school pedagogical processes.
Updating the Measurement and Evaluation Format: Integrating
External and Internal Assessment
The perpetual dilemma facing the education system is the choice between independent,
internal measurement carried out by the school (partially free of the pressures described
above, and more suited to students and the material studied in each educational institution)
and external measurement which is standardized, professional and centralized. In other
words, there is a constant tension between decentralized and centralized measurement and
assessment. Some maintain that independent internal assessment is less intrusive and
empowers school principals and the teaching staff compared to external assessment.
However, considerations of responsibility, accountability, transparency, professionalism,
viability, and mainly the ability to make valid comparisons between schools (or between
sectors, countries or other groups), including multi-year comparisons, require that part of
the assessment be centralized, external and carried out by a professional entity responsible
for educational measurement and assessment.
In order to integrate the two approaches, which separately cannot address all needs, and to enjoy the benefits of both, the format of Israel's national assessment was updated in
2007. The new format was designed by RAMA in collaboration with various entities in the
Ministry of Education and in consultation with school principals and many teachers. The
format is intended to provide a professional solution for educational measurement and
assessment to all stakeholders in the education system: schools and various external
entities. This new format was designed to improve the then existing assessment system
from which it was derived and to address its shortcomings, and was based on the following
principles:
• Implementation of a culture of "assessment in the service of learning," in which measurement is intended to support continued learning improvement through congruency with learning goals and school vision, and which is based on the understanding that tests are not a goal in and of themselves, but rather a tool in the service of learning.
• Informed integration of internal and external assessment, and of formative assessment (assessment in the process of and for the purpose of learning) with summative assessment (assessment of learning products).

• Maximum decentralization of assessment, while ensuring the use of professional tools provided to schools by RAMA.

• Empowerment of school principals and teachers.

• Reduced pressure and frequency of external tests.

• Preference for external tests administered to a representative sample of students over external tests encompassing all students.
The new format established by RAMA combines three elements: sample external
assessment (international and national), independent school-based assessment using
standardized and external tools, and internal school-based assessment. In order to maintain a
balance between the various elements, significant reinforcement of internal school-based
assessment is required, and to this end more extensive teacher training in this field is
needed.
The internal Meitzav (detailed below), established concurrently with the reduced frequency of external Meitzav tests, is an important element of the new format. It serves teachers and management, and its results are not reported to any external entity.
School-Based External Assessment
The Meitzav
The Meitzav (Hebrew acronym for "School Growth and Efficiency Measures") is a system that includes student achievement tests as well as questionnaires designed to glean information about the school climate and pedagogical environment (administered to principals, teachers and students). The purpose
pedagogical environment (administered to principals, teachers and students). The purpose
of the Meitzav at the school level is to provide school principals and teaching staff with a
tool for planning and utilizing resources for realizing student potential, improving the
pedagogical climate and enhancing instruction in school. At the system level, the
Meitzav is intended to provide a snapshot of the mastery level of Israel’s students in the
curricula of four core subjects, and to serve professional entities and decision makers in the
Education Ministry in setting policy on various educational issues, including climate and
pedagogical environment.
The Meitzav achievement tests focus on four core subjects: Native Language (Hebrew/Arabic), Mathematics, English, and Science and Technology. Tests are
administered to students at two grade levels: five and eight, and the native language test
(Hebrew/Arabic) is administered in grade two as well. The achievement tests are designed
in full congruence with the curricula in each of the subjects and are intended to examine the
extent to which elementary and junior-high school students meet the expected level
required of them according to these curricula. Examples of these tests can be found on the
RAMA website in the “School Assessment” tab on the topic of “Off the Shelf
Assessments”.
Each school belongs to one of four "Meitzav Clusters" – four equal and representative
groups of elementary and junior-high schools in Israel (Clusters A, B, C, and D). Each
cluster of schools is selected such that it will represent all schools nationwide.
Meitzav tests, in the format set by RAMA when it was established, are administered in each
of the four core subjects in a four-year cycle. Schools are tested once every two years in
external tests (external Meitzav) in only two subjects: Mathematics and Native Language
(Hebrew/Arabic) or English and Science and Technology, and two years later are tested in
the two other subjects. In a year in which the school is not tested on an external test in a
given subject it is tested on it through an internal test (internal Meitzav) which is the same
test administered that year on the external Meitzav (for details see the chapter on Internal
Meitzav). Thus each school is tested in an external Meitzav test in each subject (in the
relevant grades) once every four years and in the internal Meitzav test in the same subject in
the following three years. For further information see the RAMA website, “Meitzav” tab on
the subject “General Background – External and Internal Meitzav”.
Table 1 – Cycles of Meitzav clusters by subject and year
(Ext. = external Meitzav; Int. = internal Meitzav. Years 2006/7-2009/10 constitute Cycle 1; years 2010/11-2013/14 constitute Cycle 2.)

Cluster  Subject            2006/7  2007/8  2008/9  2009/10  2010/11  2011/12  2012/13  2013/14
A        Science & Tech     Ext.    Int.    Int.    Int.     Ext.     Int.     Int.     Int.
A        English            Ext.    Int.    Int.    Int.     Ext.     Int.     Int.     Int.
A        Math               Int.    Int.    Ext.    Int.     Int.     Int.     Ext.     Int.
A        Native Language    Int.    Int.    Ext.    Int.     Int.     Int.     Ext.     Int.
B        Science & Tech     Int.    Ext.    Int.    Int.     Int.     Ext.     Int.     Int.
B        English            Int.    Ext.    Int.    Int.     Int.     Ext.     Int.     Int.
B        Math               Int.    Int.    Int.    Ext.     Int.     Int.     Int.     Ext.
B        Native Language    Int.    Int.    Int.    Ext.     Int.     Int.     Int.     Ext.
C        Science & Tech     Int.    Int.    Ext.    Int.     Int.     Int.     Ext.     Int.
C        English            Int.    Int.    Ext.    Int.     Int.     Int.     Ext.     Int.
C        Math               Ext.    Int.    Int.    Int.     Ext.     Int.     Int.     Int.
C        Native Language    Ext.    Int.    Int.    Int.     Ext.     Int.     Int.     Int.
D        Science & Tech     Int.    Int.    Int.    Ext.     Int.     Int.     Int.     Ext.
D        English            Int.    Int.    Int.    Ext.     Int.     Int.     Int.     Ext.
D        Math               Int.    Ext.    Int.    Int.     Int.     Ext.     Int.     Int.
D        Native Language    Int.    Ext.    Int.    Int.     Int.     Ext.     Int.     Int.
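To make the rotation scheme concrete, the following minimal sketch (a hypothetical illustration only, not a RAMA tool; the cluster offsets are read off Table 1) reproduces the external/internal assignment in Python:

    # Hypothetical sketch reproducing the external/internal rotation in Table 1.
    # Pair 1 = Science & Technology + English; Pair 2 = Math + Native Language.
    PAIRS = {1: ("Science & Tech", "English"), 2: ("Math", "Native Language")}

    # School-year index of each cluster's first external Pair-1 year
    # (0 = 2006/7, 1 = 2007/8, ...), as read off Table 1.
    FIRST_PAIR1_YEAR = {"A": 0, "B": 1, "C": 2, "D": 3}

    def external_subjects(cluster, year_index):
        """Subjects tested externally for a cluster in a given school year.

        Each cluster takes external tests every two years, alternating between
        the two subject pairs, so each subject recurs externally once every
        four years; in all other years the school administers the internal
        Meitzav in that subject.
        """
        offset = (year_index - FIRST_PAIR1_YEAR[cluster]) % 4
        if offset == 0:
            return PAIRS[1]  # Science & Tech + English are external this year
        if offset == 2:
            return PAIRS[2]  # Math + Native Language are external this year
        return ()            # internal Meitzav only, in all four subjects

    # Example: cluster A in 2008/9 (index 2) -> ("Math", "Native Language")
    print(external_subjects("A", 2))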
Internal tests are accompanied by pedagogical material for teachers. The school can use internal Meitzav grades as it sees fit for improvement and internal learning, and as part of the annual assessment provided to students. The process of teachers grading internal Meitzav tests enhances their professional development, as it exposes them to professional indicators/rubrics that define expectations of students and enables them to learn from students' answers about their knowledge and comprehension levels. Internal Meitzav grades serve, as noted, only the school staff, and the school is not required to report them to an external entity (see expanded discussion on the internal Meitzav below).
National norms, derived from the results of the external Meitzav tests, are reported to the
schools that administer internal Meitzav tests. These norms help the school principal and
the teaching staff interpret data obtained from the same tests administered internally that
year.
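As an illustration only (the code and all numbers below are hypothetical, not a RAMA tool or published norms), a school might situate its internal Meitzav mean against the national norms roughly as follows, assuming approximately normal score distributions:

    from statistics import NormalDist

    # Hypothetical national norms from the external Meitzav for one subject and
    # grade level (illustrative values only; actual norms are reported by RAMA).
    national_mean, national_sd = 540.0, 95.0

    school_mean = 604.0  # the school's mean on the same test, given internally

    # Rough standing relative to the national student-level distribution.
    # (Comparing a school mean against the student-level spread is only a
    # coarse heuristic; school-level norms would use the spread of schools.)
    z = (school_mean - national_mean) / national_sd
    percentile = NormalDist().cdf(z) * 100
    print(f"School mean is {z:+.2f} SD from the national mean "
          f"(about the {percentile:.0f}th percentile of students nationally).")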
The school climate and pedagogical environment surveys in the Meitzav are designed to
provide a detailed picture of the school climate and pedagogical processes as revealed
through student questionnaires and interviews with teachers. The questionnaires provide
comprehensive and relevant information about important dimensions in this area, including:
student motivation level; the relationship between teachers and students; violent events and
students' sense of safety and protection; teamwork among teachers, and
more. These dimensions are based on insights gathered from several sources: focus groups
of teachers and principals, discussions with Education Ministry officials, consultation with
academic scholars and reviews of the current literature. The questionnaires are administered
to fifth through ninth graders and to elementary and junior-high school teachers. In the
2008/9 school year a pedagogic and school climate questionnaire was administered for the
first time to high school students. A school survey for high school teachers is in advanced
development stages. For more information see the RAMA website, “Climate and
pedagogical setting surveys” tab.
At the end of the external Meitzav administering process, RAMA produces a
comprehensive school Meitzav report (with respect to school achievements and climate), as
well as a detailed national Meitzav report.
What Can be Learned from the Meitzav?
The importance of using the Meitzav as a working tool stems from the need to obtain an
updated, diagnostic picture of the level of implementation and fulfillment of various system
goals (at the student, class, school, and overall education system level) in order to realize
the potential for continued improvement of schools and the education system.
At the school level – the process in which insights are gleaned from detailed school reports
enables the school staff to examine itself and to view the school as a holistic system from
various aspects: achievements, climate and pedagogical setting.
The findings presented in the reports enable the school staff to identify strengths and
weaknesses in the subjects tested, to identify topics or abilities that were not stressed, to
propose hypotheses with respect to the findings, to learn what additional data should be
collected to confirm or reject hypotheses, to examine the reasons for difficulties found (for
example, why students have difficulty in writing) and to design long-term programs that
will address these difficulties. Effective use of Meitzav findings can help schools design
mechanisms to improve school-based processes and plan long-term steps to sustain
improvement over time.
At the system level – the external Meitzav tests provide data about the education system
and schools across different cross-sections that, similar to school-level Meitzav tests, help
identify difficulties and gaps that need to be addressed in order to improve the system.
Thus, for example, achievement data is compared by socio-economic level and across
different sub-groups (for example by sector or gender). Comparisons of this kind provide
decision makers with information about gaps in the education system that require
intervention. Since 2008, valid comparisons of Meitzav grades in different years (see
below) are possible, improving the ability to monitor system quality over time. Moreover,
Meitzav data can serve to evaluate the effectiveness of national educational programs
implemented in the education system. The annual reports are published on the RAMA
website, “RAMA Publications” tab on the topic of “System and Local Government Meitzav
Data”.
Trends over Time and Comparison of Assessment Scores
Two of the more common, though not necessarily informed, uses of national external test grades are the comparison of achievements over time and the ranking of schools ("league tables"). However, not every comparison is valid or valuable; for example, does a grade of 85 on an
easy test in any particular year indicate an improvement compared to a grade of 75 on a
more difficult test the previous year? Of course not! Thus, unprofessional and irresponsible
rankings of this nature between schools may cause serious damage.
RAMA is working with all partners in the educational process and the general public to
instill the understanding that rankings have significance, if at all, only among relevant
populations, and that comparisons over years have value only when grade scales are
calibrated. Calibration is designed to neutralize the effect of differing levels of tests
administered in different years in the same subjects, allowing for valid comparisons of test
achievements. The need to calibrate scores from year to year stems from the fact that test
forms differ every year with respect to the difficulty level of test questions. The grades on the raw grade scale for each year are the total points accumulated by examinees for their answers on a given test form. It is impossible to determine, for example, whether a rise in the raw score average is the result of an easier test form that particular year or of a rise in achievement among students (or both).
In order to solve this problem and to examine Meitzav grade trends over time, in 2008
RAMA established a statistical calibration scheme of test grades that translates test grades
for each year to a new comparative scale – the multi-year Meitzav scale. The new
calibrated scale is designed to allow valid comparisons of Meitzav test scores over time.
The calibration scheme takes into account differences in test difficulty levels in different
years, "positioning” grades on a new measurement scale, allowing for multi-year
comparisons. The multi-year Meitzav scale was designed such that in the base year, 2008, the grade average was 500 and the standard deviation was 100.
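As a simplified illustration of how such a scale works (a sketch only; RAMA's actual equating procedure is more elaborate, and the numbers below are hypothetical), a base-year raw score x can be placed on the multi-year scale by the linear transformation

\[ S(x) = 500 + 100 \cdot \frac{x - \mu_{2008}}{\sigma_{2008}}, \]

where \mu_{2008} and \sigma_{2008} are the mean and standard deviation of raw scores in the 2008 base year. For example, with \mu_{2008} = 62 and \sigma_{2008} = 14, a raw score of 76 maps to 500 + 100(76 - 62)/14 = 600. For forms administered in later years, an equating step first expresses the new form's raw scores on the base-year metric, so that an easier form does not inflate scaled scores, before this transformation is applied.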
The calibration procedure and implementation of a multi-year scale is standard practice in
various national testing systems (for example in the Israeli psychometric exam and the
NAEP in the U.S.A.) and in tests administered as part of international studies. However, it
should be stressed that grades on the new Meitzav scale cannot be translated into grades on
other test systems; they only allow for multi-year comparison of Meitzav achievements
within each subject and grade level.
A comparison of Meitzav results for the years 2008 and 2012 points to a general trend of improvement in the four core subjects: in Native Language, Science and Technology, and English in grades five and eight, and in Mathematics in grade five, in both the Hebrew-speaking and Arabic-speaking sectors. For grade five, a comparison with the test results of 2007 was also possible.
• In grade five, over the six years 2007-2012, a moderate to large cumulative increase was recorded in the four subjects. The increase varies between knowledge areas and language sectors, and ranges from 30 to 60 points on the multi-year scale.

• In grade eight, over the five years 2008-2012, a cumulative increase was recorded in three subjects: a slight increase in English (about 10 points) and a large increase in Native Language and in Science and Technology (about 50 points). Student achievements on the 2012 Mathematics test are similar to the achievements recorded in 2008.

• The upward trend in achievements over the years is also reflected separately for each of the three socio-economic groups (low, medium and high socio-economic background) for each of the subjects.

• For most subjects, no narrowing or widening of achievement gaps in favor of students from a higher socio-economic background was recorded over time. Exceptions are the Hebrew and English tests for grade five: by 2012, the achievement gaps on these tests among students in Hebrew-speaking schools had narrowed.
The Meitzav also includes an examination of school climate and pedagogical
environment. In this area the findings indicate stability over time with several positive
trends (including an increase in students' reported sense of safety and protection, an
increase in reported appropriate behavior among students in class, a slight decline in
reported violence in elementary school and an increase in reported use of computer
mediated communication for learning purposes).
Sample-Based National Assessment
International Studies
Among its roles, RAMA is responsible for conducting the international studies in Israel.
These studies make possible the comparison of student achievements across many countries
in several subjects as well as the study of other educational issues. Furthermore, results of
these studies enable comparison between different sectors and different population groups
within each participating country. The tests are administered in a fixed cycle once every
few years, and allow for the study of trends over time (calibration is structured into these
tests as well). The international organizations that develop the tests are among the leaders
in the field of evaluation and measurement. The tests, translated into different languages,
are meticulously designed and have high levels of reliability and validity. Each of the tests
and questionnaires is designed according to a detailed and rigorous theoretical framework,
drafted by experts in the subject tested and pedagogy from around the world.
In each country the tests and questionnaires are administered to a representative sample of
the population and scores are not reported at the class or school level, only at the country
level. Israel has participated in a series of international studies in recent years, including:
PISA (Reading Literacy, Mathematics and Science for age 15+), TIMSS (Mathematics and
Science for grade eight) and PIRLS (Reading Literacy for grade four).
These studies provide reliable information about the Israeli education system from an
international perspective. This information may be of great importance to policymakers
who strive to obtain an accurate picture of the weaknesses and strengths of the education
system in Israel through objective comparison with other education systems in the world.
Furthermore, participation in international studies enables Israel to learn about new and
contemporary approaches in the subjects tested and to examine its curricula in relation to
curricula in other countries in the world. Thus, for example, the need to strengthen literacy
in Science, Mathematics and Language was identified through the PISA study. The studies
also enable participants to learn from models of successful education systems in
different countries, through comparisons of high and low achieving countries, of countries
that differ in education gaps between and within schools, and through an examination of the
relationship between student achievement and different background variables (such as
parents’ education, socio-economic status, student attitudes towards the subject, etc.). For
example, this is how the world learned about the successful education system of Finland, which attained very high achievements on the PISA study. The success of the Finnish education system has been attributed mainly to its social and cultural norms, accompanied by a reform based on the empowerment of school teachers and principals.
Media and policymakers often tend to emphasize the rank achieved by each country on the
country ranking, comparing it to the position achieved on previous tests, and presenting the
ranking as the main finding of international studies. However, the main value of these
tests lies in the opportunity they offer to conduct comparisons and examinations
within each country separately, addressing objectives reached in each subject and the
relationship between achievement and background variables and attitudes related to the
various study topics. In Israel, for example, study results allow for comparison between
student achievements in the Arabic-speaking sector and the Hebrew-speaking sector, and also
comparison between other sub-groups (boys and girls for example). Above all, through
these studies the achievements of the State of Israel can be examined over time, owing to a
cyclical pattern of these tests conducted in a similar format once every three to five years,
while maintaining calibration of the grade scale from one cycle to the next.
The PISA 2009 results published in December 2010 showed a significant improvement in
student achievements in Israel in Reading Literacy (and stability in students’ achievements
in Mathematics and Science Literacy). The results of PISA 2012 are expected to be
published in December 2013. The results recently published for PIRLS 2011 and TIMSS
2011 indicated significant improvement in the achievements of students in Israel. We are
currently preparing for participation in the PISA 2015 study that will be computerized in its
entirety. Further details about each of these tests can be found on the specific websites of
each of the international studies and on the RAMA website, under the “International Tests”
tab.
National, Sample-Based Assessment – Mashov Artzi
RAMA is currently launching a sample-based, national assessment, known as the Mashov
Artzi, that will focus on a different subject each time, and will provide system-level
information about educational achievements in the education system in Israel and also
information about the specific pedagogical context. The Mashov Artzi will enable the
policymakers to examine various subjects, over and above those tested on the Meitzav.
Similar national systems are implemented in several countries, among them the National
Assessment of Educational Progress (NAEP) which has been administered for decades in
the USA and which is the most well-known and advanced.
For each subject that will be tested on the Mashov Artzi a representative sample of schools
will be selected to participate. The Mashov Artzi for each subject will include achievement
tests and questionnaires intended for students, teachers and school principals. The Mashov
Artzi will provide information with regard to learning outcomes related to a wide range of
content, skills and thinking strategies relevant to each subject, and will be based on a relatively small sample of students. The use of multiple test forms in each subject will
allow for a wide and in-depth coverage of the subject and will provide reliable information
with regard to the sub-topics included in it. The questionnaires will allow for the collection
of information with regard to various variables for describing and characterizing the context
in which the subject is taught (e.g., prevalent teaching practices, professional development
of subject teachers, school policy with regard to the subject, etc.). This information will
then contribute to interpreting and explaining the learning outcomes.
The Mashov Artzi will enable policymakers to learn about the success of different
instructional and educational methods by examining gaps found between sectors, and to
examine trends over time. In light of the study goals, the findings on the Mashov Artzi will
be analyzed and reported at the national system level only. Findings at the student, class or
school level will not be analyzed or reported.
The Mashov Artzi will be conducted cyclically once every few years and each knowledge
field will be tested based on its own test cycle, in order to track indicative trends over
time. The national Mashov will be administered for the first time in the 2014 school year
and the first knowledge field to be examined will be Geography. The Mashov Artzi will be
administered in ninth grade, towards the end of mandatory studies in this subject. Study
tools (achievement tests and questionnaires) will be administered in a computerized setting.
Computerized tests in Geography will assess, among other things, geographical skills in
using technological tools (such as interactive maps). In this way, an attempt will be made to collect information about skills required of a citizen of the 21st Century. Further details about the national Mashov in Geography,
including the framework document, can be found in the RAMA website, “National Tests”
tab, on the subject “Sample-Based National Assessment/Mashov”. Additional knowledge
fields in which the national Mashov will be conducted have yet to be determined.
National Sample-Based Monitoring of School Violence Level
School violence is one of the central issues on the public agenda with respect to the
education system in Israel. In light of the extensive discourse on this topic and in order to
identify trends related to violence in the education system, there has arisen a need to monitor the level of school violence.
In 2009 and 2011 RAMA conducted a large-scale survey for monitoring violence among
a national, representative sample of students in grades four through eleven in both the
Hebrew speaking sector and the Arabic speaking sector. The aim is to continue to monitor
levels of violence using similar questionnaires to be administered once every two years
among a representative sample of students. The third monitoring survey is being conducted
in 2013.
The questionnaires administered to students examined a series of violent and dangerous
behaviors among students which have been summarized in the following indices: severe
violence, moderate violence, social violence, violence using digital media, verbal violence,
violent gangs and bullying, sexual violence, alcohol and drug abuse, violence by and
towards the school staff, cold weapons in school, violence on school buses to and from
school, absenteeism due to fear of injury, students' feelings of safety and protection in
school, school efforts to prevent violence. Questionnaire development was the result of the
work of a steering committee comprised of personnel from RAMA, entities from the
Education Ministry that deal in school violence (the Psychological and Counseling Services
Division - SHEFI, the Youth and Society Department and the two age divisions –
elementary and secondary) and experts from academia.
For the majority of indices a trend of improvement was evident between 2009 and
2011, at the different age levels and in the two language sectors. Stability was recorded
on the remaining indices for these years. Improvement was especially evident (in other
words a decline in the rate of student reporting) for the following indices: severe violence,
social violence, violence by and towards school staff, sexual violence, violence on school
buses and alcohol abuse.
Improvement characterizes both language sectors, and is especially evident in Arabic-speaking schools, particularly in grades 4-6 and 7-8. In the reports of 10th-11th
grade students in Hebrew-speaking schools stability was recorded for the most part during
these years. In the reports of their counterparts in these grades in Arabic-speaking schools a
trend of improvement was recorded for the most part.
In each of the years 2009 and 2011: The older the students, the fewer reports of most types
of violence – except for violence towards school staff, bringing cold weapons to school,
alcohol and drug abuse.
In each of the years 2009 and 2011: among students in Arabic-speaking schools, the reporting
rate of most of the negative behaviors examined by the questionnaire was higher in
comparison to Hebrew-speaking schools. This holds true except for verbal violence and
alcohol consumption, in which the gap is in the opposite direction.
Further information can be found on the RAMA website in the “Research/Studies and
Project Assessments” tab, under the topic “Violence Monitoring”.
School-Based Internal Assessment
Internal assessment is carried out continuously, and by definition is performed by and under
the initiative of the school staff. The main goal of school-based internal assessment is to
promote student learning. Information collected from the internal assessment helps teachers and school management identify students' strengths and weaknesses with respect to expected achievements, and guides them in addressing students' needs and adapting
instruction to these needs. School-based internal assessment is based on an approach of
assessment for learning (Halel – Hebrew acronym), which is intended to provide an answer
to two key questions: Where does the student stand on the way to achieving the learning objectives? What steps are required to promote learning and realize its objectives?
The assessment process is based on gathering information and evidence from a variety of sources and through a range of tools (tests, performance tasks, assessment assignments), together with an integrative interpretation of the evidence. The interpretation and conclusions then constitute a foundation on which to design appropriate interventions aimed at achieving learning objectives.
RAMA contributes to the reinforcement of school-based internal assessment by providing
professional assessment tools which also include pedagogical materials (for example,
detailed indicators/rubrics, explanations about mapping questions and definition of tested
abilities, examples of common mistakes and suggestions for further instruction activities,
etc.).
Various tools available to teachers for school-based internal assessment are as follows:
Internal Meitzav
RAMA provides schools with the external Meitzav tests developed professionally for use in
the context of school-based internal assessment. The internal Meitzav tests are intended to
be included as an integral part of a school-based internal assessment routine and to
complement the other internal assessment tools. In the Bulletin of the Director-General, it
has been emphasized that the purpose of the internal Meitzav grades is to exclusively serve
the school staff, and therefore there is no requirement to report results to any external
entity. For further information see the RAMA website, “School Assessment” tab, on the
subject of “Internal Meitzav”.
The internal Meitzav test is based on the following principles:
• An objective, external, national test, with psychometric qualities of reliability and validity, developed by RAMA in collaboration with professional committees. The test reflects the curriculum and the requirements expected of students in each core subject and at given grade levels, in terms of knowledge and skills. Because the test is scored internally by the school staff (with the help of indicators/rubrics and scoring tools), individual and group assessments of students' proficiency in each subject can be produced quickly.

• Enables comparison of student achievements to external norms (national, district, sector) gleaned from the external Meitzav test.
The benefits for schools from the internal Meitzav test include the following:
• Enhanced school-based internal assessment processes. Schools can gain a snapshot of their condition that combines information based on external assessment sources and that can be adapted to the school context.

• Data-based decision making. School administration and teaching staff can gain insights from the test grading process and the results, which will assist them in focusing on appropriate educational and learning goals in alignment with the school vision.

• Adaptation to school needs. The school can use internal Meitzav grades as it sees fit, as part of annual student assessment.

• Reduction of the negative phenomena that often accompany the external Meitzav, for example diversion of resources (study time and teaching staff) at the expense of other subjects, distancing weak students from school, and reduced motivation of some students to take a test that "does not count" toward the individual grade.
Off-the-Shelf Tests, Formative Tests and Banks of Performance Tasks
RAMA provides a wide variety of class-based internal assessment tools: previous versions of achievement tests, formative exams and tests, specifications and test rubrics, and banks of performance tasks in different subjects. Information gathered from these tools can serve as
the basis for developing intervention programs adapted to student needs. The performance
tasks designed and provided to schools by RAMA are intended to assess learning processes
and products as well as complex learning abilities. These include: a system perspective,
problem solving, taking a position, critical thinking, drawing conclusions, planning and
identifying connections.
Following are additional tools available to schools for internal assessment:
Hebrew reading and writing test for grade one:
• The purpose is to track the proficiency of students in Hebrew reading and writing skills. Test results serve teachers in designing appropriate intervention.

• The test is comprised of a series of tasks which are administered individually (teacher-student) during the school year.
Arabic reading and writing test for grade one:
• The purpose is to track the proficiency of students in Arabic reading and writing skills. Test results serve teachers in designing appropriate intervention.

• The test is comprised of a series of tasks which are administered individually (teacher-student) during the school year.
Kit for assessing beginning reading in English for grade five:
• The purpose is to identify students with difficulty in beginning reading in English and to identify particular difficulties that may be obstacles to reading acquisition. Test results serve teachers in designing appropriate intervention.

• The test is comprised of two components: one is a screening test administered by the teacher to all students; the other is a diagnostic test administered individually.
Amit test in native language for grade seven (Hebrew and Arabic):
• The purpose is to assess the proficiency of students in reading comprehension and writing (Hebrew/Arabic as a first language). Test results serve teachers in designing appropriate intervention.

• The test is comprised of three parts. Each part includes one reading text and test items relevant to reading comprehension, grammar and writing. The texts cover different genres, and questions relate to skills such as locating information at low, medium and high access levels, linguistic structures, and vocabulary. All students participate in the first part, and then proceed to a second part which is suited to their level of performance (as indicated by results on the first part).
Kit for assessing spoken language in Hebrew in grade eight:
• The purpose is to assess the proficiency of students in speaking Hebrew. The kit was originally developed as part of the internal Meitzav for native speakers of Hebrew in grade eight.

• The kit covers three aspects of spoken language: reading aloud, reporting and group discussion. It includes assessment tasks, rubrics for assessing student performance and a teacher's guide.
Kit for assessing spoken language in English for junior high school:

• The purpose is to assess the proficiency of students in speaking English.

• The kit includes five units that relate to various aspects of oral social interaction and presentation. Each of the units is comprised of assessment tasks, rubrics for teachers and students, suggestions for teaching activities that promote the development of spoken language, and a teacher's guide.
Kit for assessing Hebrew among immigrant students entering grades three through nine:
• The purpose is to examine when and how immigrant students can be successfully integrated into homeroom classes – among their native Hebrew-speaking contemporaries – and participate in classes in the various subjects, while receiving assistance.
• The kit includes five sections: discourse, reading aloud, listening comprehension, reading comprehension and writing. The kit is accompanied by a teacher's guide including very detailed guidelines regarding the structure, goals, administration dates, administration mode, duration and scoring.
Bank of performance tasks in "Culture and Heritage of Israel" for grades six through eight:
• The purpose is to assess achievement in this school subject. Student performance on these tasks serves as the basis for planning and improving teaching processes, and is a basis on which the teacher can provide effective feedback that promotes learning.

• The kit includes 12 tasks that reflect four main themes: Jewish literature; the Jewish calendar and life cycle; the affinity of the Jewish people to the Land of Israel; and the image of the State of Israel as the state of the Jewish people. There are four tasks for each grade level, each deriving from one of the four themes. The kit is accompanied by a teacher's guide including very detailed guidelines as to the administration and scoring of the tasks, as well as recommendations for instruction.
Use of off-the-shelf tests and tasks can contribute to integrated assessment in the learning
processes, improved planning of classroom instruction and increased effectiveness of
decision-making processes in class and in school. Analysis of the results of the various
tools enables teachers to plan intervention activities in line with the various needs of their
students, such as: create learning groups according to difficulties detected, select
appropriate learning material, reinforce topics not learned properly, address mistakes and
common misconceptions, etc. Schools can set priorities regarding resource allocation and
professional development of teaching staff in subjects in which many students encountered
difficulties.
The use of formative measurement and assessment tools and strategies requires a cultural
change at the school and class level, in favor of cultivating a culture that focuses not only on grades, but also on the learning process itself. Within such a culture, students receive feedback, support and assistance based on the assessment process. Most
importantly, this culture ensures that student assessment is used effectively.
The School-Based Assessment Coordinator
RAMA attaches great importance to the development of school-based assessment processes
to assist schools in defining their information needs to function optimally and to reap the
utmost from internal and external assessment. To assist schools in collecting valid data and
making informed decisions based on these data, RAMA has acted to define a new position
in the education system – the school-based assessment coordinator. Within the framework of two agreements, "Ofek Hadash" and "Oz Le'Tmura," between the Ministry of Education and the Teachers' Unions (Histadrut Ha'Morim and Irgun Ha'Morim), the role of assessment coordinator is included in the list of role holders entitled to remuneration. According to the agreements, all schools can appoint a school-based assessment coordinator, provided that he or she has teaching experience and a Master's degree in measurement and assessment or, alternatively, a Master's degree in another field together with a completed academic specialization in measurement and assessment.
The role of the school-based assessment coordinator is to take the lead in incorporating a
school-based assessment culture in collaboration with other position holders on the school
staff, and under the supervision of the school principal. A school-based assessment culture
assumes that the school community is a learning one that views assessment as a central
component of the teaching-learning-assessment process and uses different assessment
tools through mechanisms that foster school norms and values as an integral part of its
work and/or its learning. The position entails dealing with assessment topics common to all
schools, but mainly with unique topics that address the needs of the specific school and are
congruent with its educational objectives, characteristics, worldview and culture.
The implementation of a school-based assessment culture begins with the design of an
organizational system and school-based mechanisms that allow for cooperation alongside
the development and management of a repository of internal assessment tools and building
a school-based database. It is believed that the assessment coordinator will set systematic
and continuous change in motion within the school, and thus the school staff will make the
connection between the curricula and the goals measured in each of the subjects tested.
This information will help interpret the test grades in a meaningful way that allows for the
identification of student strengths and weaknesses and achievement gaps between learners.
Utilizing this information will help monitor progress and examine changes in student
performance, and identify topics and fields in the curricula that require strengthening,
reinforcement, or improvement.
The assessment coordinator advises school staff in all matters regarding the assessment of
student achievements and their progress using varied and innovative assessment methods.
The coordinator acts to foster the perception of “assessment for learning”, that stresses
improvement and streamlining of school teaching and learning methods based on
information gleaned from measurement data. It is also the coordinator’s responsibility to
help in interpreting data from school-based assessment reports, articles, and other data, and
to extract implications at the specific school level. As part of the assessment culture
fostered by the assessment coordinator, systematic information can be collected about the
myriad educational projects in which the school is involved, which in turn will lead to
system level discussion and conclusions.
Evaluation of Teaching Staff and Investigation of Teaching Practice
Evaluation of teaching staff
Teacher2 and principal evaluation is an important component in the process of promoting
teaching and learning quality (Isore, 2009). For years principals have evaluated teachers
and supervisors have evaluated principals, each in their own way: at different times, with
different tools and with respect to different aspects.
As part of the “Ofek Hadash” reform, a new promotion scale for teachers and other
educationalists3, vice principals and principals was introduced, including a number of
junctures at which summative evaluation is required. The need for summative evaluation as
part of the career path of teachers and principals creates in effect a continuous, organized
and uniform evaluation procedure for the whole system for promoting teachers, other
educationalists, vice principals and principals at different stages.
2 This article will not expand on the teacher evaluation model at the high school level, which was addressed in the "Oz Le'Tmura" reform agreement, as it is still in its infancy.
3 Other educationalists: kindergarten teachers, counselors, health professionals, interns.
The teacher evaluation tool was developed by RAMA in 2010, in collaboration with
representatives of the Ministry of Education and its districts. Tool development was
accompanied by many focus groups comprised of supervisors, principals and teachers,
using existing tools found in Israel and other countries. The tool was designed to reflect the
complexity of the teacher’s work and to create a common language among all Ministry of
Education entities (supervisors, principals, teachers and Ministry personnel) in relation to
all aspects of teacher performance.
The teacher evaluation tool is based on the following four meta-indicators:
Meta-indicator 1: Role perception and professional ethics refers to aspects related
to identification with the teaching and educational role and commitment to the
organization and the system.
Meta-indicator 2: Subject knowledge refers to knowledge of the subject and its
teaching.
Meta-indicator 3: Educational and learning processes refers to aspects related to
lesson design and organization, teaching methods, learning and assessment and a
supportive learning environment.
Meta-indicator 4: Partnership in a professional community refers to aspects
relating to the teacher’s participation in the professional community of the school
and that of the subject.
One of the important achievements deriving from the tool development process for teacher
evaluation is system-wide agreement that these four meta-indicators provide a uniform and
structured answer to the question: “Who is a good teacher?”
Teacher evaluation is carried out by principals based on systematic data collection,
including documented observations of teachers and gathering of other relevant material.
Additionally, for evaluation purposes teachers are requested to complete a self-evaluation
questionnaire based on the meta-indicators. The principal and teacher then meet for a
feedback meeting aimed at discussing the gaps in their evaluations. The entire evaluation
process is carried out on an online system specifically developed for this purpose. For
further information see the RAMA website under “Evaluation of Educationalists and
Administrators”, on the topics of “The Teacher Evaluation Tool” and “Evaluation of Other
Educationalists”.
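To illustrate the gap-focused logic of the feedback meeting, here is a minimal sketch in Python that compares a principal’s ratings with a teacher’s self-evaluation across the four meta-indicators and flags gaps for discussion. The 1–4 performance scale and all ratings below are illustrative assumptions, not the actual RAMA tool or its scale:

    # Hypothetical sketch: surfacing rating gaps for the feedback meeting.
    # The 1-4 scale and all values are assumptions, not the RAMA instrument.
    META_INDICATORS = [
        "role perception and professional ethics",
        "subject knowledge",
        "educational and learning processes",
        "partnership in a professional community",
    ]

    def evaluation_gaps(principal: dict[str, int], self_eval: dict[str, int]) -> dict[str, int]:
        """Gap per meta-indicator (positive: principal rated higher than the teacher)."""
        return {m: principal[m] - self_eval[m] for m in META_INDICATORS}

    principal_ratings = dict(zip(META_INDICATORS, [3, 4, 2, 3]))
    self_ratings = dict(zip(META_INDICATORS, [3, 3, 4, 3]))

    for indicator, gap in evaluation_gaps(principal_ratings, self_ratings).items():
        if gap != 0:  # only the disagreements are brought to the meeting
            print(f"discuss at feedback meeting: {indicator} (gap {gap:+d})")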
The school principal evaluation tool was designed by RAMA based on the perception of
the principal’s role as outlined by the Israeli Institute for School-Based Leadership –
“Avnei Rosha” – and includes the following four meta-indicators:
Meta-indicator 1: Formulating a vision and leading school-based policy refers to
the following aspects of leadership: formulating an education-based vision, teaching
and learning in a socio-environmental context, designing a school-based work plan
and tracking its implementation.
Meta-indicator 2: Improving teaching, learning and education refers to the
following aspects of leadership: planning learning in the school; institutionalizing
school-based assessment and learning; a school culture and climate that support
learning and learners; and accountability.
Meta-indicator 3: Leadership and professional development of school staff refers
to the following aspects of leadership: managing the professional development of
school staff; promoting a school-based professional community and cultivating
school-based leadership; and professional ethics.
Meta-indicator 4: Mutual relations with the community refers to maintaining
mutual relations with the community of parents and with the community at large.
For further information see the RAMA website under “Teacher and Management Personnel
Evaluation”, on the topic of “The Tool for Evaluating Principals and Vice-Principals”.
These two tools, the teacher evaluation tool and the school principal evaluation tool,
include descriptions of teacher/principal behaviour in each of the four meta-indicators, at
different performance levels which represent a professional development scale. The
teacher/principal evaluation is determined based on the performance level on each of the
detailed components of the tool in each of the meta-indicators. As such, these tools provide
a framework for the professional development of all educators, and allow for the
identification of needs at the individual, school and system level. The other evaluation tools
(the tool for evaluating vice-principals, the tool for evaluating kindergarten teachers, the
tool for evaluating interns, the tool for evaluating counsellors and the tool for
evaluating health professionals) were developed based on the same principles described
here, but with the dimensions suited to each particular population.
International Teacher Survey - TALIS
Israel participates in the Teaching and Learning International Survey (TALIS), conducted
within the framework of the OECD’s education administration. The survey is designed to
help decision makers to formulate policy that defines suitable conditions for promoting
effective teaching and effective schools. Participation in the study also allows for learning
about differences in educational policy between countries, and about the influence of this
policy on the school environment.
The study rests on a set of indicators that provide a solid and coherent description of a
“healthy” school system, drawing on measurable variables of the functioning and
performance of education systems. The set of indicators is founded on a conceptual
framework of “teaching at its best” and on the school-based conditions that make it
possible. Thus, for example, the study provides information about teachers’ teaching
practices, beliefs and attitudes; the functioning and actions of school leadership; the
evaluation and feedback that teachers receive regarding their work; classroom and school
climate; and teachers’ sense of self-efficacy and job satisfaction.
The Use of Test and Survey Data for Research and for Program and
Project Evaluation
The desire to integrate measurement and assessment as an integral part of educational
programs reflects a growing trend of recent years: a trend that attaches importance to
research-based educational interventions and to findings about their efficacy, which have
become a prerequisite for program implementation. Under the policy formulated by the United
States government at the beginning of the millennium (as expressed in the No Child Left
Behind Act, 2001), educational interventions were required to prove, before receiving
funding for implementation, that they were evidence-based and predicated on meticulous and
systematic research that produced valid findings. However, the demand for a research base
does not in itself guarantee the validity of the findings. The quality of the findings
depends on the research method. The most valid findings are produced by controlled
experimental studies based on randomized controlled trials (RCTs). Such studies are not
common in education and are complicated to perform, owing to the variance between schools,
the complexity of schools as organizations, the difficulty of implementing interventions
uniformly and of controlling experimental conditions, and the ethical problems involved in
conducting such experiments (providing certain programs to certain students while
withholding them from others). In light of these difficulties, some researchers claim that
the education field differs from clinical fields to such an extent that randomized
controlled experimental studies are impossible to conduct. Others claim that the importance
of the education field demands that a special effort be made to produce research-based
findings, since otherwise the choice of policy, interventions or educational practices
becomes a matter of taste or fashion, imposed on schools without examining costs and
benefits.
Though the debate between these approaches continues, development in the field cannot be
put on hold. The desire to produce research-based data that allow for informed decision
making can also be realized through other research methods, more accessible and simpler to
perform, that provide findings at a reasonable level of validity. Chief among these is
quasi-experimental research, based on comparison groups matched on background
characteristics. At a lower level of validity are studies conducted with pre- and
post-tests; after them come studies based on correlations and those based on case studies.
At the bottom of the list are anecdotal studies, from which generalization is not possible
and whose findings are therefore not considered valid by accepted standards.
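The ranking described above can be made concrete with a short sketch. The following Python fragment is purely illustrative – the design names and the notion of a “feasible” set are assumptions of the sketch, not a RAMA procedure – and simply encodes the hierarchy of validity, then selects the strongest design that circumstances permit:

    # Illustrative only: the validity hierarchy described above, as data.
    DESIGNS_BY_VALIDITY = [  # strongest evidence first
        "randomized_controlled_trial",
        "quasi_experiment_matched_groups",
        "pre_post_study",
        "correlational_study",
        "case_study",
        "anecdotal_report",  # findings not generalizable
    ]

    def strongest_feasible_design(feasible: set[str]) -> str:
        """Return the highest-validity design among those feasible for a program."""
        for design in DESIGNS_BY_VALIDITY:
            if design in feasible:
                return design
        raise ValueError("no feasible design supplied")

    # Example: random assignment is ruled out (say, the program targets specific
    # regions), so a matched quasi-experiment is the best remaining choice.
    print(strongest_feasible_design(
        {"quasi_experiment_matched_groups", "pre_post_study", "case_study"}
    ))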
RAMA strives to produce the most valid data possible, and its studies are accordingly based
on randomized controlled trials (RCTs), quasi-experiments, pre-post studies, correlational
studies and sometimes also case studies. In this sense, RAMA’s operating theory is closer
to the view that education policy must be based on findings that are as valid as possible,
despite the uniqueness of the field and its differences from clinical fields. The
qualitative ranking of the various research methods is reflected in practice in comparative
databases of education studies – databases that evaluate
the quality of existing knowledge in different areas of the educational endeavour and
summarize it from a critical perspective. One of the important databases is the What Works
Clearinghouse of the U.S. Department of Education's Institute of Education Sciences. From
this and similar databases one may learn about programs for which there are valid findings
attesting to their level of efficacy. These databases evaluate the quality of programs on
the basis of the studies conducted on them.
RAMA’s activities are somewhat different, since it deals mainly with applied rather than
theoretical educational research. The purpose of assessment is to provide information
about the results of programs or policies as a means for their improvement – in other
words, information that will be meaningful to decision makers in real time (Weiss, 1998).
Assessment focuses on findings that can be implemented in the field, while research aims to
produce data that can be generalized and used to advance science. Research focuses on the
potential theoretical contribution, and from this it derives the research question and
method, whereas assessment derives its questions and goals from the needs of the field and
of policy makers, and therefore gives priority to immediate, practical and specific uses.
Despite the differing goals, the research methods are similar, and RAMA bases its work on
controlled assessment schemes that produce valid findings.
The program or project evaluation process comprises several milestones. At the first stage,
expectations are coordinated with the evaluation requestor, including the definition of
evaluation goals as distinct from project goals. At the second stage, a research team
suited to the nature of the required study is established (the study usually combining
qualitative and quantitative methods); in the case of large-scale projects, a steering
committee is also appointed to operate in conjunction with the study team. The study team
formulates an evaluation proposal that includes a schedule, a budget and a literature
review of the field. The third and final stage includes performing the evaluation and
preparing a summary report, which is submitted for review and published following
corrections. At the end of the process, a discussion is held with the relevant entities in
the Education Ministry to study the findings and draw conclusions.
The programs and projects that RAMA evaluates are varied – in terms of content
(educational, social, values-related, institutional and others) as well as scope and complexity.
Requests for evaluation are received first and foremost from the Education Ministry
administration and its various divisions. Nonetheless, when planning a study scheme an
attempt is made to collect knowledge and information that will be relevant to a wider circle
than that of the original requestor, and findings are also published on the RAMA website.
Each program is evaluated according to a set framework that addresses the program’s
context, inputs, processes and outputs (Stufflebeam, 1983), while adjusting the research
method to the program and to the requestor’s needs.
Examining the context of a program begins by identifying needs and mapping the target
audience and the environment in which the program will be conducted. This is followed by
learning about the goals of the program and how it is suited to the context. Sometimes
evaluation even helps program developers to articulate for themselves the underlying
program principles, goals and perceptions as it requires contending with structured
questions about the nature of the project. Inputs are reviewed by examining the program
rationale (with the help of experts in the specific field and a literature review) and the
suitability of the inputs to the needs. Process review includes analysis of program
implementation in terms of implementation methods, target audience, operating entities,
resource utilization, etc. (through surveys, observations and interviews). Outputs are
examined by measuring the effect on the target audience over different time periods.
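As a purely illustrative aid, the four dimensions of this framework (Stufflebeam, 1983) can be represented as a simple data structure from which an evaluation plan is built. The following Python sketch is hypothetical – the field names and example entries are assumptions for illustration, not RAMA’s actual instrument:

    # Hypothetical sketch: a CIPP-style evaluation plan (context, input,
    # process, output) as data. Field names and entries are illustrative only.
    from dataclasses import dataclass, field

    @dataclass
    class CIPPEvaluationPlan:
        program: str
        context: list[str] = field(default_factory=list)  # needs, target audience, environment
        inputs: list[str] = field(default_factory=list)   # rationale and suitability to needs
        process: list[str] = field(default_factory=list)  # implementation: methods, operators, resources
        outputs: list[str] = field(default_factory=list)  # effects on the target audience over time

    plan = CIPPEvaluationPlan(
        program="hypothetical kindergarten reading-encouragement program",
        context=["map target kindergartens", "identify needs via staff interviews"],
        inputs=["review program rationale with subject experts", "compare resources to needs"],
        process=["classroom observations", "operator surveys", "resource-utilization analysis"],
        outputs=["measure reading-readiness effects at year end and one year later"],
    )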
Educational program evaluation activities also provide formative evaluation that collects
information in real time to improve implementation as well as summative evaluation that
examines project products. In summative evaluation, the main goal is to evaluate program or
policy effectiveness; effectiveness evaluation investigates how a program is implemented
and what its results are. Sometimes evaluation requests deviate from the accepted
distinction between formative and summative evaluation – one example is the request to
monitor and control various programs operated by Education Ministry divisions. It is
important to stress that the collection of monitoring and control data is not RAMA’s
responsibility and should be performed by program leaders. Nonetheless, such data serve
RAMA as part of the information required for project evaluation.
Many tools and data are used in evaluations: structured interviews, observations, focus
groups, achievement data (based on Meitzav, matriculation and other tests), data about
school climate and pedagogical environment, data from the evaluation of teaching staff and
international studies. RAMA adapts its varied research tools to the needs of different
types of projects and to the type of evaluation required.
As stated, RAMA evaluates a wide range of programs for varied reasons. Some are evaluated
because of their broad scope and importance to the system (the evaluation of the “Ofek
Hadash” reform, for example). Others are evaluated despite their limited scope – for
example, programs with the potential to influence the entire system if they succeed (as
with the program based on the personal educational model operating in the city of Bat Yam,
or the evaluation of the centers for adults completing their secondary education), or
programs that are the focus of attention of the requesting entity (such as the Immigrant
Absorption Division’s program for student dropout prevention or the Amirim excellence
program). Yet another type of program is evaluated as part of an effort to pool resources
from a system perspective: several programs are often operated by different entities,
although their common aspects exceed the differences between them, since they all share the
same goal and often even the same pedagogical method (as with programs for increasing
matriculation eligibility, programs for encouraging reading at the kindergarten level,
etc.). In these instances the purpose of evaluation is to map the field, describe the
variety of programs and, finally, examine the effect of the programs taken together.
In the evaluation of programs, RAMA gives priority to a research scheme that will produce
the most valid results. Accordingly, when possible, a controlled experimental study is
used. Nevertheless, implementing such a scheme involves many difficulties, as described
above. A good example is the program for integrating computer-mediated communication
technologies in elementary schools, operated by the Education Ministry. In light of
resource limitations, the Ministry sought to implement the program in the country’s
peripheral areas. The pedagogical logic underlying this decision is clear, yet in research
terms it does not allow for a controlled experimental study. Nevertheless, an attempt was
made to create an experimental scheme that would answer the question of
whether schools that received advanced computer-mediated communication resources improved
their teaching and learning processes and their student achievements. For the experimental
group, 60 schools that took part in the program were randomly sampled; for the control
group, 60 schools that did not participate in the program and whose level of
computer-mediated communication was low were sampled. The experimental group was invited to
a conference at which it was explained that the attending schools had been selected for a
study, and they received tools for implementing the plan; schools selected for the control
group were told nothing. The problem arose, of course, when schools in the control group
began to obtain technological equipment on their own or with the help of entities outside
the program (local government, for example), since it is not possible to halt their
progress. In this sense the difference between a clinical medical study and a study in
education is obvious: in the latter, the effect of a placebo pill cannot be simulated, and
the control group cannot be controlled or overseen in the way the experiment requires.
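The logic of this two-group design – and the way contamination of the control group attenuates the estimated effect – can be illustrated with simulated data. The following Python sketch is entirely hypothetical; the effect size, score distribution and contamination rate are invented for illustration and bear no relation to the actual study’s data:

    # Hypothetical simulation of the two-group school design described above.
    # All numbers are invented; they only demonstrate how contamination of the
    # control group shrinks the estimated program effect.
    import random

    random.seed(42)
    N_PER_GROUP = 60

    def simulated_gain(treated: bool) -> float:
        """Simulated achievement gain for one school (assumed true effect: +3)."""
        return random.gauss(10.0, 4.0) + (3.0 if treated else 0.0)

    treatment = [simulated_gain(True) for _ in range(N_PER_GROUP)]

    # Contamination: some "control" schools obtain equipment on their own,
    # effectively receiving part of the treatment.
    CONTAMINATION_RATE = 0.3
    control = [simulated_gain(random.random() < CONTAMINATION_RATE)
               for _ in range(N_PER_GROUP)]

    estimate = sum(treatment) / N_PER_GROUP - sum(control) / N_PER_GROUP
    print(f"estimated effect: {estimate:.2f} (assumed true effect: 3.00)")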
Other difficulties in performing evaluations include: the lack of organized databases in
many programs; requests that reach RAMA only when a project is already at an advanced stage
of implementation, too late to build an evaluation scheme into it; the large number of
studies, tests and questionnaires with which schools must contend; the tension between
complex data processing and preserving the relevance of the knowledge produced; the
difficulty of obtaining meaningful data (compared with the relative ease of gathering data
of limited significance, such as satisfaction surveys); lack of clarity regarding program
goals (even among program developers); and political changes that cause frequent shifts in
the Ministry’s interest in certain programs.
Despite these challenges, RAMA makes an effort to integrate advanced evaluation models
and data processing methods with the intention of expanding the knowledge gathered
through project evaluation and making it even more meaningful in decision-making
processes.
The various research and evaluation reports prepared by RAMA are presented on the
RAMA website, under the “Studies/Research and Project Assessment” tab.
Assessment in the Service of Learning – Summary
Effective use of measurement and evaluation, at both the school and the system level,
depends on the continued existence of a learning culture that views external test and
survey findings, as well as internal assessment findings, as having the potential to create
genuine improvement rather than merely apparent improvement – this is the essence of
“measurement in the service of learning”.
The natural tendency of anyone being assessed is to try to improve his or her standing on
the assessment index in any way possible, especially when the assessment is high-stakes,
and this occurs in many and varied fields. In medicine, for example, the demand to publish
hospital mortality data seems to have led some hospitals to avoid admitting patients in
serious condition, and publishing the number of operations per physician in the United
States led hospital physicians to increase the number of operations disproportionately and
unjustifiably, even when the operations were deemed unnecessary.
In August 2012 the High Court of Justice in Israel published its judgment (1245/12),
accepting the demand of the Movement for Freedom of Information in Israel to publish
Meitzav grades at the school level and rejecting the position of the Ministry of Education
and of RAMA, according to which the damage expected from such publication exceeds its
benefit. This judgment may have a significant effect on the Meitzav tests, and perhaps even
on whether they continue to be administered in their current format. For details see the
RAMA website under “Meitzav” – “General Background – External and Internal Meitzav”, in the
link “Policy for Publishing Meitzav Results”.
RAMA sees its main purpose as assisting the education system to improve through informed
use of measurement and assessment. The concept of “measurement in the service of learning”
is not merely a slogan – RAMA views it as the justification for its existence. RAMA
believes in combining external and internal assessment. The two types of assessment reflect
differing approaches to generating change processes in organizations in general and in
educational organizations in particular. External assessment deals in overall change that
takes place in education systems and is executed at the national
level by the government; this approach reflects top-down change. Internal assessment deals
in bottom-up change, led by the initiative of teachers or of the school as an organization
– they are the ones who determine activity patterns. The education literature recommends an
effective combination of bottom-up and top-down change, and RAMA likewise believes that
only dialogue, and a process acting simultaneously in both modes, can create the desired
change. Creating a dialogue that leads to shared consensus (at the school, Education
Ministry and societal levels) is the desired path (Levin, Glaze & Fullan, 2008).
This assessment model, which includes large-scale external assessment alongside
professional internal assessment, strives to limit the role of large-scale assessment as
the sole determining assessment, while setting standards for beneficial internal
assessment. Classroom assessment is still far from providing quality information and from
gaining recognition and trust; however, it must be recognized that large-scale assessment
cannot fill these needs, and efforts must be directed at creating a proper balance between
the two. To this end there is a need for professional training and development in the field
of measurement and assessment: the more professional the school’s assessment processes, the
smaller the influence of external large-scale tests.
In summary, effective measurement and assessment require the following conditions:
cooperation among all stakeholders, and between them and the measuring entity; agreement on
educational goals and objectives; agreement on measurement goals and their refinement; a
good understanding of the series of tests and of the interpretation of their findings;
access to findings and transparency of results; continuity and consistency between
measurement cycles; fairness towards those assessed, and moderation of threatening aspects;
and professionalism and integrity. Only then will we have “assessment in the service of
learning”.
Bibliography
Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and
Program Planning, 2, 67-90.
Hamilton, L.S. (2008). High-stakes testing. In N. J. Salkind (Ed.), Encyclopedia of Educational
Psychology, Vol. 1 (pp. 465-470). Thousand Oaks, CA: Sage.
Isoré, M. (2009). Teacher evaluation: Current practices in OECD countries and a literature
review. OECD Education Working Papers, No. 23. Paris: OECD Publishing.
Koretz, D. (2005). Alignment, high stakes, and the inflation of test scores. In J. Herman & E.
Haertel (Eds.), Uses and misuses of data in accountability testing. Yearbook of the National
Society for the Study of Education, Vol. 104, Part 2, 99-118. Malden, MA: Blackwell
Publishing.
Koretz, D., & Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan
(Ed.), Educational measurement (4th ed., pp. 531-578). Westport, CT: Praeger.
Levin, B., Glaze, A., & Fullan, M. (2008). Results without rancor or ranking: Ontario’s
success story. Phi Delta Kappan, 90(4), 273-280.
Nichols, S. L., & Berliner, D. C. (2007). Collateral damage: How high-stakes testing
corrupts America’s schools. Cambridge, MA: Harvard Education Press.
Stufflebeam, D. L. (1983). The CIPP model for program evaluation. In G. F. Madaus, M.
Scriven, & D. L. Stufflebeam (Eds.), Evaluation models: Viewpoints on educational and
human services evaluation. Boston: Kluwer-Nijhoff.
Weiss, C. (1998). Evaluation: Methods for studying programs and policies. New Jersey:
Prentice Hall.