Uploaded by Reagan Miruka

epsc 311 measurement and evaluation-1-1

EPSC 311: Measurement and Evaluation
Course Outline
Course Purpose
This course prepares student teachers to measure and evaluate learners’ ability.
Expected Learning Outcomes
1. Describe the relationship between measurement, evaluation, testing and examination
2. Explain the historical development of measuring instruments
3. Explain the scales of measurement
4. Describe the characteristics of a good test
5. Discuss test construction, planning and Administration
Course Content
Week 1: Measurement, assessment, evaluation, test and examination
Week 2: Principles of evaluation
Week 3: Historical development of measuring instruments
Week 4: CAT 1
Week 5: Scales of measurement
Week 6: Characteristics of a good test
Week 7: Types of tests
Week 8: Test item formats
Week 9: CAT 2
Week 10: Test construction, planning and administration
Week 11: Quantitative methods applied in selection of test items
Week 12: Qualitative methods applied in selection of test items
Week 13: Test validation
Week 14: Interpreting test results
Week 15: Reporting test results
Teaching and Learning Methodology
Lectures, Discussions, Assignments
Course Evaluation
Continuous Assessment Test: 30%
Examination: 70%
Total: 100%
REFERENCES
Bennaars, G. A.; Otiende, J. E. & Boisvert, R. (1994). Theory and Practice of Education.
Nairobi: East African Educational Publishers.
Ebel, R. L. & Frisbie, D. A. (1991). Essentials of Educational measurement. Fifth Edition.
London: Prentice Hall Inc.
Musial, D.; Nieminen, G.; Thomas, J. & Burke, K. (2009). Foundations of Meaningful
Educational Measurement. New York: McGraw Hill
Nasibi, M. W. (2003). Instructional Methods: General Methods for Teaching Across the
Curriculum. Nairobi: Strongwall Africa.
Measurement, Assessment, Evaluation, Test and Examination
Measurement refers to the procedure of assigning numbers to a specified attribute or behaviour
of an individual in such a way that the number describes the degree to which the individual
possesses that attribute or behaviour. Measurement answers the questions “how much?” or
“how many?”
Assessment is the process of determining what learners have achieved by using a specific
measure which yields quantitative data.
Evaluation is the process of collecting quantitative or qualitative information, analysing the
information and presenting it in a form that facilitates decision making among alternatives. It
is the systematic process of collecting, analyzing and interpreting information to determine the
extent to which students are achieving instructional objectives. It is the process of attaching
value judgement on whether teaching and learning has occurred. Tests and examinations
contribute valuable information for making comprehensive judgement on the quality and
relevance of educational programmes against the stated educational objectives.
A test is a systematic procedure for measuring a sample of behaviour of a candidate. A
systematic procedure implies that a test is developed, moderated, test papers produced,
administered, marked, scored and outcome released according to prescribed rules. A sample
implies that a test contains only a selection of all possible items (questions) that could be
developed to measure a particular behaviour. Behaviour implies that a test only measures the
test taking attributes (stated in the specific objectives of the curriculum) in a standard situation.
A test answers the question “How well does the individual perform” and consists of several
questions or tasks.
An examination consists of several tests that measure different characteristics, attributes or
behaviour of a candidate for purposes of decision making. It is the process of gathering
information to monitor progress and make educational decisions.
Importance of measurement and evaluation
Teaching is a conscious educational activity or process and therefore it is important to take
stock and gauge how the teaching and learning process is faring from time to time by
measuring and evaluating the performance of the teachers as well as that of the students.
Measurement and evaluation serve the following purposes:
i. To provide information for effective educational and vocational guidance and
counselling services.
ii. To provide information for grading students, for promoting students to the next level
of education and for making meaningful reports to parents.
iii. To provide information to teachers on the extent to which instructional objectives have
been achieved; that is whether the most appropriate teaching strategies and
teaching/learning resources were used.
iv. To provide information regarding students’ success or failure in mastery of skills,
attitudes and behaviour. This will help teachers to identify areas in which students
require remedial teaching with respect to their performance. Similarly, students will
identify what they have done well and what they need to improve.
[EPSC 311]
v. To provide information regarding the effectiveness of the entire educational institution
and point out aspects that may be improved. This will enable the local community
to identify areas in which it can supplement school effort in improving the academic and
social life of the community.
vi. To provide information for students’ certification purpose. This will enhance placement
of students to various curricula, universities, colleges and vocations.
vii. To provide information that will help in determination of the quality of education being
given to students in relation to national goals where the system of education can be
retained or overhauled. This may enable research on the existing curriculum by the KIE
which could suggest necessary amendments to improve it.
Principles of evaluation
Principles of evaluation are laws, rules or general patterns that explain how evaluation should
be conducted.
i. Evaluation should be continuous at four levels:
a. At the beginning of the school year to determine the learners’ entry behaviour.
This is referred to as pre-assessment.
b. During teaching with the purpose of getting feedback on learning. This is done
using continuous assessment tests as well as internal end of term examinations.
c. At the end of the programme for example end of primary school education or
end of secondary school education.
d. Follow up evaluation carried out after the implementation of the programme.
ii. Evaluation should be comprehensive: This means that a variety of instruments should
be employed in evaluation to test many variables. This calls for essay tests, objective
tests as well as practicals.
iii. Evaluation should be consistent with objectives: This means that the evaluation content
or items should be related to the instructional objectives the evaluator set out to achieve.
iv. Evaluation should be valid: This means that the process of evaluation should facilitate
measurement of what the evaluator intends to measure.
v. Evaluation should be reliable: This means that the evaluation results should be
consistent if the evaluation process is repeated under similar conditions for the same
group of candidates.
Types of Evaluation
1. Placement Evaluation: This is concerned with students’ entry behaviour and it focuses
on questions such as:
a. Does the student possess the knowledge and skills needed to begin the planned
instruction?
b. To what extent has the student already mastered the objectives of the planned
instruction?
c. To what extent do the student’s interests, work habits and personal
characteristics indicate that one mode of instruction might be better than
another?
Answers to these questions require the use of a variety of techniques such as readiness
tests, aptitude tests and observational techniques. Placement evaluation determines the
position in the teaching sequence that is most likely to benefit the students.
2. Formative Evaluation: This is used to monitor the learning progress during instruction
or teaching and to provide continuous feedback to both learners and teachers
concerning learning success and failures. The feedback to students reinforces
successful learning and identifies the learning errors that need correction. Formative
evaluation is directed towards improving learning and teaching. The results of
formative evaluation are not used to assign course grades.
3. Summative Evaluation: This is designed to determine the extent to which the
instructional objectives have been achieved and is primarily for assigning grades or
certifying students. Summative evaluation is done at the end of the course or unit of
instruction. The techniques used in summative evaluation are determined by the
instructional objectives and include teacher made achievement tests,
ratings of various types of performance such as laboratory work and oral reports, and evaluation of
products such as themes, drawings and research reports. Summative evaluation also
provides information for judging the appropriateness of the course objectives and the
effectiveness of instruction.
4. Diagnostic Evaluation: This is concerned with persistent or recurring learning
difficulties that are left unresolved by standard prescriptions of formative evaluation. If
the students continue to fail in for example reading and computing despite the use of
prescribed methods of teaching, then a more detailed diagnosis is indicated. Serious
learning problems require the services of remedial, psychological or medical
specialists. Diagnostic evaluation aims to determine the causes of learning problems and
to formulate a plan for remedial action.
Historical development of measuring instruments
Scales of measurement
A measurement scale refers to the assignment of numbers to objects or events according to a
specific set of rules. There are four measurement scales: nominal scale, ordinal scale,
interval scale and ratio scale.
i. Nominal scale: This is a scale in which numbers are used to label, classify or identify
people, events or objects of interest. The numbers used in nominal scale are arbitrary
and do not represent any quantity because they are merely used for categorizing or
labeling data. Nominal scale is used when one is interested in knowing if certain objects
belong to the same or different classes or groups e.g. teachers in a school may be
assigned into three groups: peer teachers = 0, untrained teachers = 1, trained teachers =
2. The numbers 0, 1 and 2 are just codes and do not indicate magnitude. They can only
be used for counting.
ii. Ordinal scale: This scale groups subjects into categories and ranks them in some order
such as ascending or descending order. Ordinal scale allows one to determine which
measure is better than others but it does not allow one to assess by how much or how
many a measure is better than the other. Therefore, there is classification as well as an
indication of relative magnitude (rank). What is important in ordinal scale is the
position of the subjects. Differences between positions or ranks cannot be compared because
equal intervals on an ordinal scale do not represent equal quantities. For example, in a class
in which learners have been ranked according to test results, one cannot say that the 4th
learner is half as good as the 2nd learner in spite of the fact that 2 is a half of 4. Ordinal
numbers are used for both counting and ranking (less than or greater than).
iii. Interval scale: This scale groups subjects into categories and ranks them in some order
with equal distances between adjacent numbers representing equal quantities. This
scale allows one to determine by how much or how many a measure is better than the
others. For example, in an examination, a learner who scored 90% scored 20 percentage
points more than a learner who scored 70%. However, the zero point on the interval scale
is arbitrary and not absolute which means that if a class is given a test and a learner scores
a zero, it does not imply that the learner knows nothing at all. Likewise, the zero point of a
calendar or of the Celsius temperature scale is a convention and does not mean the absence
of time or heat. Thus, examples of interval scale numbers are test
scores, temperature and calendar days. Interval scale numbers can be used for counting,
ranking, addition and subtraction.
iv. Ratio scale: This scale groups subjects into categories and ranks them in some order
with equal distances between adjacent numbers representing equal quantities and has
an absolute or real zero point. For example, a meter ruler has a zero mark on it meaning
that if it is used to measure height and the measurement is zero, then there is no height
at all. In ratio scale, each number is a distance from zero which means there is an
absolute zero on the scale e.g. height, weight, balance and income. Ratio scale numbers
are used for counting, ranking, addition, subtraction, multiplication and division.
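The operations that the four scales permit can be summarised in a short Python sketch. The dictionary below simply restates the operations listed above; the function name `is_permitted` and the sample data are assumptions made for illustration and are not part of the course text.

```python
# Operations each measurement scale supports, as listed in the notes above.
PERMITTED = {
    "nominal": {"count"},
    "ordinal": {"count", "rank"},
    "interval": {"count", "rank", "add", "subtract"},
    "ratio": {"count", "rank", "add", "subtract", "multiply", "divide"},
}


def is_permitted(scale, operation):
    """Return True if the operation is meaningful on the given scale."""
    return operation in PERMITTED[scale]


# Nominal: the codes are labels only, so counting per category is the only summary.
# 0 = peer teacher, 1 = untrained teacher, 2 = trained teacher (codes from the text).
teacher_codes = [0, 1, 2, 2, 1, 2, 0, 2]  # hypothetical staff list
counts = {code: teacher_codes.count(code) for code in sorted(set(teacher_codes))}

# Interval: differences between scores are meaningful.
score_difference = 90 - 70

# Ratio: a true zero makes ratios meaningful (heights in cm, hypothetical).
height_ratio = 180 / 90

print(counts, score_difference, height_ratio)
print(is_permitted("ordinal", "divide"))
```

Under these assumptions, dividing one rank by another is flagged as not permitted, which matches the point that the 4th learner cannot be called half as good as the 2nd.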
Characteristics of a good test
A test is a statistical instrument used to reveal an individual’s performance in comparison to
those of others in the same class doing the same tasks under the same conditions. A test is also
defined as predetermined collection of questions or tasks to which predetermined types of
responses are sought in order to measure the performance and capabilities of a student or a
class. Thus, a good test should have the following characteristics:
i. Objectivity: A good test should be objective. A test is said to be objective if it is free
from personal biases in interpreting its scope by students as well as in scoring the
responses by examiners. The objectivity of a test can be increased by using more
objective test items and by scoring answers according to the model answers provided.
Therefore, if the test is marked by two or more competent examiners, the scores should
be the same.
ii. Validity: A good test should be valid. A test is said to be valid if it measures what it
is intended to measure, meaning that there should be a correlation between what the test
measures and the function the test is intended to measure.
iii. Reliability: A good test should be reliable. A reliable test yields consistent scores when
administered to the same individuals under similar circumstances or conditions. A test
may be reliable but not necessarily valid. This is because it may yield consistent scores
after repeated trials but these scores may not represent what exactly was to be measured.
However, a test with high validity has to be reliable meaning that the scores will be
consistent in both cases.
iv. Comprehensiveness: A good test should be comprehensive. A test is said to be
comprehensive if it covers the entire syllabus meaning that the test should consider all
relevant learning materials and cover all the anticipated objectives.
v. Simplicity: Simplicity means that a good test should be written in clear, correct and
uncomplicated or straightforward language, avoiding ambiguous questions and
instructions.
vi. Discriminating power: Discriminating power of a test refers to the test’s ability to
distinguish between the upper and lower ability groups who took the test. This means
that the test should contain test items at different levels of difficulty in order to
accommodate both the upper and lower ability groups.
vii. Practicability: The practicability of a good test depends upon ease in administration;
ease in scoring the test items; ease in interpreting the test scores and finally economy
meaning that it should not be too expensive to administer in relation to the value of the
information obtained.
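Discriminating power (item vi above) is often quantified per item with a discrimination index. The sketch below is one common way of doing this and is not taken from the course text: it compares the proportion of the upper ability group and of the lower ability group that answered an item correctly. The function name, the 27% group size and the sample data are assumptions.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Proportion correct in the upper group minus proportion correct in the lower group.

    item_correct: 1 or 0 per student for a single item.
    total_scores: each student's total test score, in the same order.
    fraction: share of the class placed in each group (0.27 is an assumed convention).
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Rank students from the highest to the lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:group_size], order[-group_size:]
    p_upper = sum(item_correct[i] for i in upper) / group_size
    p_lower = sum(item_correct[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical class of ten students.
totals = [95, 88, 80, 75, 70, 60, 55, 50, 40, 30]
good_item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]  # answered correctly mostly by high scorers
easy_item = [1] * 10                        # answered correctly by everyone

print(discrimination_index(good_item, totals))
print(discrimination_index(easy_item, totals))
```

An index near 1 suggests the item separates the two groups well, while an index near 0 (as for the item everyone answers correctly) suggests it does not discriminate.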
Types of tests
1. Norm referenced tests (NRT): This is a test designed to provide a measure of
performance that is interpretable in terms of an individual’s relative standing in some
known group. This enables a teacher to determine how an individual’s performance
compares with that of the others in the local or national group depending on how the
results are to be used. For example, using a national norm, a student’s performance in
mathematics can be described as equalling or exceeding that of 70% of students
nationally.
2. Criterion referenced tests (CRT): This is a test designed to provide a measure of
performance that is interpretable in terms of clearly defined and limited domain of
learning tasks. This enables a teacher to describe what an individual can do without
reference to other students’ performance. For example, CRT can:
a. Indicate the percentage of tasks a student performs correctly e.g. spells 60% of the
words correctly.
b. Describe the specific learning tasks a student is able to perform e.g. count from one
to a hundred
c. Compare the test performance to a set standard and make a mastery/non-mastery
decision e.g. measuring the identification of the main tasks in a paragraph
3. Objective referenced test (ORT): This is a test designed to provide a measure of
performance that is interpretable in terms of specific instructional objectives.
4. Paper and pencil tests: These are tests that require the examinee to record responses
to questions on paper. They are used to assess students’ cognitive abilities and, to a
lesser extent, to measure practical skills, for example in science practicals that require
students to carry out certain procedures and record their observations.
Advantages of paper and pencil tests
a. They are economical in terms of time and money
b. It is easy to test a large number of students at the same time
c. It allows one to test students under uniform conditions
d. The tests may contain a series of questions that cover the whole syllabus
Disadvantages of paper and pencil tests
a. They attach undue importance to a very small sample of student behaviour
b. Students’ results may be unnecessarily influenced by extraneous factors
c. They may have adverse side effects on instructional programs
5. Performance tests: These are tests concerned with assessing students’ ability in using
various skills and procedures in various academic courses. For example, map reading
in geography, fine arts, music and home economics. A performance test is
individualized such that a student is given a task to perform and, while performing the
task, the student is observed and judged. A performance test is essential when
the process itself, and not only the end product, must be assessed. For example, in home
economics, a teacher may need to judge whether a student used a safe, hygienic and
correct method in preparing a meal.
Advantages of performance tests
a. Provides an opportunity to test in a realistic setting
b. Allows the examiner to observe and check individual performance
c. Allows for testing of students’ ability to bring together a number of different
skills
d. Assesses the students’ proficiency in performing an activity
Disadvantages of performance tests
a. Only a small proportion of the syllabus can be tested
b. It may impose some pressure on laboratory and workshop equipment
c. Where equipment is not enough, it may result in non-standardized examination
conditions thus making comparison of results invalid
d. Where the examiner is required to observe the actual process or working,
additional examiners may be required making it costly.
e. It has limited feasibility for large groups
6. Oral tests: These are tests that involve the word of mouth only. They are suitable when
assessing students’ ability in communication especially in English and other foreign
languages, when assessing a graduate student’s defense of thesis and when the
examinee is visually handicapped. Oral tests require face to face situation between the
examiner and the examinee.
Advantages of oral tests
a. They provide direct contact with the candidate
b. They provide an opportunity to assess strong and weak areas of each candidate
c. They provide an examiner an opportunity to question the candidate about how
he/she arrived at the answer.
d. More than one examiner can examine the candidate simultaneously
Disadvantages of oral tests
a. They lack standardization. This means that the results of the test cannot be compared
across candidates
b. Test results tend to be subjective. This means that the test results depend on the
opinion of the examiner
c. They require a large number of examiners
d. They usually depend on the experience of the examiners and their ability to
retain in their minds an accurate impression of the standard required
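The difference between norm referenced and criterion referenced interpretation (types 1 and 2 above) can be sketched in Python. The function names, the norm group scores and the 80% mastery cut-off are assumptions chosen for illustration; the notes do not prescribe a particular cut-off.

```python
def percent_equalled_or_exceeded(score, norm_group_scores):
    """Norm referenced view: percentage of the norm group scoring at or below this score."""
    at_or_below = sum(1 for s in norm_group_scores if s <= score)
    return 100 * at_or_below / len(norm_group_scores)


def mastery_decision(items_correct, items_total, cutoff_percent=80):
    """Criterion referenced view: percent correct compared with a set standard."""
    percent_correct = 100 * items_correct / items_total
    return percent_correct, percent_correct >= cutoff_percent


# Hypothetical norm group of ten students.
norm_group = [35, 42, 48, 50, 55, 58, 61, 64, 70, 82]
standing = percent_equalled_or_exceeded(61, norm_group)

# A student spells 12 of 20 words correctly, with no reference to other students.
percent, mastered = mastery_decision(12, 20)

print(standing, percent, mastered)
```

The first result describes the student only in relation to the group, while the second describes what the student can do against a fixed standard.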
Test item formats
i. Multiple choice items: These are items that require the candidates to select one or more
responses from a set of options. The correct alternative in each item is called the answer or
the key and the remaining alternatives are called distracters.
ii. True/false items: These are items that require the candidate to select one of the two
choices given as possible responses to a question
where the choice is either between true and false, yes and no, or right and wrong.
iii. Matching items: These are test items which consist of a list of premises, a list of
responses and directions for matching the two. Candidates must match each premise
with one of the responses on the basis of the criteria described in the directions.
iv. Essay items: These are test items that require the candidate to supply answers in the form
of a paragraph or any other written analysis or composition.
v. Short answer items: These are test items which require the candidate to supply a word,
number or phrase that answers the question or completes a sentence.
Test construction, planning and administration
Planning a test
The function of test planning is to outline the task of preparing a test as clearly as possible.
Through test planning, the qualities and characteristics of a test are predetermined prior to test
construction. The test specification should be so complete and clear that two or more test
constructors operating independently under these specifications would produce comparable
and interchangeable tests differing only in the sampling of the questions.
Test planning involves decision making guided by a series of specific questions:
a. What is the general purpose and requirement of the test?
b. What content should the test cover?
c. Which items should be used in the test?
d. How many questions or items should be included?
e. How much time will the examinees need to answer the test?
f. How will test items be written and administered?
1. Defining the purpose: The test constructor should state the specific areas of achievement
the test is expected to include. The concern is not with the syllabus but the course objectives
stated in behavioural terms. The following factors should be considered while stating the
purpose:
i. The candidate to be tested
ii. The purpose to be served by test results or scores for example, for selection, guidance,
placement, diagnostic purpose etc.
iii. The analysis of objectives of instructions in order to determine what activities and skills
should be appraised in the test.
The objectives are drawn from the cognitive domain of Bloom’s taxonomy, whose levels are
knowledge, comprehension, application, analysis, synthesis and evaluation.
i. Knowledge requires an individual to recall or recognize an appropriate content of
material whether it be specific facts, universal principles, generalizations, methods,
processes, patterns, structures etc.
ii. Comprehension requires an individual to be able to paraphrase knowledge accurately,
to explain or summarize it in his or her own words or to show logical extensions in
terms of implications.
iii. Application is the ability to select a given abstraction (generalization or principle)
appropriate for a new situation and correctly use it. The candidate is able to use
generalizations and principles on new problems and situations. The testing situation
must be new, unfamiliar and in some way different from those used in the instruction.
iv. Analysis is the ability to break a communication or concept into its constituent
elements, to show the hierarchy or other internal relations of ideas and to show the basis
for its organization or to indicate how it conveys its effects.
v. Synthesis involves putting together parts and elements to form a new whole unit. The
candidate is expected to produce something that is new and different when provided
with a problem or a task, a set of specifications or a collection of material.
vi. Evaluation is the quantitative and qualitative judgement about the extent to which
materials and methods satisfy the criteria determined by the teacher or student.
In test planning, the list of specific objectives on which the test will be based should be made.
2. Defining the content: The objectives appraised by the test should be related to content
which is the means through which the objectives are taught, learnt and demonstrated. The
content dimensions of the test outline should consist of a detailed analysis of the curriculum
areas that are to be considered in the test. The constructor should then relate each objective to
the specific content area.
3. Defining the format of the test items: Tests may be oral or written. If written, they may be
essay or objective type. The constructor must clearly state in advance whether essay or
objective test items will be used. In general, higher cognitive objectives (application, synthesis
and evaluation) are tested with essay items. If knowledge and comprehension objectives are
being tested, then objective type questions may be used.
4. Defining the number of test items: The number of questions to be included should be stated
in the planning stage. The number of items will depend on the specified duration, the type of
test items used and the complexity of thought processes involved in answering the test items.
5. Specification of time limits: Time required by the candidates to answer the test items should
be specified. The required time is a function of the mental processes involved and the kind of
test format used.
6. Writing the test items: The test specification should indicate the qualifications of the writers
of the test items. Good test item writers should have a thorough grasp of the subject matter
dealt with in the test. A single writer may concentrate on a particular area of content or
objective.
Test Construction
Test construction involves the preparation of the test items for the purposes of determining the
achievement of specific objectives. Tests are constructed to determine how much a learner
knows or has learned.
Factors Considered During Test Construction
i. The objectives of the syllabus
ii. The academic level of the learners
iii. Specific topics or content to be covered by the test
iv. Table of specification
Table of Specification
A table of specification is a table that aligns the topics that will be on a test with the number
of test items each topic will contribute, based on the cognitive levels at which the specific
objectives should be achieved.
A table of specification shows the number of questions or test items for each topic being
tested against the cognitive levels of a given test.
Format of the table of specification
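Topics versus cognitive skills | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation | Total
-------------------------------+-----------+---------------+-------------+----------+-----------+------------+------
Topic 1                        |           |               |             |          |           |            |
Topic 2                        |           |               |             |          |           |            |
Topic 3                        |           |               |             |          |           |            |
Topic 4                        |           |               |             |          |           |            |
Topic 5                        |           |               |             |          |           |            |
Total                          |           |               |             |          |           |            |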
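As an illustration of how the table is filled in, the Python sketch below adds row and column totals to a plan of item counts. The topics, the item counts and the function names `with_totals` and `render` are hypothetical and serve only to show the layout.

```python
LEVELS = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]

# Hypothetical number of items per topic at each cognitive level (same order as LEVELS).
plan = {
    "Topic 1": [2, 2, 1, 1, 0, 0],
    "Topic 2": [1, 2, 2, 1, 1, 1],
    "Topic 3": [2, 1, 1, 0, 1, 0],
}


def with_totals(plan):
    """Append a total to every row and add a final row of column totals."""
    rows = {topic: counts + [sum(counts)] for topic, counts in plan.items()}
    rows["Total"] = [sum(column) for column in zip(*rows.values())]
    return rows


def render(rows):
    """Lay the table out as aligned plain text."""
    header = ["Topic"] + LEVELS + ["Total"]
    lines = ["  ".join(f"{cell:<13}" for cell in header)]
    for topic, counts in rows.items():
        lines.append("  ".join(f"{cell:<13}" for cell in [topic] + counts))
    return "\n".join(lines)


table = with_totals(plan)
print(render(table))
```

The bottom right cell gives the total number of items on the test, and the column totals show how the items are balanced across the cognitive levels.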
Importance of the table of Specification
i. It helps the teacher to set a balanced test or examination in relation to types of test items
and cognitive levels in the Bloom’s Taxonomy
ii. It helps the teacher to determine the volume of content to be included in a test or
examination
iii. It ensures that the syllabus is adequately covered. In any test or examination, 75% of
the syllabus should be covered
iv. It ensures that specific objectives of the syllabus are addressed in a given test or
examination
v. It ensures a sense of balance in the cognitive abilities being tested
vi. It ensures the validity of a given test or examination
Test administration
i. The students should have adequate working space and sit in a manner that will avoid
distraction and cheating. A label should be placed on the door stating “testing in
progress, do not disturb”
ii. Ensure that there are sufficient resources needed for the test e.g. test papers, pens, answer
sheets etc.
iii. Clear the room of any relevant materials like charts, pictures, objects, maps etc which
may give some learners an added advantage in the test.
iv. Have a clock in a central place in the room for all to see and write the start and finish
times for the test on the chalkboard.
v. Distribute the papers in such a way that the learners do not have access to the test items
until the distribution is complete, at which point the learners are allowed to start writing.
vi. After candidates have completed the test, count the answer sheets and record any
incidents that might invalidate students’ scores.
Quantitative methods applied in selection of test items
i. Simple Random Method: This is a method that provides each question in the pool of
questions an equal chance of being selected for the test. It may involve giving a number
to every question in the pool of questions, placing the numbers in a container and then
selecting any number at random. The questions whose numbers are selected form the
test.
ii. Systematic Random Method: This is a method in which every Kth question from a pool
of questions is selected for the test. To obtain a truly random sample, the list of all
questions in the pool must first be randomized. The selection
interval (K) is obtained by dividing the number of questions in the pool of questions by the
number of questions required for the test.
iii. Stratified Random Method: This is a method in which the pool of questions is divided
or stratified into two or more groups or strata using a given criterion (e.g. topics) and then a
given number of questions is randomly selected from each group or stratum.
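The three quantitative methods above can be sketched with Python's standard `random` module. The question pool, the function names and the fixed seed are assumptions for illustration; the systematic version shuffles the pool first and then takes every Kth question from a random starting point.

```python
import random


def simple_random(pool, k, rng):
    """Every question has an equal chance of being selected."""
    return rng.sample(pool, k)


def systematic_random(pool, k, rng):
    """Randomize the pool, then take every Kth question, where K = pool size // k."""
    shuffled = pool[:]
    rng.shuffle(shuffled)
    interval = len(shuffled) // k
    start = rng.randrange(interval)  # random starting point within the first interval
    return [shuffled[start + i * interval] for i in range(k)]


def stratified_random(pool_by_topic, per_topic, rng):
    """Randomly select the same number of questions from each stratum (topic)."""
    selected = []
    for questions in pool_by_topic.values():
        selected.extend(rng.sample(questions, per_topic))
    return selected


rng = random.Random(311)                     # fixed seed so the sketch is repeatable
pool = [f"Q{i}" for i in range(1, 41)]       # hypothetical pool of 40 questions
pool_by_topic = {"Topic 1": pool[:20], "Topic 2": pool[20:]}

simple_pick = simple_random(pool, 10, rng)
systematic_pick = systematic_random(pool, 10, rng)
stratified_pick = stratified_random(pool_by_topic, 5, rng)
print(simple_pick, systematic_pick, stratified_pick, sep="\n")
```

With 40 questions in the pool and 10 required for the test, the selection interval K is 4.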
Qualitative methods applied in selection of test items
i. Purposive Method: This is a method in which only those questions that test the
required knowledge and skills are selected for the test.
ii. Quota Method: The objective of quota method is to include various groups or quotas
of the questions from the pool of questions. Therefore, only those questions that fit the
identified groups or quotas are selected for the test.
iii. Convenient Method: This method involves selecting the required questions as they
become available until the required number of questions for the test is obtained.
Test validation
Validity reflects how well a test measures what it is intended to measure. A test is said to be
valid if its items measure what they are intended to measure meaning that the test should cover
the intended objectives, content and learning experiences. Therefore, validity of a test is
defined as the degree to which it measures what it is intended to measure.
Types of Validity
i. Construct validity: This is a measure of the degree to which results obtained from a
test meaningfully and accurately reflect or represent a theoretical concept. For
example, would a score of 90% on a reading test accurately reflect the true reading
ability of a learner?
ii. Content Validity: This is a measure of the degree to which data collected using a
particular test represents a specific domain of indicators or content of a particular
concept. For example, a test of arithmetic for standard four learners would not yield
content valid data if items do not include all the four operations that is addition,
subtraction, multiplication and division.
iii. Criterion-related validity: This refers to the use of a test in assessing subjects’
behaviour or performance in specific situations. For example, if a test purports to
measure performance in a job, the subjects who score high on the test must also perform
well on the job. There are two types of criterion-related validity. Predictive validity
refers to the degree to which the obtained data predict the future behaviour of the
subjects. Concurrent validity refers to the degree to which the obtained data reflect
the behaviour of the subjects in the present and not in the future. For example, a
psychiatrist might use a measure to establish whether a patient is schizophrenic, in
which case the patient’s scores on the psychiatric test would correlate highly with the
patient’s present behaviour if the test does indeed yield data that accurately represent
this type of mental illness.
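Criterion-related validity is commonly expressed as a correlation between test scores and the
criterion measure. The sketch below is an illustration only: it assumes the Pearson correlation
coefficient is used, and the scores, ratings and the function name pearson_r are hypothetical and
do not come from these notes.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: selection-test scores and later job-performance ratings
test_scores = [45, 52, 60, 68, 75, 83]
job_ratings = [2.1, 2.8, 3.0, 3.6, 4.1, 4.5]

validity_coefficient = pearson_r(test_scores, job_ratings)
print(round(validity_coefficient, 2))
```

A coefficient close to 1 would suggest that subjects who score high on the test also perform well
on the criterion. If the ratings are collected later, the coefficient speaks to predictive
validity; if they are collected at about the same time as the test, to concurrent validity.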
Interpreting test results
An understanding of descriptive statistics provides the foundation of interpreting test results.
Question
a. A student reported to his parents that he had obtained a score of 50% in a mathematics
test. List three possible misconceptions that the parents may have about the
performance of the student.
i. The 50% is the average and therefore the student is of average ability
ii. The 50% is a pass mark and therefore the student has passed
iii. The student occupies the middle position and may have outperformed about half of
his classmates
b. What information would the parents require in order to be able to interpret the
student’s test score of 50% correctly?
i. The mean
ii. The range
iii. The median
iv. The mode
c. What accurate inferences would the parents make using the information stated in [b]
above?
i. The mean is needed to infer whether the student was above or below the average
ii. The range is required to judge how far below or above the average the student was
relative to the spread between the lowest and highest scores
iii. The median is necessary to determine where the student stood in relation to the middle
score
iv. The mode is needed to determine whether the student was among those who obtained
the most common score
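The statistics named in the answers above can be computed directly from the class scores. The
sketch below is an illustration only: the class scores and the function name describe are
hypothetical, and it uses Python's standard statistics module.

```python
from statistics import mean, median, multimode


def describe(scores):
    """Return the descriptive statistics needed to interpret a single score."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "modes": multimode(scores),  # a list, because scores can have several modes
        "range": max(scores) - min(scores),
        "lowest": min(scores),
        "highest": max(scores),
    }


# Hypothetical percentage scores for a class of eight students
class_scores = [20, 25, 30, 30, 35, 40, 50, 55]
summary = describe(class_scores)

student_score = 50
print(summary)
print("Above the mean:", student_score > summary["mean"])
print("Above the median:", student_score > summary["median"])
print("Obtained the most common score:", student_score in summary["modes"])
```

With these hypothetical scores, a mark of 50% lies above both the mean and the median, so the
student is not merely "average" and is well above the middle position, even though 50% may look
like a bare pass when reported on its own.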
Types of scores
i. Raw score: A student’s raw score on a test is the number of items the student answered
correctly. For example, on a 50-item test, a student who correctly answers 37 items
would have a raw score of 37.
ii. Percent correct score: This score is attained by dividing the raw score by the number
of items on the test and multiplying by 100. This allows for comparison of performance
among different tests.
iii. Percentile rank score: This refers to the learner’s performance relative to all the other
test takers. When a test result indicates that a student performed at the 60th percentile, it
means that the student performed better than 60% of the students in the norm sample.
Percentile rank scores are easy to understand and interpret.
iv. Stanines (standard nine): Scores that are reported as stanines indicate where a student’s
performance falls on a scale of 1 to 9. They are a form of standard scores in which 4, 5
and 6 are average, 2 and 3 are below average, 1 is well below average, 7 and 8 are
above average and 9 is well above average.
v. Grade equivalent score: A grade equivalent score communicates a level of
performance relative to test takers in the same grade. A grade equivalent score is
written as two numbers separated by a decimal point. The first number represents a grade
in school and the second represents a tenth of a school year, or about one month. For
example, a grade equivalent of 5.2 represents the expected performance of a student in
the second month of the fifth grade. A fourth grade student who receives a grade
equivalent of 7.4 on a mathematics test designed for fourth grade students has obtained
a score equivalent to that of a seventh grade student in the fourth month on the same test.
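The score types above can be illustrated with a short sketch. It is an illustration only: the
function names and the norm sample are hypothetical, the percentile rank is computed as the
percentage of norm-sample scores strictly below the student's score (following the "performed
better than" definition in iii), and the stanine bands follow the descriptions given in iv.

```python
def percent_correct(raw_score, n_items):
    """Percent correct score: raw score divided by number of items, times 100."""
    return raw_score * 100 / n_items


def percentile_rank(score, norm_scores):
    """Percentage of scores in the norm sample that fall strictly below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return below * 100 / len(norm_scores)


def stanine_band(stanine):
    """Verbal description of a stanine, using the bands described in the notes."""
    if not 1 <= stanine <= 9:
        raise ValueError("a stanine must be between 1 and 9")
    if stanine == 1:
        return "well below average"
    if stanine <= 3:
        return "below average"
    if stanine <= 6:
        return "average"
    if stanine <= 8:
        return "above average"
    return "well above average"


def split_grade_equivalent(grade_equivalent):
    """Split a grade equivalent written as text, e.g. '7.4', into (grade, month)."""
    grade, month = grade_equivalent.split(".")
    return int(grade), int(month)


# Raw score of 37 on a 50-item test, as in the raw score example
print(percent_correct(37, 50))

# Hypothetical norm sample of ten scores
norm_sample = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_rank(70, norm_sample))

print(stanine_band(5))
print(split_grade_equivalent("7.4"))
```

The grade equivalent is handled as text rather than as a decimal number because the digit after
the point denotes a month of the school year, not a fraction of a grade in the arithmetic sense.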
Reporting test results
i. The test papers should be returned to the students as soon as possible because most
students are eager to know how they performed. Delay may kill the motivation for the
ongoing lesson.
ii. To avoid embarrassment, hand the test paper directly to the owner.
iii. Do not deal with complaints in class; genuine complaints can be considered after class.
iv. When making comments about test results, comment on improvement and avoid
discouraging remarks such as "you are lazy" or "this is a poor grade".
v. Test scores should not be reported in isolation. The mean, mode, range, median or
standard deviation should be included so that each score is reported in relation
to other test takers.