HMEF5053
MEASUREMENT AND EVALUATION IN EDUCATION

Prof Dr John Arul Phillip
Yap Yee Khiong
Project Directors:
Prof Dr Widad Othman
Dr Aliza Ali
Open University Malaysia
Module Writers:
Prof Dr John Arul Phillip
Yap Yee Khiong
Moderator:
Prof Dr Kuldip Kaur
Open University Malaysia
Enhancer:
Assoc Prof Dr Chung Han Tek
Developed by:
Centre for Instructional Design and Technology
Open University Malaysia
First Edition, May 2006
Fourth Edition, August 2016 (rs)
Sixth Edition, December 2019 (MREP)
Copyright © Open University Malaysia (OUM), December 2019, HMEF5053
All rights reserved. No part of this work may be reproduced in any form or by any means without
the written permission of the President, Open University Malaysia (OUM).
Table of Contents

Course Guide

Topic 1   The Role of Assessment in Teaching and Learning
          1.1   Test, Measurement and Assessment
                1.1.1   Test
                1.1.2   Measurement
                1.1.3   Assessment or Evaluation
          1.2   The Why, What and How of Educational Assessment
          1.3   Purposes of Assessment
                1.3.1   To Help Learning
                1.3.2   To Improve Teaching
          1.4   General Principles of Assessment
          1.5   Types of Assessment
                1.5.1   Formative versus Summative Assessments
                1.5.2   Norm-referenced versus Criterion-referenced Tests
          1.6   Trends in Assessment
          Summary
          Key Terms
          References

Topic 2   Foundation of Assessment: What to Assess?
          2.1   Identifying What to Assess
                2.1.1   Three Types of Learning Outcomes
          2.2   Assessing Cognitive Learning Outcomes or Behaviour
                2.2.1   Bloom's Taxonomy
                2.2.2   The Helpful Hundred
          2.3   Assessing Affective Learning Outcomes or Behaviour
          2.4   Assessing Psychomotor Learning Outcomes or Behaviour
          2.5   Important Trends in What to Assess
          Summary
          Key Terms
          References

Topic 3   Planning Classroom Tests
          3.1   Purposes of Classroom Testing
          3.2   Planning a Classroom Test
                3.2.1   Deciding Its Purposes
                3.2.2   Specifying the Intended Learning Outcomes
                3.2.3   Selecting Best Item Types
                3.2.4   Developing a Table of Specifications
                3.2.5   Constructing Test Items
                3.2.6   Preparing Marking Schemes
          3.3   Assessing Teacher's Own Test
          Summary
          Key Terms

Topic 4   How to Assess? – Objective Tests
          4.1   What is an Objective Test?
          4.2   Multiple-choice Questions (MCQs)
                4.2.1   What is a Multiple-choice Question?
                4.2.2   Construction of Multiple-choice Questions
                4.2.3   Advantages of Multiple-choice Questions
                4.2.4   Limitations of Multiple-choice Questions
          4.3   True-False Questions
                4.3.1   What are True-False Questions?
                4.3.2   Advantages of True-False Questions
                4.3.3   Limitations of True-False Questions
                4.3.4   Suggestions for Constructing True-False Questions
          4.4   Matching Questions
                4.4.1   Construction of Matching Questions
                4.4.2   Advantages of Matching Questions
                4.4.3   Limitations of Matching Questions
                4.4.4   Suggestions for Constructing Good Matching Questions
          4.5   Short-answer Questions
                4.5.1   Strengths and Weaknesses of Short-answer Questions
                4.5.2   Guidelines on Constructing Short-answer Questions
          Summary
          Key Terms
          References

Topic 5   How to Assess? – Essay Tests
          5.1   What is an Essay Question?
          5.2   Formats of Essay Tests
          5.3   Advantages of Essay Questions
          5.4   Deciding Whether to Use Essay Questions or Objective Questions
          5.5   Limitations of Essay Questions
          5.6   Misconceptions About Essay Questions in Examinations
          5.7   Guidelines on Constructing Essay Questions
          5.8   Verbs Describing Various Kinds of Mental Tasks
          5.9   Marking an Essay
          5.10  Suggestions for Marking Essays
          Summary
          Key Terms
          References

Topic 6   Authentic Assessment
          6.1   What is Authentic Assessment in the Classroom?
          6.2   Alternative Names for Authentic Assessment
          6.3   How to Use Authentic Assessment?
          6.4   Advantages of Authentic Assessment
          6.5   Disadvantages of Authentic Assessment
          6.6   Characteristics of Authentic Assessment
          6.7   Differences between Authentic and Traditional Assessments
          Summary
          Key Terms
          References

Topic 7   Project and Portfolio Assessments
          7.1   Project Assessment
                7.1.1   What is Assessed Using Projects?
                7.1.2   Designing Effective Projects
                7.1.3   Possible Problems with Project Work
                7.1.4   Group Work in Projects
                7.1.5   Assessing Project Work
                7.1.6   Evaluating Process in a Project
                7.1.7   Self-assessment in Project Work
          7.2   What is a Portfolio?
                7.2.1   What is Portfolio Assessment?
                7.2.2   Types of Portfolios
                7.2.3   Developing a Portfolio
                7.2.4   Advantages of Portfolio Assessment
                7.2.5   Disadvantages of Portfolio Assessment
                7.2.6   How and When Should Portfolios be Assessed?
          Summary
          Key Terms
          References

Topic 8   Reliability and Validity of Assessment Techniques
          8.1   What is Reliability?
          8.2   The Reliability Coefficient
          8.3   Methods to Estimate the Reliability of a Test
          8.4   Inter-rater and Intra-rater Reliability
          8.5   Types of Validity
          8.6   Factors Affecting Reliability and Validity
          8.7   Relationship between Reliability and Validity
          Summary
          Key Terms
          References

Topic 9   Item Analysis
          9.1   What is Item Analysis?
          9.2   Steps in Item Analysis
          9.3   Difficulty Index
          9.4   Discrimination Index
          9.5   Application of Item Analysis on Essay-type Questions
          9.6   Relationship between Difficulty Index and Discrimination Index
          9.7   Distractor Analysis
          9.8   Practical Approach in Item Analysis
          9.9   Usefulness of Item Analysis to Teachers
          9.10  Caution in Interpreting Item Analysis Results
          9.11  Item Bank
          9.12  Psychometric Software
          Summary
          Key Terms
          References

Topic 10  Analysis of Test Scores
          10.1  Why Use Statistics?
          10.2  Describing Test Scores
                10.2.1  Central Tendency
                10.2.2  Dispersion
          10.3  Standard Scores
                10.3.1  Z-score
                10.3.2  Example of Using the Z-score to Make Decisions
                10.3.3  T-score
          10.4  The Normal Curve
          10.5  Norms
          Summary
          Key Terms

Course Assignment Guide
COURSE GUIDE
COURSE GUIDE DESCRIPTION
You must read this Course Guide carefully from the beginning to the end. It tells
you briefly what the course is about and how you can work your way through
the course material. It also suggests the amount of time you are likely to spend in
order to complete the course successfully. Please refer to the Course Guide from
time to time as you go through the course material as it will help you to clarify
important study components or points that you might miss or overlook.
INTRODUCTION
HMEF5053 Measurement and Evaluation in Education is one of the courses
offered at Open University Malaysia (OUM). This course is worth three credit
hours and should be covered over eight to 15 weeks.
COURSE AUDIENCE
This course is a core subject for learners undertaking the Master of Education
(MEd) programme. Its main aim is to provide you with a foundation in the
principles and theories of educational testing and assessment as well as their
applications in the classroom.
The course introduces the differences between testing, measurement and
assessment. The focus is on the role of assessment in teaching and learning,
followed by discussions on "what to assess" and "how to assess". Regarding
the "what", the emphasis is on the cognitive, affective and psychomotor learning
outcomes. Next, the "how" of assessment is discussed with emphasis on the
various assessment techniques that teachers can adopt. Besides the usual
traditional assessment techniques such as objective and essay tests, authentic
assessment techniques such as projects and portfolios are presented. There is also
a general discussion on how authentic assessment is similar to and different from
traditional assessment. Also discussed are techniques to determine the
effectiveness of various assessment approaches, focusing on reliability, validity
and item analysis. Finally, various statistical procedures are presented in the
analysis of assessment results and their interpretation.
As an open and distance learner, you should be acquainted with learning
independently and being able to optimise the learning modes and environment
available to you. Before you begin this course, please ensure that you have the
right course material, and understand the course requirements as well as how the
course is conducted.
STUDY SCHEDULE
It is a standard OUM practice that learners accumulate 40 study hours for every
credit hour. As such, for a three-credit hour course, you are expected to spend
120 study hours. Table 1 gives an estimation of how the 120 study hours could be
accumulated.
Table 1: Estimation of Time Accumulation of Study Hours

Study Activities                                                      Study Hours
Briefly go through the course content and participate in
initial discussions                                                             4
Study the module                                                               60
Attend five tutorial sessions                                                  15
Online participation                                                           11
Revision                                                                       15
Assignment(s), Test(s) and Examination(s)                                      15
TOTAL STUDY HOURS ACCUMULATED                                                 120
COURSE LEARNING OUTCOMES
By the end of this course, you should be able to:
1. Identify the different principles and theories of educational testing and
   assessment;

2. Compare the different principles, theories and procedures of educational
   testing and assessment;

3. Apply the different principles and theories in the development of
   assessment techniques for use in the classroom; and

4. Critically evaluate the principles and theories in educational testing and
   assessment.
COURSE SYNOPSIS
This course is divided into 10 topics. The synopsis for each topic is as follows:
Topic 1 discusses the differences between testing, measurement, evaluation and
assessment, the role of assessment in teaching and learning and some general
principles of assessment. Also explored are the differences between formative and
summative assessments as well as between criterion-referenced and norm-referenced
tests. The topic concludes with a brief discussion of the current trends
in assessment.
Topic 2 discusses the behaviours to be tested focussing on cognitive, affective
and psychomotor learning outcomes and reasons why assessments of the latter
two outcomes are ignored.
Topic 3 provides some useful guidelines to help teachers plan valid, reliable and
useful classroom tests. It discusses the steps involved in planning and designing
a test. These steps are deciding the purpose, specifying the intended learning
outcomes, selecting best item types, developing a table of specifications,
constructing test items and preparing marking schemes. The topic also includes a
subtopic on how teachers can assess their own tests.
Topic 4 discusses the design and development of objective tests in the assessment
of various kinds of behaviours with emphasis on the limitations and advantages
of using this type of assessment tool.
Topic 5 examines the role of essay tests in assessing various kinds of learning
outcomes, their limitations and strengths, and the procedures involved in
the design of good essay questions.
Topic 6 introduces a form of assessment in which learners are assigned to
perform real-world tasks that demonstrate meaningful application of essential
knowledge and skills. Teachers will be able to understand how authentic
assessment is similar to or different from traditional assessment. Emphasis is also
given to scoring rubrics.
Topic 7 discusses in detail two examples of authentic assessments, namely
portfolio and project assessments. Guidelines to portfolio entries and project
works and evaluation criteria are discussed in detail.
Topic 8 focuses on basic concepts of test reliability and validity. The topic also
includes methods to estimate the reliability of a test and factors to increase
reliability and validity of a test.
Topic 9 examines the concept of item analysis and the different procedures for
establishing the effectiveness of objective and essay-type tests focussing on item
difficulty and item discrimination. The topic concludes with a brief discussion on
the usefulness of item analysis and the cautions in interpreting the item analysis
results.
Topic 10 focuses on the analysis and interpretation of the data collected by tests.
For quantitative analysis of data, various statistical procedures are used. Some of
the statistical procedures used in the interpretation and analysis of assessment
results are measures of central tendency and correlation coefficients. There is also
a brief discussion on the use of standard scores.
TEXT ARRANGEMENT GUIDE
Before you go through this module, it is important that you note the text
arrangement. Understanding the text arrangement will help you to organise your
study of this course in a more objective and effective way. Generally, the text
arrangement for each topic is as follows:
Learning Outcomes: This section refers to what you should achieve after you
have completely covered a topic. As you go through each topic, you should
frequently refer to these learning outcomes. By doing this, you can continuously
gauge your understanding of the topic.
Self-Check: This component of the module is inserted at strategic locations
throughout the module. It may be inserted after one sub-section or a few sub-sections. It usually comes in the form of a question. When you come across this
component, try to reflect on what you have already learnt thus far. By attempting
to answer the question, you should be able to gauge how well you have
understood the sub-section(s). Most of the time, the answers to the questions can
be found directly from the module itself.
Activity: Like Self-Check, the Activity component is also placed at various
locations or junctures throughout the module. This component may require you to
solve questions, explore short case studies, or conduct an observation or research.
It may even require you to evaluate a given scenario. When you come across an
Activity, you should try to reflect on what you have gathered from the module and
apply it to real situations. You should, at the same time, engage yourself in higher
order thinking where you might be required to analyse, synthesise and evaluate
instead of only having to recall and define.
Summary: You will find this component at the end of each topic. This component
helps you to recap the whole topic. By going through the summary, you should
be able to gauge your knowledge retention level. Should you find points in the
summary that you do not fully understand, it would be a good idea for you to
revisit the details in the module.
Key Terms: This component can be found at the end of each topic. You should go
through this component to remind yourself of important terms or jargon used
throughout the module. Should you find terms here that you are not able to
explain, you should look for the terms in the module.
References: The References section is where a list of relevant and useful
textbooks, journals, articles, electronic contents or sources can be found. The list
can appear in a few locations such as in the Course Guide (at the References
section), at the end of every topic or at the back of the module. You are
encouraged to read or refer to the suggested sources to obtain the additional
information needed and to enhance your overall understanding of the course.
PRIOR KNOWLEDGE
Although this course assumes no previous knowledge of educational assessment
and measurement, you are encouraged to tap into your experiences as a teacher,
instructor, lecturer or trainer and relate them to the principles of assessment and
measurement discussed.
ASSESSMENT METHOD
Please refer to myINSPIRE.
TAN SRI DR ABDULLAH SANUSI (TSDAS) DIGITAL
LIBRARY
The TSDAS Digital Library has a wide range of print and online resources for the
use of its learners. This comprehensive digital library, which is accessible
through the OUM portal, provides access to more than 30 online databases
comprising e-journals, e-theses, e-books and more. Examples of databases
available are EBSCOhost, ProQuest, SpringerLink, Books247, InfoSci Books,
Emerald Management Plus and Ebrary Electronic Books. As an OUM learner,
you are encouraged to make full use of the resources available through this
library.
Topic  The Role of
1
Assessment in
Teaching and
Learning
LEARNING OUTCOMES

By the end of the topic, you should be able to:

1. Differentiate between test, measurement and assessment;

2. Explain the purposes of assessment;

3. Discuss the general principles of assessment; and

4. Identify four types of assessment and examine their differences.
INTRODUCTION
This topic discusses the differences between test, measurement and evaluation, the
purposes of assessment and some general principles of assessment. Also explored
are the differences between formative assessment and summative assessment as
well as the differences between criterion-referenced and norm-referenced tests.
1.1 TEST, MEASUREMENT AND ASSESSMENT
Many people are confused about the fundamental differences between test,
measurement and assessment as they are all used in education. Do you know what
they entail? Let us find out the answer in this subtopic.
1.1.1 Test
Most of us are familiar with tests because at some point in our lives, we are
required to sit for tests. In school, tests are given to measure our academic aptitude
and evaluate whether we have gained any understanding from our learning. In
the workplace, tests are conducted to select people for specific jobs, for promotion
and to encourage re-learning. Physicians, lawyers, insurance consultants, real-estate agents, engineers, civil servants and many other professionals are required
to take tests to demonstrate their competence in specific areas and in some cases
to be granted licence to practise their profession or trade.
Throughout their professional careers, teachers, counsellors and school
administrators are required to give, score and interpret a wide variety of tests. For
example, school administrators rate the performance of individual teachers and
school counsellors record the performance of their clients. It is possible that a
teacher may construct, administer and mark thousands of tests during his or her
career! According to the joint committee of the American Psychological
Association (APA), the American Educational Research Association (AERA) and
the National Council on Measurement in Education (NCME), a test may be
thought of as a set of tasks or questions intended to elicit particular types of
behaviour when presented under standardised conditions to yield a score that has
desirable psychometric properties. Psychometrics is concerned with the objective
measurement of skills and knowledge, abilities, attitudes, personality traits and
educational achievement. So, when a teacher assigns a set of questions to
determine students' achievement in Mathematics, he or she is conducting a
Mathematics test. While most people know what a test is, many have difficulty
differentiating it from measurement, evaluation and assessment. Some have even
argued that they are the same!
1.1.2 Measurement
Generally, measurement is the act of assigning numbers to a phenomenon. In
education, it is the process by which the attributes of a person are measured and
assigned numbers. Remember, it is a process which indicates that there are certain
steps involved. As educators, we frequently measure human attributes such as
attitudes, academic achievements, aptitudes, interests, personalities and so forth.
Hence, to measure these attributes, we have to use certain instruments so that we
can conclude that, for example, Ahmad is better in Mathematics than Kumar, while
Tajang is more inclined towards Science than Kong Beng. We measure to obtain
information about "what is". Such information may or may not be useful,
depending on the accuracy of the instruments we use and our skill at using them.
For example, we measure temperature using a thermometer and so the
thermometer is an instrument.
How do you measure performance in Mathematics? We use a Mathematics test
which is an instrument containing questions and problems to be solved by
students. The number of right responses obtained by a student is an indication of
his performance in Mathematics. Note that we are only collecting information. The
information collected is a numerical description of the degree to which an
individual possesses an attribute. Measurement answers the question "How much
does an individual possess a particular attribute?" Note that we are not assessing!
Assessment is therefore quite different from measurement.
1.1.3 Assessment or Evaluation
The literature has used the terms "assessment" and "evaluation" in education as
two different concepts, and yet the two terms are used interchangeably, i.e. they are
regarded as similar. For example, some authors use the term "formative
evaluation" while others use the term "formative assessment". We will use the two
terms interchangeably because there is too much overlap in the interpretations of
the two concepts. In this module, we will use the term "assessment". Generally,
assessment is viewed as the process of collecting information with the purpose of
making decisions about students.
We may collect information using various tests, observations of students and
interviews. Rowntree (1974) views assessment as a human encounter in which one
person interacts with another directly or indirectly with the purpose of obtaining
and interpreting information about the knowledge, understanding, abilities and
attitudes possessed by that person. For example, based on assessment information,
we can determine whether Chee Keong needs special classes to assist him in
developing reading skills or whether Khairul, who was identified as dyslexic,
needs special attention. The key words in the definition of assessment are collecting
data and making decisions. However, to make decisions, one has to evaluate,
which is the process of making judgements about a given situation.
When we evaluate, we are saying that something is good, appropriate, valid,
positive and so forth. To make an evaluation, we need information, and it is
obtained by measuring using a reliable instrument. For example, you measure the
temperature in the classroom and it is 30°C, which is simply information. Some
students may find the temperature in the room too warm for learning, while others
may say that it is ideal for learning. Educators are constantly evaluating students
and it is usually done in comparison with some standards. For example, if the
objective of the lesson was for students to apply Boyle's Law to the solution of a
problem and 80 per cent of learners were able to solve the problem, then the
teacher may conclude that his or her teaching of the principle was quite successful.
So, evaluation is the comparison of what is measured against some defined criteria
to determine whether the criteria have been achieved, and whether the result is
appropriate, good, reasonable, valid and so forth.
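
To make the measurement-versus-evaluation distinction concrete, here is a minimal Python sketch (not from the module; the class results and the 80 per cent criterion are invented for illustration):

# Measurement yields the raw result; evaluation compares it
# against a defined criterion. All figures below are illustrative.

results = [True, True, False, True, True, True, False, True, True, True]
solved_pct = 100 * sum(results) / len(results)  # measurement: 80.0

criterion = 80  # per cent of learners expected to solve the problem
verdict = "successful" if solved_pct >= criterion else "needs revision"
print(f"{solved_pct:.0f}% solved the problem; the teaching was {verdict}")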
The three terms "test", "measurement" and "assessment" are easily confused
because all may be involved in a single process. For example, to determine a
student's performance in Mathematics, a teacher may assign him or her a task or a
set of questions, which is a test, to obtain a numeric score, which is a measurement.
Based on the score, the teacher decides whether this particular student is good,
average or poor in Mathematics, which is an assessment.
SELF-CHECK 1.1
Explain the differences between test, measurement and assessment.
1.2 THE WHY, WHAT AND HOW OF EDUCATIONAL ASSESSMENT
The practice among many educators who want to assess students is to either use
ready-made tests or construct their own tests or a combination of both. In the
United States, it is common for teachers to use various standardised tests in
assessing their students. These tests have national norms, against which teachers
can compare the performance of their students. For example, a student scores 78
in a Mathematics test; the score when compared with the norm may put the
student in the 85th percentile. It means that 15 per cent of students tested earlier
scored higher than him or her. It also means that 85 per cent of students tested
earlier scored lower than him or her. Figure 1.1 shows the why, what and how of
assessment.
Figure 1.1: The why, what and how of assessment
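
The percentile interpretation above can be made concrete with a short calculation. The following Python sketch is only an illustration (the norm-group scores are invented); it computes a percentile rank as the percentage of the norm group scoring below a given score:

def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

# Hypothetical norm group of earlier test-takers.
norm_group = [52, 60, 64, 70, 71, 73, 75, 78, 81, 90]

print(percentile_rank(78, norm_group))  # 70.0, i.e. the 70th percentile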
The focus of this course will be on how teachers can build their own assessment
instruments, how to ensure their instruments are effective, how to interpret
assessment results and how to report these assessment results. In this topic, we
will discuss the role of assessment, that is, "Why do we assess?" In Topic 2, we
will focus on the foundation of assessment, "What to assess?", which
involves determining the behaviours of students to be assessed. Topic 3 will focus
on planning classroom tests. Topics 4 to 7 will address the issue "How to assess?"
In Topics 8 and 9, we will attempt to answer the question "How do we know our
assessment is effective?" Finally, in Topic 10, we will deal with the question "How
do we interpret assessment results?"
1.3 PURPOSES OF ASSESSMENT
Let us begin this subtopic by asking the question, "Why do we as educators assess
students?" Some of us may find the question rather strange. The following may be
a likely scenario:
Question: Why do you assess?
Answer:   Well, I assess to find out whether my students understand what has
          been taught.
Question: What do you mean by "understand"?
Answer:   Whether they can remember what I taught them and how to solve
          problems.
Question: What do you do with the test results?
Answer:   Well, I give students the right answers and point out their mistakes
          in answering the questions.
Educators may give the above reasons when asked about the purpose of
assessment. In the context of education, assessment is performed to gain an
understanding of an individual's strengths and weaknesses in order to make
appropriate educational decisions. The best educational decisions are based on
information, and better decisions are usually based on more information (Salvia &
Ysseldyke, 1995). Based on the reasons for assessment provided by Harlen (1978)
and Deale (1975), two main reasons or purposes of assessment may be identified
(refer to Figure 1.2).
Figure 1.2: Purposes of assessment
These two purposes are further explained in the next subtopics.
1.3.1 To Help Learning
With regard to learning, assessment is aimed at providing information that
will help make decisions concerning remediation or enrichment, placement,
exceptionality and certification. It also aims at providing information to parents so
that they are kept informed of their children's learning progress in school.
Likewise, the school administration collects assessment information to determine how
the school is performing and for student counselling purposes (refer to Table 1.2).
Table 1.2: Why We Assess: To Help Learning

Diagnosis for remedial action:  Should the student be sent for remedial classes
                                so that difficulty in learning can be overcome?
Diagnosis for enrichment:       Should the student be provided with enrichment
                                activities?
Exceptionality:                 Does the student have special learning needs that
                                require special education assistance?
Placement:                      Should the student be streamed to X or Y?
Progress:                       To what extent is the student making progress
                                towards specific instructional goals?
Communication to parents:       How is the child doing in school and how can
                                parents help?
Certification:                  What are the strengths and weaknesses in the
                                overall performance of a student in the specific
                                areas assessed?
Administration and counselling: How is the school performing in comparison with
                                other schools? Why should students be referred
                                for counselling?
(a) Diagnosis
Diagnostic assessment is performed at the beginning of a lesson or unit for a
particular subject area to assess students' readiness and background for what
is about to be taught. This pre-instructional assessment is done when you
decide that you need information on a student, group of students or a whole
class before you can proceed with the most effective form of instruction. For
example, you can administer a Reading Test to Year One students to assess
their reading level. Based on the information, you may want to assign
weak readers to a special intervention or remedial action programme.
Alternatively, the tests may reveal that some students are reading at an
extremely high level and you may want to recommend that they be assigned
to an enrichment programme.
(b) Exceptionality
Assessment is also conducted to make decisions on exceptionality. Based on
the information obtained from assessment, teachers may make decisions as
to whether a particular student needs to be assigned to a class with
exceptional students. Exceptional students are those who are physically,
mentally, emotionally or behaviourally different from the normal
population. For example, based on assessment information, a child may be
discovered to be dyslexic and may be assigned to a special treatment
programme. In another example, a student who has been diagnosed as
having learning disability may be assigned to a special education
programme.
(c) Certification
Certification is perhaps the most important reason for assessment. For
example, Sijil Pelajaran Malaysia (SPM) is an examination aimed at
providing students with certificates. The marks obtained are converted into
letter grades signifying performance in different subject areas and used as a
basis for comparison between students. The certificate obtained is further
used in selecting students for further studies, scholarships or jobs.
(d) Placement
Besides certification, assessment is conducted for the purpose of placement.
Students are endowed with varying abilities and one of the tasks of the
school is to place them in classes according to their aptitude and interest. For
example, performance in the SPM is used as the basis for placing students in
the arts or science stream in Form Six. Assessment is also used to stream
students according to their academic performance. It has been the tradition
that the A and B classes will consist of high achievers from the end-of-semester
examinations or end-of-year examinations. Placement tests have
even been used in preschools to stream children according to their literacy
levels! The practice of placing students according to academic achievement
has been debated for decades with some educationists arguing against it and
others supporting it.
(e) Communication to Parents
Families want to know how their children are doing in school and family
members appreciate specific indicators of students' progress. Showing
examples of a child's work over time enables parents to personally assess the
growth and progress of their child. It is essential to tell the whole story when
reporting information about performance progress. Talking with families
about standards, sharing student work samples, using rubrics in conferences
and differentiating between performance and progress are some ways to
ensure that families are given an accurate picture of student learning.
(f) School Administration and Counselling
A school collects assessment information in order to determine how the
school is performing in relation to other schools for a particular semester or
year. Assessment results are also used to compare performance over the
years for the same school. Based on the results, school administrators may
institute measures to remedy weaknesses such as channelling more
resources into teaching students who are performing poorly in their studies.
This kind of measure is pertinent, in view of the increasing number of
students who are unable to read and write at a satisfactory level.
Assessment results (especially relating to socio-emotional development)
may be used by school administrators and counsellors in planning
intervention strategies for at-risk students. Assessment by counsellors will
enable them to identify students presenting certain socio-emotional
problems that require counselling services or referral to specialists such as
psychiatrists, legal counsellors and law enforcement authorities.
1.3.2 To Improve Teaching
With regard to teaching, assessment provides information regarding achievement
of intended learning outcomes, effectiveness of teaching methods and learning
materials.
If 70 per cent of your students fail in a test, do you investigate whether your
teaching and learning strategy is appropriate or do you attribute it to your students
being academically weak or not having revised their work? Most teachers would
attribute the poor performance to the latter. This is not a fair judgement about your
students' performance and abilities. The problem might lie with the teachers.
Assessment information is valuable in indicating which of the learning outcomes
have been successfully achieved and which concepts students have the most
difficulty with and require special attention. Assessment results are also valuable
in providing clues to the effectiveness of the teaching strategy implemented and
teaching materials used. Besides, assessment information might indicate whether
students have the required prior knowledge to grasp the concepts and principles
discussed. All this assessment information will indicate to the teachers what they
should do to improve their teaching. They should reflect on the information and
examine their approaches, methods and techniques of teaching. Finally,
assessment data may also provide insight into why some teachers are more
successful in teaching a particular group of students as compared to others (refer
to Table 1.3).
Table 1.3: Why We Assess: To Improve Teaching

Objectives:          Were the desired learning outcomes achieved?
Teaching method:     Were the teaching methods effective?
Prior knowledge:     Did students have the relevant prior knowledge?
Teaching materials:  Were the teaching materials effective?
Teacher differences: Were particular teachers more effective than others?
ACTIVITY 1.1

In the myINSPIRE online forum, discuss the following:

(a) "Streaming according to academic abilities should be discouraged in
    schools."

(b) To what extent have you used assessment data to review your teaching
    and learning strategies?
1.4 GENERAL PRINCIPLES OF ASSESSMENT
There are literally hundreds of guiding principles of assessment generated by
various sources such as educational institutions and individual scholars. The
following are a few principles, which can be applied in every level of assessment:
(a) What is to be assessed has to be clearly specified. The specification of the
characteristics to be measured should precede the selection or development
of assessment procedures. In other words, in assessing student learning, the
intended learning outcomes should be clearly specified. Only with a clear
specification of the intended learning outcomes to be measured would
appropriate assessment procedures or methods be selected. The following is
an example of an intended learning outcome for an assessment course:
By the end of the lesson, the student will be able to write effective learning
outcomes that include lower-order and higher-order cognitive skills for a
one-semester course.
A clear statement of a learning outcome normally consists of three
components: a verb, a condition and a standard. The verb describes what the
student will be doing (the behaviour), the condition refers to the context
under which the behaviour is to occur and the standard indicates the criteria
for an acceptable level of performance. The three components of the intended
learning outcome are as follows (refer to Table 1.4):
Table 1.4: Components in a Learning Outcome

Verb:      Write
Condition: effective learning outcomes that include lower-order and
           higher-order cognitive skills
Standard:  for a one-semester course
The verb used in the statement of a learning outcome should be specific,
measurable, achievable and realistic. Avoid words such as "understand",
"appreciate", "know" and "learn".
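
As a rough illustration of this rule, the following hypothetical Python sketch (not part of the module; the verb list is illustrative, not an official taxonomy) flags vague verbs in draft learning outcomes:

# Flags draft learning outcomes whose leading verb is hard to measure.
# The vague-verb list is illustrative only.
VAGUE_VERBS = {"understand", "appreciate", "know", "learn"}

def check_outcome(outcome):
    # Assumes outcomes are phrased as "<verb> <rest>", e.g. "Write ...".
    verb = outcome.split()[0].lower()
    if verb in VAGUE_VERBS:
        return f"'{verb}' is vague; use a specific, measurable verb"
    return "OK"

print(check_outcome("Understand constructive alignment"))  # flagged
print(check_outcome("Write effective learning outcomes"))  # OK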
Note that it is not mandatory for every learning outcome to have all three
components of verb, condition and standard. However, it must at least have
the verb and the condition, while the standard may be optional, as
exemplified as follows:

By the end of the lesson, the students will be able to write effective learning
outcomes that include lower-order and higher-order cognitive skills.
(b) An assessment procedure or method should be selected based on its
relevance to the characteristics or performance to be measured. When
selecting an assessment procedure to measure a specific learning outcome,
teachers should always ask themselves whether the procedure is the most
effective method for measuring the learning or development to be assessed.
There must be a close match between the intended learning outcomes and
the types of assessment tasks to be used. For example, if the development of
the ability to organise ideas is being measured, the use of a multiple-choice test
would be a poor choice. Instead, the appropriate assessment method to be
used should be essay questions.
(c) Different assessment procedures are required to provide a complete picture
of student achievement and development. This is because no single
assessment procedure can assess all the different learning outcomes in a
school curriculum. Different assessment procedures are different in their
usefulness. For example, multiple-choice questions are useful for measuring
knowledge, understanding and application of outcomes, while essay tests
are appropriate for measuring the ability to organise and express ideas.
Projects that require conducting library research are needed to measure
certain skills in formulating and solving problems. Observational techniques
are needed to assess performance skills and various aspects of student
behaviour.
(d) The assessment must be aligned to instruction. What is assessed in the
classroom must be consistent with what has been taught and vice versa. For
example, it would not be fair to assess students' higher-order thinking skills
when what is taught covers only lower-order thinking skills. Of course, what
is taught in class must be in line with what has been planned as indicated by
the learning outcomes for the course. According to Biggs and Tang (2007),
the relationship among assessment, instruction and learning outcomes is
referred to as constructive alignment (refer to Figure 1.3).
Figure 1.3: Constructive alignment
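
As a rough sketch of what constructive alignment demands, the hypothetical Python below (invented course data, not a procedure from Biggs and Tang) checks that every intended learning outcome is both taught and assessed:

# Every intended learning outcome should map to at least one teaching
# activity and one assessment task. The course data is invented.
course = {
    "outcomes": ["write learning outcomes", "apply item analysis"],
    "instruction": {
        "write learning outcomes": ["lecture on outcome writing"],
        "apply item analysis": [],  # taught nowhere -> misaligned
    },
    "assessment": {
        "write learning outcomes": ["short-answer question 3"],
        "apply item analysis": ["assignment 2"],
    },
}

for outcome in course["outcomes"]:
    taught = bool(course["instruction"].get(outcome))
    assessed = bool(course["assessment"].get(outcome))
    if not (taught and assessed):
        print(f"Misaligned: '{outcome}' (taught={taught}, assessed={assessed})")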
1.5 TYPES OF ASSESSMENT
Before we proceed to discuss assessment in detail, you need to be clear about these
often-used concepts in assessment:
(a) Formative and summative assessments (or evaluations); and

(b) Criterion-referenced and norm-referenced tests.
1.5.1 Formative versus Summative Assessments
Assessment can be done at various times throughout the school year. A
comprehensive assessment plan will include both formative and summative
assessments. The point at which assessment occurs and the aim of assessing
distinguish these two types of assessment.
(a) Formative Assessment
Formative assessment is often done at the beginning or during the school
year, thus providing the opportunity for obtaining timely information about
student learning in a particular subject area or at a particular point in a
programme. Classroom assessment is one of the most common formative
assessment techniques. The purpose of this technique is to improve the
quality of student learning. It should not be evaluative or involve grading
students.
In formative assessment, the teacher compares the performance of a student
with the performance of other students in the class and not all students in the
same year (or form). Usually, a small section of the content is tested to
determine whether the learning outcomes have been achieved. Formative
assessment is action-oriented and forms the basis for improvement of
instructional methods (Scriven, 1996).
For example, if a teacher observes that some students have not grasped a
concept, he or she may design a review activity or use a different
instructional strategy. Likewise, students can monitor their progress with
periodic quizzes and performance tasks. The results of formative
assessments are used to modify and validate instruction. In short, formative
assessments are ongoing and include reviews and observations of what is
happening in the classroom. Some examples of formative assessment are
monthly tests, weekly quizzes, class exercises and homework.
(b) Summative Assessment

"When the cook tastes the soup, that's formative evaluation; when the
guests taste the soup, that's summative evaluation."
(Robert Stake)
Summative assessment is comprehensive in nature, provides accountability
and is used to check the level of learning at the end of the programme (which
may be at the end of the semester, end of the year or after two years). For
example, after five years in secondary school, students take Sijil Pelajaran
Malaysia (SPM) which is summative in nature since it is based on the
cumulative learning experiences of students. Summative assessments are
typically used to evaluate the effectiveness of an instructional programme at
the end of an academic year or at a pre-determined time. The goal of
summative assessments is to make a judgement on a student's competency
after an instructional phase is completed. For example, national
examinations are administered in Malaysia each year. These are summative
assessments to determine each student's acquisition of knowledge in several
subject areas during a period of five years. Summative evaluations are used
to determine whether students have mastered specific competencies and
letter grades are assigned to assess student achievement. Besides the national
examinations such as Sijil Pelajaran Malaysia (SPM) and Sijil Tinggi Pelajaran
Malaysia (STPM), end of the year or semester examinations in schools,
colleges and universities can also be considered as examples of summative
assessment.
The question that arises is whether summative assessment data can be used
formatively. The answer is yes, and it can be done for the following purposes:

(i)   To Improve Learning among Students
      Based on summative assessment data, poor and good students may be
      identified and given different attention in the subsequent year or
      semester.

(ii)  To Improve Teaching Methods
      Based on summative assessment data, teachers are able to find out if
      their teaching methods or strategies are appropriate and effective.

(iii) To Plan and Improve the Curriculum
      Based on summative assessment data, teachers and administrators can
      identify if the curriculum designed is appropriate for the students'
      ability levels and the needs of the industry.
Let us look at Table 1.5 which summarises the differences between formative and
summative assessments.

Table 1.5: Differences between Formative and Summative Assessments

Timing
  Formative:  Conducted throughout the teaching-learning process (monthly,
              weekly or even daily).
  Summative:  Conducted at the end of a teaching-learning phase (such as the
              end of a semester or year).

Method
  Formative:  Paper-and-pencil tests, observations, quizzes, exercises and
              practical sessions administered to the group and individually.
  Summative:  Paper-and-pencil tests and oral tests administered to the group.

Aim
  Formative:  To assess learning progress; to identify needs for remediation
              or enrichment.
  Summative:  To assess achievement of the instructional goals of a course or
              programme (i.e. a terminal exam); to certify students and
              improve the curriculum.

Example
  Formative:  Monthly tests, weekly quizzes, daily reports, etc.
  Summative:  Final exams, qualifying tests, national examinations (UPSR,
              SPM, STPM, etc.).

1.5.2 Norm-referenced versus Criterion-referenced Tests
The main difference between norm-referenced and criterion-referenced tests lies
in the purpose or aim of assessing students, the way in which content is selected
and the scoring processes which define how the test results are interpreted.
(a) Norm-referenced Tests
The major reason for norm-referenced tests is to classify students. They are
designed to highlight achievement differences between and among students
to produce a dependable rank order of students across a continuum of
achievement from high achievers to low achievers (Stiggins, 1994). With
norm-referenced tests, a representative group of students is given a test and
their scores form the norm after having gone through a complex
administration and analysis. Anyone taking the norm-referenced test can
compare his or her score against the norm. For example, a score of 70 on a
norm-referenced test will not mean much until it is compared with the norm.
When compared with the norm, if the student's score is in the 80th percentile,
this means that he or she performed as well as or better than 80 per cent of the
students in the norm group. This type of information can be useful for
deciding whether or not the student needs remedial assistance or is a
candidate for the gifted programme.
However, the score gives little information about what the student actually
knows or can do. A major criticism of norm-referenced tests is that they tend
to focus on assessing low-level, basic skills (Romberg, Zarinnia & Williams,
1989).
(b) Criterion-referenced Tests
Criterion-referenced tests determine what students can or cannot do, and not
how they compare with others (Anastasi, 1988). Criterion-referenced tests
report how well students are doing relative to a pre-determined performance
level on a specified set of educational goals or outcomes included in the
curriculum. Criterion-referenced tests are used when teachers wish to know
how well students have learnt the content and skills which they are expected
to have mastered. This information may be used to determine how well the
students are learning the desired curriculum and how well the school is
teaching that curriculum. Criterion-referenced tests give detailed
information about how well each student has performed on each of the
educational goals or outcomes included in that test. For instance, a
criterion-referenced test score might describe which arithmetic operations a student
could perform or the level of reading difficulty he or she experienced.
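
The contrast between the two interpretations can be shown in a few lines of Python. This is an invented illustration (hypothetical scores and cut-off), not a procedure from the module: the same raw score is read once against a norm group and once against a mastery criterion:

norm_group = [45, 50, 55, 60, 62, 65, 70, 72, 78, 85]  # hypothetical scores
cut_off = 75  # hypothetical mastery criterion (per cent)

def interpret(score):
    # Norm-referenced reading: rank relative to the norm group.
    below = sum(1 for s in norm_group if s < score)
    rank = 100 * below / len(norm_group)
    # Criterion-referenced reading: compare against the cut-off.
    mastery = "mastered" if score >= cut_off else "not yet mastered"
    return f"score {score}: {rank:.0f}th percentile; {mastery}"

print(interpret(72))  # 70th percentile, yet the criterion is not met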
Let us look at Table 1.6 which summarises the differences between these two types
of assessments.
Table 1.6: Differences between Norm-referenced and Criterion-referenced Tests

Aim
  Norm-referenced:      Compares a student's performance with that of other
                        students; selects students for certification.
  Criterion-referenced: Compares a student's performance against some
                        criteria; determines the extent to which a student has
                        acquired the knowledge or skill; improves teaching and
                        learning.

Types of questions
  Norm-referenced:      Questions ranging from simple to difficult.
  Criterion-referenced: Questions of nearly similar difficulty relating to the
                        criteria.

Reporting of results
  Norm-referenced:      Grades are assigned.
  Criterion-referenced: No grades are assigned (only whether the skill or
                        knowledge has been achieved or not).

Content coverage
  Norm-referenced:      Wide content coverage.
  Criterion-referenced: Specific aspects of the content.

Examples
  Norm-referenced:      UPSR, SPM and STPM national examinations;
                        end-of-semester and end-of-year examinations.
  Criterion-referenced: Class tests, exercises and assignments.
SELF-CHECK 1.2

1. List the main differences between formative and summative assessments.

2. Explain the differences between norm-referenced and criterion-referenced
   tests.
1.6 TRENDS IN ASSESSMENT
In the past two decades, major changes have occurred in assessment practices
in many parts of the world. The following trends in educational assessment have
been identified:
(a) Written examinations are gradually being replaced by more continuous
    assessment and coursework;

(b) There is a move towards more student involvement and choice in
    assessment;

(c) Group assessment is becoming more popular in an effort to emphasise
    collaborative learning among students and to reduce excessive competition;

(d) Subject areas and courses state more explicitly the expectations in
    assessment. Students are clearer about the kinds of performance required of
    them when they are assessed. This is unlike earlier practice where assessment
    was so secretive that students had to figure out for themselves what was
    required of them;

(e) An understanding of the process is now seen as equally important to
    knowledge of facts. This is in line with the general shift from product-based
    assessment towards process-based assessment; and
(f) Student-focused "learning outcomes" have begun to replace
    teacher-oriented "objectives". The focus is more on what the students will
    learn rather than what the teacher plans to teach. This is in line with the
    principle of outcomes-based teaching and learning.
ACTIVITY 1.2

To what extent do you agree with the current trends in assessment?
Discuss this issue with your coursemates in the myINSPIRE online forum.

SUMMARY

•  A test may be thought of as a set of tasks or questions intended to elicit
   particular types of behaviours when presented under standardised conditions
   to yield scores that have desirable psychometric properties.

•  Measurement in education is the process by which the attributes of a person
   are measured and assigned numbers.

•  Assessment is the process of collecting information with the purpose of making
   decisions about students. Assessment is aimed at helping learning and
   improving teaching.

•  What is to be assessed must be clearly specified in the intended learning
   outcomes.

•  An assessment procedure or method should be selected based on its relevance
   to the characteristics or performance to be measured.

•  The assessment must be aligned to instruction and learning outcomes. The
   relationship among assessment, instruction and learning outcomes is referred
   to as constructive alignment.

•  Different assessment procedures are required to provide a complete picture of
   student achievement and development.

•  There are four types of assessment, namely formative, summative,
   norm-referenced and criterion-referenced.

•  Formative assessment is often done at the beginning of or during the school
   year, thus providing the opportunity to obtain timely information about
   students' learning in a particular subject area or at a particular point in a
   programme.

•  Summative assessment is comprehensive in nature, provides accountability
   and is used to check the level of learning at the end of a programme.

•  The major reason for norm-referenced tests is to classify students. These tests
   are designed to highlight achievement differences between and among
   students to produce a dependable rank order of students.

•  Criterion-referenced tests determine what students can or cannot do, and not
   how they compare with others.

•  Among the trends in assessment are more continuous assessment and
   coursework as well as more choices of assessment.
KEY TERMS

Assessment                Measurement
Constructive alignment    Norm-referenced
Criterion-referenced      Psychometrics
Formative assessment      Summative assessment
Learning outcome          Test
REFERENCES

Anastasi, A. (1988). Psychological testing. New York, NY: MacMillan.
Biggs, J., & Tang, C. (2007). Teaching for quality learning at university. Berkshire,
England: McGraw Hill.
Deale, R. N. (1975). Assessment and testing in secondary school. Chicago, IL:
Evans Bros.
Flanagan, D. P., Genshaft, J., & Harrison, P. L. (1997). Contemporary intellectual
assessment: Theories, tests and issues. New York, NY: Guildford Press.
Harlen, W. (1978). Does content matter in primary science? School Science Review,
59(Jun), 614–625.

Irvine, P. (1986). Sir Francis Galton (1822–1911). Journal of Special Education, 20(1),
6–7.
Romberg, T. A., Zarinnia, E. A., & Williams, S. R. (1989). The influence of mandated
testing on mathematics instruction: Grade eight teachers' perceptions.
Madison, WI: National Center for Research in Mathematical Sciences
Education.
Rowntree, D. (1974). Educational technology in curriculum development. New
York, NY: Harper & Row.
Salvia, J., & Ysseldyke, J. E. (1995). Assessment (6th ed.). Boston, MA: Houghton
Mifflin.
Scriven, M. (1996). Types of evaluation and types of evaluator. American Journal
of Evaluation, 17(2), 151–161.
Stiggins, R. J. (1994). Student-centered classroom assessment. New York, NY:
Merrill.
Topic 2  Foundation of Assessment: What to Assess?
LEARNING OUTCOMES

By the end of the topic, you should be able to:

1. Justify the behaviours to be measured to present a holistic assessment of
   students;

2. Describe the various levels of cognitive learning outcomes to be assessed;

3. Explain the various levels of affective learning outcomes to be assessed;
   and

4. Describe the various levels of psychomotor learning outcomes to be
   assessed.
INTRODUCTION
If you were to ask a teacher about the things that should be assessed in the
classroom, the immediate response would most probably be "the facts and
concepts taught". The facts and concepts may be in Science, History, Geography,
Language, Arts, Religious Education and other similar subjects. The Malaysian
Philosophy of Education states that education should aim towards the holistic
development of the individual.
Hence, it is only logical that the assessment system should also seek to assess more
than the acquisition of facts and concepts of a subject area. What about assessment
of physical and motor abilities? What about socio-emotional behaviours such as
attitudes, interests, personality and so forth? Do they not contribute to the holistic
person? Let us find out the answer as this topic will highlight what to assess.
2.1 IDENTIFYING WHAT TO ASSESS
When educators are asked what should be assessed in the classroom, the majority
would refer to evaluating the acquisition of facts, concepts, principles, procedures
and methods of a subject area. You might find a minority of educators who insist
that skills acquired by students should also be assessed especially in subjects such
as physical education, art, drama, music, technical drawing, carpentry, automobile
engineering and so forth. Even fewer educators would propose that the socio-emotional behaviour of students should also be assessed.
Let us refer to the National Philosophy of Malaysian Education (see Figure 2.1),
which has important implications for assessment.
Figure 2.1: The National Philosophy of Malaysian Education
Theoretically, a comprehensive assessment system should seek to provide
information on the extent to which the National Philosophy of Education has
achieved its goal. In other words, the assessment system should seek to determine:
(a) Whether our schools have developed "the potential of individuals in a holistic and integrated manner";

(b) Whether students have developed "intellectually, spiritually, emotionally and physically";

(c) Whether students are "knowledgeable and competent" and "possess high moral standards";

(d) Whether students have achieved a "high level of personal well-being"; and

(e) Whether students are equipped with abilities and attitudes that will enable them "to contribute to the harmony and betterment of the family, society and the nation at large".
Yet, in actual practice, assessment tends to overemphasise the assessment of intellectual competence, which translates into the measurement of cognitive learning outcomes
of specific subject areas. The other aspects of the holistic individual are given
minimal attention because of various reasons. For example, how does a teacher
assess spiritual or emotional growth or development? These are constructs that are
difficult to evaluate and extremely subjective. Hence, it is no surprise that
assessment of cognitive outcomes has remained the focus of most assessment
systems all over the world because it is relatively easier to observe and measure.
However, in this topic, we will make an attempt to present a more "holistic" assessment of learning which focuses on three main types of human behaviour.
These are behaviours psychometricians and psychologists have attempted to
assess and are closely aligned in realising the goals of the National Philosophy of
Malaysian Education.
2.1.1 Three Types of Learning Outcomes
Few people will dispute that the purpose of schooling is the development of a
holistic person. In the late 1950s and early 1960s, a group of psychologists and
psychometricians proposed that schools should seek to assess three domains of
learning outcomes which are:
(a) Cognitive learning outcomes (knowledge or mental skills);

(b) Affective learning outcomes (growth in feelings or emotional areas); and

(c) Psychomotor learning outcomes (manual or physical skills).
These three domains are closely interrelated as shown in Figure 2.2.
Figure 2.2: Holistic assessment of students
Domains can be thought of as categories. Educators often refer to these three
domains as KSA (knowledge, skills and attitude). Each domain consists of
subdivisions, starting from the simplest behaviours to the most complex, thus
forming a taxonomy of learning outcomes. Each taxonomy of learning behaviour can be thought of as "the goals of the schooling process". That is, after schooling,
the students should have acquired new skills, knowledge and/or attitudes.
However, the levels of each division outlined are not absolutes. While there are
other systems or hierarchies that have been devised in the educational world, these
three taxonomies are easily understood and are probably the most widely used
today.
To assess these three domains, one has to identify and isolate those behaviours that
represent these domains. When we assess, we evaluate some aspects of the student's behaviour, such as his or her ability to compare, explain, analyse, solve, draw, pronounce, feel, reflect and so forth. The term "behaviour" is used broadly to include the student's ability to think (cognitive), feel (affective) and perform a skill (psychomotor). For example, you have just taught the topic "The Rainforests of Malaysia" and now you want to assess your students in the following ways:
(a) Their Thinking
You may ask them to list the characteristics of the Malaysian rainforest and compare them with those of the coniferous forests of Canada.

(b) Their Feelings (Emotions, Attitudes)
You may ask them to design an exhibition on how students can contribute towards conserving the rainforest.

(c) Their Skills
You may ask them to prepare satellite maps on the changing Malaysian rainforest using websites from the Internet.
ACTIVITY 2.1
To what extent are affective and psychomotor behaviours assessed in
Malaysian schools?
Discuss this with your coursemates in the myINSPIRE online forum.
2.2 ASSESSING COGNITIVE LEARNING OUTCOMES OR BEHAVIOUR
When we evaluate or assess a human being, we are assessing or evaluating the
behaviour of a person. This might be a bit confusing for some people. Are we not
assessing a person's understanding of the facts, concepts and principles of a subject
area? Every subject, be it History, Science, Geography, Economics or Mathematics,
has its unique repertoire of facts, concepts, principles, generalisations, theories,
laws, procedures and methods to be transmitted to students. This concept can be
illustrated as in Figure 2.3.
Figure 2.3: Contents of a subject assessed
When we assess, we do not assess students' storage of facts, concepts or principles of a subject, but rather what students are able to do with the facts, concepts or principles of the subject area. For example, we evaluate students' ability to
compare facts, explain concepts, analyse a generalisation (or statement) or solve a
problem based on a given principle. In other words, we assess understanding or
mastery of a body of knowledge based on what students are able to do with the
contents of the subject.
2.2.1 Bloom's Taxonomy
In 1956, Benjamin Bloom headed a group of educational psychologists who
developed a classification of levels of intellectual behaviour important to learning.
They found that over 95 per cent of the test questions students encountered required them to think only at the lowest possible level, i.e. the recall of information. Bloom and his colleagues developed a widely accepted taxonomy (a method of classification into differing levels) for cognitive learning outcomes. This is referred to as Bloom's taxonomy (refer to Figure 2.4).

Figure 2.4: Bloom's taxonomy of cognitive learning outcomes
There are six levels in Bloom's classification, with the lowest level termed knowledge. The knowledge level is followed by five increasingly complex levels of mental abilities, namely comprehension, application, analysis, synthesis and evaluation.

(a) Knowledge (C1)
The behaviours at the knowledge level require students to recall specific
information. The knowledge level is the lowest cognitive level. Examples of
verbs describing behaviours at the knowledge level include the ability to list,
define, name, state, recall, match, identify, tell, label, underline, locate,
recognise and so forth.
For example, students are able to recite the factors leading to World War II,
quote formula for density and force, and list laboratory safety rules.
(b) Comprehension (C2)
The behaviours at the comprehension level, which is a higher level of mental ability than the knowledge level, require an understanding of the meaning of concepts and principles; translation of words and phrases into one's own words; interpolation, which involves filling in missing information; and interpretation, which involves inferring and going beyond the given information. Examples of verbs describing behaviours at the comprehension level are explain, distinguish, infer, interpret, convert, generalise, defend, estimate, extend, paraphrase, retell by using own words, rewrite, translate and so forth.
For example, students are able to rewrite Newton's three laws of motion, explain in their own words the steps for performing a complex task and
translate an equation into a computer spreadsheet.
(c) Application (C3)
The behaviours at the application level require students to apply a rule or
principle learnt in the classroom into novel or new situations in the
workplace or unprompted use of an abstraction. Examples of verbs
describing behaviours at the application level are apply, change, compute,
demonstrate, discover, manipulate, modify, give an example, operate,
predict, prepare, produce, relate, show, solve, use and so forth.
For example, students are able to use the formula for projectile motion to
calculate the maximum distance a long jumper can jump and apply the
principles of statistics to evaluate the reliability of a written test.
(d) Analysis (C4)
The behaviours at the analysis level require students to identify component
parts and describe their relationship, separate material or concepts into
component parts so that its organisational structure may be understood and
distinguish between facts and inferences. Examples of verbs describing
behaviours at the analysis level are analyse, break down, compare,
contrast, diagnose, deconstruct, examine, dissect, differentiate, discriminate,
distinguish, illustrate, infer, outline, relate, select and separate.
For example, students are able to troubleshoot a piece of equipment by using
logical deduction, recognise logical fallacies in reasoning, gather information
from a company and determine needs for training.
(e) Synthesis (C5)
The behaviours at the synthesis level require students to build a structure or
pattern from diverse elements, put parts together to form a whole with
emphasis on creating a new meaning or structure. Examples of verbs
describing behaviours at the synthesis level are categorise, combine, compile,
compose, create, devise, design, generate, modify, organise, plan, rearrange,
reconstruct, relate, reorganise, find an unusual way, formulate, revise,
rewrite, tell, write and so forth.
For example, students are able to write a creative short story, design a
method to perform a specific task, integrate ideas from several sources to
solve a problem and revise a process to improve the outcome.
(f) Evaluation (C6)
The behaviours at the evaluation level require students to make a judgement
about materials and methods, as well as the value of ideas or materials.
Examples of verbs describing behaviours at the evaluation level are appraise,
conclude, criticise, critique, defend, rank, give your own opinion,
discriminate, evaluate, value, justify, relate, support and so forth.
For example, students are able to evaluate and decide on the most effective
solution to a problem and justify the choice of a new procedure or course of
action.
2.2.2 The Helpful Hundred
Heinich, Molenda, Russell and Smaldino (2001) suggested 100 verbs which
highlighted performance or behaviours that were observable and measurable.
These 100 verbs are not the only ones but they are a great reference for educators.
Table 2.1 displays the verbs that would be appropriate to use when you are writing learning outcomes at each level of Bloom's taxonomy.
Table 2.1: The Helpful Hundred

add          contrast      generate     operate       ski
alphabetise  convert       graph        order         solve
analyse      correct       grasp        organise      sort
apply        cut           grind        outline       specify
arrange      deduce        hit          pack          square
assemble     defend        hold         paint         state
attend       define        identify     plot          subtract
bisect       demonstrate   illustrate   position      suggest
build        derive        indicate     predict       swing
carve        describe      install      prepare       tabulate
categorise   design        kick         present       throw
choose       designate     label        produce       time
classify     diagram       locate       pronounce     translate
colour       distinguish   make         read          type
compare      drill         manipulate   reconstruct   underline
complete     estimate      match        reduce        verbalise
compose      evaluate      measure      remove        verify
compute      explain       modify       revise        weave
conduct      extrapolate   multiply     select        weigh
construct    fit           name         sketch        write
Do note that there is a lot of overlap in the use of verbs to describe behaviours. The same verb may be used to describe different behaviours. For example, the verb "explain" may be used to describe the behaviours of evaluation (C6), analysis (C4) and comprehension (C2), depending on the context in which it is used, as shown as follows:

(a) Students are able to explain how effective essay questions are in assessing students' critical thinking ability. (C6)

(b) Students are able to explain how essay questions are different from multiple-choice questions in assessing students' performance. (C4)

(c) Students are able to explain, in their own words, the criteria that they would consider when formulating an essay question as a test item. (C2)
Likewise, different verbs may be used to describe the same behaviour. For example, the behaviour of "analysis" may be expressed using verbs such as "compare and contrast", "explain" and "distinguish" as follows:

(a) Students are able to compare and contrast formative assessment with summative assessment. (C4)

(b) Students are able to explain the difference between formative assessment and summative assessment. (C4)

(c) Students are able to distinguish formative assessment from summative assessment. (C4)
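This verb-to-level mapping can be made concrete with a small lookup. The sketch below is a minimal illustration in Python, assuming hypothetical and deliberately partial verb lists (the BLOOM_VERBS dictionary and the candidate_levels function are illustrative names, not part of the taxonomy). Because a verb such as "explain" appears at several levels, the lookup returns every candidate level and leaves the final judgement to the teacher and the context of the outcome.

```python
# Minimal sketch (hypothetical, partial verb lists -- not the full taxonomy):
# the same verb can signal different Bloom levels, so a lookup should return
# every candidate level rather than a single "correct" answer.

BLOOM_VERBS = {
    "C2 Comprehension": {"explain", "paraphrase", "translate", "infer"},
    "C4 Analysis": {"explain", "compare", "contrast", "distinguish"},
    "C6 Evaluation": {"explain", "justify", "critique", "appraise"},
}

def candidate_levels(verb):
    """Return all taxonomy levels whose verb list contains the given verb."""
    return [level for level, verbs in BLOOM_VERBS.items() if verb in verbs]

print(candidate_levels("explain"))
# ['C2 Comprehension', 'C4 Analysis', 'C6 Evaluation'] -- context decides
print(candidate_levels("distinguish"))  # ['C4 Analysis']
```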
In 2001, Anderson and Krathwohl modified the original Bloom's taxonomy and identified and isolated the following list of behaviours that an assessment system should address (refer to Table 2.2).
Table 2.2: Revised Version of Bloom's Taxonomy

Category and Cognitive Process: Alternative Names

1. Remember
   (a) Recognising: Identifying
   (b) Recalling: Retrieving

2. Understand
   (a) Interpreting: Clarifying, paraphrasing, representing, translating
   (b) Exemplifying: Illustrating, instantiating
   (c) Classifying: Categorising, subsuming
   (d) Summarising: Abstracting, generalising
   (e) Inferring: Concluding, extrapolating, interpolating, predicting
   (f) Comparing: Contrasting, mapping, matching
   (g) Explaining: Constructing models

3. Apply
   (a) Executing: Carrying out
   (b) Implementing: Using

4. Analyse
   (a) Differentiating: Discriminating, distinguishing, focusing, selecting
   (b) Organising: Finding coherence, integrating, outlining, structuring
   (c) Attributing: Deconstructing

5. Evaluate
   (a) Checking: Coordinating, detecting, monitoring, testing
   (b) Critiquing: Judging

6. Create
   (a) Generating: Hypothesising
   (b) Planning: Designing
   (c) Producing: Constructing

Source: Anderson and Krathwohl (2001)
Note that the first two original levels of "knowledge" and "comprehension" were replaced with "remember" and "understand" respectively. The "synthesis" level was renamed "create". Note that in the original taxonomy, the sequence was "synthesis" followed by "evaluate". In the modified taxonomy, however, the sequence was rearranged to "evaluate" followed by "create".

As you can see, the primary differences between the "original" and the revised taxonomy are not in the listings or rewordings from nouns to verbs, or in the renaming of some of the components, or even in the repositioning of the last two categories. The major differences lie in the more useful and comprehensive additions of how the taxonomy intersects and acts upon different types and levels of knowledge: factual, conceptual, procedural and metacognitive.
(a) Factual Knowledge
It refers to essential facts, terminology, details or elements students must
know or be familiar with in order to understand a discipline or solve a
problem in it.
(b) Conceptual Knowledge
The knowledge of classifications, principles, generalisations, theories,
models or structures pertinent to a particular disciplinary area.
(c) Procedural Knowledge
It refers to information or knowledge that helps students to do something
specific to a discipline, subject or area of study. It also refers to methods of
inquiry, very specific or finite skills, algorithms, techniques and particular
methodologies.
(d) Metacognitive Knowledge
Metacognition is, simply, thinking about one's thinking. More precisely, it refers to the processes used to plan, monitor and assess one's understanding and performance. Activities such as planning how to approach a given
and performance. Activities such as planning how to approach a given
learning task, monitoring comprehension and evaluating progress towards
the completion of a task are metacognitive in nature.
SELF-CHECK 2.1

1. Explain the differences between analysis and synthesis according to Bloom's taxonomy.

2. How is the revised version of Bloom's taxonomy different from the original version?
ACTIVITY 2.2

Discuss in the myINSPIRE online forum:

(a) Do you agree that Bloom's taxonomy is a hierarchy of cognitive abilities? Why?

(b) Do you agree that you need to be able to "analyse" before being able to "evaluate"? Why?
2.3 ASSESSING AFFECTIVE LEARNING OUTCOMES OR BEHAVIOUR
Affective characteristics involve the feelings or emotions of a person. Attitudes,
values, self-esteem, locus of control, self-efficacy, interests, aspirations and anxiety
are all examples of affective characteristics. Unfortunately, affective outcomes
have not been a central part of schooling, even though they are arguably as important as, or even more important than, any cognitive or psychomotor learning outcomes targeted by schools. Some possible reasons for the historical lack of emphasis on affective outcomes include the following:
(a) The belief that the development of appropriate feelings is the task of family and religion.

(b) The belief that appropriate feelings develop automatically from knowledge and experience with content and do not require any special pedagogical attention.

(c) Attitudinal and value-oriented instructions are difficult to develop and assess because:
(i) Affective goals are intangible;
(ii) Affective learning outcomes cannot be attained in the typical periods of instruction offered in schools;
(iii) Affective characteristics are considered to be private rather than public matters; and
(iv) There are no sound methods to gather information about affective characteristics.
However, affective goals are no more intangible than cognitive ones. Some have
claimed that affective behaviours can be developed automatically when specific
knowledge is taught while others argue that affective behaviours have to be
explicitly developed in schools. Affective goals do not necessarily take longer to
achieve in the classroom than cognitive goals. All that is required is to state a goal more concretely and in behavioural terms so that it can be assessed and monitored.
There is also the belief that affective characteristics are private and should not be
made public. While people value their privacy, the public also has the right to
information. If the information gathered is needed to make a decision, then
gathering of such information is not generally considered an invasion of privacy.
For example, if the assessment is used to determine if a student needs further
attention such as special education, then gathering such information is not an
invasion of privacy. On the other hand, if the information being sought is not
relevant to the stated purpose, then gathering of such information is likely to be
an invasion of privacy.
Similarly, information about affect can be used either for good or ill purposes. For
example, if a Mathematics teacher discovers that a student has a negative attitude
towards Mathematics and ridicules that student in front of the class, then the
information has been misused. However, if the teacher uses the information to
change his or her instructional methods so as to help the student develop a more
positive attitude towards Mathematics, then the information has been used wisely.
Krathwohl, Bloom and Masia (1973) developed the affective domain, which deals with things related to emotion, such as feelings, values, appreciation, enthusiasm, motivation and attitudes. The five major categories, listed from the simplest behaviour to the most complex, are receiving, responding, valuing, organisation and characterisation (refer to Figure 2.5).

Figure 2.5: Krathwohl, Bloom and Masia's taxonomy of affective learning outcomes
Source: Krathwohl et al. (1973)
These categories are further explained as follows:
(a) Receiving (A1)
The behaviours at the receiving level require the student to be aware, willing to hear and focused or attentive. Verbs describing behaviours at the receiving level include ask, listen, choose, describe, follow, give, hold, locate, name, point to, select, reply and so forth. For example, the student:
(i) Listens to others with respect; and
(ii) Listens to and remembers the names of other students.
(b) Responding (A2)
The behaviours at the responding level require the student to be an active participant who attends and reacts to a particular phenomenon, is willing to respond and gains satisfaction in responding (motivation). Verbs describing behaviours at the responding level include answer, assist, aid, comply, conform, discuss, greet, help, label, perform, practise, present, read, recite, report, select, tell, write and so forth. For example, the student:
(i) Participates in class discussion;
(ii) Gives a presentation; and
(iii) Questions new ideals, concepts or models in order to fully understand them.
(c) Valuing (A3)
This level relates to the worth or value a person attaches to a particular object, phenomenon or behaviour. This ranges from simple acceptance to the more complex state of commitment. Valuing is based on the internalisation of a set of specified values, while clues to these values are expressed in the student's overt behaviour and are often identifiable. Verbs describing behaviours at the valuing level include demonstrate, differentiate, follow, form, initiate, invite, join, justify, propose, read, report, select, share, study, work and so forth. For example, the student:
(i) Demonstrates belief in the democratic process;
(ii) Is sensitive towards individual and cultural differences (values diversity);
(iii) Shows the ability to solve problems;
(iv) Proposes a plan for social improvement; and
(v) Follows through with commitment.

(d) Organisation (A4)
At this level, people organise values into priorities by contrasting different values, resolving conflicts between them and creating a unique value system. The emphasis is on comparing, relating and synthesising values. Verbs describing behaviours at the level of organisation are adhere, alter, arrange, combine, compare, complete, defend, explain, formulate, generalise, identify, integrate, modify, order, organise, prepare, relate, synthesise and so forth. For example, the student:
(i) Recognises the need for balance between freedom and responsible behaviour;
(ii) Accepts responsibility for his or her behaviour;
(iii) Explains the role of systematic planning in solving problems;
(iv) Accepts professional ethical standards;
(v) Creates a life plan in harmony with abilities, interests and beliefs; and
(vi) Prioritises time effectively to meet the needs of the organisation, family and self.
(e) Characterisation (A5)
At this level, a person's value system controls his or her behaviour. The behaviour is pervasive, consistent, predictable and, most importantly, characteristic of the student. Verbs describing behaviours at this level include act, discriminate, display, influence, listen, modify, perform, practise, propose, qualify, question, revise, serve, solve and verify. For example, the student:
(i) Shows self-reliance when working independently;
(ii) Cooperates in group activities (displays teamwork);
(iii) Uses an objective approach in problem solving;
(iv) Displays a professional commitment to ethical practice on a daily basis;
(v) Revises judgement and changes behaviour in light of new evidence; and
(vi) Values people for what they are and not how they look.
Table 2.3 shows how the affective taxonomy may be applied to a value such as honesty. It traces the development of an affective attribute such as honesty from the "receiving" level until the "characterisation" level, where the value becomes a part of the individual's character.
Table 2.3: Affective Taxonomy for Honesty

Level                                          Explanation
Receiving (attending)                          Aware that certain things are honest or dishonest
Responding                                     Saying honesty is better and behaving accordingly
Valuing                                        Consistently (but not always) telling the truth
Organisation                                   Being honest in a variety of situations
Characterisation by a value or value complex   Honest in most situations, expects others to be honest and interacts with others fully and honestly
SELF-CHECK 2.2

1. Explain the differences between characterisation and valuing according to the affective taxonomy of learning outcomes.

2. "A student is operating at the responding level." What does this mean?
ACTIVITY 2.3

The Role of Affect in Education

"Some say schools should be concerned only with content."

"It is impossible to teach content without also teaching affect."

"To what extent, if at all, should we be concerned with the assessment of affective learning outcomes?"

In the myINSPIRE online forum, discuss the three statements in the context of the Malaysian education system.
2.4 ASSESSING PSYCHOMOTOR LEARNING OUTCOMES OR BEHAVIOUR
The psychomotor domain includes physical movement, coordination and use of
motor-skill areas. Development of these skills requires practice and is measured in
terms of speed, precision, distance, procedures and techniques in execution. There
are seven major categories listed in this domain from the simplest to the most
complex behaviour as shown in Figure 2.6.
Figure 2.6: Taxonomy of psychomotor learning outcomes
These learning outcomes are further explained as follows:
(a) Perception (P1)
This is the ability to use sensory cues to guide motor activity. It ranges from sensory stimulation and cue selection to translation. Verbs describing these types of behaviours include choose, describe, detect, differentiate, distinguish, identify, isolate, relate, select and so forth. For example, the student:
(i) Detects non-verbal communication cues from the coach;
(ii) Estimates where a ball will land after it is thrown and then moves to the correct location to catch the ball;
(iii) Adjusts the heat of the stove to the correct temperature by the smell and taste of the food; and
(iv) Adjusts the height of the ladder in relation to the point on the wall.
(b) Set (P2)
This includes mental, physical and emotional sets. These three sets are dispositions that predetermine a person's response to different situations (sometimes called mindsets). Verbs describing "set" include begin, display, explain, move, proceed, react, show, state and volunteer. For example, the student:
(i) Knows and acts upon a sequence of steps in a manufacturing process;
(ii) Recognises his or her abilities and limitations; and
(iii) Shows the desire to learn a new process (motivation).

Note: This subdivision of the psychomotor domain is closely related to the "responding" subdivision of the affective domain.
(c) Guided Response (P3)
This covers the early stages in learning a complex skill, which include imitation and trial and error. Adequacy of performance is achieved by practising. Verbs describing "guided response" include copy, trace, follow, react, reproduce and respond. For example, the student:
(i) Performs a mathematical equation as demonstrated;
(ii) Follows instructions when building a model of a kampung house; and
(iii) Responds to the hand signals of the coach while learning gymnastics.
(d) Mechanism (P4)
This is the intermediate stage in learning a complex skill. Learned responses have become habitual and the movements can be performed with some confidence and proficiency. Verbs describing "mechanism" include assemble, calibrate, construct, dismantle, display, fasten, fix, grind, heat, manipulate, measure, mend, mix and organise. For example, the student:
(i) Uses a computer;
(ii) Repairs a leaking tap;
(iii) Fixes a three-pin electrical plug; and
(iv) Rides a motorbike.
(e) Complex Overt Response (P5)
This is the skilful performance of motor acts that involve complex movement patterns. Proficiency is indicated by a quick, accurate and highly coordinated performance requiring a minimum of energy. This category includes performing without hesitation and automatic performance. For example, players often utter sounds of satisfaction or expletives as soon as they hit a tennis ball (like world-famous tennis players Maria Sharapova and Serena Williams) or a golf ball (golfers will immediately know they have hit a bad shot!) because they can tell by the feel of the act what the result will be. Verbs describing "complex overt response" include assemble, build, calibrate, construct, dismantle, display, fasten, fix, grind, heat, manipulate, measure, mend, mix, organise and sketch. For example, the student:
(i) Manoeuvres a car into a tight parallel parking spot;
(ii) Operates a computer quickly and accurately; and
(iii) Displays competence while playing the piano.
Note: Many of the verbs are the same as for "mechanism", but there are adverbs or adjectives that indicate that the performance is quicker, better and more accurate.
(f) Adaptation (P6)
Skills are well developed and the individual can modify movement patterns to fit special requirements. Verbs describing "adaptation" include adapt, alter, change, rearrange, reorganise, revise and vary. For example, the student:
(i) Responds effectively to unexpected experiences;
(ii) Modifies instruction to meet the needs of the students; and
(iii) Performs a task with a machine that it was not originally intended to do (the machine is not damaged and there is no danger in performing the new task).
(g) Origination (P7)
This involves creating a new movement or pattern to fit a particular situation or specific problem. Learning outcomes emphasise creativity based on highly developed skills. Verbs describing "origination" include arrange, build, combine, compose, construct, create, design, initiate, make and originate. For example, the student:
(i) Constructs a new theory;
(ii) Develops a new technique for goalkeeping; and
(iii) Creates a new gymnastic routine.
Table 2.4 shows how the psychomotor taxonomy may be applied to kicking a football. It traces the development of the psychomotor skill of kicking a football from the "perception" level until the "origination" level.
Table 2.4: Psychomotor Taxonomy for Kicking a Football

Level                    Explanation
Perception               Able to estimate where the ball will land after it is kicked
Set                      Shows the desire to learn and perform a kicking technique
Guided response          Able to kick the ball under guidance, through trial and error or imitation
Mechanism                Able to kick the ball mechanically with some confidence and proficiency
Complex overt response   Able to kick the ball skilfully using a proper technique learnt
Adaptation               Able to modify the kicking technique to suit different situations
Origination              Able to create a new kicking technique
SELF-CHECK 2.3

1. Explain the differences between adaptation and guided response according to the taxonomy of psychomotor learning outcomes.

2. "A student is operating at the origination level." What does this mean?
2.5 IMPORTANT TRENDS IN WHAT TO ASSESS
Since the influence of testing on curriculum and instruction is now widely
acknowledged, educators, policymakers and others are turning to alternative
assessment methods as a tool for educational reform. The call is to move away
from traditional objective and essay tests towards alternative assessments focusing
on authentic assessment and performance assessment (we will discuss these
assessment methods in Topics 5 and 6). Various techniques have been proposed to
assess learners more holistically, focusing on both the product and process of
learning (refer to Figure 2.7).
Figure 2.7: Trends in what to assess
Source: Dietel, Herman and Knuth (1991)
• Assessment of cognitive outcomes has remained the focus of most assessment systems all over the world because it is relatively easier to observe and measure.

• Each domain of learning consists of subdivisions, starting from the simplest behaviours to the most complex, thus forming a taxonomy of learning outcomes.

• When we evaluate or assess a human being, we are assessing or evaluating the behaviour of a person.

• Every subject area has its unique repertoire of facts, concepts, principles, generalisations, theories, laws, procedures and methods to be transmitted to learners.
• There are six levels in Bloom's taxonomy of cognitive learning outcomes, with the lowest level termed knowledge, followed by five increasingly difficult levels of mental abilities: comprehension, application, analysis, synthesis and evaluation. The six levels in the revised version are remembering, understanding, applying, analysing, evaluating and creating.

• Affective characteristics involve the feelings or emotions of a person. Attitudes, values, self-esteem, locus of control, self-efficacy, interests, aspirations and anxiety are all examples of affective characteristics.

• The five major categories of the affective domain from the simplest behaviour to the most complex are receiving, responding, valuing, organisation and characterisation.

• The psychomotor domain includes physical movement, coordination and use of the motor-skill areas.

• The seven major categories of the psychomotor domain from the simplest behaviour to the most complex are perception, set, guided response, mechanism, complex overt response, adaptation and origination.

• The ideal situation is an alignment of objectives, instruction and assessment.

• The trend in assessment is to move away from traditional objective and essay tests towards alternative assessments focusing on authentic assessment and performance assessment.
Affective outcomes
Cognitive-constructivist
Authentic assessment
Cognitive outcomes
Behaviour
Holistic assessment
Behavioural view
Psychomotor outcomes
Bloom's taxonomy
The Helpful Hundred
Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Boston, MA: Allyn & Bacon.

Dietel, R., Herman, J., & Knuth, R. (1991). What does research say about assessment? Retrieved from https://bit.ly/2ECzOXP

Dwyer, F. M. (1991). A paradigm for generating curriculum design oriented research questions in distance education. Second American Symposium on Research in Distance Education. University Park, PA: Pennsylvania State University.

Heinich, R., Molenda, M., Russell, J. D., & Smaldino, S. E. (2001). Instructional media and technologies for learning (7th ed.). Englewood Cliffs, NJ: Prentice Hall.

Krathwohl, D. R., Bloom, B. S., & Masia, B. B. (1973). Taxonomy of educational objectives, the classification of educational goals. Handbook II: Affective domain. New York, NY: David McKay.
Topic 3  Planning Classroom Tests
LEARNING OUTCOMES

By the end of the topic, you should be able to:

1. Describe the process of planning a classroom test;
2. Explain the purposes of a test and their importance in test planning;
3. Describe how the learning outcomes to be assessed affect test planning;
4. Select the best item types for a test in line with learning outcomes;
5. Develop a table of specifications for a test;
6. Identify appropriate marking schemes for an essay test; and
7. Explain the general principles of constructing relevant test items.
INTRODUCTION
In this topic, we will focus on the process of planning classroom tests. Testing is part of the teaching and learning process. The importance of planning and writing a reliable, valid and fair test cannot be overstated. Designing tests is an important part of assessing students' understanding of course content and their level of competency in applying what they have learnt. Whether you use low-stakes quizzes or high-stakes mid-semester and final examination tests, careful design will help provide more calibrated results. Assessments should reveal how well students have learnt what teachers want them to learn, while instruction ensures that they learn it.
Thus, thinking about summative assessment at the end of a programme of teaching
is not enough. It is also helpful to think about assessment at every stage of the
planning process, because identifying the ways in which teachers will assess their
students will help clarify what it is that teachers want them to learn, and this in
turn will help determine the most suitable learning activities.
This topic will discuss the general guidelines applicable to most assessment tools
when planning a test. Topics 4 and 5 will discuss objective tests and essay tests in detail. Authentic assessment tools such as projects and portfolios will be discussed in their respective topics.
3.1 PURPOSES OF CLASSROOM TESTING
Tests can refer to traditional paper-and-pencil or computer-based tests, such as
multiple choice, short answer and essay tests. Tests provide teachers with objective
feedback as to how much students are learning and how much they have
understood what they have learnt. Commercially published achievement tests to
some extent can provide evaluation of the knowledge levels of individual students,
but provide only limited instructional guidance in assessing the wide range of
skills taught in any given classroom.
Teachers know their students and they are the best assessors of their students.
Tests developed by the individual teachers for use with their own class are most
instructionally relevant. Teachers can tailor tests to emphasise the information
they consider important and to match the ability levels of their students. If
carefully constructed, classroom tests can provide teachers with accurate and
useful information about the knowledge retained by their students.
The key to this process is the test questions that are used to elicit evidence of
learning. Test questions and tasks are not just planning tools; they also form an
essential part of the teaching sequence. Incorporating the tasks into teaching and
using the evidence about the student learning to determine what happens next in
the lesson is truly an embedded formative assessment.
3.2 PLANNING A CLASSROOM TEST
A well-constructed test must have high-quality items. Such a test is an instrument that provides an accurate measure of the test taker's ability within a particular domain, so it is worth spending time writing high-quality items. In order to produce high-quality questions, the test construction has to be properly planned. Let us look at the following steps of planning a test (refer to Figure 3.1).
Figure 3.1: Planning a test
3.2.1 Deciding Its Purposes
The first step in test planning is to decide on the purpose of the test. Tests can be
used for many different purposes. If a test is to be used formatively, it should
indicate precisely what the student needs to study, and to what level. The purpose
of formative tests is to assess progress and to direct the learning process. These
tests will have limited sample of content and learning outcomes. Teachers must
prepare sufficient mix of easy and difficult items. These items are used to make
corrective prescriptions such as practice exercises for some students who do not
perform satisfactorily in the tests.
If a test is to be used summatively, the coverage of content and learning outcomes
would be different from that of formative tests. Summative tests are normally
conducted at the end of a teaching and learning phase, for example, at the end of
a course. They are used to determine the students' mastery level of the course and
help teachers to decide whether a particular student can proceed to the next level
of his or her studies. The summative tests should therefore cover the whole content
areas and learning outcomes of the course, or should at least cover a representative
sample of the contents and learning outcomes of the course. The test items are also
varied in their levels of difficulty and complexity as defined by the learning
outcomes.
Tests can also serve a diagnostic purpose. Diagnostic tests are used to find out
what students know and do not know, and their strengths and weaknesses. They
typically happen at the start of a new phase of education, such as when students start learning a new course. The tests normally cover topics (content as well as learning
outcomes) that students will study in the upcoming course. The test items included
in the test are usually simple. In addition, diagnostic tests are used to "diagnose" the learning difficulties encountered by students. When used for this purpose, the tests will cover specific content areas and learning outcomes in the hope of unravelling the causes of the learning problems so that remediation can be implemented.
3.2.2 Specifying the Intended Learning Outcomes
The focus of instruction in a course of study is not the mere acquisition of knowledge by students but, more importantly, how they can use and apply the acquired knowledge in different and meaningful situations. The latter has been referred to as course learning outcomes (CLOs), which should cover the cognitive, affective and psychomotor domains as explained in Topic 2. In other words, the emphasis
in instruction should be on the mastery of CLOs when teachers deliver the content
covered in the topics of the course. The syllabus of a course should therefore
present not only the relevant content areas in the form of topics but also indicate
the CLOs to be achieved. A course of study might have a number of topics but only
three to five CLOs. For instance, for an Educational Assessment course, there may
be 10 topics to be covered with four CLOs, which are spread across the 10 topics
as shown in Table 3.1.
Table 3.1: Mapping of Course Learning Outcomes Across Topics

CLO1: Explain the different principles and theories of educational testing and assessment (C2)
CLO2: Compare the different methods of educational testing and assessment (C4)
CLO3: Develop different assessment methods for use in the classroom (C3)
CLO4: Critically evaluate the suitability of different assessment methods for use in the classroom (C6)

Topic   CLO1   CLO2   CLO3   CLO4
1        x
2        x
3        x      x      x
4               x      x      x
5               x
6               x      x
7                      x      x
8                      x
9                             x
10                            x

Note: The parentheses indicate the levels of complexity according to Bloom's taxonomy.
In line with the principle of constructive alignment, assessment of a course should
also focus on the mastery of CLOs. CLOs are normally written in general terms.
Under each topic, the learning outcome is more specific and is often referred to as
an intended learning outcome (ILO). In assessing a topic of a course, it is
imperative that its ILO is clearly specified. Table 3.2 gives an example of a CLO and its related ILO for a specific topic in the Educational Assessment course (i.e. Portfolio Assessment).
Table 3.2: Example of Course Learning Outcome (CLO) and Intended Learning Outcome (ILO)

CLO: Critically evaluate the suitability of different assessment methods for use in the classroom (C6)
ILO: Critically evaluate the usefulness of portfolios as an assessment tool (C6)
A word of caution: remember, not all ILOs can be assessed by tests. Tests are only appropriate for assessing cognitive learning outcomes. For example, of the following three intended learning outcomes (ILOs), only ILO 1 can be assessed by a test using an essay question. On the other hand, ILO 2, which belongs to the psychomotor domain, is more appropriately assessed by practical work via teacher observation, while ILO 3, which belongs to the affective domain, may be assessed during the implementation of the class project via peer evaluation.

ILO 1: Explain the differences among the cognitive, affective and psychomotor domains of learning outcomes.
ILO 2: Demonstrate the proper technique of executing a table tennis top-spin in service.
ILO 3: Work collaboratively with other students in the team to complete the class project.
SELF-CHECK 3.1

1. What type of learning outcome in Bloom's taxonomy can be assessed by tests? Why?

2. How is an intended learning outcome (ILO) different from a course learning outcome (CLO)?
3.2.3 Selecting Best Item Types
Once the intended learning outcomes (ILOs) for the topics to be assessed have been
specified, the next step in planning a test is to select the best item types. Different
item types have different purposes and differ in their usefulness. Table 3.3 shows two common item types used in a test, namely multiple-choice and essay questions, and their respective purposes and usefulness. Refer to Topics 4 and 5 for more details.
Table 3.3: Item Types and Their Respective Purposes and Usefulness

1. Multiple-choice
   • Test for factual knowledge
   • Assess a large number of items
   • Score rapidly, accurately and objectively

2. Essay
   • Require candidates to write an extended piece on a certain topic
   • Assess higher-order thinking skills such as analysing, synthesising and evaluating in Bloom's taxonomy
It is thus imperative that the item types selected for assessment are relevant to the ILO to be assessed. There must be a close match between the ILOs and the types of items to be used. For example, if the ILO is to develop the ability to organise ideas, a multiple-choice test would be a poor choice. The best item type would be an essay question. The following are two intended learning outcomes (refer to Table 3.4). Can you select the best item types to assess them?
Table 3.4: Examples of Intended Learning Outcomes

ILO 1: Discuss the usefulness of portfolios as an assessment tool in education.
ILO 2: Define what a portfolio is.
ILO 1 requires students to present a discussion. They need to thoroughly review,
examine, debate or argue the pros and cons of a subject. To do this, they need to
write an extended response. ILO 1 can only be assessed by an essay test. However,
ILO 2 merely requires students to identify a definition. A multiple-choice question
(MCQ) is good enough to perform the assessment task.
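This matching step can also be expressed as a rough rule of thumb. The following Python sketch is an assumption for illustration only (the suggest_item_type function and the C3 cut-off are our own, not a rule from this module): outcomes at the lower Bloom levels can usually be handled by objective items, while outcomes that call for extended discussion suit essay questions.

```python
# Hypothetical heuristic (illustrative only): match an ILO's Bloom level code
# to a likely item type. The cut-off is an assumption; objective items can
# reach C3, while "discuss"-type ILOs generally need extended responses.

def suggest_item_type(bloom_level):
    """Map a level code such as 'C1'..'C6' to a likely item type."""
    level = int(bloom_level[1])          # 'C4' -> 4
    return "objective (e.g. MCQ)" if level <= 3 else "essay"

print(suggest_item_type("C1"))  # objective (e.g. MCQ) -- 'define a portfolio'
print(suggest_item_type("C6"))  # essay -- 'discuss usefulness of portfolios'
```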
ACTIVITY 3.1

The following is a list of learning outcomes. Identify the best item type (MCQ or essay) to assess each of them.

1. Name the levels of Bloom's taxonomy and identify the intellectual behaviour each refers to.
2. Devise a table of specifications, complete with information on what to assess and how to assess.
3. Discuss the strengths and weaknesses of using essay questions as an assessment tool.
4. Defend the use of portfolios for classroom assessment.
5. Define norm-referenced and criterion-referenced assessments.
6. Explain the purposes of assessment in education.
7. Describe the process involved in planning a test.
8. Illustrate the use of item analysis in assessing the quality of an MCQ.
9. Differentiate between formative and summative assessments.
10. Develop appropriate scoring rubrics as marking schemes for essay questions.
11. State the advantages and disadvantages of multiple-choice items as an assessment tool.
12. Examine the usefulness of project work as an assessment tool.
3.2.4 Developing a Table of Specifications
Preparing a test blueprint or table of specifications is the next important step for teachers. The table presents the topics of the course, the cognitive complexity levels of the test items according to Bloom's taxonomy, and the number of test items corresponding to the number of hours devoted to the topics and course learning outcomes in class. In fact, the decision of exactly how many test items to
include in a test is based on the importance of the topics and learning outcomes as
indicated by student learning time, the item types used, and also the amount of
time available for testing.
A table of specifications is a two-way table with the cognitive complexity levels
across the top, and the topics and course learning outcomes to be covered by a test
and hours of interaction down one side. The item numbers associated with each
topic are presented under the complexity level as determined by the CLO.
Table 3.5 presents an example of a table of specifications with MCQs as the item
type. For ease of understanding, let us assume that the test will only cover the
first three complexity levels of Bloom's taxonomy, namely Knowledge (C1), Comprehension (C2) and Application (C3).
Table 3.5: Table of Specifications: MCQ Item Type

Topic   Hours of Interaction   % Hours   Items at C1   Items at C2   Items at C3   Total Items (Marks)
1                4               20           2             4             -                6
2                6               30           4             5             -                9
3               10               50           3             6             6               15
Total           20              100           9            15             6               30
In this example, the vertical columns on the left of the two-way table show a list of the topics covered in class and the amount of time spent on those topics. The amount of time spent on the topics, as shown in the column "Hours of Interaction", is used as a basis to compute the weightage or percentage (% hours) and the marks allocated. For a test with MCQs, the marks allocated also indicate the number of test items for each topic.

In this hypothetical case, the teacher has spent 20 hours teaching the three topics, of which 4 hours are allotted to Topic 1. Thus, 4 hours out of a total of 20 hours amount to 20 per cent, or six items out of the total of 30 items as planned by the teacher. Likewise, the weightage or percentage and the marks allotted for Topics 2 and 3 are computed in the same manner. The weightage and number of items for Topic 2 are 30 per cent and nine items respectively. For Topic 3, they are 50 per cent and 15 items respectively.
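The weighting arithmetic described above is simple enough to sketch in code. The snippet below is illustrative only (the allocate_items helper and the topic labels are our own names, not part of the module); it reproduces the 6, 9 and 15 item split used in Table 3.5.

```python
# Illustrative sketch: allocate MCQ items to topics in proportion to
# teaching hours, as described for Table 3.5. Names are hypothetical.

def allocate_items(hours_per_topic, total_items):
    """Return the number of items per topic, weighted by teaching hours."""
    total_hours = sum(hours_per_topic.values())
    allocation = {}
    for topic, hours in hours_per_topic.items():
        weight = hours / total_hours                      # e.g. 4/20 = 0.20
        allocation[topic] = round(weight * total_items)   # e.g. 0.20 * 30 = 6
    return allocation

hours = {"Topic 1": 4, "Topic 2": 6, "Topic 3": 10}
print(allocate_items(hours, total_items=30))
# {'Topic 1': 6, 'Topic 2': 9, 'Topic 3': 15}
```

Note that rounding can leave the total slightly off the planned number of items, in which case the teacher adjusts an allocation by hand.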
Based on the cognitive complexity level of the CLO for each topic, the teacher will decide on the number of items to be included under each level. This information is presented in the cells of the Item No. column. For example, since the cognitive complexity level of CLO1 for Topic 1 is C2, the teacher has decided to have two items at C1 (i.e. items 1 and 2) and four items at C2 (i.e. items 10, 11, 12 and 13). Of course, he or she can decide to have all six items framed at C2, but not at C3. For Topic 2, the number of items required is nine at C2. Again, the teacher has decided to have some items at C1 (i.e. four items) and the rest at C2 (i.e. five items) to make up the required number of items. Topic 3 seems to be the most important topic and it requires 15 items, i.e. half of the total in the test; the teacher has decided to have three items at C1, six items at C2 and another six items at C3.
Overall, of the total 30 items in the test, 30 per cent of them are at C1, 50 per cent
at C2 and 20 per cent at C3. The teacher, of course, might have a reason for such a
distribution. Perhaps, he or she feels that this is the beginning of the course, and
he or she wants to focus on the understanding of the key concepts of the course.
Whatever it is, the decision is the prerogative of the teacher who knows best on
what and how he or she wants to assess the students.
Table 3.6 is another example of a table of specifications. The table focuses on essay
items.
Table 3.6: Table of Specifications: Essay Questions

Section   Topic   Hours of Interaction   % Hours   No. of Items    Marks
A           1              5               10       2 x 5 marks      10
A           2              5               10       2 x 5 marks      10
B           3             10               20       2 x 10 marks     20
B           4             10               20       2 x 10 marks     20
C           5             20               40       2 x 20 marks     40
Total                     50              100       10 items        100
The first vertical column on the left presents the five topics identified for assessment, followed by the hours of interaction for each topic in the second column. Based on this information, the teacher can work out the weightage in terms of % hours and marks for each topic. In this hypothetical case, the teacher has spent 50 hours teaching the five topics, of which 5 hours each are allotted to Topics 1 and 2. Thus, 5 hours out of a total of 50 hours amount to 10 per cent, or 10 marks out of the total of 100 marks as planned by the teacher. Likewise, the weightage or percentage and the marks allotted for Topics 2, 3, 4 and 5 are computed in the same manner; they are 10, 20, 20 and 40 marks respectively.

Based on the marks allotted and the cognitive complexity levels of the CLOs, the teacher then decides how he or she is going to distribute the marks according to the levels of complexity. For example, for Topic 1, he or she can have one essay item carrying 10 marks at C3, or two essay items, one at C2 and the other at C3, each carrying 5 marks. In this hypothetical case, the teacher has decided to have two items for Topic 1 and another two for Topic 2, each carrying 5 marks. These four items make up Section A of the test. For Topic 3, he or she has decided to distribute the 20 marks between two items, each carrying 10 marks. He or she has done the same for Topic 4. This makes up Section B. Section C is allotted to Topic 5 with two items, each carrying 20 marks. This is just an example of how the marks for each topic are distributed and the number of items decided. There can be many other variations, of course.
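The same hours-to-marks arithmetic underlies Table 3.6. Here is a brief illustrative sketch (the equal two-items-per-topic split mirrors this particular example and is an assumption, not a fixed rule):

```python
# Illustrative sketch: hours -> weight -> marks per topic for the essay test,
# then an assumed two-item split per topic, mirroring the worked example.

hours = {"Topic 1": 5, "Topic 2": 5, "Topic 3": 10, "Topic 4": 10, "Topic 5": 20}
total_marks = 100
total_hours = sum(hours.values())  # 50

marks = {t: h / total_hours * total_marks for t, h in hours.items()}
print(marks["Topic 5"])  # 40.0

# Two items per topic, marks shared equally (one of many possible splits).
items = {t: (m / 2, m / 2) for t, m in marks.items()}
print(items["Topic 5"])  # (20.0, 20.0) -> Section C: two 20-mark essays
```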
So far, we have looked at the table of specifications in the form of a two-way table. A table of specifications can also be in the form of a three-way table, with item types as an additional level. Whatever the format, the table of specifications is a very useful document in assessment. This kind of table ensures that a fair and representative sample of items or questions appears in the test. Teachers cannot measure every piece of content in the syllabus and cannot ask every question they might wish to ask. A table of specifications allows the teacher to construct a test which focuses on the key contents as defined by the weights in percentages given to them. A table of specifications provides the teacher with evidence that a test has content validity, that is, that it covers what should be covered. This table also allows the teacher to view the test as a whole.
The teacher, especially a newly trained one, is advised to have this table of specifications, together with the subject syllabus, reviewed by a subject expert or the subject head of department to check whether the test plan includes what it is supposed to measure. In other words, it is important that the table of specifications has content validity. To ensure this, the students should ideally not be given choices in a test. Without choices, all students are assessed equally.
SELF-CHECK 3.2
What is a table of specifications?
ACTIVITY 3.2

1. Have you used a table of specifications?

2. Identify a course of your choice and prepare a table of specifications for a test.

Share your answers with your coursemates in the myINSPIRE online forum.
3.2.5 Constructing Test Items
Once a valid table of specifications has been prepared, the next step is constructing the test items. While the different item types such as multiple-choice, short-answer, true-false, matching and essay items are constructed differently, the following principles apply to constructing test items in general:
(a) Make the instructions for each type of item simple and brief;
(b) Use simple and clear language in the questions;
(c) Write items that are appropriate for the learning outcomes to be measured;
(d) Do not provide a clue to, or suggest the answer of, one question in the body of another question;
(e) Avoid writing questions in the negative. If you must use negatives, highlight them, as they may mislead students into answering incorrectly;
(f) Specify the precision expected of answers;
(g) Try as far as possible to write your own questions. If you need to use questions from other sources, check to make sure they fit the learning objectives and the requirements in the table of specifications; and
(h) If an item has been revised, recheck its relevance.
In writing test items, you must also consider the length of the test as well as the
reading level of your students. You do not want students to feel rushed and
frustrated because they are not able to demonstrate their knowledge of the material
in the allotted time. Some general guidelines regarding time requirements for
secondary school student test takers are shown in Table 3.7.
Table 3.7: Allotment of Time for Each Type of Question

Task                                     Approximate Time per Item
True-false items                         20–30 seconds
Multiple choice (factual)                40–60 seconds
Multiple choice (complex)                70–90 seconds
Matching (five stems/six choices)        2–4 minutes
Short answer                             2–4 minutes
Multiple choice (with calculations)      2–5 minutes
Word problems (simple math)              5–10 minutes
Short essays                             15–20 minutes
Data analysis/graphing                   15–25 minutes
Extended essays                          35–50 minutes
If you are combining multiple-choice and essay items, these estimates may help you decide how many items of each type to include. A common mistake made by many educators is having too many questions for the time allowed.
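To see how these per-item estimates combine into a total testing time, here is a minimal sketch in Python (the item counts are hypothetical, and the per-item times are midpoints taken from Table 3.7):

    # Rough time estimate for a draft test
    items = [
        ("multiple choice (factual)", 20, 50),   # 20 items, about 50 seconds each
        ("multiple choice (complex)", 10, 80),   # about 80 seconds each
        ("short essays", 2, 17.5 * 60),          # about 17.5 minutes each
    ]

    total_seconds = sum(count * secs for _, count, secs in items)
    print(f"Estimated testing time: {total_seconds / 60:.0f} minutes")   # about 65 minutes

If the estimate exceeds the period available, trim the number of items rather than expecting students to work faster.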
Once your items are developed, make sure that you include clear directions to the
students. For the objective items, specify that they should select one answer for
each item and indicate the point value of each question, especially if you are
weighting sections of the test differently. For essay items, indicate the point value
and suggested time to be spent on the item (we will discuss essay questions in
more detail in Topic 5). If you are teaching a large class with close seating
arrangements and are giving an objective test, you may want to consider
administering several versions of your test to decrease the opportunities for
cheating. You can create versions of your test with different arrangements of the
items.
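One straightforward way to produce such versions is to shuffle the item order independently for each version while keeping a reproducible record of each arrangement. A minimal sketch (the item identifiers are hypothetical):

    import random

    items = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8"]   # hypothetical item bank

    def make_version(seed):
        # Return one arrangement of the items, reproducible from the seed
        rng = random.Random(seed)
        version = items[:]              # copy so the master order is untouched
        rng.shuffle(version)
        return version

    for v in range(3):                  # e.g. Versions A, B and C
        print(f"Version {v + 1}:", make_version(v))

Remember to keep a separate answer key for each version, since the item numbers will differ.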
More detailed guidelines to prepare and write multiple-choice, short-answer, true-false, matching and essay questions, as well as portfolios and projects, will be discussed in Topics 4, 5, 6 and 7 respectively.
ACTIVITY 3.3
To what extent do you agree with the allotment of time for each type of
question shown in Table 3.7?
Justify your answer by discussing it with your coursemates in the
myINSPIRE online forum.
3.2.6 Preparing Marking Schemes
Preparing a marking scheme well in advance of the testing date will give teachers ample time to review their questions and make changes to answers when necessary.
The teacher should make it a habit to write a model answer which can be easily understood by others. This model answer can be used by other teachers who act as external examiners, if need be. For objective test items, the model answers are simple; the marking scheme is just a list of answers with the marks allotted for each. However, for essay items, the marking schemes can be more complicated and require special skills and knowledge to prepare. The marking schemes may take the form of a checklist, a rubric or a combination of both. Refer to Topic 5 for a detailed explanation of marking schemes.
Coordination on the use of marking schemes should be done once the test answer scripts are collected. Teachers should read the answers from a sample of scripts and review the correct answers in the marking scheme. Teachers may sometimes find that students have interpreted a test question in a way that is different from what was intended. Students may come up with excellent answers that fall slightly outside what has been asked; consider giving these students marks accordingly. Likewise, teachers should make a note in the marking scheme for any error made early in an answer but carried through the rest of it; no further marks should be deducted if the rest of the response is sound.
SELF-CHECK 3.3
Why is it necessary for a test to be accompanied by a marking scheme?
3.3 ASSESSING TEACHER'S OWN TEST
Regardless of the kind of tests teachers use, they can assess their effectiveness by
asking the following questions:
(a) Did I Test for What I Thought I Was Testing for?
If you wanted to know whether students could apply a concept to a new situation, but mostly asked questions determining whether they could label parts or define terms, then you tested for recall rather than application.

(b) Did I Test What I Taught?
For example, your questions may have tested the students' understanding of surface features or procedures, while you had been lecturing on causation or relation – not so much what the names of the bones of the foot are, but how they work together when we walk.

(c) Did I Test for What I Emphasised in Class?
Make sure that you have asked most of the questions about the material you feel is the most important, especially if you have emphasised it in class. Avoid questions on obscure material that are weighted the same as questions on crucial material.

(d) Is the Material I Tested for Really What I Wanted Students to Learn?
For example, if you wanted students to use analytical skills such as the ability to recognise patterns or draw inferences, but only used true-false questions requiring non-inferential recall, you might try writing more complex true-false questions or MCQs.
Students should know what is expected of them. They should be able to identify
the characteristics of a satisfactory answer and understand the relative importance
of those characteristics. This can be achieved in many ways: you can provide
feedback on tests, describe your expectations in class, or post model solutions on
a class blog. Teachers are encouraged to make notes on the scripts. When exams
are returned to the students, the notes will help them understand their mistakes
and correct them.
SELF-CHECK 3.4
Describe the steps involved in planning a test.
•  The first step in test planning is to decide on the purpose of the test. Tests can be used for many different purposes.

•  The next step is to consider the learning outcomes and their complexity levels as defined by Bloom's taxonomy. Teachers will have to select the appropriate knowledge and skills to be assessed and include more questions about the most important learning outcomes.

•  The learning outcomes that the teachers want to emphasise will determine not only what material to include on the test, but also the specific form the test will take.

•  Making a test blueprint or table of specifications is the next important step that teachers should take.

•  The table describes the topics, the behaviours of the students, and the number of questions on the test, corresponding to the number of hours devoted to the topics in class.

•  The table of specifications helps to ensure that there is a match between what is taught and what is tested.

•  Classroom assessment is driven by classroom teaching, which itself is driven by learning outcomes.

•  The test format used is one of the main driving factors in students' learning behaviour.
Checklist
Complexity levels of Bloom's taxonomy
Course learning outcome (CLO)
Hours of interaction
Intended learning outcome (ILO)
Marking schemes
Rubrics
Table of specifications
Topic 4: How to Assess? – Objective Tests
LEARNING OUTCOMES
By the end of the topic, you should be able to:
1. Define an objective test and list the different types of objective tests;
2. Construct short-answer questions;
3. Construct multiple-choice questions;
4. Develop true-false questions; and
5. Prepare matching questions.
INTRODUCTION
In Topic 2, we discussed the need to assess students holistically based on cognitive,
affective and psychomotor learning outcomes, and in Topic 3, we looked at the
steps involved in planning a class test.
In this topic, we will focus on using objective tests in assessing various kinds of
behaviour in the classroom. Four types of objective tests are examined and the
guidelines for the construction of each type of test are discussed. The advantages
and limitations of these types of objective tests are explained too.
4.1 WHAT IS AN OBJECTIVE TEST?
When objective tests were first used in 1845 by George Fisher in the United States, they were not well received by society. However, over the years, they have gained acceptance and are now widely used in schools, industries, businesses, professional organisations, universities and colleges. In fact, they have become the most popular format for assessing various types of human abilities, competencies and socio-emotional attributes.
What is an objective test? An objective test is a written test consisting of items or questions which require the respondent to answer by supplying a word, phrase or symbol, or by selecting from a list of possible answers. The former are referred to as supply-type items while the latter are referred to as selection-type items. The common supply-type items are short-answer questions, and the selection-type items are multiple-choice questions, true-false questions and matching questions.

The word objective means "accurate". An objective item or question is "accurate" because there is only one correct answer and the marking cannot be influenced by the personal preferences and prejudices of the marker. In other words, it is not "subjective" and not open to varying interpretations. This is one of the reasons why the objective test is popular in measuring human abilities, competencies and many other psychological attributes such as personality, interest and attitude.
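To illustrate the point about marker-independent scoring, here is a minimal sketch in Python: any marker applying the same answer key to the same responses obtains the same score (the key and responses are hypothetical):

    # Scoring a selection-type test objectively against a fixed answer key
    key     = ["B", "D", "A", "C", "B"]      # hypothetical answer key
    answers = ["B", "D", "C", "C", "B"]      # one student's responses

    score = sum(a == k for a, k in zip(answers, key))
    print(f"{score} out of {len(key)}")      # 4 out of 5, whoever does the marking

No judgement enters the process, which is exactly what makes the scoring objective.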
Figure 4.1 describes how objective tests were used in Malaysian schools since their
inception and also how they are used today.
Figure 4.1: Objective tests in Malaysian schools
Objective tests vary depending on how the questions are presented. The four
common types of questions used in most objective tests are multiple-choice
questions, true-false questions, matching questions and short-answer questions
(refer to Figure 4.2).
Figure 4.2: Common formats of objective tests
4.2 MULTIPLE-CHOICE QUESTIONS (MCQs)
Let us take a look at one of the most popular objective test formats: the multiple-choice question.
4.2.1 What is a Multiple-choice Question?
Multiple-choice questions or MCQs are widely used in many different settings because they can be used to measure low-level cognitive outcomes as well as more complex cognitive outcomes. It is challenging to write test items that tap into higher-order thinking. All the demands of good item writing can only be met when test writers have been well trained. Above all, test writers need to have expertise in the subject area being tested so that they can gauge the difficulty and content coverage of the test items.
Multiple-choice questions are the most difficult to prepare. These questions have two parts:
(a) A stem that contains the question; and
(b) Four or five options, one of which contains the correct answer, called the key response. Three-option multiple-choice questions are also gaining acceptance.

The incorrect options are called distractors. The stem may be presented as a question or a statement, while the options can be words, phrases, numbers, symbols and so forth. The role of the distractors is to attract the attention of respondents who are not sure of the correct answer.
A traditional multiple-choice question (or item) is one in which a student chooses
one answer from a number of choices supplied (as illustrated in Figure 4.3).
Figure 4.3: Multiple-choice question
(a) The stem should:
(i) Be in the form of a question or a statement to be completed;
(ii) Be expressed clearly and concisely, avoiding poor grammar, complex syntax, ambiguity and double negatives;
(iii) Generally present a positive question (if a negative is used, it should be emphasised with italics or underlining);
(iv) Generally ask for one answer only (the correct or the best answer); and
(v) Include as many of the words common to all alternatives as possible.
(b) The options or alternatives should:
(i) Number either three, four or five, all of which should be mutually exclusive and not too long;
(ii) All follow grammatically from the stem and be parallel in grammatical form;
(iii) Be unambiguous and expressed simply enough to make clear the essential differences between them; and
(iv) Include an intended answer or key that is clearly correct to the informed, while the distractors should be definitely incorrect, but plausible.
SELF-CHECK 4.1
1. What is an objective test?
2. Why are multiple-choice questions (MCQs) a popular form of objective test?
4.2.2 Construction of Multiple-choice Questions
Did you know that MCQ test writing is a profession? By that, we mean that good test writers are professionally trained in designing test items. Test writers have knowledge of the rules of constructing items, but at the same time they have the creativity to construct items that capture students' attention. Test items need to be succinct but clear in meaning.

McKenna and Bull (1999) offered some guidelines for constructing stems for multiple-choice questions. All the options in multiple-choice items need to be plausible, but they also need to separate students of different ability levels. Let us take a look at these guidelines.
(a) When writing stems, present a single, definite statement to be completed or answered by one of the several given choices (see Example 4.1).
Example 4.1:

Weak Question:
World War II was:
A. The result of the failure of the League of Nations
B. Horrible
C. Fought in Europe, Asia and Africa
D. Fought during the period of 1939–1945

Improved Question:
In which of these time periods was World War II fought?
A. 1914–1917
B. 1929–1934
C. 1939–1945
D. 1951–1955

Note: In the weak question, there is no clue from the stem to what the question is asking. The improved version identifies the question more clearly and offers the student a set of homogeneous choices.
(b) When writing stems, avoid unnecessary and irrelevant material (see Example 4.2).
Example 4.2:

Weak Question:
For almost a century, the Rhine river has been used by Europeans for a variety of purposes. However, in recent years, the increased river traffic has resulted in increased levels of diesel pollution in the waterway. Which of the following would be the most dramatic result if, because of the pollution, the Council of Ministers of the European Community decided to close the Rhine to all shipping?
A. Increased prices for Ruhr products
B. Shortage of water for Italian industries
C. Reduced competitiveness of the French Aerospace Industry
D. Closure of the busy river Rhine ports of Rotterdam, Marseilles and Genoa

Improved Question:
Which of the following would be the most dramatic result if, because of diesel pollution from ships, the river Rhine was closed to all shipping?
A. Increased prices for Ruhr products
B. Shortage of water for Italian industries
C. Reduced competitiveness of the French Aerospace Industry
D. Closure of the busy river Rhine ports of Rotterdam, Marseilles and Genoa

Note: The weak question is too wordy and contains unnecessary material.
(c) When writing stems, use clear, straightforward language. Questions that are constructed using complex wording may become a test of reading comprehension rather than an assessment of the student's performance with regard to a specific learning outcome (see Example 4.3).
Example 4.3:

Weak Question:
As the level of fertility approaches its nadir, what is the most likely ramification for the citizenry of a developing nation?
A. A decrease in the labour force participation rate of women
B. A downward trend in the youth dependency ratio
C. A broader base in the population pyramid
D. An increased infant mortality rate

Improved Question:
A major decline in fertility in a developing nation is likely to produce
A. A decrease in the labour force participation rate of women
B. A downward trend in the youth dependency ratio
C. A broader base in the population pyramid
D. An increased infant mortality rate

Note: In the improved question, the word "nadir" is replaced with "decline" and "ramifications" is replaced with "produce". These are more straightforward words.
(d) When writing stems, use negatives sparingly. If negatives must be used, capitalise, underscore or bold them (see Example 4.4).
Example 4.4:

Weak Question:
Which of the following is not a symptom of osteoporosis?
A. Decreased bone density
B. Frequent bone fractures
C. Raised body temperature
D. Lower back pain

Improved Question:
Which of the following is a symptom of osteoporosis?
A. Decreased bone density
B. Raised body temperature
C. Painful joints
D. Hair loss

Note: The improved question is stated in the positive so as to avoid the use of the negative "not".
(e) When writing stems, put as much of the question in the stem as possible, rather than duplicating material in each of the options (see Example 4.5).
Example 4.5:

Weak Question:
Theorists of pluralism have asserted which of the following?
A. The maintenance of democracy requires a large middle class
B. The maintenance of democracy requires autonomous centres of countervailing power
C. The maintenance of democracy requires the existence of a multiplicity of religious groups
D. The maintenance of democracy requires the separation of governmental powers

Improved Question:
Theorists of pluralism have asserted that the maintenance of democracy requires
A. A large middle class
B. The separation of governmental powers
C. Autonomous centres of countervailing power
D. The existence of a multiplicity of religious groups

Note: In the improved question, the phrase "maintenance of democracy" is included in the stem so as not to duplicate it in each option.
(f) When writing stems, avoid giving away the answer through grammatical cues (see Example 4.6).
Example 4.6:

Weak Question:
A fertile area in the desert in which the water table reaches the ground surface is called an
A. Mirage
B. Oasis
C. Lake
D. Polder

Improved Question:
A fertile area in the desert in which the water table reaches the ground surface is called a/an
A. Lake
B. Mirage
C. Oasis
D. Polder

Note: The weak question uses the article "an" which identifies choice B as the correct response. Ending the stem with "a/an" improves the question.
(g) When writing stems, avoid asking for an opinion as much as possible.
(h) Avoid using the words "always" and "never" in the stem, as test-wise students are likely to rule such universal statements out of consideration.
(i) When writing distractors for single-response MCQs, make sure that there is only one correct response (see Example 4.7).
Example 4.7:

Weak Question:
What is the main source of pollution of Malaysian rivers?
A. Land clearing
B. Open burning
C. Solid waste dumping
D. Coastal erosion

Improved Question:
What is the main source of pollution of Malaysian rivers?
A. Open burning
B. Coastal erosion
C. Solid waste dumping
D. Carbon dioxide emission

Note: In the weak question, both options A and C can be considered correct.
(j) When writing distractors, use only plausible and attractive alternatives (see Example 4.8).
Example 4.8:

Weak Question:
Who was the third Prime Minister of Malaysia?
A. Hussein Onn
B. Ghafar Baba
C. Mahathir Mohamad
D. Musa Hitam

Improved Question:
Who was the third Prime Minister of Malaysia?
A. Hussein Onn
B. Abdullah Badawi
C. Mahathir Mohamad
D. Abdul Razak Hussein

Note: In the weak question, B and D are not serious distractors.
(k) When writing distractors, if possible, avoid the choices "All of the above" and "None of the above". If you do include them, make sure that they appear as correct answers some of the time.
It is tempting to resort to these alternatives but their use can be flawed. To begin with, they often appear as an alternative that is not the correct response. If you do use them, be sure that they constitute the correct answer part of the time. An "All of the above" alternative can be exploited by a test-wise student who will recognise it as the correct choice by identifying only two correct alternatives. Similarly, a student who can identify one wrong alternative can then rule this response out. Clearly, the student's chance of guessing the correct answer improves as he or she employs these techniques. Although a similar process of elimination is not possible with "None of the above", it is the case that when this option is used as the correct answer, the question is only testing the student's ability to rule out the wrong answers and this does not guarantee that the student knows the correct one (Gronlund, 1988).
(l) Distractors based on common student errors or misconceptions are very effective. One technique for compiling distractors is to ask students to respond to open-ended short-answer questions, perhaps as formative assessments. Identify which incorrect responses appear most frequently and use them as distractors for a multiple-choice version of the question.
(m) Do not create distractors that are so close to the correct answer that they may
confuse students who really know the answer to the question. Distractors
should differ from the key in a substantial way, not just in some minor
nuance of phrasing or emphasis.
(n) Provide a sufficient number of distractors. You will probably choose to use three, four or five alternatives in a multiple-choice question. Until recently, it was thought that three or four distractors were necessary for the item to be suitably difficult. However, a study by Owen and Freeman suggests that three choices are sufficient (Owen & Freeman, 1987). Clearly, the higher the number of distractors, the less likely it is for the correct answer to be chosen through guessing, provided that all alternatives are of equal difficulty.
ACTIVITY 4.1
1. Do you agree that you should not use negatives in the stems of MCQs? Why?
2. Do you agree that you should not use "All of the above" and "None of the above" as distractors in MCQs? Why?
3. Select 10 multiple-choice questions in your subject area and analyse the distractors of each item using the guidelines mentioned earlier.
4. Suggest how you would improve weak distractors.
Share your answers with your coursemates in the myINSPIRE online forum.
4.2.3 Advantages of Multiple-choice Questions
Multiple-choice questions are widely used to measure knowledge outcomes and various other types of learning outcomes. They are popular for the following reasons:
(a) Learning outcomes from simple to complex can be measured;
(b) Highly structured and clear tasks are provided;
(c) A broad sample of achievement can be measured;
(d) Incorrect alternatives or options provide diagnostic information;
(e) Scores are less influenced by guessing than true-false items;
(f) Scores are more reliable than subjectively scored items (such as essays);
(g) Scoring is easy, objective and reliable;
(h) Item analysis can reveal how difficult each item was and how well it discriminated between the stronger and weaker students in the class (see the sketch after this list);
(i) Performance can be compared from class to class and year to year;
(j) A lot of material can be covered very efficiently (about one item per minute of testing time); and
(k) Items can be written so that students must discriminate among options that vary in degree of correctness.
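As a rough illustration of point (h), item difficulty is commonly estimated as the proportion of students answering an item correctly, and discrimination as the difference between the proportions correct in the upper and lower scoring groups. The following is a minimal sketch in Python; the response data and the thirds-based group split are hypothetical:

    # 0/1 scores for one item, students ordered from strongest to weakest
    # on the total test score (hypothetical data)
    responses = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]

    n = len(responses)
    difficulty = sum(responses) / n          # proportion correct, the p-value

    k = n // 3                               # upper and lower thirds of the class
    upper = sum(responses[:k]) / k
    lower = sum(responses[-k:]) / k
    discrimination = upper - lower           # positive values favour stronger students

    print(f"difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")

Here the item has a difficulty of 0.60 and a discrimination of about 0.67, suggesting it separates stronger from weaker students reasonably well.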
4.2.4 Limitations of Multiple-choice Questions
While there are many advantages of using multiple-choice questions, there are also many limitations in using such items. These limitations are:
(a) Constructing good items is time-consuming;
(b) It is frequently difficult to find plausible distractors;
(c) MCQs are not as effective for measuring some types of problem-solving skills and the ability to organise and express ideas;
(d) Scores can be influenced by students' reading ability;
(e) There is a lack of feedback on individual thought processes – it is difficult to determine why individual students selected incorrect responses;
(f) Students can sometimes read more into the question than was intended;
(g) They often focus on testing factual information and fail to test higher levels of cognitive thinking;
(h) Sometimes, there is more than one defensible "correct" answer;
(i) They place a high degree of dependence on the student's reading ability and the constructor's writing ability;
(j) They do not provide a measure of writing ability; and
(k) They may encourage guessing.
Last but not least, let us look at Figure 4.4 which highlights some procedural rules
when constructing multiple-choice questions.
Figure 4.4: Procedural rules for the construction of multiple-choice questions
SELF-CHECK 4.2
1. What are some advantages of using multiple-choice questions?
2. List some limitations or weaknesses of multiple-choice questions.
4.3 TRUE-FALSE QUESTIONS
The next type of objective test is the true-false question. Here, we will discuss the
rationale for its use as well as its limitations.
4.3.1 What are True-False Questions?
In the most basic format, true-false questions are those in which a statement is presented and the student indicates in some manner whether the statement is true or false. In other words, there are only two possible responses for each item and the student chooses between them. A true-false question is a specialised form of the multiple-choice format in which there are only two possible alternatives. These questions can be used when the test designer wishes to measure a student's ability to identify whether statements of fact are accurate or not. True-false questions can be used for testing knowledge and judgement in many subjects. When grouped together, a series of true-false questions on a specific topic or scenario can test a more complex understanding of an issue. They can be structured to lead a student through a logical pathway and can reveal part of the thinking process employed by the student in order to solve a given problem. Let us see Example 4.9.
Example 4.9:
A whale is a mammal because it gives birth to its young. (True / False)
4.3.2 Advantages of True-False Questions
True-false questions can be written quickly and can cover a lot of content. They are well suited for testing students' recall or comprehension. Students can generally respond to many questions covering a lot of content in a fairly short amount of time. From the teacher's perspective, these questions are easy to score. Since they can be objectively scored, the scores are more reliable than for items that are at least partially dependent on the teacher's judgement. Generally, they are easier to construct than multiple-choice questions because there is no need to develop distractors. Hence, they are less time-consuming to prepare than multiple-choice questions.
4.3.3 Limitations of True-False Questions
However, true-false questions have a number of limitations, notably:

(a) Guessing
A student has a one-in-two chance of guessing the correct answer to a question. Scores on true-false items tend to be high because of the ease of guessing correct answers when the answers are not known. With only two choices (true or false), the student can expect to guess correctly on half of the items for which the correct answers are not known. Thus, if a student knows the correct answers to 10 questions out of 20 and guesses on the other 10, the student can expect a score of 15. The teacher can anticipate scores ranging from approximately 50 per cent for a student who did nothing but guess on all items to 100 per cent for a student who knew the material.
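Expressed as a small formula (added here for clarity): if a student knows k of the n items and guesses the rest, the expected score is

    E[score] = k + (n − k)/2

With k = 10 and n = 20, E[score] = 10 + 10/2 = 15, matching the example above.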
(b) Tendency to Use the Original Text
Since these items are in the form of statements, there is sometimes a tendency to take quotations from the text, expecting the student to recognise a correct quotation or note a change (sometimes minor) in wording. There may also be a tendency to include trivial or inconsequential material from the text. Both of these practices are discouraged.

(c) Difficult to Set
It can be difficult to write a statement which is unambiguously true or false, particularly for complex material.

(d) Unable to Discriminate Different Abilities
The format does not discriminate among students of different abilities as well as other question types do.
4.3.4 Suggestions for Constructing True-False Questions
Here are some suggestions for constructing true-false questions:

(a) Include only one main idea in each item (see Example 4.10).

Example 4.10:
Poor Item: The study of biology helps us understand living organisms and predict the weather.
Better Item: The study of biology helps us understand living organisms.
(b) As in multiple-choice questions, use negatives sparingly. Also avoid double negatives, as they tend to contribute to the ambiguity of the statement. Words like none, no and not should be avoided as far as possible (see Example 4.11).

Example 4.11:
Poor Item: None of the steps in the experiment were unnecessary.
Better Item: All the steps in the experiment were necessary.
(c) Avoid broad, general statements. Most of these statements are false unless qualified (see Example 4.12).

Example 4.12:
Poor Item: Short-answer questions are more favourable than essay questions in testing.
Better Item: Short-answer questions are more favourable than essay questions in testing factual information.
(d) Avoid long, complex sentences. Such sentences also test reading comprehension besides the achievement to be measured (see Example 4.13).
Example 4.13:
Poor Item: Despite the theoretical and experimental difficulties of determining the exact pH value of a solution, it is possible to determine whether a solution is acidic by the red colour formed on litmus paper when inserted into the solution.
Better Item: Litmus paper turns red in an acidic solution.
(e) Try using true-false questions in combination with other materials, such as graphs, maps and written material. This combination allows for the testing of more advanced learning.
(f) Avoid lifting statements directly from assigned readings, notes or other course materials so that recall alone will not lead to a correct answer.
(g) In general, avoid the use of words which would signal the correct response to the test-wise student. Absolutes such as "none", "never", "always", "all" and "impossible" tend to be false, while qualifiers such as "usually", "generally", "sometimes" and "often" are likely to be true.
(h) A similar situation occurs with the use of "can" in a true-false statement. If the student knows of a single case in which something "can" be done, it would be true.
(i) Ambiguous or vague statements and terms, such as "largely", "long time", "regularly", "some" and "usually", are best avoided in the interest of clarity. Some terms have more than one meaning and may be interpreted differently by individuals.
(j) True statements should be about the same length as false statements. There is a tendency to add details in true statements to make them more precise.
(k) Word the statement so precisely that it can be judged unmistakably as either true or false.
(l) Statements of opinion should be attributed to some source.
(m) Keep statements short and use simple language structure.
(n) Avoid verbal clues (specific determiners) that can indicate the answer.
(o) Test important ideas rather than trivia.
(p) Do not present items in an easily learned pattern.
4.4 MATCHING QUESTIONS
Matching questions are used to measure a student's ability to identify the relationship between two lists of terms, phrases, statements, definitions, dates, events, people and so forth. In addition, one matching question can replace several true-false questions.
4.4.1 Construction of Matching Questions
In developing matching questions, you have to identify two columns of material listed vertically. The items in Column A (or I) are usually called premises and are assigned numbers (1, 2, 3 and so on), while the items in Column B (or II) are called responses and are designated capital letters (A, B, C and so on). The student reads a premise (Column A) and finds the correct response from among those in Column B. The student then prints the letter of the correct response in the blank beside the premise in Column A.

An alternative is to have the student draw a line from the correct response to the premise, but this is more time-consuming to score. One way to reduce the possibility of guessing correct answers is to list a larger number of responses (Column B) than premises (Column A), as shown in Example 4.14:
Example 4.14:

Directions: Column A contains statements describing selected Asian cities. For each description, find the appropriate city in Column B. Each city in Column B can be used only once.

Column A
1. Ancient capital of Thailand: ___________
2. Largest city in Sumatera: ___________
3. Former capital of Myanmar: ___________
4. Formerly known as Saigon: ___________
5. Former capital of Pakistan: ___________

Column B
A. Ayutthaya
B. Ho Chi Minh City
C. Karachi
D. Medan
E. Yangon
F. Hanoi
G. Surabaya
Another way to decrease the possibility of guessing is to allow responses to be
used more than once. Directions to the students should be very clear about the use
of responses.
Some psychometricians suggest that no more than five to eight premises
(Column A) in one set are given. For each premise, the student has to read through
the entire list of responses (or those still unused) to find the matching response.
For this reason, the shorter elements should be in Column B, rather than Column
A, to minimise the amount of reading needed for each item. Responses (Column
B) should be listed in logical order if there is one (chronological, by size and so on).
If there is no apparent order, the responses should be listed alphabetically.
Premises (Column A) should not be listed in the same order as the responses. Care
must be taken to ensure that the association keyed as the correct response is
unquestionably correct and that the numbered item could not be rightly associated
with any other choice.
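To see how much listing more responses than premises reduces guessing, a small simulation helps. The following is a minimal sketch in Python (the 5-premise/7-response sizes mirror Example 4.14; the assumption that the key pairs premise i with response i is for illustration only):

    import random

    def guess_once(n_premises=5, n_responses=7):
        # One student guessing: assign 5 distinct responses to the 5 premises at random
        picks = random.sample(range(n_responses), n_premises)
        key = list(range(n_premises))            # hypothetical answer key
        return sum(p == k for p, k in zip(picks, key))

    trials = 100_000
    avg = sum(guess_once() for _ in range(trials)) / trials
    print(f"Average correct by guessing: {avg:.2f} out of 5")   # about 0.71

With equal-length columns, the expected number of correct guesses is 1 out of 5; the two extra responses push it down to about 5/7, or roughly 0.71.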
4.4.2 Advantages of Matching Questions
Like other types of assessments, there are advantages and disadvantages to matching questions as well. Let us go through the advantages first.

(a) Matching questions are particularly good at assessing a student's understanding of relationships. They can test recall by requiring a student to match the following elements (McBeath, 1992):
(i) Definitions – Terms;
(ii) Historical events – Dates;
(iii) Achievements – People;
(iv) Statements – Postulates; and
(v) Descriptions – Principles.

(b) They can also assess a student's ability to apply knowledge by requiring a test-taker to match the following:
(i) Examples – Terms;
(ii) Functions – Parts;
(iii) Classifications – Structures;
(iv) Applications – Postulates; and
(v) Problems – Principles.
(c) The matching question format is really a variation of the multiple-choice format. If you find that you are writing MCQs which share the same answer choices, you may consider grouping the questions into a matching item.
(d) Matching questions are generally easy to write and score when the content tested and the objectives are suitable for the matching format.
(e) Matching questions are highly efficient, as a large amount of knowledge can be sampled in a short amount of time.
4.4.3 Limitations of Matching Questions
There are also limitations when using this type of assessment, such as:
(a) Matching questions are limited to material that can be listed in two columns, and there may not be much material that lends itself to such a format;
(b) If there are four items in a matching question and the student knows the answers for three of them, the fourth item is a giveaway through elimination;
(c) It is difficult to differentiate between effective and ineffective items;
(d) They often lead to the testing of trivial facts or bits of information; and
(e) They are often criticised for encouraging rote memorisation.
4.4.4 Suggestions for Constructing Good Matching Questions
When assessing students, we must prepare quality questions. Here are some suggestions for constructing good matching questions:
(a) Provide clear directions. They should explain how many times responses can be used;
(b) Keep the information in each column as homogeneous as possible;
(c) Include more responses than premises, or allow the responses to be used more than once;
(d) Put the items with more words in Column A;
(e) Correct answers should not be obvious to those who do not know the content being taught;
(f) There should not be keywords appearing in both the premise and the response, providing clues to the correct answer; and
(g) All of the responses and premises for a matching item should appear on the same page.
SELF-CHECK 4.3
1. What are some advantages of matching questions?
2. List some limitations of matching questions.
4.5 SHORT-ANSWER QUESTIONS
A short-answer question is basically a supply-type item. It exists in two formats,
namely direct question and completion question formats. The following are
examples of short-answer questions (refer to Table 4.1):
Table 4.1: Direct Question versus Completion Question

Direct Question:
•  Who was the first Prime Minister of Malaysia? (Answer: Tunku Abdul Rahman)
•  What is the value of x in the equation 2x + 5 = 9? (Answer: 2)

Completion Question:
•  The first Prime Minister of Malaysia was ___________. (Answer: Tunku Abdul Rahman)
•  In the equation 2x + 5 = 9, x = ___________. (Answer: 2)
You may refer to Nitko (2004) for more examples.
4.5.1 Strengths and Weaknesses of Short-answer Questions
Short-answer questions are generally used to measure simple learning outcomes. They are used almost exclusively to measure memorised information (except for learning outcomes on problem-solving in Mathematics and Science). This has partly made the short-answer question one of the easiest to construct.
Another strength of short-answer questions is that they reduce the possibility of guessing, which often occurs with selection-type items. Learners must supply the correct answer when they respond to the question. They must either recall the information asked for or make the necessary computations to obtain the answer. They cannot rely on partial knowledge to choose the correct answer from a list of alternatives.
Many short-answer questions can be set for a specific period of time. A test paper of short-answer questions is thus able to cover a fairly wide range of the course content to be assessed. This enhances the content validity of the test.
One major weakness of short-answer questions is that they cannot be used to measure complex learning outcomes such as organising ideas, presenting an argument or evaluating information. What is required of learners is simply to provide a word, phrase or symbol.
Scoring of answers to short-answer questions can also pose a problem. Unless the question is carefully phrased, learners can provide answers of varying degrees of correctness. For example, the answer to a question such as "When was Malaysia formed?" could either be "In 1963" or "On 16 September 1963". The teacher has to decide whether learners who give the partial answer have the same level of knowledge as those who provide the complete answer.
In addition, learners' answers can be contaminated by spelling errors. If spelling is taken into consideration, the test scores of learners will reflect their level of knowledge of the content assessed as well as their spelling ability. If spelling is not considered in the scoring, the teacher has to decide whether a misspelled word actually represents the correct answer.
4.5.2 Guidelines on Constructing Short-answer Questions
Although the construction of short-answer questions is comparatively easier than that of other types of objective items, they have a variety of defects which should be avoided to ensure that they will function as intended. The following are some guidelines for the construction of short-answer questions.

(a) Word the question so that the intended answer is brief and specific. As far as possible, the question should be phrased in such a way that only one answer is correct (see Example 4.15).
Example 4.15:
Poor Item: An animal that eats the flesh of other animals is ____________. (Possible answers: a wolf, a lion, hungry, etc.)
Better Item: An animal that eats the flesh of other animals is classified as ______________. (One specific answer: carnivorous)

(b) Use direct questions instead of incomplete statements. The meaning of the items is often clearer if they are phrased as direct questions (see Example 4.16).
Example 4.16:
Poor Item: The author of Alice in Wonderland was ______________. (Possible answers: a story writer, a mathematician, an Englishman, and buried in 1898)
Better Item: What is the pen name of the author of Alice in Wonderland? (Answer: Lewis Carroll)
(c) If the question requires a numerical answer, indicate the units in which the answer is to be expressed (see Example 4.17).
Example 4.17:
Poor Item: When did Columbus discover America? (Possible answers: the 15th century, 1492)
Better Item: In what year did Columbus discover America? (Answer: 1492)

(d) For an incomplete statement type of question, put the blank towards the end of the sentence (see Example 4.18).
Example 4.18:
Poor Item: ____________ is the capital of Malaysia.
Better Item: The capital of Malaysia is _____________. (Answer: Kuala Lumpur)

(e) For an incomplete statement type of question, limit the blanks to one or two. If there are more than two blanks in a statement, the question becomes unintelligible or ambiguous (see Example 4.19).
Example 4.19:
Poor Item: _________ and __________ are two methods of scoring _________.
Better Item: Two different methods of scoring essay tests are the __________ and ___________ methods. (Answers: analytic, holistic)

(f) Avoid irrelevant clues (see Example 4.20).
Example 4.20:
Poor Item: A specialist in urban planning is called an ___________. (Answer: urbanist)
Better Item: A specialist in city planning is called a(n) ____________. (Answer: urbanist)
(g) Do not copy statements verbatim from textbooks. When you copy material, you encourage students to do rote memorisation.
(h) A completion item should omit important words, not trivial words. Use the item to assess a student's knowledge of an important fact or concept.
(i) Keep all the blanks of completion items the same length so as not to cue the students to the possible answer.
SELF-CHECK 4.4
1. What are the strengths of short-answer questions?
2. Elaborate on some weaknesses of short-answer questions.
ACTIVITY 4.2
1. Select five true-false questions in your subject area and analyse each item using the guidelines mentioned earlier.
2. Select five matching questions in your subject area and analyse each item using the guidelines mentioned earlier.
3. Suggest how you would improve the weak items for each type of question.
Share your answers with your coursemates in the myINSPIRE online forum.
•  An objective test is a written test consisting of items or questions which require the respondent to select from a list of possible answers or to supply a short answer. An objective item or question is "accurate" because its marking cannot be influenced by the personal preferences and prejudices of the marker.

•  Objective tests vary depending on how the questions are presented. The four common types of questions used in most objective tests are multiple-choice questions, true-false questions, matching questions and short-answer questions.

•  Multiple-choice questions have two parts: a stem that contains the question, and three, four or five options, one of which contains the correct answer. The correct option is called the key response and the incorrect options are called distractors.

•  Multiple-choice questions are widely used because they can measure learning outcomes from simple to complex. They are highly structured, and clear tasks are provided to test a broad sample of what has been learnt.

•  Multiple-choice questions, however, are difficult to construct, tend to measure low-level learning outcomes, lend themselves to guessing and do not measure students' writing ability.

•  True-false questions are those in which a statement is presented and the student indicates whether the statement is true or false.

•  True-false questions can be written quickly and are easy to score. Since they can be objectively scored, the scores are more reliable than for items that are at least partially dependent on the teacher's judgement.

•  Avoid lifting statements directly from assigned readings, notes or other course materials so that recall alone will not lead to a correct answer.

•  Matching questions are used to measure a student's ability to identify the relationship between two lists of terms, phrases, statements, definitions, dates, events, people and so forth.

•  To reduce the possibility of guessing correct answers in matching questions, list a larger number of responses than premises and allow responses to be used more than once.
•  In writing test items, you must consider the length of the test or examination as well as the reading level of your students.

•  The two types of short-answer questions are direct questions and completion questions.
Allotment of time
Alternatives
Distractors
Guessing
Matching questions
Multiple-choice questions
Objective tests
Premises
Responses
Short-answer questions
Stem
True-false questions
Gronlund, N. E. (1988). How to construct achievement tests. Englewood Cliffs, NJ: Prentice Hall.

McBeath, R. (1992). Instructing and evaluating in higher education: A guidebook for planning learning outcomes. Englewood Cliffs, NJ: Educational Technology.

McKenna, C., & Bull, J. (1999). Designing effective objective test questions: An introductory workshop. Retrieved from https://bit.ly/2It9v8K

Nitko, A. J. (2004). Educational assessment of students (4th ed.). Upper Saddle River, NJ: Pearson.

Owen, S. V., & Freeman, R. D. (1987). What's wrong with three-option multiple choice items? Educational & Psychological Measurement, 47, 513–522.
Topic 5: How to Assess? – Essay Tests
LEARNING OUTCOMES
By the end of the topic, you should be able to:
1. Define and list the criteria for an essay question;
2. Explain the formats of essay tests;
3. List the advantages and limitations of essay questions;
4. Construct well-written essay questions that assess given learning outcomes; and
5. Describe different types of marking schemes for essays.
INTRODUCTION
In Topic 4, we discussed in detail the use of objective tests in assessing students.
In this topic, we will examine a different type of test called the essay test. The essay
test is a popular technique for assessing learning and is used extensively at all
levels of education.
It is also widely used in assessing learning outcomes in business and professional
examinations. Essay questions are used because they challenge students to create
their own responses rather than simply selecting a response. Essay questions have
the potential to reveal students' abilities to reason, create, analyse and synthesise,
which may not be effectively assessed using objective tests.
5.1 WHAT IS AN ESSAY QUESTION?
According to Stalnaker (1951), an essay is "a test item which requires a response composed by the examinee usually in the form of one or more sentences of a nature that no single response or pattern of responses can be listed as correct, and the accuracy and quality of which can be judged subjectively only by one skilled or informed in the subject." Though the definition was provided a long time ago, it is a comprehensive definition. Elaborating on this definition, Reiner, Bothell, Sudweeks and Wood (2002) argued that to qualify as an essay question, it should meet the following four criteria:
(a) The learner has to compose rather than select his or her response or answer. In essay questions, students have to construct their own answer and decide on what material to include in their response. Objective test questions (MCQ, true-false, matching), on the other hand, require students to select the answer from a list of possibilities.

(b) The response or answer the learner provides will consist of one or more sentences. Students do not respond with a "yes" or "no" but instead have to respond in the form of sentences. In theory, there is no limit to the length of the answer. However, in most cases, its length is predetermined by the demand of the question and the time limit allotted for the test question.

(c) There is no one single correct response or answer. In other words, the question should be composed so that it does not ask for one single correct response. For example, the question "Who killed JWW Birch?" assesses verbatim recall or memory and not the ability to think. Hence, it cannot qualify as an essay question. You can modify the question to "Who killed JWW Birch? Explain the factors that led to the killing." Now, this is an essay question that assesses students' ability to think and give reasons for the killing supported with relevant evidence.

(d) The accuracy and quality of students' responses or answers to essay questions must be judged subjectively by a specialist in the subject. The nature of essay questions is such that only specialists in the subject can judge to what degree responses (or answers) to an essay question are complete, accurate and relevant. Good essay questions encourage students to think deeply about their answers, which can be judged only by someone with appropriate experience and expertise in the content area. Thus, content expertise is essential for both writing and grading essay tests. For example, the question "List three reasons for the opening of Penang by the British in 1789" requires students to recall a set list of items. The person marking or grading the essay does not have to be a subject matter expert to know whether the student has listed the three reasons correctly as long as the list of three reasons is available as an answer key. For the question "To what extent is commerce the main reason for the opening of Penang by the British in 1789?", a subject matter expert is needed to grade or mark the answer to this essay test question.
5.2 FORMATS OF ESSAY TESTS
Essay formats are usually classified into two groups: restricted response essay
questions and extended response essay questions. Both types are useful tools but
for different purposes.
(a) Restricted Response Essay Questions
Restricted response essay questions restrict or limit both the content and the form of students' answers. The following are three examples:
(i) Discuss two advantages and two disadvantages of essay questions in measuring students' performance.
(ii) List five guidelines for writing good essay items. For each guideline, write a short statement explaining why it is useful in improving the validity of essay assessment.
(iii) Distinguish formative assessment from summative assessment in terms of their aims, the timing of their implementation and their content coverage.
As shown in the examples, students are specifically informed what and how they should respond to the questions. The questions indicate the number of points required and/or the scope of the responses. The restriction or limitation on the students' responses can also be achieved by including interpretative material (e.g. a graph, a paragraph describing a particular problem or an extract from a literary work) on which students are asked to respond to one or two questions.

The restricted response questions are more structured and are useful for measuring learning outcomes requiring the interpretation and application of knowledge in a specific area. They narrow the focus of the assessment task to a specific and well-defined performance. The nature of these questions makes it more likely that the students will interpret each question the way it is intended. The teacher is also in a better position to assess the correctness of students' answers when a question is focused and all students interpret it in the same way. When the teacher is clear about what makes up correct answers, it improves scoring reliability and the validity of the scores. Although restricting students' responses makes it possible to measure more specific learning outcomes, these same restrictions make them less valuable as a measure of those learning outcomes emphasising integration, organisation and originality. For higher-order learning outcomes, greater freedom of response is needed.
(b) Extended Response Essay Questions
Extended response essay questions provide less structure, and this promotes greater creativity, integration and organisation of material. The following are three examples:
(i) Examine to what extent essay questions are effective in measuring students' performance.
(ii) Evaluate the usefulness of multiple-choice questions as an assessment tool in education.
(iii) "Research without theory is blind." Discuss.
In responding to extended response essay questions, students are free to select any information they think pertinent, to organise the answer according to their best judgement, and to integrate and evaluate ideas as they deem appropriate. This freedom enables them to demonstrate their ability to analyse problems, organise their ideas, express themselves in their own words and/or develop a coherent argument. Extended response essay questions are therefore useful in assessing higher-order thinking skills. They can also be used to assess writing skills.
The freedom given to students in responding to extended response essay questions can cause some problems. First, there is usually no single correct answer to the question. Students are free to choose how to respond, and the degree of correctness or merit of their answers can only be judged by a skilled subject-matter expert. A large number of examiners is required if the assessment involves a big student population, and inter-rater reliability in scoring can be an issue. Second, the same freedom that enables the demonstration of creative expression and other higher-order thinking skills makes the extended response essay question inefficient for measuring more specific learning outcomes. Third, extended response essay questions require good writing skills on the part of the students. This type of question is thus disadvantageous to students whose writing skills are poor. Due to these limitations, it is often recommended that restricted response essay questions be used in place of extended response essay questions.
ACTIVITY 5.1
Select a few essay questions that have been used in tests or examinations.
To what extent do these questions meet the criteria of an essay question
as defined by Stalnaker (1951) and elaborated by Reiner et al. (2002)?
Discuss with your coursemates in the myINSPIRE online forum.
5.3 ADVANTAGES OF ESSAY QUESTIONS
Essay questions are used to assess learning for the following reasons:
(a) Essay questions provide an effective way of assessing complex learning outcomes. They allow one to assess students' ability to synthesise, organise and express ideas, and to evaluate the worth of ideas. These abilities cannot be effectively assessed directly with other paper-and-pencil test items.
(b) Essay questions allow students to demonstrate their reasoning. These questions not only allow students to present an answer to a question but also to explain how they have arrived at their conclusions. This allows teachers to gain insight into a student's way of viewing and solving problems. With such insight, teachers can detect problems which students may have with their reasoning process and help them overcome these problems.
(c) Essay questions provide authentic experiences. Constructing responses is closer to real life than selecting responses, as in the case of objective tests. Problem solving and decision making are vital life competencies which require the ability to construct a solution or decision rather than select one from a limited set of possibilities. In the work environment, it is unlikely that an employer will give a list of "four options" for a worker to choose from when the latter is asked to solve a problem. In most cases, the worker will be required to construct a response.
5.4 DECIDING WHETHER TO USE ESSAY QUESTIONS OR OBJECTIVE QUESTIONS
Keep in mind that essay questions should target higher-order thinking skills. Even so, the decision whether to use essay questions or objective questions in examinations can be problematic for some educators. In such a situation, go back to the objectives of assessment: what kinds of learning outcomes do you intend to assess? Essay questions are generally suitable to assess:
(a) Students' understanding of subject matter or content; and
(b) Thinking skills that require more than simple verbatim recall of information, by challenging the students to reason with their knowledge.
It is challenging to write test items that tap into higher-order thinking. However, students' understanding of subject matter or content, and many of the other higher-order thinking skills, can also be assessed through objective items. When in doubt about whether to use an essay question or an objective question, just remember that essay questions are used to assess students' ability to construct rather than select answers.
To determine which type of test (essay or objective) to use, it is helpful to examine the verb(s) that best describe the desired ability to be assessed (refer to Topic 2).
These verbs indicate what students are expected to do and how they should respond. They serve to focus students' responses and channel them towards the performance of specific tasks. Some verbs clearly indicate that students need to construct rather than select their answer (such as to explain). Other verbs indicate that the intended learning outcome is focused on students' ability to recall information (such as to list); recall is perhaps best assessed through objectively scored items. Verbs that test for understanding of subject matter or content or other forms of higher-order thinking, but do not specify whether the student is to construct or select the response (such as to interpret), can be assessed either by essay questions or objective items.
ACTIVITY 5.2
Compare, explain, arrange, apply, state, classify, design, illustrate, describe, name, complete, choose and defend. Decide which of the verbs in the list are best assessed by essay questions, by objective tests, or by both.
Post your answer on the myINSPIRE online forum.
5.5 LIMITATIONS OF ESSAY QUESTIONS
While essay questions are popular because they enable the assessment of higher-order learning outcomes, this format of evaluating students in examinations has a number of limitations which should be kept in mind.
(a) One purpose of testing is to assess a student's mastery of subject matter. In most cases, it is not possible to assess the student's mastery of the complete subject matter domain with just a few questions. Because of the time it takes for students to respond to essay questions and for markers to mark students' responses, the number of essay questions that can be included in a test is limited. Therefore, using essay questions will limit the degree to which the test is representative of the subject matter domain, thereby reducing content validity. For instance, a test of 80 multiple-choice questions will most likely cover more of the content domain than a test of three to four essay questions.
(b) Essay questions have limitations in reliability. While essay questions allow students some flexibility in formulating their responses, the reliability of marking or grading is questionable. Different markers or graders may vary in their marking or grading of the same or similar responses (inter-scorer reliability), and one marker can vary significantly in his or her marking or grading consistency across questions depending on many factors (intra-scorer reliability). Therefore, essay answers of similar quality may receive notably different scores. Characteristics of the learner, length and legibility of responses, and personal preferences of the marker or grader with regard to the content and structure of the response are some of the factors that may lead to unreliable marking or grading.
(c) Essay questions require more time for marking student responses. Teachers need to invest a large amount of time to read and mark students' responses to essay questions. On the other hand, relatively little or no time is required for teachers to score objective test items like multiple-choice items and matching exercises.
(d) As mentioned earlier, one of the strengths of essay questions is that they provide students with authentic experiences because students are challenged to construct rather than select their responses. To what extent does the short time normally allotted to a test affect student responses? Students have relatively little time to construct their responses, and this time limit does not allow them to give appropriate attention to the complex process of organising, writing and reviewing their responses. In fact, in responding to essay questions, students use a writing process that is quite different from the typical process that produces excellent writing (draft, review, revise and evaluate). In addition, students usually have no resources to aid their writing when answering essay questions (dictionary or thesaurus). This disadvantage may offset whatever advantage accrues from the fact that responses to essay questions are more authentic than responses to multiple-choice items.
5.6 MISCONCEPTIONS ABOUT ESSAY QUESTIONS IN EXAMINATIONS
Other than the limitations of essay questions discussed earlier, there are also some
misconceptions about this form of assessment. These misconceptions are:
(a) By Their Very Nature, Essay Questions Assess Higher-order Thinking
Whether or not an essay item assesses higher-order thinking depends on the design of the question and how students' responses are scored. Not all essay questions can assess higher-order thinking skills; indeed, it is possible to write essay questions that simply assess recall. Also, if a teacher designs an essay question meant to assess higher-order thinking but then scores students' responses in a way that only rewards recall ability, that teacher is not assessing higher-order thinking. Therefore, teachers must be well-trained to design and write higher-order thinking questions.
(b) Essay Questions are Easy to Construct
Essay questions are easier to construct than multiple-choice items because teachers do not have to create effective distractors. However, that does not mean that good essay questions are easy to construct. They may be easier to construct in a relative sense, but they still require a lot of effort and time. Essay questions that are hastily constructed without much thought and review usually function poorly.
(c) The Use of Essay Questions Eliminates the Problem of Guessing
One of the drawbacks of objective test items is that students sometimes get the right answer by guessing which of the presented options is correct. This problem does not exist with essay questions because students need to generate the answer rather than identify it from a set of options provided. At the same time, the use of essay questions introduces bluffing, another form of guessing. Some students are "good" at using various methods of bluffing (vague generalities, padding, name-dropping) to add credibility to an otherwise weak answer. Thus, the use of essay questions changes the nature of the guessing that occurs, but does not eliminate it.
(d) Essay Questions Benefit All Students by Placing Emphasis on the Importance of Written Communication Skills
Written communication is a life competency that is required for effective and successful performance in many vocations. Essay questions challenge students to organise and express subject matter and problem solutions in their own words, thereby giving them a chance to practise written communication skills that will be helpful to them in future vocational responsibilities. At the same time, the focus on written communication skills is also a serious disadvantage for students who have marginal writing skills but know the subject matter being assessed. If students who are knowledgeable in the subject obtain low scores because of their inability to write well, the validity of the test scores will be diminished.
(e) Essay Questions Encourage Students to Prepare More Thoroughly
Some research seems to indicate that students are more thorough in their preparation for examinations using essay questions than in their preparation for objective examinations such as those using multiple-choice questions. However, after an extensive review of existing literature and research on this topic, Crooks (1988) concluded that students' extent of preparation is based more on the expectations teachers set for them (higher-order thinking and breadth and depth of content) than on the type of test questions they expect to be given in examinations.
SELF-CHECK 5.1
1. What are some limitations in the use of essay questions?
2. List some of the misconceptions about essay questions.
ACTIVITY 5.3
Compare the following two essay questions and decide which one
assesses higher-order thinking skills.
(a) "What are the major advantages and limitations of solar energy?"
(b) "Given its advantages and limitations, should governments spend money developing solar energy?"
Post your answer on the myINSPIRE online forum.
5.7 GUIDELINES ON CONSTRUCTING ESSAY QUESTIONS
When constructing essay questions, whether they are for coursework assessments
or examinations, the most important thing is to ensure that students have a clear
idea of what they are expected to do after they have read the question or problem
presented.
Here are specific guidelines that can help you improve existing essay questions
and create new ones.
(a) Clearly Define the Intended Learning Outcome to be Assessed by the Question
Knowing the intended learning outcome is crucial for designing essay questions. In specifying the intended learning outcome, teachers clarify the performance that students should be able to demonstrate as a result of what they have learnt. The intended learning outcome typically begins with a verb that describes an observable behaviour or action that students should demonstrate. The focus is on what students should and should not be able to do in the learning or teaching process. Reviewing a list of verbs can help to clarify what ability students should demonstrate, thereby defining the intended learning outcome to be assessed (refer to subtopic 5.8).
(b) Avoid Using Essay Questions for Intended Learning Outcomes that are Better Assessed with Other Kinds of Assessment
Some types of learning outcomes can be more efficiently and more reliably assessed with objective tests than with essay questions. Since essay questions sample a limited range of subject matter or content, are more time-consuming to score and involve greater subjectivity in scoring, the use of essay questions should be reserved for learning outcomes that cannot be better assessed by some other means. Let us look at Example 5.1.
Example 5.1:
Learning Outcome:
To be able to differentiate the reproductive habits of birds and amphibians.
Essay Question:
What are the differences in egg laying characteristics between birds and
amphibians?
Note: This learning outcome can be better assessed by an objective test.
Objective Item:
Which of the following differences between birds and amphibians is correct?

     Birds                       Amphibians
A    Lay a few eggs at a time    Lay many eggs at a time
B    Lay eggs                    Give birth
C    Do not incubate eggs        Incubate eggs
D    Lay eggs in nest            Lay eggs on land

(c) Clarity About the Task and Scope
Essay questions have two variable elements: the degree to which the task is structured and the degree to which the scope of the content is focused. There is still confusion among educators as to whether more structure (of the task required) and more focus (on the content) are better than less structure and less focus. When the task is more structured and the scope of content is more focused, two problems are reduced:
(i) The problem of student responses containing ideas that were not meant to be assessed; and
(ii) The problem of extreme subjectivity when scoring student answers or responses.
Although more structure helps to avoid these problems, how much and what
kind of structure and focus to provide are dependent on the intended
learning outcome that is to be assessed by the essay question. The process of
writing effective essay questions involves defining the task and delimiting
the scope of the content in an effort to create an effective question that is
aligned with the intended learning outcome to be assessed by it (as
illustrated in Figure 5.1).
Figure 5.1: Alignment between content, learning activities and assessment tasks
Source: Phillips, Ansary Ahmed and Kuldip Kaur (2005)
This alignment is absolutely necessary for obtaining students' responses that can be accepted as evidence that a student has achieved the intended learning outcome. Hence, the essay question must be carefully and thoughtfully written in such a way that it elicits student responses that provide the teacher with valid and reliable evidence about the students' achievement of the intended learning outcome. Failure to establish adequate and effective limits for students' answers to the question may result in students setting their own boundaries for their responses. This means that students might provide answers that are outside the intended task or address only a part of the intended task. If this happens, then the teacher is left with unreliable and invalid information about the students' achievement of the intended learning outcome, and there is no basis for marking or grading students' answers. Therefore, it is the responsibility of the teacher to write essay questions in such a way that they provide students with clear boundaries for their answers or responses. Let us look at Example 5.2.
Example 5.2: Improving Clarity of Task and Scope of Essay Questions

Weak Essay Question:
Evaluate the impact of the Industrial Revolution on England.

The verb is "evaluate", which is the task the student is supposed to do. The scope of the question is the impact of the Industrial Revolution on England. Very little guidance is given to students about the task of evaluating and the scope of the task. A student reading the question may ask:
(i) The impact on what in England? The economy? Foreign trade? A particular group of people? (The scope is not clear.)
(ii) Evaluate based on what criteria? The significance of the revolution? The quality of life in England? Progress in technological advancements? (The task is not clear.)
(iii) What exactly do you want me to do in my evaluation? (The task is not clear.)

Improved Essay Question:
Evaluate the impact of the Industrial Revolution on the quality of family life in England. Explain whether families were able to provide for the education of their children.

The improved question determines the task for students by specifying a particular unit of society in England affected by the Industrial Revolution (the family). The task is also determined by giving students a criterion for evaluating the impact of the Industrial Revolution (whether or not families were able to provide for their children's education). Students are clearer about what must be done to "evaluate": they need to explain how family life has changed and judge whether or not the changes are an improvement for the children.
SELF-CHECK 5.2
1. When would you decide to use an objective item rather than an essay question to assess learning?
2. What is the difference between the task and the scope of an essay question?
(d) Questions that are Fair
One of the challenges that teachers face in composing essay questions is that, because of their extensive experience with the subject matter, they may be tempted to demand unreasonable content expertise on the part of the students. Hence, teachers need to make sure that their students can "be expected to have adequate material with which to answer the question" (Stalnaker, 1951). In addition, teachers should ask themselves if students can be expected to adequately perform the thought processes required of them in the task. For assessment to be fair, teachers need to provide their students with sufficient instruction and practice in the subject matter required for the thought processes to be assessed.
Another important element is to avoid using indeterminate questions. A question is indeterminate if it is so unstructured that students can redefine the problem and focus on some aspect of it with which they are thoroughly familiar, or if experts in the subject matter cannot agree that one answer is better than another. One way to avoid indeterminate questions is to stay away from ambiguous vocabulary. For example, teachers should avoid using the verb "discuss" in an essay question; this verb is simply too broad and vague. Moreover, teachers should also avoid including vocabulary that is too advanced for students.
(e) Specify the Approximate Time Limit and Marks Allotted to Each Question
Specifying the approximate time limit helps students allocate their time in answering several essay questions. Without such guidelines, students may feel at a loss as to how much time to spend on a question. When deciding the guidelines for how much time should be spent on a question, keep the slower students and students with certain disabilities in mind. Also make sure that students can realistically be expected to provide an adequate answer in the given and/or suggested time. Similarly, state the marks allotted to each question so that students can estimate how much they should write to answer the question.
(f) Use Several Relatively Short Essay Questions Rather than One Long Question
Only a very limited number of essay questions can be included in a test because of the time it takes for students to respond to them and the time it takes for teachers to grade the students' responses. This creates a challenge with regard to designing valid essay questions. Shorter essay questions are better suited to assessing the depth of student learning within a subject, whereas longer essay questions are better suited to assessing the breadth of student learning within a subject. Hence, there is a trade-off when choosing between several short essay questions and one long question. Focusing on assessing the depth of student learning within a subject limits the assessment of the breadth of student learning within the same subject; conversely, focusing on assessing the breadth of student learning limits the assessment of its depth. When choosing between several short essay questions and one long question, also keep in mind that short essays are generally easier to mark than long essays.
(g) Avoid the Use of Optional Questions
Students should not be permitted to choose one essay question to answer from two or more optional questions. The use of optional questions should be avoided for the following reasons:
(i) Students may waste time deciding on an option; and
(ii) Some questions are likely to be harder, which could make the comparative assessment of students' abilities unfair.
The issue of the use of optional questions is debatable. It is often practised, especially in higher education, and students often demand that they be given choices. The practice is acceptable if it can be assured that the questions have equivalent difficulty levels and that the tasks as well as the scope required by the questions are equivalent.
Last but not least, let us improve the essay questions through preview and review.
Improving Essay Questions Through Preview and Review
The following steps can help you improve the essay item before and after you
administer it to your students.
PREVIEW (before handing out the essay question to the students)
Predict Students' Responses
Try to respond to the question from the perspective of a typical student. Evaluate whether students have the content knowledge and the skills necessary to adequately respond to the question. After detecting possible weaknesses in the essay questions, repair them before handing them out in the exam.
Write a Model Answer
Before using a question, write model answer(s) or at least an outline of major
points that should be included in an answer. Writing the model answer allows
reflection on the clarity of the essay question. Furthermore, the model answer
serves as a basis for the grading of student responses. Once the model answer
has been written, compare its alignment with the question and the intended
learning outcome, and make changes as needed to assure that the intended
learning outcome, the question and the model answer are aligned with one
another.
Before using the question in a test, ask a knowledgeable person in the subject
to critically review the essay question, the model answer and the intended
learning outcome to determine how well they are aligned with each other.
REVIEW (after receiving the student responses)
Review Students' Responses to the Essay Question
After students have answered the questions, carefully review the range of answers given and the manner in which students seem to have interpreted the question. Make revisions based on the findings. Writing good essay questions is a process that requires time and practice. Carefully studying the students' responses can help to evaluate students' understanding of the question as well as the effectiveness of the question in assessing the intended learning outcomes.
In addition, you can use a checklist as shown in Figure 5.2 to check your essay
questions.
Figure 5.2: A checklist for writing essay questions
SELF-CHECK 5.3
1. Why should you specify the time allotted for answering each question?
2. Why should you avoid optional questions?
3. What is meant when it is said that questions should be "fair"?
4. What should you do before and after administering a test?
5.8 VERBS DESCRIBING VARIOUS KINDS OF MENTAL TASKS
Using the list suggested by Moss and Holder (1988), and Anderson and Krathwohl
(2001), Reiner et al. (2002) proposed the following list of verbs that describe mental
tasks to be performed (refer to Table 5.1).
Table 5.1: Verbs, Definitions and Examples

Analyse: Break material into its constituent parts and determine how the parts relate to one another and to an overall structure or purpose. Example: Analyse the meaning of the line "He saw a dead crow, in a drain, near the post office" in the poem The Dead Crow.

Apply: Decide which abstractions (concepts, principles, rules, laws, theories, generalisations) are relevant in a problem situation. Example: Apply the principles of supply and demand to explain why the consumer price index (CPI) in Malaysia has increased in the last three months.

Attribute: Determine a point of view, bias, value or intent underlying the presented material. Example: Determine the point of view of the author in the article about her political perspective.

Classify: Determine the category to which something belongs. Example: Classify the organisms into vertebrates and invertebrates.

Compare: Identify and describe points of similarity. Example: Compare the role of the Dewan Rakyat and the Dewan Negara.

Compose: Make or form by combining things, parts or elements. Example: Compose an effective plan for solving flooding problems in Kuala Lumpur.

Contrast: Bring out the points of difference. Example: Contrast the contribution of Tun Hussein Onn and Tun Abdul Razak Hussein to the political stability of Malaysia.

Create: Put elements together to form a coherent or functional whole; reorganise elements into a new pattern or structure. Example: Create a comprehensive solution for the traffic problems in Kuala Lumpur.

Critique: Detect consistencies and inconsistencies between a product and relevant external criteria; detect the appropriateness of a procedure for a given problem. Example: Judge which of two methods is the best way of reducing high absenteeism in the workplace.

Defend: Develop and present an argument to support a recommendation, to maintain or revise a policy or programme, or to propose a course of action. Example: Defend the government's decision to raise fuel prices.

Define: Give the meaning of a word or concept; place it in the class to which it belongs and distinguish it from other items in the same class. Example: Define the term "chemical weathering".

Describe: Give an account of; tell or depict in words; represent or delineate by a word picture. Example: Describe the contribution of Za'ba to the development of Bahasa Melayu.

Design: Devise a procedure for accomplishing some task. Example: Design an experiment to prove that 21 per cent of air is composed of oxygen.

Differentiate: Distinguish relevant from irrelevant parts or important from unimportant parts of presented material. Example: Distinguish between supply and demand in determining price.

Explain: Make clear the cause or reason of something; construct a cause-and-effect model of a system; tell "how" to do something; tell the meaning of. Example: Explain the causes of the First World War.

Evaluate: Make judgements based on criteria and standards; determine the significance, value, quality or relevance of; give the good points and the bad ones; identify and describe the advantages and limitations. Example: Evaluate the contribution of the microchip in telecommunications.

Generate: Come up with alternative hypotheses, examples, solutions or proposals based on criteria. Example: Generate hypotheses to account for an observed phenomenon.

Identify: Recognise as being a particular person or thing. Example: Identify the characteristics of the Mediterranean climate.

Illustrate: Use a word picture, a diagram, a chart or a concrete example to clarify a point. Example: Illustrate the use of catapults in the amphibious warfare of Alexander.

Infer: Draw a logical conclusion from presented information. Example: What can you infer happened in the experiment?

Interpret: Give the meaning of; change from one form of representation (such as numerical) to another (such as verbal). Example: Interpret the poetic line, "The sound of a cobweb snapping is the noise of my life."

Justify: Show good reasons for; give your evidence; present facts to support your position. Example: Justify the American entry into the Second World War.

List: Create a series of names or other items. Example: List the major functions of the human heart.

Predict: Know or tell beforehand, with precision of calculation, knowledge or shrewd inference from facts or experience, what will happen. Example: Predict the outcome of a chemical reaction.

Propose: Offer for consideration, acceptance or action; suggest. Example: Propose a solution for landslides along the North-South Highway.

Recognise: Locate knowledge in long-term memory that is consistent with presented material. Example: Recognise the important events in the road to independence in Malaysia.

Recall: Retrieve relevant knowledge from long-term memory. Example: Recall the dates of important events in Islamic history.

Summarise: Sum up; give the main points briefly. Example: Summarise the ways in which man preserves food.

Trace: Follow the course of; follow the trail of; give a description of progress. Example: Trace the development of television in school instruction.
The definitions specify thought processes a person must perform to complete the
mental tasks. Note that this list is not exhaustive and local examples have been
introduced to illustrate the mental tasks required in each essay question.
ACTIVITY 5.4
Discuss the following with your coursemates in the myINSPIRE online
forum:
(a) Select some essay questions in your subject area and examine whether the verbs used are similar to those in the list given in Table 5.1. Do you think the tasks required by the verbs used are appropriate? Justify.
(b) Do you think students are able to differentiate between the tasks required by the verbs listed? Justify.
(c) Are teachers able to describe to students the tasks required by using these verbs? Explain.
5.9 MARKING AN ESSAY
Marking or grading of essays is a notoriously unreliable activity. If we read an essay at two different times, the chances are high that we will give the essay a different grade each time. If two or more of us read the essay, our grades will likely differ, often dramatically so. We all like to think we are exceptions, but study after study of well-meaning and conscientious teachers shows that essay grading is unreliable (Ebel, 1972; McKeachie, 1987). Eliminating the problem is unlikely, but we can take steps to improve grading reliability. Using a scoring guide or marking scheme helps control the shifting of standards that inevitably takes place as we read a collection of essays and papers. The common types of marking scheme used in scoring students' responses to essay questions are presented diagrammatically in Figure 5.3.
Figure 5.3: Types of marking scheme
A marking scheme may take the form of a checklist, a rubric or a combination of
both.
(a) Checklist
In a checklist, a score is awarded for every correct or relevant point in a response. The sum of these individual scores provides the final score of the response. Table 5.2 is an example of a checklist.
Table 5.2: Sample of a Checklist

Reference: Topic 5, Section 5.7, p. 74

Suggested answers:
Strengths
• Essay questions provide an effective way of assessing complex learning outcomes.
• Essay questions allow students to demonstrate their reasoning and creativity.
• Essay questions provide authentic experiences because students are given the opportunity to organise, write and review their responses.
• Guessing is very much reduced.
(Accept any other appropriate answers.)

Marks allocation: Award 1 mark for each point. (1 mark × 4 = 4 marks)
This marking scheme can be used to assess students' responses to an essay question that asks for the strengths of essay questions as an assessment tool. A checklist is easy to use: the teacher just needs to read through the student's response and check the number of points for the calculation of marks. A checklist is useful for assessing factual content and it is relatively easy to construct; the teacher just needs to present a list of points required in the response and decide on the marks for each point. However, a checklist with a list of points does not provide for the assessment of intangible learning outcomes such as "to discuss", "to evaluate" or "to explain" and other complexity levels of Bloom's taxonomy. It also provides limited feedback for formative purposes, and students cannot use it as a guide for writing assignments.
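As a concrete illustration, here is a minimal sketch (in Python) of how checklist marking can be tallied. The required points and the keyword-matching rule are illustrative assumptions only; in practice, the teacher judges by reading whether each point is present.

# A minimal sketch of checklist marking: 1 mark per required point found.
# The required points and keyword test are illustrative assumptions; a
# real marker judges the presence of each point by reading the response.

required_points = {
    "complex learning outcomes": ["complex", "learning outcomes"],
    "demonstrate reasoning": ["reasoning"],
    "authentic experiences": ["authentic"],
    "guessing reduced": ["guess"],
}

def checklist_score(response, marks_per_point=1):
    """Award marks_per_point for each required point detected in the response."""
    text = response.lower()
    score = 0
    for point, keywords in required_points.items():
        if all(kw in text for kw in keywords):
            score += marks_per_point
    return score

answer = ("Essay questions assess complex learning outcomes, provide authentic "
          "experiences and let students demonstrate their reasoning.")
print(checklist_score(answer))  # 3 out of a maximum of 4 marks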
(b) Rubric
The two most common approaches used in scoring rubrics are the holistic and the analytic methods.
(i) Holistic Method (Global or Impressionistic Marking)
The holistic approach to scoring essay questions involves reading an entire response and assigning it to a category identified by a score or grade. This method involves considering the student's answer as a whole and judging the total quality of the answer relative to other students' responses, or the total quality of the answer based on certain criteria that have been developed.
Think of it as sorting into bins. You read the answer to a particular question and assign it to the appropriate bin. The best answers go into the "exemplary" bin, the good ones go into the "good" bin and the weak answers go into the "poor" bin (refer to Table 5.3).
Table 5.3: Sample of a Marking Scheme Using the Holistic Method

7–8 (Exemplary)
• Addresses the question
• States a relevant argument
• Presents arguments in a logical order
• Uses acceptable style and grammar (no errors)

5–6 (Good)
• Combination of the above traits, but less consistently represented (few errors)

3–4 (Adequate)
• Does not address the question explicitly, though does so tangentially
• States a somewhat relevant argument
• Presents some arguments in a logical order
• Uses adequate style and grammar (some errors)

1–2 (Poor)
• Does not address the question
• States no relevant arguments
• Is not clearly or logically organised
• Fails to use acceptable style and grammar

0
• Irrelevant response or no answer
Then, points appropriate to the bin are written on each paper. Marking is based on an overall impression, which is why the holistic method is also referred to as global or impressionistic marking.
One of the strengths of a holistic rubric is that students' responses can be scored quite quickly. The teacher needs to read through the student's response and decide in which band of scores the response lies. This rubric can provide an overview of student performance, but it does not provide detailed information about the student's performance, and it may be difficult to settle on a single overall score for a response.
How best can a teacher use the holistic method in scoring students' responses? Before he or she starts marking, the teacher can develop a description of the type of response that would illustrate each category, and then try out this draft version using several actual papers. After reading and categorising all of the papers, it is a good idea to re-examine the papers within a category to see if they are similar enough in quality to receive the same points or grade. It may be faster to read essays holistically and provide only an overall score or grade, but students do not receive much feedback about their strengths and weaknesses. Some instructors who use holistic scoring also write brief comments on each paper to point out one or two strengths and/or weaknesses so students will have a better idea of why their responses received the scores they did.
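A minimal sketch of the "sorting into bins" idea follows (in Python); the band boundaries mirror Table 5.3, while the function name and the choice of recording the lower or upper point of a band are illustrative assumptions.

# A minimal sketch of holistic marking as "sorting into bins".
# The bands mirror Table 5.3; deciding which bin a paper belongs to
# remains a human judgement, represented here by the band label.

BANDS = {
    "Exemplary": (7, 8),
    "Good": (5, 6),
    "Adequate": (3, 4),
    "Poor": (1, 2),
    "No answer": (0, 0),
}

def points_for_band(band, high_in_band=False):
    """Return the mark for a paper sorted into a band.

    high_in_band picks the upper point of the band (e.g. 8 rather than 7).
    """
    low, high = BANDS[band]
    return high if high_in_band else low

print(points_for_band("Good", high_in_band=True))  # 6: a strong "Good" paper
print(points_for_band("Good"))                     # 5: a weaker "Good" paper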
(ii) Analytic Method
The analytic method of marking is the system most frequently used in large-scale public examinations and also by teachers in the classroom. Its basic tool is a two-dimensional table with the performance criteria down the vertical column on the left and the performance levels across the top row. The cells then present the performance descriptors, as shown in Table 5.4.
Table 5.4: Sample of a Marking Scheme Using the Analytic Method
Holistic scoring gives students a single, overall assessment score for the response as a whole. Analytic scoring provides students with at least a rating score for each criterion. For example, based on the rubric, a student's response may get 3 points for focus/organisation, 2 points for elaboration and 4 points for mechanics, giving a total of 9 marks. Alternatively, an analytic rubric may take the form of a weighted rubric, whereby different weights (values) are assigned to different criteria and an overall achievement score is obtained by totalling across the criteria. Refer to Table 5.5 for a sample of a weighted analytic rubric.
Table 5.5: Sample of a Marking Scheme Using the Weighted Analytic Method
To use the rubric, the performance level achieved by the student is multiplied by the weight to give a score for each criterion. For example, for focus/organisation the score is 3 × 1.25 = 3.75, for elaboration the score is 2 × 1.25 = 2.5, and for mechanics the score is 4 × 0.5 = 2.0. This gives the student a total of 8.25 marks out of 12.
The analytic rubric provides more detailed feedback on areas of strength and weakness because the performance criteria are given and each criterion can be weighted to reflect its relative importance in the student's response. Generic rubrics, which are not task specific, can also be a useful aid to learning; students can use them as a guide to doing their assignments. As shown in Table 5.5, the performance descriptors are stated in general terms and do not give away the answers. However, an analytic rubric takes more time to create and use than a holistic rubric. Moreover, it is important that each point for each criterion is well defined; otherwise, different raters may not arrive at the same score.
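The weighted-rubric arithmetic above is straightforward to automate. The sketch below (in Python) reproduces the worked example; the criteria names, weights and maximum level are taken from that example, while the function name is an illustrative assumption.

# A minimal sketch of weighted analytic rubric scoring, reproducing the
# worked example above (a maximum performance level of 4 is assumed).

WEIGHTS = {"focus/organisation": 1.25, "elaboration": 1.25, "mechanics": 0.5}
MAX_LEVEL = 4

def weighted_score(levels):
    """Return (student total, maximum possible total) for the rubric."""
    total = sum(level * WEIGHTS[criterion] for criterion, level in levels.items())
    maximum = sum(MAX_LEVEL * weight for weight in WEIGHTS.values())
    return total, maximum

levels = {"focus/organisation": 3, "elaboration": 2, "mechanics": 4}
total, maximum = weighted_score(levels)
print(f"{total} out of {maximum}")  # 8.25 out of 12.0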
5.10 SUGGESTIONS FOR MARKING ESSAYS
Here are some suggestions for marking or scoring essays:
(a) Grade the papers anonymously. This will help control the influence of our expectations of the student on the evaluation of the answer.
(b) Read and score the answers to one question before going on to the next question. In other words, score all the students' responses to Question 1 before looking at Question 2. This helps to keep one frame of reference and one set of criteria in mind through all the papers, which results in more consistent grading. It also prevents an impression that we form in reading one question from carrying over to our reading of the student's next answer.
(c) If a student has not done a good job on the first question, we may let this impression influence our evaluation of the student's second answer. However, if other students' papers come in between, we are less likely to be influenced by the original impression.
(d) If possible, try to grade all the answers to one particular question without interruption. Our standards might vary from morning to night or from one day to the next.
(e) Shuffle all the papers after each item is scored. Changing the order of papers this way reduces the context effect and the possibility that a student's score is the result of the location of the paper in relation to other papers. If Rakesh's "B" work always follows Jamal's "A" work, then it might look more like "C" work and his grade would be lower than if his paper were somewhere else in the stack.
(f) Decide in advance how you are going to handle extraneous factors and be consistent in applying the rule. Students should be informed about how you treat such things as misspelled words, neatness, handwriting, grammar and so on.
(g) Be on the alert for bluffing. Some students who do not know the answer may write a well-organised, coherent essay, but one containing material irrelevant to the question. Decide how to treat irrelevant or inaccurate information contained in the students' answers. We should not give credit for irrelevant material; it is not fair to other students who may also have preferred to write on another topic, but instead wrote on the required question.
(h) Write comments on the students' answers. Teacher comments make essay tests a good learning experience for students. They also serve to refresh your memory of your evaluation should the student question the grade given.
(i) Be aware that the order in which papers are marked can have an impact on the grades awarded. A marker may grow more critical (or more lenient) after having read several papers; thus, the early papers may receive higher (or lower) marks than papers of similar quality that are scored later.
(j) When students are directed to take a stand on a controversial issue, the marker must be careful to ensure that the evidence and the way it is presented are evaluated, not the position taken by the student. If the student takes a position which differs from that of the marker, the marker must be aware of his or her own possible bias in marking the essay.
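Several of these suggestions, notably (a) anonymous grading, (b) marking one question at a time and (e) shuffling papers between questions, can be combined into a simple marking workflow. The sketch below (in Python) is illustrative only; the student names and paper codes are hypothetical.

# A minimal sketch of a marking workflow applying suggestions (a), (b)
# and (e): anonymise the papers, mark question by question, and shuffle
# the order of papers before each question.

import random

student_names = ["Jamal", "Rakesh", "Mei Ling"]   # hypothetical students
questions = ["Q1", "Q2", "Q3"]

# (a) Grade anonymously: each paper gets a code; the name-to-code map is
# set aside until all marking is finished.
codes = {name: f"P{i:03d}" for i, name in enumerate(student_names, start=1)}
marks = {code: {} for code in codes.values()}

for q in questions:
    order = list(marks)
    random.shuffle(order)        # (e) reduce order and context effects
    for code in order:           # (b) finish this question for every paper
        marks[code][q] = None    # the marker reads the answer and records a mark

print(marks)  # marks per anonymised paper, per question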
ACTIVITY 5.5
1. Compare the analytical method and the holistic method of marking essays.
2. Which method is widely practised in your institution? Why?
3. Do you think there would be a difference in marking an answer using the two methods? Justify your answer.
Post your answers on the myINSPIRE online forum.
Summary

• An essay question is a test item which requires a response composed by the examinee, usually in the form of one or more sentences, of a nature that no single response or pattern of responses can be listed as correct, and the accuracy and quality of which can be judged subjectively only by one skilled or informed in the subject matter.

• There are two types of essays based on their function: restricted response and extended response essay questions.

• Essay questions provide an effective way of assessing complex learning outcomes.

• Essay questions provide authentic experiences because constructing responses is closer to real life than selecting responses.

• It is not possible to assess a student's mastery of the complete subject matter domain with just a few questions.

• Essay questions have two variable elements: the degree to which the task is structured and the degree to which the scope of the content is focused.

• Whether or not an essay item assesses higher-order thinking depends on the design of the question and how students' responses are scored.

• Specifying the approximate time limit helps students allocate their time in answering several essay questions.

• Avoid using essay questions for intended learning outcomes that are better assessed with other kinds of assessment.

• Analytical marking is the system most frequently used in large-scale public examinations and also by teachers in the classroom. Its basic tool is the marking scheme with proper mark allocations for elements in the answer.

• The holistic approach to scoring essay questions involves reading an entire response and assigning it to one of several categories, each given a score or grade.
Key Terms

Analytical method
Checklist
Complex learning outcomes
Constructed responses
Essay
Grading
Holistic method
Marking scheme
Mental tasks
Model answer
Rubric
Time consuming
References

Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Boston, MA: Allyn & Bacon.

Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58(4), 438–481.

Ebel, R. L. (1972). Essentials of educational measurement. Oxford, England: Prentice-Hall.

McKeachie, W. J. (1987). Can evaluating instruction improve teaching? New Directions for Teaching and Learning, 31(1987), 3–7.

Moss, A., & Holder, C. (1988). Improving student learning: A guidebook for faculty in all disciplines. Dubuque, IA: Kendall/Hunt.

Phillips, J. A., Ansary Ahmed, & Kuldip Kaur. (2005). Instructional design principles in the development of an e-learning graduate course. Paper presented at The International Conference in E-Learning, Bangkok, Thailand.

Reiner, C. M., Bothell, T. W., Sudweeks, R. R., & Wood, B. (2002). Preparing effective essay questions. Stillwater, OK: New Forums Press.

Stalnaker, J. M. (1951). The essay type examination. In E. F. Lindquist (Ed.), Educational measurement (pp. 495–530). Menasha, WI: George Banta.
Topic  Authentic
6
Assessment
LEARNING OUTCOMES
By the end of the topic, you should be able to:
1. Define authentic assessment;
2. Explain how to use authentic assessment;
3. Explain the advantages and disadvantages of authentic assessment;
4. Describe the characteristics of authentic assessment; and
5. Compare authentic assessment with traditional assessment.
INTRODUCTION
Many teachers use traditional assessment tools such as multiple-choice tests and essay-type tests to assess their students. How well do these multiple-choice or essay tests really evaluate students' understanding and achievement? These traditional assessment tools do serve a role in the assessment of student outcomes. However, assessment does not always have to involve paper and pencil; it can instead be in the form of a project, an observation or a task that shows a student has learnt the material. Are these alternative assessments more effective than the traditional ones?
Some classroom teachers are using testing strategies that do not focus entirely on recalling facts. Instead, they ask students to demonstrate the skills and concepts they have learnt. Teachers may want to ask the students to learn how to apply their skills to authentic tasks and projects or to have students demonstrate the application of their knowledge in real life. The students must then be trained to perform meaningful tasks that replicate real-world challenges. In other words, students are asked to perform a task rather than select an answer from a ready-made list.
This strategy of asking students to perform real-world tasks that demonstrate
meaningful application of essential knowledge and skills is called authentic
assessment. Let us learn more about authentic assessment in the following
subtopics.
ACTIVITY 6.1
The following are two assessment procedures, A and B. Which is an authentic assessment and which is a traditional assessment?
Assessment A
Students are asked to take a paper-and-pencil test on how to prepare MCQs for an examination paper.
Assessment B
Students are asked to prepare MCQs for an examination paper, administer them to a class of 30 students and then write a report.
Justify your answer in the myINSPIRE online forum.
6.1 WHAT IS AUTHENTIC ASSESSMENT IN THE CLASSROOM?
Authentic assessment, in contrast to the more traditional assessment, encourages the integration of teaching, learning and assessing. In the "traditional assessment model", teaching and learning are often separated from assessment: a test is administered after knowledge or skills have been acquired. Authentic assessment usually includes a task for students to perform and a rubric by which their performance on the task will be assessed. Thus, doing science experiments, writing stories and reports, and solving mathematical problems that have real-world applications can all be considered examples of authentic assessment. Useful achievement data can be obtained via authentic assessment.
Teachers can teach students to do mathematics, history and science, not just to know them. Then, to assess what the students have learnt, teachers can ask students to perform tasks that "replicate the challenges" faced by those using mathematics, history or conducting a scientific investigation. Well-designed traditional classroom assessments such as tests and quizzes can effectively determine whether or not students have acquired a body of knowledge.
In contrast, authentic assessments ask students to demonstrate understanding by performing a more complex task usually representative of more meaningful application. These tasks involve asking students to analyse, synthesise and apply what they have learnt in a substantial manner, and students create new meaning in the process as well. In short, authentic assessment helps answer the question, "How well can you use what you know?" while traditional testing helps answer the question, "Do you know it?"
The usual or traditional classroom assessments such as multiple-choice tests and short-answer tests are just as important as authentic assessment. In fact, authentic assessment complements traditional assessment. Authentic assessment has been gaining acceptance among early childhood and primary school teachers, where traditional assessment may not be appropriate.
6.2 ALTERNATIVE NAMES FOR AUTHENTIC ASSESSMENT
Did you know that authentic assessment is sometimes referred to as performance
assessment, alternative assessment and direct assessment?
It is called performance assessment or performance-based assessment because students are asked to perform meaningful tasks. Performance assessment is "a test in which the test taker actually demonstrates the skills the test is intended to measure by doing real-world tasks that require those skills, rather than by answering questions asking how to do them" (Vander Ark, 2013). Project-based learning (PBL) and portfolio assignments are examples of performance assessment. With performance assessment, teachers observe students while they are performing in the classroom and judge the level of proficiency demonstrated. As authentic tasks are rooted in the curriculum, teachers can develop tasks based on what already works for them. Through this process, evidence-based assignments such as portfolios become more authentic and more meaningful to students.
The term alternative assessment is sometimes used because authentic assessment is an alternative to traditional assessments. Using checklists and rubrics in self and peer evaluation, students participate actively in evaluating themselves and one another. Alternative assessments measure performance in ways other than traditional paper-and-pencil and short-answer tests. For example, a Klang Valley Science teacher may ask the students to identify the different pollutants in the Klang River and make a report to the local environmental council.
Direct assessment is so called because authentic assessments are direct measures that provide more direct evidence of meaningful application of knowledge and skills. If a student does well on a multiple-choice test, we might infer indirectly that the student could apply that knowledge in real-world contexts as well, but we would be more comfortable making that inference from a direct demonstration of that application, such as the river pollutants example mentioned earlier. We do not just want students to know the content of the disciplines when they leave school; we want them to apply the knowledge and skills they have learnt. Direct evidence of student learning is tangible, visible and measurable, and tends to be more compelling evidence of exactly what students have and have not learnt. Teachers can directly look at students' work or performances to determine what they have learnt.
6.3 HOW TO USE AUTHENTIC ASSESSMENT?
Authentic assessments focus on the learning process, sound instructional
practices, and high-level thinking skills and proficiencies needed for success in the
real world, and, therefore, may offer students who have been exposed to them
huge advantages over those who have not. This helps students see themselves as
active participants, who are working on a task of relevance, rather than passive
recipients of obscure facts. It helps teachers by encouraging them to reflect on the
relevance of what they teach and provides results that are useful for improving
instruction.
The following are steps which you can take to create your own authentic assessment:
(a) Identify which standards you want your students to meet through this assessment;
(b) Choose a relevant task for this standard or set of standards, so that students can demonstrate how they have or have not met the standards;
(c) Define the characteristics of good performance on this task. This will provide useful information regarding how well students have met the standards; and
(d) Create a rubric or set of guidelines for students to follow so that they are able to assess their work as they perform the assigned task.
Brady (2012) suggested some examples of authentic assessment strategies, which include the following:
(a) Exhibit an athletic skill;
(b) Produce a short musical, dance or drama;
(c) Publish a class brochure;
(d) Perform a role, an oral presentation or an artistic display;
(e) Plan or draw conceptual mind maps or flow charts;
(f) Demonstrate the use of ICT tools such as webpage creation or video editing;
(g) Construct models;
(h) Produce creative writing;
(i) Peer teaching and evaluating teacher-student feedback; and
(j) Attempt unstructured tasks like problem solving, open-ended questions, and formal and informal observations.
6.4 ADVANTAGES OF AUTHENTIC ASSESSMENT
According to Wiggins (1990), while standardised, multiple-choice tests can be
valid indicators of academic performance, tests often mislead students into
believing that learning requires cramming and mislead teachers into believing
tests are after-the-fact, contrived and irrelevant.
A move towards more authentic tasks and outcomes improves teaching and learning. In this respect, authentic assessment has many benefits; the main ones are as follows:

(a) Authentic assessment provides parents and community members with directly observable products and understandable evidence concerning their children's performance. The quality of students' work is more discernible to laypeople than when we must rely on abstract statistical figures.

(b) Authentic assessment uses tasks that reflect normal classroom activities or real-life learning as a means of improving instruction, thus allowing teachers to plan a comprehensive, developmentally oriented curriculum based on their knowledge of each child.

(c) Authentic assessment is consistent with the constructivist approach to learning. This approach emphasises that students should use their previous knowledge to build new knowledge structures, be actively involved in exploration and inquiry through task-like activities, and construct meaning from educational experience. Most authentic assessments engage students and actively involve them with complex tasks that require exploration and inquiry.

(d) Authentic assessment tasks assess how well students can apply what they have learnt in real-life situations. An important school outcome is the ability of students to solve problems and lead a useful life, rather than simply to answer questions about facts, principles and theories they have learnt. In other words, authentic assessments require students to demonstrate their ability to complete a task using knowledge and skills from several areas, rather than simply recalling information or saying how to do a task.

(e) Authentic assessment tasks require an integration of knowledge, skills and abilities. Complex tasks, especially those that span longer periods, require students to use different skills and abilities. Portfolios and projects, two common tools in authentic assessment, require a student to use knowledge from several different areas and many different abilities.

(f) Authentic assessment focuses on higher-order thinking skills such as "applying, analysing, evaluating and creating", which are found in Bloom's taxonomy. Authentic assessment evaluates thinking skills such as analysis, synthesis, evaluation and interpretation of facts and ideas, skills which standardised tests generally avoid.

(g) Embedding authentic assessment in the classroom allows for a wide range of assessment strategies. It involves teacher-student collaboration in determining assessment (student-structured tasks).

(h) Authentic assessment broadens the approach to student assessment. Introducing authentic assessment along with traditional assessment broadens the types of learning outcomes that a teacher can assess. It also offers students a variety of ways of expressing their learning, thus enhancing the validity of student evaluation.

(i) Authentic assessment focuses on students' progress, rather than on identifying their weaknesses. Authentic assessment lets teachers assess the processes students use as well as the products they produce. Many authentic tasks offer teachers the opportunity to watch the way a student goes about solving a problem or completing a task. Appropriate scoring rubrics help teachers collect information about the quality of the processes and strategies students use, as well as assess the quality of the finished product.
6.5 DISADVANTAGES OF AUTHENTIC ASSESSMENT
Despite the usefulness of authentic assessment as an assessment tool, it has some drawbacks as well. Some of the criticisms are as follows:

(a) High-quality Authentic Assessment Tasks Are Difficult to Develop
First, they must match the complex learning outcomes that are being assessed. Teachers may decide to assess more than one learning outcome with the same complex task. They must also be aware that not every learning outcome can and should be assessed by authentic assessments, and should select only those that can and should. In crafting the tasks, teachers also have to decide whether they want to assess the process, the product or both. Most important of all, the tasks developed must allow for predetermined performance criteria. For that, the tasks must possess special characteristics; refer to subtopic 6.6 for details.

(b) High-quality Scoring Rubrics Are Difficult to Develop
This is especially true when teachers want to assess complex cognitive and intangible affective learning outcomes, or permit multiple answers and products. Failure to develop a high-quality rubric will affect the validity and reliability of the assessment.

(c) Completing Authentic Assessment Tasks Takes a Lot of Time
Most authentic tasks take days, weeks or months to complete. For instance, a research project might take a few weeks, which might reduce the amount of instructional time.

(d) Scoring Authentic Assessment Tasks Takes a Lot of Time
The more complex the tasks, the more time teachers can expect to spend on scoring. Complex tasks normally allow for many diverse outputs from the students, and it is time consuming to score this type of output. Besides, assessment that focuses on the process requires teachers to monitor and score the output at different stages in the implementation of the tasks.

(e) Scores from Tasks for Authentic Assessment May Have Lower Scorer Reliability
With complex tasks and multiple outputs and answers, scoring depends on teachers' own competence. If two teachers are doing the assessment, they may mark the same output or answer of a student quite differently. This is not only frustrating to the student but lowers the reliability and validity of the assessment results. However, this problem can be reduced by having well-defined rubrics and well-trained scorers to mark the students' output.

(f) Authentic Assessments Have Low Reliability from the Content-sampling Point of View
Normally, each authentic assessment task focuses only on specific subject-matter content. As the task requires an extended period of time to complete, it is not possible to have the wide content coverage of traditional objective assessment formats, which allow broader content coverage in less time.

(g) Completing Authentic Assessment Tasks May Be Discouraging to Less Able Students
Complex tasks such as projects require students to sustain their interest and intensity over a long period of time, and less able students may be overwhelmed by the high demands of authentic assessment. Though group work may help by permitting peers to share the work and use each other's differential knowledge and skills to complete the task, group work has its limitations in assessment.
In sum, criticism of authentic assessments generally involves both the informal development of the assessments and the difficulty of ensuring validity and reliability, given the subjective nature of human scoring with rubrics compared with computer scoring of multiple-choice test items. Many teachers shy away from authentic assessments because these methodologies are time intensive to manage, grade, monitor and coordinate. Teachers find it hard to provide a consistent grading scheme, and the subjective method of grading may lead to bias. Teachers also find that this method is not practical for large groups of students.
Nevertheless, based on the value of authentic assessments to student outcomes, their advantages outweigh these concerns. For example, once the assessment guidelines and grading rubric are created, they can be filed away and used year after year. As Lindquist (1951) noted, there is nothing new about authentic assessment methodology. It is not some kind of radical invention recently fabricated by the opponents of traditional tests to challenge the testing industry. Rather, it is a proven method of evaluating human characteristics that has been in use for decades.
ACTIVITY 6.2
Is authentic assessment practised in your institution? How is it done? If
it is not being practised, explain why.
Share your experience with your coursemates in the myINSPIRE online
forum.
6.6 CHARACTERISTICS OF AUTHENTIC ASSESSMENT
The main characteristics of authentic assessment have been summed up by Reeves, Herrington and Oliver (2002), who contrasted its methodology with that of traditional assessment. According to Reeves et al. (2002), authentic assessment is characterised by the following:
(a) Has Real-world Relevance
The assessment is meant to focus on the impact of one's work in real or realistic contexts.

(b) Requires Students to Define the Tasks and Sub-tasks Needed to Complete the Activity
Problems inherent in the activities are open to multiple interpretations rather than easily solved by the application of existing algorithms.

(c) Comprises Complex Tasks to Be Investigated by Students Over a Sustained Period of Time
Activities are completed in days, weeks and months rather than minutes or hours. They require a significant investment of time and intellectual resources.

(d) Provides the Opportunity for Students to Examine the Task from Different Perspectives, Using a Variety of Resources
The use of a variety of resources, rather than a limited number of pre-selected references, requires students to distinguish relevant information from irrelevant data.

(e) Provides the Opportunity to Collaborate
Collaboration is integral to the task, both within the course and in the real world, rather than achievable by the individual learner.

(f) Provides the Opportunity to Reflect
Assessments need to enable learners to make choices and reflect on their learning, both individually and socially.

(g) Can Be Integrated and Applied Across Different Subject Areas and Lead Beyond Domain-specific Outcomes
Assessments encourage interdisciplinary perspectives and enable students to play diverse roles, thus building robust expertise rather than knowledge limited to a single well-defined field or domain.

(h) Authentic Activities Are Seamlessly Integrated with Assessment
Assessment of activities is seamlessly integrated with the major task in a manner that reflects real-world assessment, rather than separate, artificial assessment removed from the nature of the task.

(i) Creates Value
The product, outcome or result of an assessment is polished and is valued by the student in its own right, rather than being treated as preparation for something else.

(j) Allows Competing Solutions and Diversity of Outcomes
Assessments allow a range and diversity of outcomes open to multiple solutions of an original nature, rather than a single correct response obtained by the application of rules and procedures.
6.7 DIFFERENCES BETWEEN AUTHENTIC AND TRADITIONAL ASSESSMENTS
Assessment is authentic when we directly examine students' performance on worthy intellectual tasks. Traditional assessment, by contrast, relies on indirect or "proxy" items that, though efficient, are simplistic substitutes from which we think valid inferences can be made about the student's performance at those valued challenges (Wiggins, 1990). The differences can be summed up as in Table 6.1.
Table 6.1: Comparisons between Authentic and Traditional Assessments

Reasoning and practice
• Authentic: Schools must help students become proficient at performing the tasks they will encounter when they leave school. To determine if teaching is successful, the school must ask students to perform meaningful tasks that replicate real-world challenges, to see if students are capable of doing so.
• Traditional: Schools must teach a defined body of knowledge and skills. To determine if teaching is successful, the school must test students to see if they have acquired that knowledge and those skills.

Assessment and curriculum
• Authentic: Assessment drives the curriculum. That is, teachers first determine the tasks that students will perform to demonstrate their mastery, and then develop a curriculum that will enable students to perform those tasks well, including the acquisition of essential knowledge and skills. This has been referred to as planning backwards.
• Traditional: The curriculum drives assessment. The body of knowledge is determined first and becomes the curriculum that is delivered. Subsequently, the assessments are developed and administered to determine whether acquisition of the curriculum occurred.

Types of assessment tasks
• Authentic: Students are required to demonstrate understanding by performing a more complex task, usually representative of more meaningful applications, such as carrying out a class project or keeping a portfolio.
• Traditional: Students are required to take tests, usually of the selection type, in which they are asked to select the correct answer from the choices provided.

Nature of assessment tasks
• Authentic: Real-life tasks are assigned for learners to perform in order to demonstrate their proficiency or competency.
• Traditional: Contrived tests, e.g. MCQs, are used to assess learners' proficiency or understanding in a short period of time.

Focus of assessment
• Authentic: Construction or application of knowledge. Assessment requires learners to be effective performers with the acquired knowledge. Therefore, during assessment, learners are asked to analyse, synthesise and apply what they have learnt, and to create new meaning in the process.
• Traditional: Recall or recognition of knowledge. In assessment, learners are only required to reveal whether they can recognise and recall facts, normally learnt out of context.

Learners' responses in assessment
• Authentic: Learner structured. Authentic assessments allow more student choice and construction in determining what is presented as evidence of proficiency. Even when students cannot choose their own topics or formats, there are usually multiple acceptable routes towards constructing a product or performance.
• Traditional: Teacher structured. What a student can and will demonstrate has been carefully structured by the person(s) who developed the test. A student's attention will understandably be focused on, and limited to, what is on the test.

Evidence of learners' proficiency or competency
• Authentic: Authentic assessments offer more direct evidence of application and construction of knowledge. For example, asking a student to write a critique should provide more direct evidence of that skill than asking the student a series of multiple-choice, analytical questions about a passage.
• Traditional: The evidence is very indirect, particularly for claims of meaningful application in complex, real-world situations. For example, in an MCQ, a student effectively cannot critique the arguments someone else has presented (an important skill often required in the real world).

Reliability and validity
• Authentic: Validity depends in part upon whether the assessment simulates real-world tests of ability. It is difficult to ensure reliability because of the subjective nature of the scoring method (rubric) and the presence of varied but acceptable learner responses.
• Traditional: Validity is normally determined by matching test items to the curriculum content. It is possible to have high scoring reliability as the learners' responses are fixed; for example, there is only one right answer to a multiple-choice item.

Source: Adapted from Mueller (2005)
SELF-CHECK 6.1
1. What is authentic assessment?
2. State the other names used to describe authentic assessment.
3. Highlight three differences between authentic and traditional assessments.
ACTIVITY 6.3
1. State the reasons why authentic assessment is a good replacement for traditional assessment.
2. Give an example of authentic assessment.
Post your answers on the myINSPIRE online forum.
Summary

• The strategy of asking students to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills is called authentic assessment.

• Authentic assessment is sometimes called performance assessment, alternative assessment or direct assessment.

• Authentic assessment has many advantages, and traditional assessment complements it.

• An authentic assessment usually includes a task for students to perform and a rubric by which their performance on the task will be evaluated.

• Authentic assessment is a proven method of assessing human characteristics that has been in use for decades.
Key Terms

Alternative assessment
Backwards design
Contrived to real life
Direct assessment
Direct evidence
Indirect evidence
Kinaesthetic
Performance assessment
Student structured
References

Brady, L. (2012). Assessment and reporting: Celebrating student achievement (4th ed.). Melbourne, Australia: Pearson.

Kohn, A. (2006). The trouble with rubrics. English Journal, 95(4), 12–15.

Lindquist, E. F. (1951). Preliminary considerations in objective test construction. In E. F. Lindquist (Ed.), Educational measurement (pp. 4–22). Washington, DC: American Council on Education.

Mueller, J. (2005). The authentic assessment toolbox: Enhancing student learning through online faculty development. Journal of Online Learning and Teaching, 1(1), 1–7.

Reeves, T. C., Herrington, J., & Oliver, R. (2002). Authentic activity as a model for web-based learning. Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA.

Vander Ark, T. (2013). What is performance assessment? Retrieved from http://gettingsmart.com/2013/12/performance-assessment/

Wiggins, G. (1990). The case for authentic assessment. Practical Assessment, Research & Evaluation, 2(2), 1–3.

Wiggins, G., & McTighe, J. (1998). Understanding by design. Alexandria, VA: Association for Supervision and Curriculum Development (ASCD).
Topic 7: Project and Portfolio Assessments
LEARNING OUTCOMES

By the end of the topic, you should be able to:

1. Explain how to design an effective project for assessment;
2. Use different methods to assess group project work;
3. Discuss the usefulness of using projects as an assessment tool;
4. Describe the development of a portfolio; and
5. Discuss to what extent portfolios are useful as an assessment tool.
INTRODUCTION
Besides objective and essay tests, there are other methods of assessing students that you can use. In this topic, we will focus on two of them: project assessment and portfolio assessment. Both are examples of authentic assessment, which you learnt about in Topic 6, so whatever was discussed there applies to them; the points discussed in this topic are simply more specific. We will discuss project assessment first, followed by portfolio assessment.
7.1 PROJECT ASSESSMENT
Most of us have done some form of project work in school or university and know what a project is. However, when asked to define it, one will see varying interpretations of the project and its purpose. "Projects" can represent a range of tasks that can be done at home or in the classroom, by parents or groups of students, quickly or over time. While project-based learning (PBL) also features projects, in PBL the focus is more on the process of learning and on learner-peer-content interaction than on the end product itself.

A project is an activity in which time constraints have been largely removed. It can be undertaken individually or by a group, and usually involves a significant element of work being done at home or out of school. Project work has its roots in the constructivist approach, which evolved from the work of psychologists and educators such as Lev Vygotsky, Jerome Bruner, Jean Piaget and John Dewey. Constructivism views learning as the result of mental construction, wherein students learn by constructing new ideas or concepts based on their current and previous knowledge.
Most projects have the following common defining features (Katz & Chard, 1989), which can also be considered strengths of using projects as an assessment tool:

(a) It is a student-centred process. Students have the liberty to decide and plan what to do and how to complete the project assigned, though the selection of a project may be determined by the teacher. If the choice is left to the students, it probably requires the approval of the teacher. What is significant is that students are involved in the beginning, middle and end of the project. They play an active role in the entire process and take ownership of their project. They are actively involved as problem solvers, decision makers, investigators, documenters and researchers. They should therefore find projects fun, motivating and challenging.

(b) The content of a project and its work process are meaningful to students and directly observable in their environment. This is because projects normally involve real-life problems and first-hand investigations. For instance, in working on a project, students have to choose a knowledge area, delimit it and formulate a problem or put forward questions. Then, they are required to solve the problems and answer the questions through further work, collection of materials and knowledge. Both the content gathered and the work process involved are purposeful and reflective of real-life situations. Project work allows for connections among school, life and work skills.

(c) Projects can be planned with specific goals related to the curriculum. Normally, work is planned in such a manner that it draws from knowledge areas and skills in the current curriculum. It encourages students to break away from the compartmentalisation of knowledge and instead involves drawing upon different aspects of knowledge. It provides students with opportunities to explore the interrelationships and interconnectedness of topics within a subject and between subjects. For instance, the making of an object requires handicraft skills, knowledge of materials, working methods and uses of the object. Further, writing the project report requires language skills, and technological support will also enhance students' learning. Thinking skills are integral to project work. Project work thus involves drawing upon different aspects of knowledge and skills from the curriculum and provides students with an integrated learning experience.

(d) The product or output of a project is tangible and visible, and can be shared with the intended audience. It provides direct evidence of meaningful application of knowledge and skills by the students. Teachers can look directly at the project output or product to determine what students have learnt, while parents need not grapple with statistical results to know the performance of their children.

(e) Project work provides opportunities for reflective thinking and student self-assessment. Students can reflect on and self-assess what they have done not only at the end, but also during the entire process, allowing continuous learning to take place.

(f) Project work allows for multiple types of authentic assessment. For example, in doing a project, students are required to use journals and diaries to document the work process, portfolios to compile the project products, and reports to explain the work procedures. All these outputs are useful authentic evidence of students' performance that can be assessed.

(g) Project work provides an opportunity for students to explore different approaches in solving problems. In project work, a teacher follows, discusses and assesses the work in all its different phases; the teacher is the student's supervisor. When working on a project, the whole work process is as important as the final result or product.
Generally, there are two types of projects:

(a) Research-based Project
This is more theoretical in nature and may consist of posing a question, formulating a problem or setting up some hypotheses. In order to answer the question, solve the problem or confirm the assumptions, information must be found, evaluated and used. This information can either be a result of the students' own investigations or may be obtained from public sources without being a pure reproduction. Such project work is usually presented as a research report.

(b) Product-based Project
This can be the production of a concrete object, a service, a dance performance, a film, an exhibition, a play, a computer programme and so forth.
There are many types of effective projects. The following are some ideas for
projects:
(a) Survey of historical buildings in the student's community;

(b) Study of the economic activities of people in the local community;

(c) Study of the transportation system in the district;

(d) Recreate a historical event;

(e) Develop a newsletter or website on a specific issue relevant to the school or community (school safety, recycling, how businesses can save energy and reduce waste);

(f) Compile oral histories of the local area by interviewing community elders;

(g) Produce a website as a "virtual tour" of the history of the community;

(h) Create a video of students graduating from a primary or secondary school;

(i) Create a wildlife or botanical guide for a local wildlife area;

(j) Create an exhibition on local products, local history or local personalities using audiotapes, videotapes and photographs; and

(k) Investigate pollution of local rivers, lakes and ponds.
The possibilities are endless. The key ingredient for any project idea is that it is student-driven, challenging and meaningful. It is important to realise that project-based instruction complements the structured curriculum; it builds on and enhances what students learn through systematic instruction. Teachers do not let students become the sole decision makers about what project to do, nor do teachers sit back and wait for the students to figure out how to go about the process, which may be very challenging (Bryson, 1994). This is where the teacher's ability to facilitate and act as a coach plays an important role in the success of the project. The teacher will brainstorm ideas with the student to come up with a number of project possibilities, discuss these possibilities and options, help the students form a guiding question, and be ready to help them throughout the implementation process, such as by setting guidelines, due dates, resource selection and so forth (Bryson, 1994).
SELF-CHECK 7.1
1. What is a project?
2. State the differences between a research-based project and a product-based project.
ACTIVITY 7.1
Give examples of the two types of projects in your subject area or any
subject area.
Share your answer with your coursemates in the myINSPIRE online
forum.
7.1.1 What is Assessed Using Projects?
Project-oriented work is becoming increasingly common in working life. Project competence, the ability to work with others, personal initiative and entrepreneurship are skills often required by employers. These competences can be developed during project work, which thus prepares students for working life. Project work makes schooling more like the real world. In real life, we seldom spend several hours listening to authorities who know more than we do and tell us exactly what to do and how to do things. We ask questions of the person we are learning from. We try to link what the person is telling us with what we already know. We bring our experiences and what we already know that is relevant to the issue or task, and say something about it.

You can see this with a class of young learners. When the teacher tells a story, little kindergarten children raise their hands, eager to share their experiences of something related to the story. They want to be able to apply their natural tendencies to the learning process. This is how life is much of the time! By giving project work, we open up areas in schooling where students can speak about what they already know.

Project work is a learning experience which enables the development of certain knowledge, skills and attitudes that prepare students for lifelong learning and the challenges ahead (refer to Table 7.1).
Table 7.1: The Knowledge, Skills and Attitudes Achieved with Projects

Knowledge and skills application
The ability to apply the knowledge and skills acquired in the project task.
Examples:
• Be able to choose a knowledge area and delimit a task or problem.
• Be able to choose relevant resources to complete the project.
• Be able to draw up a project plan, implement it and, if necessary, revise it.
• Be able to apply creative and critical thinking skills in solving problems.

Communication
The ability to communicate effectively, by presenting ideas clearly and coherently to specific audiences, in both written and oral forms.
Examples:
• Be able to discuss with the supervising teacher how the work is developing.
• Be able to provide a written report of the project describing the progress of work from the initial idea to the final product.
• Be able to produce a final product which is an independent solution to the task or problem chosen.

Collaboration
The skills of working with others and in a team to achieve common goals.
Examples:
• Be able to participate actively in group discussion.
• Be able to listen actively to the concerns of team members.
• Be able to display a willingness to be a team player.
• Be able to assess the strengths and weaknesses of partners.
• Be able to recognise the contributions of team members.
• Be able to play the roles assigned effectively and successfully.

Independent learning
The ability to learn on one's own, self-reflect and take appropriate actions to improve.
Examples:
• Be able to document the progress of the work and regularly report on the process.
• Be able to assess, either in writing or verbally, the work process and results.
• Be able to manage time for learning efficiently.

Source: Adapted from Harwell and Blank (1997)
SELF-CHECK 7.2
What are the knowledge, skills and attitudes evaluated using a project?
ACTIVITY 7.2
To what extent has project work been used as an assessment strategy in
Malaysian schools?
Discuss this matter with your coursemates in the myINSPIRE online
forum.
7.1.2 Designing Effective Projects
There are many types of projects. There is no one correct way to design and
implement a project, but there are some questions and things to consider when
designing effective projects. You will be surprised that many teachers are not sure
why they use projects to assess their students. It is very important for everyone
involved to be clear about the learning goals of the project. Herman, Aschbacher
and Winters (1992) have identified five questions to consider when determining
learning goals:
(a) What important cognitive skills do I want my students to develop? (For example, to use algebra to solve everyday problems, to write persuasively);

(b) What social and affective skills do I want my students to develop? (For example, to develop teamwork skills);

(c) What metacognitive skills do I want my students to develop? (For example, to reflect on the research process they use, evaluate its effectiveness and determine methods of improvement);

(d) What types of problems do I want my students to be able to solve? (For example, to know how to do research, apply a scientific method); and

(e) What concepts and principles do I want my students to be able to apply? (For example, to apply basic principles of biology and geography in their lives, understand cause-and-effect relationships).
In designing project work for assessment, the teacher should also develop an outline that explains the project's essential elements and his or her expectations for each project. Although the outline can take various forms, it should contain the following elements (Bottoms & Webb, 1998):

(a) Situation or Problem
A sentence or two describing the issue or problem that the project is trying to address. For example, pollution levels in rivers, transportation problems in urban centres, increasing prices of essential items, crime rates in squatter areas, youths loitering in shopping complexes, or students in Internet cafes during school hours.

(b) Project Description and Purpose
A concise explanation of the project's ultimate purpose and how it addresses the situation or problem. For example, students will research, conduct surveys and make recommendations on how students can help reduce pollution of rivers; results will be presented in a newsletter, information brochure, exhibition or website.

(c) Performance Specifications
A list of criteria or quality standards the project must meet.

(d) Rules
Guidelines for carrying out the project, including a timeline and short-term goals, such as having interviews and research completed by a certain date.

(e) List of Project Participants with Roles Assigned
The roles of team members; if members of the community are involved, identify their roles as well.

(f) Assessment
How the student's performance will be evaluated. In project work, the learning process is evaluated as well as the final product.
Steinberg (1998) provides a checklist, called the Six A's Project Checklist, for the design of effective projects (refer to Table 7.2). The checklist can be used throughout the process to help both teacher and student plan and develop a project, as well as to assess whether the project is successful in meeting instructional goals.
Table 7.2: The Six A's Project Checklist

Authenticity
• Does the project stem from a problem or question that is meaningful to the student?
• Is the project similar to one undertaken by an adult in the community or workplace?
• Does the project give the student the opportunity to produce something that has value or meaning to the student beyond the school setting?

Academic rigour
• Does the project enable the student to acquire and apply knowledge central to one or more discipline areas?
• Does the project challenge the student to use methods of inquiry from one or more disciplines (such as to think like a scientist)?
• Does the student develop higher-order thinking skills (such as searching for evidence, using different perspectives)?

Applied learning
• Does the student solve a problem that is grounded in real life and/or work (such as designing a project, organising an event)?
• Does the student need to acquire and use skills expected in high-performance work environments (such as teamwork, problem-solving, communication or technology)?
• Does the project require the student to develop organisational and self-management skills?

Active exploration
• Does the student spend a significant amount of time doing work in the field, outside school?
• Does the project require the student to engage in real investigative work, using a variety of methods, media and sources?
• Is the student expected to explain what he or she learnt through a presentation or performance?

Adult relationships
• Does the student meet and observe adults with relevant experience and expertise?
• Is the student able to work closely with at least one adult?
• Do adults and students collaborate on the design and assessment of the project?

Assessment practices
• Does the student reflect regularly on his or her learning, using clear project criteria that he or she has helped to set?
• Do adults from outside the community help the student develop a sense of the real-world standards for this type of work?
• Is the student's work regularly assessed through a variety of methods, including portfolios and exhibitions?

Source: Steinberg (1998)
In implementing the project, it is also important to ensure that the following
questions are addressed:
(a) Do the students have easy access to the resources they need? This is especially important if a student is using specific technology or subject-matter expertise from the community;

(b) Do the students know how to use the resources? Students who have minimal experience with computers, for example, may need extra assistance in utilising them;

(c) Do the students have mentors or coaches to support them in their work? These can be in-school or out-of-school mentors; and

(d) Are students clear on the roles and responsibilities of each person in the group?
ACTIVITY 7.3
1. What are some of the factors you should consider when designing project work for students in your subject area?
2. Give examples of projects you have included or can include in the teaching and evaluation of your subject area.
Post your answers on the myINSPIRE online forum.
7.1.3 Possible Problems with Project Work
Teachers intending to use projects as both an instructional and an assessment tool should be aware of certain problem areas. Be as specific as possible in determining outcomes so that both the student and the teacher understand exactly what is to be learnt. In addition, be aware of the following problems when undertaking project-based instruction:

(a) Aligning project goals with curriculum goals can be difficult and requires careful planning. For example, if one of the curriculum goals is to develop the reading skills of learners, the project planned should enable learners to acquire those skills in the process. Very often, the skills are assessed but the scope for learning is not made available.

(b) Parents tend to be exam-oriented. To parents, the best way to assess learning is through tests and examinations. They cannot see how doing a project is related to the overall assessment of learning. They are thus not supportive of projects and consider doing projects a waste of student learning time and resources.

(c) Students are not clear as to what is required of them. This happens when the assigned projects do not have an adequate and clear structure and guidelines, and students are not given proper guidance on how to carry out the projects.

(d) Projects often take a long time to complete, and teachers need a lot of time to prepare good authentic projects and to manage and monitor their implementation.

(e) Teachers are not traditionally prepared to integrate curriculum content into real-world activities. Besides, they may not be familiar with how they should assess projects. There is thus a need for intensive staff development to prepare them for the job.

(f) Resources needed for project work are not confined to the usual classroom materials such as paper and pencil. For instance, a project to develop a website requires a computer, a special Internet program and other Internet facilities, which may not be readily available to the learners. Such resources also involve cost, and support from the school administration is needed.

(g) Scoring students' project work can be a daunting task. Project work normally involves the assessment of diverse competencies and allows for the production of many diverse outputs by the students. Assessing project work is time-consuming. Besides, teachers need to be specially trained to carry out this type of assessment, especially when the project is undertaken by a group of students. Fairness in assessment among group members is an issue that needs special attention.
7.1.4 Group Work in Projects
A group project requires two or more students to work together on a longer project. Working in groups has become an accepted part of learning as a consequence of the widely recognised benefits of collaborative group work for student learning. When groups work well, students learn more and produce higher quality learning outcomes. What are some benefits of group work in projects?

(a) Group Work Can Enhance the Overall Quality of Student Learning
Groups that work well together can achieve much more than an individual working on his or her own. A broader range of skills can be applied to practical activities, and sharing and discussing ideas can play a pivotal role in deepening an individual student's understanding of a particular subject area. This is because working in a group enables a student to examine topics from the perspectives of others. When an individual is required to discuss a topic and negotiate how to address it, he is forced to listen to other people's ideas. Their ideas will then influence his own thinking and broaden his horizons. His group members are not just fellow learners; they are also his teachers. Besides, being part of a team will help him develop his interpersonal skills, such as speaking and listening. Group work will also help him find his own strengths and weaknesses (for example, he may be a better leader than listener, or he might be good at coming up with the "big ideas" but not so good at putting them into action). Enhanced self-awareness will help his approach to learning.

(b) Group Work Can Improve Peer Learning
Group work enhances peer learning. Students learn from each other and benefit from activities that require them to articulate and test their knowledge. Group work provides an opportunity for students to clarify and refine their understanding of concepts through discussion and rehearsal with peers. Many, but not all, students recognise the value of group work to their personal development and of being assessed as a member of a group. Working with a group and for the benefit of the group also motivates some students. Group assessment helps some students develop a sense of responsibility. A student working in a group on a project may think, "I felt that because one is working in a group, it is not possible to slack off or to put things off. I have to keep working; otherwise I would be letting other people down."

(c) Group Work Can Help Develop Generic Skills Sought by Employers
As a direct response to the objective of preparing graduates with the capacity to function successfully as team members in the workplace, there has been a trend in recent years to incorporate generic skills alongside traditional subject-specific knowledge in the expected learning outcomes in higher education. Group work can facilitate the development of skills which include:

(i) Teamwork skills (skills in working within team dynamics and leadership skills);

(ii) Analytical and cognitive skills (analysing task requirements, questioning, critically interpreting material and evaluating the work of others);

(iii) Collaborative skills (conflict management and resolution, accepting intellectual criticism, flexibility, and negotiation and compromise); and

(iv) Organisational and time management skills. A student might say, "Having to do group work has changed the way I worked. I could not do it all the night before. I had to be more organised and efficient."

(d) Group Work May Reduce the Workload Involved in Assessing, Grading and Providing Feedback to Students
Group work, and group assessment in particular, is sometimes implemented in the hope of streamlining assessment and grading tasks. In simple terms, if students submit group assignments, then the number of pieces of work to be assessed can be vastly reduced. This prospect might be particularly attractive for staff teaching large classes.
SELF-CHECK 7.3
1. What are some problems in the implementation of project work and how would you solve them?
2. What are the benefits of group work in projects?
7.1.5 Assessing Project Work
Assessing student performance on project work is quite different from an examination using objective tests and essay questions. Students might be working on different projects; some may be working in groups while others work alone. This makes the task of assessing student progress even more complex than a paper-and-pencil test, where everyone is evaluated using one marking scheme. Table 7.3 illustrates a general marking scheme for projects.
Table 7.3: General Marking Scheme for Projects

100–90%
• Exceptional and distinguished work of a professional standard.
• Outstanding technical and expressive skills.
• Work demonstrating exceptional creativity and imagination.
• Work displaying great flair and originality.

89–80%
• Excellent and highly developed work of a professional standard.
• Extremely good technical and expressive skills.
• Work demonstrating a high level of creativity and imagination.
• Work displaying flair and originality.

79–70%
• Very good work which approaches professional standard.
• Very good technical and expressive skills.
• Work demonstrating good creativity and imagination.
• Work displaying originality.

69–60%
• A good standard of work.
• Good technical and expressive skills.
• Work displaying creativity and imagination.
• Work displaying some originality.

59–50%
• A reasonable standard of work.
• Adequate technical and expressive skills.
• Work displaying competence in the criteria assessed, but which may be lacking some creativity or originality.

49–40%
• A limited, but adequate standard of work.
• Limited technical and expressive skills.
• Work displaying some weaknesses in the criteria assessed and lacking creativity or originality.

39–30%
• Limited work which fails to meet the required standard.
• Weak technical and expressive skills.
• Work displaying significant weaknesses in the criteria assessed.

29–20%
• Poor work. Unsatisfactory technical or expressive skills.
• Work displaying significant or fundamental weaknesses in the criteria assessed.

19–10%
• Very poor work, or work where very little attempt has been made.
• A lack of technical or expressive skills.
• Work displaying fundamental weaknesses in the criteria assessed.

9–1%
• Extremely poor work, or work where no serious attempt has been made.

Source: Chard (1992)
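As a small illustration of how a banded scheme such as Table 7.3 can be applied consistently, the sketch below maps a percentage mark to the band it falls into. The band floors and the shortened descriptors are taken from the table; the function itself is just one possible implementation, not part of the original scheme.

    # A minimal sketch mapping a percentage mark to its Table 7.3 band.
    BANDS = [  # (lower bound of band, shortened descriptor)
        (90, "Exceptional and distinguished work of a professional standard"),
        (80, "Excellent and highly developed work of a professional standard"),
        (70, "Very good work which approaches professional standard"),
        (60, "A good standard of work"),
        (50, "A reasonable standard of work"),
        (40, "A limited but adequate standard of work"),
        (30, "Limited work which fails to meet the required standard"),
        (20, "Poor work"),
        (10, "Very poor work or work where very little attempt has been made"),
        (1,  "Extremely poor work or work where no serious attempt has been made"),
    ]

    def band_for(mark):
        """Return the first band whose lower bound the mark reaches."""
        for floor, descriptor in BANDS:
            if mark >= floor:
                return descriptor
        return "Below the lowest band"

    print(band_for(73))  # Very good work which approaches professional standard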
Product, Process or Both?
According to Bonthron and Gordon (1999), from the outset, you should be clear about the following:

(a) Whether you are going to assess only the product of the group work, or both product and process;

(b) If you intend to assess process, what proportion of the marks you are going to allocate to process, based on what criteria, and how you are going to use those criteria to assess process; and

(c) What criteria you are planning to use to assess the product, and how the marks will be distributed.
Some educators believe there is a need to assess the processes within groups as well as the products or outcomes. What exactly does process mean? Both teachers and students must be clear about what process means. For example, if you want to assess "the level of interaction" among students in the group, they should know what "high" or "low" interaction means. Should the teacher be involved in the working of each group, or should assessment rely on self or peer assessment? Obviously, being involved in so many groups would be physically impossible for the teacher. So, how do you measure "process"? Some educators may say, "I don't care what they do in their groups. All I'm interested in is the final product, and how they arrive at their results is their business."
The criteria for the evaluation of group work can be determined by teachers alone
or by both teachers and students through consultation. Group members can be
consulted on what should be assessed in a project through consultation with the
teacher. Obviously, you have to be clear about the intended learning outcomes of
the project in your subject area. It is a useful starting point for determining criteria
for assessment of the project. Once these broader learning outcomes are
understood, you can establish the criteria for marking the project. Generally, it is
easier to establish criteria for measuring the product of project work and much
more difficult to measure the processes involved in project work. However, it is
suggested that evaluation of product and process can be done separately rather
than attempting to do both at once.
Who Gets the Marks: Individuals or the Group?

Most projects involve more than one student, and the benefits of group work have been discussed earlier. A major problem of evaluating projects involving group work is how to allocate marks fairly among group members. As one student exclaimed, "I would like my teacher to tell me what amount of work and effort will get what mark." Other concerns would be, "Do all students get the same mark even though not all students put in the same effort?" and "Are marks given for the individual contribution of team members?"

These are questions that bother teachers, especially when it is common to find freeloaders or sleeping partners in group projects. The following are some suggestions on how group work may be assessed (a sketch of the arithmetic behind methods (b) and (e) follows this list):
(a) Shared Group Mark
All group members receive the same mark for the work submitted, regardless of individual contribution. It is a straightforward method that encourages group work, where group members sink or swim together. However, it may be perceived as unfair by better students, who may complain that they are unfairly disadvantaged by weaker students, and the likelihood of "sleeping partners" is very high.

(b) Share-out Marks
The students in the group decide how the total number of marks should be shared between them. For example, a score of 40 is given by the teacher for the project submitted. There are five members in the group, so the total score possible is 5 × 40 = 200. The students then share the 200 marks based on the contribution of each of the five students, which may be 35, 45, 42, 38 and 40. This is an effective method if group members are fair, honest and do not have ill feelings towards each other. However, there is the likelihood that the marks will simply be distributed equally to avoid ill feelings among group members.

(c) Individual Mark
Each student in the group submits an individual report based on the task allocated or on the whole project.

(i) Allocated Task
From the beginning, the project is divided into different parts or tasks, and each student in the group completes his or her allocated task, which contributes to the final group product, and gets the marks for that task. This method is a relatively objective way of ensuring individual participation and may motivate students to work hard on their task or part. The problem is breaking up the project into tasks that are exactly equal in size or complexity. Also, the method may not encourage group collaboration, and some members may slow down progress.

(ii) Individual Report
Each student writes and submits an individual report based on the whole project. The method ensures individual effort and may be perceived as fair by students. However, it is difficult to determine how the individual reports should differ, and students may unintentionally commit plagiarism.

(d) Individual Mark (Examination)
Use examination questions that specifically target the group projects and can only be answered by students who have been thoroughly involved in the project. This method may motivate students to learn from the group project, including learning from the other members of the group. However, it may not be effective because students may be able to answer the questions by reading the group reports. In the Malaysian context, a national examination may not be able to include such questions as it involves hundreds of thousands of students.

(e) Combination of Group Average and Individual Marks
The group mark is awarded to each member, with a mechanism for adjusting for individual contributions. This method may be perceived to be fairer than a shared group mark. However, it means additional work for teachers trying to establish individual contribution.
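To illustrate the arithmetic behind methods (b) and (e), here is a minimal sketch in Python. The contribution ratings and the 70/30 weighting are hypothetical examples chosen for illustration; any real scheme would need weights and ratings agreed with the class.

    # A minimal sketch of two of the group-marking schemes described above.
    # The numbers and the 70/30 weighting are hypothetical examples.

    def share_out(group_mark, contributions):
        """Method (b): members share (group mark x group size) in
        proportion to their agreed relative contributions."""
        pool = group_mark * len(contributions)
        total = sum(contributions)
        return [round(pool * c / total, 1) for c in contributions]

    def combined(group_mark, individual_mark, group_weight=0.7):
        """Method (e): a weighted combination of the shared group mark
        and a mark reflecting individual contribution."""
        return round(group_weight * group_mark
                     + (1 - group_weight) * individual_mark, 1)

    # Five members share a group mark of 40 (a pool of 5 x 40 = 200 marks),
    # based on agreed relative contributions:
    print(share_out(40, [0.9, 1.1, 1.05, 0.95, 1.0]))
    # prints [36.0, 44.0, 42.0, 38.0, 40.0]

    # A member whose group mark is 40 and whose individual mark is 55:
    print(combined(40, 55))  # prints 44.5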
ACTIVITY 7.4
Which of the five methods of assessing group work would you use in
evaluating project work in your subject area? Give reasons for your
choice.
Post your answer on the myINSPIRE online forum.
7.1.6 Evaluating Process in a Project
The assessment of a group product is rarely the only assessment taking place in
group activities. The process of group work is increasingly recognised as an
important element in the assessment of group work. Moreover, where group work
is marked solely on the basis of product and not process, there can be differences
in individual grading that are unfair and unacceptable. The following are the
elements which are considered in evaluating process:
(a) Peer/Self Evaluation of Roles
Students rate themselves as well as other group members on specific criteria, such as responsibility, contributing ideas and finishing tasks. This can be done through various grading forms (refer to Figure 7.1) or by having students write a brief essay on the group members' strengths and weaknesses.
Figure 7.1: Checklist for evaluating processes involved in project work
Source: Sutherland (2003)
(b) Individual Journals
Students keep a journal of events that occur in each group meeting, including who attended, what was discussed and plans for future meetings. These can be collected and periodically read by the instructor, who comments on progress. The instructor can thereby provide guidance for the group without directing them.

(c) Minutes of Group Meetings
Similar to journals are minutes for each group meeting, which are periodically read by the instructor. These include who attended, tasks completed, tasks planned and contributors to various tasks. This provides the instructor with a way of monitoring individual contributions to the group.

(d) Group and Individual Contribution Grades
Instructors can divide the project grade into percentages for individual and group contribution. This is especially beneficial if peer and self-evaluations are used (a sketch of one way to do this follows below).

Logs can potentially provide plenty of information to form the basis of assessment, while keeping minutes helps members to focus on the process, which is a learning experience in itself. These techniques may be perceived as a fair way to deal with "shirkers" and outstanding contributions. However, reviewing logs can be time consuming for teachers, and students may need a lot of training and experience in keeping records. Also, an emphasis on second-hand evidence may not be reliable.
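One way to operationalise the peer and self ratings in (a), and the split described in (d), is to scale the group mark by each member's average rating relative to the group's overall average. The sketch below is a minimal illustration under assumed inputs; the 1 to 5 rating scale, the names and the scaling rule are illustrative assumptions, not a scheme prescribed by this module.

    # A minimal sketch of converting peer ratings into individual marks.
    # The rating scale, names and scaling rule are illustrative assumptions.

    def individual_marks(group_mark, ratings):
        """Weight the group mark by each member's average peer rating
        relative to the overall average rating for the group."""
        averages = {name: sum(r) / len(r) for name, r in ratings.items()}
        overall = sum(averages.values()) / len(averages)
        return {name: round(group_mark * avg / overall, 1)
                for name, avg in averages.items()}

    # Three members rated by peers on responsibility, contributing ideas
    # and finishing tasks (1 = poor, 5 = excellent):
    ratings = {
        "Aina":  [5, 4, 5],
        "Ben":   [3, 3, 4],
        "Chong": [4, 4, 4],
    }
    print(individual_marks(40.0, ratings))
    # prints {'Aina': 46.7, 'Ben': 33.3, 'Chong': 40.0}

A well-defined rating form (such as the one in Figure 7.1) and trained raters remain essential; the arithmetic only makes an agreed scheme easier to apply consistently.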
7.1.7 Self-assessment in Project Work
Self-assessment is a process by which students learn about themselves; for
example, what they have learnt about the project, how they have learnt and how
they have reacted in certain situations when carrying out the project. Involving
students in the assessment process is an essential part of balanced assessment.
When students become partners in the learning process, they gain a better sense of
themselves as readers, writers and thinkers. Some teachers may be uncomfortable with self-assessment because, traditionally, teachers are responsible for all forms of assessment in the classroom, and here we are asking students to assess themselves.
Self-assessment can take many forms, such as:
(a) Discussion involving the whole class or small groups;
(b) Reflection logs;
(c) Self-assessment checklists or inventories; and
(d) Teacher-student interviews.
These types of self-assessment share a common theme: they ask students to review their work to determine what they have learnt and whether areas of confusion still exist. Although each method may differ slightly, they all give students enough time to thoughtfully consider and evaluate their own progress.
Since project learning is student-driven, assessment should be student-driven as
well. Students can keep journals and logs to continually assess their progress. A
final reflective essay or log can allow students and teachers to understand thinking
processes, reasoning behind decisions, ability to arrive at conclusions and
communicate what they have learnt. According to Edwards (2000), the following
are some questions a student can ask himself or herself while self-assessing:
(a) What were the project's successes?
(b) What might I do to improve the project?
(c) How well did I meet my learning goals? What was most difficult about meeting the goals?
(d) What surprised me most about working on the project?
(e) What was my group's best team effort? Worst team effort?
(f) How do I think other people involved with the project felt it went?
(g) What were the skills I used during this project? How can I practise these skills in the future?
SELF-CHECK 7.4
1. Explain how process can be measured in group project work.
2. List some of the problems with the evaluation of process.

ACTIVITY 7.5
Do you think process should be assessed? Justify your answer in the myINSPIRE online forum.
7.2 WHAT IS A PORTFOLIO?
A portfolio is a collection of pieces of student work presented for assessment. However, it does not contain all the work a student does. It may contain examples of "best" works or examples from each of several categories of work, for example, a book review, a letter to a friend, a creative short story and a persuasive essay. A student's portfolio may have one or several goals. The student will select and submit works to meet these goals. The works submitted should provide evidence of students' progress towards the goals and reflect both student production and process.
A portfolio is therefore not a pile of student work that accumulates over a semester or year. Rather, a portfolio contains a purposefully selected subset of student work which reflects his or her efforts, progress and achievements in different areas of the curriculum. Some people may associate portfolios with the stock market, where a person or organisation keeps a portfolio of the stocks and shares owned. A portfolio can be defined as a container that holds evidence of an individual's skills, ideas, interests and accomplishments. The organised collection of contents, such as text, files, photos and videos, that tells this story is generically referred to as artefacts: evidence of what students have learnt. These artefacts are usually accompanied by students' reflections.

The particular purposes of a portfolio determine the number and type of items to be included, the process for selecting them, and how and whether students respond to the items selected.
Portfolios offer a way of assessing student learning that is different from
traditional methods. Portfolio assessment provides the teacher and students an
opportunity to observe students in a broader context: taking risks, developing
creative solutions and learning to make judgements about their own
performances.
(Paulson, Paulson & Meyer, 1991)
Portfolios typically are created for one of three purposes: to show growth, to showcase current abilities or to evaluate cumulative achievement. Many educators who work with portfolios consider the reflection component the most critical element of a good portfolio. Simply selecting samples of work can produce meaningful stories about students, and others can benefit from "reading" these stories.
The students themselves are missing significant benefits of the portfolio process if they are not asked to reflect upon the quality and growth of their work. As Paulson et al. (1991) stated, "The portfolio is something that is done by the student, not to the student." Most importantly, it is something done for the student. The student needs to be directly involved in each phase of the portfolio development to learn the most from it, and the reflection phase holds the most promise for promoting student growth.

Portfolios are sometimes described as portraits of a person's accomplishments. Using this metaphor, we can consider a student portfolio a self-portrait, but one that has benefited from guidance and feedback from a teacher and sometimes from other students.
7.2.1 What is Portfolio Assessment?
Increasingly, portfolio assessment is gaining acceptance as an assessment strategy that seeks to present a more holistic view of the learner. The collection of works by students is assessed, hence the term portfolio assessment. However, some suggest that portfolios are not really assessments at all because they are just collections of previously completed assessments. In portfolio assignments, students are in fact performing authentic tasks which capture meaningful application of knowledge and skills. Their portfolios often tell compelling stories of the growth of the students' talents and showcase their skills through a collection of authentic performances.
The portfolio provides for continuous and ongoing assessment (i.e. formative assessment) as well as assessment at the end of a semester or a year (i.e. summative assessment). Emphasis is more on monitoring students' progress towards achieving the learning outcomes of a particular subject, course or programme. Portfolio assessment has been described as multidimensional because it allows students to include different aspects of their works such as essays, project reports, performance on objective tests, objects or artefacts they have produced, poems, laboratory reports and so forth. In other words, the portfolio contains samples of work over an entire semester, term or year, rather than from single points in time, such as during examination week only.
Using portfolios introduces students to an evaluation format with which they may need to become familiar as more schools adopt portfolio assessment. Although many portfolios reflect long-term projects completed over a period of time, they do not have to be that way. Teachers can have students create portfolios of their work for a particular unit, and that portfolio might count as a project for that particular topic of study. Though portfolio assessment is currently quite popular in our school system, there are still teachers who are uncomfortable using it as an assessment tool. These teachers may think that the portfolio is a very subjective form of assessment. They may be unsure of the purpose of a portfolio and its uses in the classroom. To them, there is also the question of how the portfolio can be most effectively used to assess student learning. The situation can be overcome if these teachers understand the purpose of portfolios, how portfolios can be used to evaluate their students' work and how grades will be determined.
Portfolio assessment represents a significant shift in thinking about the role of assessment in education. Teachers who use this strategy in the classroom have shifted their philosophy of assessment from merely comparing achievement (based on grades, test scores and percentile rankings) towards improving students' achievement through feedback and self-reflection. Teachers should convey to students the purpose of the portfolio, what constitutes quality work and how the portfolio is graded.
7.2.2 Types of Portfolios
There are two main types of portfolios, namely process-oriented and product-oriented portfolios (refer to Table 7.4).
Table 7.4: Types of Portfolios

Process-oriented Portfolios
These portfolios tell a story about the student and how the learner has grown. They include earlier drafts and show how these drafts have been improved upon. For example, the first draft of a poem written by a Year Three student is reworked based on the comments by the teacher and the student's reflection on his or her work. All the drafts and changes made are kept in the portfolio. In this manner, student works can be compared, providing evidence of how the student's skills have improved.

Product-oriented Portfolios
These portfolios contain the works which a student considers his or her best. The aim is to document and reflect on the quality of the final products rather than the process that produced them. The student collects all his or her work at the end of the semester, at which time he or she selects those works which are of the highest quality. Students could be left to make the decision themselves, or the teacher can set the criteria on what a portfolio must contain and the quality of the works to be included.
SELF-CHECK 7.5
1. What is portfolio assessment?
2. Describe the two main types of portfolios.
7.2.3 Developing a Portfolio
According to Epstein (2006), the design and development of a portfolio involves
four main steps as follows:
(a) Collection
    This step simply requires students to collect and store all of their work. Students have to get used to the idea of documenting and saving their work, which they may not have done before. Questions involved in this step are:
    (i) How should the work be organised? By subject or by themes?
    (ii) How should the work be recorded and stored?
    (iii) How do we get students to form the habit of documenting evidence?
(b) Selection
    This will depend on whether it is a process or product portfolio and the criteria set by the teacher. Students will go through the work collected and select certain works for their portfolio. These might include examination papers and quizzes, audio and video recordings, project reports, journals, computer work, essays, poems, artwork and so forth. Questions related to this step are:
    (i) How does one select? What is the basis of selection?
    (ii) Who should be involved in the selection process?
    (iii) What are the consequences of not completing the portfolio?
(c) Reflection
    This is the most important step in the portfolio process. It is reflection that differentiates the portfolio from a mere collection of student work. Reflection is often done in writing but it can also be done orally. Students are asked why they have chosen a particular product or work (such as an essay), how it compares with other work, what particular skills and knowledge were used to produce it and how it can be further improved. Questions related to this step are:
    (i) Should students reflect on how or why they chose certain works?
    (ii) How should students go about the reflection process?

(d) Connection
    As a result of reflection, students begin to ask themselves, "Why are we doing this?" They are encouraged to make connections between their schoolwork and the value of what they are learning. They are also encouraged to make connections between the work included in their portfolio and the world outside the classroom. They learn to exhibit what they have done in school to the community. Questions to ask are:
    (i) How is the cumulative effect of the portfolio evaluated?
    (ii) Should students exhibit their works?
7.2.4 Advantages of Portfolio Assessment
Portfolio assessments have been gaining importance as an assessment strategy in educational institutions because of their benefits to teaching and learning. However, like other assessment methods, portfolio assessments also bring some problems. Let us first look at the advantages of using portfolios as an assessment tool.
(a) Allows Assessment of Creativity and Higher-level Cognitive Skills
    It has frequently been suggested that paper-and-pencil tests (objective and essay tests) are not able to assess all the learning outcomes in a particular subject area. For example, many higher-level cognitive skills and the affective domain (feelings, emotions, attitudes and values) are not adequately assessed using traditional assessment methods. However, portfolio assessments allow for the assessment of students' higher-level cognitive skills such as critical and creative thinking skills. For instance, students can be assessed on how critical they are in individual reflection. Likewise, they can be assessed on how creative they are in the selection and presentation of their works for portfolio assessments. Materials compiled by students in the portfolio development process also provide evidence about their growth in the affective domain, such as self-confidence, diligence, attention to detail and a positive attitude towards learning.
(b) Continuous, Ongoing Process
    Portfolio assessments are an ongoing process. Hence, they not only provide an opportunity for the teacher to trace or monitor change and growth over a period of time, but also provide an opportunity for students to reflect on their own learning and thinking. Teachers have an opportunity to monitor students' understanding and approaches to solving problems and decision making (Paulson et al., 1991), while upon reflection, students can identify where they have gone wrong or how they can improve themselves. Emphasis in portfolio assessment is on improving students' achievement rather than ranking students according to their performance on tests.

    Portfolio assessments are both formative and summative. Since the assessment is a continuous and ongoing process, it allows students to reflect on their own learning and thinking, and allows teachers to monitor students' progress and provide feedback. Through teacher feedback and self-reflection, students improve their achievement. The assessment of this learning process is formative, while the assessment of the documents being compiled, as well as of the final products at the end of the semester or year, is summative.
(c) Multidimensional
    Portfolio assessments are multidimensional in that they allow for the inclusion of different aspects of students' works and also samples of work over time, i.e. over a semester or year. Portfolios are thus a rich source of evidence on student learning and development, providing a more complete and holistic picture of students' achievement.
(d) Encourages Self-assessment
    Portfolio assessments involve students in the assessment process. Students self-assess their own work and decide what to include as evidence of their growth and performance. They judge their work using explicit criteria to identify strengths and weaknesses and monitor their own progress. By self-evaluating their own work, students become more accountable and more responsible for their own learning, and student learning becomes more meaningful. For self-evaluation to take place, teachers should constantly invite students to reflect on their growth and performance as students. Teachers should convey to students the purpose of the portfolio, what constitutes quality work and how the portfolio is graded. Feedback enables learners to reflect on what they are learning and why.
(e) Tangible Outcomes
    In portfolio assessments, the portfolios give parents and teachers concrete examples of students' development over time as well as their current skills and abilities. The assessment outcomes are also tangible and more meaningful than numeric statistical results. For example, through the compositions written and compiled by a student in the portfolio, parents and teachers can not only see how he or she has mastered writing skills but also how he or she has improved over time.
(f) Individualised
    Portfolio assessments are individualised, meaning that students' portfolios
will be assessed separately. This allows teachers to see their students as
individuals, each with his or her own unique characteristics, needs and
strengths. With this understanding, teachers can adapt their instruction to
the learning needs and styles of the students.
7.2.5 Disadvantages of Portfolio Assessment
There are also disadvantages in portfolio assessment.
(a) Time Consuming
Extra time is needed to plan an assessment system as the assessment involves
multiple learning outcomes. The problem is further compounded if a large
group of students need to be assessed. The portfolio outputs can be varied
and require expert judgement. Thus, assessing portfolios is time consuming
for teachers and the data from portfolios will be difficult to analyse.
(b) Need for Constant Feedback and Mentoring
Moreover, for portfolio assessments to be beneficial to students, teachers
need to provide constructive feedback on the work included in the studentsÊ
portfolios and on the portfolios as a whole. Teachers also need to provide
guidance through portfolio conferences about how best to construct a
portfolio for a specific purpose. Scheduling individual portfolio conferences
is difficult and the length of each conference may interfere with other
instructional activities. There is thus a need to ensure that the benefits of
portfolio assessments justify the investment of time by the teachers.
(c) Poor Reliability
    Scoring portfolios involves extensive use of subjective evaluation procedures and is thus open to questions of reliability. Assessment is useless if the data obtained are unreliable. According to Nitko (2001), the reliability of portfolio results is typically in the 0.4 to 0.6 range. This indicates that as much as 60 per cent of the variability in portfolio scores is the result of measurement error. This should give all educators a reason to be cautious when using the results of portfolio assessments in assigning course grades, certifying achievement or making high-stakes decisions. Part of the poor reliability comes from the difficulty of establishing clear scoring criteria for the large and diverse sets of materials that are included. Poor reliability is also due to a lack of standardisation, which leads to incomparability of the portfolio entries that different students choose to include.
(d) Energy, Skills and Resources
Society is still strongly oriented towards grades and test scores and in
addition, most universities and colleges still use test scores and grades as the
main admission criteria. Parents who are exam-oriented cannot see how
keeping portfolios is related to the overall assessment of learning. They are
thus usually not supportive of portfolio assessments and consider them a
waste of student learning time and resources. To them, the best way to assess
learning is through tests and examinations.
An important aspect of portfolio assessments is reflection. In fact, this is the most important step in the portfolio process. As mentioned earlier in subtopic 7.2.3, it is reflection that differentiates the portfolio from a mere collection of student work. Reflection is often done in writing but it can also be done orally. Students will be asked to reflect on what they are learning and why. To be able to do this, students must possess good metacognitive skills to think about what they are thinking. This can be a problem for students, especially those who are underachievers.
In summary, portfolio assessments have significant strengths and weaknesses. On the positive side, they provide a broad framework for examining a student's progress, encourage student participation in the assessment process and strengthen the relationship between instruction and assessment. On the downside, they demand considerable time, energy and a certain degree of expertise on the part of teachers as well as students, and have questionable reliability.
7.2.6 How and When Should Portfolios be Assessed?
If the purpose of the assessment is to demonstrate progress, the teacher could make
judgements about the evidence of progress and provide those judgements as
feedback to the student. The student could self-assess progress to check whether
the goals have been met or not.
The portfolio is more than just a collection of student work. The teacher may assess and assign grades to the process of assembling and reflecting upon the portfolio of a student's work. The students might also have included reflections on growth, on strengths and weaknesses, on goals that were or are to be set, on why certain samples tell a certain story about them or on why the contents reflect sufficient progress to indicate completion of designated standards. Some of the process skills may also be part of the teacher's, school's or district's standards, so the portfolio provides some evidence of attainment of those standards. Any or all of these elements can be evaluated and/or graded.
Portfolio assignments can be assessed or graded with a rubric. A rubric is useful in reducing personal judgement when assessing a complex product such as a portfolio. Clear criteria for assessment, including what must be included in the portfolio, and rubrics are vital to a successful portfolio assessment. A rubric can provide clarity and consistency in assessing and judging the quality of the content and the elements making up that content. Moreover, applying a rubric increases the likelihood of consistency among the teachers who are assessing the portfolios. Table 7.5 is a sample portfolio rubric that may be used for self-assessment and peer feedback.
Table 7.5: Portfolio Rubric
(Each criterion is rated on four levels: Unsatisfactory, Emerging, Proficient and Exemplary. A Rating column is provided for recording the level awarded, and the ratings are combined into a total.)

Selection of Artefacts
- Unsatisfactory: The artefacts and work samples do not relate to the purpose of the portfolio.
- Emerging: Some of the artefacts and work samples are related to the purpose of the portfolio.
- Proficient: Most artefacts and work samples are related to the purpose of the portfolio.
- Exemplary: All artefacts and work samples are clearly and directly related to the purpose of the portfolio. A wide variety of artefacts is included.

Descriptive Text
- Unsatisfactory: No artefacts are accompanied by a caption that clearly explains the importance of the item, including title, author and date.
- Emerging: Some of the artefacts are accompanied by a caption that clearly explains the importance of the item, including title, author and date.
- Proficient: Most of the artefacts are accompanied by a caption that clearly explains the importance of the item, including title, author and date.
- Exemplary: All artefacts are accompanied by a caption that clearly explains the importance of the item, including title, author and date.

Reflection
- Unsatisfactory: The reflections do not explain growth or include goals for continued learning. They do not illustrate the ability to effectively critique work or provide suggestions for constructive practical alternatives.
- Emerging: A few of the reflections explain growth and include goals for continued learning. A few illustrate the ability to effectively critique work and provide suggestions for constructive practical alternatives.
- Proficient: Most of the reflections explain growth and include goals for continued learning. Most illustrate the ability to effectively critique work and provide suggestions for constructive practical alternatives.
- Exemplary: All reflections clearly explain how the artefacts demonstrate students' growth, competencies and accomplishments, and include goals for continued learning. All illustrate the ability to effectively critique work and provide suggestions for constructive practical alternatives.

Citations
- Unsatisfactory: No images, media or text created by others are cited with accurate, properly formatted citations.
- Emerging: Some of the images, media or texts created by others are not cited with accurate, properly formatted citations.
- Proficient: Most images, media or text created by others are cited with accurate, properly formatted citations.
- Exemplary: All images, media or text created by others are cited with accurate, properly formatted citations.

Usability and Layout
- Unsatisfactory: The portfolio is difficult to read due to inappropriate use of fonts, type size for headings, subheadings and text, and font styles (italic, bold, underline). Many formatting tools are under- or over-utilised and decrease the readers' accessibility to the content.
- Emerging: The portfolio is often difficult to read due to inappropriate use of fonts and type size for headings, subheadings, text or long paragraphs. Some formatting tools are under- or over-utilised and decrease the readers' accessibility to the content.
- Proficient: The portfolio is generally easy to read. Fonts and type size vary appropriately for headings, subheadings and text. Use of font styles (italic, bold, underline) is generally consistent.
- Exemplary: The portfolio is easy to read. Fonts and type size vary appropriately for headings, subheadings and text. Use of font styles is consistent and improves readability.

Writing Convention
- Unsatisfactory: There are more than six errors in grammar, capitalisation, punctuation and spelling, requiring major editing and revision.
- Emerging: There are four or more errors in grammar, capitalisation, punctuation and spelling, requiring editing and revision.
- Proficient: There are a few errors in grammar, capitalisation, punctuation and spelling. These require minor editing and revision.
- Exemplary: There are no errors in grammar, capitalisation, punctuation and spelling.

Source: Vandervelde (2018)
SELF-CHECK 7.6
1. Describe the four main steps in developing a portfolio. Which do you think is the most important step?
2. Examine to what extent portfolio assessments are useful as an assessment tool.
3. Justify how and when portfolios should be assessed.
ACTIVITY 7.6
Discuss in the myINSPIRE online forum:
(a) To what extent is portfolio assessment used in Malaysian classrooms?
(b) Do you think portfolio assessment can be used as an assessment technique in your subject area? Justify your answer.
• A project is an activity in which time constraints have been largely removed. It can be undertaken individually or by a group, and usually involves a significant element of work done at home or out of school.

• A research-based project is more theoretical in nature and may consist of putting a question, formulating a problem or setting up some hypotheses.

• A product-based project involves the production of a concrete object, a service, a dance performance, a film, an exhibition, a play, a computer programme and so forth.

• Project work is a learning experience which enables the development of certain knowledge, skills and attitudes that prepare students for lifelong learning and the challenges ahead: knowledge application, collaboration, communication and independent learning.

• An effective project should contain the following elements: situation or problem, project description and purpose, performance specifications, rules, roles of members and assessment.

• The Six A's of a project comprise academic rigour, applied learning, authenticity, active exploration, adult relationships and assessment practices.

• Working in groups has become an accepted part of learning as a consequence of the widely recognised benefits of collaborative group work for student learning.

• Methods of allocating marks in project work include shared group marks, shared-out marks, individual marks, individual marks (examination) and a combination of group average and individual marks.

• A portfolio is a purposeful collection of the works produced by students which reflects their efforts, progress and achievements in different areas of the curriculum.

• Teachers need to know the benefits and weaknesses of portfolios and use them to help in students' learning.

• The portfolio provides for continuous and ongoing assessment (i.e. formative assessment) as well as assessment at the end of a semester or a year (i.e. summative assessment).

• Portfolio assignments can be assessed or graded with a rubric.

• As a formative assessment tool, student portfolios can be used by teachers as informal diagnostic techniques or feedback.
Artefacts
Formative assessment
Group work
Peer evaluation
Portfolio assessment
Portfolios
Process-oriented portfolio
Product-based project
Product-oriented portfolio
Project assessment
Research-based projects
Rubrics
Self-assessment
Six A's of effective projects
Summative assessment
Bonthron, S., & Gordon, R. (1999). Service learning and assessment: A field guide for teachers. Evaluation/Reflection. Paper 45. Retrieved from http://digitalcommons.unomaha.edu/

Bottoms, G., & Webb, L. D. (1998). Connecting the curriculum to "real life." Breaking ranks: Making it happen [Guide – Non-classroom]. Reston, VA: National Association of Secondary School Principals.

Bryson, E. (1994). Will a project approach to learning provide children opportunities to do purposeful reading and writing, as well as provide opportunities for authentic learning in other curriculum areas? Descriptive report. ERIC Document No. ED392513.

Chard, S. C. (1992). The project approach: A practical guide for teachers. Edmonton, Canada: University of Alberta Printing Services.

Edwards, K. M. (2000). Everyone's guide to successful project planning: Tools for youth. Portland, OR: Northwest Regional Educational Laboratory.

Epstein, A. (2006). Introduction to portfolios. Retrieved from http://www.teachervision.com

Harwell, S., & Blank, W. (1997). Connecting high school with the real world. ERIC Document No. ED407586.

Herman, J. L., Aschbacher, P. R., & Winters, L. (1992). A practical guide to alternative assessment. Alexandria, VA: Association for Supervision and Curriculum Development.

Katz, L. G., & Chard, S. C. (1989). Engaging the minds of young children: The project approach. Norwood, NJ: Ablex.

Nitko, A. J. (2001). Educational assessment of students. New Jersey, NJ: Pearson.

Paulson, F. L., Paulson, P. R., & Meyer, C. (1991). What makes a portfolio a portfolio? Educational Leadership, 48(1), 60–63.

Steinberg, A. (1998). Real learning, real work: School-to-work as high school reform. New York, NY: Routledge.

Sutherland, M. (2003). Peer evaluation checklist for the Biotechnology Academy at Andrew P. Hill High School. San Jose, CA: East Side Union High School District.

Vandervelde, J. (2018). Eportfolio (digital portfolio) rubric. Retrieved from http://www2.uwstout.edu
Topic 8  Reliability and Validity of Assessment Techniques
LEARNING OUTCOMES
By the end of the topic, you should be able to:
1. Explain the concept of a true score and reliability coefficient;
2. Apply the different methods of estimating the reliability of a test;
3. Compare the different techniques of establishing the validity of a test;
4. Identify the factors affecting reliability and validity; and
5. Discuss the relationship between reliability and validity.
INTRODUCTION
We have discussed the various methods of assessing student performance using
objective tests, essay tests, authentic assessments, project assessments and
portfolio assessment. In this topic, we will address two important issues, namely,
the reliability and validity of these assessment methods. How do we ensure that
the techniques we use for assessing the knowledge, skills and values of students
are reliable and valid? We are making important decisions about the abilities and
capabilities of the future generation and obviously we want to ensure that we are
making the right decisions.
8.1 WHAT IS RELIABILITY?

Reliability is the consistency of the measurement.
Let us say you gave a Geometry test to a group of Form Five students and one of your students, named Swee Leong, obtained a score of 66 per cent in the test. How sure are you that this is actually the score that Swee Leong should receive? Is that his true score? When you develop a test and administer it to your students, you are attempting to measure, as far as possible, the true score of each student. The true score is a hypothetical concept with regard to the actual ability, competency and capacity of an individual. A test attempts to measure the true score of a person. When measuring human abilities, it is practically impossible to develop an error-free test; there will always be error. However, just because there is error, it does not mean that the test is not good; what is more important is the size of the error.

Formally, an observed test score, X, is conceived as the sum of a true score, T, and an error term, E. The true score is defined as the average of test scores if a test is repeatedly administered to a student (and the student can be made to forget the content of the test in between repeated administrations). Given that the true score is defined as the average of the observed scores, in each administration of a test the observed score departs from the true score, and the difference is called measurement error. This statement can be simplified as follows:

Observed score (X) = True score (T) + Error (E)
This departure is not caused by blatant mistakes made by the test writers; it is caused by chance elements in students' performance on a test. Measurement error mostly comes from the fact that we have only sampled a small portion of a student's capabilities. Ambiguous questions and incorrect marking can contribute to measurement error, but they are only a small part of it. Imagine that there are 10,000 items and a student would obtain 60 per cent if all 10,000 items were administered (which is not practically feasible). Then, 60 per cent is the true score. Now, assume that you sample only 40 items to put in a test. The expected score for the student is 24 items correct. However, the student may get 20, 26, 30 and so on, depending on which items are in the test. This is the main source of measurement error. That is, measurement error is due to the sampling of items, rather than poorly written items.
Generally, the smaller the error, the greater the likelihood that you are closer to
measuring the true score of a student. If you are confident that your Geometry test
(observed score) has a small error, then you can confidently infer that Swee
Leong's score of 66 per cent is close to his true score or his actual ability in solving
geometry problems; i.e. what he actually knows. To reduce the error in a test, you
must ensure that your test is both reliable and valid. The higher the reliability and
validity of your test, the greater the likelihood that you will be measuring the true
score of your students. We will first examine the reliability of a test.
Would your students get the same scores if they took your test on two different
occasions? Would they get approximately the same scores if they took two
different forms of your test? These questions have to do with the consistency of
your classroom tests in measuring students' abilities, skills and attitudes or values.
The generic name for consistency is reliability. Reliability is an essential
characteristic of a good test because if a test does not measure consistently
(reliably), then you cannot count on the scores resulting from the administration
of the test.
8.2 THE RELIABILITY COEFFICIENT

Reliability is quantified as a reliability coefficient. The symbol used to denote a reliability coefficient is r with two identical subscripts (for example, r_xx). The reliability coefficient is generally defined as the variance of the true score divided by the variance of the observed score:

r_xx = σ²(true score) / σ²(observed score)
If there is relatively little error, the ratio of the true score variance to the observed
score variance approaches a reliability coefficient of 1.00 which is perfect
reliability. If there is a relatively large amount of error, the ratio of the true score
variance to the observed score variance approaches 0.00, which is total
unreliability (refer to Figure 8.1):
Figure 8.1: Reliability coefficient
High reliability means that the questions of a test tended to "pull together".
Students who answered a given question correctly were more likely to answer
other questions correctly as well. If an equivalent or parallel test was developed
by using similar items, the relative scores of students would show little change.
Meanwhile, low reliability indicates that the questions tended to be unrelated to
each other in terms of who answered them correctly. The resulting test scores
reflect that something is wrong with either the items or the testing situation rather
than the studentsÊ knowledge of the subject matter. The following guidelines may
be used to interpret reliability coefficients for classroom tests as shown in
Table 8.1.
Table 8.1: Interpretation of Reliability Coefficients

Reliability      Interpretation
0.90 and above   Excellent reliability (comparable to the best standardised tests).
0.80–0.90        Very good for a classroom test.
0.70–0.80        Good for a classroom test; there are probably a few items which could be improved.
0.60–0.70        Somewhat low. There are probably some items which could be removed or improved.
0.50–0.60        The test needs to be revised.
0.50 and below   Questionable reliability; the test should be replaced or needs major revision.
If you know the reliability coefficient of a test, can you estimate the true score of a student on the test? In testing, we use the standard error of measurement to estimate the true score:

Standard error of measurement = Standard deviation × √(1 − r)

Note: r is the reliability of the test.
Using the normal curve, you can estimate a student's true score with some degree of certainty based on the observed score and standard error of measurement.
Example 8.1:
You gave a History test to a group of 40 students. Khairul obtained a score of 75 in the test, which is his observed score. The standard deviation of your test is 2.0. Earlier, you had established that your History test had a reliability coefficient of 0.7. You are interested in finding out Khairul's true score.

Standard error of measurement = Standard deviation × √(1 − r)
                              = 2.0 × √(1 − 0.7) = 2.0 × 0.55 = 1.1

Therefore, based on the normal distribution curve (refer to Figure 8.2), Khairul's true score should be:
(a) Between 75 − 1.1 and 75 + 1.1, or between 73.9 and 76.1, 68 per cent of the time;
(b) Between 75 − 2.2 and 75 + 2.2, or between 72.8 and 77.2, 95 per cent of the time; and
(c) Between 75 − 3.3 and 75 + 3.3, or between 71.7 and 78.3, 99 per cent of the time.
Figure 8.2: Determining Khairul's true score based on a normal distribution
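As a check on the arithmetic, the following Python sketch reproduces Example 8.1 (the standard deviation, reliability and observed score are taken from the example; the code itself is only an illustration):

import math

sd, r = 2.0, 0.7      # standard deviation and reliability from Example 8.1
observed = 75         # Khairul's observed score

sem = sd * math.sqrt(1 - r)   # standard error of measurement, about 1.1

# Confidence bands from the normal curve: 1, 2 and 3 SEMs either side.
for k, pct in [(1, 68), (2, 95), (3, 99)]:
    low, high = observed - k * sem, observed + k * sem
    print(f"{pct}% of the time: between {low:.1f} and {high:.1f}")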
SELF-CHECK 8.1
1. Define the reliability of a test.
2. What does the reliability coefficient indicate?
3. Explain the concept of a true score.
ACTIVITY 8.1
Shalin obtains a score of 70 in a Biology test. The reliability of the test is 0.65 and its standard deviation is 1.5. The teacher was planning to select students who had scored 70 and above to take part in a Biology competition. The teacher was not sure whether he should select Shalin since there could be an error in her score. Should he select Shalin? Why?
Post your answer on the myINSPIRE online forum.
(Use the standard error of measurement.)
8.3 METHODS TO ESTIMATE THE RELIABILITY OF A TEST
Let us now discuss how we estimate the reliability of a test. Figure 8.3 lists three
common methods of estimating the reliability of a test. It is not possible to calculate
reliability exactly and so we have to estimate reliability.
Figure 8.3: Methods for estimating reliability
These three methods are further explained as follows:
(a) Test-retest
Using the test-retest technique, the same test is administered again to the
same group of students. The scores obtained in the first administration of the
test are correlated to the scores obtained on the second administration of the
test. If the correlation between the two scores is high, then the test can be
considered to have high reliability. However, a test-retest situation is
somewhat difficult to conduct as it is unlikely that students will be prepared
to take the same test twice.
There is also the effect of practice and memory that may influence the
correlation. The shorter the time gap, the higher the correlation; the longer
the time gap, the lower the correlation. This is because the two observations
are related over time. Since this correlation is the test-retest estimate of
reliability, you can obtain considerably different estimates depending on the
interval.
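In practice, the test-retest estimate is simply the Pearson correlation between the two sets of scores. Here is a minimal Python sketch with made-up scores for eight students (statistics.correlation requires Python 3.10 or later):

from statistics import correlation  # Pearson's r; Python 3.10+

# Made-up scores for the same eight students on two administrations.
first_sitting = [55, 62, 70, 48, 81, 66, 59, 73]
second_sitting = [58, 60, 72, 50, 79, 68, 61, 70]

r = correlation(first_sitting, second_sitting)
print(f"test-retest reliability estimate = {r:.2f}")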
(b) Parallel or Equivalent Forms
    For this technique, two equivalent tests (or forms) are administered to the same group of students. The two tests are not identical but are equivalent. In other words, they may have different questions but they are measuring the same knowledge, skills or attitudes. Therefore, you have two sets of scores which can be correlated, and reliability can be established. Unlike the test-retest technique, the parallel or equivalent forms reliability measure is not affected by the influence of memory. One major problem with this approach is that you have to be able to generate a lot of items that reflect the same construct. This is often not an easy feat.
(c) Internal Consistency
Internal consistency is determined using only one test administered once to
the students. Internal consistency refers to how the individual items or
questions behave in relation to each other and the overall test. In effect, we
judge the reliability of the instrument by estimating how well the items that
reflect the same construct yield similar results. We are looking at how
consistent the results are for different items for the same construct within the
measure. The following are two common internal consistency measures that
can be used.
(i) Split-half
    To solve the problem of having to administer the same test twice, the split-half technique is used. In this technique, a test is administered once to a group of students. The test is divided into two equal halves after the students have completed the test. This technique is most appropriate for tests which include multiple-choice items, true-false items and perhaps short-answer essays. The items are selected based on the odd-even method, whereby one half of the test consists of odd-numbered items, while the other half consists of even-numbered items. Then, the scores obtained for the two halves are correlated to determine the reliability of the whole test using the Spearman-Brown correlation coefficient:
r_sb = 2r_xy / (1 + r_xy)

In this formula, r_sb is the split-half reliability coefficient and r_xy represents the correlation between the two halves. Say, for example, you have established that the correlation coefficient between the two halves is 0.65. What is the reliability of the whole test?

r_sb = 2r_xy / (1 + r_xy) = 2(0.65) / (1 + 0.65) = 1.3 / 1.65 = 0.79
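A short Python sketch of the whole procedure may help. The two half-test score lists below are made-up data for illustration, and statistics.correlation requires Python 3.10 or later:

from statistics import correlation  # Pearson's r; Python 3.10+

# Made-up totals for each student on the odd- and even-numbered items.
scores_odd = [12, 15, 9, 18, 14, 11, 16, 10]
scores_even = [13, 14, 10, 17, 15, 10, 17, 11]

r_xy = correlation(scores_odd, scores_even)  # correlation of the two halves
r_sb = 2 * r_xy / (1 + r_xy)                 # Spearman-Brown whole-test reliability
print(f"half-test r = {r_xy:.2f}, whole-test r = {r_sb:.2f}")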
(ii) Cronbach's Alpha
    Cronbach's coefficient alpha can be used for both binary-type items (1 = correct, 0 = incorrect or 1 = true, 0 = false) and scale items (1 = strongly agree, 2 = agree, 3 = disagree, 4 = strongly disagree). Reliability is estimated by computing the correlation between the individual questions and the extent to which individual questions correlate with the total test. This is what is meant by internal consistency. The key is "internal", unlike test-retest and parallel or equivalent forms, which require another test as an external reference. The more strongly the items are interrelated, the more likely the test is consistent. The higher the alpha, the more reliable the test. There is no generally agreed cut-off point; usually, 0.7 and above is acceptable (Nunnally, 1978). The formula for Cronbach's alpha is as follows:
Cronbach's alpha (α) = [k / (k − 1)] × [1 − Σ p_i(1 − p_i) / σ²_x]

where,
• k is the number of items in the test;
• p_i refers to item difficulty, which is the proportion of students who answered item i correctly; and
• σ²_x is the sample variance for the total score.
Example 8.2:
Suppose that in a multiple-choice test consisting of five items or questions, the following difficulty index for each item was observed: p_1 = 0.4, p_2 = 0.5, p_3 = 0.6, p_4 = 0.75 and p_5 = 0.85. The sample variance (σ²_x) is 1.84. Cronbach's alpha would be calculated as follows:

α = [5 / (5 − 1)] × [1 − 1.045 / 1.840] = 0.54
Professionally developed standardised tests should have an internal
consistency coefficient of at least 0.85. High reliability coefficients are
required for standardised tests because they are administered only
once and the score on that one test is used to draw conclusions about
each student's ability level on the construct measured. Perhaps the
closest to a standardised test in the Malaysian context would be the
tests for different subjects conducted at the national level in the PMR
and SPM.
According to Wells and Wollack (2003), it is acceptable for classroom
tests to have reliability coefficients of 0.70 and higher because a
student's score on any one test does not determine the student's entire
grade in the subject or course. Usually, grades are based on several
other measures such as project work, oral presentations, practical tests,
class participation and so forth. To what extent is this true in the context
of the Malaysian classroom?
A Word of Caution!
When you get a low alpha, you should be careful not to immediately conclude that the test is a bad test. Instead, you should check whether the test measures several attributes or dimensions rather than one. If it does, Cronbach's alpha is likely to be deflated.

For example, an aptitude test may measure three attributes or dimensions such as quantitative ability, language ability and analytical ability. Hence, it is not surprising that Cronbach's alpha for the whole test may be low, as the questions may not correlate with each other. Why? This is because the items are measuring three different types of human abilities. The solution is to compute three different Cronbach's alphas: one for quantitative ability, one for language ability and one for analytical ability.
SELF-CHECK 8.2
1. What is the main advantage of the split-half technique over the test-retest technique in determining the reliability of a test?
2. Explain the parallel or equivalent forms technique in determining the reliability of a test.
3. Explain the concept of internal consistency reliability.
8.4 INTER-RATER AND INTRA-RATER RELIABILITY
Whenever you use humans as a part of your measurement procedure, you have to
be concerned whether the results you get are reliable or consistent. People are
notorious for their inconsistency. We are easily distracted. We get tired of doing
repetitive tasks. We daydream. We misinterpret. Therefore, how do we determine
whether:
(a) Two observers are being consistent in their observations;
(b) Two examiners are being consistent in their marking of an essay; and
(c) Two examiners are being consistent in their marking of a project?
Let us analyse these problems from the perspectives of:

(a) Inter-rater Reliability
    When two or more people mark essay questions, the extent to which there is agreement in the marks allotted is called inter-rater reliability (refer to Figure 8.4). The greater the agreement, the higher the inter-rater reliability.
Figure 8.4: Examiner A versus Examiner B
    Inter-rater reliability can be low because of the following reasons:
    (i) Examiners are subconsciously being influenced by knowledge of the students whose scripts are being marked;
    (ii) Consistency in marking is affected after marking a set of either very good or very weak scripts;
    (iii) When there is an interruption during the marking of a batch of scripts, different standards may be applied after the break; and
    (iv) The marking scheme is poorly developed, resulting in examiners making their own interpretations of the answers.

    Inter-rater reliability can be enhanced if the criteria for marking or the marking scheme:
    (i) Contain suggested answers related to the question;
    (ii) Have made provision for acceptable alternative answers;
    (iii) Allocate appropriate time for the work required;
    (iv) Are sufficiently broken down to allow the marking to be as objective as possible and the totalling of marks to be correct; and
    (v) Allocate marks according to the degree of difficulty of the question.
(b) Intra-rater Reliability
    While inter-rater reliability involves two or more individuals, intra-rater reliability refers to the consistency of grading by a single rater. Scores on a test are rated by a single rater at different times. When we grade tests at different times, we may become inconsistent in our grading for various reasons. For example, some papers that are graded during the day may get our full attention, while others that are graded towards the end of the day may be very quickly glossed over. Similarly, changes in our mood may affect the grading of papers. In these situations, the lack of consistency can affect intra-rater reliability in the grading of student answers. A simple numerical check of rater agreement is sketched below.
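One rough, easily computed indicator of rater agreement, whether between two examiners (inter-rater) or one examiner at two sittings (intra-rater), is how often the two sets of marks for the same scripts fall within a small tolerance of each other. A minimal Python sketch with made-up marks (a correlation between the two sets of marks would be the more conventional measure):

# Marks awarded by two examiners to the same eight scripts (made-up data).
examiner_a = [14, 18, 11, 16, 9, 15, 17, 12]
examiner_b = [13, 17, 12, 15, 12, 14, 18, 11]

# Count the scripts on which the two examiners differ by at most one mark.
close = sum(abs(a - b) <= 1 for a, b in zip(examiner_a, examiner_b))
print(f"agreement within 1 mark: {close} of {len(examiner_a)} scripts")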
SELF-CHECK 8.3
List the steps that may be taken to enhance inter-rater reliability in the
grading of essay answer scripts.
ACTIVITY 8.2
In the myINSPIRE online forum, suggest other steps you would take to
enhance intra-rater reliability in the grading of projects.
8.5 TYPES OF VALIDITY
Validity is often defined as the extent to which a test measures what it was
designed to measure (Nuttall, 1987). While reliability relates to the consistency of
the test, validity relates to the relevancy of the test. If it does not measure what it
sets out to measure, then its use is misleading and the interpretation based on the
test is not valid or relevant. For example, if a test that is supposed to measure the
„spelling ability of eight-year-old children‰ does not measure „spelling ability‰,
then the test is not a valid test. It would be disastrous if you make claims about
what a student can or cannot do based on a test that is actually measuring
something else. It is for this reason that many educators argue that validity is the
most important aspect of a test.
However, validity will vary from test to test depending on what it is used for. For
example, a test may have high validity in testing the recall of facts in economics
but that same test maybe low in validity with regard to testing the application of
concepts in economics.
Messick (1989) was most concerned about the inferences a teacher draws from the
test score, the interpretation the teacher makes about his or her students and the
consequences from such inferences and interpretation. You can imagine the power
an educator holds in his or her hand when designing a test. Your test could
determine the future of many thousands of students. Inferences based on a test of
low validity could give a completely different picture of the actual abilities and
competencies of students.
Three types of validity have been identified: construct validity, content validity and criterion-related validity, which comprises predictive and concurrent validity (refer to Figure 8.5).
Figure 8.5: Types of validity
These various types of validity are further explained as follows:

(a) Construct Validity
    Construct validity relates to whether the test is an adequate measure of the underlying construct. A construct could be any phenomenon such as mathematics achievement, map skills, reading comprehension, attitude towards school, inductive reasoning, environmental awareness, spelling ability and so forth. You might think of construct validity as the correct "labelling" of something. For example, when you measure what you term as "critical thinking", is that what you are really measuring?
Thus, to ensure high construct validity, you must be clear about the
definition of the construct you intend to measure. For example, a construct
such as reading comprehension would include vocabulary development,
reading for literal meaning and reading for inferential meaning. Some
experts in educational measurement have argued that construct validity is
the most critical type of validity. You could establish the construct validity
of an instrument by correlating it with another test that measures the same
construct. For example, you could compare the scores obtained on your
reading comprehension test with the scores obtained on another well-known
reading comprehension test administered to the same sample of students. If
the scores for the two tests are highly correlated, then you may conclude that
your reading comprehension test has high construct validity.
    A construct is determined by referring to theory. For example, if you are interested in measuring the construct "self-esteem", you need to be clear what self-esteem is. Perhaps you need to refer to various literature in the field describing the attributes of self-esteem. You may find that, theoretically, self-esteem is made up of the following attributes: physical self-esteem, academic self-esteem and social self-esteem. Based on this theoretical perspective, you can build items or questions to measure self-esteem covering these three types of self-esteem. Through such a process, you are more certain to ensure high construct validity.
(b) Content Validity
Content validity is more straightforward and likely to be related to construct
validity. It concerns the coverage of appropriate and necessary content, i.e.
does the test cover the skills necessary for good performance or all the aspects
of the subject taught? It is concerned with sample-population
representativeness, i.e. the facts, concepts and principles covered by the test
items should be representative of the larger domain (e.g. syllabus) of facts,
concepts and principles.
For example, the science unit on "energy and forces" may include facts, concepts, principles and skills on light, sound, heat, magnetism and electricity. However, it is difficult, if not impossible, to administer a 2 to 3 hour paper that tests all aspects of the syllabus on "energy and forces" (refer to Figure 8.6).
Figure 8.6: Sample of content tested for the unit on "energy and forces"
Therefore, only selected facts, concepts, principles and skills from the
syllabus (or domain) are sampled. The content selected will be determined
by content experts who will judge the relevance of the content in the test to
the content in the syllabus or a particular domain.
Content validity will be low if the questions in the test include questions
testing content not included in the domain or syllabus. To ensure content
validity and coverage, most teachers use the table of specifications (as
discussed in Topic 3). Table 8.2 is an example of a table of specifications
which specifies the knowledge and skills to be measured and the topics
covered for the unit on "energy and forces".
Table 8.2: Table of Specifications for the Unit on "Energy and Forces"

Topics        Understanding of Concepts   Application of Concepts   Total
Light         7                           4                         11 (22%)
Sound         7                           4                         11 (22%)
Heat          7                           4                         11 (22%)
Magnetism     3                           3                         6 (12%)
Electricity   8                           3                         11 (22%)
TOTAL         32 (64%)                    18 (36%)                  50 (100%)
Since you cannot measure all the content of a topic, you will have to focus on
the key areas and give due weighting to those areas that are important. For
example, the teacher has decided that 64 per cent of questions will emphasise
the understanding of concepts, while the remaining 36 per cent will focus on
the application of concepts for the five topics. A table of specifications
provides the teachers with evidence that a test has high content validity, that
it covers what should be covered.
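As a quick self-check on such a blueprint, the short Python sketch below recomputes the weightings from the item counts in Table 8.2. The dictionary layout is just one possible representation, not a prescribed format.

    # Item counts per topic: (understanding of concepts, application of
    # concepts), following Table 8.2.
    blueprint = {
        "Light": (7, 4), "Sound": (7, 4), "Heat": (7, 4),
        "Magnetism": (3, 3), "Electricity": (8, 3),
    }
    total = sum(u + a for u, a in blueprint.values())  # 50 items in all
    for topic, (u, a) in blueprint.items():
        print(f"{topic:12s} {u + a:2d} items ({100 * (u + a) / total:.0f}%)")
    print(f"Understanding: {100 * sum(u for u, _ in blueprint.values()) / total:.0f}%")
    print(f"Application:   {100 * sum(a for _, a in blueprint.values()) / total:.0f}%")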
Content validity is different from face validity, which refers not to what the test actually measures, but to what it superficially appears to measure. Face validity assesses whether the test "looks valid" to the examinees who take it, the administrative personnel who decide on its use and other technically untrained observers. Face validity is a weak measure of validity, but that does not mean it is incorrect, only that caution is necessary. Its importance, however, should not be underestimated.
(c) Criterion-related Validity
Criterion-related validity of a test is established by relating the scores
obtained to some other criterion or the scores of other tests. There are two
types of criterion-related validity:
(i) Predictive validity relates to whether the test predicts accurately some future performance or ability. Is STPM a good predictor of performance in a university? One difficulty in calculating the predictive validity of STPM is that only those who pass the exam will go on to university (generally speaking) and we do not know how well students who did not pass might have done. Also, only a small proportion of the population takes the STPM, and this restriction of range makes the observed correlation between STPM grades and performance at the degree level difficult to interpret.
(ii) Concurrent validity is concerned with whether the test correlates with,
or gives substantially the same results as, another test of the same skill.
For example, does your end-of-year language test correlate with the
Malaysian University English Test (MUET)? In other words, if your
language test correlates highly with MUET, then your language test has
high concurrent validity.
8.6 FACTORS AFFECTING RELIABILITY AND VALIDITY
To prepare tests which are acceptably valid and reliable, the following factors
should be taken into account:
(a) Construction of Test Items
The quality of test items has a significant effect on the validity and reliability
of a test. If the test items are poorly constructed, ambiguous and open to
different interpretations, the reliability of the test will be affected because the
test results will not reflect the true abilities of the students being assessed. If
the items do not assess the right content and do not match the intended
learning outcomes, then the test is not measuring what it is supposed to
measure, thus affecting its validity.
(b) Length of the Test
Generally, the longer the test, the more reliable and valid it is. A short test would not adequately cover a year's work. The syllabus needs to be sampled. The test should consist of enough questions that are representative of the knowledge, skills and competencies in the syllabus. However, there is also a problem with tests that are too long. A lengthy test may be valid, but it will take too much time, and fatigue may set in, which can affect students' performance and the reliability of the test.
(c) Selection of Topics
The topics selected and the test questions prepared should reflect the way
the topics were treated during teaching and learning. It is necessary to be
clear about the learning outcomes and to design items that measure these
learning outcomes. Suppose, for example, that in your teaching, students were not given an opportunity to think critically and solve problems, yet your test consists of items requiring them to do so. In such a situation, the reliability and validity of the test will be affected. The test is not reliable because it will not produce consistent results. It is also not
valid because the test does not measure the right intended learning
outcomes. There is no constructive alignment between instruction and
assessment.
(d) Choice of Testing Techniques
The testing techniques selected will also affect reliability and validity. For
example, if you choose to use essay questions, validity may be high but
reliability may be low. Essay questions tend to have high validity because they are capable of assessing both simple and complex learning outcomes, but they tend to be less reliable because of the subjective manner in which students' responses are scored. On the other hand, if objective test items such as MCQs, true-false questions, matching questions and short-answer questions are selected, the reliability of the test can be high because the scoring of students' responses is not influenced by the subjective judgement of the assessors.
The validity, however, can be low because not all intended learning
outcomes can be appropriately assessed by objective test items alone. For
instance, multiple-choice questions are not suitable in assessing learning
outcomes that require students to organise ideas.
(e) Method of Test Administration
Test administration is also an important step in the measurement process.
This includes the arrangement of items in a test, the monitoring of test taking
and the preparation of data files from the test booklets. Poor test
administration procedures can lead to problems in the data collected and
affect the validity of the test results. For instance, if the results of students taking a test are not accurately recorded, the test scores become invalid.
Adequate time must also be allowed for the majority of students to finish the
test. This would reduce wild guessing and instead encourage students to
think carefully about the answer. Instructions need to be clear to reduce the
effects of confusion on reliability and validity. The physical conditions under
which the test is taken must be favourable for the students. There must be
adequate space and lighting, and the temperature must be conducive.
Students must be able to work independently and the possibility of
distractions in the form of movement and noise must be minimised. If such
measures are not taken, studentsÊ performance may be affected because they
are handicapped in demonstrating their true abilities.
(f) Method of Marking
The marking should be as objective as possible. Marking which depends on the exercise of human judgement, such as in essays, projects and portfolios, is subject to the variations of human fallibility (refer to inter-rater reliability discussed earlier). Besides, poorly designed or inappropriate marking
schemes can affect validity. For example, if an essay test is intended to assess
students' ability to discuss an issue but a checklist is used to assess content (knowledge), the validity of the test is questionable. If unqualified or
incompetent examiners are engaged to mark responses to essay questions,
they will not be consistent in their scoring, thus affecting test reliability. It is
quite easy to mark objective items quickly, but it is also surprisingly easy to
make careless errors. This is especially true where large numbers of scripts
are being marked. A system of checks is strongly advised. One method is
through the comments of the students themselves when their marked papers
are returned to them.
8.7 RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY
Some people may think of reliability and validity as two separate concepts. In
reality, reliability and validity are related. Figure 8.7 shows the analogy.
Figure 8.7: Graphical representations of the relationship between reliability and validity
The centre or the bull's-eye is the concept that we are trying to measure. Say, for example, in trying to measure the concept of "inductive reasoning", you are likely to hit the centre (or the bull's-eye) if your inductive reasoning test is both reliable and valid, which is what all test developers aim to achieve (refer to Figure 8.7(d)).
On the other hand, your inductive reasoning test can be "reliable but not valid". How is that possible? Your test may not measure inductive reasoning, but the scores you obtain each time you administer the test are approximately the same (refer to Figure 8.7(b)). In other words, the test is consistently and systematically measuring the wrong construct (i.e. something other than inductive reasoning). Imagine the consequences of making judgements about the inductive reasoning of students using such a test!
However, in the context of psychological testing, if an instrument does not have satisfactory reliability, one typically cannot claim validity. That is, validity requires that instruments are sufficiently reliable. So, the test in Figure 8.7(c) does not have high validity even though the target is hit twice. Because the hits are not concentrated, the test lacks reliability, and its validity is therefore low as well. In other words, you are not getting a valid estimate of the inductive reasoning ability of your students, and the scores are inconsistent.
The worst-case scenario is when the test is neither reliable nor valid (refer to Figure 8.7(a)). In this scenario, the scores obtained by students are scattered across the target and miss the centre. Your measure in this case is neither reliable nor valid, and the test should be rejected or improved.
• The true score is a hypothetical concept with regard to the actual ability, competency and capacity of an individual.

• The higher the reliability and validity of your test, the greater the likelihood that you will be measuring the true scores of your students.

• Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly.

• Validity requires that instruments are sufficiently reliable.

• Face validity is a weak measure of validity.

• Using the test-retest technique, the same test is administered again to the same group of students.

• For the parallel or equivalent forms technique, two equivalent tests (or forms) are administered to the same group of students.

• Internal consistency is determined using only one test administered once to the students.

• When two or more people mark essay questions, the extent to which there is agreement in the marks allotted is called inter-rater reliability.

• While inter-rater reliability involves two or more individuals, intra-rater reliability is the consistency of grading by a single rater.
• Validity is the extent to which a test measures what it claims to measure. It is vital for a test to be valid in order for the results to be accurately applied and interpreted.

• Construct validity relates to whether the test is an adequate measure of the underlying construct.

• Content validity is more straightforward and likely to be related to construct validity; it is related to the coverage of appropriate and necessary content.

• Some people may think of reliability and validity as two separate concepts. In reality, reliability and validity are related.
Construct
Content and face
Criterion-related
Internal consistency
Parallel-form
Predictive
Reliability
Reliability and validity relationship
Reliable and not valid
Test-retest
True score
Valid and reliable
Validity
Topic 9 Item Analysis
LEARNING OUTCOMES

By the end of the topic, you should be able to:
1. Describe what item analysis is and the steps in item analysis;
2. Calculate the difficulty index and discrimination index;
3. Apply item analysis on essay-type questions;
4. Discuss the relationship between the difficulty index and discrimination index of an item;
5. Do distractor analysis; and
6. Explain the role of an item bank in the development of tests.
INTRODUCTION
When you develop a test, it is important to identify the strengths and weaknesses
of each item. To determine how well items in a test perform, some statistical
procedures need to be used.
In this topic, we will discuss item analysis, which involves three procedures, namely item difficulty, item discrimination and distractor analysis, to help the test developer decide whether the items in a test should be accepted, modified or rejected. These procedures are quite straightforward and easy to use,
and the educator needs to understand the logic underlying the analyses in order
to use them properly and effectively.
9.1 WHAT IS ITEM ANALYSIS?
After having administered a test and marked it, most teachers would discuss the
answers with their students. Discussion would usually focus on the right answers
and the common errors made by students. Some teachers may focus on the questions on which most students performed poorly and those on which they did very well.
However, there is much more information available about a test that is often
ignored by teachers. This information will only be available if item analysis is done. What is item analysis?
Item analysis is a process which examines the responses to individual test
items or questions in order to assess the quality of those items and the test as
a whole.
Item analysis is especially valuable in improving items or questions that will be
used again in later tests, but it can also be used to eliminate ambiguous or
misleading items in a single test administration.
Specifically, in classical test theory (CTT), the statistics produced from analysing test scores include the difficulty index and the discrimination index. Analysing the effectiveness of distractors also becomes part
of the process (which we will discuss in detail later in the topic).
The quality of a test is determined by the quality of each item or question in the
test. The teacher who constructs a test can only roughly estimate the quality of a
test. This estimate is based on the fact that the teacher has followed all the rules
and conditions of test construction.
However, it is possible that this estimation may not be accurate and certain
important aspects have been ignored. Hence, it is suggested that to obtain a more
comprehensive understanding of the test, item analysis should be conducted on
the responses of students. Item analysis is done to obtain information about
individual items or questions in a test and how the test can be further improved.
It also facilitates the development of an item or question bank which can be used
in the construction of a test.
9.2 STEPS IN ITEM ANALYSIS
Both CTT and "modern" test theory such as item response theory (IRT) provide useful statistics to help us analyse test data. For many item analyses, CTT is sufficient to provide the information we need, and it is the approach used in this module.

Let us take the example of a teacher who has administered a 30-item multiple-choice objective test in geography to 45 students in a secondary school classroom.
Step 1
Upon receiving the answer sheets, the first step is to mark each of them.
Step 2
Arrange the 45 answer sheets from the highest score obtained to the lowest score
obtained. The paper with the highest score is on top and the paper with the lowest
score is at the bottom.
Step 3
Multiply 45 (the number of answer sheets) by 0.27 (or 27 per cent), which gives 12.15, and round it to 12. The use of the value 0.27 or 27 per cent is not inflexible; any percentage from 27 to 35 per cent may be used. However, the 27 per cent rule can be ignored if the class size is too small. Instead of taking the 27 per cent sample, divide the number of answer sheets by two.
Step 4
Arrange the pile of 45 answer sheets according to the scores obtained (from the highest score to the lowest score). Take out 12 answer sheets from the top of the pile and 12 answer sheets from the bottom of the pile. Call these two piles the "high marks" students and the "low marks" students respectively. Set aside the middle group of papers (21 papers). Although these could be included in the analysis, using only the high and low groups will simplify the procedure (a short code sketch of this splitting follows the analysis of Figure 9.1 below).
Step 5
Refer to Question 1 (refer to Figure 9.1), then:

(a) Count the number of students from the "high marks" group who selected each of the options (A, B, C or D); and
(b) Count the number of students from the "low marks" group who selected option A, B, C or D.
Figure 9.1: Item analysis for one item or question
From the analysis, 11 students from the "high marks" group and two students from the "low marks" group selected "B", which is the correct answer. This means that 13 out of the 24 students selected the correct answer. Also, note that all the distractors (A, C and D) were selected by at least one student. However, the information provided in Figure 9.1 is insufficient and further analysis has to be conducted.
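Here is the short sketch promised in Step 4: a minimal Python rendering of the sorting and splitting in Steps 2 to 4. The 20 scores are hypothetical, and round() handles the 27 per cent rule.

    # Sort the scores from highest to lowest (Step 2), then take the top and
    # bottom 27 per cent as the "high marks" and "low marks" groups (Steps 3-4).
    scores = sorted([18, 25, 12, 27, 22, 15, 29, 20, 24, 10,
                     26, 21, 17, 23, 19, 28, 14, 16, 13, 11], reverse=True)

    n = round(len(scores) * 0.27)   # 27% rule: round(20 * 0.27) = 5
    high_group = scores[:n]         # answer sheets at the top of the pile
    low_group = scores[-n:]         # answer sheets at the bottom of the pile
    print(n, high_group, low_group)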
SELF-CHECK 9.1

1. Define item analysis.
2. Describe the five steps of item analysis.
9.3 DIFFICULTY INDEX
Using the information provided in Figure 9.1, you can compute the difficulty index, which is a quantitative indicator of the difficulty level of an individual item or question. It can be calculated using the following formula:

Difficulty index (p) = Number of students with the correct answer (R) / Total number of students who attempted the question (T)

p = R/T = 13/24 = 0.54
What does a difficulty index (p) of 0.54 mean? The difficulty index is a coefficient
that shows the percentage of students who got the correct answer compared with
the total number of students who attempted the question. In other words,
54 per cent of students selected the right answer. Although our computation is
based on the high and low scoring groups only, it provides a close approximation
of the estimate that would be obtained with the total group. Thus, it is proper to
say that the index of difficulty for this item is 54 per cent (for this particular group).
Note that since "difficulty" refers to the percentage getting the item right, the smaller the percentage figure, the more difficult the item. The meaning of the difficulty index is shown in Figure 9.2.
Figure 9.2: Interpretation of the difficulty index (p)
If a teacher believes that an achievement of 0.54 on the item is too low, he or she can change the way the content is taught to better meet the objective represented by the item. Another interpretation might be that the item was too difficult, confusing or invalid, in which case the teacher can replace or modify the item, perhaps using information from the item's discrimination index or distractor analysis.
Under CTT, the item difficulty measure is simply the proportion correct for an item. For an item with a maximum score of two, there is a slight modification to the computation of the proportion correct.
Suppose an item has possible partial-credit scores of 0, 1 and 2. If the total number of students attempting this item is 100, and 23 students scored 0, 60 students scored 1 and 17 students scored 2, then a simple calculation shows that 23 per cent of the students scored 0, 60 per cent scored 1 and 17 per cent scored 2 for this particular item. The average score for this item is 0 × 0.23 + 1 × 0.60 + 2 × 0.17 = 0.94.

Thus, the observed average score of this item is 0.94 out of a maximum of 2, so the average proportion correct is 0.94/2 = 0.47 or 47 per cent.
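Both versions of the difficulty index reduce to one-line computations. The Python sketch below reproduces the two worked examples above; the function names are ours, chosen for clarity.

    def difficulty_dichotomous(correct, attempted):
        # p = R / T for a right/wrong item
        return correct / attempted

    def difficulty_partial_credit(scores, max_score):
        # p = average score / maximum possible score
        return (sum(scores) / len(scores)) / max_score

    print(f"{difficulty_dichotomous(13, 24):.2f}")   # 0.54, as in Figure 9.1
    # 23 students scored 0, 60 scored 1 and 17 scored 2 (the example above):
    scores = [0] * 23 + [1] * 60 + [2] * 17
    print(f"{difficulty_partial_credit(scores, 2):.2f}")   # 0.47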
ACTIVITY 9.1

A teacher gave a 20-item Science test to a group of 35 students. The correct answer for Question #20 is "C" and the results are as follows:

Options                     A   B   C   D   Blank
High marks group (n = 12)   0   2   8   2   0
Low marks group (n = 12)    2   4   3   2   1

(a) Calculate the difficulty index (p) for Question #20.
(b) Is Question #20 an easy or a difficult question?
(c) Do you think you need to improve Question #20? Why?

Post your answers on the myINSPIRE online forum.
9.4 DISCRIMINATION INDEX
The discrimination index is a basic measure which shows the extent to which a question discriminates or differentiates between students in the "high marks" group and the "low marks" group. This index can be interpreted as an indication of the extent to which overall knowledge of the content area or mastery of the skills is related to the response on an item. Most crucial for a test item is that whether a student answers a question correctly is due to his or her level of knowledge or ability, and not due to something else such as chance or test bias.
In our example in subtopic 9.2, 11 students in the high group and two students in the low group selected the correct answer. This indicates positive discrimination, since the item differentiates between students in the same way that the total test score does. That is, students with high scores on the test (high group) got the item right more frequently than students with low scores on the test (low group). Although analysis by inspection may be all that is necessary for most purposes, an index of discrimination can be easily computed using the following formula:
index of discrimination can be easily computed using the following formula:
Rh  RL
1 T
2
where Rh = Number of students in „high marks‰ group (Rh) with the correct
answer
Discrimination index 
RL = Number of students in „low marks‰ group (RL) with the correct
answer
T
= Total number of students
Example 9.1:
A test was given to a group of 43 students; 10 out of the 13 in the "high marks" group got the correct answer, compared with five out of the 13 in the "low marks" group. The discrimination index is computed as follows:

D = (Rh - RL) / (T/2) = (10 - 5) / (26/2) = 5/13 = 0.38
What does a discrimination index of 0.38 mean? The discrimination index is a coefficient that shows the extent to which the question discriminates or differentiates between "high marks" students and "low marks" students. Blood and Budd (1972) provide guidelines on the meaning of the discrimination index, as shown in Figure 9.3.
Figure 9.3: Interpretation of the discrimination index
Source: Blood and Budd (1972)
A question that has a high discrimination index is able to differentiate between students who know and those who do not know the answer. A question with a low discrimination index is not able to differentiate between students who know and students who do not know. A low discrimination index may mean that many "low marks" students got the correct answer because the question was too simple. It could also indicate that students from both the "high marks" group and the "low marks" group got the answer wrong because the question was too difficult.
The formula for the discrimination index is such that if more students in the "high marks" group chose the correct answer than students in the "low marks" group, the number will be positive. At a minimum, one would hope for a positive value, as that would indicate that it is knowledge of the question that resulted in the correct answer. The greater the positive value (the closer it is to 1.0), the stronger the relationship between overall test performance and performance on that item. If the discrimination index is negative, that means that for some reason, students who scored low on the test were more likely to get the answer correct. This is a strange situation which suggests poor validity for an item.
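For completeness, here is the discrimination formula as a small Python sketch, checked against Example 9.1 above and the item in Figure 9.1; the function name is ours.

    def discrimination(rh, rl, group_total):
        # D = (Rh - RL) / (T / 2), where T is the combined size of the
        # "high marks" and "low marks" groups.
        return (rh - rl) / (group_total / 2)

    print(f"{discrimination(10, 5, 26):.2f}")   # 0.38, as in Example 9.1
    print(f"{discrimination(11, 2, 24):.2f}")   # 0.75 for the item in Figure 9.1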
9.5 APPLICATION OF ITEM ANALYSIS ON ESSAY-TYPE QUESTIONS
The previous subtopics explain the use of item analysis on multiple-choice
questions. Item analysis can also be applied on essay-type questions. This subtopic
will illustrate how this can be done. For ease of understanding, the illustration will
use a short-answer essay question as an example.
Let us assume that a group of 20 students have responded to a short-answer essay
question with scores ranging from the minimum of 0 to the maximum of 4.
Table 9.1 provides the scores obtained by the students.
Table 9.1: Scores Obtained by Students for a Short-answer Essay Question

Item Score      No. of Students Earning Each Score   Total Scores Earned
4               5                                    20
3               6                                    18
2               5                                    10
1               3                                    3
0               1                                    0
Total                                                51
Average Score                                        51/20 = 2.55
The difficulty index (p) of the item can be computed using the following formula:

p = Average score / Possible range of scores

Using the information from Table 9.1, the difficulty index of the short-answer essay question can be easily computed. The average score obtained by the group of students is 2.55, while the possible range of scores for the item is (4 - 0) = 4. Thus,

p = 2.55/4 = 0.64
The difficulty index (p) of 0.64 means that on average, students have received
64 per cent of the maximum possible score of the item. The difficulty index can
be interpreted the same as that of the multiple-choice question discussed in
subtopic 9.3. The item is of a moderate level of difficulty (refer to Figure 9.2).
Note that in computing the difficulty index in the previous example, the scores of
the whole group are used to obtain the average score. However, for a large group
of students, it is possible to estimate the difficulty index for an item based on only
a sample of students comprising the high marks and low marks groups as in the
case of computing the difficulty index of a multiple-choice question.
To compute the discrimination index (D) of an essay-type question, the following formula is suggested by Nitko (2004):

D = Difference between upper and lower groups' average scores / Possible range of scores

Using the information from Table 9.1 but presenting it in the format shown in Table 9.2, we can compute the discrimination index of the short-answer essay question.
Table 9.2: Distribution of Scores Obtained by Students

Score                       0   1   2   3   4   Total   Average Score
High marks group (n = 10)   0   0   1   4   5   34      3.4
Low marks group (n = 10)    1   3   4   2   0   17      1.7

Note: n refers to the number of students.
The average score obtained by the upper group of students is 3.4, while that of the lower group is 1.7. Using the formula suggested by Nitko (2004), we can compute the discrimination index of the short-answer essay question as follows:

D = (3.4 - 1.7) / 4 = 0.43
The discrimination index (D) of 0.43 indicates that the short-answer question does discriminate between the upper and lower groups of students, and at a high level (refer to Figure 9.3). As in the computation of the discrimination index of a multiple-choice question for a large group of students, a sample comprising the top 27 per cent and the bottom 27 per cent may be used to provide a good estimate.
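The same two computations can be sketched in Python as follows. The score lists reproduce Tables 9.1 and 9.2, and the function names are ours.

    def essay_difficulty(scores, score_range):
        # p = average score / possible range of scores
        return (sum(scores) / len(scores)) / score_range

    def essay_discrimination(high, low, score_range):
        # D = (high group average - low group average) / possible range of scores
        return (sum(high) / len(high) - sum(low) / len(low)) / score_range

    all_scores = [4] * 5 + [3] * 6 + [2] * 5 + [1] * 3 + [0] * 1   # Table 9.1
    high = [2] * 1 + [3] * 4 + [4] * 5                             # Table 9.2
    low = [0] * 1 + [1] * 3 + [2] * 4 + [3] * 2

    print(f"{essay_difficulty(all_scores, 4):.2f}")      # 0.64
    print(f"{essay_discrimination(high, low, 4):.2f}")   # 0.43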
The following are two possible reasons for poorly discriminating items:

(a) The item tests something different from the majority of items in the test; or
(b) The item is poorly written and confuses the students.

Thus, when examining a low discriminating item, it is advisable to check whether:

(a) The wording and format of the item are problematic; and
(b) The item may be testing something different from what was intended for the test.
ACTIVITY 9.2

1. The following is the performance of students in the high marks and the low marks groups in a short-answer essay question.

   Score                       0   1   2   3   4
   High marks group (n = 10)   2   2   3   1   2
   Low marks group (n = 10)    3   2   2   3   0

   (a) Calculate the difficulty index.
   (b) Calculate the discrimination index.

   Discuss the findings on the myINSPIRE online forum.

2. A teacher gave a 35-item Economics test to 42 students. For Question 16, 8 out of the 11 students in the high marks group got the correct answer compared with 4 out of 11 from the low marks group.

   (a) Calculate the discrimination index for Question 16.
   (b) Does Question 16 have a high or low discrimination index?

   Post your answers on the myINSPIRE online forum.
9.6 RELATIONSHIP BETWEEN DIFFICULTY INDEX AND DISCRIMINATION INDEX
Theoretically, the more difficult or the easier a question (or item) is, the lower its discrimination index will be. Stanley and Hopkins (1972) provided a theoretical model to explain the relationship between the difficulty index and the discrimination index of a particular question or item (refer to Figure 9.4).
Figure 9.4: Theoretical relationship between difficulty index and discrimination index
Source: Stanley and Hopkins (1972)
According to the model, a difficulty index of 0.2 can result in a discrimination index of about 0.3 for a particular item (which may be described as an item of "moderate discrimination"). Note that as the difficulty index increases from 0.1 to 0.5, the discrimination index increases even more. When the difficulty index reaches 0.5 (described as an item of "moderate difficulty"), the discrimination index can reach +1.00 (very high discrimination). Interestingly, a difficulty index of more than 0.5 leads to a decrease in the discrimination index.

For example, a difficulty index of 0.9 results in a discrimination index of about 0.2, which describes an item of low to moderate discrimination. What does this mean? A difficulty index of 0.9 means that 90 per cent of students answered correctly, so the item is a very easy one. The easier a question, the harder it is for that question or item to discriminate between those students who know and those who do not know the answer to the question.
Similarly, when the difficulty index is about 0.1, the discrimination index drops to about 0.2. What does this mean? A difficulty index of 0.1 means that only 10 per cent of students answered correctly, so the item is a very difficult one. The more difficult a question, the harder it is for that question or item to discriminate between those students who know and those who do not know the answer to the question.
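One way to see why discrimination peaks at a difficulty of 0.5 is to compute an idealised ceiling. With equal-sized high and low groups, the best possible case is that every correct answer falls in the high group, which caps D at 2 x min(p, 1 - p). The Python sketch below implements this simplification only; it is not the Stanley and Hopkins (1972) model shown in Figure 9.4.

    def max_discrimination(p):
        # Idealised ceiling with equal-sized groups: if p <= 0.5, at best all
        # correct answers sit in the high group, so D = 2p; if p > 0.5, the
        # high group is saturated and D = 2(1 - p).
        return 2 * min(p, 1 - p)

    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"p = {p:.1f}  ->  maximum D = {max_discrimination(p):.1f}")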
ACTIVITY 9.3

1. What can you conclude about the relationship between the difficulty index of an item and its discrimination index?
2. Do you take these factors into consideration when giving an objective test to students in your school? Justify.

Share your answers with your coursemates in the myINSPIRE online forum.
9.7 DISTRACTOR ANALYSIS
In addition to examining the performance of an entire test item, teachers are also interested in examining the performance of individual distractors (incorrect answer options) on multiple-choice items. By calculating the proportion of students who chose each answer option, teachers can identify which distractors are "working" and appear attractive to students who do not know the correct answer, and which distractors are simply taking up space and not being chosen by many students. To eliminate blind guessing, which results in a correct answer purely by chance (and hurts the validity of a test item), teachers want as many plausible distractors as is feasible. Analyses of response options allow teachers to fine-tune and improve items they may wish to use again with future classes. Let us examine performance on an item or question (refer to Figure 9.5).
Figure 9.5: Effectiveness of distractors
Generally, a good distractor is able to attract more "low marks" than "high marks" students to select that particular response. What determines the effectiveness of distractors? Figure 9.5 shows how 24 students selected options A, B, C and D for a particular question. Option B is a less effective distractor because many "high marks" students (n = 5) selected it. Option D is a relatively good distractor because two students from the "high marks" group and five students from the "low marks" group selected it. The analysis of response options shows that those who missed the item were about equally likely to choose answer B and answer D. No students chose answer C, meaning it does not act as a distractor. Students were not choosing between four answer options on this item; they were really choosing between only three, as they were not even considering answer C. This makes guessing correctly more likely, which hurts the validity of the item. The discrimination index can be improved by modifying and improving options B and C.
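A distractor analysis can be tabulated in a few lines of Python. The counts below follow the pattern described for Figure 9.5 (option B attracting high scorers, option C unchosen), but the exact numbers and the assumed answer key are hypothetical.

    key = "A"  # hypothetical correct answer for this illustration
    high = {"A": 5, "B": 5, "C": 0, "D": 2}   # "high marks" group, n = 12
    low = {"A": 2, "B": 5, "C": 0, "D": 5}    # "low marks" group, n = 12

    for opt in "ABCD":
        if opt == key:
            note = "correct answer"
        elif high[opt] + low[opt] == 0:
            note = "not functioning as a distractor"
        else:
            # A good distractor draws mainly from the low group.
            note = "good distractor" if low[opt] > high[opt] else "weak distractor"
        print(f"Option {opt}: high {high[opt]}, low {low[opt]} ({note})")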
ACTIVITY 9.4

Which British Resident was killed by Maharajalela in Pasir Salak?

Options               A (Hugh Low)   B (Birch)   C (Brooke)   D (Gurney)   No Response
High marks (n = 15)   4              7           0            4            0
Low marks (n = 15)    6              3           2            4            0

The answer is B.

Analyse the effectiveness of the distractors. Discuss your answer with your coursemates on the myINSPIRE online forum.
9.8 PRACTICAL APPROACH IN ITEM ANALYSIS
Some teachers may find the techniques discussed earlier time consuming, and this cannot be denied, especially for a test consisting of 40 items. Imagine that you have administered a 40-item test to a class of 30 students; analysing the effectiveness of each item will surely take a lot of time, and this may discourage teachers from doing so. However, there is a more practical approach which takes less time. Here is how it works:
Step 1
Arrange the 30 answer sheets from the highest score obtained to the lowest score
obtained.
Step 2
Select the answer sheet that obtained the middle score. Group all answer sheets above this score as the "high marks" group (mark an "H" on these answer sheets). Group all answer sheets below this score as the "low marks" group (mark an "L" on these answer sheets).
Step 3
Divide the class into two groups (high and low) and distribute the "high" answer sheets to the high group and the "low" answer sheets to the low group. Assign one student in each group to be the counter.
Step 4
The teacher then asks the class, "The answer for Question #1 is 'C'. Those who got it correct, raise your hand."

Counter from "H" group: "Fourteen for group H."
Counter from "L" group: "Eight from group L."
Step 5
The teacher records the responses on the whiteboard as follows:

Question      High   Low   Total of Correct Answers
Question #1   14     8     22
Question #2   12     6     18
Question #3   16     7     23
...
Question #n   n      n     n
Step 6
Calculate the difficulty index for Question #1 as follows:

Difficulty index = (RH + RL) / 30 = (14 + 8) / 30 = 22/30 = 0.73
Step 7
Compute the discrimination index for Question #1 as follows:

Discrimination index = (RH - RL) / (30/2) = (14 - 8) / 15 = 6/15 = 0.40
Note that earlier, we took the top 27 per cent and the bottom 27 per cent of the answer sheets as the "high marks" and "low marks" groups. In this approach, however, we divided the total answer sheets into two groups, with no middle group. The important thing is to use a large enough fraction of the group to provide useful information. Selecting the top and bottom 27 per cent of the group is recommended for a more refined analysis. This method may be less accurate, but it is a "quick and dirty" method.
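The whole whiteboard procedure reduces to the two formulas above. A minimal Python sketch, using the counts recorded in Step 5:

    def quick_indices(high_correct, low_correct, class_size):
        # Half-split method: difficulty from all correct answers,
        # discrimination from the difference between the two halves.
        p = (high_correct + low_correct) / class_size
        d = (high_correct - low_correct) / (class_size / 2)
        return p, d

    counts = {"Question #1": (14, 8), "Question #2": (12, 6), "Question #3": (16, 7)}
    for question, (h, l) in counts.items():
        p, d = quick_indices(h, l, 30)
        print(f"{question}: difficulty = {p:.2f}, discrimination = {d:.2f}")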
ACTIVITY 9.5

Compare the difficulty index and discrimination index obtained using this rough method with the theoretical model by Stanley and Hopkins (1972) in Figure 9.4. Are the indices very far apart?

Share your answer with your coursemates in the myINSPIRE online forum.
9.9 USEFULNESS OF ITEM ANALYSIS TO TEACHERS
After each test or assessment, it is advisable to carry out item analysis of the test
items because the information from the analysis would be useful to teachers.
Among the benefits they can get from the analysis are as follows:
(a) From the discussion in the earlier subtopics, it is obvious that the results of item analysis could provide answers to the following questions:
(i) Did the item function as intended?
(ii) Were the items of appropriate difficulty?
(iii) Were the items free from irrelevant clues and other defects?
(iv) Was each of the distractors effective (in multiple-choice questions)?
Answers to these questions can be used to select or revise test items for future use. This would improve the quality of test items and of the test papers to be used in future. It also saves teachers' time in preparing test items because good items can be stored in an item bank.
(b) Item analysis data can provide a basis for efficient class discussion of the test results. Knowing how effectively each test item functions in measuring the achievement of the intended learning outcome and how students perform on each item, teachers can have a more fruitful discussion with students, giving feedback that is more objective and informative. For example, teachers can highlight the misinformation or misunderstanding reflected in the choice of particular distractors on multiple-choice questions, or frequently repeated errors on essay-type questions, thereby enhancing the instructional value of assessment. If, during the discussion, the item analysis reveals technical defects in the items or the marking scheme, students' marks can also be rectified to ensure a fairer test.
(c) Item analysis data can be used for remedial work. The analysis will reveal the specific areas in which students are weak. Teachers can use the information to focus remedial work directly on those particular areas of weakness. For example, suppose the distractor analysis shows that a specific distractor attracts a great number of students from both the high marks and low marks groups, giving the item low discrimination. This could suggest some misunderstanding of a particular concept. Remedial lessons can thus be planned to address the problem.
(d) Item analysis data can reveal weaknesses in teaching and provide useful information to improve teaching. For example, an item that is properly constructed may nevertheless have a low difficulty index, suggesting that most students failed to answer it satisfactorily. This might indicate that the students have not mastered the particular syllabus content being assessed. This could be due to weakness in instruction and thus necessitates the implementation of more effective teaching strategies. Furthermore, if the item proves repeatedly difficult for successive groups of students, there might be a need to revise the curriculum.
(e) Item analysis procedures provide a basis for teachers to improve their skills in test construction. As teachers analyse students' responses to items, they become aware of the defects of the items and what causes them. When revising the items, they gain experience in rewording statements so that they are clear, rewriting distractors so that they are more plausible and modifying items so that they are at a more appropriate level of difficulty. As a consequence, teachers improve their test construction skills.
9.10 CAUTION IN INTERPRETING ITEM ANALYSIS RESULTS
Despite the usefulness of item analysis, the results from such an analysis are
limited in many ways and must be interpreted cautiously. The following are some
of the major precautions to observe:
(a) Item discriminating power does not indicate item validity. A high discrimination index merely indicates that students from the high marks group perform relatively better than students from the low marks group. The division into high and low marks groups is based on the total test score obtained by each student, which is an internal criterion. By using the internal criterion of total test score, item analysis offers evidence concerning the internal consistency of the test rather than its validity. The validity of a test needs to be judged by an external criterion, that is, the extent to which the test assesses the intended learning outcomes.
(b) The discrimination index is not always an indicator of item quality. For example, a low index of discriminating power does not necessarily indicate a defective item. If an item does not discriminate but has been found to be free from ambiguity and other technical defects, the item should be retained, especially in a criterion-referenced test. In such a test, a non-discriminating item may suggest that all students have achieved the criterion set by the teacher. As such, the item does not discriminate between the good and poor
students. Another possible reason why low discrimination occurs for an item is that the item may be very easy or very difficult. Sometimes, however, such an item is necessary or desirable to retain in order to measure a representative sample of learning outcomes and course content. Moreover, an achievement test is usually designed to measure several different types of learning outcomes (knowledge, comprehension, application and so on). In such a case, there will be learning outcomes that are assessed by fewer test items, and these items will have low discrimination because they have less representation in the total test score. Removing these items from the test is not advisable as it will affect the validity of the test.
(c) Traditional item analysis data is tentative. It is not fixed but is influenced by the type and number of students being tested and the instructional procedures employed. The data would thus change with every administration of the same test items. So, if repeated use of items is possible, item analysis should be carried out for each administration of each item. The tentative nature of item analysis should therefore be taken seriously and the results interpreted cautiously.
9.11 ITEM BANK
What is an item bank?
An item bank is a large collection of easily accessible questions or items that
have been administered over a period of time.
For achievement tests which assess performance in a body of knowledge such as
Geography, History, Chemistry or Mathematics, the questions that can be asked
are rather limited. Hence, it is not surprising that previous questions are "recycled" with some minor changes and administered to a different group of students. Making good test items is not a simple task and can be time consuming
for teachers. Hence, an item or question bank would be of great assistance to
teachers.
An item bank consists of questions that have been analysed and stored because
they are good items. Each stored item will have information on its difficulty index
and discrimination index. Each item is stored according to what it measures,
especially in relation to the topics of the curriculum. These items will be stored in
the form of a table of specifications indicating the content being measured as well
as the cognitive levels measured. For example, you will be able to draw from the
item bank items measuring the application of concepts for the topic on "electricity". You will also be able to draw items from the bank with different
difficulty levels. Perhaps, you want to arrange easier questions at the beginning of
the test so as to build confidence in students and then gradually introduce
questions of increasing difficulty.
With computerised databases, item banks are easy to access. Teachers will have at
their disposal hundreds of items from which they can draw upon when
developing classroom tests. This would certainly help them with the tedious and
time-consuming task of having to construct items or questions from scratch.
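As a sketch of what such retrieval might look like, the Python fragment below stores a few hypothetical items with their indices and draws the application-of-concept items on electricity, easiest first. The field names are ours, not those of any particular system.

    # A hypothetical miniature item bank; p = difficulty index,
    # d = discrimination index.
    bank = [
        {"id": 101, "topic": "electricity", "level": "application", "p": 0.45, "d": 0.52},
        {"id": 102, "topic": "electricity", "level": "understanding", "p": 0.80, "d": 0.35},
        {"id": 103, "topic": "light", "level": "application", "p": 0.50, "d": 0.48},
        {"id": 104, "topic": "electricity", "level": "application", "p": 0.70, "d": 0.41},
    ]

    # Draw application items on electricity, arranged from easiest to hardest
    # (a higher p means an easier item).
    drawn = sorted((item for item in bank
                    if item["topic"] == "electricity" and item["level"] == "application"),
                   key=lambda item: item["p"], reverse=True)
    print([item["id"] for item in drawn])   # [104, 101]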
Unfortunately, not many educational institutions are equipped with such an item
bank. The more common practice is for teachers to select items or questions from
commercially prepared workbooks, past examination papers and sample items
from textbooks. These sources do not have information about the difficulty index
and discrimination index of items, nor information about the cognitive levels of
questions or what they aim to measure. Teachers will have to figure out for
themselves the characteristics of the items based on their experience in teaching
the content.
However, there are certain issues to consider in setting up a question bank. One of the major concerns is how to place different test items collected over time on a common scale. The scale should indicate the difficulty of the items, with one scale per subject. Retrieval of items from the bank is made easy when all items are placed on the same scale.
The person in charge must also make every effort to add only quality items to the item pool. Developing and maintaining a good item bank requires a great deal of preparation, planning, expertise and organisation. Though the item response theory (IRT) approach is not a panacea for item banking problems, it can solve many of them (IRT is explained further in the next subtopic).
9.12 PSYCHOMETRIC SOFTWARE
Software designed for general statistical analysis, such as SPSS, can often be used for certain types of psychometric analysis. However, there are also many software packages on the market specially designed to analyse test data.

Classical test theory (CTT) is an approach to psychometric analysis that has weaker assumptions than item response theory and is more applicable to smaller sample sizes. Under CTT, a student's raw test score is the sum of the scores received on the items in the test. For example, Iteman is a commercial software program, while TAP is a free program for classical analysis.
Item response theory (IRT) is a psychometric approach which assumes that the probability of a certain response is a direct function of an underlying trait or traits. Under IRT, the concern is whether the student answered each item correctly, rather than the raw test score. The basic concept of IRT is about the individual items of a test rather than the test scores. Student trait or ability and item characteristics are referenced to the same scale. For example, ConQuest is a computer program for item response and latent regression models, and TAM is an R package for item response models.
ACTIVITY 9.6

In the myINSPIRE forum, discuss:
(a) To what extent do Malaysian schools have item banks?
(b) Do you think teachers should have access to computerised item banks? Justify.
• Item analysis is a process which examines the responses to individual test items or questions in order to assess the quality of those items and the test as a whole.

• Item analysis is conducted to obtain information about individual items or questions in a test and how the test can be improved.
• The difficulty index is a quantitative indicator with regard to the difficulty level of an individual item or question.

• The discrimination index is a basic measure which shows the extent to which a question discriminates or differentiates between students in the "high marks" group and the "low marks" group.

• Theoretically, the more difficult or the easier a question (or item) is, the lower its discrimination index will be.

• By calculating the proportion of students who chose each answer option, teachers can identify which distractors are "working" and appear attractive to students who do not know the correct answer, and which distractors are simply taking up space and not being chosen by any student.

• Generally, a good distractor is able to attract more "low marks" students to select that particular response.

• An item bank is a collection of questions or items that have been administered over a period of time.

• There are many psychometric software programs to help expedite the tedious calculation process.
Computerised data bank
Difficult question
Difficulty index
Discrimination index
Distractor analysis
Easy question
Good distractor
High marks group
Item analysis
Item bank
Low marks group
Blood, D. F., & Budd, W. C. (1972). Educational measurement and evaluation. New York, NY: Harper & Row.
Nitko, A. J. (2004). Educational assessments of students. Englewood Cliffs, NJ:
Prentice Hall.
Stanley, G., & Hopkins, D. (1972). Introduction to educational measurement and
testing. Boston, MA: Macmillan.
Topic 10 Analysis of Test Scores
LEARNING OUTCOMES

By the end of the topic, you should be able to:
1. Differentiate between descriptive and inferential statistics;
2. Calculate various central tendency measures;
3. Explain the use of standard scores;
4. Calculate Z-score and T-score;
5. Describe the characteristics of the normal curve; and
6. Explain the role of norms in standardised tests.
INTRODUCTION
Do you know that all the data you have collected on the performance of students
have to be analysed? In this final topic, we will focus on the analysis and
interpretation of the data you have collected about the knowledge, skills and
attitudes of your students. Information you have collected about your students can
be analysed and interpreted quantitatively and qualitatively. For the quantitative
analysis of data, various statistical tools are used. For example, statistics are used
to show the distribution of scores on a Geography test and the average score
obtained by a group of students.
10.1 WHY USE STATISTICS?
When you give a Geography test to your class of 40 students at the end of the
semester, you get a score for each student, which is a measurement of a sample of the student's ability. The behaviour tested could be the ability to solve problems in Geography, such as reading maps and the globe and interpreting graphs. For
example, student A gets a score of 64 while student B gets 32. Does this mean that
the ability of student A is better than that of student B? Does it mean that the ability
of student A is twice the ability of student B? Are the scores 64 and 32 percentages?
These scores or marks are difficult to interpret because they are raw scores. Raw scores can be confusing if there is no reference made to a "unit". So, it is only logical that you convert the score to a unit such as percentages. In this example, you get 64 per cent and 32 per cent.
Even the use of percentages may not be meaningful. For example, getting 64 per cent in the test may be considered "good" if the test was a difficult one. On the other hand, if the test was easy, then 64 per cent may be considered only "average". In other words, to get a more accurate picture of the scores obtained by students on the test, the teacher should find out:
(a) Which student obtained the highest marks in the class and the number of questions correctly answered;
(b) Which student obtained the lowest marks in the class and the number of questions correctly answered; and
(c) The number of questions correctly answered by all students in the class.
This illustrates that the marks obtained by students in a test should be carefully
examined. It is not enough to just report the marks obtained. More information
should be given about the marks obtained and to do this you have to use statistics.
Some teachers may be afraid of statistics while others may regard it as too time
consuming. In fact, many of us often use statistics without being aware of it. For
example, when we talk about average rainfall, per capita income, interest rates and
percentage increase in our daily lives, we are using the language of statistics. What
is statistics?
Statistics is a mathematical science pertaining to the analysis, interpretation
and presentation of data.
It is applicable to a wide variety of academic disciplines from the physical and
social sciences to the humanities. Statistics have been widely used by researchers
in education and by classroom teachers. In applying statistics in education, one
begins with a population to be studied. This could be all Form Two students in
Malaysia which number about 450,000 or all secondary school teachers in the
country.
For practical reasons, rather than compiling data about an entire population, we usually select or draw a subset of the population called a sample. In other words, the 40 Form Two students that you teach are a sample of the population of Form Two students in the country. The data you collect about the students in your class can be subjected to statistical analysis, which serves two related purposes, namely description and inference.
(a) Descriptive Statistics
You use these statistical techniques to describe how your students
performed. For example, you use descriptive statistics techniques to
summarise data in a useful way either numerically or graphically. The aim is
to present the data collected so that it can be understood by teachers, school
administrators, parents, the community and the Ministry of Education. The
common descriptive techniques used are the mean or average and standard
deviation. Data may also be presented graphically using various kinds of
charts and graphs.
(b) Inferential Statistics
You use inferential statistical techniques when you want to infer about the
population based on your sample. You use inferential statistics when you
want to find out the differences between groups of students, the relationship
between variables or when you want to make predictions about student
performance. For example, you want to find out whether the boys did better
than the girls or whether there is a relationship between performance in
coursework and the final examination. The inferential statistics often used
are the t-test, ANOVA and linear regression.
10.2 DESCRIBING TEST SCORES
Let us assume that you have just given a test on Bahasa Melayu to a class of 35 students in Form One. After marking the scripts, you have a score for each of the students in the class and you want to find out more about how your students performed. Figure 10.1 shows you the distribution of the scores obtained by students in the test.

Figure 10.1: The distribution of Bahasa Melayu marks

The "frequency" column shows how many students obtained each mark and the corresponding percentage is shown in the "percentage" column. You can describe these scores using two types of measures, namely, central tendency and dispersion.
10.2.1 Central Tendency
The term "central tendency" refers to the "middle" value and is measured using the mean, median and mode. It is an indication of the location of the scores. Each of these three measures is calculated differently, and which one to use will depend on the situation and what you want to show.
(a) Mean
The mean is the most commonly used measure of central tendency. When we talk about an "average", we usually refer to the mean. The mean is simply the sum of all the values (marks) divided by the total number of items (students) in the set. The result is referred to as the arithmetic mean. Using the data from Figure 10.1 and applying the formula given, you can calculate the mean.

Mean = Σx / N = (35 + 40 + 41 + ... + 75) / 35 = 1863 / 35 = 53.23
(b) Median
The median is determined by sorting the scores obtained from the lowest to the highest value and taking the score that is in the middle of the sequence. For the example in Figure 10.1, the median is 52. There are 17 students with scores less than 52 and 17 students whose scores are greater than 52. If there is an even number of students, there will not be a single point at the middle. So, you calculate the median by taking the mean of the two middle points, i.e. divide the sum of the two scores by 2.
(c) Mode
The mode is the most frequently occurring score in the data set. Which number appears most often in your data set? In Figure 10.1, the mode is 57 because seven students obtained that score. However, you can also have more than one mode. A distribution with two modes is bimodal.

Distributions of scores may be graphed to demonstrate visually the relationship among the scores in a group. In such graphs, the horizontal axis or x-axis is the continuum on which the individuals are measured. The vertical axis or y-axis is the frequency (or the number) of individuals earning any given score shown on the x-axis. Figure 10.2 shows you a histogram representing the scores for the Bahasa Melayu test obtained by a group of 35 students, as shown earlier in Figure 10.1.
Figure 10.2: Graph showing the distribution of Bahasa Melayu test scores
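If you want to verify these three measures for a mark list of your own, the short Python sketch below uses the standard library's statistics module. The marks shown are an illustrative subset, not the full data of Figure 10.1.

```python
# A minimal sketch: the three measures of central tendency.
# The marks here are illustrative, not the full Figure 10.1 data.
from statistics import mean, median, mode

marks = [35, 40, 41, 52, 57, 57, 57, 60, 64, 75]

print("Mean:  ", round(mean(marks), 2))   # sum of marks / number of marks
print("Median:", median(marks))           # middle value of the sorted marks
print("Mode:  ", mode(marks))             # most frequently occurring mark (57)
```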
SELF-CHECK 10.1

1. What is the difference between descriptive statistics and inferential statistics?

2. What is the difference between mean, median and mode?
10.2.2 Dispersion
Although the mean tells us about the group's average performance, it does not tell us how close to the average or mean the students scored. For example, did every student score 80 per cent on the test, or were the scores spread out from 0 to 100 per cent? Dispersion describes how the scores are spread out, and among the measures used to describe spread are the range and the standard deviation.
(a) Range
The range is the difference between the highest and lowest scores obtained in the test; it is the distance between the extremes of a distribution.
(b) Standard Deviation
Standard deviation refers to how much the scores (obtained by students)
deviate or differ from the mean. Table 10.1 shows the scores obtained by
10 students on a Science test.
Table 10.1: Scores on a Science Test Obtained by 10 Students

Marks (x)    x − x̄            (x − x̄)²
35           35 − 39 = −4     (−4)² = 16
39           39 − 39 = 0      (0)² = 0
45           45 − 39 = 6      (6)² = 36
40           40 − 39 = 1      (1)² = 1
32           32 − 39 = −7     (−7)² = 49
42           42 − 39 = 3      (3)² = 9
37           37 − 39 = −2     (−2)² = 4
44           44 − 39 = 5      (5)² = 25
36           36 − 39 = −3     (−3)² = 9
41           41 − 39 = 2      (2)² = 4

Sum = 390    Mean (x̄) = 39    N = 10    Σ(x − x̄)² = 153
Based on the raw scores, you can calculate the standard deviation of a sample using the formula given.

Standard deviation = √[Σ(x − x̄)² / (N − 1)] = √(153 / 9) = √17 = 4.12
The steps in calculating the standard deviation are as follows:

(i) The first step is to find the mean, which is 390 divided by 10 (the number of students) = 39;

(ii) Next, subtract the mean from each score in the column labelled x − x̄, then square each difference in the column labelled (x − x̄)². Note that all numbers in the squared column are positive. The squared differences are then summed; and

(iii) The standard deviation is the positive square root of 153 divided by 9, and the result is 4.12.
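The same steps can be checked with a few lines of Python. This is a minimal sketch using the ten Science marks from Table 10.1; note that it computes the mean exactly rather than rounding it to 39 as the worked example does, but the standard deviation still comes out at 4.12.

```python
# A minimal sketch of the sample standard deviation steps above,
# using the ten Science marks from Table 10.1.
from statistics import stdev

scores = [35, 39, 45, 40, 32, 42, 37, 44, 36, 41]

n = len(scores)                               # N = 10
mean = sum(scores) / n                        # approximately 39
squared_diffs = [(x - mean) ** 2 for x in scores]
sd = (sum(squared_diffs) / (n - 1)) ** 0.5    # divide by N - 1 for a sample

print(f"SD = {sd:.2f}")                       # SD = 4.12
print(round(stdev(scores), 2))                # the library agrees: 4.12
```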
To better understand what the standard deviation means, refer to
Figure 10.3 which shows the spread of scores with the same mean but
different standard deviations.
Figure 10.3: Distribution of scores with varying standard deviations
Based on Figure 10.3:

(i) For Class A, with a standard deviation of 4.12, approximately 68 per cent of students (those within 1 standard deviation of the mean) scored between 34.88 and 43.12;

(ii) For Class B, with a standard deviation of 2, approximately 68 per cent of students scored between 37 and 41; and

(iii) For Class C, with a standard deviation of 1, approximately 68 per cent of students scored between 38 and 40.
Note that the smaller the standard deviation, the more the scores tend to "bunch" around the mean, and vice versa. Hence, it is not enough to examine the mean alone, because the standard deviation tells us a lot about the spread of the scores around the mean. Which class do you think performed better? The mean does not tell us which class performed better. Class C performed the most consistently because approximately two-thirds of its students scored between 38 and 40.
Skew refers to the symmetry of a distribution. A distribution is skewed if one
of its tails is longer than the other. Figure 10.4 shows you the distribution of
the scores obtained by 38 students on a History test.
Figure 10.4: Distribution of History test scores
There is a negative skew because it has a longer tail in the negative direction. What does it mean? It means that more students were getting high scores on the History test, which may indicate either that the test was too easy or that the teaching methods and materials were successful in bringing about the desired learning outcomes.
Now, let us look at Figure 10.5 which shows the distribution of the scores
obtained by 38 students on a Biology test.
Figure 10.5: Distribution of Biology test scores
There is a positive skew because it has a longer tail in the positive direction. What does it mean? It means that more students were getting low scores in the Biology test, which may indicate that the test was too difficult. Alternatively, it could imply that the questions were not clear or that the teaching methods and materials did not bring about the desired learning outcomes.
SELF-CHECK 10.2
What is the difference between range and standard deviation?
ACTIVITY 10.1

1. What is the difference between a standard deviation of 2 and a standard deviation of 5?

2. A teacher administered an English test to 10 students in her class. The students earned the following marks: 14, 28, 48, 52, 77, 63, 84, 87, 90 and 98. For the distribution of marks, find the following:
   (a) Mean;
   (b) Median;
   (c) Range; and
   (d) Standard deviation.

Post your answers on the myINSPIRE online forum.
10.3 STANDARD SCORES
After having given a test, most teachers report the raw scores obtained by students. For example, Zulinda, a Form Five student, earned the following scores in the end-of-semester examination:

(a) Science: 80;

(b) History: 72; and

(c) English: 40.
With these raw scores alone, what can you say about Zulinda's performance on these tests or her standing in the class? Actually, you cannot say very much. Without knowing how these raw scores compare to the total distribution of raw scores for each subject, it is difficult to draw any meaningful conclusion regarding her relative performance in each of these tests.
How do you make these raw scores meaningful? Let us assume that the scores of
all three tests are approximately normally distributed.
The mean and standard deviation of the three tests are as shown in Table 10.2.
Table 10.2: Mean and Standard Deviation for the Three Tests

Subject     Mean    Standard Deviation
Science     90      10
History     60      12
English     40      15
Based on this additional information, what statements can you make regarding Zulinda's relative performance on each of these three tests? The following are some conclusions you can make:

(a) Zulinda did best on the History test and her raw score of 72 falls at a point one standard deviation above the mean;

(b) Her next best score is English, where her raw score of 40 falls exactly on the mean of the distribution of the scores; and

(c) Finally, even though her raw score for Science was 80, it falls one standard deviation below the mean.
Converting Zulinda's raw scores into Z-scores, we can say that she achieved a:

(i) Z-score of +1 for History;

(ii) Z-score of 0 for English; and

(iii) Z-score of -1 for Science.
10.3.1 Z-score
What is a Z-score? How do you calculate the Z-score? A Z-score is a type of
standard score. The term standard score is the general name for converting a
raw score to another scale using a predetermined mean and a predetermined
standard deviation. Z-scores tell how many standard deviations away from the
mean the score is located. Z-scores can be positive or negative. A positive Z-score
indicates that the value is above the mean, while a negative Z-score indicates that
the value falls below the mean. A Z-score is a raw score that has been transformed
or converted to a scale with a predetermined mean of 0 and a predetermined
standard deviation of 1. For instance, a Z-score of -6 means that the score is 6
standard deviations below the mean.
The formula used for transforming a raw score into a Z-score involves subtracting
the mean from the raw score and then dividing it by the standard deviation.
Z = (x − x̄) / SD

For example, let us use this formula to convert Kumar's mark of 52 obtained in a Geography test. The mean for the test is 70 and the standard deviation is 7.5.

Z = (x − x̄) / SD = (52 − 70) / 7.5 = −18 / 7.5 = −2.4

The Z-score calculated for the raw score of 52 is −2.4, which means that Kumar's score for the Geography test is located 2.4 standard deviations below the mean.
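In code, the conversion is a one-line function. Here is a minimal Python sketch using Kumar's mark as input; the function name z_score is illustrative.

```python
# A minimal sketch: converting a raw score to a Z-score.
def z_score(raw, mean, sd):
    """Number of standard deviations the raw score lies from the mean."""
    return (raw - mean) / sd

# Kumar's Geography mark from the example above:
print(z_score(52, mean=70, sd=7.5))   # -2.4
```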
10.3.2 Example of Using the Z-score to Make Decisions
A teacher administered two Bahasa Melayu tests to students in Form Four A, Form
Four B and Form Four C. The two top students in Form Four C were Seng Huat
and Mei Ling. The teacher was planning to give a prize for the best student in
Bahasa Melayu in Form Four C but was not sure who the better student was.
                      Test 1    Test 2
Seng Huat             30        50
Mei Ling              45        35
Mean                  42        47
Standard deviation    7         8
The teacher could use the mean of each student's two marks to determine who was better. However, both students have the same mean. How does the teacher decide? Z-scores tell the teacher how far each student's marks lie from the mean on each test, and thus who performed better. Using the formula, the teacher calculates the Z-scores shown as follows:
             Test 1                   Test 2                   Total
Seng Huat    (30 − 42)/7 = −1.71      (50 − 47)/8 = 0.375      −1.34
Mei Ling     (45 − 42)/7 = 0.43       (35 − 47)/8 = −1.50      −1.07
Upon examination of the calculation, the teacher finds that both Seng Huat and Mei Ling have negative Z-scores for the total of both tests. However, Mei Ling has a higher total Z-score (−1.07) compared with Seng Huat's total Z-score (−1.34). In other words, Mei Ling's total score was closer to the mean and therefore the teacher concludes that Mei Ling did better than Seng Huat.
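The same comparison can be scripted. This minimal sketch reuses the illustrative z_score function from the earlier example and reproduces the totals in the table above.

```python
# A minimal sketch of the comparison above: total Z-scores across two tests.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

tests = [(42, 7), (47, 8)]             # (mean, SD) for Test 1 and Test 2
students = {"Seng Huat": [30, 50], "Mei Ling": [45, 35]}

for name, marks in students.items():
    total = sum(z_score(m, mu, sd) for m, (mu, sd) in zip(marks, tests))
    print(name, round(total, 2))       # Seng Huat -1.34, Mei Ling -1.07
```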
Z-scores are relatively simple to use but many educators are reluctant to use them, especially when test scores are reported as negative numbers. How would you like to have your mathematics score reported as -4? For this reason, alternative standard score methods such as the T-score are used.
10.3.3 T-score
The T-score was developed by W. McCall in the 1920s and is one of the many
standard scores currently used. T-scores are widely used in the fields of
psychology and education, especially when reporting performance in
standardised tests. The T-score is a standardised score with a mean of 50 and a
standard deviation of 10. The formula for calculating the T-score is:
T = 10(z) + 50
For example, a student has a Z-score of -1.0 and after converting it to a T-score, you
get the following:
T = 10 (z) + 50
= 10 (-1.0) + 50
= (-10) + 50
= 40
When converting Z-scores to T-scores, you should be careful not to drop the
negatives. Dropping the negatives will result in a completely different score.
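A short sketch makes the conversion, and the warning about the negative sign, concrete; the function name t_score is illustrative.

```python
# A minimal sketch: converting a Z-score to a T-score (T = 10z + 50).
def t_score(z):
    return 10 * z + 50

print(t_score(-1.0))   # 40.0 -- the negative sign is kept
print(t_score(1.0))    # 60.0 -- dropping the sign would wrongly give this
```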
ACTIVITY 10.2

1. Convert the following Z-scores to T-scores.

   Z-score    T-score
   +1.0       ______
   -2.4       ______
   +1.8       ______

2. Why would you use T-scores rather than Z-scores when reporting the performance of students in the classroom?

Share your answers with your coursemates on the myINSPIRE online forum.
10.4 THE NORMAL CURVE
The normal curve (also called the "bell curve") is a hypothetical curve that is supposed to represent many naturally occurring phenomena. In a normal distribution, the mean, median and mode have the same value. It is assumed that if we were to sample a particular characteristic, such as the height of Malaysian men, we would find the average height to be 162.5cm or 5 feet 4 inches.
However, there will be a few men who will be relatively shorter and an equal
number who are relatively taller. By plotting the heights of all Malaysian men
according to the frequency of occurrence, you can expect to obtain something
similar to a normal distribution curve.
Besides height, a normal distribution curve can also be seen in IQ scores. Figure 10.6 shows a normal distribution curve for IQ based on the Wechsler intelligence scale for children.
Figure 10.6: The normal distribution curve
In a normal distribution, about two-thirds of individuals will have an IQ of between 85 and 115, with a mean of 100. According to the American Association on Intellectual and Developmental Disabilities, individuals who have an IQ of less than 70 may be classified as having an intellectual disability and those who have an IQ of more than 130 may be considered gifted.
Similarly, test scores that measure a particular characteristic such as language proficiency, quantitative ability or scientific literacy of a specific population can be expected to produce a normal curve. The normal curve is divided according to standard deviations (i.e. -4s, -3s, ..., +3s and +4s), which are shown on the horizontal axis. The area of the curve between standard deviations is indicated as a percentage on the diagram. For example, the area between the mean and standard deviation +1 is 34.13 per cent. Similarly, the area between the mean and standard deviation -1 is also 34.13 per cent. Hence, the area between standard deviation -1 and standard deviation +1 is 68.26 per cent. It means that in a normal distribution, 68.26 per cent of individuals will score between standard deviations -1 and +1.
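These percentages can be reproduced from the standard normal cumulative distribution function. Here is a minimal Python sketch using only the standard library; normal_cdf is an illustrative helper, not a library function.

```python
# A minimal sketch: the proportion of a normal distribution lying
# between -1 and +1 standard deviations, via the standard normal CDF.
from math import erf, sqrt

def normal_cdf(z):
    # Cumulative probability below z for a standard normal distribution
    return 0.5 * (1 + erf(z / sqrt(2)))

within_one_sd = normal_cdf(1) - normal_cdf(-1)
print(f"{within_one_sd:.2%}")   # 68.27% (the figure rounds 2 x 34.13 to 68.26)
```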
In using the normal curve, it is important to make a distinction between standard
deviation values and standard deviation scores. A standard deviation value is a
constant and is shown on the horizontal axis in Figure 10.6.
On the other hand, the standard deviation score is the obtained score when we use
the standard deviation formula (which we discussed earlier). For example, if we
obtained a standard deviation of 5, then the score for one standard deviation is 5
and the score for two standard deviations is 10, the score for three standard
deviations is 15 and so forth. Standard deviation values of -1, -2 and -3 will have
corresponding negative scores of -5, -10 and -15.
Note that in Figure 10.6, Z-scores are indicated from +1 to +4 and -1 to -4, with the mean as 0. Each interval is equal to one standard deviation. Similarly, T-scores are reported from 10 to 90 (intervals of 10) with the mean set at 50. Each interval of 10 is equal to one standard deviation.
10.5 NORMS
In norm-referenced assessment, an individual's performance is evaluated in relation to other people's performances. Norm-referenced tests are seldom used in Malaysia but in the United States, standardised tests are widely used. Perhaps because of the decentralised education system in the United States, school-based assessment is extensively practised there. Unlike Malaysia, there are no national examinations like the PMR and SPM in the United States. Hence, teachers there who want to find out how their students are performing compared with other students in the country rely on norm-referenced tests to compare the performances of their students with the performances of other students in the norm group.
What are norms?
Norms are the characteristics of a population accurately estimated from the
characteristics of a representative subset of the population (called the sample
or norm sample).
Norms are produced based on the norm sample. For example, if you have norms of reading ability for children of different age groups, you will be able to compare the performance of a seven-year-old in your class on the reading ability test with the rest of the population. In other words, you can determine whether your seven-year-old is reading at the level of other seven-year-olds in the country. In establishing these norms, you have to ensure that the norm sample is representative of the population.
Representativeness
When you compare your students with the rest of the population, you want to ensure that the norm sample is representative. In other words, the individuals tested in the norm sample must consist of the appropriate age group, taking into consideration gender differences, geographic location and cultural differences. For example, the eight-year-olds selected for the norm sample should reflect eight-year-olds in the rest of the country according to gender (male and female), geographic location (urban or rural) and cultural differences. Suppose the norm sample consists of 3,000 Malaysian primary school children with 500 students for each age group (seven-year-olds = 500 students, eight-year-olds = 500 and so forth). The norm sample should consist of children from all the states of Malaysia, include all the ethnic groups in the country and be drawn from different socioeconomic backgrounds and geographic locations. Based on this norm sample of 3,000 primary school children, the following hypothetical norms on reading ability in Bahasa Melayu for Malaysian children were produced (refer to Table 10.3).
Table 10.3: Norms for a Reading Ability Test

Reading Ability (Eight-year-olds)

Score    Percentile
50       96
49       90
48       84
47       78
46       70
45       66
44       58
43       50
42       45
Percentile ranks (percentiles) are used in standardised tests, which allow teachers to compare the performance of their students with the norm group. An eight-year-old student who obtained a score of 48 on the test has a percentile rank of 84. This means that the student is reading as well as, or better than, 84 per cent of the other eight-year-old students who took the test. Similarly, an eight-year-old who obtains a percentile rank of 45 is reading as well as, or better than, 45 per cent of eight-year-olds in the norm sample.
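A norms table like Table 10.3 is easy to represent and query in code. The sketch below uses the hypothetical score-to-percentile values from the table.

```python
# A minimal sketch: looking up percentile ranks in a norms table.
# The score-to-percentile pairs are the hypothetical values of Table 10.3.
norms = {50: 96, 49: 90, 48: 84, 47: 78, 46: 70, 45: 66, 44: 58, 43: 50, 42: 45}

score = 48
print(f"A score of {score} has a percentile rank of {norms[score]}.")  # 84
```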
To use norms effectively, you should be sure that the norm sample is appropriate,
both for the purpose of testing and for the person being tested. If you recognise
that the test norms are inadequate, you should be cautious because you may obtain
misleading information about the abilities of your students. The organisation
responsible for developing the norms should clearly state the groups tested
because you want to ensure that the norm sample is similar to your students. In
other words, the norm sample should consist of the same type of people in the
same proportion as is found in the population of reference. The norm sample
should be large enough to be stable over time.
SELF-CHECK 10.3

1. List some characteristics of the normal curve.

2. What are norms? How are norms used?

3. Do you think we should have standardised tests with norms for the measurement of different kinds of abilities? Why?
•  Statistics is a mathematical science pertaining to the analysis, interpretation and presentation of data.

•  Data collected about students can be subjected to statistical analysis, which serves two related purposes: descriptive and inferential.

•  The term "central tendency" refers to the "middle" value and is measured using the mean, median and mode. It is an indication of the location of scores.

•  The mean is simply the sum of all the values (marks) divided by the total number of items (students) in the set.

•  The range is the difference between the highest and lowest scores obtained in the test.

•  Standard deviation refers to how much the scores obtained by students deviate or differ from the mean.

•  Skew refers to the symmetry of a distribution.

•  A negative skew has a longer tail in the negative direction. A positive skew has a longer tail in the positive direction.

•  The standard score refers to a raw score that has been converted from one scale to another scale using the mean and standard deviation.

•  A Z-score tells how many standard deviations away from the mean the score is located.

•  The T-score is a standardised score with a mean of 50 and a standard deviation of 10.

•  The normal curve (also called the "bell curve") is a hypothetical curve that is supposed to represent many naturally occurring phenomena.

•  In norm-referenced assessment, an individual's performance is evaluated in relation to other people's performances.

•  Norms are the characteristics of a population accurately estimated from the characteristics of a representative subset of the population called the sample or norm sample.
Key Terms

Central tendency          Norms
Descriptive statistics    Positive skew
Dispersion                Range
Inferential statistics    Standard deviation
Mean                      Standard score
Median                    T-scores
Negative skew             Z-scores
Normal curve
MODULE FEEDBACK
MAKLUM BALAS MODUL

If you have any comment or feedback, you are welcome to:

1. E-mail your comment or feedback to modulefeedback@oum.edu.my; OR

2. Fill in the Print Module online evaluation form available on myINSPIRE.

Thank you.

Centre for Instructional Design and Technology
(Pusat Reka Bentuk Pengajaran dan Teknologi)
Tel No.: 03-78012140
Fax No.: 03-78875911 / 03-78875966