ASSESSMENT OF LEARNING 1
TEACHING MATERIALS
Compiled by Giovanni A. Alcain, CST, LPT, (MAEd-Eng on-going)
EDUC10
ASSESSMENT OF LEARNING 1
MODULE 1 – BASIC CONCEPTS IN ASSESSMENT OF LEARNING
Assessment – refers to the process of gathering, describing or quantifying information about student
performance. It includes paper-and-pencil tests, extended responses (e.g., essays) and performance
assessments, usually referred to as "authentic assessment" tasks (e.g., presentation of research work).
Measurement – the process of obtaining a numerical description of the degree to which an individual
possesses a particular characteristic. Measurement answers the question "How much?"
Evaluation – refers to the process of examining the performance of students. It also determines whether
or not the student has met the lesson's instructional objectives.
Test – an instrument or systematic procedure designed to measure the quality, ability, skill or
knowledge of students by giving a set of questions in a uniform manner. Since a test is a form of assessment,
tests also answer the question "How does the individual student perform?"
Testing – a method used to measure the level of achievement or performance of the learners. It also
refers to the administration, scoring and interpretation of an instrument (procedure) designed to elicit
information about performance in a sample of a particular area of behavior.
Types of Measurement
There are two ways of interpreting student performance in relation to classroom instruction: norm-referenced
tests and criterion-referenced tests.
Norm-referenced test – a test designed to measure the performance of a student compared with other
students. Each individual is compared with other examinees and assigned a score, usually expressed as a
percentile, a grade-equivalent score or a stanine. Student achievement is reported for broad skill
areas, although some norm-referenced tests do report achievement for individual skills.
The purpose is to rank each student with respect to the achievement of others in broad areas of knowledge
and to discriminate between high and low achievers.
Criterion-referenced test – a test designed to measure the performance of students with respect to some
particular criterion or standard. Each individual is compared with a predetermined set of standards for
acceptable achievement. The performance of the other examinees is irrelevant. A student's score is
usually expressed as a percentage, and student achievement is reported for individual skills.
The purpose is to determine whether each student has achieved specific skills or concepts, and to find out
how much students know before instruction begins and after it has finished.
Other terms less often used for criterion-referenced are objective-referenced, domain-referenced, content-
referenced and universe-referenced.
Robert L. Linn and Norman E. Gronlund (1995) pointed out the common characteristics and
differences of norm-referenced tests and criterion-referenced tests.
Common Characteristics of Norm-Referenced Test and Criterion-Referenced Tests
1. Both require specification of the achievement domain to be measured
2. Both require a relevant and representative sample of test items
3. Both use the same types of test items
4. Both use the same rules for item writing (except for item difficulty)
5. Both are judged by the same qualities of goodness (validity and reliability)
6. Both are useful in educational assessment
Differences between Norm-Referenced Tests and Criterion Referenced Tests
Norm-Referenced Tests:
1. Typically covers a large domain of learning tasks, with just a few items measuring each specific task.
2. Emphasizes discrimination among individuals in terms of relative level of learning.
3. Favors items of average difficulty and typically omits very easy and very hard items.
4. Interpretation requires a clearly defined group.

Criterion-Referenced Tests:
1. Typically focuses on a delimited domain of learning tasks, with a relatively large number of items measuring each specific task.
2. Emphasizes what individuals can and cannot perform.
3. Matches item difficulty to learning tasks, without altering item difficulty or omitting easy or hard items.
4. Interpretation requires a clearly defined and delimited achievement domain.
TYPES OF ASSESSMENT
There are four types of assessment in terms of their functional role in relation to classroom instruction.
These are the placement assessment, diagnostic assessment, formative assessment and summative
assessment.
A. Placement Assessment is concerned with the entry performance of the student. The purpose of
placement evaluation is to determine the prerequisite skills, degree of mastery of the course
objectives and the best mode of learning.
B. Diagnostic Assessment is a type of assessment given before instruction. It aims to identify the
strengths and weaknesses of the students regarding the topics to be discussed. The purpose of
diagnostic assessment:
1. To determine the level of competence of the students;
2. To identify the students who already have knowledge about the lesson;
3. To determine the causes of learning problems and formulate a plan for remedial action.
C. Formative Assessment is a type of assessment used to monitor the learning progress of the
students during or after instruction. Purposes of formative assessment:
1. To provide immediate feedback to both student and teacher regarding the success and
failure of learning;
2. To identify the learning errors that are in need of correction;
3. To provide information to the teacher for modifying instruction and improving
learning and instruction.
D. Summative Assessment is a type of assessment usually given at the end of a course or unit.
Purpose of summative assessment:
1. To determine the extent to which the instructional objectives have been met;
2. To certify student mastery of the intended outcome and used for assigning grades;
3. To provide information for judging appropriateness of the instructional objectives
4. To determine the effectiveness of instruction
MODULE 2 - PRINCIPLES OF HIGH-QUALITY CLASSROOM ASSESSMENT
1. Clarity of learning targets
2. Appropriateness of assessment methods
3. Validity
4. Reliability
5. Fairness
6. Positive consequences
7. Practicality and efficiency
8. Ethics
1. CLARITY OF LEARNING TARGETS
Assessment can be made precise, accurate and dependable only if what is to be achieved is
clearly stated and feasible. The learning targets, involving knowledge, reasoning, skills, products
and effects, need to be stated in behavioral terms, that is, terms which denote something that can be
observed through the behavior of the students.
Cognitive Targets
Benjamin Bloom (1956) proposed a hierarchy of educational objectives at the cognitive level.
These are:
Knowledge – acquisition of facts, concepts and theories
Comprehension – understanding; involves cognition or awareness of interrelationships
Application – transfer of knowledge from one field of study to another, or from one concept to
another concept in the same discipline
Analysis – breaking down of a concept or idea into its components and explaining the concept
as a composition of these components
Synthesis – opposite of analysis; entails putting together the components in order to summarize
the concept
Evaluation and Reasoning – valuing and judgment, or putting the "worth" of a concept or
principle.
Skills, Competencies and Abilities Targets
Skills – specific activities or tasks that a student can proficiently do
Competencies – clusters of skills
Abilities – made up of related competencies, categorized as:
- Cognitive
- Affective
- Psychomotor

Products, Outputs and Project Targets
- Tangible and concrete evidence of a student's ability
- Need to clearly specify the level of workmanship of projects: expert, skilled or novice
2. APPROPRIATENESS OF ASSESSMENT METHODS
Written-Response Instruments
Objective tests – appropriate for assessing the various levels of hierarchy of educational
objectives
Essays – can test the students’ grasp of the higher level cognitive skills
Checklists – lists of several characteristics or activities presented to the subjects of a study, who
analyze them and place a mark opposite the characteristics that apply.
Product Rating Scales
- Used to rate products like book reports, maps, charts, diagrams, notebooks and creative endeavors
- Need to be developed to assess various products over the years

Performance Tests – Performance Checklist
- Consists of a list of behaviors that make up a certain type of performance
- Used to determine whether or not an individual behaves in a certain way when asked to complete a particular task

Oral Questioning – an appropriate assessment method when the objectives are to:
- Assess the students' stock knowledge and/or
- Determine the students' ability to communicate ideas in coherent verbal sentences

Observation and Self Reports
- Useful supplementary methods when used in conjunction with oral questioning and performance tests
3. VALIDITY
- Something valid is something fair.
- A valid test is one that measures what it is supposed to measure.

Types of Validity
- Face: What do students think of the test?
- Construct: Am I testing in the way I taught?
- Content: Am I testing what I taught?
- Criterion-related: How does this compare with an existing valid test?

Tests can be made more valid by making them more subjective (open items).
MORE ON VALIDITY
Validity – appropriateness, correctness, meaningfulness and usefulness of the specific
conclusions that a teacher reaches regarding the teaching-learning situation.
Content validity – content and format of the instrument
- Students' adequate experience
- Coverage of sufficient material
- Reflects the degree of emphasis
Face validity – outward appearance of the test; the lowest form of test validity
Criterion-related validity – the test is judged against a specific criterion
Construct validity – the test is loaded on a "construct" or factor
4. RELIABILITY
- Something reliable is something that works well and that you can trust.
- A reliable test is a consistent measure of what it is supposed to measure.

Questions:
- Can we trust the results of the test?
- Would we get the same results if the tests were taken again and scored by a different person?

Tests can be made more reliable by making them more objective (controlled items).
- Reliability is the extent to which an experiment, test, or any measuring procedure yields the same result on repeated trials.
- Equivalency reliability is the extent to which two items measure identical concepts at an identical level of difficulty. Equivalency reliability is determined by relating two sets of test scores to one another to highlight the degree of relationship or association.
- Stability reliability (sometimes called test-retest reliability) is the agreement of measuring instruments over time. To determine stability, a measure or test is repeated on the same subjects at a future date.
- Internal consistency is the extent to which tests or procedures assess the same characteristic, skill or quality. It is a measure of the precision between the observers or of the measuring instruments used in a study.
- Interrater reliability is the extent to which two or more individuals (coders or raters) agree. Interrater reliability addresses the consistency of the implementation of a rating system.
RELIABILITY – consistency, dependability, stability, which can be estimated by:
- Split-half method, calculated using the Spearman-Brown prophecy formula and the Kuder-Richardson formulas (KR-20 and KR-21)
- Test-retest method – consistency of test results when the same test is administered at two different time periods, correlating the two sets of test results
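To make the split-half idea concrete, here is a minimal Python sketch (not from the module; the odd/even item split, the made-up sample data and the helper names are my own illustration). It correlates students' totals on the two halves of a test and applies the Spearman-Brown prophecy formula to estimate the reliability of the full-length test:

```python
# Split-half reliability with the Spearman-Brown correction (illustrative sketch).
# Each row of `scores` holds one student's item marks: 1 = correct, 0 = wrong.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(scores):
    odd = [sum(row[0::2]) for row in scores]   # totals on odd-numbered items
    even = [sum(row[1::2]) for row in scores]  # totals on even-numbered items
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)           # Spearman-Brown prophecy formula

scores = [  # 5 students x 6 items (made-up data)
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(scores), 2))  # about 0.93
```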
5. FAIRNESS
The concept that assessment should be "fair" covers a number of aspects:
- Students' knowledge of the learning targets and assessments
- Opportunity to learn
- Prerequisite knowledge and skills
- Avoiding teacher stereotypes
- Avoiding bias in assessment tasks and procedures
6. POSITIVE CONSEQUENCES
Learning assessments provide students with effective feedback and potentially improve their
motivation and/or self-esteem. Moreover, assessments of learning give students the tools to
assess themselves and understand how to improve. Assessment should have positive consequences
for students, teachers, parents, and other stakeholders.
7. PRACTICALITY AND EFFICIENCY
- Something practical is something effective in real situations.
- A practical test is one which can be practically administered.

Questions:
- Will the test take longer to design than to apply?
- Will the test be easy to mark?

Tests can be made more practical by making them more objective (more controlled items).

Factors to consider:
- Teacher familiarity with the method
- Time required
- Complexity of administration
- Ease of scoring
- Ease of interpretation
- Cost

Teachers should be familiar with the test; it should not require too much time and should be implementable.
8. ETHICS
- Informed consent
- Anonymity and confidentiality
Ethical considerations apply in:
1. Gathering data
2. Recording data
3. Reporting data

ETHICS IN ASSESSMENT – "RIGHT AND WRONG"
- Conforming to the standards of conduct of a given profession or group
- Ethical issues that may be raised:
1. Possible harm to the participants.
2. Confidentiality.
3. Presence of concealment or deception.
4. Temptation to assist students.
MODULE 3 – DEVELOPMENT OF CLASSROOM TOOLS FOR MEASURING
KNOWLEDGE AND UNDERSTANDING
DIFFERENT TYPES OF TESTS
Main points for comparison, with the corresponding types of test:

1. Purpose
Psychological Test
- Aims to measure students' intelligence or mental ability to a large degree without reference to what the student has learned
- Measures the intangible characteristics of an individual (e.g., Aptitude Tests, Personality Tests, Intelligence Tests)
Educational Test
- Aims to measure the results of instruction and learning (e.g., Performance Tests)

2. Scope of Content
Survey Test
- Covers a broad range of objectives
- Measures general achievement in certain subjects
- Constructed by trained professionals
Mastery Test
- Covers a specific objective
- Measures fundamental skills and abilities
- Typically constructed by the teacher

3. Interpretation
Norm-Referenced Test
- Result is interpreted by comparing one student's performance with other students' performance
- Some will really pass; there is competition for a limited percentage of high scores
- Describes a pupil's performance compared to others
Criterion-Referenced Test
- Result is interpreted by comparing a student's performance against a predefined standard
- All or none may pass; there is no competition for a limited percentage of high scores
- Describes a pupil's mastery of the course objectives

4. Language Mode
Verbal Test
- Words are used by students in attaching meaning to or responding to test items
Non-verbal Test
- Students do not use words in attaching meaning to or in responding to test items (e.g., graphs, numbers, 3-D objects)

5. Construction
Standardized Test
- Constructed by a professional item writer
- Covers a broad range of content covered in a subject area
- Uses mainly multiple-choice items
- Items written are screened, and the best items are chosen for the final instrument
- Can be scored by a machine
- Interpretation of results is usually norm-referenced
Informal Test
- Constructed by a classroom teacher
- Covers a narrow range of content
- Various types of items are used; the teacher picks or writes items as needed for the test
- Scored manually by the teacher
- Interpretation is usually criterion-referenced

6. Manner of Administration
Individual Test
- Mostly given orally or requires actual demonstration of skill
- One-on-one situations, thus providing many opportunities for clinical observation
- Chance to follow up the examinee's response in order to clarify or comprehend it more clearly
Group Test
- A paper-and-pen test
- Gathers information from many students in the same amount of time needed to gather information from one student
- Loss of rapport, insight and knowledge about each examinee

7. Effect of Biases
Objective Test
- The scorer's personal judgement does not affect the scoring
- Worded so that only one answer is acceptable
- Little or no disagreement on what is the correct answer
Subjective Test
- Affected by the scorer's personal opinions, biases and judgement
- Several answers are possible
- Disagreement on what is the correct answer is possible

8. Time Limit and Level of Difficulty
Power Test
- Consists of a series of items arranged in ascending order of difficulty
- Measures students' ability to answer more and more difficult items
Speed Test
- Consists of items approximately equal in difficulty
- Measures students' speed or rate and accuracy in responding

9. Format
Selective Test
- There are choices for the answer (Multiple Choice, True or False, Matching Type)
- Can be answered quickly
- Prone to guessing
- Time consuming to construct
Supply Test
- There are no choices for the answer (Short Answer, Completion, Restricted or Extended Essay)
- May require a longer time to answer
- Less chance of guessing, but prone to bluffing
- Time consuming to answer and score
TYPES OF TESTS ACCORDING TO FORMAT
1. Selective Type – provides choices for the answer
a. Multiple Choice – consists of a stem, which describes the problem, and three
or more alternatives which give the suggested solutions. The incorrect
alternatives are the distracters.
b. True-False or Alternative Response – consists of a declarative statement
that one has to mark true or false, right or wrong, correct or incorrect, yes or no, fact or
opinion, and the like.
c. Matching Type – consists of two parallel columns: Column A, the
column of premises from which a match is sought; Column B, the column of responses
from which the selection is made.
2. Supply Test
a. Short Answer – uses a direct question that can be answered by a word,
phrase, a number, or a symbol
b. Completion Test – consists of an incomplete statement
3. Essay Test
a. Restricted Response – limits the content of the response by
restricting the scope of the topic
b. Extended Response – allows the students to select any factual
information that they think is pertinent, to organize their answers in accordance with their best
judgement
Projective Test
- A psychological test that uses images in order to evoke responses from a subject and reveal hidden aspects of the subject's mental life.
- Projective tests were developed in an attempt to eliminate some of the major problems inherent in the use of self-report measures, such as the tendency of some respondents to give "socially desirable" responses.
Important Projective Techniques
1. Word Association Test. An individual is given a clue or hint and asked to respond
to the first thing that comes to mind.
2. Completion Test. Respondents are asked to complete an incomplete
sentence or story. The completion will reflect their attitude and state of mind.
3. Construction Techniques (Thematic Apperception Test). This is more or less
like a completion test. Respondents may be given a picture and asked to write a story about it. The
initial structure is limited and not detailed, unlike the completion test. For example, two cartoons are
given and a dialogue is to be written.
4. Expression Techniques. Respondents are asked to express the feelings or
attitudes of other people.
GUIDELINES FOR CONSTRUCTING TEST ITEMS
When to use Essay Test
Essays are appropriate when:
1. the group to be tested is SMALL and the test is NOT TO BE USED again;
2. you wish to encourage and reward the development of students' SKILL IN
WRITING;
3. you are more interested in exploring the students' ATTITUDES than in
measuring their academic achievement;
4. you are more confident of your ability as a critical and fair reader than as an
imaginative writer of good objective test items.
When to Use Objective Test Items
Objective test items are especially appropriate when:
1. The group to be tested is LARGE and the test may be REUSED;
2. HIGHLY RELIABLE test scores must be obtained as efficiently as possible;
3. IMPARTIALITY of evaluation, ABSOLUTE FAIRNESS, and FREEDOM from
possible test-SCORING INFLUENCES (e.g., fatigue, lack of anonymity) are essential;
4. you are more confident of your ability to express objective test items clearly
than of your ability to judge essay test answers correctly;
5. there is more PRESSURE FOR SPEEDY REPORTING OF SCORES than for speedy
test preparation.
Multiple Choice Items
 It consists of:
1. Stem – which identifies the question or problem
2. Response alternatives or Options
3. Correct answer
Example:
Which of the following is a chemical change? (STEM)
a. Evaporation of alcohol (ALTERNATIVES)
b. Freezing of water
c. Burning of oil
d. Melting of wax
Advantage of Using Multiple Choice Items
Multiple choice items can provide:
1. Versatility in measuring all levels of cognitive ability
2. Highly reliable test scores
3. Scoring efficiency and accuracy
4. Objective measurement of student achievement or ability
5. A wide sampling of content or objectives
6. A reduced guessing factor when compared to true-false items
7. Different response alternatives which can provide diagnostic feedback.
Limitations of Multiple Choice Items
1. Difficult and time consuming to construct
2. Lead a teacher to favour simple recall of facts
3. Place a high degree of dependence on students' reading ability and the teacher's
writing ability
SUGGESTIONS FOR WRITING MULTIPLE CHOICE ITEMS
1. When possible, state the stem as a direct question rather than as an incomplete
statement.
Poor: Alloys are ordinarily produced by…
Better: How are alloys ordinarily produced?
2. Present a definite, explicit singular question or problem in the stem.
Poor: Psychology…
Better: The science of mind and behaviour is called…
3. Eliminate excessive verbiage or irrelevant information from the stem.
Poor: While ironing her formal polo shirt, June burned her hand accidentally on
the hot iron. This was due to a heat transfer because…
Better: Which of the following ways of heat transfer explains why June’s hand
was burned after she touched a hot iron?
4. Include in the stem any word(s) that might otherwise be repeated in each
alternative.
Poor:
In national elections in the US, the President is officially
a. Chosen by the people
b. Chosen by the electoral college
c. Chosen by members of the Congress
d. Chosen by the House of Representatives
Better:
In national elections in the US, the President is officially chosen by
a. the people
b. the electoral college
c. members of the Congress
d. the House of Representatives
5. Use negatively stated questions sparingly. When used, underline and/or
capitalize the negative word.
Poor: Which of the following is not cited as an accomplishment of the Arroyo
administration?
Better: Which of the following is NOT cited as an accomplishment of the Arroyo
administration?
6. Make all alternatives plausible and attractive to the less knowledgeable or skilful
student.
What process is most nearly the opposite of photosynthesis?
Poor: a. Digestion   b. Relaxation   c. Respiration   d. Exertion
Better: a. Digestion   b. Assimilation   c. Respiration   d. Catabolism
7. Make the alternatives grammatically parallel with each other and consistent with
the stem.
Poor: What would advance the application of atomic discoveries to medicine?
a. Standardized techniques for treatment of patients
b. Train the average doctor to apply radioactive treatments
c. Remove the restrictions on the use of radioactive substances
d. Establishing hospitals staffed by highly trained radioactive-therapy
specialists.
Better: What would advance the application of atomic discoveries to medicine?
a. Development of standardized techniques for treatment of patients
b. Removal of restrictions on the use of radioactive substances
c. Addition of trained radioactive-therapy specialists to hospital staffs
d. Training of the average doctor in the application of radioactive treatments.
8. Make the alternatives mutually exclusive.
Poor: The daily minimum required amount of milk that a 10-year old should
drink is
a. 1-2 glasses
b. 2-3 glasses*
c. 3-4 glasses*
d. At least 4 glasses
Better: What is the daily minimum required amount of milk a 10-year old child
should drink?
a. 1 glass
b. 2 glasses
c. 3 glasses
d. 4 glasses
9. When possible, present alternatives in some logical order (chronological, most to
least, alphabetical).
At 7 a.m. two trucks leave a diner and travel north. One truck averages 42 miles
per hour and the other truck averages 38 miles per hour. At what time will they be 24 miles apart?
Undesirable: a. 6 p.m.   b. 9 a.m.   c. 1 a.m.   d. 1 p.m.   e. 6 a.m.
Desirable: a. 1 a.m.   b. 6 a.m.   c. 9 a.m.   d. 1 p.m.   e. 6 p.m.
10. Be sure there is only one correct or best response to the item.
Poor: The two most desired characteristics in a classroom test are validity and
a. Precision
b .Reliability*
c. Objectivity
d. Consistency*
Best: The two most desired characteristics in a classroom test are validity and
a. Precision
b. Reliability*
c. Objectivity
d. Standardization
11. Make alternatives approximately equal in length.
Poor:
The most general cause of low individual incomes in the US is
a. Lack of valuable productive services to sell*
b. Unwillingness to work
c. Automation
d. Inflation
Better: What is the most general cause of low individual incomes in the US?
a. A lack of valuable productive services to sell*
b. The population’s overall unwillingness to work
c. The nation’s increase reliance on automation
d. An increasing national level of inflation.
12. Avoid irrelevant clues, such as grammatical structure, well-known verbal
associations or connections between stem and answer.
Poor:
(grammatical clue) A chain of islands is called an
a. Archipelago
b. Peninsula
c. Continent
d. Isthmus
Poor:
(verbal association) The reliability of a test can be estimated by a
coefficient of
a. Measurement
b. Correlation*
c. Testing
d. Error
Poor:
(connection between stem and answer) The height to which a water
dam is built depends on
a. The length of the reservoir behind the dam.
b. The volume of the water behind the dam.
c. The height of the water behind the dam.*
d. The strength of the reinforcing wall.
13. Use at least four alternatives for each item to lower the probability of answering
the item correctly by guessing.
14. Randomly distribute the correct responses among the alternative positions
throughout the test, with approximately the same proportion of the alternatives a, b, c, d and
e as the correct response.
15. Use the alternative NONE OF THE ABOVE and ALL OF THE ABOVE sparingly.
When used, such alternatives should occasionally be used as the correct response.
True-False Test Items
True-false test items are typically used to measure the ability to identify whether or not
statements of fact are correct. The basic format is simply a declarative statement that the
student must judge as true or false. Modifications of the basic form require the student
to respond "yes" or "no", "agree" or "disagree."
Three Forms:
1. Simple – consists of only two choices
2. Complex – consists of more than two choices
3. Compound – two choices plus a conditional completion response
Examples:
Simple: The acquisition of morality is a developmental process. (True / False)
Complex: The acquisition of morality is a developmental process. (True / False / Opinion)
Compound: The acquisition of morality is a developmental process. (True / False)
If the statement is false, what makes it false?
Advantages of True-False Items
True-false items can provide:
1. The widest sampling of content or objectives per unit of testing time
2. Scoring efficiency and accuracy
3. Versatility in measuring all levels of cognitive ability
4. Highly reliable test scores; and
5. An objective measurement of student achievement or ability.
Limitations of True-False Items
1. Incorporate an extremely high guessing factor;
2. Can often lead the teacher to write ambiguous statements, due to the difficulty
of writing statements which are unequivocally true or false;
3. Do not discriminate between students of varying ability as well as other item types do;
4. Can often include more irrelevant clues than do other item types;
5. Can often lead a teacher to favour testing of trivial knowledge.
Suggestions for Writing True-False Items (Payne, 1984)
1. Base true-false items upon statements that are absolutely true or false, without
qualifications or exceptions.
Poor: Nearsightedness is hereditary in origin.
Better: Geneticists and eye specialists believe that the predisposition to
nearsightedness is hereditary.
2. Express the item statement as simply and as clearly as possible.
Poor: When you see a highway with a marker that reads "Interstate 80," you
know that the construction and upkeep of that road are provided by the local and
national governments.
Better: The construction and maintenance of the interstate highways are
provided by both the local and national governments.
3. Express a single idea in each test item.
Poor: Water will boil at a higher temperature if the atmospheric pressure on its
surface is increased and more heat is applied to the container.
Better: Water will boil at a higher temperature if the atmospheric pressure on its
surface is increased; or water will boil at a higher temperature if more heat is applied to the
container.
4. Include enough background information and qualifications so that the ability to
respond correctly to the item does not depend on some special, uncommon knowledge.
Poor: The second principle of education is that the individual gathers
knowledge.
Better: According to John Dewey, the second principle of education is that the
individual gathers knowledge.
5. Avoid lifting statements directly from the text lecture or other materials so that
memory alone will not permit a correct answer.
Poor: For every action there is an opposite and equal reaction.
Better: If you were to stand in a canoe and throw a life jacket forward to another
canoe, chances are your canoe will jerk backward.
6. Avoid using negatively stated item statements.
Poor: The Supreme Court is not composed of nine justices.
Better: The Supreme Court is composed of nine justices
7. Avoid the use of unfamiliar vocabulary.
Poor: According to some politicians, the raison d’etre for capital punishment is
retribution.
Better: According to some politicians, justification for the existence of capital
punishment is retribution.
8. Avoid the use of specific determiners which should permit a test wise but
unprepared examinee to respond correctly. Specific determiners refer to sweeping terms like
always, all, none, never, impossible, inevitable. Statements including such terms are likely to be
false. On the other hand, statements using qualifying determiners such as usually, sometimes,
often, are likely to be true. When statements require specific determiners, make sure they
appear in both true and false items.
Poor: All sessions of Congress are called by the President (F)
The Supreme Court is frequently required to rule on the constitutionality
of the law. (T)
The objective test is generally easier to score than an essay test. (T)
Better: When specific determiners are used, reverse the expected outcomes.
The sum of angles of a triangle is always 180 degrees. (T)
Each molecule of a given compound is chemically the same as every
other molecule of that compound. (T)
The galvanometer is the instrument usually used for the metering of
electrical energy use in a home. (F)
9. False items tend to discriminate more highly than true items. Therefore, use
more false items than true items (but not more than 15% additional false items).
Matching Test Items
In general, matching items consist of a column of stimuli presented on the left side
of the exam page and a column of responses placed on the right side of the page. Students are
required to match the response associated with a given stimulus.
Advantages of Using Matching Test Items
1. Require a short period of reading and response time, allowing the teacher to cover
more content.
2. Provide objective measurement of student achievement or ability.
3. Provide highly reliable test scores.
4. Provide scoring efficiency and accuracy.
Disadvantages of Using Matching Test Items
1. Have difficulty measuring learning objectives requiring more than simple recall
of information.
2. Are difficult to construct due to the problem of selecting a common set of stimuli
and responses.
Suggestions for Writing Matching Test items
1. Include directions which clearly state the basis for matching the stimuli with the
responses. Explain whether or not the response can be used more than once and indicate
where to write the answer.
Poor: Directions: Match the following.
Better: Directions: On the line to the left of each identifying location and
characteristic in Column I, write the letter of the country in Column II that it best defines.
Each country in Column II may be used more than once.
2. Use only homogeneous material in matching items.
Poor: Directions: Match the following.
1. _______ Water                                     A. NaCl
2. _______ Discovered Radium                         B. Fermi
3. _______ Salt                                      C. NH3
4. _______ Year of the First Nuclear Fission by Man  D. 1942
5. _______ Ammonia                                   E. Curie

Better: Directions: On the line to the left of each compound in Column I, write
the letter of the compound's formula presented in Column II. Use each formula only once.
Column I                    Column II
1. _______ Water            A. H2SO4
2. _______ Salt             B. HCl
3. _______ Ammonia          C. NaCl
4. _______ Sulfuric Acid    D. H2O
                            E. H2HCl
3. Arrange the list of responses in some systematic order if possible (chronological,
alphabetical).
Directions: On the line to the left of each definition in Column I, write the letter of the
defense mechanism in Column II that is described. Use each defense mechanism only once.

Column I
_____ 1. Hunting for reasons to support one's belief
_____ 2. Accepting the values and norms of others as one's own, even if they are contrary to previously held values
_____ 3. Attributing one's own unacceptable impulses, thoughts and desires to others
_____ 4. Ignoring disagreeable situations, thoughts and desires

Column II
Undesirable (unordered)    Desirable (alphabetical)
A. Rationalization         A. Denial of Reality
B. Identification          B. Identification
C. Projection              C. Introjection
D. Introjection            D. Projection
E. Denial of Reality       E. Rationalization
4. Avoid grammatical or other clues to the correct response.
Poor:
Directions: Match the following in order to complete the sentences on the left.
___ 1. Igneous rocks are formed           A. a hardness of 7
___ 2. The formation of coal requires     B. with crystalline rock
___ 3. A geode is filled                  C. a metamorphic rock
___ 4. Feldspar is classified as          D. through the solidification of molten material
Better: Avoid sentence completion due to grammatical clues.
Note:
1. Keep matching items brief, limiting the list of stimuli to under 10.
2. Include more responses than stimuli to help prevent answering through the
process of elimination.
3. When possible, reduce the amount of reading time by including only short
phrases or single words in the response list.
Completion Test Items
Completion items require the student to answer a question or to finish an
incomplete statement by filling in a blank with the correct word or phrase.
Example:
According to Freud, personality is made up of three major systems, the______,
the________, and the__________.
Advantages of Using Completion Items
Completion items can:
1. Provide a wide sampling of content;
2. Efficiently measure lower levels of cognitive ability;
3. Minimize guessing as compared to multiple choice or true-false items; and
4. Usually provide an objective measure of student achievement or ability.
Limitations of Using Completion Items
Completion items:
1. Are difficult to construct so that the desired response is clearly indicated;
2. Have difficulty in measuring learning objectives requiring more than simple recall
of information;
3. Can often include more irrelevant clues than do other item types;
4. Are more time consuming to score when compared to multiple choice or true-false
items; and
5. Are more difficult to score, since more than one answer may have to be
considered correct if the item was not properly prepared.
Suggestions for Writing Completion Test Items
1. Omit only significant words from the statement.
Poor: Every atom has a (central) core called a nucleus.
Better: Every atom has a central core called a(n) (nucleus).
2. Do not omit so many words from the statement that the intended meaning is
lost.
Poor: The ___ were to Egypt as the ___ were to Persia as the ___ were to the
early tribes of Israel.
Better: The Pharaohs were to Egypt as the ___ were to Persia as the ___ were to the
early tribes of Israel.
3. Avoid grammatical or other clues to the correct response.
Poor: Most of the United States' libraries are organized according to the
(Dewey) decimal system.
Better: Which organizational system is used by most of the United States'
libraries? (Dewey Decimal)
4. Be sure there is only one correct response.
Poor: Trees which shed their leaves annually are (seed-bearing, common).
Better: Trees which shed their leaves annually are called (deciduous).
5. Make the blanks of equal length.
Poor: In Greek mythology, Vulcan was the son of (Jupiter and Juno).
Better: In Greek mythology, Vulcan was the son of___ and___.
6. When possible, delete words at the end of the statement after the student has
been presented a clearly defined problem.
Poor: (122.5) is the molecular weight of KClO3.
Better: The molecular weight of KClO3 is ___.
7. Avoid lifting statements directly from the text, lecture or other sources.
8. Limit the required response to a single word or phrase.
Essay Test Items
A classroom essay test consists of a small number of questions to which the
student is expected to demonstrate his/her ability to:
a. Recall factual knowledge;
b. Organize this knowledge; and
c. Present the knowledge in a logical, integrated answer to the question.
Classification of Essay Tests:
1. Extended-response essay item
2. Limited-response or short-answer essay item
Example of Extended-Response Essay Item:
Explain the difference between the S-R (Stimulus-Response) and the S-O-R (Stimulus-Organism-Response) theories of personality. Include in your answer the following:
a. Brief description of both theories
b. Supporters of both theories
c. Research methods used to study each of the two theories (20 pts)
Example of Short-Answer Essay Item:
Identify research methods used to study the (Stimulus-Response) and the S-O-R
(Stimulus-Organism-Response) theories of personality. (10pts)
Advantages of Using Essay Items
Essay items:
1. Are easier and less time consuming to construct than most item types;
2. Provide a means for testing students' ability to compose an answer and present it
in a logical manner; and
3. Can efficiently measure higher-order cognitive objectives (analysis, synthesis,
evaluation).
Limitations of Using Essay Items
Essay Items:
1. Cannot measure a large amount of content or objectives;
2. Generally provide low scorer reliability;
3. Require an extensive amount of instructor time to read and grade; and
4. Generally do not provide an objective measure of student achievement or ability
(subject to bias on the part of the grader).
Suggestions for Writing the Essay Test Items
1. Prepare essay items that elicit the type of behaviour you want to measure.
Learning Objective: The student will be able to explain how the normal curve
serves as a statistical model.
Poor: Describe a normal curve in terms of symmetry, modality, kurtosis and
skewness.
Better: Briefly explain how the normal curve serves as a statistical model for
estimation and hypothesis testing.
2. Phrase each item so that the student's task is clearly indicated.
Poor: Discuss the economic factors which led to stock market crash of 2008.
Better: Identify the three economic conditions which led to the stock market
crash of 2008. Discuss briefly each condition in correct chronological sequence and in one
paragraph indicate how the three factors were interrelated.
3. Indicate for each item a point value or weight and an estimated time limit for
answering.
Poor: Compare the writings of Bret Harte and Mark Twain in terms of setting,
depth of characterization, and dialogue styles of their main characters.
Better: Compare the writings of Bret Harte and Mark Twain in terms of setting, depth
of characterization, and dialogue styles of their main characters. (10 points, 20 minutes)
4. Ask questions that will elicit responses on which experts could agree that one
answer is better than another.
5. Avoid giving a student a choice among optional items as this greatly reduces the
reliability of the test.
6. It is generally recommended for classroom examinations to administer several
short-answer items rather than only one or two extended-response items.
Guidelines for Grading Essay Items
1. When writing each essay item, simultaneously develop a scoring rubric.
2. To maintain a consistent scoring system and ensure the same criteria are applied
to all assessments, score one essay item across all tests prior to scoring the next item.
3. To reduce the influence of the halo effect, bias and other subconscious factors,
all essay questions should be graded blind to the identity of the student.
4. Due to the subjective nature of grading essays, the score on one essay may be
influenced by the quality of previous essays. To prevent this type of bias, reshuffle the order of
the papers after reading through each item.
Principle 3: Balanced
- A balanced assessment sets targets in all domains of learning (cognitive,
affective, and psychomotor) or domains of intelligence (verbal-linguistic, logical-
mathematical, bodily-kinesthetic, visual-spatial, musical-rhythmic, interpersonal-social,
intrapersonal-introspection, physical world-natural, existential-spiritual).
- A balanced assessment makes use of both traditional and alternative assessment.
Principle 4: Validity
Validity – the degree to which the assessment instrument measures what it intends
to measure.
- It also refers to the usefulness of the instrument for a given purpose.
- It is the most important criterion of a good assessment instrument.
Ways of Establishing Validity
1. Face Validity- is done by examining the physical appearance of the
instrument.
2. Content Validity- is done through a careful and critical examination of the
objectives of assessment so that it reflects the curricular objectives.
3. Criterion-related Validity – is established statistically, such that a set of scores
revealed by the measuring instrument IS CORRELATED with the scores obtained in another
EXTERNAL PREDICTOR OR MEASURE.
It has two purposes:
a. Concurrent Validity – describes the present status of the individual by correlating the sets of
scores obtained FROM TWO MEASURES GIVEN CONCURRENTLY.
Example: Relate the reading test result with pupils’ average grades in reading given by the
teacher.
b. Predictive Validity- describes the future performance of an individual by
correlating the sets of scores obtained from TWO MEASURES GIVEN AT A LONGER TIME
INTERVAL.
Example: The entrance examination scores of a freshman class at the
beginning of the school year are correlated with their average grades at the end of the school year.
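As a rough illustration of how criterion-related validity is established statistically, the Python sketch below (not part of the module; the data pairs and function name are invented for illustration) computes the Pearson correlation between scores on an instrument and scores on an external criterion measure:

```python
# Pearson correlation between instrument scores and an external criterion (sketch).

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

reading_test = [82, 75, 90, 68, 88, 79]    # scores on the new instrument
teacher_grades = [85, 78, 92, 70, 86, 80]  # external criterion (average grades)
print(round(pearson_r(reading_test, teacher_grades), 2))  # 0.98 -> high correlation
```

A high correlation between measures taken at the same time supports concurrent validity; the same computation applied to scores gathered after a longer time interval supports predictive validity.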
4. Construct Validity- Validity established by analysing the activities and processes
that correspond to a particular concept; is established statistically by comparing psychological
traits or factors that theoretically influence scores in a test.
a. Convergent validity helps to establish construct validity when you use two different
measurement procedures and research methods (e.g., participant observation and a survey) in
your study to collect data about a construct (e.g., anger, depression, motivation, task
performance).
b. Divergent validity helps to establish construct validity by demonstrating that the construct
you are interested in (e.g., anger) is different from other constructs that might be present in
your study (e.g., depression).
Factors Influencing the Validity of an Assessment Instrument
1. Unclear directions – directions that do not clearly indicate to the students how to
respond to the task and how to record the responses tend to reduce validity.
2. Reading vocabulary and sentence structure too difficult – vocabulary and
sentence structure that are too complicated for the students turn the assessment into a measure of
reading comprehension, thus altering the meaning of the assessment results.
3. Ambiguity – ambiguous statements in assessment tasks contribute to
misinterpretation and confusion. Ambiguity sometimes confuses the better students more
than it does the poor students.
4. Inadequate time limits- time limits that do not provide students with enough
time to consider the tasks and provide thoughtful responses can reduce the validity of
interpretations of results.
5. Overemphasis of easy-to-assess aspects of the domain at the expense of
important, but hard-to-assess aspects (construct underrepresentation) – it is easy to develop
test questions that assess factual recall and generally harder to develop ones that tap conceptual
understanding or higher-order thinking processes such as the evaluation of competing
positions or arguments. Hence it is important to guard against underrepresentation of tasks
that get at the important, but more difficult to assess, aspects of achievement.
6. Test items inappropriate for the outcomes being measured- attempting to
measure understanding, thinking, skills and other complex types of achievement with test
forms that are appropriate for only measuring factual knowledge will invalidate the results.
7. Poorly constructed test items- test items that unintentionally provide clues to
the answer tend to measure the students’ alertness in detecting clues as well as mastery of
skills or knowledge the test is intended to measure
8. Test too short – if a test is too short to provide a representative sample of the
performance we are interested in, its validity will suffer accordingly.
9. Improper arrangement of items – test items are typically arranged in order of
difficulty, with the easiest items first. Placing difficult items first in the test may cause students
to spend too much time on these and prevent them from reaching items they could easily
answer. Improper arrangement may also influence validity by having a detrimental effect on
student motivation.
10. Identifiable pattern of answers – placing correct answers in some systematic
pattern (e.g., T,T,F,F or B,B,B,C,C,C,D,D,D) enables students to guess the answers to some items
more easily, and this lowers validity.
TABLE OF SPECIFICATIONS – TOS
A table of specifications is a device for describing test items in terms of the content and process
dimensions, that is, what a student is expected to know and what he or she is expected to do with that
knowledge. Each item is described by a combination of content and process in the table of specifications.
Sample one-way table of specifications in Linear Function

Content                              | Class Sessions | Number of Items | Test Item Distribution
1. Definition of linear function     | 2              | 4               | 1-4
2. Slope of a line                   | 2              | 4               | 5-8
3. Graph of linear function          | 2              | 4               | 9-12
4. Equation of linear function       | 2              | 4               | 13-16
5. Standard forms of a line          | 3              | 6               | 17-22
6. Parallel and perpendicular lines  | 4              | 8               | 23-30
7. Application of linear functions   | 5              | 10              | 31-40
TOTAL                                | 20             | 40              | 40
Number of items = (Number of class sessions × Desired total number of items) ÷ Total number of class sessions

Example: Number of items for the topic "Definition of linear function"
Number of class sessions = 2
Desired total number of items = 40
Total number of class sessions = 20
Number of items = (2 × 40) ÷ 20 = 4
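A minimal Python sketch of this allocation rule (the topic names and session counts come from the sample table above; the function and variable names are my own):

```python
# Allocate test items to topics in proportion to class sessions (one-way TOS).

def allocate_items(sessions_per_topic, total_items):
    total_sessions = sum(sessions_per_topic.values())
    return {topic: round(sessions * total_items / total_sessions)
            for topic, sessions in sessions_per_topic.items()}

sessions = {
    "Definition of linear function": 2,
    "Slope of a line": 2,
    "Graph of linear function": 2,
    "Equation of linear function": 2,
    "Standard forms of a line": 3,
    "Parallel and perpendicular lines": 4,
    "Application of linear functions": 5,
}
print(allocate_items(sessions, 40))
# e.g. "Definition of linear function": 4 ... "Application of linear functions": 10
```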
Sample two-way table of specifications in Linear Function

Content                              | Class Hours | Know | Comp | App | Analysis | Synthesis | Evaluation | Total
1. Definition of linear function     | 2           | 1    | 1    | 1   |          |           | 1          | 4
2. Slope of a line                   | 2           | 1    | 1    | 1   | 1        |           |            | 4
3. Graph of linear function          | 2           | 1    | 1    | 1   |          | 1         |            | 4
4. Equation of linear function       | 2           | 1    | 1    | 1   |          |           | 1          | 4
5. Standard forms of a line          | 3           |      | 1    | 1   | 1        | 1         | 2          | 6
6. Parallel and perpendicular lines  | 4           |      | 1    | 2   | 2        | 2         | 1          | 8
7. Application of linear functions   | 5           |      |      | 1   | 4        | 3         | 2          | 10
TOTAL                                | 20          | 4    | 6    | 8   | 8        | 7         | 7          | 40
MODULE 4: DESCRIPTION OF ASSESSMENT DATA
TEST APPRAISAL
ITEM ANALYSIS
Item analysis refers to the process of examining the students' responses to each item in the test.
According to Abubakar S. Asaad and William M. Hailaya (Measurement and Evaluation: Concepts and
Principles, Rex Bookstore, 2004 edition), an item has either desirable or undesirable characteristics.
An item with desirable characteristics can be retained for subsequent use; one with undesirable
characteristics is either revised or rejected.
The following criteria determine the desirability or undesirability of an item:
a. Difficulty of an item
b. Discriminating power of an item
c. Measures of attractiveness
The difficulty index refers to the proportion of the number of students in the upper and lower groups who
answered an item correctly. In a classroom achievement test, the desired indices of difficulty are not lower
than 0.20 nor higher than 0.80, with an average index of difficulty from 0.30 or 0.40 to a maximum of 0.60.

DF = (PUG + PLG) / 2
where:
PUG = proportion of the upper group who got an item right
PLG = proportion of the lower group who got an item right
Level of Difficulty of an Item
Index Range | Difficulty Level
0.00 - 0.20 | Very difficult
0.21 - 0.40 | Difficult
0.41 - 0.60 | Moderately difficult
0.61 - 0.80 | Easy
0.81 - 1.00 | Very easy
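A minimal Python sketch of the difficulty index (the counts reuse the worked discrimination example that follows; the function and parameter names are my own):

```python
# Difficulty index: average of the proportions correct in the upper and lower groups.

def difficulty_index(upper_correct, lower_correct, group_size):
    p_ug = upper_correct / group_size  # proportion of the upper group who got it right
    p_lg = lower_correct / group_size  # proportion of the lower group who got it right
    return (p_ug + p_lg) / 2

print(round(difficulty_index(6, 4, 22), 2))  # 0.23 -> "Difficult" on the scale above
```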
Index of Discrimination
The discrimination index is the difference between the proportion of high-performing students who got the
item right and the proportion of low-performing students who got the item right. The high- and low-performing
groups are usually defined as the upper 27% and the lower 27% of the students based on the total examination
score. Discrimination is classified as positive discrimination if the proportion of students who got the item
right is greater in the upper group than in the lower group, negative discrimination if it is greater in the
lower group, and zero discrimination if the proportions in the upper and lower groups are equal.
Discrimination Index | Item Evaluation
0.40 and up  | Very good item
0.30 - 0.39  | Reasonably good item, but possibly subject to improvement
0.20 - 0.29  | Marginal item, usually needing and being subject to improvement
Below 0.19   | Poor item, to be rejected or improved by revision
Maximum discrimination is the sum of the proportions of the upper and lower groups who answered the
item correctly. Maximum discrimination is possible when half or fewer of the combined upper and
lower groups answered an item correctly.
Discriminating efficiency is the index of discrimination divided by the maximum discrimination.

Formulas:
Di = PUG − PLG
DM = PUG + PLG
DE = Di / DM
where:
PUG = proportion of the upper group who got an item right
PLG = proportion of the lower group who got an item right
Di = discrimination index
DM = maximum discrimination
DE = discriminating efficiency
Example: Eighty students took an examination in Algebra. For item number 6, six students in the upper
group and four students in the lower group got the correct answer. Find the discriminating efficiency.

Given:
Number of students who took the exam = 80
27% of 80 = 21.6 ≈ 22, so there are 22 students in the upper performing group and 22
students in the lower performing group.

PUG = 6/22 = 0.27 or 27%
PLG = 4/22 = 0.18 or 18%
Di = PUG − PLG = 27% − 18% = 9%
DM = PUG + PLG = 27% + 18% = 45%
DE = Di / DM = 0.09 / 0.45 = 0.20 or 20%

This can be interpreted as: on the average, the item is discriminating at 20% of the potential of an item of
its difficulty.
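The same computation in a short Python sketch (mirroring the worked example above; the function and variable names are my own):

```python
# Discrimination index, maximum discrimination and discriminating efficiency.

def discrimination_stats(upper_correct, lower_correct, group_size):
    p_ug = upper_correct / group_size
    p_lg = lower_correct / group_size
    d_i = p_ug - p_lg           # discrimination index
    d_m = p_ug + p_lg           # maximum discrimination
    return d_i, d_m, d_i / d_m  # the last value is the discriminating efficiency

group = round(0.27 * 80)  # upper/lower 27% of 80 examinees -> 22 students each
di, dm, de = discrimination_stats(6, 4, group)
print(round(di, 2), round(dm, 2), round(de, 2))  # 0.09 0.45 0.2
```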
Measures of Attractiveness
To measure the attractiveness of the incorrect options (distracters) in multiple-choice tests, we count the
number of students who selected each incorrect option in both the upper and lower groups. An incorrect
option is said to be an effective distracter if more students in the lower group than in the upper group
chose it.
Steps of Item Analysis
1. Rank the scores of the students from highest to lowest.
2. Select the top 27% and the bottom 27% of the papers (the upper and lower performing groups).
3. Set aside the middle 46% of the papers, because they will not be used in the item analysis.
4. Tabulate the number of students in the upper group and lower group who selected each
alternative.
5. Compute the difficulty of each item
6. Compute the discriminating powers of each item
7. Evaluate the effectiveness of the distracters
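The steps can be strung together in a compact Python sketch (an illustration only, with made-up response data, where 1 = right and 0 = wrong per item; the function name is my own):

```python
# End-to-end item analysis: rank, split into 27% groups, tabulate, compute indices.

def item_analysis(responses):
    ranked = sorted(responses, key=sum, reverse=True)  # steps 1-2: rank by total score
    k = max(1, round(0.27 * len(ranked)))
    upper, lower = ranked[:k], ranked[-k:]             # step 3: middle papers set aside
    stats = []
    for i in range(len(responses[0])):
        p_ug = sum(r[i] for r in upper) / k            # step 4: tabulate per group
        p_lg = sum(r[i] for r in lower) / k
        stats.append((i + 1, (p_ug + p_lg) / 2, p_ug - p_lg))  # steps 5-6
    return stats

responses = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0],
             [0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1], [1, 0, 0]]
for item, difficulty, discrimination in item_analysis(responses):
    print(f"Item {item}: difficulty={difficulty:.2f}, discrimination={discrimination:.2f}")
```

Evaluating the distracters (step 7) would additionally require the actual option chosen by each student, not just right/wrong marks.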
MODULE 5: INTERPRETATION AND UTILIZATION OF TEST RESULTS
INTERPRETATION OF TEST RESULTS: CRITERION-REFERENCED VS. NORM-REFERENCED INTERPRETATION
STATISTICAL ORGANIZATION OF TEST SCORES
We shall discuss the different statistical techniques used in describing and analyzing test results:
1. Measures of Central Tendency (Averages)
2. Measures of Variability (Spread of Scores)
3. Measures of Relationship (Correlation)
4. Skewness
A measure of central tendency is a single value that is used to identify the center of the data; it is thought
of as the typical value in a set of scores. It tends to lie within the center of the scores when they are
arranged from lowest to highest or vice versa. There are three commonly used measures of central
tendency: the mean, the median and the mode.
The Mean
The mean is the most common measure of center, and is also known as the arithmetic average.

Sample Mean = Σx / n
where:
Σx = sum of all the scores
x = individual score
n = number of scores
Steps in solving the mean value using raw scores
1. Get the sum of all the scores in the distribution
2. Identify the number of scores (n)
3. Substitute into the given formula and solve for the mean value
Example: Find the mean of the scores of students in an algebra quiz
(x) scores in algebra
45
35
48
60
44
39
47
55
58
54
Σx = 485, n = 10
Mean = Σx / n = 485 / 10 = 48.5
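In Python, the same computation is a one-liner (a sketch, using the scores above):

```python
# Mean = Σx / n, using the ten algebra quiz scores above.
scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
print(sum(scores) / len(scores))  # 48.5
```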
Properties of the Mean
1. Easy to compute
2. It may or may not be an actual observation in the data set
3. It can be subjected to numerous mathematical computations
4. Most widely used
5. Every score in the distribution affects its value
6. It is easily affected by extreme values
7. Applied to interval-level data
The Median
The median is the point that divides the scores in a distribution into two equal parts when the scores are
arranged according to magnitude, that is, from lowest to highest or highest to lowest. If the number of
scores is odd, the median is the middle score. When the number of scores is even, the median is the
average of the two middle scores.
Example 1. Find the median of the scores of 10 students in an algebra quiz.
(x) scores of students in algebra: 45, 35, 48, 60, 44, 39, 47, 55, 58, 54
First, arrange the scores from lowest to highest; then find the average of the two middlemost scores, since
the number of cases is even:
35, 39, 44, 45, 47, 48, 54, 55, 58, 60
Median = (47 + 48) / 2 = 47.5
50% of the scores in the distribution fall below 47.5.
Example 2. Find the median of the scores of 9 students in an algebra quiz.
(x) scores of students in algebra: 35, 39, 44, 45, 47, 48, 54, 55, 58
The median is the 5th score, which is 47. This means that 50% of the scores fall below 47.
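A minimal Python sketch of the median rule (middle score for an odd number of scores, average of the two middle scores for an even number), checked against both examples above:

```python
# Median: middle score (odd n) or average of the two middle scores (even n).

def median(scores):
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([45, 35, 48, 60, 44, 39, 47, 55, 58, 54]))  # 47.5 (Example 1)
print(median([35, 39, 44, 45, 47, 48, 54, 55, 58]))      # 47   (Example 2)
```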
Properties of the Median
1. It is not affected by extreme values
2. It is applied to ordinal-level data
3. It is the middlemost score in the distribution
4. Most appropriate when there are extreme scores
The Mode
The mode refers to the score or scores that occur most frequently in the distribution. There are three
classifications of mode: (a) unimodal – a distribution with only one mode; (b) bimodal – a distribution
with two modes; (c) multimodal – a distribution with more than two modes.
Properties of the Mode
1. It is the score or scores that occur most frequently
2. Nominal average
3. It can be used for qualitative and quantitative data
4. Not affected by extreme values
5. It may not exist
Example 1. Find the mode of the scores of students in an algebra quiz: 34, 36, 45, 65, 34, 45, 55, 61, 34, 46
Mode = 34, because it appears three times. The distribution is unimodal.
Example 2. Find the mode of the scores of students in an algebra quiz: 34, 36, 45, 61, 34, 45, 55, 61, 34, 45
Mode = 34 and 45, because both appear three times. The distribution is bimodal.
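A minimal Python sketch of the mode, returning every score that attains the highest frequency (so it reports unimodal and bimodal cases alike; the helper name is my own):

```python
# Mode(s): the score or scores occurring most frequently in the distribution.
from collections import Counter

def modes(scores):
    counts = Counter(scores)
    top = max(counts.values())
    return [score for score, c in counts.items() if c == top]

print(modes([34, 36, 45, 65, 34, 45, 55, 61, 34, 46]))  # [34] -> unimodal
print(modes([34, 36, 45, 61, 34, 45, 55, 61, 34, 45]))  # [34, 45] -> bimodal
```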
Measures of Variability
A measure of variability is a single value that is used to describe the spread of the scores in a
distribution, that is, above or below the measure of central tendency. There are three commonly used
measures of variability: the range, the quartile deviation and the standard deviation.
The Range
The range is the difference between the highest and lowest scores in the data set: R = HS − LS.
Properties of the Range
1. Simplest and crudest measure
2. A rough measure of variation
3. The smaller the value, the closer the scores are to each other; the higher the value, the more
scattered the scores are
4. The value fluctuates easily: a change in either the highest or the lowest score changes
the value of the range
Example: Below are the scores of 10 students in Mathematics and Science. Find the range of each. Which subject has the greater variability?
Mathematics: 35, 33, 45, 55, 62, 34, 54, 36, 47, 40
Science: 35, 40, 25, 47, 55, 35, 45, 57, 39, 52
Mathematics: HS = 62, LS = 33, R = HS - LS = 62 - 33 = 29
Science: HS = 57, LS = 25, R = HS - LS = 57 - 25 = 32
Based on the computed values of the range, the scores in Science have the greater variability; that is, the scores in Science are more scattered than the scores in Mathematics.
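A minimal Python sketch of the range comparison above:

```python
# R = HS - LS for each subject; the larger range is the more scattered set.
math_scores    = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
science_scores = [35, 40, 25, 47, 55, 35, 45, 57, 39, 52]

r_math    = max(math_scores) - min(math_scores)        # 62 - 33 = 29
r_science = max(science_scores) - min(science_scores)  # 57 - 25 = 32
print(r_math, r_science)  # Science has the greater variability
```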
The Quartile Deviation
The quartile deviation is half of the difference between the third quartile (Q3) and the first quartile (Q1). It is based on the middle 50% of the distribution, instead of the range of the entire set. In symbols:

QD = (Q3 - Q1) / 2

where QD = quartile deviation, Q3 = third quartile value, Q1 = first quartile value
Example: In the scores of 50 students, Q3 = 50.25 and Q1 = 25.45. Find the QD.

QD = (Q3 - Q1) / 2 = (50.25 - 25.45) / 2 = 12.4

The value QD = 12.4 indicates the distance we need to go above or below the median to include approximately the middle 50% of the scores.
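A minimal Python sketch of the same computation, using the given quartile values:

```python
# QD = (Q3 - Q1) / 2, from the quartile values stated in the example.
q3, q1 = 50.25, 25.45
qd = (q3 - q1) / 2
print(qd)  # 12.4
```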
The Standard Deviation
The standard deviation is the most important and useful measure of variation; it is the square root of the variance. It is an average of the degree to which each score in the distribution deviates from the mean value. It is a more stable measure of variation than the range or the quartile deviation because it involves all the scores in a distribution.

SD = √[∑(x - mean)² / (n - 1)]

where x = individual score and n = number of scores in the distribution
Example 1. Find the standard deviation of the scores of 10 students in an algebra quiz, using the data given below.
x        (x - mean)²
45       12.25
35       182.25
48       0.25
60       132.25
44       20.25
39       90.25
47       2.25
55       42.25
58       90.25
54       30.25
∑x = 485    ∑(x - mean)² = 602.5

n = 10
Mean = ∑x / n = 485 / 10 = 48.5
SD = √[∑(x - mean)² / (n - 1)]
   = √(602.5 / 9)
   = √66.94
SD = 8.18

This means that, on the average, the amount by which the scores deviate from the mean value of 48.5 is 8.18.
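A minimal Python sketch of the sample standard deviation formula above (note the n - 1 divisor), checked against the quiz scores:

```python
# Square root of the sum of squared deviations from the mean, over n - 1.
import math

def sample_sd(scores):
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations
    return math.sqrt(ss / (n - 1))

print(round(sample_sd([45, 35, 48, 60, 44, 39, 47, 55, 58, 54]), 2))  # 8.18
```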
Example 2. Find the standard deviation of the scores of the 10 students below. Which subject has the greater variability?
Mathematics: 35, 33, 45, 55, 62, 34, 54, 36, 47, 40
Science: 35, 40, 25, 47, 55, 35, 45, 57, 39, 52
Solve for the standard deviation of the scores in mathematics
Mathematics (x)    (x - mean)²
35                 82.81
33                 123.21
45                 0.81
55                 118.81
62                 320.41
34                 102.01
54                 98.01
36                 65.61
47                 8.41
40                 16.81
∑x = 441    ∑(x - mean)² = 936.9
Mean = 44.1
SD = √[∑(x - mean)² / (n - 1)]
   = √(936.9 / 9)
   = √104.1
SD = 10.20 for the Mathematics subject
Solve for the standard deviation of the scores in Science:

Science (x)    (x - mean)²
35             64
40             9
25             324
47             16
55             144
35             64
45             4
57             196
39             16
52             81
∑x = 430    ∑(x - mean)² = 918
Mean = 430 / 10 = 43
SD = √[∑(x - mean)² / (n - 1)]
   = √(918 / 9)
   = √102
SD = 10.10 for the Science subject
The standard deviation for the Mathematics subject is 10.20 and the standard deviation for the Science subject is 10.10, which means that the Mathematics scores have greater variability than the Science scores. In other words, the scores in Mathematics are more scattered than those in Science.
Interpretation of Standard Deviation
When the value of the standard deviation is large, the scores are, on the average, far from the mean. On the other hand, if the value of the standard deviation is small, the scores are, on the average, close to the mean.
Coefficient of Variation
The coefficient of variation is a measure of relative variation expressed as a percentage of the arithmetic mean. It is used to compare the variability of two or more sets of data even when the observations are expressed in different units of measurement. The coefficient of variation can be solved using the formula:

CV = (SD / Mean) × 100%

The lower the value of the coefficient of variation, the more closely the data cluster around the mean, and the more homogeneous the performance of the group.
Example: A study showed the performance of two groups, A and B, in a certain test given by a researcher. Group A obtained a mean score of 87 points with a standard deviation of 8.5 points; Group B obtained a mean score of 90 points with a standard deviation of 10.25 points. Which of the two groups has the more homogeneous performance?

Group    Mean    Standard deviation
A         87         8.5
B         90        10.25

CV (Group A) = (8.5 / 87) × 100% = 9.77%
CV (Group B) = (10.25 / 90) × 100% = 11.39%

The CV of Group A is 9.77% and the CV of Group B is 11.39%, which means that Group A has the more homogeneous performance.
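A minimal Python sketch of the comparison above:

```python
# CV = SD / mean * 100%; the lower CV marks the more homogeneous group.
def cv(sd, mean):
    return sd / mean * 100

print(round(cv(8.5, 87), 2))    # 9.77  (Group A)
print(round(cv(10.25, 90), 2))  # 11.39 (Group B) -> A is more homogeneous
```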
Percentile Rank
The percentile rank of a score is the percentage of scores in the frequency distribution that fall below it; that is, the percentage of examinees in the norm group who scored below the score of interest. Percentile ranks are commonly used to clarify the interpretation of scores on standardized tests.
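A minimal Python sketch, assuming the common definition that counts only scores strictly below the score of interest (some references add half of the tied scores):

```python
# Percentage of scores in the distribution that fall below a given score.
def percentile_rank(score, scores):
    below = sum(1 for x in scores if x < score)
    return below / len(scores) * 100

quiz = [35, 39, 44, 45, 47, 48, 54, 55, 58, 60]
print(percentile_rank(54, quiz))  # 60.0 -> 60% of the scores fall below 54
```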
Z-SCORE
The z-score (also known as a standard score) measures how many standard deviations an observation is above or below the mean. A positive z-score gives the number of standard deviations a score is above the mean, and a negative z-score gives the number of standard deviations a score is below the mean.
The z-score can be computed using the formulas:

z = (x - µ) / σ   for a population
z = (x - mean) / SD   for a sample

where
x = raw score
σ = standard deviation of the population
µ = mean of the population
SD = standard deviation of the sample
EXAMPLE: James Mark's examination results in three subjects are as follows:

Subject             Mean    Standard deviation    James Mark's grade
Math Analysis        88          10                     95
Natural Science      85           5                     80
Labor Management     92           7.5                   94
In what subject did James Mark perform best? In what subject did he perform most poorly?
z (Math Analysis) = (95 - 88) / 10 = 0.70
z (Natural Science) = (80 - 85) / 5 = -1.00
z (Labor Management) = (94 - 92) / 7.5 = 0.27
James Mark's grade in Math Analysis was 0.70 standard deviation above the mean of the Math Analysis grades, while in Natural Science he was 1.0 standard deviation below the mean of the Natural Science grades. His grade in Labor Management was 0.27 standard deviation above the mean of the Labor Management grades. Comparing the z-scores, James Mark performed best in Math Analysis and most poorly in Natural Science in relation to the group performance.
T-score
The T-score can be obtained by multiplying the z-score by 10 and adding 50. In symbols, T-score = 10z + 50.
Using the same exercise, compute the T-score of James Mark in Math Analysis, Natural Science and
Labor Management
T-score (Math Analysis) = 10(0.70) + 50 = 57
T-score (Natural Science) = 10(-1) + 50 = 40
T-score (Labor Management) = 10(0.27) + 50 = 52.7
Since the highest T-score is in Math Analysis (57), we can conclude that James Mark performed better in Math Analysis than in Natural Science and Labor Management.
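A minimal Python sketch that reproduces both conversions for James Mark's three grades:

```python
# z = (x - mean) / SD for a sample, then T = 10z + 50.
def z_score(x, mean, sd):
    return (x - mean) / sd

def t_score(z):
    return 10 * z + 50

for subject, mean, sd, grade in [
    ("Math Analysis",    88, 10,  95),
    ("Natural Science",  85, 5,   80),
    ("Labor Management", 92, 7.5, 94),
]:
    z = z_score(grade, mean, sd)
    print(subject, round(z, 2), round(t_score(z), 1))
# Math Analysis 0.7 57.0 / Natural Science -1.0 40.0 / Labor Management 0.27 52.7
```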
Stanine
The stanine, also known as standard nine, is a simple type of normalized standard score that illustrates the process of normalization. Stanines are single-digit scores ranging from 1 to 9.
The distribution of scores is divided into nine parts:

Stanine              1    2    3     4     5     6     7     8    9
Percent in stanine   4%   7%   12%   17%   20%   17%   12%   7%   4%
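As an illustration only (the mapping below is an assumption built from the percentages in the table, not a procedure given in this module), percentile ranks can be converted to stanines by accumulating the nine percentages into cumulative upper bounds:

```python
# Cumulative upper bounds from the table: 4, 11, 23, 40, 60, 77, 89, 96, 100.
import itertools

percents = [4, 7, 12, 17, 20, 17, 12, 7, 4]    # stanines 1..9
bounds = list(itertools.accumulate(percents))

def stanine(percentile_rank):
    for i, upper in enumerate(bounds, start=1):
        if percentile_rank <= upper:
            return i
    return 9

print(stanine(50))  # 5 -> the middle 20% of the distribution
```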
Skewness
Skewness describes the degree of departure of the distribution of the data from symmetry. The degree of skewness is measured by the coefficient of skewness, denoted SK and computed as:

SK = 3(mean - median) / SD
The normal curve is a symmetrical bell-shaped curve; its end tails are continuous and asymptotic, and its mean, median and mode are equal. The scores are normally distributed if the computed value of SK = 0.
Positively skewed: the curve is skewed to the right; it has a long tail extending off to the right but a short tail to the left. It indicates the presence of a small proportion of relatively large extreme values (SK > 0). When the computed value of SK is positive, most of the students' scores are very low, meaning they performed poorly in the said examination.
Negatively skewed: the distribution is skewed to the left; it has a long tail extending off to the left but a short tail to the right. It indicates the presence of a small proportion of relatively low extreme values (SK < 0). When the computed value of SK is negative, most of the students got very high scores, meaning they performed very well in the said examination.
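A minimal Python sketch of the SK coefficient, using the algebra-quiz values computed earlier (mean 48.5, median 47.5, SD 8.18):

```python
# SK = 3(mean - median) / SD; SK > 0 is positive skew, SK < 0 is negative skew.
def sk(mean, median, sd):
    return 3 * (mean - median) / sd

print(round(sk(48.5, 47.5, 8.18), 2))  # 0.37 -> slightly positively skewed
```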
MODULE 6: MARKS/GRADES AND GRADING SYSTEM
BASIC TERMINOLOGY
 Marks = cumulative grades that reflect students' academic progress during a period of instruction
 Score = reflects performance on a single assessment
 Grades = can be used interchangeably with marks
 Most of the time these terms are used to mean the same thing
FEEDBACK AND EVALUATION
 Test results can be used for a variety of reasons, such as informing students of their progress, evaluating achievement, and assigning grades
 Formative evaluation = activities that are aimed at providing feedback to the students
 Summative evaluation = activities that determine the worth, value, or quality of an outcome; these often involve the assignment of a grade
INFORMAL AND FORMAL EVALUATION
 Informal evaluation = not planned and not standardized; can come in the form of commentary such as "great work" or "try that one again"
 Formal evaluation = more likely to be applied consistently and be written out; includes scores and commentary, often written down
THE USE OF FORMATIVE EVALUATION IN SUMMATIVE EVALUATION
 Sometimes, formative assessment and evaluation can feed into summative evaluation
 This is recommended more in courses of study that are topical, rather than sequential, as mastery of earlier concepts may not reflect on the assessment of later ones, and vice versa
REPORTING STUDENT PROGRESS: WHICH SYMBOLS TO USE?
 This is often decided by the administration or state
 Most teachers are familiar with letter (A, B, C, D, F) and numerical (0-100) grades
 Verbal descriptors = grades like "excellent" or "needs improvement"
 Pass-fail = a variant of mastery grading in which most students are expected to master the content (i.e., "pass")
 Supplemental systems = using means of communication like phone calls home, checklists of objectives, or other methods to communicate feedback
BASIS OF GRADES
Before assigning grades, consider: Are the grades solely based on academic achievement, or are there
other factors to consider?
 Factors could include attendance, participation, attitudes, etc.
 Most experts recommend making academic achievement the sole basis for assigning grades
 If desired, the recommendation is to keep a separate rating system for such nonachievement factors to keep achievement grades unbiased
 When comparing grades (5th grade to 6th grade, for example) it is critical to consider how grades were calculated. Grades based heavily on homework will not be comparable to grades based heavily on testing.
FRAME OF REFERENCE
 After deciding what to base your grades on, you will then have to decide how you're going to interpret and compare student scores
 There are several different frames of reference that suit different needs
NORM-REFERENCE GRADING (RELATIVE GRADING)
 Involves comparing each student's performance to that of a reference group
 Also known as "grading on a curve"
 In this arrangement, a certain percentage of students receive each grade (10% receive A's, 20% receive B's, and so on)
 Straightforward method of grading, and helps reduce grade inflation
 However, depending on the reference group used as a basis, this frame of reference is not always considered fair
 Another approach is to use ranges instead of exact percentages (10-20% A's, 20-30% B's, etc.)
CRITERION-REFERENCED GRADING (ABSOLUTE GRADING)
 Involves comparing a student's performance to a specified level of performance
 One common system is the percentage system (A = 90-100%, B = 80-89%, etc.)
 Marks directly describe student performance
 However, there may be considerable variability between teachers in how they assign grades (lower vs. higher expectations)
ACHIEVEMENT IN RELATION TO IMPROVEMENT OR EFFORT
 Students who make higher learning gains earn better grades than those who make smaller gains
 This method of grading can be risky, as students may figure out to start the year or unit low and finish high to earn a better grade
 There are also many other technical factors, including the fact that this is not a pure measure of achievement, but a measure of effort as well
 Can motivate poor students, but may have a negative effect on strong students
ACHIEVEMENT RELATIVE TO ABILITY
Usually based on performance on an intelligence test
There are also numerous technical and consistency issues to be taken into consideration when using this
approach
RECOMMENDATION
Most experts recommend using absolute rather than relative grading systems, as they represent purer measures of student achievement. Both grading systems have their strengths and limitations, which should be taken into consideration when deciding which to use.
 Reporting both styles of grades is also an option
COMBINING GRADES INTO A COMPOSITE
The decision of how much certain grades should weigh in the composite (or final) grade is up to the teacher or department and is based on the importance of the different types of assignments (e.g., five response papers might be worth 10% each, with 12.5% assigned to each of four tests; this is different from 50% assigned to one major paper and 50% to one cumulative test).
There are several different methods of equating scores into composite scores, although most schools have
commercial gradebook programs that do this for the teacher
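As a small illustration of the weighting idea (the component scores below are hypothetical; the 50/50 weights follow the response-papers/tests example above):

```python
# Weighted composite: each component score (in %) times its weight.
components = {                      # component -> (score in %, weight)
    "response papers": (88, 0.50),  # five papers at 10% each
    "tests":           (92, 0.50),  # four tests at 12.5% each
}

composite = sum(score * weight for score, weight in components.values())
print(composite)  # 90.0
```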
INFORMING STUDENTS OF GRADING SYSTEM
 Students should be informed early on in the course about exactly how they will be graded, well before any assessment procedures have taken place
 Parents should also be informed of grading procedures
 It is the professional responsibility of a teacher to explain the scores/grades to students and parents in ways that are understandable
 This can be done by simply handing out a sheet with a breakdown of the weights of different grades, though it is recommended that Q & A sessions are conducted
PARENT CONFERENCES
 Parent-teacher conferences should be professional and the information disclosed should be kept confidential
 Discussion should only concern the individual student
 Teachers should have a file folder or computer file of the student's performance and grades readily available
 Presenting the student's work as evidence/an indicator of grades is also recommended
FUNCTIONS OF GRADING AND REPORTING SYSTEMS
1. Improve students’ learning by:
 clarifying instructional objectives for them
 showing students' strengths & weaknesses
 providing information on personal-social development
 enhancing students' motivation (e.g., short-term goals)
 indicating where teaching might be modified
Best achieved by:
 day-to-day tests and feedback
 plus periodic integrated summaries
2. Reports to parents/guardians
 Communicates objectives to parents, so they can help promote learning
 Communicates how well objectives are being met, so parents can better plan
3. Administrative and guidance uses
 Help decide promotion, graduation, honors, athletic eligibility
 Report achievement to other schools or to employers
 Provide input for realistic educational, vocational, and personal counseling
TYPES OF GRADING AND REPORTING SYSTEM
1. Traditional letter-grade system
 Easy to use, and grades can be averaged
 But of limited value when used as the sole report, because:
1. they end up being a combination of achievement, effort, work habits, behavior
2. teachers differ in how many high (or low) grades they give
3. they are therefore hard to interpret
4. they do not indicate patterns of strength and weakness
2. Pass-fail
 Popular in some elementary schools
 Used to allow exploration in high school/college
 Should be kept to the minimum, because:
1. they do not provide much information
2. students work to the minimum
 In mastery learning courses, the grade can be left blank until the "mastery" threshold is reached
3. Checklists of objectives
 Most common in elementary school
 Can either replace or supplement letter grades
 Each item in the checklist can be rated: Outstanding, Satisfactory, Unsatisfactory; A, B, C; etc.
 The problem is to keep the list manageable and understandable
4. Letters to parents/guardians
 Useful supplement to grades
 Limited value as sole report, because:
1. very time consuming
2. accounts of weaknesses often misinterpreted
3. not systematic or cumulative
 Great tact needed in presenting problems (lying, etc.)
5. Portfolios
 Set of purposefully selected work, with commentary by student and teacher
 Useful for:
1. showing student's strengths and weaknesses
2. illustrating range of student work
3. showing progress over time or stages of a project
4. teaching students about objectives/standards they are to meet
6. Parent-teacher conferences
 Used mostly in elementary school
 Portfolios (when used) are a useful basis for discussion
 Useful for:
1. two-way flow of information
2. getting more information and cooperation from parents
 Limited in value as the major report, because:
1. time consuming
2. provides no systematic record of progress
3. some parents won't come
HOW SHOULD YOU DEVELOP A GRADING SYSTEM?
1. Guided by the functions to be served
 will probably be a compromise, because functions often conflict
 but always keep achievement separate from effort
2. Developed cooperatively (parents, students, school personnel)
 more adequate system
 more understandable to all
3. Based on clear statement of learning objectives
 are the same objectives that guided instruction and assessment
 some are general, some are course-specific
 aim is to report progress on those objectives
 practicalities may impose limits, but should always keep the focus on objectives
4. Consistent with school standards
 should support, not undermine, school standards
 should use the school's categories for grades and performance standards
 should actually measure what is described in those standards
5. Based on adequate assessment
 implication: don't promise something you cannot deliver
 design a system for which you can get reliable, valid data
6. Based on the right level of detail
 detailed enough to be diagnostic
 but compact enough to be practical:
1. not too time consuming to prepare and use
2. understandable to all users
3. easily summarized for school records
 probably means a letter-grade system with more detailed supplementary reports
7. Providing for parent-teacher conferences as needed
 regularly scheduled for elementary school
 as needed for high school
ASSIGNING LETTER GRADES
What to include?
 Only achievement
 Avoid the temptation to include effort for less able students, because:
1. difficult to assess effort or potential
2. difficult to distinguish ability from achievement
3. would mean grades don’t mean same thing for everyone (mixed message, unfair)
How to combine data?
 Properly weight each component to create a composite
 Must put all components on the same scale to weight properly:
1. equate ranges of scores (see example on p. 389, where students score 10-50 on one test and
80-100 on another)
2. or, convert all to T-scores or other standard scores (see chapter 19)
What frame of reference?
 Relative—score compared to other students (where you rank)
1. grade (like a class rank) depends on what group you are in, not just your own performance
2. typical grade may be shifted up or down, depending on group's ability
3. widely used because much classroom testing is norm-referenced
 Absolute—score compared to specified performance standards (what you can do)
1. grade does NOT depend on what group you are in, but only on your own performance compared to a set of performance standards
2. complex task, because must
I. clearly define the domain
II. clearly define and justify the performance standards
III. do criterion-referenced assessment
3. conditions hard to meet except in complete mastery learning settings
 Learning ability or improvement—score compared to learning "potential" or past performance
1. widely used in elementary schools
2. inconsistent with a standards-based system (each child is their own standard)
3. reliably estimating learning ability (separate from achievement) is very difficult
4. can't reliably measure change with classroom measures
5. therefore, should only be used as a supplement
What distribution of grades?
 Relative (have ranked the students)—distribution is a big issue
1. normal curve defensible only when have large, unselected group
2. when "grading on the curve," school staff should set fair ranges of grades for different groups and courses
3. when "grading on the curve," any pass-fail decision should be based on an absolute standard (i.e., failed the minimum essentials)
4. standards and ranges should be understood and followed by all teachers
 Absolute (have assessed absolute levels of knowledge)—not an issue
1. system seldom uses letter grades alone
2. often includes checklists of what has been mastered (see example on p. 395)
3. distribution of grades is not predetermined
Guidelines for Effective Grading
1. Describe grading procedures to students at beginning of instruction.
2. Clarify that course grade will be based on achievement only.
3. Explain how other factors (effort, work habits, etc.) will be reported.
4. Relate grading procedures to intended learning outcomes.
5. Obtain valid evidence (tests, etc.) for assigning grades.
6. Try to prevent cheating.
7. Return and review all test results as soon as possible.
8. Properly weight the various types of achievements included in the grade.
9. Do not lower an achievement grade for tardiness, weak effort, or misbehavior.
10. Be fair. Avoid bias. When in doubt, review the evidence. If still in doubt, give the higher grade.
Conducting Parent-Teacher Conferences
Productive when:
 Carefully planned
 Teacher is skilled
Guidelines for a good conference
1. Make plans
 Review your goals
 Organize the information to present
 Make a list of points to cover and questions to ask
 If bringing portfolios, select and review them carefully
2. Start positive—and maintain a positive focus
3. Present student’s strong points first
 Helpful to have examples of work to show strengths and needs
 Compare early vs. later work to show improvement
4. Encourage parents to participate and share information
 Be willing to listen
 Be willing to answer questions
5. Plan actions cooperatively
 What steps you can each take
 Summarize at the end
6. End with positive comment
 Should not be a vague generality
 Should be true
7. Use good human relations skills
DO
 Be friendly and informal
 Be positive in approach
 Be willing to explain in understandable terms
 Be willing to listen
 Be willing to accept parents' feelings
 Be careful about giving advice
DON’T
 Argue, get angry
 Ask embarrassing questions
 Talk about other students, parents, teachers
 Bluff if you don't know
 Reject parents' suggestions
 Be a know-it-all with pat answers
Reporting Standardized Test Results to Parents
Aims
 Present test results in understandable language, not jargon
 Put test results in context of total pattern of information about the student
 Keep it brief and simple
Actions
1. Describe what the test measures
 Use a general statement: e.g., "this test measures skills and abilities that are useful in school learning"
 Refer to any part of the test report that may list skill clusters
 Avoid misunderstandings by:
a. not referring to tests as "intelligence" tests
b. not describing aptitudes and abilities as fixed
c. not saying that a test predicts outcomes for an individual person (can say "people with this score usually…")
 Let a counselor present results for any non-cognitive test (personality, interests, etc.)
2. Explain meaning of test scores (chapter 19 devoted to this)
 For norm-referenced:
1. explain norm group
2. explain score type (percentile, stanine, etc.)
3. stay with one type of score, if possible
 For criterion-referenced:
1. more easily understood than norm-referenced
2. usually in terms of relative degree of mastery
3. describe the standard of mastery
4. may need to distinguish percentile from percent correct
3. Clarify accuracy of scores
 Say all tests have error
 Stanines already take account of error (because they are so broad); a two-stanine difference is probably a real difference
 For other scores, use confidence bands when presenting them
 If you refer to subscales with few items, describe them as only "clues" and look for related evidence
4. Discuss use of test results
 Coordinate all information to show what action they suggest
Decisions in Assigning Grades
1. What should grades include (effort, achievement, neatness, spelling, good behavior, etc.)?
2. Grades for individual assessments
 Criterion-referenced or norm-referenced?
1. if criterion-referenced, what standard?
2. if norm-referenced, what reference group?
 Letter grades or numbers?
3. Combining assessments for a composite grade
 What common numerical scale?
1. percentages
2. standard scores
3. range of scores (max-min)
4. combining absolute and relative grades
 Weight to give different assessments?
 What cut-off points for letter grades?
MODULE 7: AUTHENTIC ASSESSMENT
MODE OF ASSESSMENT
A. Traditional Assessment
1. Assessment in which students typically select an answer or recall information to complete the assessment. Tests may be standardized or teacher-made, and may be multiple-choice, fill-in-the-blanks, true-false, or matching type.
2. Indirect measures of assessment, since the test items are designed to represent competence by extracting knowledge and skills from their real-life context.
3. Items on standardized instruments tend to test only the domain of knowledge and skill, to avoid ambiguity for the test takers.
4. One-time measures that rely on a single correct answer to each item. There is limited potential for traditional tests to measure higher-order thinking skills.
B. Performance assessment
1. Assessment in which students are asked to perform real-world tasks that demonstrate
meaningful application of essential knowledge and skills
2. Direct measures of students' performance, because tasks are designed to incorporate the contexts, problems, and solution strategies that students would use in real life.
3. Designed as ill-structured challenges, since the goal is to help students prepare for the complex ambiguities of life.
4. Focus on processes and rationales. There is no single correct answer; instead, students are led to craft polished, thorough and justifiable responses, performances and products.
5. Involve long-range projects, exhibits, and performances that are linked to the curriculum
6. Teacher is an important collaborator in creating tasks, as well as in developing guidelines for
scoring and interpretation
C. Portfolio Assessment
1. Portfolio is a collection of student’s work specifically to tell a particular story about the
student.
2. A portfolio is not a pile of student work that accumulates over a semester or year
3. A portfolio contains a purposefully selected subset of student work
4. It measures the growth and development of students.
TRADITIONAL ASSESSMENT VS. AUTHENTIC ASSESSMENT
Traditional ----------------------------- Authentic
Selecting a Response ------------------- Performing a Task
Contrived -------------------------------- Real-life
Recall/Recognition -------------------------------- Construction/Application
Teacher-structured ---------------------------------- Student-structured
Indirect Evidence --------------------------------- Direct Evidence
Seven Criteria in Selecting a Good Performance Assessment Task
1. Authenticity – the task is similar to what the students might encounter in the real
world as opposed to encountering only in school.
2. Feasibility – the task is realistically implemented in relation to its cost, space, time,
and equipment requirements.
3. Generalizability – the likelihood that the students’ performance on the task will
generalize to comparable tasks.
4. Fairness – the task is fair to all students regardless of their social status or gender.
5. Teachability – the task allows one to master the skill that one should be proficient in.
6. Multi Foci – the task measures multiple instructional outcomes.
7. Scorability – the task can be reliably and accurately evaluated.
Rubrics
A rubric is a scoring scale and instructional tool used to assess the performance of students with a task-specific set of criteria. It contains two essential parts: the criteria for the task and the levels of performance for each criterion. It provides teachers an effective means of student-centered feedback and evaluation of the work of students. It also enables teachers to provide detailed and informative evaluations of their performance.
Rubrics are very important, especially if you are measuring the performance of students against a set of standards or a pre-determined set of criteria. Through the use of scoring rubrics, the teacher can determine the strengths and weaknesses of the students, which in turn enables the students to develop their skills.
Steps in developing a Rubrics
1. Identify your standards, objectives and goals for your students. A standard is a statement of what the students should know or be able to perform, and it should indicate that your students are expected to meet it. Know also the goals for instruction: what are the learning outcomes?
2. Identify the characteristics of a good performance on the task, that is, the criteria. When the students perform or present their work, the criteria should indicate whether they performed well on the task given to them and hence met that particular standard.
3. Identify the levels of performance for each criterion. There are no fixed guidelines for the number of levels of performance; it varies according to the task and needs. A rubric can have as few as two levels of performance or as many as the teacher can develop, so long as the rater can sufficiently discriminate the performance of the students on each criterion. Through these levels of performance, the teacher or rater can provide more detailed feedback about the performance of the students, and it is easier for the teacher and students to identify the areas needing improvement.
Types of Rubrics
1. Holistic Rubrics
A holistic rubric does not list separate levels of performance for each criterion. Rather, it assigns a level of performance across the multiple criteria as a whole; in other words, all the components are judged together.
Advantage: quick scoring; provides an overview of students' achievement.
Disadvantage: does not provide detailed information about student performance in specific areas of the content and skills; it may be difficult to settle on one overall score.
2. Analytic Rubrics
With an analytic rubric, the teacher or rater identifies and assesses the components of a finished product. It breaks down the final product into its component parts, and each part is scored independently. The total score is the sum of the ratings for all the parts that are assessed or evaluated. In analytic scoring, it is very important for the rater to treat each part separately to avoid bias toward the whole product.
Advantage: more detailed feedback; scoring is more consistent across students and graders.
Disadvantage: time consuming to score.
Example of Holistic Rubric
3-Excellent Researcher
 Included 10-12 sources
 No apparent historical inaccuracies
 Can easily tell which sources information was drawn from
 All relevant information is included
2- Good Researcher
 Included 5-9 sources
 Few historical inaccuracies
 Can tell with difficulty where information came from
 Bibliography contains most relevant information
1-Poor Researcher
 Included 1-4 sources
 Lots of historical inaccuracies
 Cannot tell from which sources information came
 Bibliography contains very little information
Example of Analytic Rubric

Criterion: Made good observations
 Limited (1): Observations are absent or vague
 Acceptable (2): Most observations are clear and detailed
 Proficient (3): All observations are clear and detailed

Criterion: Made good predictions
 Limited (1): Predictions are absent or irrelevant
 Acceptable (2): Most predictions are reasonable
 Proficient (3): All predictions are reasonable

Criterion: Appropriate conclusion
 Limited (1): Conclusion is absent or inconsistent with observations
 Acceptable (2): Conclusion is consistent with most observations
 Proficient (3): Conclusion is consistent with observations
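A small illustration of how analytic scoring sums independent criterion ratings (the ratings themselves are hypothetical):

```python
# Each criterion is rated independently (1 Limited .. 3 Proficient);
# the total analytic score is the sum of the ratings.
ratings = {
    "Made good observations": 3,
    "Made good predictions":  2,
    "Appropriate conclusion": 2,
}

total = sum(ratings.values())
print(total, "out of", 3 * len(ratings))  # 7 out of 9
```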
Advantages of Using Rubrics
When assessing the performance of the students using performance-based assessment, it is very important to use scoring rubrics. The advantages of using rubrics in assessing students' performance are:
1. Rubrics allow assessment to become more objective and consistent
2. Rubrics clarify the criteria in specific terms
3. Rubrics clearly show the student how work will be evaluated and what is expected
4. Rubrics promote student awareness of the criteria to use in assessing peer performance
5. Rubrics provide useful feedback regarding the effectiveness of the instruction; and
6. Rubrics provide benchmarks against which to measure and document progress
PERFORMANCE BASED ASSESSMENT
Performance-based assessment is a direct and systematic observation of the actual performances of the students based on pre-determined performance criteria, as cited by Gabuyo (2011). It is an alternative form of assessing the performance of the students that represents a set of strategies for the application of knowledge, skills and work habits through the performance of tasks that are meaningful and engaging to students.
Framework of Assessment Approaches
Selection type: true-false; multiple-choice; matching type
Supply type: completion; label a diagram; short answer; concept map
Product: essay, story or poem; writing portfolio; research report; portfolio exhibit, art exhibit; writing journal
Performance: oral presentation of report; musical, dance or dramatic performance; typing test; diving; laboratory demonstration; cooperation in group works
Forms of Performance Based Assessment
1. Extended response task
a. Activities for single assessment may be multiple and varied
b. Activities may be extended over a period of time
c. Products from different students may be different in focus
2. Restricted-response tasks
a. Intended performances more narrowly defined than extended-response tasks.
b. Questions may begin like a multiple-choice or short answer stem, but then ask for
explanation, or justification.
c. May have introductory material like an interpretative exercise, but then asks for an
explanation of the answer, not just the answer itself
3. Portfolio is a purposeful collection of student work that exhibits the student’s efforts, progress
and achievements in one or more areas.
Uses of Performance Based Assessment
1. Assessing the cognitive complex outcomes such as analysis, synthesis and evaluation
2. Assessing non-writing performances and products
3. Must carefully specify the learning outcomes and construct the activity or task that actually calls them forth.
Focus of Performance-Based Assessment
Performance-based assessment can assess the process, the product, or both (process and product), depending on the learning outcomes. It also involves doing, rather than just knowing about, the activity or task. The teacher will assess the effectiveness of the process or procedures and the product used in carrying out the instruction. The question is when to assess the process and when to assess the product.
Use the process when:
1. There is no product;
2. The process is orderly and directly observable;
3. Correct procedures/steps are crucial to later success;
4. Analysis of procedural steps can help in improving the product;
5. Learning is at the early stage.
Use the product when:
1. Different procedures result in an equally good product;
2. Procedures are not available for observation;
3. The procedures have been mastered already;
4. Products have qualities that can be identified and judged.
The final step in performance assessment is to assess and score the student's performance. To assess the performance of the students, the evaluator can use the checklist approach, the narrative or anecdotal approach, the rating scale approach, or the memory approach. The evaluator can give feedback on a student's performance in the form of a narrative report or a grade. There are different ways to record the results of performance-based assessments.
1. Checklist Approach. Checklists are observation instruments that divide a performance into elements that are either present or absent. The teacher has to indicate only whether or not certain elements are present in the performance.
2. Narrative/Anecdotal Approach is a continuous description of student behavior as it occurs, recorded without judgment or interpretation. The teacher writes narrative reports of what was done during each performance. From these reports, teachers can determine how well their students met their standards.
3. Rating Scale Approach is a checklist that allows the evaluator to record information on a scale, noting finer distinctions than the mere presence or absence of a behavior. The teacher then indicates to what degree the standards were met. Usually, teachers use a numerical scale. For instance, a teacher may rate each criterion on a scale of one to five, with one meaning "skill barely present" and five meaning "skill extremely well executed."
4. Memory Approach. The teacher observes the students performing the tasks without taking any notes, and uses the information from memory to determine whether or not the students were successful. This approach is not recommended for assessing the performance of the students.
PORTFOLIO ASSESSMENT
Portfolio assessment is the systematic, longitudinal collection of student work created in response to specific, known instructional objectives and evaluated in relation to the same criteria. A student portfolio is a purposeful collection of student work that exhibits the student's efforts, progress and achievements in one or more areas. The collection must include student participation in selecting contents, the criteria for selection, the criteria for judging merit, and evidence of student self-reflection.
Comparison of Portfolio and Traditional Forms of Assessment
Traditional Assessment vs. Portfolio Assessment:
 Measures student's ability at one time vs. measures student's ability over time
 Done by the teacher alone, with students unaware of the criteria vs. done by the teacher and the students, with the students aware of the criteria
 Conducted outside instruction vs. embedded in instruction
 Assigns the student a grade vs. involves the student in his or her own assessment
 Does not capture the student's language ability vs. captures many facets of language learning performance
 Does not include the teacher's knowledge of the student as a learner vs. allows for expression of the teacher's knowledge of the student as a learner
 Does not give the student responsibility vs. the student learns how to take responsibility
THREE TYPES OF PORTFOLIO
There are three basic types of portfolio to consider for classroom use: the working portfolio, the showcase portfolio, and the progress portfolio.
1. Working Portfolio
The first type is the working portfolio, also known as the "teacher-student portfolio." As the name implies, it is a project "in the works": it contains work in progress as well as finished samples of work used for reflection by the students and teachers. It documents the stages of learning and provides a progressive record of student growth. It is an interactive teacher-student portfolio that aids in communication between teacher and student.
The working portfolio may be used to diagnose student needs. In it, both student and teacher have evidence of student strengths and weaknesses in achieving learning objectives, information extremely useful in designing future instruction.
2. Showcase Portfolio
The showcase portfolio, also known as the best-works or display portfolio, focuses on the student's best and most representative work. It exhibits the best performance of the student. A best-works portfolio may document student activities beyond school, for example a story written at home. It is just like an artist's portfolio, where a variety of work is selected to reflect breadth of talent; painters exhibit their best paintings. Hence, in this portfolio the student selects what he or she thinks is representative work. This folder is most often seen at open houses and parent visitations.
The most rewarding use of student portfolios is the display of the students' best work, the work that makes them proud. It encourages self-assessment and builds students' self-esteem. The pride and sense of accomplishment that students feel make the effort well worthwhile and contribute to a culture for learning in the classroom.
3. Progress Portfolio
The third type is the progress portfolio, also known as the Teacher Alternative Assessment Portfolio. It contains examples of the same types of student work done over a period of time, which are used to assess the student's progress.
All the works of the students in this type of portfolio are scored, rated, ranked, or evaluated. Teachers can keep individual student portfolios that are solely for the teacher's use as an assessment tool. This is a focused type of portfolio and a model approach to assessment. Assessment portfolios are used to document student learning on specific curriculum outcomes and to demonstrate the extent of mastery in any curricular area.
Uses of Portfolios
1. They can provide both formative and summative opportunities for monitoring progress toward reaching identified outcomes.
2. Portfolios can communicate concrete information about what is expected of students in terms of the content and quality of performance in specific curriculum areas.
3. Portfolios allow students to document aspects of their learning that do not show up well in traditional assessments.
4. Portfolios are useful for showcasing periodic or end-of-the-year accomplishments of students, such as in poetry, reflections on growth, samples of best works, etc.
5. Portfolios may also be used to facilitate communication between teachers and parents regarding their child's achievement and progress over a certain period of time.
6. Administrators may use portfolios for national competency testing, to grant high school credit, or to evaluate education programs.
7. Portfolios may be assembled for a combination of purposes, such as instructional enhancement and progress documentation. A teacher reviews students' portfolios periodically and makes notes for revising instruction for next year's use.
According to Mueller (2010), there are seven steps in developing student portfolios:
1. Purpose: What is the purpose of the portfolio?
2. Audience: For what audience will the portfolio be created?
3. Content: What samples of student work will be included?
4. Process: What processes (e.g., selection of work to be included, reflection on work, conferencing) will be engaged in during the development of the portfolio?
5. Management: How will time and materials be managed in the development of the portfolio?
6. Communication: How and when will the portfolio be shared with pertinent audiences?
7. Evaluation: If the portfolio is to be used for evaluation, when and how should it be evaluated?
Guidelines for Assessing Portfolios
1. Include enough documents (items) on which to base judgment
2. Structure the contents to provide scorable information
3. Develop judging criteria and a scoring scheme for raters to use in assessing the portfolios
4. Use observation instruments such as checklists and rating scales when possible to facilitate scoring
5. Use trained evaluators or assessors
GUIDANCE AND COUNSELING
Guidance and counseling are both processes for solving problems of life; they differ only in the approach used. In guidance, the client's problems are listened to carefully and ready-made solutions are provided by the expert. In counseling, the client's problems are discussed and relevant information is provided in between. Through this information, the client gains insight into the problem and becomes empowered to make his own decision.
The guidance counselor assists each student to benefit from the school experience through attention to their personal, social and academic needs.
Guidance (Downing), as pointed out by Lao (2006), is an organized set of specialized services established as an integral part of the school environment, designed to promote the development of students and assist them toward a realization of sound, wholesome adjustment and maximum accomplishment commensurate with their potentialities.
Guidance (Good) is a process of dynamic interpersonal relationship designed to influence the attitude and subsequent behavior of the person.
Counseling is both a process and a relationship. It is a process by which concentrated attention is given by both counselor and counselee to the problems and concerns of the student in a setting of privacy, warmth, mutual acceptance and confidentiality. As a process it utilizes appropriate tools and procedures which contribute to the experience. Counseling is also a relationship characterized by trust, confidence and intimacy, in which the student gains intellectual and emotional stability from which he can resolve difficulties, make plans and realize his greatest self-fulfillment.
Villar (2007) pointed out the different guidance services based on the Rules and Regulations of Republic Act 9258 (Rule 1, Section 3; Manila Standard, 2007) and other services not mentioned in the Rules and Regulations:
1. Individual inventory/analysis
2. Information
3. Counseling
4. Research
5. Placement
6. Referral
7. Follow-up
8. Evaluation
9. Consultation
10. Program development
11. Public relations
Roles of the Guidance Counselor
The five roles of the guidance counselor are discussed by Dr. Imelda V.G. Villar in her book "Implementing a Comprehensive Guidance and Counseling Program in the Philippines" (2007):
1. As Counselor
2. As Coordinator
3. As Consultant
4. As Conductor of Activities
5. As Change Agent
Essential Elements of Counseling Process
1. Anticipating the interview
2. Developing a positive working relationship
3. Exploring feelings and attitudes
4. Reviewing and determining present status
5. Exploring alternatives
6. Reaching a decision
7. Post-counseling contact
Techniques and Methodologies used in the Guidance Process
1. Autobiography
2. Anecdotal record
3. Case study
4. Cumulative record
5. Interview
6. Observation
7. Projective techniques
8. Rating scale
9. Sociometry
Ethical Considerations of the Counselor
1. Counselor's responsibility to the client and to his family
2. Recognize the boundaries of their competence and their own personal and professional limitations
3. Confidentiality
4. Imposition of one's values and philosophy of life on the client is considered unethical
Four Important Functions of Guidance Services
1. Counseling
 Individual counseling
 Small group counseling
 Crisis counseling
 Career counseling
 Referrals
 Peer helping programs
2. Prevention
 Primary, secondary, tertiary plans and programs
 Individual assessments coordinated student support team activities
 Students activities
 Transitional planning
REFERENCES
Acero, V.O. et al. (2000). Principles and strategies of learning. Manila: Rex Book Store.
Calderon, J.F. and Expectation, C.G. (1993). Measurement and evaluation. Manila: Solares Printing Press.
Clamorin, L.P. (1984). Educational measurement and evaluation. Metro Manila: National Book Store, Inc.
Garcia, C.D. (2004). Educational measurement and evaluation. Mandaluyong City: Books Atbp. Publishing Corp.
Hopkins, C.D. et al. (1990). Classroom measurement and evaluation. Illinois: F.E. Peacock Publishers, Inc.
Mercado-Del Rosario, A.C. (2001). Educational measurement and evaluation. Manila: JNPM Design and Printing.
McMillan, J.H. (2004). Classroom assessment: Principles and practice for effective instruction. Boston: Pearson Education, Inc.
Navarro, R.L. et al. (2013). Authentic assessment of student learning outcomes: Assessment of learning 2. Quezon City, Metro Manila, Philippines: LORIMAR Publishing, Inc.
Seng, T.O. et al. (2003). Educational psychology: A practitioner-researcher approach. USA: Thomson Asia Pub. Ltd.
Tilestone, D.W. (2004). Student assessment. California: Corwin Press.
Internet sites:
http://jonathan.mueller.faculty.noctrl.edu
http://www1.udel.edu/educ/gottfredson/451/unit11-chap15.htm
https://www.google.com/search?q=uses+of+marks+and+grades+in+assessment&oq=USES+OF+MARKS+AND+GRADES&aqs=chrome.