
Assessment of Learning

I. Basic Concept
1. Test – an instrument designed to measure any characteristic, quality, ability, or skill.
2. Measurement – the process of quantifying the degree to which someone/something possesses
a given trait.
3. Assessment – the process of gathering and organizing quantitative or qualitative data into an
interpretable form to have a basis for judgment or decision-making. It is a prerequisite of evaluation.
4. Evaluation – the process of systematic interpretation, analysis, appraisal, or judgment of the worth
of organized data as a basis for decision-making.

Feedback is an essential component of the assessment cycle.
II. Purpose of Assessment
1. Assessment FOR Learning (done before and during instruction)
 Aptitude – entry skills and knowledge
 Placement – prior to instruction, sectioning; basis in planning for a relevant instruction
 Formative – during instruction, monitors students’ attainment of the learning objective
 Diagnostic – before/during instruction, to determine students' recurring or persistent
difficulties; aids in the formulation of a plan for remedial instruction
2. Assessment OF Learning (done after instruction)
 summative assessment, term/ unit exams, chapter exercises, and the like
3. Assessment AS Learning
 Self-Assessment – metacognition, introspection
III. Modes of Assessment
1. Traditional (paper and pencil/pen)
2. Alternative
 Performance
a. Process-oriented (demonstration)
b. Product-oriented (creation)
 Portfolio (collection/compilation of students' works, artifacts, and evidence)
3. Authentic (simulate real-life)
IV. Do's and Don'ts in Test Construction
A. General Guidelines
i. Avoid ambiguous wording.
ii. Use appropriate vocabulary words.
iii. Keep questions short and to the point.
iv. Write items that have one correct answer.
v. Do not provide clues to the answer.
B. Specific Guidelines
I. for supply type
 answers should be brief and specific;
 do not lift statements directly from textbooks;
 if possible, use direct questions rather than incomplete statements;
 if the answer is in numerical units, indicate the type of answer wanted;
 blanks should be at the end of the question; and
 do not over-mutilate statements (too many blanks make an item unanswerable).
II. for Alternate-response/True-False
 avoid broad or trivial statements, negatives and/or double negatives, specific
determiners/absolute terms, and long, complex sentences when
crafting statements;
 attribute statements of opinion to a source;
 keep the number and length of true and false statements approximately equal; and
 avoid identifiable patterns for answers.
III. for Matching type
 ensure that all materials are homogeneous;
 there should be an adequate number of distracters;
 place descriptions in the left column and options in the right column;
 options should be in logical order/arrangement;
 the basis for matching the responses and premises must be indicated;
 all items must be on the same page; and
 observe the 10–15 item limit.
IV. for Multiple Choice
 the stem must be self-sufficient and free of irrelevant material;
 refrain from using negative words or double negatives; if they cannot be
avoided, stress/highlight the negative word for emphasis;
 all alternatives must be grammatically consistent with the stem of the
item;
 items should be objective;
 distracters should be plausible/attractive;
 avoid verbal associations between the stem and the correct answer;
 alternatives should be arranged logically;
 special alternatives (e.g., "all/none of the above") should be used sparingly;
 the stem and alternatives should be on the same page; and
 alternatives should be of approximately equal length (so length does not clue the answer).
V. for Essay Type
 restrict questions to objectives that cannot be satisfactorily measured by
objective items;
 questions should call forth the skills specified in the learning standards;
 phrase each question so that the student's task is clearly defined or
indicated;
 avoid optional questions;
 the time limit and points for each item must be indicated; and
 prepare an outline of the expected answer or a scoring rubric in advance.
V. Types of Portfolio
1. Developmental/ Progress – improvements
2. Showcase – best works
3. Documentary/ Work – day-to-day
4. Evaluation/ Assessment
PRINCIPLES OF PORTFOLIO ASSESSMENT
1. Content (reflects the relevant subject matter)
2. Learning (students become active and thoughtful learners)
3. Equity (students demonstrate their learning styles and multiple intelligences)
Steps in Portfolio Development
 Set goals
 Collect evidence
 Select evidence
 Organize
 Reflect
 Rate/Evaluate
 Confer/Exhibit
VI. Rubrics
- are instruments used in rating performance-based and even portfolio-based
tasks.
- are modified:
1. Checklist – presents the characteristics of a desirable performance or
product; the WHAT.
2. Rating scale – measures the extent or degree to which a trait has been
satisfied by one’s work/performance; at least 3 levels of description; the TO
WHAT DEGREE.
- are developmental
- are indispensable in authentic, portfolio, self, as well as performance-based
assessment.
1. Types
 Holistic (overall quality)
- Fast assessment
- One score for the overall performance
- Indicates general strengths and weaknesses of the performance
- Does not clearly describe the degree to which each criterion is satisfied
- Does not permit differential weighting of the requirements
 Analytic (each dimension)
- Clearly describes the degree to which each criterion is satisfied
- Permits differential weighting of qualities of the performance and product
- Helps the rater pinpoint specific areas of strength and weakness
- More time consuming to use
- More difficult to construct
2. Examples
 Likert Scale
 Checklist
 Rating Scale
 Ranking
3. Scoring Biases and Errors
 ERRORS – mistakes committed when scoring/rating
 Leniency error – judging better than it is
 Generosity error – tendency of using the high end of the scale
 Severity error – tendency of using the low end of the scale
 Central Tendency error – tendency to avoid both extremes of the scale
 BIAS – letting another factor influence the score
 Halo effect – letting a general impression of the student influence the rating of specific
criteria.
 Contamination effect – influenced by irrelevant knowledge about the student or
other factors independent of what is being assessed
 Similar-to-me effect – judging more favorably those students whom the rater sees as
similar to themselves.
 First impression effect – judgment is based on early opinions rather than a
complete picture
 Contrast effect – judging by comparing students against other students instead of
established criteria/standards
 Rater drift – unintentionally redefining criteria and standards over time or across
a series of scorings
VII. Characteristics of a good test
1. Clarity and Appropriateness of learning targets (SMART/ABCDs)
2. Appropriateness of methods (The type of test used should always match the
instructional objectives or learning outcomes of the subject matter during the delivery of
the instruction)
3. Fairness (persons; provide all students the opportunity to demonstrate achievement)
4. Balance (things; set the targets in all domains of learning/ intelligences, and even the
modes of assessment)
5. Validity (the degree to which a test measures what it intends to measure)
 Face – physical appearance
 Content – objectives, curriculum, lesson plans
 Criterion – correlating scores to external predictor or measure.
a. Concurrent – present
b. Predictive – future
 Construct – psychological factors that theoretically influence scores in a test
a. Convergent – scores correlate with established instruments that measure a similar
trait (e.g., a Creativity Test and a Critical Thinking Test should be related)
b. Divergent – scores do not correlate with instruments that measure a different trait
(e.g., a Critical Thinking Test and a Reading Comprehension Test = no relation)
FACTORS AFFECTING VALIDITY
1. Appropriateness of Test
2. Directions
3. Reading vocabulary and sentence structure too difficult
4. Ambiguity
5. Inadequate time limits
6. Test Construction
7. Test Length
8. Arrangement of Items
9. Patterns of Answers
6. Reliability (refers to the consistency of measures/scores obtained by the same person when
retested using the same instrument or its parallel form, or when compared with other students
who took the same test)
 Test-Retest (measure of stability)
 Parallel Test/ Forms (measure of equivalence)
 Split Half (measure of internal consistency)
 Kuder-Richardson (measure of internal consistency)
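To make the internal-consistency idea concrete, here is a minimal Python sketch (not part of the original reviewer) of the Kuder-Richardson formula 20 for dichotomously scored (0/1) items; the function name and the sample score matrix are hypothetical:

# KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
def kr20(scores):
    n = len(scores)               # number of examinees
    k = len(scores[0])            # number of items
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n    # population variance of totals
    pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in scores) / n      # proportion who got the item right
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var)

# Hypothetical 4 students x 3 items (1 = correct, 0 = wrong)
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))   # 0.75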
IMPROVING TEST RELIABILITY
1. Test length
2. Spread of scores
3. Item difficulty
4. Item discrimination
5. Time limits
7. Practicality and Efficiency
o Efficiency – obtaining results with the lowest resources possible
- It should be worth the resources and time required to obtain it.
Factors to consider:
 Teacher familiarity with the method (strength/weaknesses of the method and how to
use them)
 Complexity of Administration (directions and procedures for administration are
clear, and little time and effort is needed)
 Ease of Scoring (the easier the procedure, the more reliable the assessment is)
 Ease of Interpretation (the plans how to use the results prior to assessment)
 Cost (the less expense, the better)
8. Continuity – takes place in all phases of instruction
9. Authenticity
Criteria of Authentic Achievement
 Disciplined inquiry (in-depth understanding of the problem; a move beyond
knowledge)
 Integration of Knowledge (a whole rather than fragments of knowledge)
 Value Beyond Evaluation (value beyond the classroom)
10. Communication (process)
11. Positive consequences (motivates the learner to learn; helps the teacher improve the
effectiveness of instruction)
12. Ethics (free from harmful consequences of misuse or overuse of various assessment
procedures; good or bad; respect)
o Morality – right or wrong
VIII. Types of Test
NATURE OF ANSWER:
1. Personality Test - social adjustment and emotions
2. Intelligence Test - mental ability, e.g., IQ test
3. Aptitude Test - potential for success; entry skills, e.g., entrance exam
4. Achievement Test - mastery of a skill
5. Summative Test - end of instruction
6. Diagnostic Test - strengths and weaknesses
7. Formative Test - improve teaching and learning
8. Sociometric Test - likes and dislikes / social acceptance
9. Trade Test - skills in an occupation or vocation
10. Placement Test - assigns students to classes/programs appropriate to their
level.
MODE OF RESPONSE
1. Oral test - students answer orally
2. Written test - students' answers are written
3. Performance test - demonstration of knowledge and skill
EASE OF QUANTIFICATION/RESPONSE (BIASES)
1. Objective test
- convergent or specific response
- non-biased
- prone to guessing
2. Subjective test
- divergent response
- biased
- wide sampling of ideas and content
- prone to bluffing
MODE OF ADMINISTRATION
1. Individual test
- one student at a time
- usually requires oral response
2. Group test
- group of students
TEST CONSTRUCTION
1. Standardized test
- prepared by experts
- machine checked
2. Unstandardized test
- prepared by a classroom teacher
DIFFICULTY
1. Power - items arranged from easiest to most difficult; no time pressure
2. Speed - with time limit
MODE OF INTERPRETING RESULT
Criterion-referenced testing
- mastery
- compares students against a set standard, criterion, or specific skill
Norm-referenced testing
- compares results with those of classmates or batchmates
- ranking
- with respect to the achievement of others
IX. Phases of Making a Test
A. PLANNING
- objectives
- TOS (Table of Specifications)
- decide on format
 selective
 supply
 essay
B. ITEM WRITING (write the item based on TOS)
C. TRY OUT
1. First trial run (50–100 students)
- item analysis
- options analysis
- rewrite the items
2. Second trial run (50–100 students)
- item analysis
- options analysis
- rewrite the items
D. EVALUATION
- administer the exam
- test validity and reliability
X. Item Analysis
DISCRIMINATION INDEX - distinguishes the higher-scoring group from the lower-scoring group

INDEX            INTERPRETATION     DECISION
0.20 and below   Poor               Reject
0.21 – 1.00      Moderate – High    Retain

1. Positive Discrimination - more from the higher group got the item correctly.
2. Negative Discrimination - more from the lower group got the item correctly.
3. Zero Discrimination - cannot discriminate; either all are correct or all are wrong.
DIFFICULTY INDEX - the ease of an item, i.e., the proportion of examinees who answered it
correctly (the higher the index, the easier the item)
Note: If the discrimination index is between 0.21 and 1, use this table:

INDEX         INTERPRETATION    DECISION
0.81 – 1.00   Very Easy         Reject
0.61 – 0.80   Easy              Revise
0.41 – 0.60   Moderate          Retain
0.21 – 0.40   Difficult         Revise
0 – 0.20      Very Difficult    Reject

If the discrimination index is 0.20 and below, refer to this table:

INDEX         INTERPRETATION    DECISION
0.81 – 1.00   Very Easy         Reject
0.61 – 0.80   Easy              Reject
0.41 – 0.60   Moderate          Revise
0.21 – 0.40   Difficult         Reject
0 – 0.20      Very Difficult    Reject
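To make the computation concrete, here is a minimal Python sketch (not part of the original reviewer) of how the two indices are commonly obtained from equal-sized upper and lower scoring groups (often the top and bottom 27% of examinees); the function name and the 0/1 score lists are hypothetical:

def item_indices(upper, lower):
    n = len(upper)                        # examinees per group (equal-sized groups)
    p_upper = sum(upper) / n              # proportion correct in the higher group
    p_lower = sum(lower) / n              # proportion correct in the lower group
    difficulty = (p_upper + p_lower) / 2  # higher value = easier item
    discrimination = p_upper - p_lower    # positive = favors the higher group
    return difficulty, discrimination

# 8 of 10 upper-group and 3 of 10 lower-group students answered correctly:
# difficulty 0.55 (Moderate), discrimination 0.50 (Retain)
print(item_indices([1] * 8 + [0] * 2, [1] * 3 + [0] * 7))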
XI. Affective Assessment
- “Affective assessment is a measurement of a student’s attitudes, interests, and/or values”
(Popham, 2013)
1. Attitudes are defined as a mental predisposition to act that is expressed by evaluating
a particular entity with some degree of favor or disfavor.
2. Motivation is the process that initiates, guides, and maintains goal-oriented behaviors.
3. Self-esteem relates to a person’s sense of self- worth.
4. Self-efficacy relates to a person's perception of their ability to reach a goal.
AFFECTIVE ASSESSMENT TOOLS
1. Self-Report - essentially requires individuals to provide an account of their attitudes
or feelings toward a concept, idea, or people. It is also called "written reflections".
2. Rating Scale - It consists of close-ended questions along with a set of categories as
options for respondents.
3. Semantic Differential Scale - it tries to assess an individual’s reaction to specific words,
ideas or concepts in terms of ratings on bipolar scales defined with contrasting adjectives
at each end.
4. Likert Scale - this requires an individual to tick on a box to report whether they “strongly
agree” “agree” “undecided”, “disagree” or “strongly disagree” in response to a large
number of items concerning attitude object or stimulus.
5. Two-Point Scale - the respondent must choose between two options: yes to agree or
no to disagree.
6. Checklist - the least complex form of scoring that examines the presence or absence
of specific elements in the product of a performance.
XII. Statistics
DESCRIPTIVE STATISTICS
A. Measure of Central Tendency
1. Mean - average
- most reliable measure of central tendency
2. Median - central-most (middle-most) score
- most appropriate measure of central tendency IF THERE ARE EXTREME SCORES
3. Mode - most frequent
 Unimodal (1 mode)
 Bimodal (2 modes)
 Multimodal (3 or more modes)
Note: in PRC ProfEd trimodal (3 modes) is accepted
 No mode (none)
B. Measures of Variability
1. Range - simplest measure of variability
- highest score minus the lowest score
2. Standard deviation - spread of the scores with respect to the mean
- most reliable measure of variability
- describes how far the data are from the mean
 Heterogeneous - high SD, scores are scattered, spread out.
 Homogeneous - low SD, scores are clustered, bunched together.
3. Variance - square of standard deviation.
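A minimal Python sketch (not part of the original reviewer) computing the measures above with the built-in statistics module; the score list is hypothetical:

import statistics

scores = [70, 75, 75, 80, 85, 90, 95]

print(statistics.mean(scores))        # mean: the average
print(statistics.median(scores))      # median: the middle-most score
print(statistics.mode(scores))        # mode: the most frequent score (75 -> unimodal)
print(max(scores) - min(scores))      # range: highest minus lowest
print(statistics.pstdev(scores))      # standard deviation (population)
print(statistics.pvariance(scores))   # variance: the square of the SD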
C. Measures of Relative Position
1. Percentile - divides the distribution into 100 equal parts; a percentile is the point
below which a given percentage of scores falls
2. Decile - 10 equal parts
3. Quartile - 4 equal parts
4. Stanine - 9 equal parts
Categories in stanine:
S1-S3 : Below Average
S4-S6 : Average
S7-S9 : Above Average
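One common way to assign a stanine is to rescale the z-score (stanine = 2z + 5, rounded and clamped to 1–9), which reproduces the categories above; a minimal sketch, not from the source, with hypothetical values:

def stanine(score, mean, sd):
    z = (score - mean) / sd                  # standardize the raw score
    return min(9, max(1, round(2 * z + 5)))  # clamp to the 1-9 scale

# A score one SD above the mean falls in S7 (Above Average)
print(stanine(85, 75, 10))   # 7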
D. Measures of Shapes
1. Kurtosis - the shape of the peak of a distribution of data
- in a normal distribution, the measures of central tendency are equal
1.1 Leptokurtic distribution - all or almost all scores are average; a tall, narrow peak.
1.2 Mesokurtic distribution - most got an average score and a few got high and low
scores. It is also called the normal curve or bell-shaped curve.
1.3 Platykurtic distribution - scores are scattered or spread out; a flat curve.
2. Skewness - the degree of asymmetry of a distribution; most of the scores fall either
above or below the mean.
2.1 Positively Skewed
- skewed to the right (the tail points to the right)
- most of the students got low scores
2.2 Negatively Skewed
- skewed to the left (the tail points to the left)
- most of the students got high scores
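A minimal sketch (not from the source), assuming SciPy is available; the sign of the skewness statistic matches the descriptions above, and the score lists are hypothetical:

from scipy.stats import skew

mostly_low = [60, 62, 63, 65, 66, 70, 95]    # most scores low -> positive skew (tail to the right)
mostly_high = [60, 88, 90, 92, 93, 95, 96]   # most scores high -> negative skew (tail to the left)

print(skew(mostly_low))    # positive value
print(skew(mostly_high))   # negative value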
INFERENTIAL STATISTICS
A. Levels of Data Measurement
1. Nominal - used for classifying data; categorical
- qualitative (name)
2. Ordinal - ordered relationship among the variables
- quantitative (order)
3. Interval - classifies and orders the measurement.
- specifies distances between each interval.
-quantitative
4. Ratio - the same as interval, but ratio has absolute zero.
- quantitative
B. Test of Relationship / Correlation
1. Pearson's r test - test of relationship between 2 variables (interval/ratio data).
Positive correlation
- directly proportionate
- same direction
Negative correlation
- inversely proportionate
- opposite direction
No correlation
- no relationship between variables that are being compared.
2. T-test - test of the difference between two groups.
3. ANOVA (Analysis of Variance)
- test of difference between 3 or more groups.
4. Chi-squared test - test of association that requires nominal data.
5. Spearman Rho - comparing 2 ordinal measurements.
- order and ranking comparison.
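A minimal sketch (not from the source) mapping each test above to a SciPy call; SciPy availability and all data values are assumptions for illustration:

from scipy import stats

x = [1, 2, 3, 4, 5]   # hypothetical scores, group/variable 1
y = [2, 4, 5, 4, 5]   # hypothetical scores, group/variable 2
z = [3, 3, 4, 5, 6]   # hypothetical scores, group 3

r, p = stats.pearsonr(x, y)     # Pearson's r: relationship between 2 variables
rho, p = stats.spearmanr(x, y)  # Spearman rho: 2 ordinal/ranked measurements
t, p = stats.ttest_ind(x, y)    # t-test: difference between 2 groups
f, p = stats.f_oneway(x, y, z)  # ANOVA: difference among 3 or more groups
chi2, p, dof, exp = stats.chi2_contingency([[10, 20], [20, 10]])  # chi-squared: association, nominal data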
XIII. K-12 Grading System (RA 10533)
DepEd Order No. 8, series of 2015
KINDERGARTEN
- for Kindergarten, checklists and anecdotal records are used instead of numerical
grades. It is important for teachers to keep a portfolio, which is a record or compilation of the
learner's output, such as writing samples, accomplished activity sheets, and artwork.
GRADE 1 – GRADE 10
For MAPEH, individual grades are given to each area, namely, Music, Arts, Physical Education,
and Health. The quarterly grade for MAPEH is the average of the quarterly grades in the four
areas.
GRADE 11 - GRADE 12
NOTE: The final grade for each Grade 11 and 12 subject is computed by getting the average of the
two quarterly grades in the semester.
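A minimal Python sketch (not part of the original reviewer) of the two averaging rules stated above; the function names and grades are hypothetical:

def mapeh_quarterly(music, arts, pe, health):
    # Quarterly MAPEH grade: average of the four area grades
    return (music + arts + pe + health) / 4

def shs_final_grade(q1, q2):
    # Grade 11-12 final grade per subject: average of the two quarterly grades
    return (q1 + q2) / 2

print(mapeh_quarterly(88, 90, 85, 89))   # 88.0
print(shs_final_grade(86, 90))           # 88.0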
XIV. Feedback
ASSESSMENT FOR KINDERGARTEN
There are no numerical grades in Kindergarten. Descriptions of the learners' progress in the
various learning areas are represented using checklists and student portfolios. These are
presented to the parents at the end of each quarter for discussion.
GRADE DESCRIPTORS
90 – 100: Outstanding (Passed)
85 – 89: Very Satisfactory (Passed)
80 – 84: Satisfactory (Passed)
75 – 79: Fairly Satisfactory (Passed)
Below 75: Did Not Meet Expectations (Failed)
PROMOTION AND RETENTION
Grades 1 - 10
 Passed all subjects = promoted
 Failed 1 or 2 subjects = remedial
 Passed the remedial = either promoted or retained
 Failed the remedial = retained
 Failed 3 or more subjects = retained
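A minimal sketch (not from the source) of the Grades 1–10 promotion rules listed above; the function name and arguments are hypothetical:

def promotion_status(failed_subjects, passed_remedial=False):
    # Pass all -> promoted; fail 1-2 -> remedial classes, then promoted only
    # if the remedial is passed; fail 3 or more -> retained
    if failed_subjects == 0:
        return "promoted"
    if failed_subjects <= 2:
        return "promoted" if passed_remedial else "retained"
    return "retained"

print(promotion_status(0))                        # promoted
print(promotion_status(2, passed_remedial=True))  # promoted
print(promotion_status(3))                        # retained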
Grades 11 - 12