Assessment of Learning

I. Basic Concepts
1. Test – an instrument designed to measure any characteristic, quality, ability, or skill.
2. Measurement – the process of quantifying the degree to which someone or something possesses a given trait.
3. Assessment – the process of gathering and organizing quantitative or qualitative data into an interpretable form to serve as a basis for judgment or decision-making; a prerequisite of evaluation.
4. Evaluation – the process of systematically interpreting, analyzing, appraising, or judging the worth of organized data as a basis for decision-making.
Feedback is an essential component of the assessment cycle.

II. Purpose of Assessment
1. Assessment FOR Learning (done before and during instruction)
   - Aptitude – entry skills and knowledge
   - Placement – prior to instruction; used for sectioning and as a basis for planning relevant instruction
   - Formative – during instruction; monitors students' attainment of the learning objectives
   - Diagnostic – before/during instruction; determines students' recurring or persistent difficulties and aids in planning remedial instruction
2. Assessment OF Learning (done after instruction)
   - Summative assessment, term/unit exams, chapter exercises, and the like
3. Assessment AS Learning
   - Self-assessment – metacognition, introspection

III. Modes of Assessment
1. Traditional (paper-and-pen/pencil tests)
2. Alternative
   a. Performance – process-oriented (demonstration) or product-oriented (creation)
   b. Portfolio – collection/compilation of students' works, artifacts, and evidence
3. Authentic (simulates real-life tasks)

IV. Do's and Don'ts in Test Construction
A. General Guidelines
   i. Avoid ambiguous wording.
   ii. Use vocabulary appropriate to the students.
   iii. Keep questions short and to the point.
   iv. Write items that have one correct answer.
   v. Do not provide clues to the answer.
B. Specific Guidelines
   I. Supply type
      - Answers should be brief and specific.
      - Do not take statements directly from textbooks.
      - If possible, use a direct question rather than an incomplete statement.
      - If the answer is in numerical units, indicate the type of answer (unit) wanted.
      - Place blanks at the end of the question.
      - Do not over-mutilate statements (avoid too many blanks).
   II. Alternate-response / True-False
      - Avoid broad or trivial statements, negatives and double negatives, specific determiners/absolute terms, and long, complex sentences.
      - Always attribute opinionated statements to a source.
      - Keep the number of true and false statements, and their lengths, roughly equal.
      - Avoid identifiable answer patterns.
   III. Matching type
      - Ensure that all materials are homogeneous.
      - Provide an adequate number of extra options (distracters).
      - Place descriptions (premises) in the left column and options (responses) in the right column.
      - Arrange the options in a logical order.
      - Indicate the basis for matching the responses and premises.
      - Keep all items on the same page.
      - Observe the 10–15 item limit.
   IV. Multiple choice
      - The stem must be self-sufficient and free of irrelevant material.
      - Refrain from using negative words or double negatives; if they cannot be avoided, stress/highlight the negative words for emphasis.
      - All alternatives must be grammatically consistent with the stem of the item.
      - Items should be objective (one clearly correct or best answer).
      - Distracters should be plausible/attractive.
      - Avoid verbal associations between the stem and the correct answer.
      - Arrange the alternatives logically.
      - Use special alternatives (e.g., "none of the above," "all of the above") sparingly.
      - Keep the stem and alternatives on the same page.
      - Alternatives should be of approximately equal length.
   V. Essay type
      - Restrict questions to outcomes that cannot be satisfactorily measured by objective items.
      - Questions should call forth the skills specified in the learning standards.
      - Phrase each question so that the student's task is clearly defined.
      - Avoid optional questions.
      - Indicate the time limit and the points for each item.
      - Prepare an outline of the expected answer or a scoring rubric in advance.

V. Types of Portfolio
1. Developmental/Progress – shows improvement over time
2. Showcase – best works
3. Documentary/Work – day-to-day outputs
4. Evaluation/Assessment

PRINCIPLES OF PORTFOLIO ASSESSMENT
1. Content (reflects the relevant subject matter)
2. Learning (students become active and thoughtful learners)
3. Equity (students demonstrate their learning styles and multiple intelligences)

Steps in Portfolio Development
1. Set goals
2. Collect evidence
3. Select evidence
4. Organize
5. Reflect
6. Rate/Evaluate
7. Confer/Exhibit

VI. Rubrics
- Instruments used in rating performance-based and even portfolio-based tasks.
- Modified forms of:
  1. Checklist – presents the characteristics of a desirable performance or product; the WHAT.
  2. Rating scale – measures the extent or degree to which a trait has been satisfied by one's work/performance; at least 3 levels of description; the TO WHAT DEGREE.
- Developmental.
- Indispensable in authentic, portfolio, self-, and performance-based assessment.

1. Types
   Holistic (overall quality)
   - Fast to use
   - Gives one score for the overall performance
   - Indicates the general strengths and weaknesses of the performance
   - Does not clearly describe the degree to which each criterion is satisfied
   - Does not permit differential weighting of the requirements
   Analytic (each dimension scored separately)
   - Clearly describes the degree to which each criterion is satisfied
   - Permits differential weighting of the qualities of the performance or product
   - Helps the rater pinpoint specific areas of strength and weakness
   - More time-consuming to use
   - More difficult to construct

2. Examples
   Likert scale, checklist, rating scale, ranking

3. Scoring Biases and Errors
   ERRORS – mistakes committed when scoring/rating
   - Leniency error – judging a performance as better than it is
   - Generosity error – tendency to use the high end of the scale
   - Severity error – tendency to use the low end of the scale
   - Central tendency error – tendency to avoid both extremes of the scale
   BIAS – letting another factor influence the score
   - Halo effect – letting a general impression of the student influence the rating of specific criteria
   - Contamination effect – being influenced by irrelevant knowledge about the student or by other factors independent of what is being assessed
   - Similar-to-me effect – judging more favorably students whom the rater sees as similar to themselves
   - First impression effect – basing judgment on early opinions rather than on a complete picture
   - Contrast effect – judging students by comparing them against other students instead of against established criteria/standards
   - Rater drift – unintentionally redefining the criteria and standards over time or across a series of scorings

VII. Characteristics of a Good Test
1. Clarity and appropriateness of learning targets (SMART/ABCDs)
2. Appropriateness of methods (the type of test used should match the instructional objectives or learning outcomes of the subject matter)
3. Fairness (persons; gives all students the opportunity to demonstrate their achievement)
4. Balance (things; sets targets in all domains of learning/intelligences and uses varied modes of assessment)
5. Validity (the degree to which the test measures what it intends to measure)
   - Face – physical appearance of the test
   - Content – coverage of the objectives, curriculum, lesson plans
   - Criterion – correlating scores with an external predictor or measure
     a. Concurrent – present criterion
     b. Predictive – future criterion
   - Construct – psychological factors that theoretically influence scores in a test
     a. Convergent – scores correlate with an established instrument that measures a similar or related trait (e.g., a creativity test and a critical-thinking test)
     b. Divergent – scores show no relation to instruments that measure different traits (e.g., a critical-thinking test and a reading-comprehension test)

   FACTORS AFFECTING VALIDITY
   1. Appropriateness of the test
   2. Directions
   3. Reading vocabulary and sentence structure that are too difficult
   4. Ambiguity
   5. Inadequate time limits
   6. Test construction
   7. Test length
   8. Arrangement of items
   9. Patterns of answers

6. Reliability (the consistency of measures/scores obtained by the same person when retested with the same instrument or its parallel form, or when compared with other students who took the same test)
   - Test-retest (measure of stability)
   - Parallel/equivalent forms (measure of equivalence)
   - Split-half (measure of internal consistency)
   - Kuder-Richardson (measure of internal consistency)
   (A computational sketch of the split-half and Kuder-Richardson estimates appears right after this list.)

   IMPROVING TEST RELIABILITY
   1. Test length
   2. Spread of scores
   3. Item difficulty
   4. Item discrimination
   5. Time limits

7. Practicality and efficiency
   - Efficiency – obtaining the information with the fewest resources possible; the assessment should be worth the resources and time required to obtain it.
   Factors to consider:
   - Teacher familiarity with the method (its strengths and weaknesses and how to use it)
   - Complexity of administration (directions and procedures are clear; little time and effort is needed)
   - Ease of scoring (the easier the scoring procedure, the more reliable the assessment)
   - Ease of interpretation (plan how to use the results prior to the assessment)
   - Cost (the less expensive, the better)
8. Continuity – assessment takes place in all phases of instruction
9. Authenticity
   Criteria of authentic achievement:
   - Disciplined inquiry (in-depth understanding of the problem; a move beyond knowledge)
   - Integration of knowledge (knowledge treated as a whole rather than as fragments)
   - Value beyond evaluation (the work has value beyond the classroom)
10. Communication (assessment as a process of communication)
11. Positive consequences (motivates the learner to learn; helps the teacher improve the effectiveness of instruction)
12. Ethics (assessment must be free from the harmful consequences of misuse or overuse of assessment procedures and must respect the learner; ethics – good or bad; morality – right or wrong)
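The split-half and Kuder-Richardson estimates named under Reliability (item 6 above) can be computed directly from item scores. Below is a minimal Python sketch, assuming dichotomously scored items (1 = correct, 0 = wrong); the function names and the sample data are illustrative only and are not part of this reviewer.

```python
# Illustrative sketch: internal-consistency estimates for 0/1 item scores.
# `scores` has one row per student and one column per item (1 = correct, 0 = wrong).
from statistics import mean, pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half(scores):
    """Correlate odd-item and even-item half scores, then apply the
    Spearman-Brown correction to estimate whole-test reliability."""
    odd_half = [sum(s[0::2]) for s in scores]
    even_half = [sum(s[1::2]) for s in scores]
    r = pearson_r(odd_half, even_half)
    return 2 * r / (1 + r)

def kr20(scores):
    """Kuder-Richardson formula 20 for dichotomously scored items."""
    k = len(scores[0])                        # number of items
    totals = [sum(s) for s in scores]         # total score per student
    sum_pq = 0.0
    for i in range(k):
        p = mean(s[i] for s in scores)        # proportion who got item i correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

# Example: 5 students answering a 4-item quiz
students = [[1, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0], [1, 1, 1, 1]]
print("Split-half:", round(split_half(students), 2))
print("KR-20:", round(kr20(students), 2))
```

With real data a dedicated psychometrics routine would normally be used; the sketch is only meant to show what the two "measure of internal consistency" labels refer to in practice.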
VIII. Types of Test
NATURE OF ANSWER
1. Personality test – social adjustment and emotions
2. Intelligence test – mental ability, e.g., IQ test
3. Aptitude test – likelihood of success; used for entry, e.g., entrance exams
4. Achievement test – mastery of a skill
5. Summative test – given at the end of instruction
6. Diagnostic test – strengths and weaknesses
7. Formative test – used to improve teaching and learning
8. Sociometric test – likes and dislikes / social acceptance
9. Trade test – skills in an occupation or vocation
10. Placement test – assigns students to classes/programs appropriate to their level

MODE OF RESPONSE
1. Oral test – students answer orally
2. Written test – students' answers are written
3. Performance test – demonstration of knowledge and skill

EASE OF QUANTIFICATION OF RESPONSE
1. Objective test – convergent or specific response; non-biased; prone to guessing
2. Subjective test – divergent response; biased; wide sampling of ideas and content; prone to bluffing

MODE OF ADMINISTRATION
1. Individual test – one student at a time; usually requires an oral response
2. Group test – administered to a group of students

TEST CONSTRUCTION
1. Standardized test – prepared by experts; machine-checked
2. Unstandardized test – prepared by the classroom teacher

DIFFICULTY
1. Power test – items arranged from easiest to most difficult
2. Speed test – administered with a time limit

MODE OF INTERPRETING RESULTS
1. Criterion-referenced testing – mastery; compares students against a set standard, criterion, or specific skill
2. Norm-referenced testing – compares a student's results with those of classmates or batchmates; ranking with respect to the achievement of others

IX. Phases of Making a Test
A. PLANNING
   - Set the objectives
   - Prepare the table of specifications (TOS)
   - Decide on the format: selective, supply, or essay
B. ITEM WRITING – write the items based on the TOS
C. TRY-OUT
   1. First trial run (50–100 students) – item analysis, options analysis, rewrite the items
   2. Second trial run (50–100 students) – item analysis, options analysis, rewrite the items
D. EVALUATION – administer the exam; establish test validity and reliability

X. Item Analysis
DISCRIMINATION INDEX – how well an item discriminates the higher-scoring group from the lower-scoring group
   Index            Interpretation       Decision
   0.20 and below   Poor                 Reject
   0.21 – 1.00      Moderate to High     Retain
1. Positive discrimination – more students from the higher group got the item correct.
2. Negative discrimination – more students from the lower group got the item correct.
3. Zero discrimination – the item cannot discriminate; either all got it correct or all got it wrong.

DIFFICULTY INDEX – how easy the item is (the proportion answering it correctly)
If the discrimination index is 0.21 – 1.00, use this table:
   Index            Interpretation       Decision
   0.81 – 1.00      Very Easy            Reject
   0.61 – 0.80      Easy                 Revise
   0.41 – 0.60      Moderate             Retain
   0.21 – 0.40      Difficult            Revise
   0.00 – 0.20      Very Difficult       Reject
If the discrimination index is 0.20 or below, refer to this table:
   Index            Interpretation       Decision
   0.81 – 1.00      Very Easy            Reject
   0.61 – 0.80      Easy                 Reject
   0.41 – 0.60      Moderate             Revise
   0.21 – 0.40      Difficult            Reject
   0.00 – 0.20      Very Difficult       Reject
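As a concrete illustration of the two indices above, here is a minimal Python sketch assuming the common upper-group/lower-group procedure (e.g., the top 27% and bottom 27% of scorers form the two groups); the function names and the sample numbers are illustrative, not prescribed by this reviewer.

```python
# Illustrative sketch: item difficulty and discrimination from upper/lower groups.

def difficulty_index(upper_correct, lower_correct, group_size):
    """Proportion of the combined upper and lower groups that got the item right."""
    return (upper_correct + lower_correct) / (2 * group_size)

def discrimination_index(upper_correct, lower_correct, group_size):
    """How much better the upper group performed than the lower group."""
    return (upper_correct - lower_correct) / group_size

# Example: 20 students in the upper group and 20 in the lower group;
# 16 upper-group students and 6 lower-group students answered the item correctly.
p = difficulty_index(16, 6, 20)        # (16 + 6) / 40 = 0.55
d = discrimination_index(16, 6, 20)    # (16 - 6) / 20 = 0.50
print(p, d)
```

Read against the tables above, a difficulty index of 0.55 is Moderate and a discrimination index of 0.50 falls within 0.21 – 1.00, so the item is retained.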
XI. Affective Assessment
"Affective assessment is a measurement of a student's attitudes, interests, and/or values" (Popham, 2013).
1. Attitude – a mental predisposition to act, expressed by evaluating a particular entity with some degree of favor or disfavor.
2. Motivation – the process that initiates, guides, and maintains goal-oriented behaviors.
3. Self-esteem – a person's sense of self-worth.
4. Self-efficacy – a person's perception of their ability to reach a goal.

AFFECTIVE ASSESSMENT TOOLS
1. Self-report – requires an individual to give an account of his or her attitudes or feelings toward a concept, idea, or people; also called "written reflections."
2. Rating scale – consists of closed-ended questions with a set of categories as options for respondents.
3. Semantic differential scale – assesses an individual's reaction to specific words, ideas, or concepts through ratings on bipolar scales defined by contrasting adjectives at each end.
4. Likert scale – requires an individual to tick a box to report whether they "strongly agree," "agree," are "undecided," "disagree," or "strongly disagree" with a large number of items concerning an attitude object or stimulus.
5. Two-point scale – the respondent must choose between two options: yes (agree) or no (disagree).
6. Checklist – the least complex form of scoring; examines the presence or absence of specific elements in the product of a performance.

XII. Statistics
DESCRIPTIVE STATISTICS
(A computational sketch of the descriptive measures below appears after the inferential statistics list at the end of this section.)
A. Measures of Central Tendency
1. Mean – the average; the most reliable measure of central tendency
2. Median – the middle-most score; the most appropriate measure of central tendency when there are extreme scores (outliers)
3. Mode – the most frequent score
   - Unimodal (1 mode), bimodal (2 modes), multimodal (3 or more modes), no mode
   - Note: in the PRC ProfEd exam, trimodal (3 modes) is accepted.

B. Measures of Variability
1. Range – the simplest measure of variability; the highest score minus the lowest score
2. Standard deviation – the spread of the scores with respect to the mean; the most reliable measure of variability; describes how far the data are from the mean
   - Heterogeneous – high SD; scores are scattered/spread out
   - Homogeneous – low SD; scores are clustered/bunched together
3. Variance – the square of the standard deviation

C. Measures of Relative Position
1. Percentile – a value below which a given percentage of scores fall; divides the distribution into 100 equal parts
2. Decile – 10 equal parts
3. Quartile – 4 equal parts
4. Stanine – 9 equal parts
   Categories in stanine: S1–S3 Below Average; S4–S6 Average; S7–S9 Above Average

D. Measures of Shape
1. Kurtosis – the shape of the peak of a distribution
   1.1 Leptokurtic – all or almost all scores are near the average; tall, narrow peak
   1.2 Mesokurtic – most got an average score and a few got high and low scores; also called the normal curve or bell-shaped curve, where the measures of central tendency are equal
   1.3 Platykurtic – scores are scattered or spread out; flat peak
2. Skewness – most of the scores fall either above or below the mean
   2.1 Positively skewed – skewed to the right; most students got low scores
   2.2 Negatively skewed – skewed to the left; most students got high scores

INFERENTIAL STATISTICS
A. Levels of Data Measurement
1. Nominal – used for classifying data; categorical; qualitative (name)
2. Ordinal – ordered relationship among the variables; quantitative (order)
3. Interval – classifies and orders the measurements and specifies the distance between intervals; quantitative
4. Ratio – the same as interval, but with an absolute zero; quantitative

B. Tests of Relationship and Difference
1. Pearson's r – test of the relationship between 2 variables
   - Positive correlation – directly proportional; the variables move in the same direction
   - Negative correlation – inversely proportional; the variables move in opposite directions
   - No correlation – no relationship between the variables being compared
2. t-test – test of the difference between two groups
3. ANOVA (Analysis of Variance) – test of the difference among 3 or more groups
4. Chi-square test – test of association that requires nominal data
5. Spearman rho – compares two ordinal measurements; order and ranking comparison
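As referenced under Descriptive Statistics above, the following minimal Python sketch shows how the measures of central tendency and variability can be computed with the standard statistics module; the sample scores are made up for illustration only.

```python
# Illustrative sketch: descriptive measures for a small set of test scores.
from statistics import mean, median, multimode, pstdev, pvariance

scores = [78, 82, 85, 85, 88, 90, 95]            # made-up class scores

print("Mean:", mean(scores))                      # average
print("Median:", median(scores))                  # middle-most score
print("Mode(s):", multimode(scores))              # most frequent score(s)
print("Range:", max(scores) - min(scores))        # highest minus lowest
print("SD:", round(pstdev(scores), 2))            # spread about the mean
print("Variance:", round(pvariance(scores), 2))   # square of the SD
```

A low standard deviation here would indicate a homogeneous (clustered) set of scores; a high one, a heterogeneous (spread-out) set.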
XIII. K-12 Grading System (RA 10533; DepEd Order No. 8, s. 2015)
KINDERGARTEN
Checklists and anecdotal records are used instead of numerical grades. It is important for teachers to keep a portfolio, which is a record or compilation of the learner's output, such as writing samples, accomplished activity sheets, and artwork.
GRADES 1–10
For MAPEH, individual grades are given to each area, namely Music, Arts, Physical Education, and Health. The quarterly grade for MAPEH is the average of the quarterly grades in the four areas.
GRADES 11–12
The final grade in each Grade 11 and 12 subject is computed by averaging the grades of the two quarters of the semester.

XIV. Feedback
ASSESSMENT FOR KINDERGARTEN
There are no numerical grades in Kindergarten. Descriptions of the learners' progress in the various learning areas are represented using checklists and student portfolios. These are presented to the parents at the end of each quarter for discussion.
GRADE DESCRIPTORS
90–100 Outstanding (Passed); 85–89 Very Satisfactory (Passed); 80–84 Satisfactory (Passed); 75–79 Fairly Satisfactory (Passed); below 75 Did Not Meet Expectations (Failed)
PROMOTION AND RETENTION
Grades 1–10 (summarized in the decision sketch at the end of this section)
- Passed all subjects – promoted
- Failed 1 or 2 subjects – must take remedial classes; passing the remedial classes leads to promotion, failing them leads to retention
- Failed 3 or more subjects – retained
Grades 11–12
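To summarize the Grades 1–10 promotion and retention rules listed above, here is a minimal Python sketch of the decision logic; the function name and parameters are hypothetical, not DepEd terminology.

```python
# Illustrative sketch of the Grades 1-10 promotion/retention rules above.
def promotion_status(failed_subjects, passed_remedial=False):
    """Return 'Promoted' or 'Retained' for a Grades 1-10 learner."""
    if failed_subjects == 0:
        return "Promoted"                        # passed all learning areas
    if failed_subjects <= 2:
        # Must take remedial classes in the failed learning areas.
        return "Promoted" if passed_remedial else "Retained"
    return "Retained"                            # failed 3 or more learning areas

print(promotion_status(0))                        # Promoted
print(promotion_status(2, passed_remedial=True))  # Promoted
print(promotion_status(3))                        # Retained
```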