ASSESSMENT OF LEARNING 1 TEACHING MATERIALS Compiled by Giovanni A. Alcain, CST, LPT, (MAEd-Eng on-going)

EDUC10 ASSESSMENT OF LEARNING 1

MODULE 1 – BASIC CONCEPTS IN ASSESSMENT OF LEARNING

Assessment – refers to the process of gathering, describing or quantifying information about student performance. It includes paper-and-pencil tests, extended responses (e.g., essays) and performance assessment tasks, which are usually referred to as "authentic assessment" tasks (e.g., presentation of research work).

Measurement – is the process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. Measurement answers the question "How much?"

Evaluation – refers to the process of examining the performance of students. It also determines whether or not the student has met the lesson's instructional objectives.

Test – is an instrument or systematic procedure designed to measure the quality, ability, skill or knowledge of students by giving a set of questions in a uniform manner. Since a test is a form of assessment, tests also answer the question "How does the individual student perform?"

Testing – is a method used to measure the level of achievement or performance of the learners. It also refers to the administration, scoring and interpretation of an instrument (procedure) designed to elicit information about performance in a sample of a particular area of behavior.

Types of Measurement

There are two ways of interpreting student performance in relation to classroom instruction: norm-referenced tests and criterion-referenced tests.

A norm-referenced test is a test designed to measure the performance of a student compared with other students. Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade-equivalent score or a stanine. The achievement of a student is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills. The purpose is to rank each student with respect to the achievement of others in broad areas of knowledge and to discriminate between high and low achievers.

A criterion-referenced test is a test designed to measure the performance of students with respect to some particular criterion or standard. Each individual is compared with a predetermined set of standards for acceptable achievement; the performance of the other examinees is irrelevant. A student's score is usually expressed as a percentage, and student achievement is reported for individual skills. The purpose is to determine whether each student has achieved specific skills or concepts, and to find out how much students know before instruction begins and after it has finished. Other terms less often used for criterion-referenced are objective-referenced, domain-referenced, content-referenced and universe-referenced.
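To make the contrast concrete, here is a minimal Python sketch showing how the same raw score yields two different interpretations: a percentile standing under norm-referencing and a mastery decision under criterion-referencing. The class scores and the 75% mastery cutoff are invented for illustration.

```python
# Hypothetical scores; a minimal sketch contrasting the two interpretations.
scores = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]  # class scores out of 40 items
raw = 30                                            # one student's score

# Norm-referenced: where does this student stand relative to the group?
percentile_rank = 100 * sum(s < raw for s in scores) / len(scores)
print(f"Norm-referenced: scores above {percentile_rank:.0f}% of the class")

# Criterion-referenced: did the student reach a predefined standard?
mastery_cutoff = 0.75                               # e.g., 75% of items correct
percent_correct = raw / 40
print(f"Criterion-referenced: {percent_correct:.0%} correct;",
      "mastery" if percent_correct >= mastery_cutoff else "non-mastery")
```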
Robert L. Linn and Norman E. Gronlund (1995) pointed out the common characteristics of, and differences between, norm-referenced tests and criterion-referenced tests.

Common Characteristics of Norm-Referenced Tests and Criterion-Referenced Tests
1. Both require specification of the achievement domain to be measured
2. Both require a relevant and representative sample of test items
3. Both use the same types of test items
4. Both use the same rules for item writing (except for item difficulty)
5. Both are judged by the same qualities of goodness (validity and reliability)
6. Both are useful in educational assessment

Differences between Norm-Referenced Tests and Criterion-Referenced Tests

Norm-Referenced Tests
1. Typically covers a large domain of learning tasks, with just a few items measuring each specific task.
2. Emphasizes discrimination among individuals in terms of relative level of learning.
3. Favors items of average difficulty and typically omits very easy and very hard items.
4. Interpretation requires a clearly defined group.

Criterion-Referenced Tests
1. Typically focuses on a delimited domain of learning tasks, with a relatively large number of items measuring each specific task.
2. Emphasizes description of what learning tasks individuals can and cannot perform.
3. Matches item difficulty to learning tasks, without altering item difficulty or omitting easy or hard items.
4. Interpretation requires a clearly defined and delimited achievement domain.

TYPES OF ASSESSMENT

There are four types of assessment in terms of their functional role in relation to classroom instruction: placement assessment, diagnostic assessment, formative assessment and summative assessment.

A. Placement Assessment is concerned with the entry performance of students. The purpose of placement evaluation is to determine the prerequisite skills, the degree of mastery of the course objectives and the best mode of learning.

B. Diagnostic Assessment is a type of assessment given before instruction. It aims to identify the strengths and weaknesses of the students regarding the topics to be discussed. The purposes of diagnostic assessment are:
1. To determine the level of competence of the students;
2. To identify the students who already have knowledge about the lesson;
3. To determine the causes of learning problems and formulate a plan for remedial action.

C. Formative Assessment is a type of assessment used to monitor the learning progress of the students during or after instruction. The purposes of formative assessment are:
1. To provide feedback immediately to both student and teacher regarding the success and failure of learning;
2. To identify the learning errors that are in need of correction;
3. To provide information to the teacher for modifying instruction and improving learning and instruction.

D. Summative Assessment is a type of assessment usually given at the end of a course or unit. The purposes of summative assessment are:
1. To determine the extent to which the instructional objectives have been met;
2. To certify student mastery of the intended outcomes and assign grades;
3. To provide information for judging the appropriateness of the instructional objectives;
4. To determine the effectiveness of instruction.

MODULE 2 - PRINCIPLES OF HIGH-QUALITY CLASSROOM ASSESSMENT

1. Clarity of learning targets
2. Appropriateness of assessment methods
3. Validity
4. Reliability
5. Fairness
6. Positive consequences
7. Practicality and efficiency
8. Ethics

1. CLARITY OF LEARNING TARGETS

Assessment can be made precise, accurate and dependable only if what is to be achieved is clearly stated and feasible. The learning targets, involving knowledge, reasoning, skills, products and effects, need to be stated in behavioral terms, which denote something that can be observed through the behavior of the students.
Cognitive Targets

Benjamin Bloom (1956) proposed a hierarchy of educational objectives at the cognitive level:

Knowledge – acquisition of facts, concepts and theories
Comprehension – understanding; involves cognition or awareness of interrelationships
Application – transfer of knowledge from one field of study to another, or from one concept to another concept in the same discipline
Analysis – breaking down of a concept or idea into its components and explaining the concept as a composition of these components
Synthesis – opposite of analysis; entails putting together the components in order to summarize the concept
Evaluation and Reasoning – valuing and judgment, or putting a "worth" on a concept or principle

Skills, Competencies and Abilities Targets

Skills – specific activities or tasks that a student can proficiently do
Competencies – clusters of skills
Abilities – made up of related competencies, categorized as cognitive, affective and psychomotor

Products, Outputs and Project Targets
- tangible and concrete evidence of a student's ability
- need to clearly specify the level of workmanship of projects: expert, skilled, novice

2. APPROPRIATENESS OF ASSESSMENT METHODS

Written-Response Instruments
Objective tests – appropriate for assessing the various levels of the hierarchy of educational objectives
Essays – can test the students' grasp of the higher-level cognitive skills
Checklists – lists of several characteristics or activities presented to the subjects of a study, where they analyze and place a mark opposite each characteristic

Product Rating Scales
- used to rate products like book reports, maps, charts, diagrams, notebooks and creative endeavors
- need to be developed to assess various products over the years

Performance Tests – Performance checklist
- consists of a list of behaviors that make up a certain type of performance
- used to determine whether or not an individual behaves in a certain way when asked to complete a particular task

Oral Questioning – an appropriate assessment method when the objectives are to:
- assess the students' stock knowledge, and/or
- determine the students' ability to communicate ideas in coherent verbal sentences

Observation and Self Reports
- useful supplementary methods when used in conjunction with oral questioning and performance tests

3. VALIDITY

Something valid is something fair. A valid test is one that measures what it is supposed to measure.

Types of Validity
Face: What do students think of the test?
Construct: Am I testing in the way I taught?
Content: Am I testing what I taught?
Criterion-related: How does this compare with an existing valid test?

Tests can be made more valid by making them more subjective (open items).

MORE ON VALIDITY

Validity – the appropriateness, correctness, meaningfulness and usefulness of the specific conclusions that a teacher reaches regarding the teaching-learning situation.

Content validity – the content and format of the instrument: students' adequate experience, coverage of sufficient material, and reflection of the degree of emphasis.
Face validity – the outward appearance of the test; the lowest form of test validity.
Criterion-related validity – the test is judged against a specific criterion.
Construct validity – the test loads on a "construct" or factor.
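As an illustration of criterion-related validity, the sketch below correlates a hypothetical classroom test with an external criterion measure; all scores are invented for the example. A high positive coefficient would count as evidence of criterion-related validity.

```python
import math

# Hypothetical paired scores: a new classroom reading test vs. an external
# criterion (teacher-assigned reading grades). All numbers are invented.
test = [12, 15, 18, 20, 24, 25, 28, 30]
criterion = [78, 80, 83, 82, 88, 87, 90, 93]

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"validity coefficient r = {pearson_r(test, criterion):.2f}")
```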
4. RELIABILITY

Something reliable is something that works well and that you can trust. A reliable test is a consistent measure of what it is supposed to measure.

Questions: Can we trust the results of the test? Would we get the same results if the test were taken again and scored by a different person?

Tests can be made more reliable by making them more objective (controlled items).

Reliability is the extent to which an experiment, test, or any measuring procedure yields the same result on repeated trials.

Equivalency reliability is the extent to which two items measure identical concepts at an identical level of difficulty. Equivalency reliability is determined by relating two sets of test scores to one another to highlight the degree of relationship or association.

Stability reliability (sometimes called test-retest reliability) is the agreement of measuring instruments over time. To determine stability, a measure or test is repeated on the same subjects at a future date.

Internal consistency is the extent to which tests or procedures assess the same characteristic, skill or quality. It is a measure of the precision between the observers or of the measuring instruments used in a study.

Interrater reliability is the extent to which two or more individuals (coders or raters) agree. Interrater reliability addresses the consistency of the implementation of a rating system.

RELIABILITY – consistency, dependability, stability – can be estimated by:
- Test-retest method – consistency of test results when the same test is administered at two different time periods; the two sets of results are correlated.
- Split-half method – calculated using the Spearman-Brown prophecy formula.
- Kuder-Richardson formulas – KR-20 and KR-21.
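The split-half estimate and the Spearman-Brown correction can be illustrated with a short sketch. The 0/1 item-score matrix below is invented, and splitting into odd- and even-numbered items is just one common convention.

```python
import math

# Hypothetical 0/1 item scores: rows = students, columns = 8 test items.
scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Split-half: total the odd-numbered and even-numbered items separately.
odd = [sum(row[0::2]) for row in scores]
even = [sum(row[1::2]) for row in scores]
r_half = pearson_r(odd, even)

# Spearman-Brown prophecy formula corrects for the halved test length.
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown estimate = {r_full:.2f}")
```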
5. FAIRNESS

The concept that assessment should be "fair" covers a number of aspects:
- Student knowledge of the learning targets of assessment
- Opportunity to learn
- Prerequisite knowledge and skills
- Avoiding teacher stereotypes
- Avoiding bias in assessment tasks and procedures

6. POSITIVE CONSEQUENCES

Learning assessments provide students with effective feedback and potentially improve their motivation and/or self-esteem. Moreover, assessments of learning give students the tools to assess themselves and understand how to improve. Assessment should have positive consequences for students, teachers, parents, and other stakeholders.

7. PRACTICALITY AND EFFICIENCY

Something practical is something effective in real situations. A practical test is one which can be practically administered.

Questions: Will the test take longer to design than to apply? Will the test be easy to mark?

Tests can be made more practical by making them more objective (more controlled items).

Considerations: teacher familiarity with the method, time required, complexity of administration, ease of scoring, ease of interpretation, and cost. Teachers should be familiar with the test; it should not require too much time, and it should be implementable.

8. ETHICS

1. Informed consent
2. Anonymity and confidentiality
3. Gathering data
4. Recording data
5. Reporting data

ETHICS IN ASSESSMENT – "RIGHT AND WRONG"

Ethics means conforming to the standards of conduct of a given profession or group. Ethical issues that may be raised:
1. Possible harm to the participants
2. Confidentiality
3. Presence of concealment or deception
4. Temptation to assist students

MODULE 3 – DEVELOPMENT OF CLASSROOM TOOLS FOR MEASURING KNOWLEDGE AND UNDERSTANDING

DIFFERENT TYPES OF TESTS: MAIN POINTS FOR COMPARISON

Purpose
- Psychological: aims to measure students' intelligence or mental ability largely without reference to what the student has learned; measures the intangible characteristics of an individual (e.g., aptitude tests, personality tests, intelligence tests).
- Educational: aims to measure the results of instruction and learning (e.g., performance tests).

Scope of Content
- Survey: covers a broad range of objectives; measures general achievement in certain subjects; constructed by trained professionals.
- Mastery: covers a specific objective; measures fundamental skills and abilities; typically constructed by the teacher.

Interpretation
- Norm-referenced: the result is interpreted by comparing one student's performance with other students'; some will really pass; there is competition for a limited percentage of high scores; describes a pupil's performance compared to others.
- Criterion-referenced: the result is interpreted by comparing the student's performance against a predefined standard; all or none may pass; there is no competition for a limited percentage of high scores; describes a pupil's mastery of course objectives.

Language Mode
- Verbal: words are used by students in attaching meaning to or responding to test items.
- Non-verbal: students do not use words in attaching meaning to or in responding to test items (e.g., graphs, numbers, 3-D objects).

Construction
- Standardized: constructed by a professional item writer; covers a broad range of content within a subject area; uses mainly multiple-choice items; items are screened and the best items chosen for the final instrument; can be scored by a machine; interpretation of results is usually norm-referenced.
- Informal: constructed by a classroom teacher; covers a narrow range of content; various types of items are used; the teacher picks or writes items as needed for the test; scored manually by the teacher; interpretation is usually criterion-referenced.

Manner of Administration
- Individual: mostly given orally or requires actual demonstration of skill; one-on-one situations, thus many opportunities for clinical observation; a chance to follow up the examinee's response in order to clarify or comprehend it more clearly; the time spent yields information about only one student.
- Group: a paper-and-pen test; information is gathered from many students in the same amount of time needed to gather information from one student individually; loss of rapport, insight and knowledge about each examinee.

Effect of Biases
- Objective: the scorer's personal judgment does not affect the scoring; worded so that only one answer is acceptable; little or no disagreement on what is the correct answer.
- Subjective: affected by the scorer's personal opinions, biases and judgment; several answers are possible; disagreement on what is the correct answer is possible.

Time Limit and Level of Difficulty
- Power: consists of a series of items arranged in ascending order of difficulty; measures the student's ability to answer more and more difficult items.
- Speed: consists of items approximately equal in difficulty; measures the student's speed or rate and accuracy in responding.

Format
- Selective: there are choices for the answer (multiple choice, true or false, matching type); can be answered quickly; prone to guessing; time-consuming to construct.
- Supply: there are no choices for the answer (short answer, completion, restricted or extended essay); may require a longer time to answer; less chance of guessing but prone to bluffing; time-consuming to answer and score.
TYPES OF TESTS ACCORDING TO FORMAT

1. Selective Type – provides choices for the answer
a. Multiple Choice – consists of a stem, which describes the problem, and 3 or more alternatives, which give the suggested solutions. The incorrect alternatives are the distracters.
b. True-False or Alternative Response – consists of a declarative statement that one has to mark true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, and the like.
c. Matching Type – consists of two parallel columns: Column A, the column of premises from which a match is sought; Column B, the column of responses from which the selection is made.

2. Supply Test
a. Short Answer – uses a direct question that can be answered by a word, phrase, number, or symbol.
b. Completion Test – consists of an incomplete statement.

3. Essay Test
a. Restricted Response – limits the content of the response by restricting the scope of the topic.
b. Extended Response – allows the students to select any factual information that they think is pertinent and to organize their answers in accordance with their best judgment.

Projective Test

A psychological test that uses images in order to evoke responses from a subject and reveal hidden aspects of the subject's mental life. Projective tests were developed in an attempt to eliminate some of the major problems inherent in the use of self-report measures, such as the tendency of some respondents to give "socially desirable" responses.

Important Projective Techniques
1. Word Association Test – an individual is given a clue or hint and asked to respond with the first thing that comes to mind.
2. Completion Test – the respondents are asked to complete an incomplete sentence or story. The completion will reflect their attitude and state of mind.
3. Construction Techniques (Thematic Apperception Test) – more or less like a completion test. The subject is given a picture and asked to write a story about it. The initial structure is limited and not detailed like the completion test. For example, two cartoons are given and a dialogue is to be written.
4. Expression Techniques – people are asked to express the feelings or attitudes of other people.

GUIDELINES FOR CONSTRUCTING TEST ITEMS

When to Use Essay Tests
Essays are appropriate when:
1. the group to be tested is SMALL and the test is NOT TO BE USED again;
2. you wish to encourage and reward the development of the student's SKILL IN WRITING;
3. you are more interested in exploring the student's ATTITUDES than in measuring his/her academic achievement;
4. you are more confident of your ability as a critical and fair reader than as an imaginative writer of good objective test items.

When to Use Objective Test Items
Objective test items are especially appropriate when:
1. the group to be tested is LARGE and the test may be REUSED;
2. HIGHLY RELIABLE TEST SCORES must be obtained as efficiently as possible;
3. IMPARTIALITY of evaluation, ABSOLUTE FAIRNESS, and FREEDOM from possible test SCORING INFLUENCES (fatigue, lack of anonymity) are essential;
4. you are more confident of your ability to express objective test items clearly than of your ability to judge essay test answers correctly;
5. there is more PRESSURE FOR SPEEDY REPORTING OF SCORES than for speedy test preparation.

Multiple Choice Items
A multiple choice item consists of:
1. Stem – identifies the question or problem
2. Response alternatives or options
3. Correct answer

Example: Which of the following is a chemical change? (STEM)
a. Evaporation of alcohol
b. Freezing of water
c. Burning of oil
d. Melting of wax
(ALTERNATIVES)
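As a small illustration of how a selective-type item can be represented and scored objectively, here is a sketch using the example item above; the class and field names are invented for illustration, not a standard API.

```python
from dataclasses import dataclass

# A minimal sketch of representing and scoring a multiple choice item.
@dataclass
class MultipleChoiceItem:
    stem: str
    alternatives: dict[str, str]  # option letter -> option text
    key: str                      # letter of the correct answer

item = MultipleChoiceItem(
    stem="Which of the following is a chemical change?",
    alternatives={"a": "Evaporation of alcohol", "b": "Freezing of water",
                  "c": "Burning of oil", "d": "Melting of wax"},
    key="c",
)

def score(item: MultipleChoiceItem, response: str) -> int:
    """Objective scoring: 1 point if the keyed answer is chosen, else 0."""
    return 1 if response.lower() == item.key else 0

print(score(item, "c"))  # 1
print(score(item, "a"))  # 0
```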
Advantages of Using Multiple Choice Items
Multiple choice items can provide:
1. Versatility in measuring all levels of cognitive ability;
2. Highly reliable test scores;
3. Scoring efficiency and accuracy;
4. Objective measurement of student achievement or ability;
5. A wide sampling of content or objectives;
6. A reduced guessing factor when compared to true-false items;
7. Different response alternatives which can provide diagnostic feedback.

Limitations of Multiple Choice Items
Multiple choice items:
1. Are difficult and time-consuming to construct;
2. Can lead a teacher to favor simple recall of facts;
3. Place a high degree of dependence on the student's reading ability and the teacher's writing ability.

SUGGESTIONS FOR WRITING MULTIPLE CHOICE ITEMS

1. When possible, state the stem as a direct question rather than as an incomplete statement.
Poor: Alloys are ordinarily produced by…
Better: How are alloys ordinarily produced?

2. Present a definite, explicit, singular question or problem in the stem.
Poor: Psychology…
Better: The science of mind and behaviour is called…

3. Eliminate excessive verbiage or irrelevant information from the stem.
Poor: While ironing her formal polo shirt, June burned her hand accidentally on the hot iron. This was due to a heat transfer because…
Better: Which of the following ways of heat transfer explains why June's hand was burned after she touched a hot iron?

4. Include in the stem any word(s) that might otherwise be repeated in each alternative.
Poor: In national elections in the US, the President is officially
a. chosen by the people
b. chosen by the electoral college
c. chosen by members of the Congress
d. chosen by the House of Representatives
Better: In national elections in the US, the President is officially chosen by
a. the people
b. the electoral college
c. members of the Congress
d. the House of Representatives

5. Use negatively stated questions sparingly. When used, underline and/or capitalize the negative word.
Poor: Which of the following is not cited as an accomplishment of the Arroyo administration?
Better: Which of the following is NOT cited as an accomplishment of the Arroyo administration?

6. Make all alternatives plausible and attractive to the less knowledgeable or skillful student.
What process is most nearly the opposite of photosynthesis?
Poor: a. Digestion b. Relaxation c. Respiration d. Exertion
Better: a. Digestion b. Assimilation c. Respiration d. Catabolism

7. Make the alternatives grammatically parallel with each other and consistent with the stem.
Poor: What would advance the application of atomic discoveries to medicine?
a. Standardized techniques for treatment of patients
b. Train the average doctor to apply radioactive treatments
c. Remove the restrictions on the use of radioactive substances
d. Establishing hospitals staffed by highly trained radioactive therapy specialists
Better: What would advance the application of atomic discoveries to medicine?
a. Development of standardized techniques for treatment of patients
b. Removal of restrictions on the use of radioactive substances
c. Addition of trained radioactive therapy specialists to hospital staffs
d. Training the average doctor in the application of radioactive treatments

8. Make the alternatives mutually exclusive.
Poor: The daily minimum required amount of milk that a 10-year-old should drink is
a. 1-2 glasses
b. 2-3 glasses*
c. 3-4 glasses*
d. at least 4 glasses
Better: What is the daily minimum required amount of milk a 10-year-old child should drink?
a. 1 glass
b. 2 glasses
c. 3 glasses
d. 4 glasses
9. When possible, present alternatives in some logical order (chronological, most to least, alphabetical).
At 7 a.m. two trucks leave a diner and travel north. One truck averages 42 miles per hour and the other truck averages 38 miles per hour. At what time will they be 24 miles apart?
Undesirable: a. 6 p.m. b. 9 a.m. c. 1 a.m. d. 1 p.m. e. 6 a.m.
Desirable: a. 1 a.m. b. 6 a.m. c. 9 a.m. d. 1 p.m. e. 6 p.m.

10. Be sure there is only one correct or best response to the item.
Poor: The two most desired characteristics in a classroom test are validity and
a. Precision b. Reliability* c. Objectivity d. Consistency*
Better: The two most desired characteristics in a classroom test are validity and
a. Precision b. Reliability* c. Objectivity d. Standardization

11. Make alternatives approximately equal in length.
Poor: The most general cause of low individual incomes in the US is
a. a lack of valuable productive services to sell*
b. unwillingness to work
c. automation
d. inflation
Better: What is the most general cause of low individual incomes in the US?
a. A lack of valuable productive services to sell*
b. The population's overall unwillingness to work
c. The nation's increased reliance on automation
d. An increasing national level of inflation

12. Avoid irrelevant clues, such as grammatical structure, well-known verbal associations, or connections between stem and answer.
Poor (grammatical clue): A chain of islands is called an
a. Archipelago b. Peninsula c. Continent d. Isthmus
Poor (verbal association): The reliability of a test can be estimated by a coefficient of
a. Measurement b. Correlation* c. Testing d. Error
Poor (connection between stem and answer): The height to which a water dam is built depends on
a. the length of the reservoir behind the dam
b. the volume of the water behind the dam
c. the height of the water behind the dam*
d. the strength of the reinforcing wall

13. Use at least four alternatives for each item to lower the probability of getting the item correct by guessing.

14. Randomly distribute the correct responses among the alternative positions throughout the test, having approximately the same proportion of the alternatives a, b, c, d and e as the correct response.

15. Use the alternatives NONE OF THE ABOVE and ALL OF THE ABOVE sparingly. When used, such alternatives should occasionally be the correct response.

True-False Test Items

True-false test items are typically used to measure the ability to identify whether statements of fact are correct. The basic format is simply a declarative statement that the student must judge as true or false. A common modification of the basic form has the student respond "yes" or "no," or "agree" or "disagree."

Three Forms:
1. Simple – consists of only two choices
2. Complex – consists of more than two choices
3. Compound – two choices plus a conditional completion response

Examples:
Simple: The acquisition of morality is a developmental process. (True / False)
Complex: The acquisition of morality is a developmental process. (True / False / Opinion)
Compound: The acquisition of morality is a developmental process. (True / False) If the statement is false, what makes it false?
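Suggestion 13 above, and the true-false format just introduced, both turn on the guessing factor. The sketch below simulates blind guessing to show why two options yield a far higher chance score than four; the test length and trial count are arbitrary.

```python
import random

# Why more alternatives lower the guessing factor (suggestion 13):
# simulate a student blindly guessing on a 50-item test.
random.seed(1)

def expected_guess_score(num_items: int, num_options: int, trials: int = 10_000) -> float:
    """Average score from pure guessing, estimated by simulation."""
    total = 0
    for _ in range(trials):
        total += sum(random.randrange(num_options) == 0 for _ in range(num_items))
    return total / trials

print(f"true-false (2 options):      {expected_guess_score(50, 2):.1f} / 50")  # ~25
print(f"multiple choice (4 options): {expected_guess_score(50, 4):.1f} / 50")  # ~12.5
```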
Advantages of True-False Items
True-false items can provide:
1. The widest sampling of content or objectives per unit of testing time;
2. Scoring efficiency and accuracy;
3. Versatility in measuring all levels of cognitive ability;
4. Highly reliable test scores; and
5. An objective measurement of student achievement or ability.

Limitations of True-False Items
True-false items:
1. Incorporate an extremely high guessing factor;
2. Can often lead the teacher to write ambiguous statements, due to the difficulty of writing statements which are unequivocally true or false;
3. Do not discriminate between students of varying ability as well as other item types;
4. Can often include more irrelevant clues than other item types;
5. Can often lead a teacher to favor testing of trivial knowledge.

Suggestions for Writing True-False Items (Payne, 1984)

1. Base true-false items upon statements that are absolutely true or false, without qualifications or exceptions.
Poor: Nearsightedness is hereditary in origin.
Better: Geneticists and eye specialists believe that the predisposition to nearsightedness is hereditary.

2. Express the item statement as simply and as clearly as possible.
Poor: When you see a highway with a marker that reads "Interstate 80," you know that the construction and upkeep of that road is built and maintained by the local and national government.
Better: The construction and maintenance of interstate highways are provided by both local and national government.

3. Express a single idea in each test item.
Poor: Water will boil at a higher temperature if the atmospheric pressure on its surface is increased and more heat is applied to the container.
Better: Water will boil at a higher temperature if the atmospheric pressure on its surface is increased. (Or: Water will boil at a higher temperature if more heat is applied to the container.)

4. Include enough background information and qualifications so that the ability to respond correctly to the item does not depend on some special, uncommon knowledge.
Poor: The second principle of education is that the individual gathers knowledge.
Better: According to John Dewey, the second principle of education is that the individual gathers knowledge.

5. Avoid lifting statements directly from the text, lecture or other materials, so that memory alone will not permit a correct answer.
Poor: For every action there is an equal and opposite reaction.
Better: If you were to stand in a canoe and throw a life jacket forward to another canoe, chances are your canoe will jerk backward.

6. Avoid using negatively stated item statements.
Poor: The Supreme Court is not composed of nine justices.
Better: The Supreme Court is composed of nine justices.

7. Avoid the use of unfamiliar vocabulary.
Poor: According to some politicians, the raison d'être for capital punishment is retribution.
Better: According to some politicians, the justification for the existence of capital punishment is retribution.

8. Avoid the use of specific determiners which would permit a test-wise but unprepared examinee to respond correctly. Specific determiners are sweeping terms like always, all, none, never, impossible, inevitable. Statements including such terms are likely to be false. On the other hand, statements using qualifying determiners such as usually, sometimes, often are likely to be true. When statements require specific determiners, make sure they appear in both true and false items.
Poor:
All sessions of Congress are called by the President. (F)
The Supreme Court is frequently required to rule on the constitutionality of a law. (T)
An objective test is generally easier to score than an essay test. (T)
Better (when specific determiners are used, reverse the expected outcomes):
The sum of the angles of a triangle is always 180 degrees. (T)
Each molecule of a given compound is chemically the same as every other molecule of that compound. (T)
The galvanometer is the instrument usually used for the metering of electrical energy use in a home. (F)

9. False items tend to discriminate more highly than true items. Therefore, use more false items than true items (but not more than 15% additional false items).

Matching Test Items

In general, matching items consist of a column of stimuli presented on the left side of the exam page and a column of responses placed on the right side of the page. Students are required to match the response associated with a given stimulus.

Advantages of Using Matching Test Items
Matching items:
1. Require a short period of reading and response time, allowing the teacher to cover more content;
2. Provide objective measurement of student achievement or ability;
3. Provide highly reliable test scores;
4. Provide scoring efficiency and accuracy.

Disadvantages of Using Matching Test Items
Matching items:
1. Have difficulty measuring learning objectives requiring more than simple recall of information;
2. Are difficult to construct, due to the problem of selecting a common set of stimuli and responses.

Suggestions for Writing Matching Test Items

1. Include directions which clearly state the basis for matching the stimuli with the responses. Explain whether or not a response can be used more than once and indicate where to write the answer.
Poor: Directions: Match the following.
Better: Directions: On the line to the left of each identifying location and characteristic in Column I, write the letter of the country in Column II that is best defined. Each country in Column II may be used more than once.

2. Use only homogeneous material in matching items.
Poor: Directions: Match the following.
1. _______ Water A. NaCl
2. _______ Discovered radium B. Fermi
3. _______ Salt C. NH3
4. _______ Year of the first nuclear fission by man D. 1942
5. _______ Ammonia E. Curie
Better: Directions: On the line to the left of each compound in Column I, write the letter of the compound's formula presented in Column II. Use each formula only once.
Column I / Column II
1. _______ Water A. H2SO4
2. _______ Salt B. HCl
3. _______ Ammonia C. NaCl
4. _______ Sulfuric acid D. H2O
E. NH3

3. Arrange the list of responses in some systematic order if possible (chronological, alphabetical).
Directions: On the line to the left of each definition in Column I, write the letter of the defense mechanism in Column II that is described. Use each defense mechanism only once.
Column I:
1. _______ Hunting for reasons to support one's beliefs
2. _______ Accepting the values and norms of others as one's own, even if they are contrary to previously held values
3. _______ Attributing one's own unacceptable impulses, thoughts and desires to others
4. _______ Ignoring disagreeable situations, thoughts and desires
Column II (Undesirable): A. Rationalization B. Identification C. Projection D. Introjection E. Denial of Reality
Column II (Desirable, alphabetical): A. Denial of Reality B. Identification C. Introjection D. Projection E. Rationalization

4. Avoid grammatical or other clues to the correct response.
Poor: Directions: Match the following in order to complete the sentences on the left.
1. Igneous rocks are formed A. a hardness of 7
2. The formation of coal requires B. with crystalline rock
3. A geode is filled C. a metamorphic rock
4. Feldspar is classified as D. through the solidification of molten rock
Better: Avoid sentence completion, due to grammatical clues.

Note:
1. Keep matching items brief, limiting the list of stimuli to under 10.
2. Include more responses than stimuli to help prevent answering through the process of elimination.
3. When possible, reduce the amount of reading time by including only short phrases or single words in the response list.
Completion Test Items

Completion items require the student to answer a question or to finish an incomplete statement by filling in a blank with the correct word or phrase.
Example: According to Freud, personality is made up of three major systems: the ______, the ______, and the ______.

Advantages of Using Completion Items
Completion items can:
1. Provide a wide sampling of content;
2. Efficiently measure lower levels of cognitive ability;
3. Minimize guessing as compared to multiple choice or true-false items; and
4. Usually provide an objective measure of student achievement or ability.

Limitations of Using Completion Items
Completion items:
1. Are difficult to construct so that the desired response is clearly indicated;
2. Have difficulty measuring learning objectives requiring more than simple recall of information;
3. Can often include more irrelevant clues than other item types;
4. Are more time-consuming to score when compared to multiple choice or true-false items; and
5. Are more difficult to score, since more than one answer may have to be considered correct if the item was not properly prepared.

Suggestions for Writing Completion Test Items

1. Omit only significant words from the statement.
Poor: Every atom has a central (core) called a nucleus.
Better: Every atom has a central core called a(n) (nucleus).

2. Do not omit so many words from the statement that the intended meaning is lost.
Poor: The ______ were to Egypt as the ______ were to Persia as the ______ were to the early tribes of Israel.
Better: The Pharaohs were to Egypt as the ______ were to Persia as the ______ were to the early tribes of Israel.

3. Avoid grammatical or other clues to the correct response.
Poor: Most of the United States' libraries are organized according to the (Dewey) decimal system.
Better: Which organizational system is used by most of the United States' libraries? (Dewey Decimal)

4. Be sure there is only one correct response.
Poor: Trees which shed their leaves annually are (seed-bearing, common).
Better: Trees which shed their leaves annually are called (deciduous).

5. Make the blanks of equal length.
Poor: In Greek mythology, Vulcan was the son of __ and ________. (Jupiter, Juno)
Better: In Greek mythology, Vulcan was the son of ______ and ______.

6. When possible, delete words at the end of the statement, after the student has been presented a clearly defined problem.
Poor: (122.5) is the molecular weight of KClO3.
Better: The molecular weight of KClO3 is (122.5).

7. Avoid lifting statements directly from the text, lecture or other sources.

8. Limit the required response to a single word or phrase.
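Limitation 5 above, that more than one answer may have to be accepted, can be handled explicitly when scoring. Here is a minimal sketch; the item and its accepted-answer set are invented for illustration.

```python
# Scoring a completion item while accepting more than one correct answer.
item = {
    "prompt": "Every atom has a central core called a(n) ______.",
    "accepted": {"nucleus", "atomic nucleus"},
}

def score_completion(item: dict, response: str) -> int:
    """1 point if the normalized response matches any accepted answer."""
    return 1 if response.strip().lower() in item["accepted"] else 0

print(score_completion(item, "Nucleus"))      # 1
print(score_completion(item, "  nucleus  "))  # 1
print(score_completion(item, "proton"))       # 0
```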
Essay Test Items

A classroom essay test consists of a small number of questions to which the student responds by demonstrating his/her ability to:
a. Recall factual knowledge;
b. Organize this knowledge; and
c. Present the knowledge in a logical, integrated answer to the question.

Classification of Essay Tests:
1. Extended-response essay item
2. Limited-response or short-answer essay item

Example of an Extended-Response Essay Item:
Explain the difference between the S-R (Stimulus-Response) and the S-O-R (Stimulus-Organism-Response) theories of personality. Include in your answer:
a. A brief description of both theories;
b. Supporters of both theories;
c. Research methods used to study each of the two theories. (20 pts)

Example of a Short-Answer Essay Item:
Identify the research methods used to study the S-R (Stimulus-Response) and the S-O-R (Stimulus-Organism-Response) theories of personality. (10 pts)

Advantages of Using Essay Items
Essay items:
1. Are easier and less time-consuming to construct than most item types;
2. Provide a means for testing students' ability to compose an answer and present it in a logical manner; and
3. Can efficiently measure higher-order cognitive objectives (analysis, synthesis, evaluation).

Limitations of Using Essay Items
Essay items:
1. Cannot measure a large amount of content or objectives;
2. Generally provide low test score reliability;
3. Require an extensive amount of instructor time to read and grade; and
4. Generally do not provide an objective measure of student achievement or ability (they are subject to bias on the part of the grader).

Suggestions for Writing Essay Test Items

1. Prepare essay items that elicit the type of behaviour you want to measure.
Learning objective: The student will be able to explain how the normal curve serves as a statistical model.
Poor: Describe a normal curve in terms of symmetry, modality, kurtosis and skewness.
Better: Briefly explain how the normal curve serves as a statistical model for estimation and hypothesis testing.

2. Phrase each item so that the student's task is clearly indicated.
Poor: Discuss the economic factors which led to the stock market crash of 2008.
Better: Identify the three economic conditions which led to the stock market crash of 2008. Discuss each condition briefly, in correct chronological sequence, and in one paragraph indicate how the three factors were interrelated.

3. Indicate for each item a point value or weight and an estimated time limit for answering.
Poor: Compare the writings of Bret Harte and Mark Twain in terms of setting, depth of characterization, and dialogue styles of their main characters.
Better: Compare the writings of Bret Harte and Mark Twain in terms of setting, depth of characterization, and dialogue styles of their main characters. (10 points; 20 minutes)

4. Ask questions that will elicit responses on which experts could agree that one answer is better than another.

5. Avoid giving the student a choice among optional items, as this greatly reduces the reliability of the test.

6. It is generally recommended for classroom examinations to administer several short-answer items rather than only one or two extended-response items.

Guidelines for Grading Essay Items

1. When writing each essay item, simultaneously develop a scoring rubric.
2. To maintain a consistent scoring system and ensure the same criteria are applied to all assessments, score one essay item across all tests before scoring the next item.
3. To reduce the influence of the halo effect, bias and other subconscious factors, all essay questions should be graded blind to the identity of the student.
4. Due to the subjective nature of essay grading, the score on one essay may be influenced by the quality of previous essays. To prevent this type of bias, reshuffle the order of the papers after reading through each item.
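The grading guidelines can be mirrored in a simple workflow: one rubric per item, anonymous codes instead of names, papers reshuffled between items, and the same item scored across all papers first. The sketch below is illustrative only; the rubric criteria, codes and the placeholder grading function are all invented.

```python
import random

# A minimal sketch of the essay-grading workflow in the guidelines above.
rubric = {"content accuracy": 10, "organization": 5, "use of evidence": 5}

papers = [
    {"code": "A17", "essay1": "..."},  # anonymous code instead of a name
    {"code": "B02", "essay1": "..."},
    {"code": "C33", "essay1": "..."},
]

def grade(essay_text: str, rubric: dict) -> int:
    """Placeholder for the rater's judgment against each rubric criterion."""
    return sum(random.randint(0, pts) for pts in rubric.values())

random.seed(7)
random.shuffle(papers)  # reshuffle before each item to break order effects
for paper in papers:    # score the SAME item across all papers first
    print(paper["code"], grade(paper["essay1"], rubric))
```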
Principle 3: Balanced

- A balanced assessment sets targets in all domains of learning (cognitive, affective, and psychomotor) or domains of intelligence (verbal-linguistic, logical-mathematical, bodily-kinesthetic, visual-spatial, musical-rhythmic, interpersonal-social, intrapersonal-introspection, physical world-natural, existential-spiritual).
- A balanced assessment makes use of both traditional and alternative assessment.

Principle 4: Validity

Validity is the degree to which an assessment instrument measures what it intends to measure. It also refers to the usefulness of the instrument for a given purpose. It is the most important criterion of a good assessment instrument.

Ways of Establishing Validity

1. Face validity – done by examining the physical appearance of the instrument.
2. Content validity – done through a careful and critical examination of the objectives of assessment, so that the instrument reflects the curricular objectives.
3. Criterion-related validity – established statistically, such that a set of scores revealed by the measuring instrument IS CORRELATED with the scores obtained from another EXTERNAL PREDICTOR OR MEASURE. It has two purposes:
a. Concurrent validity – describes the present status of the individual by correlating two sets of scores obtained FROM TWO MEASURES GIVEN CONCURRENTLY. Example: relate reading test results with pupils' average grades in reading given by the teacher.
b. Predictive validity – describes the future performance of an individual by correlating sets of scores obtained from TWO MEASURES GIVEN AT A LONGER TIME INTERVAL. Example: the entrance examination scores of a freshman class at the beginning of the school year are correlated with their average grades at the end of the school year.
4. Construct validity – established by analysing the activities and processes that correspond to a particular concept; established statistically by comparing psychological traits or factors that theoretically influence scores on a test.
a. Convergent validity helps to establish construct validity when you use two different measurement procedures and research methods (e.g., participant observation and a survey) in your study to collect data about a construct (e.g., anger, depression, motivation, task performance).
b. Divergent validity helps to establish construct validity by demonstrating that the construct you are interested in (e.g., anger) is different from other constructs that might be present in your study (e.g., depression).

Factors Influencing the Validity of an Assessment Instrument

1. Unclear directions – directions that do not clearly indicate to the students how to respond to the task and how to record the responses tend to reduce validity.
2. Reading vocabulary and sentence structure that are too difficult – vocabulary and sentence structure that are too complicated for the student turn the task into an assessment of reading comprehension, thus altering the meaning of the assessment results.
3. Ambiguity – ambiguous statements in assessment tasks contribute to misinterpretation and confusion. Ambiguity sometimes confuses the better students more than it does the poorer students.
4. Inadequate time limits – time limits that do not provide students with enough time to consider the tasks and provide thoughtful responses can reduce the validity of interpretations of results.
5. Overemphasis on easy-to-assess aspects of the domain at the expense of important but hard-to-assess aspects (construct underrepresentation) – it is easy to develop test questions that assess factual recall, and generally harder to develop ones that tap conceptual understanding or higher-order thinking processes such as the evaluation of competing positions or arguments. Hence it is important to guard against underrepresentation of tasks getting at the important but more difficult-to-assess aspects of achievement.
6. Test items inappropriate for the outcomes being measured – attempting to measure understanding, thinking skills and other complex types of achievement with test forms that are appropriate only for measuring factual knowledge will invalidate the results.
7. Poorly constructed test items – test items that unintentionally provide clues to the answer tend to measure the students' alertness in detecting clues rather than mastery of the skills or knowledge the test is intended to measure.
8. Test too short – if a test is too short to provide a representative sample of the performance we are interested in, its validity will suffer accordingly.
9. Improper arrangement of items – test items are typically arranged in order of difficulty, with the easiest items first. Placing difficult items first may cause students to spend too much time on these and prevent them from reaching items they could easily answer. Improper arrangement may also influence validity by having a detrimental effect on student motivation.
10. Identifiable pattern of answers – placing correct answers in some systematic pattern (e.g., T, T, F, F or B, B, B, C, C, C, D, D, D) enables students to guess the answers to some items more easily, and this lowers validity.

TABLE OF SPECIFICATIONS (TOS)

A table of specifications is a device for describing test items in terms of the content and process dimensions, that is, what a student is expected to know and what he or she is expected to do with that knowledge. Each item is described by a combination of content and process in the table of specifications.

Sample one-way table of specifications in Linear Function:

Content | Number of Class Sessions | Number of Items | Test Item Distribution
1. Definition of linear function | 2 | 4 | 1-4
2. Slope of a line | 2 | 4 | 5-8
3. Graph of linear function | 2 | 4 | 9-12
4. Equation of linear function | 2 | 4 | 13-16
5. Standard forms of a line | 3 | 6 | 17-22
6. Parallel and perpendicular lines | 4 | 8 | 23-30
7. Application of linear functions | 5 | 10 | 31-40
TOTAL | 20 | 40 | 40

Number of items = (number of class sessions × desired total number of items) ÷ total number of class sessions

Example: number of items for the topic "definition of linear function":
Number of class sessions = 2; desired number of items = 40; total number of class sessions = 20
Number of items = (2 × 40) ÷ 20 = 4

Sample two-way table of specifications in Linear Function: in a two-way TOS, each topic's items are further distributed across the six cognitive levels (Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation).

Content | Class Hours | Total Items
1. Definition of linear function | 2 | 4
2. Slope of a line | 2 | 4
3. Graph of linear function | 2 | 4
4. Equation of linear function | 2 | 4
5. Standard forms of a line | 3 | 6
6. Parallel and perpendicular lines | 4 | 8
7. Application of linear functions | 5 | 10
TOTAL | 20 | 40

Across the 40 items, the cognitive-level column totals are: Knowledge 4, Comprehension 6, Application 8, Analysis 8, Synthesis 7, Evaluation 7.
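The allocation formula can be expressed directly in code. This sketch recomputes the one-way table's item counts from the class-session counts above.

```python
# Item allocation for the one-way table of specifications (Linear Function).
sessions = {
    "Definition of linear function": 2,
    "Slope of a line": 2,
    "Graph of linear function": 2,
    "Equation of linear function": 2,
    "Standard forms of a line": 3,
    "Parallel and perpendicular lines": 4,
    "Application of linear functions": 5,
}
desired_total_items = 40
total_sessions = sum(sessions.values())  # 20

for topic, n in sessions.items():
    # number of items = sessions x desired total items / total sessions
    items = n * desired_total_items // total_sessions
    print(f"{topic}: {items} items")
# e.g., Definition of linear function: 4 items ... Application: 10 items
```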
MODULE 4: DESCRIPTION OF ASSESSMENT DATA – TEST APPRAISAL

ITEM ANALYSIS

Item analysis refers to the process of examining the students' responses to each item in the test. According to Abubakar S. Asaad and William M. Hailaya (Measurement and Evaluation: Concepts and Principles, Rex Bookstore, 2004 edition), an item has either desirable or undesirable characteristics. An item that has desirable characteristics can be retained for subsequent use; one with undesirable characteristics is either revised or rejected.

Three criteria determine the desirability or undesirability of an item:
a. Difficulty of the item
b. Discriminating power of the item
c. Measures of attractiveness

Difficulty Index

The difficulty index refers to the proportion of the number of students in the upper and lower groups who answered an item correctly. In a classroom achievement test, the desired indices of difficulty are not lower than 0.20 nor higher than 0.80, with an average index of difficulty from 0.30 or 0.40 to a maximum of 0.60.

DF = (PUG + PLG) / 2

where PUG = proportion of the upper group who got the item right, and PLG = proportion of the lower group who got the item right.

Level of Difficulty of an Item
Index Range | Difficulty Level
0.00-0.20 | Very difficult
0.21-0.40 | Difficult
0.41-0.60 | Moderately difficult
0.61-0.80 | Easy
0.81-1.00 | Very easy

Index of Discrimination

The discrimination index is the difference between the proportion of high-performing students who got the item right and the proportion of low-performing students who got the item right. The high- and low-performing groups are usually defined as the upper 27% and the lower 27% of the students based on the total examination score.

An item shows positive discrimination if the proportion of students who got the item right is greater in the upper-performing group than in the lower-performing group, and zero discrimination if the proportions in the upper- and lower-performing groups are equal. (Negative discrimination, where more students in the lower group than in the upper group get the item right, marks a defective item.)

Discrimination Index | Item Evaluation
0.40 and up | Very good item
0.30-0.39 | Reasonably good item, but possibly subject to improvement
0.20-0.29 | Marginal item, usually needing and being subject to improvement
0.19 and below | Poor item, to be rejected or improved by revision

Maximum discrimination is the sum of the proportions of the upper and lower groups who answered the item correctly. Maximum discrimination is possible when half or fewer of the combined upper and lower groups answer the item correctly. Discriminating efficiency is the index of discrimination divided by the maximum discrimination.

Notation:
PUG = proportion of the upper group who got the item right
PLG = proportion of the lower group who got the item right
Di = discrimination index
DM = maximum discrimination
DE = discriminating efficiency

Formulas:
Di = PUG − PLG
DM = PUG + PLG
DE = Di / DM

Example: Eighty students took an examination in Algebra. For item number 6, six students in the upper group and four students in the lower group got the correct answer. Find the discriminating efficiency.

Given: number of students who took the exam = 80; 27% of 80 = 21.6 ≈ 22, which means there are 22 students in the upper-performing group and 22 students in the lower-performing group.
PUG = 6/22 = 27%
PLG = 4/22 = 18%
Di = PUG − PLG = 27% − 18% = 9%
DM = PUG + PLG = 27% + 18% = 45%
DE = Di / DM = 0.09 / 0.45 = 0.20, or 20%

This can be interpreted as: on average, the item is discriminating at 20% of the potential of an item of its difficulty.
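The worked example can be reproduced in a few lines; the variable names are invented, but the formulas are exactly those given above.

```python
# Difficulty and discrimination for the worked Algebra example (80 examinees).
n_examinees = 80
group_size = round(0.27 * n_examinees)  # upper/lower 27% -> 22 students each

upper_correct, lower_correct = 6, 4
p_ug = upper_correct / group_size       # ~0.27
p_lg = lower_correct / group_size       # ~0.18

difficulty = (p_ug + p_lg) / 2          # DF
d_index = p_ug - p_lg                   # Di
d_max = p_ug + p_lg                     # DM
d_efficiency = d_index / d_max          # DE

print(f"difficulty index DF = {difficulty:.2f}")
print(f"discrimination index Di = {d_index:.2f}")
print(f"discriminating efficiency DE = {d_efficiency:.0%}")  # 20%
```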
Measures of Attractiveness

To measure the attractiveness of the incorrect options (distracters) in multiple-choice tests, we count the number of students who selected each incorrect option in both the upper and lower groups. An incorrect option is said to be an effective distracter if more students in the lower group than in the upper group chose it.

Steps of Item Analysis
1. Rank the scores of the students from highest to lowest.
2. Select the 27% of the papers within the upper-performing group and the 27% of the papers within the lower-performing group.
3. Set aside the middle 46% of the papers, because they will not be used for item analysis.
4. Tabulate the number of students in the upper group and lower group who selected each alternative.
5. Compute the difficulty of each item.
6. Compute the discriminating power of each item.
7. Evaluate the effectiveness of the distracters.

MODULE 5: INTERPRETATION AND UTILIZATION OF TEST RESULTS

INTERPRETATION OF TEST RESULTS: CRITERION-REFERENCED VS. NORM-REFERENCED

STATISTICAL ORGANIZATION OF TEST SCORES

We shall discuss the different statistical techniques used in describing and analyzing test results:
1. Measures of central tendency (averages)
2. Measures of variability (spread of scores)
3. Measures of relationship (correlation)
4. Skewness

Measures of Central Tendency

A measure of central tendency is a single value used to identify the center of the data; it is thought of as the typical value in a set of scores. It tends to lie within the center of the scores when they are arranged from lowest to highest or vice versa. There are three commonly used measures of central tendency: the mean, the median and the mode.

The Mean

The mean is the most common measure of center, also known as the arithmetic average.

Sample mean = Σx / n

where Σx = sum of the scores, x = individual score, and n = number of scores.

Steps in solving for the mean using raw scores:
1. Get the sum of all the scores in the distribution.
2. Identify the number of scores (n).
3. Substitute into the given formula and solve for the mean value.

Example: Find the mean of the scores of students in an algebra quiz.
(x) scores in algebra: 45, 35, 48, 60, 44, 39, 47, 55, 58, 54
Σx = 485, n = 10
Mean = Σx / n = 485 ÷ 10 = 48.5

Properties of the Mean
1. It is easy to compute.
2. It may not be an actual observation in the data set.
3. It can be subjected to numerous mathematical computations.
4. It is the most widely used measure of center.
5. Every score in the distribution affects its value.
6. It is easily affected by extreme values.
7. It is applied to interval-level data.

The Median

The median is the point that divides the scores in a distribution into two equal parts when the scores are arranged according to magnitude, that is, from lowest to highest or highest to lowest. If the number of scores is odd, the median is the middle score. If the number of scores is even, the median is the average of the two middle scores.

Example 1: Find the median of the scores of 10 students in an algebra quiz.
(x) scores of students in algebra: 45, 35, 48, 60, 44, 39, 47, 55, 58, 54
First, arrange the scores from lowest to highest; since the number of cases is even, find the average of the two middlemost scores:
35, 39, 44, 45, 47, 48, 54, 55, 58, 60
Median = (47 + 48) / 2 = 47.5
50% of the scores in the distribution fall below 47.5.

Example 2: Find the median of the scores of 9 students in an algebra quiz.
(x) scores of students in algebra: 35, 39, 44, 45, 47, 48, 54, 55, 58
The median is the 5th score, which is 47. This means that 50% of the scores fall below 47.

Properties of the Median
1. It is not affected by extreme values.
2. It is applied to ordinal-level data.
3. It is the middlemost score in the distribution.
4. It is most appropriate when there are extreme scores.
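The mean and median examples above can be checked with Python's statistics module:

```python
import statistics

# A minimal check of the worked examples above using the algebra quiz scores.
scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]

print(statistics.mean(scores))    # 48.5
print(statistics.median(scores))  # 47.5 (average of the two middle scores)

# With an odd number of scores, the median is the single middle score.
print(statistics.median([35, 39, 44, 45, 47, 48, 54, 55, 58]))  # 47
```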
The Mode

The mode refers to the score or scores that occur most frequently in the distribution. There are three classifications of mode: (a) unimodal – a distribution with only one mode; (b) bimodal – a distribution with two modes; (c) multimodal – a distribution with more than two modes.

Properties of the Mode
1. It is the score or scores that occur most frequently.
2. It is a nominal average.
3. It can be used for qualitative and quantitative data.
4. It is not affected by extreme values.
5. It may not exist.

Example 1: Find the mode of the scores of students in an algebra quiz: 34, 36, 45, 65, 34, 45, 55, 61, 34, 46
Mode = 34, because it appears three times. The distribution is unimodal.

Example 2: Find the mode of the scores of students in an algebra quiz: 34, 36, 45, 61, 34, 45, 55, 61, 34, 45
Mode = 34 and 45, because both appear three times. The distribution is bimodal.

Measures of Variability

A measure of variability is a single value used to describe the spread of the scores in a distribution, that is, above or below the measure of central tendency. There are three commonly used measures of variability: the range, the quartile deviation and the standard deviation.

The Range

The range is the difference between the highest and lowest scores in the data set: R = HS − LS.

Properties of the Range
1. It is the simplest and crudest measure of variability.
2. It is a rough measure of variation.
3. The smaller the value, the closer the scores are to each other; the higher the value, the more scattered the scores are.
4. Its value fluctuates easily: a change in either the highest or the lowest score changes the value of the range.

Example: Below are the scores of 10 students in Mathematics and Science. Find the range of each. Which subject has greater variability?

Mathematics: 35, 33, 45, 55, 62, 34, 54, 36, 47, 40
Science: 35, 40, 25, 47, 55, 35, 45, 57, 39, 52

Mathematics: HS = 62, LS = 33, R = 62 − 33 = 29
Science: HS = 57, LS = 25, R = 57 − 25 = 32

Based on the computed values of the range, the scores in Science have greater variability; that is, the scores in Science are more scattered than the scores in Mathematics.

The Quartile Deviation

The quartile deviation is half of the difference between the third quartile (Q3) and the first quartile (Q1). It is based on the middle 50% of the range, instead of the range of the entire distribution. In symbols:

QD = (Q3 − Q1) / 2

where QD = quartile deviation, Q3 = third quartile value, Q1 = first quartile value.

Example: In the scores of 50 students, Q3 = 50.25 and Q1 = 25.45. Find the QD.
QD = (Q3 − Q1) / 2 = (50.25 − 25.45) / 2 = 24.8 / 2 = 12.4

The value QD = 12.4 indicates the distance we need to go above and below the median to include approximately the middle 50% of the scores.
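Here is a short sketch computing the range and quartile deviation for the Mathematics scores above. Note that quartile values depend on the interpolation method used, so statistics.quantiles may differ slightly from hand methods.

```python
import statistics

# Range and quartile deviation for the Mathematics scores above.
math_scores = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]

value_range = max(math_scores) - min(math_scores)
print(f"range = {value_range}")  # 29

q1, _median, q3 = statistics.quantiles(math_scores, n=4)  # quartile cut points
qd = (q3 - q1) / 2
print(f"Q1 = {q1}, Q3 = {q3}, quartile deviation = {qd}")
```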
The Mode
The mode refers to the score or scores that occur most frequently in the distribution. There are three classifications of mode: a) unimodal — a distribution with only one mode; b) bimodal — a distribution with two modes; c) multimodal — a distribution with more than two modes.

Properties of the Mode
1. It is the score or scores that occur most frequently
2. It is a nominal average
3. It can be used for qualitative and quantitative data
4. It is not affected by extreme values
5. It may not exist

Example 1. Find the mode of the scores of students in an algebra quiz: 34, 36, 45, 65, 34, 45, 55, 61, 34, 46
Mode = 34, because it appears three times. The distribution is unimodal.

Example 2. Find the mode of the scores of students in an algebra quiz: 34, 36, 45, 61, 34, 45, 55, 61, 34, 45
Mode = 34 and 45, because both appear three times. The distribution is bimodal.

Measures of Variability
A measure of variability is a single value used to describe how spread out the scores in a distribution are, that is, how far they lie above or below the measure of central tendency. There are three commonly used measures of variability: the range, the quartile deviation and the standard deviation.

The Range
The range is the difference between the highest and lowest scores in the data set: R = HS - LS.

Properties of the Range
1. It is the simplest and crudest measure
2. It is a rough measure of variation
3. The smaller the value, the closer the scores are to each other; the higher the value, the more scattered the scores are
4. Its value fluctuates easily: a change in either the highest score or the lowest score changes the range

Example: Below are the scores of 10 students in Mathematics and in Science. Find the range of each set. Which subject has greater variability?
Mathematics: 35, 33, 45, 55, 62, 34, 54, 36, 47, 40
Science: 35, 40, 25, 47, 55, 35, 45, 57, 39, 52

Mathematics: HS = 62, LS = 33, R = 62 - 33 = 29
Science: HS = 57, LS = 25, R = 57 - 25 = 32
Based on the computed ranges, the scores in Science have greater variability; that is, the Science scores are more scattered than the Mathematics scores.

The Quartile Deviation
The quartile deviation is half of the difference between the third quartile (Q3) and the first quartile (Q1). It is based on the middle 50% of the distribution rather than on its entire range. In symbols,
QD = (Q3 - Q1) / 2
where QD = quartile deviation, Q3 = third quartile value, Q1 = first quartile value

Example: In the scores of 50 students, Q3 = 50.25 and Q1 = 25.45. Find the QD.
QD = (Q3 - Q1) / 2 = (50.25 - 25.45) / 2 = 12.4
The value QD = 12.4 indicates the distance we need to go above and below the median to include approximately the middle 50% of the scores.
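A small Python sketch of both measures, using the Mathematics and Science scores above. The Q1 and Q3 values are taken directly from the worked example, since quartile values themselves depend on the convention used to compute them.

math_scores    = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
science_scores = [35, 40, 25, 47, 55, 35, 45, 57, 39, 52]

def score_range(xs):
    # R = HS - LS
    return max(xs) - min(xs)

print(score_range(math_scores))      # 29
print(score_range(science_scores))   # 32 -> Science scores are more scattered

# Quartile deviation: QD = (Q3 - Q1) / 2, with Q3 and Q1 from the example.
q3, q1 = 50.25, 25.45
qd = (q3 - q1) / 2                   # 12.4
print(qd)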
The Standard Deviation
The standard deviation is the most important and useful measure of variation; it is the square root of the variance. It is an average of the degree to which each score in the distribution deviates from the mean value. It is a more stable measure of variation than the range or the quartile deviation because it involves all the scores in the distribution.

SD = √( ∑(x - mean)² / (n - 1) )
where x = individual score and n = number of scores in the distribution

Example 1. Find the standard deviation of the scores of 10 students in an algebra quiz, using the data below.

x      (x - mean)²
45     12.25
35     182.25
48     0.25
60     132.25
44     20.25
39     90.25
47     2.25
55     42.25
58     90.25
54     30.25
∑x = 485, ∑(x - mean)² = 602.5, n = 10

Mean = ∑x / n = 485 / 10 = 48.5
SD = √(602.5 / 9) = √66.94 = 8.18
This means that, on the average, the amount by which a score deviates from the mean value of 48.5 is 8.18.

Example 2. Find the standard deviation of the scores of 10 students in Mathematics and in Science below. Which subject has greater variability?
Mathematics: 35, 33, 45, 55, 62, 34, 54, 36, 47, 40
Science: 35, 40, 25, 47, 55, 35, 45, 57, 39, 52

Solve for the standard deviation of the Mathematics scores:
Mathematics (x)     (x - mean)²
35                  82.81
33                  123.21
45                  0.81
55                  118.81
62                  320.41
34                  102.01
54                  98.01
36                  65.61
47                  8.41
40                  16.81
∑x = 441, Mean = 44.1, ∑(x - mean)² = 936.9

SD = √(936.9 / 9) = √104.1 = 10.20 for the Mathematics scores

Solve for the standard deviation of the Science scores:
Science (x)     (x - mean)²
35              64
40              9
25              324
47              16
55              144
35              64
45              4
57              196
39              16
52              81
∑x = 430, Mean = 430 / 10 = 43, ∑(x - mean)² = 918

SD = √(918 / 9) = √102 = 10.10 for the Science scores

The standard deviation for Mathematics is 10.20 and the standard deviation for Science is 10.10, which means that the Mathematics scores have greater variability. In other words, the scores in Mathematics are more scattered than the scores in Science.

Interpretation of the Standard Deviation
When the value of the standard deviation is large, the scores are, on the average, far from the mean. When the value of the standard deviation is small, the scores are, on the average, close to the mean.

Coefficient of Variation
The coefficient of variation is a measure of relative variation expressed as a percentage of the arithmetic mean. It is used to compare the variability of two or more sets of data even when the observations are expressed in different units of measurement. It can be solved using the formula
CV = (SD / Mean) x 100%
The lower the value of the coefficient of variation, the more closely the data cluster about the mean and the more homogeneous the performance of the group.

Example: A study showed the performance of two groups, A and B, in a certain test given by a researcher. Group A obtained a mean score of 87 points with a standard deviation of 8.5 points; Group B obtained a mean score of 90 points with a standard deviation of 10.25 points. Which of the two groups has a more homogeneous performance?

Group     Mean     Standard deviation
A         87       8.5
B         90       10.25

CV (Group A) = (8.5 / 87) x 100% = 9.77%
CV (Group B) = (10.25 / 90) x 100% = 11.39%
The CV of Group A is 9.77% and the CV of Group B is 11.39%, which means that Group A has the more homogeneous performance.
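The same computations in Python: sample_sd follows the n - 1 formula above, and the coefficient of variation is applied to the group means and standard deviations from the example.

import math

def sample_sd(xs):
    # SD = sqrt( sum((x - mean)^2) / (n - 1) )
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

algebra = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
print(round(sample_sd(algebra), 2))          # 8.18

math_scores    = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
science_scores = [35, 40, 25, 47, 55, 35, 45, 57, 39, 52]
print(round(sample_sd(math_scores), 2))      # 10.2
print(round(sample_sd(science_scores), 2))   # 10.1

# Coefficient of variation: CV = SD / mean * 100%.
cv_a = 8.5 / 87 * 100        # 9.77%  (Group A)
cv_b = 10.25 / 90 * 100      # 11.39% (Group B) -> Group A is more homogeneous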
Percentile Rank
The percentile rank of a score is the percentage of scores in the frequency distribution which are lower, that is, the percentage of examinees in the norm group who scored below the score of interest. Percentile ranks are commonly used to clarify the interpretation of scores on standardized tests.

Z-Score
The z-score (also known as the standard score) measures how many standard deviations an observation is above or below the mean. A positive z-score gives the number of standard deviations a score is above the mean; a negative z-score gives the number of standard deviations a score is below the mean. The z-score can be computed using the formulas

z = (x - µ) / σ for a population
z = (x - mean) / SD for a sample
where x is a raw score, σ is the standard deviation of the population, µ is the mean of the population, and SD is the standard deviation of the sample

Example: James Mark's examination results in three subjects are as follows. In which subject did James Mark perform best? Very poorly?

Subject             Mean     Standard deviation     James Mark's grade
Math Analysis       88       10                     95
Natural Science     85       5                      80
Labor Management    92       7.5                    94

z (Math Analysis) = (95 - 88) / 10 = 0.70
z (Natural Science) = (80 - 85) / 5 = -1.00
z (Labor Management) = (94 - 92) / 7.5 = 0.27

James Mark's grade in Math Analysis was 0.70 standard deviation above the mean of the Math Analysis grades, while in Natural Science he was 1.0 standard deviation below the mean of the Natural Science grades. His grade in Labor Management was 0.27 standard deviation above the mean of the Labor Management grades. Comparing the z-scores, James Mark performed best in Math Analysis and very poorly in Natural Science in relation to the group performance.

T-score
A T-score is obtained by multiplying the z-score by 10 and adding the product to 50. In symbols, T-score = 10z + 50.

Using the same exercise, compute the T-scores of James Mark in Math Analysis, Natural Science and Labor Management.
T-score (Math Analysis) = 10(0.70) + 50 = 57
T-score (Natural Science) = 10(-1) + 50 = 40
T-score (Labor Management) = 10(0.27) + 50 = 52.7
Since the highest T-score is in Math Analysis (57), we can conclude that James Mark performed better in Math Analysis than in Natural Science and Labor Management.

Stanine
The stanine, also known as standard nine, is a simple type of normalized standard score that illustrates the process of normalization. Stanines are single-digit scores ranging from 1 to 9. The distribution of raw scores is divided into nine parts:

Stanine                1     2     3     4     5     6     7     8     9
Percent in stanine     4%    7%    12%   17%   20%   17%   12%   7%    4%
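A Python sketch of the three standard scores. The stanine boundaries come from the cumulative form of the 4-7-12-17-20-17-12-7-4 split shown above; how ties at the exact boundaries are handled is a matter of convention, so this mapping is only illustrative.

import bisect

def z_score(x, mean, sd):
    # z = (x - mean) / SD
    return (x - mean) / sd

def t_score(z):
    # T-score = 10z + 50
    return 10 * z + 50

print(z_score(95, 88, 10))              # 0.7   (Math Analysis)
print(z_score(80, 85, 5))               # -1.0  (Natural Science)
print(round(z_score(94, 92, 7.5), 2))   # 0.27  (Labor Management)
print(t_score(0.70))                    # 57.0

# Cumulative percentages 4, 11, 23, 40, 60, 77, 89, 96 mark the stanine cuts.
cuts = [4, 11, 23, 40, 60, 77, 89, 96]
def stanine(percentile_rank):
    return bisect.bisect_left(cuts, percentile_rank) + 1

print(stanine(50))                      # 5
print(stanine(97))                      # 9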
Skewness
Skewness describes the degree of departure of the distribution of the data from symmetry. The degree of skewness is measured by the coefficient of skewness, denoted SK and computed as
SK = 3(mean - median) / SD

The normal curve is a symmetrical, bell-shaped curve; its tails are continuous and asymptotic, and its mean, median and mode are equal. The scores are normally distributed if the computed value of SK = 0.

Areas Under the Normal Curve

A distribution is positively skewed when the curve is skewed to the right: it has a long tail extending off to the right but a short tail to the left. This indicates the presence of a small proportion of relatively large extreme values (SK > 0). When the computed value of SK is positive, most of the students' scores are very low, meaning they performed poorly in the examination.

A distribution is negatively skewed when it is skewed to the left: it has a long tail extending off to the left but a short tail to the right. This indicates the presence of a small proportion of relatively small extreme values (SK < 0). When the computed value of SK is negative, most of the students got very high scores, meaning they performed very well in the examination.
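For illustration, a one-line Python version of the coefficient, applied to the algebra quiz values computed earlier (mean 48.5, median 47.5, SD 8.18):

def sk(mean, median, sd):
    # SK = 3 * (mean - median) / SD
    return 3 * (mean - median) / sd

print(round(sk(48.5, 47.5, 8.18), 2))   # 0.37 > 0: slightly skewed to the right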
MODULE 6: MARKS/GRADES AND GRADING SYSTEM

BASIC TERMINOLOGY
Marks = cumulative grades that reflect students' academic progress during a period of instruction
Score = reflects performance on a single assessment
Grades = can be used interchangeably with marks
Most of the time these terms are used to mean the same thing.

FEEDBACK AND EVALUATION
Test results can be used for a variety of reasons, such as informing students of their progress, evaluating achievement, and assigning grades.
Formative evaluation = activities aimed at providing feedback to students
Summative evaluation = activities that determine the worth, value, or quality of an outcome; often involves the assignment of a grade

INFORMAL AND FORMAL EVALUATION
Informal evaluation = not planned and not standardized; can come in the form of commentary such as "great work" or "try that one again"
Formal evaluation = more likely to be applied consistently and to be written out; includes scores and commentary, often written down

THE USE OF FORMATIVE EVALUATION IN SUMMATIVE EVALUATION
Sometimes formative assessment and evaluation can feed into summative evaluation. This is recommended more in courses of study that are topical rather than sequential, as mastery of earlier concepts may not reflect on the assessment of later ones, and vice versa.

REPORTING STUDENT PROGRESS: WHICH SYMBOLS TO USE?
This is often decided by the administration or the state. Most teachers are familiar with letter (A, B, C, D, F) and numerical (0-100) grades.
Verbal descriptors = grades like "excellent" or "needs improvement"
Pass-fail = a variant of mastery grading in which most students are expected to master the content (i.e., "pass")
Supplemental systems = using means of communication like phone calls home, checklists of objectives, or other methods to communicate feedback

BASIS OF GRADES
Before assigning grades, consider: Are the grades solely based on academic achievement, or are there other factors to consider? Factors could include attendance, participation, attitudes, etc. Most experts recommend making academic achievement the sole basis for assigning grades. If desired, the recommendation is to keep a separate rating system for such non-achievement factors so that achievement grades stay unbiased. When comparing grades (5th grade to 6th grade, for example) it is critical to consider how the grades were calculated: grades based heavily on homework will not be comparable to grades based heavily on testing.

FRAME OF REFERENCE
After deciding what to base your grades on, you will then have to decide how you are going to interpret and compare student scores. There are several different frames of reference that suit different needs.

NORM-REFERENCED GRADING (RELATIVE GRADING)
Involves comparing each student's performance to that of a reference group. Also known as "grading on a curve." In this arrangement, a certain proportion of students receives each grade (10% receive A's, 20% receive B's, and so on). It is a straightforward method of grading and helps reduce grade inflation. However, depending on the reference group used as a basis, this frame of reference is not always considered fair. Another approach is to use ranges instead of exact percentages (10-20% A's, 20-30% B's, etc.).

CRITERION-REFERENCED GRADING (ABSOLUTE GRADING)
Involves comparing a student's performance to a specified level of performance. One common system is the percentage system (A = 90-100%, B = 80-89%, etc.). Marks directly describe student performance. However, there may be considerable variability between teachers in how they assign grades (lower vs. higher expectations).

ACHIEVEMENT IN RELATION TO IMPROVEMENT OR EFFORT
Students who make higher learning gains earn better grades than those who make smaller gains. This method of grading can be risky, as students may figure out that starting the year or unit low and finishing high earns a better grade. There are also many other technical issues, including the fact that this is not a pure measure of achievement but a measure of effort as well. It can motivate poor students but may have a negative effect on strong students.

ACHIEVEMENT RELATIVE TO ABILITY
Usually based on performance on an intelligence test. There are numerous technical and consistency issues to be taken into consideration when using this approach.

RECOMMENDATION
Most experts recommend using absolute rather than relative grading systems, as they represent purer measures of student achievement. Both grading systems have their strengths and limitations, which should be taken into consideration when deciding which to use. Reporting both styles of grades is also an option.

COMBINING GRADES INTO A COMPOSITE
The decision of how much certain grades should weigh into the composite (or final grade) is up to the teacher or department and is based on the importance of different types of assignments (e.g., five response papers might be 10% each, with 12.5% assigned to each of 4 tests; this is different from 50% assigned to one major paper and 50% to one cumulative test). There are several different methods of equating scores into composite scores, although most schools have commercial gradebook programs that do this for the teacher. A minimal sketch is shown below.
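This sketch uses hypothetical weights and component marks; all components are assumed to already be on the same 0-100 scale, which is the equating step gradebook programs handle first. The letter cut-offs follow the percentage system mentioned under absolute grading.

# Hypothetical weights chosen by the teacher; they must sum to 1.
weights = {'response papers': 0.50, 'tests': 0.50}
marks   = {'response papers': 88.0, 'tests': 85.0}   # already on a 0-100 scale

# Composite = weighted sum of the component marks.
composite = sum(weights[k] * marks[k] for k in weights)   # 86.5

# Percentage-system letter grade (A = 90-100, B = 80-89, ...).
def letter(score):
    for cutoff, grade in [(90, 'A'), (80, 'B'), (70, 'C'), (60, 'D')]:
        if score >= cutoff:
            return grade
    return 'F'

print(composite, letter(composite))   # 86.5 B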
INFORMING STUDENTS OF THE GRADING SYSTEM
Students should be informed early in the course about exactly how they will be graded, well before any assessment procedures have taken place. Parents should also be informed of grading procedures. It is the professional responsibility of a teacher to explain scores and grades to students and parents in ways that are understandable. This can be done by simply handing out a sheet with a breakdown of the weights of the different grades, though it is recommended that Q & A sessions be conducted.

PARENT CONFERENCES
Parent-teacher conferences should be professional, and the information disclosed should be kept confidential. Discussion should concern only the individual student. Teachers should have a file folder or computer file of the student's performance and grades readily available. Presenting the student's work as evidence and an indicator of grades is also recommended.

FUNCTIONS OF GRADING AND REPORTING IN A GRADING SYSTEM
1. Improve students' learning by:
   clarifying instructional objectives for them
   showing students' strengths and weaknesses
   providing information on personal-social development
   enhancing students' motivation (e.g., short-term goals)
   indicating where teaching might be modified
   Best achieved by day-to-day tests and feedback plus periodic integrated summaries.
2. Reports to parents/guardians
   Communicates objectives to parents, so they can help promote learning
   Communicates how well objectives are being met, so parents can better plan
3. Administrative and guidance uses
   Help decide promotion, graduation, honors, athletic eligibility
   Report achievement to other schools or to employers
   Provide input for realistic educational, vocational, and personal counseling

TYPES OF GRADING AND REPORTING SYSTEMS
1. Traditional letter-grade system
   Easy to use, and grades can be averaged
   But of limited value when used as the sole report, because:
   1. they end up being a combination of achievement, effort, work habits and behavior
   2. teachers differ in how many high (or low) grades they give
   3. they are therefore hard to interpret
   4. they do not indicate patterns of strength and weakness
2. Pass-fail
   Popular in some elementary schools
   Used to allow exploration in high school/college
   Should be kept to a minimum, because:
   1. it does not provide much information
   2. students work to the minimum
   In mastery learning courses, the grade can be left blank until the "mastery" threshold is reached
3. Checklists of objectives
   Most common in elementary school
   Can either replace or supplement letter grades
   Each item in the checklist can be rated: Outstanding, Satisfactory, Unsatisfactory; A, B, C; etc.
   The problem is to keep the list manageable and understandable
4. Letters to parents/guardians
   Useful supplement to grades
   Limited value as the sole report, because:
   1. very time consuming
   2. accounts of weaknesses are often misinterpreted
   3. not systematic or cumulative
   Great tact is needed in presenting problems (lying, etc.)
5. Portfolios
   A set of purposefully selected work, with commentary by student and teacher
   Useful for:
   1. showing a student's strengths and weaknesses
   2. illustrating the range of student work
   3. showing progress over time or over the stages of a project
   4. teaching students about the objectives/standards they are to meet
6. Parent-teacher conferences
   Used mostly in elementary school
   Portfolios (when used) are a useful basis for discussion
   Useful for:
   1. two-way flow of information
   2. getting more information and cooperation from parents
   Limited in value as the major report, because:
   1. time consuming
   2. provides no systematic record of progress
   3. some parents won't come

HOW SHOULD YOU DEVELOP A GRADING SYSTEM?
1. Guided by the functions to be served
   Will probably be a compromise, because functions often conflict
   But always keep achievement separate from effort
2. Developed cooperatively (parents, students, school personnel)
   A more adequate system, more understandable to all
3. Based on a clear statement of learning objectives
   The same objectives that guided instruction and assessment
   Some are general, some are course-specific
   The aim is to report progress on those objectives
   Practicalities may impose limits, but always keep the focus on objectives
4. Consistent with school standards
   Should support, not undermine, school standards
   Should use the school's categories for grades and performance standards
   Should actually measure what is described in those standards
5. Based on adequate assessment
   Implication: don't promise something you cannot deliver
   Design a system for which you can get reliable, valid data
6. Based on the right level of detail
   Detailed enough to be diagnostic but compact enough to be practical:
   1. not too time consuming to prepare and use
   2. understandable to all users
   3. easily summarized for school records
   Probably means a letter-grade system with more detailed supplementary reports
7. Providing for parent-teacher conferences as needed
   Regularly scheduled for elementary school
   As needed for high school
ASSIGNING LETTER GRADES

What to include?
Only achievement. Avoid the temptation to include effort for less able students, because:
1. it is difficult to assess effort or potential
2. it is difficult to distinguish ability from achievement
3. it would mean grades don't mean the same thing for everyone (a mixed message, and unfair)

How to combine data?
Properly weight each component to create a composite. All components must be put on the same scale to be weighted properly:
1. equate the ranges of scores (see the example on p. 389, where students score 10-50 on one test and 80-100 on another)
2. or convert all scores to T-scores or other standard scores (see chapter 19)

What frame of reference?
Relative — score compared to other students (where you rank)
1. the grade (like a class rank) depends on what group you are in, not just your own performance
2. the typical grade may be shifted up or down, depending on the group's ability
3. widely used, because much classroom testing is norm-referenced
Absolute — score compared to specified performance standards (what you can do)
1. the grade does NOT depend on what group you are in, but only on your own performance compared to a set of performance standards
2. a complex task, because one must
   I. clearly define the domain
   II. clearly define and justify the performance standards
   III. do criterion-referenced assessment
3. these conditions are hard to meet except in complete mastery learning settings
Learning ability or improvement — score compared to learning "potential" or past performance
1. widely used in elementary schools
2. inconsistent with a standards-based system (each child is their own standard)
3. reliably estimating learning ability (separate from achievement) is very difficult
4. change cannot be measured reliably with classroom measures
5. therefore, should only be used as a supplement

What distribution of grades?
Relative (the students have been ranked) — distribution is a big issue
1. the normal curve is defensible only with a large, unselected group
2. when "grading on the curve," school staff should set fair ranges of grades for different groups and courses
3. when "grading on the curve," any pass-fail decision should be based on an absolute standard (i.e., failing the minimum essentials)
4. standards and ranges should be understood and followed by all teachers
Absolute (absolute levels of knowledge have been assessed) — not an issue
1. such a system seldom uses letter grades alone
2. it often includes checklists of what has been mastered (see the example on p. 395)
3. the distribution of grades is not predetermined

Guidelines for Effective Grading
1. Describe grading procedures to students at the beginning of instruction.
2. Clarify that the course grade will be based on achievement only.
3. Explain how other factors (effort, work habits, etc.) will be reported.
4. Relate grading procedures to the intended learning outcomes.
5. Obtain valid evidence (tests, etc.) for assigning grades.
6. Try to prevent cheating.
7. Return and review all test results as soon as possible.
8. Properly weight the various types of achievement included in the grade.
9. Do not lower an achievement grade for tardiness, weak effort, or misbehavior.
10. Be fair. Avoid bias. When in doubt, review the evidence. If still in doubt, give the higher grade.
Conducting Parent-Teacher Conferences
Conferences are productive when they are carefully planned and the teacher is skilled.

Guidelines for a good conference
1. Make plans
   Review your goals
   Organize the information to present
   Make a list of points to cover and questions to ask
   If bringing portfolios, select and review them carefully
2. Start positive — and maintain a positive focus
3. Present the student's strong points first
   It is helpful to have examples of work to show strengths and needs
   Compare early vs. later work to show improvement
4. Encourage parents to participate and share information
   Be willing to listen
   Be willing to answer questions
5. Plan actions cooperatively
   What steps you can each take
   Summarize at the end
6. End with a positive comment
   It should not be a vague generality
   It should be true
7. Use good human relations skills
   DO:
   Be friendly and informal
   Be positive in approach
   Be willing to explain in understandable terms
   Be willing to listen
   Be willing to accept parents' feelings
   Be careful about giving advice
   DON'T:
   Argue or get angry
   Ask embarrassing questions
   Talk about other students, parents, or teachers
   Bluff if you don't know
   Reject parents' suggestions
   Be a know-it-all with pat answers

Reporting Standardized Test Results to Parents
Aims
   Present test results in understandable language, not jargon
   Put test results in the context of the total pattern of information about the student
   Keep it brief and simple
Actions
1. Describe what the test measures
   Use a general statement, e.g., "this test measures skills and abilities that are useful in school learning"
   Refer to any part of the test report that may list skill clusters
   Avoid misunderstandings by:
   a. not referring to tests as "intelligence" tests
   b. not describing aptitudes and abilities as fixed
   c. not saying that a test predicts outcomes for an individual person (you can say "people with this score usually...")
   Let a counselor present results for any non-cognitive test (personality, interests, etc.)
2. Explain the meaning of test scores (chapter 19 is devoted to this)
   For norm-referenced scores:
   1. explain the norm group
   2. explain the score type (percentile, stanine, etc.)
   3. stay with one type of score, if possible
   For criterion-referenced scores:
   1. more easily understood than norm-referenced
   2. usually in terms of relative degree of mastery
   3. describe the standard of mastery
   4. may need to distinguish percentile from percent correct
3. Clarify the accuracy of scores
   Say that all tests have error
   Stanines already take account of error (because they are so broad); a two-stanine difference is probably a real difference
   For other scores, use confidence bands when presenting them
   If you refer to subscales with few items, describe them as only "clues" and look for related evidence
4. Discuss the use of test results
   Coordinate all information to show what action it suggests

Decisions in Assigning Grades
1. What should grades include (effort, achievement, neatness, spelling, good behavior, etc.)?
2. Should grades for individual assessments be criterion-referenced or norm-referenced?
   1. if criterion-referenced, what standard?
   2. if norm-referenced, what reference group?
   Letter grades or numbers?
3. Combining assessments into a composite grade
   What common numerical scale?
   1. percentages
   2. standard scores
   3. range of scores (max-min)
   4. combining absolute and relative grades
   What weight to give different assessments?
   What cut-off points for letter grades?
MODULE 7: AUTHENTIC ASSESSMENT

MODES OF ASSESSMENT
A. Traditional Assessment
1. Assessment in which students typically select an answer or recall information to complete the assessment. Tests may be standardized or teacher-made, and may be multiple-choice, fill-in-the-blanks, true-false, or matching type.
2. Indirect measures of assessment, since the test items are designed to represent competence by extracting knowledge and skills from their real-life context.
3. Items on standardized instruments tend to test only the domain of knowledge and skill, to avoid ambiguity for the test takers.
4. One-time measures that rely on a single correct answer to each item. There is limited potential for traditional tests to measure higher-order thinking skills.

B. Performance Assessment
1. Assessment in which students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills.
2. Direct measures of student performance, because the tasks are designed to incorporate the contexts, problems, and solution strategies that students would use in real life.
3. Designed as ill-structured challenges, since the goal is to help students prepare for the complex ambiguities of life.
4. Focused on processes and rationales. There is no single correct answer; instead, students are led to craft polished, thorough and justifiable responses, performances and products.
5. Involve long-range projects, exhibits, and performances linked to the curriculum.
6. The teacher is an important collaborator in creating tasks, as well as in developing guidelines for scoring and interpretation.

C. Portfolio Assessment
1. A portfolio is a collection of a student's work specifically selected to tell a particular story about the student.
2. A portfolio is not a pile of student work that accumulates over a semester or year.
3. A portfolio contains a purposefully selected subset of student work.
4. It measures the growth and development of students.

TRADITIONAL ASSESSMENT VS. AUTHENTIC ASSESSMENT
Traditional ----------------------------- Authentic
Selecting a response ------------------- Performing a task
Contrived -------------------------------- Real-life
Recall/recognition ----------------------- Construction/application
Teacher-structured ---------------------- Student-structured
Indirect evidence ------------------------ Direct evidence

Seven Criteria in Selecting a Good Performance Assessment Task
1. Authenticity – the task is similar to what students might encounter in the real world, as opposed to encountering it only in school.
2. Feasibility – the task can realistically be implemented in relation to its cost, space, time, and equipment requirements.
3. Generalizability – the likelihood that the students' performance on the task will generalize to comparable tasks.
4. Fairness – the task is fair to all students regardless of their social status or gender.
5. Teachability – the task allows one to master the skill that one should be proficient in.
6. Multiple foci – the task measures multiple instructional outcomes.
7. Scorability – the task can be reliably and accurately evaluated.

Rubrics
A rubric is a scoring scale and instructional tool used to assess the performance of students with a task-specific set of criteria. It contains two essential parts: the criteria for the task and the levels of performance for each criterion. It provides teachers an effective means of student-centered feedback and evaluation of students' work, and it enables teachers to provide detailed and informative evaluations of their performance. A rubric is especially important when you are measuring the performance of students against a set of standards or a pre-determined set of criteria. Through the use of scoring rubrics, teachers can determine the strengths and weaknesses of their students, which in turn enables the students to develop their skills.

Steps in Developing a Rubric
1. Identify your standards, objectives and goals for your students. A standard is a statement of what students should know or be able to perform, and it should indicate that your students are expected to meet it. Know also the goals of instruction: what are the intended learning outcomes?
2. Identify the characteristics of good performance on the task — the criteria. When students perform the task or present their work well, the criteria indicate that they have met the particular standard.
3. Identify the levels of performance for each criterion. There is no fixed guideline on the number of levels of performance; it varies according to the task and needs. A rubric can have as few as two levels of performance or as many as the teacher can develop, so long as the rater can sufficiently discriminate among student performances on each criterion. Through these levels of performance, the teacher or rater can provide more detailed feedback about student performance, and it becomes easier for both teacher and students to identify the areas needing improvement.

Types of Rubrics
1. Holistic Rubrics
A holistic rubric does not list separate levels of performance for each criterion. Rather, it assigns a level of performance across multiple criteria as a whole; in other words, all the components are considered together.
Advantage: quick scoring; provides an overview of student achievement.
Disadvantage: does not provide detailed information about student performance in specific areas of content and skill; it may be difficult to settle on one overall score.
2. Analytic Rubrics
With an analytic rubric, the teacher or rater identifies and assesses the components of a finished product. It breaks down the final product into its component parts, and each part is scored independently. The total score is the sum of the ratings for all the parts being assessed or evaluated, as sketched below. In analytic scoring, it is very important for the rater to treat each part separately to avoid bias toward the whole product.
Advantage: more detailed feedback; scoring is more consistent across students and graders.
Disadvantage: time consuming to score.
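As an illustration of how an analytic rubric total is computed, here is a short Python sketch using the three hypothetical criteria from the example analytic rubric that follows, each rated independently on a 1-3 (Limited/Acceptable/Proficient) scale.

# Hypothetical analytic rubric ratings: each criterion is scored
# independently, and the total is the sum of the parts.
ratings = {
    'made good observations': 3,    # Proficient
    'made good predictions': 2,     # Acceptable
    'appropriate conclusion': 2,    # Acceptable
}

total = sum(ratings.values())       # 7
maximum = 3 * len(ratings)          # 9
print(f'{total}/{maximum}')         # 7/9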
Example of a Holistic Rubric
3 – Excellent Researcher
   Included 10-12 sources
   No apparent historical inaccuracies
   Can easily tell which sources information was drawn from
   All relevant information is included
2 – Good Researcher
   Included 5-9 sources
   Few historical inaccuracies
   Can tell with difficulty where information came from
   Bibliography contains most relevant information
1 – Poor Researcher
   Included 1-4 sources
   Lots of historical inaccuracies
   Cannot tell from which source information came
   Bibliography contains very little information

Example of an Analytic Rubric
Criterion: Made good observations
   Limited (1): Observations are absent or vague
   Acceptable (2): Most observations are clear and detailed
   Proficient (3): All observations are clear and detailed
Criterion: Made good predictions
   Limited (1): Predictions are absent or irrelevant
   Acceptable (2): Most predictions are reasonable
   Proficient (3): All predictions are reasonable
Criterion: Appropriate conclusion
   Limited (1): Conclusion is absent or inconsistent with observations
   Acceptable (2): Conclusion is consistent with most observations
   Proficient (3): Conclusion is consistent with observations

Advantages of Using Rubrics
When assessing the performance of students using performance-based assessment, it is very important to use scoring rubrics. The advantages of using rubrics in assessing student performance are:
1. Rubrics allow assessment to become more objective and consistent
2. Rubrics clarify the criteria in specific terms
3. Rubrics clearly show students how their work will be evaluated and what is expected
4. Rubrics promote student awareness of the criteria to use in assessing peer performance
5. Rubrics provide useful feedback regarding the effectiveness of the instruction; and
6. Rubrics provide benchmarks against which to measure and document progress

PERFORMANCE-BASED ASSESSMENT
Performance-based assessment is a direct and systematic observation of the actual performances of students based on a pre-determined performance criterion, as cited by Gabuyo (2011). It is an alternative form of assessing student performance that represents a set of strategies for the application of knowledge, skills and work habits through the performance of tasks that are meaningful and engaging to students.

Framework of Assessment Approaches
Selection type: true-false; multiple-choice; matching type
Supply type: completion; label a diagram; short answer; concept map
Product: essay, story or poem; writing portfolio; research report; portfolio exhibit, art exhibit; writing journal
Performance: oral presentation or report; musical, dance or dramatic performance; typing test; diving; laboratory demonstration; cooperation in group work

Forms of Performance-Based Assessment
1. Extended-response tasks
   a. Activities for a single assessment may be multiple and varied
   b. Activities may be extended over a period of time
   c. Products from different students may differ in focus
2. Restricted-response tasks
   a. The intended performances are more narrowly defined than in extended-response tasks
   b. Questions may begin like a multiple-choice or short-answer stem, but then ask for an explanation or justification
   c. May have introductory material like an interpretive exercise, but then ask for an explanation of the answer, not just the answer itself
3. Portfolios — a portfolio is a purposeful collection of student work that exhibits the student's efforts, progress and achievements in one or more areas

Uses of Performance-Based Assessment
1. Assessing complex cognitive outcomes such as analysis, synthesis and evaluation
2. Assessing non-writing performances and products
3. When using performance-based assessment, one must carefully specify the learning outcomes and construct an activity or task that actually calls them forth.

Focus of Performance-Based Assessment
Performance-based assessment can assess the process, the product, or both, depending on the learning outcomes. It also involves doing, rather than just knowing about, the activity or task. The teacher will assess the effectiveness of the process or procedures and of the product used in carrying out the instruction. The question is: when should the process be assessed, and when the product?

Assess the process when:
1. There is no product;
2. The process is orderly and directly observable;
3. Correct procedures/steps are crucial to later success;
4. Analysis of the procedural steps can help in improving the product;
5. Learning is at an early stage.

Assess the product when:
1. Different procedures result in an equally good product;
2. The procedures are not available for observation;
3. The procedures have already been mastered;
4. The product has qualities that can be identified and judged.

The final step in performance assessment is to assess and score the student's performance. To assess the performance of students, the evaluator can use the checklist approach, the narrative or anecdotal approach, the rating scale approach, or the memory approach. The evaluator can give feedback on a student's performance in the form of a narrative report or a grade. There are different ways to record the results of performance-based assessments.

1. Checklist Approach. Checklists are observation instruments that break a performance down into discrete elements. The teacher has to indicate only whether or not certain elements are present in the performance.
2. Narrative/Anecdotal Approach. A continuous description of student behavior as it occurs, recorded without judgment or interpretation. The teacher writes narrative reports of what was done during each performance. From these reports, teachers can determine how well their students met the standards.
3. Rating Scale Approach. A checklist that allows the evaluator to record information on a scale, noting finer distinctions than the mere presence or absence of a behavior. The teacher indicates to what degree the standards were met. Usually teachers use a numerical scale; for instance, a teacher may rate each criterion on a scale of one to five, with one meaning "skill barely present" and five meaning "skill extremely well executed."
4. Memory Approach. The teacher observes the students performing the tasks without taking any notes and uses the information from memory to determine whether or not the students were successful. This approach is not recommended for assessing student performance.

PORTFOLIO ASSESSMENT
Portfolio assessment is the systematic, longitudinal collection of student work created in response to specific, known instructional objectives and evaluated in relation to the same criteria. A student portfolio is a purposeful collection of student work that exhibits the student's efforts, progress and achievements in one or more areas. The collection must include student participation in selecting the contents, the criteria for selection, the criteria for judging merit, and evidence of student self-reflection.
Comparison of Portfolio and Traditional Forms of Assessment
Traditional Assessment ----------------------------- Portfolio Assessment
Measures student's ability at one time ------ Measures student's ability over time
Done by the teacher alone; students are not aware of the criteria ------ Done by the teacher and the students; students are aware of the criteria
Conducted outside instruction ------ Embedded in instruction
Assigns the student a grade ------ Involves the student in his or her own assessment
Does not capture the student's language ability ------ Captures many facets of language learning performance
Does not include the teacher's knowledge of the student as a learner ------ Allows for expression of the teacher's knowledge of the student as a learner
Does not give the student responsibility ------ The student learns how to take responsibility

THREE TYPES OF PORTFOLIO
There are three basic types of portfolio to consider for classroom use: the working portfolio, the showcase portfolio and the progress portfolio.

1. Working Portfolio
The first type is the working portfolio, also known as the "teacher-student portfolio." As the name implies, it is a project "in work": it contains work in progress as well as finished samples of work used for reflection by students and teachers. It documents the stages of learning and provides a progressive record of student growth. This interactive teacher-student portfolio aids communication between teacher and student. The working portfolio may be used to diagnose student needs: both student and teacher have evidence of the student's strengths and weaknesses in achieving the learning objectives, information extremely useful in designing future instruction.

2. Showcase Portfolio
The showcase portfolio, also known as the best-works or display portfolio, focuses on the student's best and most representative work; it exhibits the student's best performance. A best-works portfolio may document student activities beyond school, for example a story written at home. It is like an artist's portfolio, where a variety of work is selected to reflect breadth of talent; painters exhibit their best paintings. In this portfolio, the student selects what he or she thinks is representative work. This folder is most often seen at open houses and parent visitations. The most rewarding use of student portfolios is the display of students' best work, the work that makes them proud; it encourages self-assessment and builds students' self-esteem. The pride and sense of accomplishment that students feel make the effort well worthwhile and contribute to a culture of learning in the classroom.

3. Progress Portfolio
The third type is the progress portfolio, also known as the teacher alternative assessment portfolio. It contains examples of the same types of student work done over a period of time, which are used to assess progress. All the works in this type of portfolio are scored, rated, ranked, or evaluated. Teachers can keep individual student portfolios that are solely for the teacher's use as an assessment tool. This is a focused type of portfolio and a model approach to assessment. Assessment portfolios are used to document student learning on specific curriculum outcomes and to demonstrate the extent of mastery in any curricular area.

Uses of Portfolios
1. Portfolios can provide both formative and summative opportunities for monitoring progress toward reaching identified outcomes.
2. Portfolios can communicate concrete information about what is expected of students in terms of the content and quality of performance in specific curriculum areas.
3. Portfolios allow students to document aspects of their learning that do not show up well in traditional assessments.
4. Portfolios are useful for showcasing periodic or end-of-year accomplishments of students, such as poetry, reflections on growth, samples of best work, etc.
5. Portfolios may also be used to facilitate communication between teachers and parents regarding a child's achievement and progress over a certain period of time.
6. Administrators may use portfolios for national competency testing, to grant high school credit, or to evaluate educational programs.
7. Portfolios may be assembled for a combination of purposes, such as instructional enhancement and progress documentation. A teacher reviews students' portfolios periodically and makes notes for revising instruction for next year's use.

According to Mueller (2010), there are seven steps in developing student portfolios:
1. Purpose: What is the purpose of the portfolio?
2. Audience: For what audience will the portfolio be created?
3. Content: What samples of student work will be included?
4. Process: What processes (e.g., selection of work to be included, reflection on work, conferencing) will be engaged in during the development of the portfolio?
5. Management: How will time and materials be managed in the development of the portfolio?
6. Communication: How and when will the portfolio be shared with pertinent audiences?
7. Evaluation: If the portfolio is to be used for evaluation, when and how should it be evaluated?

Guidelines for Assessing Portfolios
1. Include enough documents (items) on which to base judgment
2. Structure the contents to provide scorable information
3. Develop judging criteria and a scoring scheme for raters to use in assessing the portfolios
4. Use observation instruments such as checklists and rating scales when possible to facilitate scoring
5. Use trained evaluators or assessors

GUIDANCE AND COUNSELING
Guidance and counseling are both processes for solving problems of life; they differ only in the approach used. In guidance, the client's problems are listened to carefully and ready-made solutions are provided by the expert. In counseling, the client's problems are discussed and relevant information is provided along the way; through this information, the client gains insight into the problem and becomes empowered to make his own decision. The guidance counselor assists each student to benefit from the school experience through attention to the student's personal, social and academic needs.

Guidance (Downing), as pointed out by Lao (2006), is an organized set of specialized services established as an integral part of the school environment, designed to promote the development of students and assist them toward a realization of sound, wholesome adjustment and maximum accomplishment commensurate with their potentialities.

Guidance (Good) is a process of dynamic interpersonal relationships designed to influence the attitude and subsequent behavior of the person.

Counseling is both a process and a relationship. It is a process by which concentrated attention is given by both counselor and counselee to the problems and concerns of the student in a setting of privacy, warmth, mutual acceptance and confidentiality. As a process, it utilizes appropriate tools and procedures which contribute to the experience.
Counseling is also a relationship characterized by trust, confidence and intimacy, in which the student gains the intellectual and emotional stability from which he can resolve difficulties, make plans and realize his greatest self-fulfillment.

Villar (2007) pointed out the different guidance services based on the Rules and Regulations of Republic Act 9258 (Rule 1, Section 3; Manila Standard, 2007) and other services not mentioned in the Rules and Regulations:
1. Individual inventory/analysis
2. Information
3. Counseling
4. Research
5. Placement
6. Referral
7. Follow-up
8. Evaluation
9. Consultation
10. Program development
11. Public relations

Roles of the Guidance Counselor
The five roles of the guidance counselor are discussed by Dr. Imelda V.G. Villar in her book Implementing a Comprehensive Guidance and Counseling Program in the Philippines (2007):
1. As counselor
2. As coordinator
3. As consultant
4. As conductor of activities
5. As change agent

Essential Elements of the Counseling Process
1. Anticipating the interview
2. Developing a positive working relationship
3. Exploring feelings and attitudes
4. Reviewing and determining present status
5. Exploring alternatives
6. Reaching a decision
7. Post-counseling contact

Techniques and Methodologies Used in the Guidance Process
1. Autobiography
2. Anecdotal record
3. Case study
4. Cumulative record
5. Interview
6. Observation
7. Projective techniques
8. Rating scale
9. Sociometry

Ethical Considerations of the Counselor
1. The counselor's responsibility to the client and to his family
2. Recognizing the boundaries of one's competence and one's own personal and professional limitations
3. Confidentiality
4. Imposition of one's values and philosophy of life on the client is considered unethical

Four Important Functions of Guidance Services
1. Counseling
   Individual counseling
   Small-group counseling
   Crisis counseling
   Career counseling
   Referrals
   Peer helping programs
2. Prevention
   Primary, secondary and tertiary plans and programs
   Individual assessments
   Coordinated student support team activities
   Student activities
   Transitional planning

REFERENCES
Acero, V.O. et al. (2000). Principles and strategies of learning. Manila: Rex Book Store.
Calderon, J.F. and Gonzales, E.C. (1993). Measurement and evaluation. Manila: Solares Printing Press.
Calmorin, L.P. (1984). Educational measurement and evaluation. Metro Manila: National Book Store, Inc.
Garcia, C.D. (2004). Educational measurement and evaluation. Mandaluyong City: Books Atbp. Publishing Corp.
Hopkins, C.D. et al. (1990). Classroom measurement and evaluation. Illinois: F.E. Peacock Publishers, Inc.
Mercado-Del Rosario, A.C. (2001). Educational measurement and evaluation. Manila: JNPM Design and Printing.
McMillan, J.H. (2004). Classroom assessment: Principles and practice for effective instruction. Boston: Pearson Education, Inc.
Navarro, R.L. et al. (2013). Authentic assessment of student learning outcomes: Assessment of learning 2. Quezon City, Metro Manila, Philippines: LORIMAR Publishing, Inc.
Seng, T.O. et al. (2003). Educational psychology: A practitioner-researcher approach. USA: Thomson Asia Pub. Ltd.
Tileston, D.W. (2004). Student assessment. California: Corwin Press.

Internet sites:
http://jonathan.mueller.faculty.noctrl.edu
http://www1.udel.edu/educ/gottfredson/451/unit11-chap15.htm
https://www.google.com/search?q=uses+of+marks+and+grades+in+assessment