Review: Performance-Based Assessments
• Performance-based assessment
  • Real-life setting
  • H.O.T.S. (higher-order thinking skills)
• Techniques:
  • Observation
  • Individual or group projects
  • Portfolios
  • Performances
  • Student logs or journals
• Developing performance-based assessments
  • Determining the purpose of assessment
  • Deciding what constitutes student learning
  • Selecting the appropriate assessment task
  • Setting performance criteria

Review: Grading
• Grading process: objectives of instruction → test selection and administration → results compared to standards
• Making grading fair, reliable, and valid
  • Determine defensible objectives
  • Ability-group students
  • Construct tests which reflect objectivity
  • No test is perfectly reliable
  • Grades should reflect status, not improvement
  • Do not use grades to reward good effort
  • Consider grades as measurements, not evaluations
• Final grades

Cognitive Assessments: Physical Fitness Knowledge
HPER 3150, Dr. Ayers

Test Planning
• Types
  • Mastery (e.g., driver's license): meets minimum requirements
  • Achievement (e.g., mid-term): discriminates among levels of accomplishment
• Table of Specifications (supports content-related validity)
  • Content objectives: history, values, equipment, etiquette, safety, rules, strategy, techniques of play
  • Educational objectives (Bloom's taxonomy, 1956): knowledge, comprehension, application, analysis, synthesis, evaluation

Table of Specifications for a 33-Item Exercise Physiology Concepts Test (Ask-PE; Ayers, 2003)
• T of SPECS-E.doc (a worked sketch of building such a table follows the Essay Questions slide below)

Test Characteristics
• When to test
  • Often enough for reliability, but not so often that testing becomes useless
• How many questions (p. 145-6 guidelines)
  • More items yield greater reliability (quantified in the Spearman-Brown sketch after the Essay Questions slide)
• Format to use (p. 147 guidelines)
  • Oral (no), group (no), written (yes)
• Open book/note, take-home
  • Advantages: ↓ anxiety; can ask more application questions
  • Disadvantages: ↓ incentive to prepare; uncertainty about who does the work

Test Characteristics
• Question types
  • Semi-objective: short-answer, completion, mathematical
  • Objective: true/false, matching, multiple-choice, classification
  • Essay

Semi-Objective Questions
• Short-answer, completion, mathematical
• When to use: factual and recall material
• Weaknesses
• Construction recommendations (p. 151)
• Scoring recommendations (p. 152)

Objective Questions
• True/false, matching, multiple-choice
• When to use (multiple-choice: MOST IDEAL)
• FORM7 (B,E).doc
• P. 160-3: multiple-choice guidelines
• Construction recommendations (p. 158-60)
• Scoring recommendations (p. 163-4)

Figure 8.1 The difference between extrinsic and intrinsic ambiguity (A is correct): a too-easy item, an extrinsically ambiguous item (weak students miss it), and an intrinsically ambiguous item (all foils equally appealing).

Cognitive Assessments I
• Explain one thing that you learned today to a classmate

Review: Cognitive Assessments I
• Test types
  • Mastery, achievement
• Table of Specifications
  • Identify content, assign cognitive demands, weight areas
  • Provides support for what type of validity?
• Question types
  • Semi-objective: short-answer, completion, mathematical
  • Objective: true/false, matching, multiple-choice
• Which is desirable: intrinsic or extrinsic ambiguity?

Essay Questions
• When to use (definitions, interpretations, comparisons)
• Weaknesses
• Scoring
  • Objectivity
• Construction & scoring recommendations (p. 167-9)
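Below is a minimal sketch of how a Table of Specifications can be turned into per-cell item counts for a 33-item test. The content and cognitive weights shown are illustrative assumptions, not the actual Ask-PE (Ayers, 2003) weights; substitute your own emphases.

```python
"""Sketch: turning a Table of Specifications into per-cell item counts.

All weights are illustrative assumptions, NOT the actual Ask-PE
(Ayers, 2003) weights; substitute your own content/cognitive emphases.
"""

TOTAL_ITEMS = 33

# Relative emphasis on each content objective (weights sum to 1.0).
CONTENT = {
    "rules": 0.30,
    "strategy": 0.25,
    "techniques of play": 0.25,
    "safety": 0.10,
    "history/values/etiquette": 0.10,
}

# Relative emphasis on the Bloom (1956) levels used (weights sum to 1.0).
COGNITIVE = {"knowledge": 0.40, "comprehension": 0.30, "application": 0.30}


def build_table(total, content, cognitive):
    """Allocate whole items to each content x cognitive cell by weight."""
    raw = {(c, g): total * cw * gw
           for c, cw in content.items()
           for g, gw in cognitive.items()}
    counts = {k: int(v) for k, v in raw.items()}  # floor first
    shortfall = total - sum(counts.values())
    # Largest-remainder rounding so the counts still sum to the test length.
    for k in sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)[:shortfall]:
        counts[k] += 1
    return counts


if __name__ == "__main__":
    for (area, level), n in build_table(TOTAL_ITEMS, CONTENT, COGNITIVE).items():
        if n:
            print(f"{area:26s} {level:14s} {n:2d} item(s)")
```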
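The claim that more items yield greater reliability can be quantified with the Spearman-Brown prophecy formula, a standard psychometric result the slides state only qualitatively. A short sketch, assuming an illustrative reliability of 0.60 for a 20-item test:

```python
"""Sketch: quantifying "more items yield greater reliability" with the
Spearman-Brown prophecy formula (a standard psychometric result)."""


def spearman_brown(r: float, k: float) -> float:
    """Projected reliability when test length is multiplied by factor k."""
    return k * r / (1 + (k - 1) * r)


if __name__ == "__main__":
    r_20 = 0.60  # assumed reliability of a 20-item test (illustrative)
    for n_items in (20, 40, 60, 80):
        k = n_items / 20
        print(f"{n_items} items -> projected r = {spearman_brown(r_20, k):.2f}")
```

Doubling the illustrative 20-item test raises projected reliability from 0.60 to 0.75; quadrupling it yields about 0.86, which is the diminishing-returns tradeoff behind "often enough for reliability but not too often."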
Administering the Written Test
• Before the test
• During the test
• After the test

Characteristics of “Good” Tests
• Reliable
• Valid
• Average difficulty
• Discriminate
  • Answered correctly by more knowledgeable students
  • Missed by less knowledgeable students
• Time-consuming to write

Quality of the Test
• Reliability
  • Role of error in an observed score
• Error sources in written tests
  • Inadequate sampling
  • Examinee's mental/physical condition
  • Environmental conditions
  • Guessing
  • Changes in the field (the variable being measured is dynamic)

Quality of the Test
• Validity
  • CONTENT is key for written tests
  • Is critical information assessed by the test?
  • A Table of Specifications helps support validity
• Overall test quality
  • Based on individual item quality (steps 1-8, p. 175-80)

Item Analysis
• Used to determine the quality of individual test items
• Item difficulty
  • Percent answering correctly
• Item discrimination
  • How well the item “functions”
  • Also how “valid” the item is, based on the total-test-score criterion

Item Difficulty
• Ranges from 0 (nobody answered correctly) to 100 (everybody answered correctly); goal = 50%
• Difficulty = ((Uc + Lc) / (Un + Ln)) × 100
  • Uc, Lc = number answering the item correctly in the upper and lower scoring groups; Un, Ln = number of students in those groups

Item Discrimination
• < 20% or negative (poor); 20-40% (acceptable); goal > 40%
• Discrimination = ((Uc − Lc) / Un) × 100
• (A worked item-analysis sketch appears at the end of this section.)

Figure 8.4 The relationship between item discrimination and difficulty: moderate difficulty maximizes discrimination.

Sources of Written Tests
• Professionally constructed tests (FitSmart, Ask-PE)
• Textbooks (McGee & Farrow, 1987)
• Periodicals, theses, and dissertations

Questionnaires
• Determine the objectives
• Delimit the sample
• Construct the questionnaire
• Conduct a pilot study
• Write a cover letter
• Send the questionnaire
• Follow up with non-respondents
• Analyze the results and prepare the report

Constructing Open-Ended Questions
• Advantages
  • Allow for creative answers
  • Allow the respondent to give detailed answers
  • Can be used when the set of possible categories is large
  • Probably better when complex questions are involved
• Disadvantages
  • Analysis is difficult because of non-standard responses
  • Require more respondent time to complete
  • Can be ambiguous
  • Can result in irrelevant data

Constructing Closed-Ended Questions
• Advantages
  • Easy to code
  • Result in standard responses
  • Usually less ambiguous
  • Ease of response relates to increased response rate
• Disadvantages
  • Frustration if the correct category is not present
  • Respondent may choose an inappropriate category
  • May require many categories to capture ALL responses
  • Subject to possible recording errors

Factors Affecting the Questionnaire Response
• Cover letter: be brief and informative
• Ease of return: you DO want it back!
• Neatness and length: be professional and brief
• Inducements: money and flattery
• Timing and deadlines: time of year and sufficient time to complete
• Follow-up: at least once (two follow-ups are about the best response rate you will get)

The BIG Issues in Questionnaire Development
• Reliability
  • Consistency of measurement
  • Stability reliability: 2-4 weeks between administrations (see the test-retest sketch at the end of this section)
• Validity
  • Truthfulness of response
  • Good items, expert review, pilot testing, confidentiality/anonymity
• Representativeness of the sample
  • To whom can you generalize?

Cognitive Assessments II
• Ask for clarity on something that challenged you today
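The difficulty and discrimination formulas from the Item Analysis slides can be applied directly. A minimal sketch, assuming dichotomously scored (0/1) items and the common convention of comparing the top and bottom 27% of examinees by total test score (the 27% split is conventional, not specified on the slides):

```python
"""Sketch: item difficulty and discrimination from upper/lower groups.

Implements the two formulas from the Item Analysis slides:
    Difficulty     = ((Uc + Lc) / (Un + Ln)) * 100   (goal = 50)
    Discrimination = ((Uc - Lc) / Un) * 100          (goal > 40)
The 27% upper/lower split is a common convention, assumed here.
"""


def item_difficulty(uc, lc, un, ln):
    """Percent of upper + lower groups answering the item correctly."""
    return (uc + lc) / (un + ln) * 100


def item_discrimination(uc, lc, un):
    """How much better the upper group did than the lower group."""
    return (uc - lc) / un * 100


def analyze(item_scores, total_scores, fraction=0.27):
    """item_scores: per-student lists of 0/1 item results.
    total_scores: per-student total test scores (the criterion)."""
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of each comparison group
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    results = []
    for j in range(len(item_scores[0])):
        uc = sum(item_scores[i][j] for i in upper)
        lc = sum(item_scores[i][j] for i in lower)
        results.append((item_difficulty(uc, lc, k, k),
                        item_discrimination(uc, lc, k)))
    return results


if __name__ == "__main__":
    items = [[1, 1], [1, 0], [0, 1], [1, 0], [0, 0], [1, 1]]  # 6 students, 2 items
    totals = [sum(row) for row in items]  # total score as the criterion
    for j, (diff, disc) in enumerate(analyze(items, totals), start=1):
        print(f"item {j}: difficulty {diff:.0f}%, discrimination {disc:.0f}%")
```

As Figure 8.4 notes, items near 50% difficulty leave the most room to discriminate; items everyone gets right (or wrong) cannot separate stronger from weaker students.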
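For the stability reliability named on the BIG Issues slide, one common index is the Pearson correlation between scores from the two administrations given 2-4 weeks apart. A minimal sketch with made-up scores (the data and the choice of Pearson r are illustrative assumptions):

```python
"""Sketch: stability (test-retest) reliability for a questionnaire,
indexed by the Pearson correlation between two administrations."""
import math


def pearson_r(x, y):
    """Pearson correlation between paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    week0 = [14, 18, 12, 20, 16, 15]  # first administration (made-up)
    week3 = [15, 17, 11, 21, 16, 14]  # ~3 weeks later (made-up)
    print(f"stability reliability r = {pearson_r(week0, week3):.2f}")
```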