Review: Performance-Based Assessments
• Performance-based assessment
• Real-life setting
• H.O.T.S. (higher-order thinking skills)
• Techniques:
  • Observation
  • Individual or Group Projects
  • Portfolios
  • Performances
  • Student Logs or Journals
• Developing performance-based assessments
• Determining the purpose of assessment
• Deciding what constitutes student learning
• Selecting the appropriate assessment task
• Setting performance criteria
Review: Grading
• Grading process: Objectives of instruction → Test selection and administration → Results compared to standards → Final grades
• Making grading fair, reliable, and valid
  • Determine defensible objectives
  • Ability group students
  • Construct tests which reflect objectivity
  • No test is perfectly reliable
  • Grades should reflect status, not improvement
  • Do not use grades to reward good effort
  • Consider grades as measurements, not evaluations
Cognitive Assessments: Physical Fitness Knowledge
HPER 3150
Dr. Ayers
Test Planning
• Types
  • Mastery (driver's license): meet minimum requirements
  • Achievement (mid-term): discriminate among levels of accomplishment
Table of Specifications (content-related validity)
• Content Objectives
history, values, equipment, etiquette, safety, rules,
strategy, techniques of play
• Educational Objectives (Bloom's taxonomy, 1956)
knowledge, comprehension, application, analysis,
synthesis, evaluation
Table of Specifications for a 33-Item Exercise Physiology Concepts Test (Ask-PE, Ayers, 2003)
T of SPECS-E.doc
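For illustration (not from the slides; every weight below is hypothetical), a table of specifications can be sketched in code by crossing content areas with educational objectives and allocating items to each cell:

    # Hypothetical sketch: allocate a 33-item test across content areas
    # and Bloom's levels. All weights are invented for illustration.
    total_items = 33
    content_weights = {"history": 0.10, "rules": 0.25,
                       "strategy": 0.25, "techniques of play": 0.40}
    level_weights = {"knowledge": 0.4, "comprehension": 0.3, "application": 0.3}

    for area, area_w in content_weights.items():
        for level, level_w in level_weights.items():
            # Rounded cells may not sum to exactly 33; adjust by hand.
            print(f"{area:20s} {level:14s} {round(total_items * area_w * level_w)} items")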
Test Characteristics
• When to test
• Often enough for reliability, but not so often that testing becomes useless
• How many questions (p. 145-6 guidelines)
• More items yield greater reliability (see the sketch below)
• Format to use (p. 147 guidelines)
• Oral (NO), group (NO), written (YES)
• Open book/note, take-home
• Advantages: ↓anxiety, ask more application Qs
• Disadvantages: ↓ incentive to prepare, uncertainty of who does work
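The link between test length and reliability is often quantified with the Spearman-Brown prophecy formula; the slides do not name it, so this is a supplementary sketch:

    # Spearman-Brown prophecy formula: predicted reliability of a test
    # lengthened by a factor k (k = 2 doubles the number of items).
    def spearman_brown(r: float, k: float) -> float:
        """Predicted reliability of a test k times as long as one with reliability r."""
        return (k * r) / (1 + (k - 1) * r)

    # Example: doubling a test with reliability .60 predicts about .75.
    print(round(spearman_brown(0.60, 2), 2))  # 0.75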
Test Characteristics
• Question types
• Semi-objective
• short-answer
• completion
• mathematical
• Objective
  • true/false
  • matching
  • multiple-choice
  • classification
• Essay
Semi-objective Questions
• Short-answer, completion, mathematical
• When to use (factual & recall material)
• Weaknesses
• Construction Recommendations (p. 151)
• Scoring Recommendations (p. 152)
Objective Questions
• True/False, matching, multiple-choice
• When to use (M-C: MOST IDEAL)
• FORM7 (B,E).doc
• p. 160-3: M-C guidelines
• Construction Recommendations
(p. 158-60)
• Scoring Recommendations (p. 163-4)
Figure 8.1
The difference between extrinsic and intrinsic ambiguity (A is correct)
• Too easy
• Extrinsic ambiguity (weak Ss miss)
• Intrinsic ambiguity (all foils equally appealing)
Cognitive Assessments I
• Explain one thing that you learned today to a classmate
Review: Cognitive Assessments I
• Test types
  • Mastery
  • Achievement
• Table of Specifications
• Identify content, assign cognitive demands, weight areas
• Provides support for what type of validity?
• Questions Types
• Semi-objective: short-answer, completion, mathematical
• Objective: t/f, match, multiple-choice
• Which is desirable: intrinsic or extrinsic ambiguity?
Essay Questions
• When to use (definitions, interpretations, comparisons)
• Weaknesses
• Scoring
• Objectivity
• Construction & Scoring recommendations (p. 167-9)
Administering the Written Test
• Before the Test
• During the Test
• After the Test
Characteristics of “Good” Tests
• Reliable
• Valid
• Average difficulty
• Discriminate
  • Answered correctly by more knowledgeable students
  • Missed by less knowledgeable students
• Time-consuming to write
Quality of the Test
• Reliability
• Role of error in an observed score
• Error sources in written tests
  • Inadequate sampling
  • Examinee's mental/physical condition
  • Environmental conditions
  • Guessing
  • Changes in the field (dynamic variable being measured)
Quality of the Test
• Validity
• CONTENT key for written tests
• Is critical information assessed by a test?
• T of Specs helps support validity
• Overall Test Quality
• Based on individual item quality (steps 1-8, pg. 175-80)
Item Analysis
• Used to determine quality of individual test items
• Item Difficulty
Percent answering correctly
• Item Discrimination
How well the item "functions"
Also how “valid” the item is based on the total test score criterion
Item Difficulty
0 (nobody got right) – 100 (everybody got right)
Goal = 50%

Difficulty = (U_c + L_c) / (U_n + L_n) × 100

where U_c and L_c are the numbers answering correctly in the upper- and lower-scoring groups, and U_n and L_n are the total numbers of students in those groups.
Item Discrimination
• < 20% or negative: poor
• 20-40%: acceptable
• Goal: > 40%

Discrimination = (U_c - L_c) / U_n × 100
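Both computations in a minimal Python sketch (group counts are the hypothetical numbers from the worked example above):

    # Item difficulty and discrimination from upper- and lower-group counts.
    def item_difficulty(uc: int, lc: int, un: int, ln: int) -> float:
        """Percent answering correctly: (Uc + Lc) / (Un + Ln) * 100."""
        return (uc + lc) / (un + ln) * 100

    def item_discrimination(uc: int, lc: int, un: int) -> float:
        """(Uc - Lc) / Un * 100; the goal is above 40%."""
        return (uc - lc) / un * 100

    print(item_difficulty(12, 6, 15, 15))   # 60.0 -> moderate difficulty
    print(item_discrimination(12, 6, 15))   # 40.0 -> at the goal boundary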
Figure 8.4
The relationship between item discrimination and difficulty
Moderate difficulty maximizes discrimination
Sources of Written Tests
• Professionally Constructed Tests (FitSmart, Ask-PE)
• Textbooks (McGee & Farrow, 1987)
• Periodicals, Theses, and Dissertations
Questionnaires
• Determine the objectives
• Delimit the sample
• Construct the questionnaire
• Conduct a pilot study
• Write a cover letter
• Send the questionnaire
• Follow up with non-respondents
• Analyze the results and prepare the report
Constructing Open-Ended Questions
• Advantages
Allow for creative answers
Allow for respondent to detail answers
Can be used when the number of possible categories is large
Probably better when complex questions are involved
• Disadvantages
Analysis is difficult because of non-standard responses
Require more respondent time to complete
Can be ambiguous
Can result in irrelevant data
Constructing Closed-Ended Questions
• Advantages
Easy to code
Result in standard responses
Usually less ambiguous
Ease of response relates to increased response rate
• Disadvantages
Frustration if correct category is not present
Respondent may choose an inappropriate category
May require many categories to get ALL responses
Subject to possible recording errors
Factors Affecting the Questionnaire Response
• Cover Letter
Be brief and informative
• Ease of Return
You DO want it back!
• Neatness and Length
Be professional and brief
• Inducements
Money and flattery
• Timing and Deadlines
Time of year and sufficient time to complete
• Follow-up
At least once (two follow-ups yield about the best response rate you will get)
The BIG Issues in Questionnaire Development
• Reliability
Consistency of measurement
Stability reliability: 2-4 wks between administrations (see the sketch after this list)
• Validity
Truthfulness of response
Good items, expert reviewed, pilot testing,
confidentiality/anonymity
• Representativeness of the sample
To whom can you generalize?
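Stability reliability is estimated as the correlation between the two administrations; a minimal sketch with invented scores (statistics.correlation requires Python 3.10+):

    # Test-retest (stability) reliability: Pearson correlation between
    # scores from two administrations 2-4 weeks apart. Scores are invented.
    from statistics import correlation

    time1 = [14, 18, 11, 20, 16, 13, 17]
    time2 = [15, 17, 12, 19, 18, 12, 16]
    print(round(correlation(time1, time2), 2))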
Cognitive Assessments II
Ask for clarity on something that challenged you today