Evaluation of Knowledge & Learning 4-29

SHOWTIME!
EVALUATING
KNOWLEDGE
INTRODUCTION
• KNOWLEDGE IS AN OBJECTIVE OF MOST
PHYSICAL EDUCATION PROGRAMS
• KNOWLEDGE CAN INCREASE ENJOYMENT
AS A SPECTATOR
• KNOWLEDGE IS AN OBJECTIVE IN ADULT
FITNESS AND REHABILITATION PROGRAMS:
WHY FITNESS IS IMPORTANT, HOW TO
DEVELOP AND MAINTAIN FITNESS,
IMPORTANCE OF GOOD DIET, WHY DID AN
INJURY OCCUR, HOW CAN AN INJURY BE
AVOIDED, ETC
PURPOSES OF KNOWLEDGE TESTS
• ASSIGNING A GRADE OR SUMMATIVE
EVALUATION
• MEASURING PROGRESS OR FORMATIVE
FEEDBACK
• PROVIDING FEEDBACK TO STUDENTS OR
PROGRAM PARTICIPANTS AS TO THEIR
STATUS AND WHAT THE CLASS OR
PROGRAM KNOWLEDGE EXPECTATIONS
ARE
• MOTIVATING STUDENTS OR PROGRAM
PARTICIPANTS TO LEARN THE MATERIAL
TESTED
• ASSESSING TEACHING OR INSTRUCTIONAL
EFFECTIVENESS
LEVELS OF KNOWLEDGE
THE FIRST FOUR LEVELS OF BLOOM’S TAXONOMY
TYPES OF KNOWLEDGE TESTS
ESSAY VERSUS OBJECTIVE
• ESSAY TEST - PEOPLE ANSWER EACH
ITEM (QUESTION) WITH WHATEVER
INFORMATION THEY CHOOSE AND
WRITE THEIR ANSWERS IN SENTENCES
• OBJECTIVE TESTS - TRUE-FALSE,
MULTIPLE-CHOICE, AND MATCHING
HAVE POTENTIAL ANSWERS
PROVIDED WITH EACH QUESTION
TYPES OF KNOWLEDGE TESTS
MASTERY VERSUS DISCRIMINATION
• MASTERY TEST - FORMATIVE EVALUATION
WITH CRITERION-REFERENCED STANDARDS
WHICH ARE USED TO DETERMINE
WHETHER INDIVIDUALS HAVE MASTERED
THE MATERIAL (PASS-FAIL, PROFICIENT/NON-PROFICIENT); TYPICALLY EASIER
QUESTIONS ASKED
• DISCRIMINATION TEST - SUMMATIVE
EVALUATION WITH NORM-REFERENCED
STANDARDS DESIGNED TO DIFFERENTIATE
AMONG STUDENTS IN TERMS OF
KNOWLEDGE; TYPICALLY HARDER
QUESTIONS ASKED
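The contrast between the two standards can be sketched in a few lines. This is only an illustration: the function names, the cutoff of 16 points, and the scores are hypothetical and do not come from the slides.

```python
def mastery_result(score, cutoff):
    """Criterion-referenced: compare the score to a fixed standard."""
    return "pass" if score >= cutoff else "fail"


def percentile_rank(score, all_scores):
    """Norm-referenced: locate the score relative to the group.

    Uses the common convention of counting half of the tied scores.
    """
    below = sum(s < score for s in all_scores)
    equal = sum(s == score for s in all_scores)
    return 100 * (below + 0.5 * equal) / len(all_scores)


scores = [12, 15, 15, 18, 20]          # hypothetical class scores out of 20
print(mastery_result(15, cutoff=16))   # → fail
print(percentile_rank(15, scores))     # → 40.0
```

The mastery result depends only on the cutoff, while the percentile rank changes whenever the rest of the group's scores change.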
TEST CONSTRUCTION
• STEP 1: CONSTRUCT A TABLE OF
SPECIFICATIONS
• STEP 2: DECIDE ON NATURE OF TEST OR
TYPE OF QUESTIONS TO BE USED
• STEP 3: CONSTRUCT THE TEST ITEMS
(QUESTIONS)
• STEP 4: DETERMINE THE TEST FORMAT AND
ADMINISTRATIVE DETAILS (INSTRUCTIONS,
NEATLY TYPED, EASY TO READ, ALL ITEM
INFORMATION ON SAME PAGE, TEST
ENVIRONMENT, ETC)
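For Step 1, a table of specifications can be treated as a grid of content areas by cognitive levels, each cell weighted by its share of the test. The sketch below turns such weights into item counts. The content areas, weights, and test length are hypothetical.

```python
def items_per_cell(spec, total_items):
    """Convert table-of-specifications weights (percent) into item counts."""
    return {cell: round(total_items * pct / 100) for cell, pct in spec.items()}


# Hypothetical weights: (content area, level) -> percent of the test
spec = {
    ("rules", "knowledge"): 30,
    ("rules", "application"): 10,
    ("strategy", "knowledge"): 20,
    ("strategy", "application"): 40,
}
assert sum(spec.values()) == 100  # weights should cover the whole test

counts = items_per_cell(spec, total_items=50)
print(counts[("rules", "knowledge")])       # → 15
print(counts[("strategy", "application")])  # → 20
```

With other weights or test lengths, rounding can leave the total an item or two off, so the counts may need a manual adjustment.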
TYPES OF TEST ITEMS
(QUESTIONS)
TRUE-FALSE QUESTIONS
ADVANTAGES
• MANY ITEMS CAN BE ON TEST AS
THEY CAN BE ANSWERED QUICKLY
• EASY AND QUICK TO WRITE
• QUICK TO SCORE
• FACTUAL INFORMATION IS EASILY
TESTED
• STANDARDIZED ANSWER SHEETS CAN
BE USED
DISADVANTAGES
• ONLY FIRST LEVEL OF BLOOM’S
TAXONOMY, BASIC KNOWLEDGE, CAN BE
TESTED
• 50% CHANCE OF GUESSING ANSWER
• EASY FOR A PERSON TO CHEAT
• ENCOURAGES MEMORIZATION RATHER
THAN UNDERSTANDING OF FACTS
• CAN BE AMBIGUOUS
• MAY TEST TRIVIAL INFORMATION
• REQUIRES MORE QUESTIONS TO ENSURE
RELIABILITY
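The link between the 50% guessing chance and the need for more questions can be illustrated with the binomial distribution. This is a rough sketch that assumes every item is answered by blind, independent guessing. The 70% target score is a hypothetical choice.

```python
from math import comb


def prob_at_least(n_items, n_correct, p_guess):
    """Probability of getting at least n_correct of n_items right by guessing."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(n_correct, n_items + 1)
    )


# Chance of scoring 70% or better by guessing alone on true-false items
print(round(prob_at_least(10, 7, 0.5), 3))   # 10 items → 0.172
print(prob_at_least(50, 35, 0.5))            # 50 items: far smaller

# Same 10-item test with four-response multiple-choice items (p = 0.25)
print(prob_at_least(10, 7, 0.25))
```

Under these assumptions, lengthening the test sharply reduces the chance that guessing alone produces a high score.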
CONSTRUCTION PROCEDURES
• KEEP QUESTION SHORT
• USE ONLY A SINGLE CONCEPT IN EACH QUESTION
• KEEP VOCABULARY SIMPLE
• DO NOT COPY STATEMENTS DIRECTLY FROM THE TEXT
• WHEN POSSIBLE STATE THE ITEMS POSITIVELY RATHER
THAN NEGATIVELY
• AVOID WORDS LIKE ALWAYS, ALL, NEVER, OR NONE
• DO NOT ALLOW MORE THAN 60% OF THE ITEMS TO HAVE THE
SAME ANSWER
• AVOID LONG STRINGS OF ITEMS THAT HAVE THE SAME ANSWER
• AVOID PATTERNS IN THE ANSWERS
• DO NOT GIVE CLUES IN ONE ITEM TO ANOTHER ITEM
• AVOID INTERDEPENDENT TERMS IN ITEMS
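Two of the guidelines above (the 60% limit and the warning about long strings of identical answers) are easy to check mechanically on a draft answer key. The sketch below is one possible check. The function name and the run-length cutoff of 4 are assumptions, since the slides do not say how long a "long string" is.

```python
from itertools import groupby


def check_true_false_key(key, max_share=0.60, max_run=4):
    """Flag an answer key that leans on one answer or has long runs.

    key: list of booleans, one per item, in test order.
    max_share: largest allowed share of items with the same answer.
    max_run: hypothetical cutoff for a "long string" of identical answers.
    """
    problems = []
    true_share = sum(key) / len(key)
    if max(true_share, 1 - true_share) > max_share:
        problems.append(f"more than {max_share:.0%} of items share one answer")
    longest_run = max(len(list(group)) for _, group in groupby(key))
    if longest_run > max_run:
        problems.append(f"run of {longest_run} identical answers")
    return problems


key = [True, False, True, True, False, False, True, False, True, False]
print(check_true_false_key(key))  # → []
```

A check like this does not detect regular patterns such as strict alternation, so the key still needs a visual review.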
MULTIPLE-CHOICE ITEMS
(QUESTIONS)
ADVANTAGES
• MANY ITEMS CAN BE ON TEST AS
THEY CAN BE ANSWERED QUICKLY
• QUICKLY SCORED
• ALL LEVELS OF BLOOM’S TAXONOMY
(KNOWLEDGE, COMPREHENSION,
APPLICATION, ETC) CAN BE TESTED
• DECREASES CHANCE OF GUESSING
CORRECTLY
• STANDARDIZED ANSWER SHEETS CAN
BE USED
DISADVANTAGES
• FEWER ITEMS CAN BE ASKED THAN
WITH TRUE-FALSE
• TAKES TIME TO THINK OF GOOD
DISTRACTOR RESPONSES
• SOME DANGER OF CHEATING
• TO SOME EXTENT, ENCOURAGES
MEMORIZATION WITHOUT
UNDERSTANDING IMPLICATIONS
• PEOPLE ARE UNABLE TO
DEMONSTRATE THE EXTENT OF THEIR
KNOWLEDGE AS THEY CAN ONLY
RESPOND TO THE ITEMS WRITTEN
CONSTRUCTION
• KEEP STEMS & RESPONSES SHORT
• MAKE ALL RESPONSES APPROXIMATELY
THE SAME LENGTH
• USE APPARENTLY ACCEPTABLE ANSWERS
FOR ALL RESPONSES
• USE 3-5 RESPONSES FOR EACH STEM
• IF STEM IS INCOMPLETE SENTENCE,
RESPONSE SHOULD COMPLETE SENTENCE
• DO NOT GIVE AWAY THE ANSWER WITH
ENGLISH USAGE
• DO NOT GIVE AWAY THE ANSWER TO ONE
ITEM IN THE CONTENT OF ANOTHER ITEM
• DO NOT ALLOW THE ANSWER OF ONE ITEM
TO DEPEND ON THE ANSWER TO ANOTHER
ITEM
• DO NOT CONSTRUCT A STEM THAT
SOLICITS A PERSON’S OPINION
• USE LETTERS (A, B, C, ETC) TO ENUMERATE
RESPONSES TO NUMBERED QUESTIONS
• TRY TO EQUALLY USE EACH LETTER AS THE
CORRECT RESPONSE
• WHEN POSSIBLE, STATE THE STEM
POSITIVELY RATHER THAN NEGATIVELY
MATCHING ITEMS
(QUESTIONS)
ADVANTAGES
• SAVES SPACE (AND TREES) BY GIVING
THE SAME POTENTIAL ANSWERS FOR
SEVERAL ITEMS
• LOWERS THE ODDS OF GUESSING
CORRECTLY
• QUICKER TO CONSTRUCT THAN
MULTIPLE CHOICE ITEMS
• STANDARDIZED ANSWER SHEETS CAN
BE USED IF THERE ARE FIVE OR FEWER
RESPONSES
DISADVANTAGES
• SIMILAR TO TRUE-FALSE ITEMS,
USUALLY ONLY TESTS FACTUAL
INFORMATION (LOWEST LEVEL OF
BLOOM’S TAXONOMY)
• STANDARDIZED ANSWER SHEETS CANNOT
BE USED IF THERE ARE MORE
THAN FIVE RESPONSES
CONSTRUCTION
• STATE THE ITEMS AND POTENTIAL ANSWERS
CLEARLY AND SUCCINCTLY
• NUMBER ITEMS AND LETTER POTENTIAL ANSWERS
• KEEP ALL ANSWERS AND ITEMS ON THE SAME
PAGE
• MAKE ALL ITEMS SIMILAR IN CONTENT
• PROVIDE MORE ANSWERS THAN ITEMS TO
PREVENT PEOPLE FROM DEDUCING ANSWERS BY
ELIMINATION
• IN DIRECTIONS, INDICATE WHETHER OR NOT THE
ANSWER CAN BE USED MORE THAN ONCE
• HAVE SEVERAL POTENTIAL ANSWERS FOR EACH
ITEM
• IF MORE THAN 5 RESPONSES EXIST, ARRANGE
POTENTIAL ANSWERS IN LOGICAL GROUPINGS
(E.G., NUMERICAL ANSWERS TOGETHER, DATES
TOGETHER, ETC)
SHORT-ANSWER AND
ESSAY ITEMS (QUESTIONS)
ADVANTAGES
• STUDENTS ARE FREE TO ANSWER ESSAY
ITEMS IN THE WAY THAT SEEMS BEST TO
THEM
• STUDENTS CAN DEMONSTRATE THE DEPTH
OF THEIR KNOWLEDGE
• ENCOURAGES STUDENTS TO RELATE ALL
THE MATERIAL TO A TOTAL CONCEPT
RATHER THAN JUST LEARN THE FACTS
• ITEMS ARE EASY AND QUICK TO
CONSTRUCT
• ALL LEVELS OF BLOOM’S TAXONOMY CAN
BE TESTED
DISADVANTAGES
• TIME-CONSUMING TO GRADE
• OBJECTIVITY OF TEST SCORES IS OFTEN
LOW
• RELIABILITY AND HENCE VALIDITY OF
TEST SCORES ARE OFTEN LOW
• ESSAY ITEMS REQUIRE SOME SKILL IN
SELF-EXPRESSION; IF THAT SKILL IS NOT AN
INSTRUCTIONAL OBJECTIVE, VALIDITY
MAY BE FURTHER LOWERED DUE TO LACK
OF RELEVANCE
• PENMANSHIP AND NEATNESS AFFECT
GRADES, WHICH AGAIN LOWERS THE
VALIDITY
• THE HALO EFFECT IS PRESENT
CONSTRUCTION
• STATE THE ITEM AS CLEARLY AND
CONCISELY AS POSSIBLE
• NOTE ON THE TEST THE APPROXIMATE
TIME STUDENTS SHOULD SPEND ON EACH
ITEM
• NOTE ON THE TEST THE POINT VALUE FOR
EACH ITEM
• CAREFULLY KEY THE TEST BEFORE
ADMINISTRATION WHICH WILL HELP
IDENTIFY AMBIGUOUS ITEMS AND IMPROVE
OBJECTIVITY (AND HENCE RELIABILITY
AND VALIDITY) IN THE GRADING
ADMINISTRATION OF TEST
• TEST SETTING SHOULD BE QUIET,
WELL LIGHTED, PROPERLY HEATED,
ODOR-FREE, SPACIOUS, AND
COMFORTABLE
• STUDENTS SHOULD FACE THE SAME
DIRECTION AND BE SPACED OUT
• MAY WANT TO CONSIDER PARALLEL
TESTS IF TESTING MORE THAN ONE
CLASS; DIFFICULT AND TIME
CONSUMING TO CONSTRUCT SIMILAR
EXAMS THAT TEST THE SAME
CONTENT
SCORING PROCEDURES
• OBJECTIVE EXAMS
- FAST TO GRADE
- USE COMPUTER OR LAYOVER KEY
FOR GRADING
• ESSAY EXAMS
– USE KEY
– REMOVE STUDENT’S NAME
– GRADE EACH QUESTION FOR ALL
STUDENTS BEFORE GRADING THE NEXT
QUESTION FOR ALL STUDENTS TO HELP
EXAM’S OBJECTIVITY
ANALYSIS AND REVISION
• OVERALL DIFFICULTY
• VARIABILITY IN TEST SCORES
• RELIABILITY
• THE DIFFICULTY OF EACH ITEM
• THE DISCRIMINATION, OR VALIDITY,
OF EACH ITEM
• QUALITY OF EACH RESPONSE IN A
MULTIPLE-CHOICE ITEM
DIFFICULTY AND VARIABILITY
• MEAN REFLECTS OVERALL
DIFFICULTY
• THE HIGHER THE MEAN, THE EASIER THE
TEST, AND VICE VERSA
• STANDARD DEVIATION REFLECTS
VARIABILITY IN TEST SCORES
• THE LARGER THE STANDARD DEVIATION,
THE MORE RELIABLE THE TEST TENDS TO BE AND
THE MORE THE TEST DISCRIMINATES
AMONG ABILITY LEVELS
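Both summary statistics are available in Python's standard library. The scores below are hypothetical, and the sample standard deviation is used here as one reasonable choice. A course text may use the population form instead.

```python
from statistics import mean, stdev

scores = [14, 16, 18, 20, 22]   # hypothetical test scores out of 25

print(mean(scores))             # overall difficulty → 18
print(round(stdev(scores), 2))  # variability (sample SD) → 3.16
```

A mean close to the maximum possible score suggests an easy test, and a small standard deviation suggests the test is not spreading students out.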
RELIABILITY
• RELIABILITY OF TEST SCORES IS
USUALLY ESTIMATED USING EITHER
THE KUDER-RICHARDSON OR
COEFFICIENT ALPHA METHOD (PP.
453-455)
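As a rough sketch of the Kuder-Richardson approach, the function below computes KR-20 for items scored 0 or 1, using KR-20 = (k / (k - 1)) × (1 - Σpq / variance of total scores). It assumes the population variance of total scores, which may differ from the convention on the cited pages. The data are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 reliability for dichotomous (0/1) item scores.

    responses: one list per person, each holding that person's item scores.
    Assumes population variance of the total scores (a convention choice).
    """
    k = len(responses[0])                      # number of items
    n = len(responses)                         # number of people
    totals = [sum(person) for person in responses]
    total_var = pvariance(totals)
    sum_pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_var)


data = [            # hypothetical: 5 people, 4 items
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 2))  # → 0.8
```

For 0/1 items, coefficient alpha computed with the same variance convention gives the same value, since each item's variance is p(1 - p).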
ITEM ANALYSIS
• USED TO DETERMINE THE DIFFICULTY
AND VALIDITY OF THE ITEMS
(QUESTIONS) AND THE EFFICIENCY OF
RESPONSES
• INCLUDES
– ITEM DIFFICULTY
– DISCRIMINATION INDEX
– RESPONSE QUALITY
ITEM DIFFICULTY
• THE PERCENTAGE OF PEOPLE WHO CHOSE
THE RIGHT ANSWER
• IT IS LARGE WHEN THE ITEM IS EASY AND
SMALL WHEN THE ITEM IS HARD
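Item difficulty as defined above is simply the proportion of people answering the item correctly. A minimal sketch with hypothetical 0/1 scores for one item:

```python
def item_difficulty(item_scores):
    """Proportion of people who answered the item correctly (1 = correct)."""
    return sum(item_scores) / len(item_scores)


item = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]   # hypothetical: 10 people, one item
print(item_difficulty(item))            # → 0.7
```

A value of 0.7 means 70% of the group chose the right answer, so a larger value indicates an easier item.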
DISCRIMINATION INDEX (r)
• ITEM VALIDITY, OR ITEM DISCRIMINATION,
INDICATES HOW WELL A TEST ITEM
DISCRIMINATES BETWEEN THOSE WHO
PERFORMED WELL AND THOSE WHO DID POORLY
• POSITIVE DISCRIMINATION INDEX (r) - AN ITEM IS
ANSWERED CORRECTLY BY MORE OF THE BETTER
PERFORMERS THAN THE WORSE PERFORMERS
• NEGATIVE DISCRIMINATION INDEX (r) - AN ITEM IS
ANSWERED CORRECTLY BY MORE OF THE WORSE
PERFORMERS THAN THE BETTER PERFORMERS
• DISCRIMINATION INDEX (r) RANGES FROM -1 TO +1
• POSITIVE DISCRIMINATION IS DESIRABLE
• GENERALLY, ITEMS WITH A DIFFICULTY OF ABOUT
.50 HAVE A GOOD
POSITIVE DISCRIMINATION INDEX
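One common way to compute a discrimination index that behaves as described above is the upper-lower group method: (number correct in the top group - number correct in the bottom group) / number in each group. The sketch below assumes that method and the widely used top and bottom 27% groups. The formula on the original slide is not shown in this text, so treat this as an illustration, not the slide's own calculation.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower group discrimination index for one item.

    item_scores: 0/1 score on this item for each person.
    total_scores: total test score for each person, in the same order.
    fraction: share of people in each extreme group (27% is an assumption).
    """
    n_group = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower = order[:n_group]          # lowest total scores
    upper = order[-n_group:]         # highest total scores
    upper_correct = sum(item_scores[i] for i in upper)
    lower_correct = sum(item_scores[i] for i in lower)
    return (upper_correct - lower_correct) / n_group


totals = [5, 9, 12, 14, 15, 17, 18, 20, 22, 25]   # hypothetical totals
item = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]             # hypothetical item scores
print(round(discrimination_index(item, totals), 2))  # → 0.67
```

The result is +1 when everyone in the top group and no one in the bottom group answers correctly, and -1 in the reverse case.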
RESPONSE QUALITY
• IDEALLY, EACH RESPONSE OF A
MULTIPLE-CHOICE ITEM SHOULD BE
SELECTED BY AT LEAST SOME OF THE
STUDENTS TAKING THE TEST
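Response quality can be checked by counting how often each response was chosen and listing any that nobody selected. The function name, the four-response format, and the data below are hypothetical.

```python
from collections import Counter


def response_counts(choices, options="ABCD"):
    """Count how many people selected each response to one item."""
    counts = Counter(choices)
    return {opt: counts.get(opt, 0) for opt in options}


choices = list("ABBACBBABB")          # hypothetical: 10 students, one item
counts = response_counts(choices)
unused = [opt for opt, n in counts.items() if n == 0]

print(counts)   # → {'A': 3, 'B': 6, 'C': 1, 'D': 0}
print(unused)   # → ['D']
```

A response that no one selects, such as D here, is not working as a distractor and is a candidate for rewriting.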
ITEM ANALYSIS
• VERY TIME CONSUMING TO DO BY
HAND
• THEREFORE, A COMPUTER IS
GENERALLY NEEDED TO DO AN ITEM
ANALYSIS FOR EACH QUESTION (ITEM)
ON A TEST
• A COMPROMISE WOULD BE TO DO AN
ITEM ANALYSIS BY HAND ON
RANDOMLY SELECTED QUESTIONS OR
QUESTIONS WHICH MAY APPEAR TO
BE POOR OR HAVE PROBLEMS
REVISING THE TEST
• AFTER CALCULATING THE
DIFFICULTY OF AND DISCRIMINATION
INDEX FOR EACH ITEM, THE OVERALL
QUALITY OF TEST AND OF EACH ITEM
MUST BE DETERMINED SO THAT THE
TEST CAN BE REVISED AS NECESSARY
STANDARDS FOR TEST REVISION
QUESTIONNAIRES
• FOLLOWS PROCEDURES AND STRATEGIES
VERY SIMILAR TO THOSE OF KNOWLEDGE
TESTS
• BELIEFS, PRACTICES, ATTITUDES,
KNOWLEDGE, INSTRUCTOR AND/OR
COURSE EVALUATION, PARTICIPANT
EVALUATION OF EXERCISE PROGRAM,
PARTICIPANT RECALL OF EXERCISE
ADHERENCE, BARRIERS TO EXERCISE,
ATTITUDES TOWARD EXERCISE, TENSION
REDUCTION, KNOWLEDGE ABOUT BENEFITS
OF EXERCISE, SUBSTANCE ABUSE, ETC ARE
OFTEN EXAMINED USING QUESTIONNAIRES
FACTORS AFFECTING SUCCESS OF
QUESTIONNAIRES (I.E., COMPLETION AND
RETURN OF QUESTIONNAIRES)
• COVER LETTER
• TIMING
• APPEARANCE
• FORM
• LENGTH
• CONTENT
• DEMOGRAPHIC INFORMATION AT END OF
QUESTIONNAIRE
QUESTIONNAIRES
• MINIMUM DATA ANALYSIS IS
FREQUENCY COUNTS FOR THE
RESPONSES TO EACH ITEM (QUESTION)
• OFTEN EACH OF THE DEMOGRAPHIC
ITEMS (E.G., MALE OR FEMALE) IS CROSS
TABULATED WITH EACH OF THE
NON-DEMOGRAPHIC ITEMS TO SEE IF
DIFFERENT CLASSIFICATIONS OF PEOPLE
RESPONDED DIFFERENTLY TO THE NON-DEMOGRAPHIC QUESTIONS
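Both analyses can be sketched with simple counting. The example below assumes each returned questionnaire is reduced to a (demographic answer, item answer) pair. The categories and data are hypothetical.

```python
from collections import Counter

# Hypothetical returns: (sex, response to one non-demographic item)
responses = [
    ("female", "agree"), ("male", "agree"), ("female", "disagree"),
    ("male", "disagree"), ("female", "agree"), ("male", "disagree"),
]

# Minimum analysis: frequency count for the item
freq = Counter(answer for _, answer in responses)
print(freq["agree"], freq["disagree"])      # → 3 3

# Cross-tabulation: count each (demographic, answer) combination
crosstab = Counter(responses)
for sex in ("female", "male"):
    print(sex, crosstab[(sex, "agree")], crosstab[(sex, "disagree")])
```

In this made-up data the overall counts are evenly split, but the cross-tabulation shows the two groups leaning in opposite directions, which the frequency counts alone would hide.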
QUESTIONS OR COMMENTS?
THANK YOU!!