“How You Can Learn To Love Large-Scale Assessment:
Let Me Count the Ways”
An Outline For Our Future At The University of Alberta
Dr. Mark Gierl, Professor and Canada Research Chair
Centre for Research in Applied Measurement and Evaluation
University of Alberta
Presentation at the Centre for Teaching and Learning (CTL) “Teaching Big” Symposium
University of Alberta—August, 2012
TO BEGIN…
• Educational measurement is a discipline and a profession focused on the
use of methodologies for assigning test scores to examinees, typically on a
numeric scale, so we can make inferences about their knowledge, skills,
and competencies
• Once a static and largely quantitatively driven field, educational
measurement is undergoing profound changes driven by recent developments
in the learning sciences, mathematical statistics, computer technology,
educational psychology, and computing science; as a result, our
contemporary assessments barely resemble their predecessors of a decade ago
OVERVIEW
BACKGROUND
• Measurement, Evaluation, and Cognition (MEC) Program in the
Department of Educational Psychology
• Centre for Research in Applied Measurement and Evaluation (CRAME)
PRESENTATION
• Four principles of testing in large classrooms
• Two applications for putting principles into practice
• Plea for our collective future
• My presentation today will have four key messages
OVERVIEW
• Measurement, Evaluation, and Cognition (MEC) is one of eight areas in the
Department of Educational Psychology
• Graduate students (16 currently) who receive an MEd or PhD in MEC
specialize in educational measurement, statistics, research methods,
cognition applied to assessment, and/or program evaluation
• Our graduates work in the private sector at testing companies like the
Educational Testing Service (ETS) or in the public sector for different
agencies (e.g., Alberta Education; Medical Council of Canada)
• MEC has five faculty members: Drs. Mark Gierl, Jacqueline Leighton, Ying
Cui, Cheryl Poth, and Sharla King
• The Centre for Research in Applied Measurement and Evaluation (CRAME)
is a centre within MEC focused on conducting research in the areas of
educational measurement, cognitive psychology, and statistics with the
goal of making assessment an integral part of learning and instruction
OVERVIEW
MESSAGE #1: Educational measurement is a specialized
discipline where you can earn a graduate degree at both the
MEd and PhD levels—this indicates that testing is embedded in
a discipline that requires rigorous and comprehensive training
MESSAGE #2: You have colleagues at the University of Alberta
who actually love to talk about tests and who train graduate
students who also like and excel in our discipline [resources exist
on campus]
“TESTING TIPS BY MARK”
HOW TO MAKE A GOOD MULTIPLE-CHOICE TEST ITEM
The item measures specific content, as outlined in the test specifications.
The item is based on an important topic in the curriculum and is designed to measure key thinking
and problem-solving skills.
The item is carefully edited, formatted, and presented using correct grammar, punctuation,
capitalization, and spelling.
The central idea is included in the stem, not the options.
The stem of the item is worded positively, and avoids negatives such as NOT or EXCEPT.
Only one of the options is clearly correct.
The correct option is not cued by item-writing errors such as presenting a conspicuous
correct option or blatantly incorrect options.
All of the distractors are plausible (e.g., distractors are based on typical errors made by students).
Etc., etc., etc., etc., etc., etc….
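A few items on this checklist can be checked mechanically before an item ever reaches students. The sketch below is a hypothetical Python linter, not a tool described in this presentation; the `MCItem` record, the `lint_item` function, and the sample item are assumptions made for illustration. It flags negative wording in the stem, an answer key that does not mark exactly one option as correct, and duplicate options. Judgements such as whether distractors are plausible still need a human reviewer.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class MCItem:
    """Hypothetical multiple-choice item record."""
    stem: str
    options: list[str]
    keys: list[int]  # indices of the options marked as correct


# Negative wording the checklist says to avoid in the stem (case-insensitive).
NEGATIVES = re.compile(r"\b(NOT|EXCEPT)\b", re.IGNORECASE)


def lint_item(item: MCItem) -> list[str]:
    """Return the checklist violations that can be detected automatically."""
    problems = []
    if NEGATIVES.search(item.stem):
        problems.append("stem uses a negative (NOT/EXCEPT)")
    if len(item.keys) != 1:
        problems.append("item must have exactly one correct option")
    normalized = [option.strip().lower() for option in item.options]
    if len(set(normalized)) != len(normalized):
        problems.append("options contain duplicates")
    return problems


# Usage with a made-up item that breaks the "worded positively" rule.
item = MCItem(
    stem="Which of the following is NOT a prime number?",
    options=["2", "3", "4", "5"],
    keys=[2],
)
print(lint_item(item))  # ['stem uses a negative (NOT/EXCEPT)']
```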
OUR FOUR PRINCIPLES
PRINCIPLE #1: We will shift from infrequent summative assessments (e.g., 2
midterms + final) to more frequent formative assessment (e.g., 8-10 exams
or more per term)
PRINCIPLE #2: On-demand testing is required so that students can write
exams at any time and in any location
PRINCIPLE #3: Assessments will be scored immediately and students will
receive both instant and detailed feedback on their overall performance as
well as their problem-solving strengths and weaknesses
PRINCIPLE #4: You will spend less time and less effort implementing these
principles in your large classes compared to the amount of time you
currently spend on assessment-related activities—in fact, much less
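Principle #3 depends on scoring that needs no human in the loop. For objectively scored items, immediate scoring with skill-level feedback can be sketched as below, assuming each item has been tagged with the skill it measures. The function names, the skill tags, and the 0.7 mastery threshold are hypothetical choices for illustration, not a description of any particular testing system.

```python
from collections import defaultdict


def score_exam(responses, answer_key, skill_map):
    """Score one examinee and return (total correct, proportion correct per skill).

    responses:  item id -> option chosen by the examinee (unanswered items absent)
    answer_key: item id -> correct option
    skill_map:  item id -> skill the item measures (hypothetical tagging)
    """
    correct = defaultdict(int)
    administered = defaultdict(int)
    for item_id, key in answer_key.items():
        skill = skill_map[item_id]
        administered[skill] += 1
        if responses.get(item_id) == key:
            correct[skill] += 1
    total = sum(correct.values())
    by_skill = {skill: correct[skill] / administered[skill] for skill in administered}
    return total, by_skill


def feedback(by_skill, threshold=0.7):
    """Split skills into strengths and weaknesses around an assumed mastery threshold."""
    strengths = sorted(s for s, p in by_skill.items() if p >= threshold)
    weaknesses = sorted(s for s, p in by_skill.items() if p < threshold)
    return strengths, weaknesses


answer_key = {"q1": "b", "q2": "d", "q3": "a", "q4": "c"}
skill_map = {"q1": "algebra", "q2": "algebra", "q3": "geometry", "q4": "geometry"}
responses = {"q1": "b", "q2": "d", "q3": "c"}  # q4 left blank

total, by_skill = score_exam(responses, answer_key, skill_map)
print(total, by_skill)  # 2 {'algebra': 1.0, 'geometry': 0.0}
print(feedback(by_skill))  # (['algebra'], ['geometry'])
```

Because the scoring is a pure function of the responses and the key, it can run the moment a student submits, which is what makes "instant and detailed" feedback feasible at large-class scale.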
COMPUTER-BASED TESTING
APPLICATION #1:
COMPUTER-BASED TESTING
PAPER-BASED TESTING
Test Development
Test Administration
Test Reporting
[Diagram slides on computer-based testing; surviving label: AUTOMATED]
COMPUTER-BASED TESTING
• In short, computer-based testing is a very good thing and it is here to
stay: it either eliminates or automates two-thirds of the testing
activities that you currently do manually
• Admittedly, we are focusing on examples that use objectively scored
assessment items, but examples can also be cited for automated essay
scoring of student-produced assessment tasks
• The architecture for a computer-based testing system is feasible
[PAPER-BASED TESTING IS DEAD]
MESSAGE #3: The University of Alberta needs a computer-based testing
system because YOU need this system for all of your classes, big and small
COMPUTER-BASED TESTING
Test Development
Test Administration
*ELIMINATED*
Test Reporting
*AUTOMATED*
AUTOMATIC ITEM GENERATION
APPLICATION #2:
AUTOMATIC ITEM GENERATION
ONE WAY TO CREATE TEST ITEMS…
Professor writing test items the day
before the midterm exam…
AUTOMATIC ITEM GENERATION

• Another way to address this item development challenge is with
automatic item generation (AIG)
• Automatic item generation is the process of using item models to
generate test items with the aid of computer technology; with this
approach, hundreds or even thousands of items can be generated from a
single item model
• While the idea of automatic item generation may be viewed as a “dream
come true,” I am here to tell you that the dream is well within our reach
because of developments in modern educational measurement theory
A 54-year-old woman has a laparoscopic cholecystectomy. On
post-operative day 3 she has a temperature of 38.5°C. Physical
examination reveals a red and tender wound and calf
tenderness. Which one of the following is the best next step?
a. Mobilize
b. Antibiotics
c. Anticoagulation
d. Reopen the wound
AUTOMATIC ITEM GENERATION
AUTOMATIC ITEM GENERATION
• That ugly diagram is a cognitive model highlighting the knowledge, skills,
and content required to make a medical diagnosis
The model includes three key outcomes:
1. Identify THE PROBLEM (i.e., Post-Operative Fever);
2. Specify SOURCES OF INFORMATION required to diagnose the problem
(e.g., Type of Surgery); and
3. Describe KEY FEATURES within each information source (e.g., Guarding
and Rebound) needed to create different instances of the problem
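The three outcomes form a simple hierarchy: one problem, several sources of information, and a set of key features within each source. A minimal Python sketch of that hierarchy follows, assuming the sources and features listed on the item model slide for post-operative fever; the class and method names are hypothetical. It only counts the distinct instances the listed features allow. The actual diagram also ties features to the correct management decision, and that clinical logic is deliberately omitted here rather than guessed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import prod


@dataclass
class CognitiveModel:
    """Hypothetical three-layer model: problem -> sources of information -> key features."""
    problem: str
    sources: dict[str, list[str]] = field(default_factory=dict)

    def add_source(self, name: str, features: list[str]) -> None:
        """Register a source of information and the key features it can take."""
        self.sources[name] = list(features)

    def instance_count(self) -> int:
        """Each instance takes one key feature from every source, so the counts multiply."""
        return prod(len(features) for features in self.sources.values())


model = CognitiveModel("Post-Operative Fever")
model.add_source("Type of Surgery", [
    "Gastrectomy", "Right Hemicolectomy", "Left Hemicolectomy",
    "Appendectomy", "Laparoscopic Cholecystectomy",
])
model.add_source("Timing of Fever", [f"day {d}" for d in range(1, 7)])
model.add_source("Physical Examination", [
    "Red and Tender Wound", "Guarding and Rebound",
    "Abdominal Tenderness", "Calf Tenderness",
])
print(model.problem, model.instance_count())  # Post-Operative Fever 120
```

The 120 here is an unconstrained count for three sources only; the item counts reported later in the deck come from richer models than this example.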
AUTOMATIC ITEM GENERATION
AUTOMATIC ITEM GENERATION
• Next, an item model is created; an item model is like a template or
mould of the assessment task (i.e., it marks where the content will be
placed in the test item)
A 54-year-old woman has a <TYPE OF SURGERY>. On post-operative day <TIMING OF FEVER> the
patient has a temperature of 38.5°C. Physical examination reveals
<PHYSICAL EXAMINATION>. Which one of the following is the best next step?
TYPE OF SURGERY: Gastrectomy, Right Hemicolectomy, Left Hemicolectomy,
Appendectomy, Laparoscopic Cholecystectomy
TIMING OF FEVER: 1 to 6 days
PHYSICAL EXAMINATION: Red and Tender Wound, Guarding and Rebound, Abdominal
Tenderness, Calf Tenderness
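Filling a single item from this model amounts to replacing each placeholder with one permitted value. The Python sketch below assumes the stem and element values shown on the slide; `render`, `STEM`, and `ELEMENTS` are hypothetical names, and values are inserted verbatim, so a production generator would also need to handle capitalization and article agreement (e.g., "a Appendectomy").

```python
import re

# Stem of the item model on the slide; each <ELEMENT> marks content to be filled.
STEM = (
    "A 54-year-old woman has a <TYPE OF SURGERY>. On post-operative day "
    "<TIMING OF FEVER> the patient has a temperature of 38.5°C. Physical "
    "examination reveals <PHYSICAL EXAMINATION>. Which one of the following "
    "is the best next step?"
)

# Allowed values for each element, copied from the slide.
ELEMENTS = {
    "TYPE OF SURGERY": ["Gastrectomy", "Right Hemicolectomy", "Left Hemicolectomy",
                        "Appendectomy", "Laparoscopic Cholecystectomy"],
    "TIMING OF FEVER": [str(day) for day in range(1, 7)],
    "PHYSICAL EXAMINATION": ["Red and Tender Wound", "Guarding and Rebound",
                             "Abdominal Tenderness", "Calf Tenderness"],
}

PLACEHOLDER = re.compile(r"<([A-Z ]+)>")


def render(stem, elements, values):
    """Replace each <ELEMENT> with its chosen value, rejecting values the model does not allow."""
    def fill(match):
        name = match.group(1)
        value = values[name]
        if value not in elements[name]:
            raise ValueError(f"{value!r} is not an allowed value for {name}")
        return value
    return PLACEHOLDER.sub(fill, stem)


item = render(STEM, ELEMENTS, {
    "TYPE OF SURGERY": "Laparoscopic Cholecystectomy",
    "TIMING OF FEVER": "3",
    "PHYSICAL EXAMINATION": "Calf Tenderness",
})
print(item)
```

Validating each value against the model keeps a typo in the input from silently producing an item the model never intended.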
AUTOMATIC ITEM GENERATION
• Finally, we combine this information
systematically to produce new items
• To accomplish this complex combinatoric
task, we created software for item
generation called IGOR (Item GeneratOR)
• IGOR was programmed in Java
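IGOR's internals are not described in this presentation, so the following is only a Python sketch of the combinatorial step, assuming the post-operative fever item model shown earlier and no constraints between elements. `itertools.product` forms every combination of element values, and each combination fills the stem once. A real generator would also build the answer options and discard clinically implausible combinations, which this sketch does not attempt.

```python
from itertools import product

# Item model stem with one named slot per element (str.format placeholders).
STEM = (
    "A 54-year-old woman has a {surgery}. On post-operative day {day} "
    "the patient has a temperature of 38.5°C. Physical examination "
    "reveals {exam}. Which one of the following is the best next step?"
)

SURGERIES = ["Gastrectomy", "Right Hemicolectomy", "Left Hemicolectomy",
             "Appendectomy", "Laparoscopic Cholecystectomy"]
DAYS = range(1, 7)  # timing of fever: days 1 to 6
EXAMS = ["Red and Tender Wound", "Guarding and Rebound",
         "Abdominal Tenderness", "Calf Tenderness"]


def generate_stems():
    """Yield one stem for every combination in the Cartesian product of element values."""
    for surgery, day, exam in product(SURGERIES, DAYS, EXAMS):
        yield STEM.format(surgery=surgery, day=day, exam=exam)


stems = list(generate_stems())
print(len(stems))  # 120 = 5 surgeries x 6 days x 4 findings
print(stems[0])
```

Using a generator keeps memory flat even when a model with more elements yields thousands of stems; the count grows as the product of the number of values per element.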
AUTOMATIC ITEM GENERATION
• When we used our method with 5 different item models developed for the
MCC QE Part I in surgery, more than 20,000 items were generated:
Item Model 1: Gallstones (288 items)
Item Model 2: Hernias (256 items)
Item Model 3: Aneurysm (5,184 items)
Item Model 4: Post-Operative Management (7,488 items)
Item Model 5: Post-Operative Fever (7,680 items)
• We have also developed item models at the K-12 level in Language Arts,
Social Studies, Science, and Math, as well as in AP Biology and Architecture,
in addition to 10 different content areas in Medicine, producing millions of test items
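The per-model counts listed for the MCC QE Part I surgery models can be checked directly; the total sits comfortably above the 20,000 mark. The dictionary keys in this snippet are shortened labels taken from that list.

```python
# Items generated from each surgery item model, as listed on the slide.
generated = {
    "Gallstones": 288,
    "Hernias": 256,
    "Aneurysm": 5_184,
    "Post-Operative Management": 7_488,
    "Post-Operative Fever": 7_680,
}

total = sum(generated.values())
average = total / len(generated)
print(f"{total:,} items from {len(generated)} item models")  # 20,896 items from 5 item models
print(f"{average:,.1f} items per model on average")  # 4,179.2 items per model on average
```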
AUTOMATIC ITEM GENERATION
16. A 60-year-old woman has been booked for a laparoscopic cholecystectomy for symptomatic gallstones. Prior to her surgery,
she presents to the Emergency Department with a history of feeling faint and unwell. She has had rigors. On physical examination,
her temperature is 40°C. Her white blood cell count is 22 × 10⁹/L; aspartate aminotransferase 63 U/L; alanine aminotransferase 78 U/L;
alkaline phosphatase 450 U/L; amylase level 200 U/L; and bilirubin 50 µmol/L. Which one of the following is the most likely
diagnosis?
(a) Cholecystitis.
(b) Cholangitis.
(c) Pancreatitis.
(d) Hepatic abscess.
(e) Duodenal ulcer.
39. An obese 61-year-old male collapses with sudden pain at a shopping center and is brought to hospital by ambulance. He is
diaphoretic. His pulse is 96/minute and his blood pressure is 100/70 mm Hg; he complains of severe pain in his abdomen and left flank. Which
one of the following is the most likely diagnosis?
(a) Acute hemorrhagic pancreatitis.
(b) Ruptured aortic aneurysm.
(c) Mesenteric vascular occlusion.
(d) Acute diverticulitis.
(e) Volvulus of sigmoid colon.
CONCLUSION
• Educational measurement is a specialized discipline requiring advanced
graduate training; this implies that assessment involves many complex and
thorny issues, but please remember that you have colleagues on campus
who can help you deal with these issues
• Our discipline is undergoing profound changes that will yield much better
methods for evaluating students while at the same time requiring less time
and effort for the examiner because much of the unpleasant work is being
automated—computer-based testing and automatic item generation are
but two examples from a list of many
MESSAGE #4: There is no going back to the “good old days”…therefore, we
must work together to structure our future at the University of Alberta by
building and implementing these new assessment systems…but also
recognize that this work is just getting started
THANK YOU
Dr. Mark J. Gierl (mark.gierl@ualberta.ca)
6-110 Education Centre North