Eliminating Bias from Evaluation Instruments

Capturing Useful Assessment Data: Eliminating Unintended Cognitive Bias from Your Evaluation Instruments
Conference Session: SES11
2010 ACGME Annual Education Conference
Nancy Piro, PhD, Program Manager/Ed Specialist
Ann Dohn, MA, DIO
Department of Graduate Medical Education
Overall Questions
• What is assessment?
• What is evaluation?
• What is it used for?
• Why do we evaluate?
• How do we construct a useful evaluation?
• What is cognitive bias?
• How do we eliminate bias from our evaluations?
Defining the Rules of the Game
Assessment - Evaluation: What's the difference and what are they used for?
Assessment is the analysis and use of data by residents or sub-specialty residents (trainees), faculty, program directors and/or departments to make decisions about improvements in teaching and learning.
Assessment Examples
• Example 1: A faculty member provides feedback to a resident regarding performance of a procedure. The trainee uses that feedback to study/practice differently in order to improve learning and performance.
• Example 2: Surgical residents provide feedback on the faculty and program curriculum to the Program Director, which is aggregated, analyzed, and then used to make improvements in the General Surgery program.
Assessment - Evaluation: What's the difference and what are they used for?
Evaluation is the analysis and use of data by faculty to make judgments about trainee performance.
– Evaluation includes obtaining accurate, performance-based, empirical information which is used to make competency decisions on trainees across the six competency domains.
Evaluation Examples
• Example 1: A trainee delivers an oral presentation at a Journal Club. The faculty member provides a critique of the delivery and content, accompanied by a rating for the assignment.
• Example 2: A program director provides a final evaluation to a resident, accompanied by an attestation that the resident has demonstrated sufficient ability and acquired the appropriate clinical and procedural skills to practice competently and independently.
Why do we assess and evaluate? (Besides the fact it is required…)
• Demonstrate and improve trainee competence in core and related competency areas - knowledge and application
• Ensure our programs produce graduates, each of whom "has demonstrated sufficient ability and acquired the appropriate clinical and procedural skills to practice competently and independently."
• Track the impact of curriculum/organizational change and gain feedback on program, curriculum and faculty effectiveness
• Provide residents/fellows a means to communicate confidentially
• Provide an early warning system
• Identify gaps between competency-based goals and individual performance
So What's the Game Plan for Constructing Effective Evaluations?
How do we construct a useful evaluation or assessment?
STEP 1. Create the Evaluation (Plan)
– Curriculum (Competency) Goals & Objectives / Outcomes
– Question and Scale Development
STEP 2. Deploy (Do)
– Online / In-Person (Paper)
STEP 3. Analyze (Study/Check)
– Reporting / Benchmarking and Statistical Analysis
– Rank Order / Norms (Within the Institution / National)
STEP 4. Take Action (Act)
– Develop & Implement Learning/Action Plans
– Measure Progress Against Learning Goals
– Adjust Learning/Action Plans
Step 1: Question and Response Scale Construction
Two Major Goals:
• Construct unbiased, focused, and non-leading questions that produce valid data
• Design and use valid, unbiased response scales
Step 1: Create the Evaluation - Eliminating Unintended Cognitive Bias
What is cognitive bias?
• Cognitive bias is a distortion in the way we perceive reality/information.
• Cognitive response bias is a type of cognitive bias which can affect the results of an evaluation if evaluators answer questions in the way they think they are designed to be answered, or with a positive or negative bias toward the evaluatee.
Step 1: Create the Evaluation - Where does response bias occur?
1. Response bias can be in the raters themselves: Central Tendency, Similarity Effect, First Impressions, Halo Effect, Devil Effect.
2. Response bias occurs most often in the wording of the question.
– Response bias is present when a question contains a leading phrase or words.
3. Response bias can also occur in the rating scales.
Response Bias in the Raters/Evaluators - Beware the Halo Effect
• The halo effect refers to a type of cognitive bias where the perception of a particular behavior or trait is positively influenced by the perception of earlier positive traits in a sequence of interpretations.
• Thorndike (1920) was the first to support the halo effect with empirical research.
– "People seem not to think of other individuals in mixed terms; instead we seem to see each person as roughly good or roughly bad across all categories of measurement."
The Halo Effect and Expectations
• The halo effect is evident in Kelley's implicit personality theory:
– "the first traits we recognize in other people influence our interpretation and perception of later ones because of our expectations…"
The Halo Effect Extends to Products and Marketing Efforts
The iPod has had positive effects on perceptions of Apple’s other products…
Could this impact our evaluations?
GME HouseStaff Survey 2009
A majority of trainees (54.7%) agreed that:
"The general feeling in my program is that your ability will be labeled based on your initial performance."
[Survey chart: 54.7% agreement; 24.4% and 15.6% in the remaining response categories.]
Reverse Halo Effect
• A corollary to the halo effect is the "reverse halo effect" (devil effect):
– Individuals or brands which are seen to have a single undesirable trait are later judged to have many poor traits…
– i.e., a single weak point (showing up late, for example) influences others' perception of the person or brand.
Blind Spots
• In the 1970s, the social psychologist Richard Nisbett demonstrated that we typically have no awareness of when the halo effect influences us (Nisbett, R.E. and Wilson, T.D., 1977).
• The problem with blind spots is that we are blind to them…
Step 1: Create the Evaluation
Question Construction - Exercise One
• Review each question (handout) and share your thinking on what makes it a good or bad question.
Question Construction: Exercise 1
• Example (1):
– "I can always talk to my Program Director about residency related problems."
• Example (2):
– "Sufficient career planning resources are available to me and my program director supports my professional aspirations."
• Example (3):
– "Communication in my sub-specialty program is good."
• Example (4):
– "Incomplete, inaccurate medical interviews, physical examinations; incomplete review and summary of other data sources. Fails to analyze data to make decisions; poor clinical judgment."
• Example (5):
– "The pace on our service is chaotic."
Question Construction - Test Your Knowledge
• Example 1: "I can always talk to my Program Director about residency related problems."
• Problem: Terms such as "always" and "never" will bias the response in the opposite direction.
• Result: Data will be skewed.
Question Construction - Test Your Knowledge
• Example 2: "Career planning resources are available to me and my program director supports my professional aspirations."
• Problem: Double-barreled - resources and aspirations… Respondents may agree with one and not the other. The researcher cannot make assumptions about which part of the question respondents were rating.
• Result: Data is useless.
Question Construction - Test Your Knowledge
• Example 3: "Communication in my subspecialty program is good."
• Problem: The question is too broad. If the score is less than 100% positive, the researcher/evaluator still does not know what aspect of communication needs improvement.
• Result: Data is of little or no usefulness.
Question Construction - Test Your Knowledge
• Example 4: "Evidences incomplete, inaccurate medical interviews, physical examinations; incomplete review and summary of other data sources. Fails to analyze data to make decisions; poor clinical judgment."
• Problem: Multi-barreled - respondents may agree with some parts and not others. The evaluator cannot make assumptions about which part of the question respondents were rating.
• Result: Data is useless.
Question Construction - Test Your Knowledge
• Example 5: "The pace on our service is chaotic."
• Problem: The question is negative, and broadcasts a bad message about the rotation/program.
• Result: Data will be skewed, and the climate may be negatively impacted.
Evaluation Question Design Principles
Avoid 'double-barreled' or multi-barreled questions. Eliminate them from your evaluations.
• A multi-barreled question combines two or more issues or "attitudinal objects" in a single question.
• More examples…
Avoiding Double-Barreled Questions
• Example: COMPETENCY 1 - Patient Care
"Resident provides sensitive support to patients with serious illness and to their families, and arranges for on-going support or preventive services if needed."
Evaluation Question Design Principles
Combining two or more questions into one makes it unclear which attribute is being measured, as each question may elicit a different perception of the resident's performance.
RESULT
• Respondents are confused and results are confounded, leading to unreliable or misleading results.
Tip: If the word "and" or the word "or" appears in a question, check to verify whether it is a double-barreled question.
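That "and"/"or" tip lends itself to a simple automated screen. Below is a minimal Python sketch (added here as an illustration, not part of the original session) that flags draft questions containing conjunctions, plus the absolute terms ("always", "never") discussed earlier. The word lists are assumptions for illustration; flagged questions still need human review.

```python
import re

# Illustrative (assumed) word lists, not an exhaustive taxonomy:
# conjunctions often signal a double-barreled question, and absolute
# terms tend to bias responses in the opposite direction.
CONJUNCTIONS = {"and", "or"}
ABSOLUTES = {"always", "never"}

def flag_question(question: str) -> list[str]:
    """Return warnings for a draft evaluation question."""
    words = set(re.findall(r"[a-z']+", question.lower()))
    warnings = []
    if words & CONJUNCTIONS:
        warnings.append("possibly double-barreled ('and'/'or' present)")
    if words & ABSOLUTES:
        warnings.append("absolute term may bias the response")
    return warnings

# Applied to two questions from Exercise 1:
for q in [
    "I can always talk to my Program Director about residency related problems.",
    "Career planning resources are available to me and my program director "
    "supports my professional aspirations.",
]:
    print(q[:40] + "...", "->", flag_question(q))
```

A screen like this only narrows the review ("and" can be harmless, as in "residents and attendings"); the final call on whether a question is double-barreled is still a human judgment.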
Evaluation Question Design Principles
• Avoid questions with double negatives…
• When respondents are asked for their agreement with a negatively phrased statement, double negatives can occur.
– Example: Do you agree or disagree with the following statement? "Attendings should not be required to supervise their residents during night call."
Evaluation Question Design Principles
• If you respond that you disagree, you are saying you do not think attendings should not supervise residents. In other words, you believe that attendings should supervise residents.
• Phrase questions positively if possible.
• If you do use a negative word like "not", consider highlighting the word by underlining or bolding it to catch the respondent's attention.
Evaluation Question Design Principles
• Because every question is measuring "something", it is important for each to be clear and precise.
• Remember… your goal is for each respondent to interpret the meaning of each survey question in exactly the same way.
• Pre-testing questions is definitely recommended!
Evaluation Question Design Principles
• If your respondents are not clear on what is being asked in a question, their responses may result in data that cannot or should not be applied to your evaluation results…
• Example: "For me, further development of my medical competence is important enough to take risks."
– Does this mean to take risks with patient safety, risks to one's pride, or something else?
Evaluation Question Design Principles
• Keep questions short.
• Long questions can be confusing.
• Bottom line: Focus on short, concise, clearly written statements that get right to the point, producing actionable data that can inform individual learning plans (ILPs).
• Short, concise questions take only seconds to respond to and are easily interpreted.
Evaluation Question Design Principles
• Do not use "loaded" or "leading" questions.
• A loaded or leading question biases the response the respondent gives. A loaded question is one that contains loaded words.
– For example: "I'm concerned about doing a procedure if my performance would reveal that I had low ability" (Disagree / Agree)
Evaluation Question Design Principles
• "I'm concerned about doing a procedure if my performance would reveal that I had low ability"
• How can this be answered with "agree or disagree" if you think you have good abilities in the appropriate tasks for your area?
Evaluation Question Design Principles
• A leading question is phrased in such a way that it suggests to the respondent that a certain answer is expected:
– Example: Don't you agree that nurses should show more respect to residents and attendings?
• Yes, they should show more respect
• No, they should not show more respect
Evaluation Question Design Principles
• Do use open-ended questions.
• Do use comment boxes after negative ratings
– to explain the reasoning and target areas for focus and improvement.
• General, open-ended questions at the end of the evaluation can prove very beneficial.
– Often it is found that entire topics that should have been included were omitted from the evaluation.
Exercise 2 "Post Test" / Answers
1. Please rate the general surgery resident's communication and technical skills
– Please rate the general surgery resident's communication skills.
– Please rate the general surgery resident's technical skills.
2. Rate the resident's ability to communicate with patients and their families
– Rate the resident's ability to communicate with patients.
– Rate the resident's ability to communicate with families.
Exercise 2 "Post Test" Answers
3. Rate the resident's abilities with respect to case familiarization; effort in reading about the patient's disease process and familiarizing with operative care and post-op care
– Rate the resident's ability with respect to case familiarization.
– Rate the resident's ability with respect to effort in reading about the patient's disease process.
– Rate the resident's ability with respect to familiarizing with operative care.
– Rate the resident's ability with respect to familiarizing with post-op care.
Exercise 2 "Post Test" Answers
4. Residents deserve higher pay for all the hours they put in, don't they?
– To what extent do you agree that residents are adequately paid?
5. Explains and performs steps required in resuscitation and stabilization
– Explains steps required in resuscitation.
– Explains steps required in stabilization following resuscitation.
Exercise 2 "Post Test" Answers
6. Do you agree or disagree that residents shouldn't have to pay for their meals when on-call?
– To what extent do you agree that residents need to pay for their own on-call meals?
7. Demonstrates an awareness of and responsiveness to the larger context of health care
– Demonstrates an awareness of the larger context of health care.
– Demonstrates responsiveness to the larger context of health care.
8. Demonstrates ability to communicate with faculty and staff
– Demonstrates ability to communicate with faculty.
– Demonstrates ability to communicate with staff.
Bias in the Rating Scales for Questions
The scale you construct can also skew your data, much like we discussed for question construction.
Evaluation Design Principles: Rating Scales
• By far the most popular scale asks respondents to rate their agreement with the evaluation questions or statements.
• After you decide what you want respondents to rate (competence, agreement, etc.), you need to decide how many levels of rating you want them to be able to make.
Evaluation Design Principles: Rating Scales
• Determine how fine a distinction you want to be able to make between agreement and disagreement.
• Using too few points can give less precise information, while using too many could make the question hard to read and answer (do you really need a 9- or 10-point scale?).
Evaluation Design Principles: Rating Scales
• Psychological research has shown that a 6-point scale with three levels of agreement and three levels of disagreement works best. An example would be:
– Disagree Strongly
– Disagree Moderately
– Disagree Slightly
– Agree Slightly
– Agree Moderately
– Agree Strongly
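For analysis, a balanced scale like this maps naturally onto symmetric numeric codes. Here is a minimal sketch (an illustration added to this text, not from the original session; the -3…+3 coding is an assumed convention, not a prescribed one):

```python
# Assumed symmetric coding for the balanced 6-point scale above;
# there is no 0 because the scale has no neutral midpoint.
SCALE = {
    "Disagree Strongly": -3,
    "Disagree Moderately": -2,
    "Disagree Slightly": -1,
    "Agree Slightly": 1,
    "Agree Moderately": 2,
    "Agree Strongly": 3,
}

def mean_rating(responses: list[str]) -> float:
    """Average numeric score for a list of scale labels."""
    return sum(SCALE[r] for r in responses) / len(responses)

print(mean_rating(["Agree Slightly", "Agree Strongly", "Disagree Slightly"]))
# -> 1.0, i.e. mild overall agreement
```

Symmetric codes make the balance of the scale visible in the numbers themselves: a mean near zero reflects genuinely mixed responses rather than an artifact of lopsided answer options.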
Evaluation Design Principles: Rating Scales
• This 6-point scale affords you ample flexibility for data analysis.
• Depending on the questions, other scales may be appropriate, but the important thing to remember is that the scale must be balanced, or you will build in a biasing factor.
• Avoid "neutral" and "neither agree nor disagree"… you're just giving up 20% of your evaluation 'real estate'.
Evaluation Design Principles: Rating Scales - Group Exercise
The scale itself can skew your data, as well.
1. Please rate the variety of patients available to the program for educational purposes.
Poor / Fair / Satisfactory / Good / Very Good
2. Please rate the attendance of your faculty members at your journal club conferences.
Limited / Fair / Satisfactory / Good / Very Good
Evaluation Design Principles: Rating Scales
3. Knowledge in general medicine.
Poor / OK / Good / Very Good / Excellent
The data will be artificially skewed in the positive direction with this scale because there are far more (4:1) positive than negative rating options.
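A quick way to catch this kind of skew is to count the anchors on each side of the scale. A minimal sketch, assuming a rough (and deliberately incomplete) classification of common anchor words:

```python
# Assumed classification of anchor words; a real checker would need
# a fuller taxonomy and human judgment for borderline words like "fair".
NEGATIVE = {"poor", "limited"}
POSITIVE = {"ok", "fair", "satisfactory", "good", "very good", "excellent"}

def balance_report(scale: list[str]) -> str:
    neg = sum(1 for s in scale if s.lower() in NEGATIVE)
    pos = sum(1 for s in scale if s.lower() in POSITIVE)
    verdict = "balanced" if neg == pos else "SKEWED"
    return f"{neg} negative vs {pos} positive anchors: {verdict}"

print(balance_report(["Poor", "OK", "Good", "Very Good", "Excellent"]))
# -> 1 negative vs 4 positive anchors: SKEWED
```

Applied to scale 3 above, the count reproduces the 4:1 imbalance the slide describes.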
Gentle Words of Wisdom - Avoid large numbers of questions…
• Respondent fatigue: the respondent tends to give similar ratings to all items without giving much thought to individual items, just wanting to finish.
• In situations where many items are considered important, a large number can receive very similar ratings at the top end of the scale.
– Items are not traded off against each other.
– Many items that are not at the extreme ends of the scale, or that are considered similarly important, are given a similar rating.
• Respondents quit answering questions at all…
– Total Started Survey: 578
– Total Completed Survey: 473 (81.8%)
Gentle Words of Wisdom - Begin with the End Goal in Mind
• What do you want as your outcomes?
• Be prepared to put in the time with pretesting.
• The faculty member, nurse, patient, or resident has to be able to understand the intent of the question, and each must find it credible and interpret it in the same way.
Gentle Words of Wisdom - Relevancy/Accuracy: Collect data in a valid and reliable way
• If the questions aren't framed properly, if they are too vague or too specific, it's impossible to get any meaningful data.
• Question miswording can lead to skewed data with little or no usefulness.
• Ensure your response scales are balanced and appropriate.
• If you don't plan or know how you are going to use the data, don't ask the question!
Gentle Words of Wisdom Continued…
• If you are using aggregated data, the statistical analyses must be appropriate for your evaluation or, however sophisticated and impressive, the numbers generated that look real will actually be false and misleading.
– Are differences really significant given your sample size? (See the sketch below.)
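To make the sample-size point concrete, here is a minimal sketch using SciPy. The ratings are fabricated for illustration, and the choice of a two-sample t-test is an assumption of this sketch, not a method from the original session: the same gap in mean ratings that is inconclusive with six evaluations per group becomes statistically significant with sixty.

```python
from scipy import stats

# Fabricated 5-point ratings for two rotations, for illustration only.
rotation_a = [5, 3, 4, 5, 3, 4]   # mean 4.0
rotation_b = [3, 4, 3, 4, 3, 4]   # mean 3.5

# With n = 6 per group, this difference is not significant (p ~ 0.27).
t, p = stats.ttest_ind(rotation_a, rotation_b)
print(f"n=6 per group:  p = {p:.3f}")

# The identical means and spread with 10x the data cross the usual
# 0.05 threshold: "significance" depends heavily on sample size.
t, p = stats.ttest_ind(rotation_a * 10, rotation_b * 10)
print(f"n=60 per group: p = {p:.4f}")
```

The flip side holds as well: with a very large sample, even a trivially small difference can come out "significant", so report the size of the effect alongside the p-value.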
Summary: Evaluation Do's and Don'ts
DO's
• Keep questions clear, precise, and relatively short.
• Use a balanced response scale
– (4-6 scale points recommended)
• Use open-ended questions.
DON'Ts
• Do not use double-barreled questions.
• Do not use double-negative questions.
• Do not use loaded or leading questions.
Applying Workshop 'Learnings' - Exercise 3
• Part A.
– Take a look at the evaluation you brought to this session.
• Identify questions and response scales which may have unintended bias.
• Modify questions or response scales to eliminate bias, and share if time permits.
• Part B.
– Review (or set up a review process/panel to review) your program's or institution's evaluations with the goal of eliminating any unintentional bias.
Ready to Play the Game - and Get Better Results!
Questions
Feel free to contact us:
Ann Dohn, MA – adohn1@stanford.edu
Nancy Piro, PhD – npiro@stanford.edu