Eliminating Bias from Evaluation Instruments

Capturing Useful Assessment Data: Eliminating Unintended Cognitive Bias from Your Evaluation Instruments
Conference Session: SES11
2010 ACGME Annual Education Conference
Nancy Piro, PhD, Program Manager/Ed Specialist
Ann Dohn, MA, DIO
Department of Graduate Medical Education
Overall Questions
• What is assessment?
• What is evaluation?
• What is it used for?
• Why do we evaluate?
• How do we construct a useful evaluation?
• What is cognitive bias?
• How do we eliminate bias from our evaluations?
Defining the Rules of the Game
Assessment - Evaluation: What's the difference and what are they used for?
Assessment is the analysis and use of data by residents or sub-specialty residents (trainees), faculty, program directors and/or departments to make decisions about improvements in teaching and learning.
Assessment Examples
• Example 1: A faculty member provides feedback to a resident regarding performance of a procedure. The trainee uses that feedback to study/practice differently in order to improve learning and performance.
• Example 2: Surgical residents provide feedback on the faculty and program curriculum to the Program Director, which is aggregated, analyzed, and then used to make improvements in the General Surgery program.
Assessment - Evaluation: What's the difference and what are they used for?
Evaluation is the analysis and use of data by faculty to make judgments about trainee performance.
– Evaluation includes obtaining accurate, performance-based, empirical information which is used to make competency decisions on trainees across the six competency domains.
Evaluation Examples
• Example 1: A trainee delivers an oral presentation at a Journal Club. The faculty member provides a critique of the delivery and content, accompanied by a rating for the assignment.
• Example 2: A program director provides a final evaluation to a resident, accompanied by an attestation that the resident has demonstrated sufficient ability and acquired the appropriate clinical and procedural skills to practice competently and independently.
Why do we assess and evaluate? (Besides the fact it is required…)
• Demonstrate and improve trainee competence in core and related competency areas - knowledge and application
• Ensure our programs produce graduates, each of whom "has demonstrated sufficient ability and acquired the appropriate clinical and procedural skills to practice competently and independently."
• Track the impact of curriculum/organizational change and gain feedback on program, curriculum and faculty effectiveness
• Provide residents/fellows a means to communicate confidentially
• Provide an early warning system
• Identify gaps between competency-based goals and individual performance
So What's the Game Plan for Constructing Effective Evaluations?
How do we construct a useful evaluation or assessment?
STEP 1. Create the Evaluation (Plan)
– Curriculum (Competency) Goals & Objectives / Outcomes
– Question and Scale Development
STEP 2. Deploy (Do)
– Online / In-Person (Paper)
STEP 3. Analyze (Study/Check)
– Reporting / Benchmarking and Statistical Analysis
– Rank Order / Norms (Within the Institution / National)
STEP 4. Take Action (Act)
– Develop & Implement Learning/Action Plans
– Measure Progress Against Learning Goals
– Adjust Learning/Action Plans
Step 1: Question and Response Scale Construction
Two Major Goals:
• Construct unbiased, focused, and non-leading questions that produce valid data
• Design and use valid, unbiased response scales
Step 1: Create the Evaluation - Eliminating Unintended Cognitive Bias
What is cognitive bias?
• Cognitive bias is a distortion in the way we perceive reality/information.
• Cognitive response bias is a type of cognitive bias which can affect the results of an evaluation if evaluators answer questions in the way they think they are designed to be answered, or with a positive or negative bias toward the evaluatee.
Step 1: Create the Evaluation - Where does response bias occur?
1. Response bias can be in the raters themselves: Central Tendency, Similarity Effect, First Impressions, Halo Effect, Devil Effect.
2. Response bias occurs most often in the wording of the question.
– Response bias is present when a question contains a leading phrase or words.
3. Response bias can also occur in the rating scales.
Response Bias in the Raters/Evaluators - Beware the Halo Effect
• The halo effect refers to a type of cognitive bias where the perception of a particular behavior or trait is positively influenced by the perception of earlier positive traits in a sequence of interpretations.
• Thorndike (1920) was the first to support the halo effect with empirical research.
– "People seem not to think of other individuals in mixed terms; instead we seem to see each person as roughly good or roughly bad across all categories of measurement."
The Halo Effect and Expectations
• The halo effect is evident in Kelley's implicit personality theory:
– "the first traits we recognize in other people influence our interpretation and perception of later ones because of our expectations…"
The Halo Effect Extends to Products and Marketing Efforts
The iPod has had positive effects on perceptions of Apple’s other products…
Could this impact our evaluations?
GME HouseStaff Survey 2009
A majority of trainees (54.7%) agreed that:
"The general feeling in my program is that your ability will be labeled based on your initial performance."
[Survey chart: 54.7% agreement; 24.4% and 15.6% in the remaining response categories.]
Reverse Halo Effect
• A corollary to the halo effect is the "reverse halo effect" (devil effect):
– Individuals or brands which are seen to have a single undesirable trait are later judged to have many poor traits…
– i.e., a single weak point (showing up late, for example) influences others' perception of the person or brand.
Blind Spots
• In the 1970s, the social psychologist Richard Nisbett demonstrated that we typically have no awareness of when the halo effect influences us (Nisbett, R.E. and Wilson, T.D., 1977).
• The problem with blind spots is that we are blind to them…
Step 1: Create the Evaluation
Question Construction - Exercise One
• Review each question (handout) and share your thinking on what makes it a good or bad question.
Question Construction: Exercise 1
• Example (1):
– "I can always talk to my Program Director about residency related problems."
• Example (2):
– "Sufficient career planning resources are available to me and my program director supports my professional aspirations."
• Example (3):
– "Communication in my sub-specialty program is good."
• Example (4):
– "Incomplete, inaccurate medical interviews, physical examinations; incomplete review and summary of other data sources. Fails to analyze data to make decisions; poor clinical judgment."
• Example (5):
– "The pace on our service is chaotic."
Question Construction - Test Your Knowledge
• Example 1: "I can always talk to my Program Director about residency related problems."
• Problem: Terms such as "always" and "never" will bias the response in the opposite direction.
• Result: Data will be skewed.
Question Construction - Test Your Knowledge
• Example 2: "Career planning resources are available to me and my program director supports my professional aspirations."
• Problem: Double-barreled - resources and aspirations… Respondents may agree with one and not the other. The researcher cannot make assumptions about which part of the question respondents were rating.
• Result: Data is useless.
Question Construction - Test Your Knowledge
• Example 3: "Communication in my subspecialty program is good."
• Problem: The question is too broad. If the score is less than 100% positive, the researcher/evaluator still does not know what aspect of communication needs improvement.
• Result: Data is of little or no usefulness.
Question Construction - Test Your Knowledge
• Example 4: "Evidences incomplete, inaccurate medical interviews, physical examinations; incomplete review and summary of other data sources. Fails to analyze data to make decisions; poor clinical judgment."
• Problem: Multi-barreled - respondents may agree with some parts and not others. The evaluator cannot make assumptions about which part of the question respondents were rating.
• Result: Data is useless.
Question Construction - Test Your Knowledge
• Example 5: "The pace on our service is chaotic."
• Problem: The question is negative, and broadcasts a bad message about the rotation/program.
• Result: Data will be skewed, and the climate may be negatively impacted.
Evaluation Question Design Principles
Avoid 'double-barreled' or multi-barreled questions. Eliminate them from your evaluations.
• A multi-barreled question combines two or more issues or "attitudinal objects" in a single question.
• More examples…
Avoiding Double-Barreled Questions
• Example: COMPETENCY 1 - Patient Care
"Resident provides sensitive support to patients with serious illness and to their families, and arranges for on-going support or preventive services if needed."
Evaluation Question Design Principles
Combining two or more questions into one makes it unclear which attribute is being measured, as each question may elicit a different perception of the resident's performance.
RESULT
• Respondents are confused and results are confounded, leading to unreliable or misleading results.
Tip: If the word "and" or the word "or" appears in a question, check to verify whether it is a double-barreled question.
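That "and"/"or" tip lends itself to a simple automated screen. Below is a minimal Python sketch (added here as an illustration, not part of the original session) that flags draft questions containing conjunctions, plus the absolute terms ("always", "never") discussed earlier. The word lists are assumptions for illustration; flagged questions still need human review.

```python
import re

# Illustrative (assumed) word lists, not an exhaustive taxonomy:
# conjunctions often signal a double-barreled question, and absolute
# terms tend to bias responses in the opposite direction.
CONJUNCTIONS = {"and", "or"}
ABSOLUTES = {"always", "never"}

def flag_question(question: str) -> list[str]:
    """Return warnings for a draft evaluation question."""
    words = set(re.findall(r"[a-z']+", question.lower()))
    warnings = []
    if words & CONJUNCTIONS:
        warnings.append("possibly double-barreled ('and'/'or' present)")
    if words & ABSOLUTES:
        warnings.append("absolute term may bias the response")
    return warnings

# Applied to two questions from Exercise 1:
for q in [
    "I can always talk to my Program Director about residency related problems.",
    "Career planning resources are available to me and my program director "
    "supports my professional aspirations.",
]:
    print(q[:40] + "...", "->", flag_question(q))
```

A screen like this only narrows the review ("and" can be harmless, as in "residents and attendings"); the final call on whether a question is double-barreled is still a human judgment.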
Evaluation Question Design Principles
• Avoid questions with double negatives…
• When respondents are asked for their agreement with a negatively phrased statement, double negatives can occur.
– Example: Do you agree or disagree with the following statement? "Attendings should not be required to supervise their residents during night call."
Evaluation Question Design Principles
• If you respond that you disagree, you are saying you do not think attendings should not supervise residents. In other words, you believe that attendings should supervise residents.
• Phrase questions positively if possible.
• If you do use a negative word like "not", consider highlighting the word by underlining or bolding it to catch the respondent's attention.
Evaluation Question Design Principles
• Because every question is measuring "something", it is important for each to be clear and precise.
• Remember… your goal is for each respondent to interpret the meaning of each survey question in exactly the same way.
• Pre-testing questions is definitely recommended!
Evaluation Question Design Principles
• If your respondents are not clear on what is being asked in a question, their responses may result in data that cannot or should not be applied to your evaluation results…
• Example: "For me, further development of my medical competence is important enough to take risks."
– Does this mean to take risks with patient safety, risks to one's pride, or something else?
Evaluation Question Design Principles
• Keep questions short.
• Long questions can be confusing.
• Bottom line: Focus on short, concise, clearly written statements that get right to the point, producing actionable data that can inform individual learning plans (ILPs).
• Short, concise questions take only seconds to respond to and are easily interpreted.
Evaluation Question Design Principles
• Do not use "loaded" or "leading" questions.
• A loaded or leading question biases the response the respondent gives. A loaded question is one that contains loaded words.
– For example: "I'm concerned about doing a procedure if my performance would reveal that I had low ability" (Disagree / Agree)
Evaluation Question Design Principles
• "I'm concerned about doing a procedure if my performance would reveal that I had low ability"
• How can this be answered with "agree or disagree" if you think you have good abilities in the appropriate tasks for your area?
Evaluation Question Design Principles
• A leading question is phrased in such a way that it suggests to the respondent that a certain answer is expected:
– Example: Don't you agree that nurses should show more respect to residents and attendings?
• Yes, they should show more respect
• No, they should not show more respect
Evaluation Question Design Principles
• Do use open-ended questions.
• Do use comment boxes after negative ratings
– to explain the reasoning and target areas for focus and improvement.
• General, open-ended questions at the end of the evaluation can prove very beneficial.
– Often it is found that entire topics that should have been included were omitted from the evaluation.
Exercise 2 "Post Test" / Answers
1. Please rate the general surgery resident's communication and technical skills
– Please rate the general surgery resident's communication skills.
– Please rate the general surgery resident's technical skills.
2. Rate the resident's ability to communicate with patients and their families
– Rate the resident's ability to communicate with patients.
– Rate the resident's ability to communicate with families.
Exercise 2 "Post Test" Answers
3. Rate the resident's abilities with respect to case familiarization; effort in reading about the patient's disease process and familiarizing with operative care and post-op care
– Rate the resident's ability with respect to case familiarization.
– Rate the resident's ability with respect to effort in reading about the patient's disease process.
– Rate the resident's ability with respect to familiarizing with operative care.
– Rate the resident's ability with respect to familiarizing with post-op care.
Exercise 2 "Post Test" Answers
4. Residents deserve higher pay for all the hours they put in, don't they?
– To what extent do you agree that residents are adequately paid?
5. Explains and performs steps required in resuscitation and stabilization
– Explains steps required in resuscitation.
– Explains steps required in stabilization following resuscitation.
Exercise 2 "Post Test" Answers
6. Do you agree or disagree that residents shouldn't have to pay for their meals when on-call?
– To what extent do you agree that residents need to pay for their own on-call meals?
7. Demonstrates an awareness of and responsiveness to the larger context of health care
– Demonstrates an awareness of the larger context of health care.
– Demonstrates responsiveness to the larger context of health care.
8. Demonstrates ability to communicate with faculty and staff
– Demonstrates ability to communicate with faculty.
– Demonstrates ability to communicate with staff.
Bias in the Rating Scales for Questions
The scale you construct can also skew your data, much like we discussed for question construction.
Evaluation Design Principles: Rating Scales
• By far the most popular scale asks respondents to rate their agreement with the evaluation questions or statements.
• After you decide what you want respondents to rate (competence, agreement, etc.), you need to decide how many levels of rating you want them to be able to make.
Evaluation Design Principles: Rating Scales
• Determine how fine a distinction you want to be able to make between agreement and disagreement.
• Using too few points can give less precise information, while using too many could make the question hard to read and answer (do you really need a 9- or 10-point scale?).
Evaluation Design Principles: Rating Scales
• Psychological research has shown that a 6-point scale with three levels of agreement and three levels of disagreement works best. An example would be:
– Disagree Strongly
– Disagree Moderately
– Disagree Slightly
– Agree Slightly
– Agree Moderately
– Agree Strongly
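For analysis, a balanced scale like this maps naturally onto symmetric numeric codes. Here is a minimal sketch (an illustration added to this text, not from the original session; the -3…+3 coding is an assumed convention, not a prescribed one):

```python
# Assumed symmetric coding for the balanced 6-point scale above;
# there is no 0 because the scale has no neutral midpoint.
SCALE = {
    "Disagree Strongly": -3,
    "Disagree Moderately": -2,
    "Disagree Slightly": -1,
    "Agree Slightly": 1,
    "Agree Moderately": 2,
    "Agree Strongly": 3,
}

def mean_rating(responses: list[str]) -> float:
    """Average numeric score for a list of scale labels."""
    return sum(SCALE[r] for r in responses) / len(responses)

print(mean_rating(["Agree Slightly", "Agree Strongly", "Disagree Slightly"]))
# -> 1.0, i.e. mild overall agreement
```

Symmetric codes make the balance of the scale visible in the numbers themselves: a mean near zero reflects genuinely mixed responses rather than an artifact of lopsided answer options.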
Evaluation Design Principles: Rating Scales
• This 6-point scale affords you ample flexibility for data analysis.
• Depending on the questions, other scales may be appropriate, but the important thing to remember is that the scale must be balanced, or you will build in a biasing factor.
• Avoid "neutral" and "neither agree nor disagree"… you're just giving up 20% of your evaluation 'real estate'.
Evaluation Design Principles: Rating Scales - Group Exercise
The scale itself can skew your data, as well.
1. Please rate the variety of patients available to the program for educational purposes.
Poor / Fair / Satisfactory / Good / Very Good
2. Please rate the attendance of your faculty members at your journal club conferences.
Limited / Fair / Satisfactory / Good / Very Good
Evaluation Design Principles: Rating Scales
3. Knowledge in general medicine.
Poor / OK / Good / Very Good / Excellent
The data will be artificially skewed in the positive direction with this scale because there are far more (4:1) positive than negative rating options.
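A quick way to catch this kind of skew is to count the anchors on each side of the scale. A minimal sketch, assuming a rough (and deliberately incomplete) classification of common anchor words:

```python
# Assumed classification of anchor words; a real checker would need
# a fuller taxonomy and human judgment for borderline words like "fair".
NEGATIVE = {"poor", "limited"}
POSITIVE = {"ok", "fair", "satisfactory", "good", "very good", "excellent"}

def balance_report(scale: list[str]) -> str:
    neg = sum(1 for s in scale if s.lower() in NEGATIVE)
    pos = sum(1 for s in scale if s.lower() in POSITIVE)
    verdict = "balanced" if neg == pos else "SKEWED"
    return f"{neg} negative vs {pos} positive anchors: {verdict}"

print(balance_report(["Poor", "OK", "Good", "Very Good", "Excellent"]))
# -> 1 negative vs 4 positive anchors: SKEWED
```

Applied to scale 3 above, the count reproduces the 4:1 imbalance the slide describes.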
Gentle Words of Wisdom - Avoid large numbers of questions…
• Respondent fatigue: the respondent tends to give similar ratings to all items without giving much thought to individual items, just wanting to finish.
• In situations where many items are considered important, a large number can receive very similar ratings at the top end of the scale.
– Items are not traded off against each other.
– Many items that are not at the extreme ends of the scale, or that are considered similarly important, are given a similar rating.
• Respondents quit answering questions at all…
– Total Started Survey: 578
– Total Completed Survey: 473 (81.8%)
Gentle Words of Wisdom - Begin with the End Goal in Mind
• What do you want as your outcomes?
• Be prepared to put in the time with pretesting.
• The faculty member, nurse, patient, or resident has to be able to understand the intent of the question, and each must find it credible and interpret it in the same way.
Gentle Words of Wisdom - Relevancy/Accuracy: Collect data in a valid and reliable way
• If the questions aren't framed properly, if they are too vague or too specific, it's impossible to get any meaningful data.
• Question miswording can lead to skewed data with little or no usefulness.
• Ensure your response scales are balanced and appropriate.
• If you don't plan or know how you are going to use the data, don't ask the question!
Gentle Words of Wisdom Continued…
• If you are using aggregated data, the statistical analyses must be appropriate for your evaluation or, however sophisticated and impressive, the numbers generated that look real will actually be false and misleading.
– Are differences really significant given your sample size? (See the sketch below.)
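To make the sample-size point concrete, here is a minimal sketch using SciPy. The ratings are fabricated for illustration, and the choice of a two-sample t-test is an assumption of this sketch, not a method from the original session: the same gap in mean ratings that is inconclusive with six evaluations per group becomes statistically significant with sixty.

```python
from scipy import stats

# Fabricated 5-point ratings for two rotations, for illustration only.
rotation_a = [5, 3, 4, 5, 3, 4]   # mean 4.0
rotation_b = [3, 4, 3, 4, 3, 4]   # mean 3.5

# With n = 6 per group, this difference is not significant (p ~ 0.27).
t, p = stats.ttest_ind(rotation_a, rotation_b)
print(f"n=6 per group:  p = {p:.3f}")

# The identical means and spread with 10x the data cross the usual
# 0.05 threshold: "significance" depends heavily on sample size.
t, p = stats.ttest_ind(rotation_a * 10, rotation_b * 10)
print(f"n=60 per group: p = {p:.4f}")
```

The flip side holds as well: with a very large sample, even a trivially small difference can come out "significant", so report the size of the effect alongside the p-value.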
Summary: Evaluation Do's and Don'ts
DO's
• Keep questions clear, precise, and relatively short.
• Use a balanced response scale
– (4-6 scale points recommended)
• Use open-ended questions.
DON'Ts
• Do not use double-barreled questions.
• Do not use double-negative questions.
• Do not use loaded or leading questions.
Applying Workshop 'Learnings' - Exercise 3
• Part A.
– Take a look at the evaluation you brought to this session.
• Identify questions and response scales which may have unintended bias.
• Modify questions or response scales to eliminate bias, and share if time permits.
• Part B.
– Review (or set up a review process/panel to review) your program's or institution's evaluations with the goal of eliminating any unintentional bias.
Ready to Play the Game - and Get Better Results!
Questions
Feel free to contact us:
Ann Dohn, MA – adohn1@stanford.edu
Nancy Piro, PhD – npiro@stanford.edu