The Evidence-Based Practitioner: Applying Research to Meet Client Needs

4366_FM_i-xxii.indd i 27/10/16 2:13 pm

Catana Brown, PhD, OTR/L, FAOTA
Midwestern University
Department of Occupational Therapy
Glendale, Arizona

F. A. Davis Company
1915 Arch Street
Philadelphia, PA 19103
www.fadavis.com

Copyright © 2017 by F. A. Davis Company. All rights reserved. This product is protected by copyright. No part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher.

Printed in the United States of America
Last digit indicates print number: 10 9 8 7 6 5 4 3 2 1

Senior Acquisitions Editor: Christa A. Fratantoro
Director of Content Development: George W. Lang
Developmental Editor: Nancy J. Peterson
Content Project Manager: Julie Chase
Art and Design Manager: Carolyn O’Brien

As new scientific information becomes available through basic and clinical research, recommended treatments and drug therapies undergo changes. The author(s) and publisher have done everything possible to make this book accurate, up to date, and in accord with accepted standards at the time of publication. The author(s), editors, and publisher are not responsible for errors or omissions or for consequences from application of the book, and make no warranty, expressed or implied, in regard to the contents of the book. Any practice described in this book should be applied by the reader in accordance with professional standards of care used in regard to the unique circumstances that may apply in each situation. 
The reader is advised always to check product information (package inserts) for changes and new information regarding dose and contraindications before administering any drug. Caution is especially urged when using new or infrequently ordered drugs.

Library of Congress Cataloging-in-Publication Data
Names: Brown, Catana, author.
Title: The evidence-based practitioner : applying research to meet client needs / Catana Brown.
Description: Philadelphia : F.A. Davis Company, [2017] | Includes bibliographical references and index.
Identifiers: LCCN 2016046032 | ISBN 9780803643666 (pbk. : alk. paper)
Subjects: | MESH: Occupational Therapy | Evidence-Based Practice | Physical Therapy Modalities | Speech Therapy | Language Therapy | Problems and Exercises
Classification: LCC RM735.3 | NLM WB 18.2 | DDC 615.8/515—dc23
LC record available at https://lccn.loc.gov/2016046032

Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by F. A. Davis Company for users registered with the Copyright Clearance Center (CCC) Transactional Reporting Service, provided that the fee of $.25 per copy is paid directly to CCC, 222 Rosewood Drive, Danvers, MA 01923. For those organizations that have been granted a photocopy license by CCC, a separate system of payment has been arranged. The fee code for users of the Transactional Reporting Service is: 9780803643666/17 0 + $.25.

For Lauren—
You’re an astonishing teacher, despite the fact that you won’t read this book. But then I probably won’t read your opera book, either.
CB

Foreword

The introduction of evidence-based medicine by David Sackett and other researchers in the 1990s (Sackett, 1997) initiated a radical shift in the approach to instruction in research methods and the application of research findings to health-care practice. 
Until then, practitioners learned about research through standard academic research methods courses in which they were taught to read and critique journal articles using the well-established criteria of reliability and validity. They were then expected to use those skills to “keep up” with the research literature relevant to their area of practice and apply the results to patient care. Unfortunately, for the most part, they didn’t. Sackett and his colleagues determined that the traditional approach to applying research to practice was ineffective, and they proposed a radically different approach—what we now recognize as evidence-based practice. What was so different? Sackett and colleagues recognized that research was relevant and useful to the practitioner only to the extent that it addressed a clinical question of importance to practice and provided a useful guide to clinical decision-making. From this perspective, reading journal articles just to “keep current” and without a particular question in mind was unfocused and unproductive. The alternative method they proposed taught practitioners to use research evidence as one of three integral components of clinical reasoning and decision-making. This method is reflected in the now-familiar definition of evidence-based practice: integration of the clinician’s expertise and the best available scientific evidence with the client’s preferences and values to determine an appropriate course of action in a clinical encounter. To support the use of evidence-based practice as an integral part of clinical reasoning, a different method of instruction was developed, which is exemplified in The Evidence-Based Practitioner: Applying Research to Meet Client Needs. Evidence-based practice (EBP) is a process to be learned, not a content area to be mastered the way we learn to identify the bones of the body or the details of a particular assessment. 
Although it does require learning about research methods and design, measurement, and statistics, this knowledge is mastered in the context of appraising evidence in relation to a particular clinical question regarding a particular clinical scenario. The EBP process involves a specific set of steps to formulate an answerable question, and then to search, select, appraise, and apply the evidence to answer the clinical decision at hand. Ideally, students will have multiple opportunities to practice these steps so that ultimately the process can be initiated and carried out smoothly and efficiently in occupational therapy practice. One of the valuable features of this text is that it is designed to be used with team-based learning. This approach supports another important element of the original recommendations of Sackett (1997) and others for how to conduct EBP: the importance of distributing the work and learning from one another’s insights. Team-based learning models a method that can be carried forward into the “real world” to continue to implement EBP in practice. Here’s what this can look like: Each of five practitioners in a group prepares and shares an appraisal of one key study that addresses a clinical question of importance to the group. In less than an hour of discussion, the group synthesizes the findings and reaches a decision on the best answer (known as the “clinical bottom line” in EBP) to the clinical question at hand. One busy practitioner working alone might find that amount of work daunting. In addition, he or she would miss the crucial insights that other group participants provide. There’s another important advantage to team-based EBP: it’s much more fun. Group members energize one another, and examining the evidence becomes an interesting exploration and lively discussion of how best to balance strengths and limitations, clinical relevance and feasibility, and similarities and differences in the evidence. 
The outcome of that lively discussion will help ensure that your clinical decisions are guided by the best evidence available to help your clients.

In The Evidence-Based Practitioner: Applying Research to Meet Client Needs, Catana Brown provides occupational therapy, physical therapy, and speech-language pathology students with a clear and concise overview of research designs, methodology, use of statistical analysis, and levels of evidence, as well as the tools with which to evaluate and apply evidence. Interesting and engaging features such as From the Evidence lead readers through the steps to becoming effective consumers of evidence. Exercises and Critical Thinking Questions motivate learners to explore how this knowledge can be applied to their clinical practice.

I hope that you will approach learning EBP as a great adventure and that you and your fellow students make exciting discoveries.

Wendy Coster, PhD, OTR/L, FAOTA
Professor and Chair, Department of Occupational Therapy
Director, Behavior and Health Program
Boston University
Boston, Massachusetts, USA

Sackett, D. L. (1997). Evidence-based medicine: How to practice and teach EBM. New York/Edinburgh: Churchill Livingstone.

Preface

Evidence-based practice is no longer a new idea: it’s a mandate from third-party payers, accrediting bodies, health-care institutions, and clients. Although the majority of therapists will become practitioners and consumers of research rather than academic researchers, good consumers of research must still understand how a study is put together and how to analyze the results. Occupational therapists, physical therapists, and speech-language pathologists are expected to use evidence when discussing intervention options with clients and their families, and when making clinical decisions. 
The skills required to be an effective evidence-based practitioner are complex; for many therapists, finding and reviewing research is a daunting or tedious endeavor. In addition, evidence-based practice is still new enough that many working therapists were not trained in its methods, and some work settings have not yet adopted a culture of evidence-based practice that provides sufficient resources.

GUIDING PRINCIPLE: CONSUMING VS. CONDUCTING RESEARCH

The Evidence-Based Practitioner: Applying Research to Meet Client Needs is designed for entry-level graduate students in occupational therapy, physical therapy, and speech-language pathology, particularly those in courses that focus on evidence-based practice rather than the performance of research. Its emphasis is on providing therapists with the knowledge and tools necessary to access evidence, critique its strength and applicability, and use evidence from all sources (i.e., research, the client, and clinical experience) to make well-informed clinical decisions. This textbook was designed with multiple features that allow students and practitioners not only to acquire knowledge about evidence-based practice, but also to begin to apply that knowledge in the real world. Numerous examples and excerpts of published journal articles from occupational therapy, physical therapy, and speech-language pathology are used throughout the text. In addition to learning about evidence-based practice, students are exposed to research in their own disciplines and the disciplines of their future team members. The text contains 11 chapters and is intended to fit within a single entry-level course in a health-care program. It fits ideally into programs offering a course on evidence-based practice, and can also be used to supplement a traditional research methods text in research courses that cover evidence-based practice. 
The content of the initial chapters focuses on explaining basic research concepts, including qualitative and quantitative approaches. A separate chapter on statistics is included in this introductory material. Subsequent chapters explain the different designs used in health-care research, with separate chapters for each of the following types of research: intervention, assessment, descriptive/predictive, and qualitative, as well as a chapter on systematic reviews. These chapters prepare students to match their own evidence-based questions with the correct type of research. In addition, students will acquire the knowledge and skills necessary to understand research articles, including those aspects of a research article that can be particularly befuddling: statistics, tables, and graphs. Importantly, the chapters provide students with an understanding of how to evaluate the quality of research studies. The text ends with a chapter on integrating evidence from multiple sources, which highlights the importance of involving clients and families in the decision-making process by sharing the evidence.

A TEAM-BASED LEARNING WORKTEXT

This text uses a unique team-based learning (TBL) approach. TBL is a specific instructional strategy that facilitates the type of learning that helps students solve problems. It requires active involvement of the student in the learning process from the outset. Ideally, students work in small teams, using methods that enhance accountability for both individual and team work; this can result in a deeper level of understanding that is more relevant to real-life practice. Nevertheless, this textbook is useful for all types of instructional strategies and is relevant even in courses that do not use a TBL format. TBL provides the pedagogy for applying information, and therefore one strength of this text is its emphasis on application. 
To facilitate application, the text is presented as a worktext that interweaves narrative with exercises, critical thinking questions, and other means of engaging students and helping them comprehend the information. When appropriate, answers to these questions are included at the end of the chapter. An advantage of the worktext approach is that it gets students engaged with the material from the beginning. In courses that use a TBL format, the worktext prepares students to be effective team members.

TERMINOLOGY

A challenging aspect of evidence-based practice for students and instructors alike is terminology. In fact, this was one of the greatest challenges for the author of this text. Several issues with terminology can make deciphering the research literature perplexing. For example:
• Different terms are used to describe the same or similar concepts.
• There are disagreements among experts as to the proper use of some terms.
• Terms are used incorrectly, even in peer-reviewed articles.
• Labels and terms are sometimes omitted from research articles.
Because deciphering research terminology is challenging, a significant effort was directed toward using the most common terms that are likely to appear in the literature. When multiple terms are routinely used, this is explained in the text. For example, what some call a nonrandomized controlled trial may be described by others as a quasi-experimental study. Because of these terminology challenges, students need to read actual articles and excerpts of articles during the learning process so that terminology issues become transparent. When students have a more thorough understanding of a concept and the terms involved, they can better interpret the idiosyncrasies of individual articles. 
Fortunately, many journals are creating standard formats for reporting research, and with time some terminology issues will be resolved, although differences in opinion and among disciplines (e.g., school-based practice vs. medicine) will likely continue to exist.

SPECIAL FEATURES

The special features developed for this text will enable students to better understand content, develop the advanced skills needed for assessing the strength and applicability of evidence, and apply the material to practice. The Evidence-Based Practitioner: Applying Research to Meet Client Needs includes several special features.

Key Terms
An alphabetical list of key terms appears at the beginning of each chapter. These terms are also bolded where they are first described in the chapter and fully defined in the end-of-book glossary.

From the Evidence
Students often have trouble applying research concepts to reading a research article. This key feature helps students make the link by providing real-world examples from research articles in occupational therapy, physical therapy, and speech-language pathology. From the Evidence visually walks the student through graphic examples such as abstracts, tables, and figures to illustrate key concepts explained in the chapter. Arrows and text boxes are used to point out and elucidate the concept of interest. From the Evidence features are included in each chapter. Each has at least one corresponding question to ensure that the student fully understands the material. Answers to these questions are provided at the end of each chapter.

Exercises
Exercises are distributed throughout the chapters to help students learn to apply information in context. In TBL courses, the exercises are intended to prepare students for the in-class team assignments; similarly, in flipped classrooms, students would complete the exercises at home and arrive at class prepared for discussions and activities. 
Each exercise is tied directly to a Learning Outcome and includes questions requiring students to apply the knowledge acquired in the chapter. There is space in the text for the student to complete the exercise, and the answers are provided at the end of the chapter.

Understanding Statistics
After Chapter 4, “Understanding Statistics: What They Tell You and How to Apply Them in Practice,” the Understanding Statistics feature is included in chapters in which specific statistical procedures are described. Understanding Statistics boxes provide an example of a statistic with additional explanation to reinforce information that is typically challenging. The feature also helps put the information in context for students by associating the statistic with a specific research design.

Evidence in the Real World
The Evidence in the Real World feature uses a storytelling or case scenario approach to demonstrate how theoretical research concepts apply to real-life practice. It serves as another method of demystifying research concepts—such as how the concept of standard deviations can be used to understand the autism spectrum—and showing students the relevance and practical application of what they are learning.

Critical Thinking Questions
Each chapter ends with Critical Thinking Questions. These questions require higher-level thinking and serve as prompts for students to evaluate their comprehension of the chapter concepts.

CLOSING THOUGHTS

In today’s health-care environment, occupational therapists, physical therapists, and speech-language pathologists must be proficient in accessing, critiquing, and applying research in order to be effective evidence-based practitioners. With solid foundational information and engaging application exercises, this text provides the framework for developing the evidence-based practice skills that allow practitioners to best meet their clients’ needs. 
Acknowledgment

Although it is now widely valued, evidence-based practice is not the favorite topic of most rehabilitation therapy students. When I began this process, I knew that I wanted a very different sort of textbook that would require students to actively engage with the material; hence, the use of a team-based learning format. However, doing something different required a lot of help along the way. First, I would like to acknowledge the fantastic editorial support provided by F.A. Davis. In particular, I would like to thank Christa Fratantoro, the acquisitions editor, who grasped my vision for a new evidence-based textbook and believed in my ability to pull it off. I appreciate her friendship and backing. Nancy Peterson, developmental editor extraordinaire, was with me through every step of the process. All the things that are good about this text are better because of Nancy. In addition, Nancy is my sounding board, my counselor, motivator, and guide. I owe a debt of gratitude to the occupational therapy and physical therapy students at Midwestern University–Glendale in Arizona, who used different variations of the rough drafts of the text and provided invaluable feedback, resulting in the addition, clarification, and improvement of the content. I would especially like to thank Morgan Lloyd, who helped me with some of the content that was the most difficult to explain. Larry Michaelsen, who developed the team-based learning approach, inspired me to try a new way of teaching, which ultimately led to my insight that a new type of textbook was needed. Furthermore, I would like to thank Bill Roberson and Larry Michaelsen for contributing a marvelous team-based learning primer as part of the instructor resources. Finally, a big thanks to those who offered support, both professional and personal, providing me with the time, space, and encouragement to make this text a reality. 
This includes my chair, Chris Merchant; my husband, Alan Berman; and my friend, Bob Gravel.

Catana Brown, PhD, OTR/L, FAOTA

Reviewers

Evelyn Andersson, PhD, OTR/L
Associate Professor
School of Occupational Therapy
Midwestern University
Glendale, AZ

Sharon Gutman, PhD, OTR, FAOTA
Associate Professor
Programs in Occupational Therapy
Columbia University
New York, NY

Suzanne R. Brown, PhD, MPH, PT
Educational Consultant
Mesa, AZ

Elisabeth L. Koch, MOT, OTR/L
Faculty and Clinical Coordinator
Occupational Therapy Assistant Program
Metropolitan Community College of Kansas City–Penn Valley, Health Science Institute
Kansas City, MO

April Catherine Cowan, OTR, OTD, CHT
Assistant Professor
Occupational Therapy
The University of Texas Medical Branch
Galveston, TX

Denise K. Donica, DHS, OTR/L, BCP
Associate Professor, Graduate Program Director
Occupational Therapy
East Carolina University
Greenville, NC

Marc E. Fey, PhD, CCC-SLP
Professor
Department of Hearing and Speech
University of Kansas Medical Center
Kansas City, KS

Teresa Plummer, PhD, OTR/L, CAPS, ATP
Assistant Professor
School of Occupational Therapy
Belmont University
Nashville, TN

Patricia J. Scott, PhD, MPH, OT, FAOTA
Associate Professor
Occupational Therapy
Indiana University
Indianapolis, IN

Thomas F. Fisher, PhD, OTR, CCM, FAOTA
Professor and Chair
Occupational Therapy
Indiana University
Indianapolis, IN

Contents in Brief

Chapter 1 Evidence-Based Practice: Why Do Practitioners Need to Understand Research? 1
Chapter 2 Finding and Reading Evidence: The First Steps in Evidence-Based Practice 21
Chapter 3 Research Methods and Variables: Creating a Foundation for Evaluating Research 39
Chapter 4 Understanding Statistics: What They Tell You and How to Apply Them in Practice 59
Chapter 5 Validity: What Makes a Study Strong? 81
Chapter 6 Choosing Interventions for Practice: Designs to Answer Efficacy Questions 103
Chapter 7 Using the Evidence to Evaluate Measurement Studies and Select Appropriate Tests 127
Chapter 8 Descriptive and Predictive Research Designs: Understanding Conditions and Making Clinical Predictions 145
Chapter 9 Qualitative Designs and Methods: Exploring the Lived Experience 163
Chapter 10 Tools for Practitioners That Synthesize the Results of Multiple Studies: Systematic Reviews and Practice Guidelines 183
Chapter 11 Integrating Evidence From Multiple Sources: Involving Clients and Families in Decision-Making 203
Glossary 217
Index 225

Contents

Chapter 1 Evidence-Based Practice: Why Do Practitioners Need to Understand Research? 1
INTRODUCTION 2
WHAT IS EVIDENCE-BASED PRACTICE? 2
  External Scientific Evidence 3
  Practitioner Experience 3
  Client Situation and Values 5
WHY EVIDENCE-BASED PRACTICE? 6
THE PROCESS OF EVIDENCE-BASED PRACTICE 7
  Formulate a Question Based on a Clinical Problem 7
  Identify the Relevant Evidence 7
  Evaluate the Evidence 7
  Implement Useful Findings 8
  Evaluate the Outcomes 8
WRITING AN EVIDENCE-BASED QUESTION 9
  Questions on Efficacy of an Intervention 9
  Research Designs for Efficacy Questions and Levels of Evidence 10
  Questions for Usefulness of an Assessment 13
  Research Designs Used in Assessment Studies 13
  Questions for Description of a Condition 14
  Research Designs Used in Descriptive Studies 14
  Questions for Prediction of an Outcome 14
  Research Designs Used in Predictive Studies 14
  Questions About the Client’s Lived Experience 15
  Research Designs Addressing the Client’s Lived Experience 16
CRITICAL THINKING QUESTIONS 16
ANSWERS 17
REFERENCES 18

Chapter 2 Finding and Reading Evidence: The First Steps in Evidence-Based Practice 21
INTRODUCTION 22
IDENTIFYING DATABASES 22
  PubMed 24
  Cumulative Index of Nursing and Allied Health Literature 25
  Cochrane Database of Systematic Reviews 25
EMPLOYING SEARCH STRATEGIES 25
  Selecting Key Words and Search Terms 26
  Combining Terms and Using Advanced Search 26
  Using Limits and Filters 27
  Expanding Your Search 29
ACCESSING THE EVIDENCE 29
  The Research Librarian 30
  Professional Organizations 31
DETERMINING THE CREDIBILITY OF A SOURCE OF EVIDENCE 31
  Websites 32
  The Public Press/News Media 32
  Scholarly Publications 33
  Impact Factor 33
  The Peer-Review Process 33
  Research Funding Bias 34
  Publication Bias 34
  Duplicate Publication 34
READING A RESEARCH ARTICLE 35
  Title 35
  Authorship 35
  Abstract 35
  Introduction 35
  Methods 35
  Results 36
  Discussion 37
  References 37
  Acknowledgments 37
CRITICAL THINKING QUESTIONS 37
ANSWERS 38
REFERENCES 38

Chapter 3 Research Methods and Variables: Creating a Foundation for Evaluating Research 39
INTRODUCTION 40
TYPES OF RESEARCH 40
  Experimental Research 40
  Nonexperimental Research 41
  Quantitative Research 43
  Qualitative Research 46
  Cross-Sectional and Longitudinal Research 47
  Basic and Applied Research 48
HYPOTHESIS TESTING: TYPE I AND TYPE II ERRORS 52
VARIABLES 52
  Independent Variables 52
  Dependent Variables 53
  Control Variables 53
  Extraneous Variables 53
CRITICAL THINKING QUESTIONS 55
ANSWERS 56
REFERENCES 57

Chapter 4 Understanding Statistics: What They Tell You and How to Apply Them in Practice 59
INTRODUCTION 60
SYMBOLS USED WITH STATISTICS 60
DESCRIPTIVE STATISTICS 60
  Frequencies and Frequency Distributions 60
  Measure of Central Tendency 61
  Measures of Variability 62
INFERENTIAL STATISTICS 65
  Statistical Significance 66
  Inferential Statistics to Analyze Differences 66
    The t-test 66
    Analysis of Variance 66
    Analysis of Covariance 69
  Inferential Statistics for Analyzing Relationships 72
    Scatterplots for Graphing Relationships 72
    Relationships Between Two Variables 73
    Relationship Analyses With One Outcome and Multiple Predictors 74
    Logistic Regression and Odds Ratio 74
EFFECT SIZE AND CONFIDENCE INTERVALS 76
CRITICAL THINKING QUESTIONS 78
ANSWERS 79
REFERENCES 80

Chapter 5 Validity: What Makes a Study Strong? 81
INTRODUCTION 82
VALIDITY 82
STATISTICAL CONCLUSION VALIDITY 82
  Threats to Statistical Conclusion Validity 82
    Fishing 83
    Low Power 83
INTERNAL VALIDITY 85
  Threats to Internal Validity 85
    Assignment and Selection Threats 85
    Maturation Threats 88
    History Threats 89
    Regression to the Mean Threats 90
    Testing Threats 90
    Instrumental Threats 91
    Experimenter and Participant Bias Threats 91
    Attrition/Mortality Threats 93
EXTERNAL VALIDITY 95
  Threats to External Validity 95
    Sampling Error 96
    Ecological Validity Threats 96
INTERNAL VERSUS EXTERNAL VALIDITY 97
CRITICAL THINKING QUESTIONS 100
ANSWERS 100
REFERENCES 102

Chapter 6 Choosing Interventions for Practice: Designs to Answer Efficacy Questions 103
INTRODUCTION 104
RESEARCH DESIGN NOTATION 104
BETWEEN- AND WITHIN-GROUP COMPARISONS 105
RESEARCH DESIGNS FOR ANSWERING EFFICACY QUESTIONS 107
  Designs Without a Control Group 108
  Randomized Controlled Trials 108
  Crossover Designs 110
  Nonrandomized Controlled Trials 110
  Factorial Designs 114
  Single-Subject Designs 117
  Retrospective Intervention Studies 117
SAMPLE SIZE AND INTERVENTION RESEARCH 120
USING A SCALE TO EVALUATE THE STRENGTH OF A STUDY 120
COST EFFECTIVENESS AS AN OUTCOME 122
CRITICAL THINKING QUESTIONS 122
ANSWERS 123
REFERENCES 125

Chapter 7 Using the Evidence to Evaluate Measurement Studies and Select Appropriate Tests 127
INTRODUCTION 128
TYPES OF SCORING AND MEASURES 128
  Continuous Versus Discrete Data 128
  Norm-Referenced Versus Criterion-Referenced Measures 129
    Norm-Referenced Measures 129
    Criterion-Referenced Measures 130
TEST RELIABILITY 131
  Standardized Tests 131
  Test-Retest Reliability 132
  Inter-Rater Reliability 132
  Internal Consistency 133
TEST VALIDITY 134
  Construct Validity 135
  Sensitivity and Specificity 136
  Relationship Between Reliability and Validity 138
RESPONSIVENESS 138
CRITICAL THINKING QUESTIONS 141
ANSWERS 141
REFERENCES 142

Chapter 8 Descriptive and Predictive Research Designs: Understanding Conditions and Making Clinical Predictions 145
INTRODUCTION 146
DESCRIPTIVE RESEARCH FOR UNDERSTANDING CONDITIONS AND POPULATIONS 146
  Incidence and Prevalence Studies 146
  Group Comparison Studies 147
  Survey Research 149
STUDY DESIGNS TO PREDICT AN OUTCOME 150
  Predictive Studies Using Correlational Methods 150
    Simple Prediction Between Two Variables 150
    Multiple Predictors for a Single Outcome 151
  Predictive Studies Using Group Comparison Methods 155
    Case-Control Studies 155
    Cohort Studies 155
EVALUATING DESCRIPTIVE AND PREDICTIVE STUDIES 157
LEVELS OF EVIDENCE FOR PROGNOSTIC STUDIES 158
CRITICAL THINKING QUESTIONS 159
ANSWERS 160
REFERENCES 161

Chapter 9 Qualitative Designs and Methods: Exploring the Lived Experience 163
INTRODUCTION 164
THE PHILOSOPHY AND PROCESS OF QUALITATIVE RESEARCH 164
  Philosophy 164
  Research Questions 165
  Selection of Participants and Settings 165
  Methods of Data Collection 166
  Data Analysis 167
QUALITATIVE RESEARCH DESIGNS 168
  Phenomenology 168
  Grounded Theory 170
  Ethnography 171
  Narrative 173
  Mixed-Method Research 175
PROPERTIES OF STRONG QUALITATIVE STUDIES 176
  Credibility 177
  Transferability 177
  Dependability 178
  Confirmability 178
CRITICAL THINKING QUESTIONS 179
ANSWERS 180
REFERENCES 180

Chapter 10 Tools for Practitioners That Synthesize the Results of Multiple Studies: Systematic Reviews and Practice Guidelines 183
INTRODUCTION 184
SYSTEMATIC REVIEWS 184
  Finding Systematic Reviews 184
  Reading Systematic Reviews 185
  Evaluating the Strength of Systematic Reviews 186
    Replication 186
    Publication Bias 186
    Heterogeneity 189
DATA ANALYSIS IN SYSTEMATIC REVIEWS 190
  Meta-Analyses 190
  Qualitative Thematic Synthesis 193
PRACTICE GUIDELINES 195
  Finding Practice Guidelines 197
  Evaluating the Strength of Practice Guidelines 198
THE COMPLEXITIES OF APPLYING AND USING SYSTEMATIC REVIEWS AND PRACTICE GUIDELINES 199
CRITICAL THINKING QUESTIONS 199
ANSWERS 200
REFERENCES 201

Chapter 11 Integrating Evidence From Multiple Sources: Involving Clients and Families in Decision-Making 203
INTRODUCTION 204
CLIENT-CENTERED PRACTICE 204
SHARED DECISION-MAKING 204
EDUCATION AND COMMUNICATION 206
  Components of the Process 208
  People Involved 208
  Engaging the Client in the Process 208
  Consensus Building 208
  Agreement 210
  Decision Aids 210
  Content 210
  Resources for Shared Decision-Making 210
CRITICAL THINKING QUESTIONS 213
ANSWERS 214
REFERENCES 215

Glossary 217
Index 225

“Facts are stubborn things; and whatever may be our wishes, our inclinations, or the dictates of our passions, they cannot alter the state of facts and evidence.”
—John Adams, second President of the United States

1
Evidence-Based Practice
Why Do Practitioners Need to Understand Research?

CHAPTER OUTLINE
INTRODUCTION
WHAT IS EVIDENCE-BASED PRACTICE?
  External Scientific Evidence
  Practitioner Experience
  Client Situation and Values
WHY EVIDENCE-BASED PRACTICE?
THE PROCESS OF EVIDENCE-BASED PRACTICE
  Formulate a Question Based on a Clinical Problem
  Identify the Relevant Evidence
  Evaluate the Evidence
  Implement Useful Findings
  Evaluate the Outcomes
WRITING AN EVIDENCE-BASED QUESTION
  Questions on Efficacy of an Intervention
  Research Designs for Efficacy Questions and Levels of Evidence
  Questions for Usefulness of an Assessment
  Research Designs Used in Assessment Studies
  Questions for Description of a Condition
  Research Designs Used in Descriptive Studies
  Questions for Prediction of an Outcome
  Research Designs Used in Predictive Studies
  Questions About the Client’s Lived Experience
  Research Designs Addressing the Client’s Lived Experience
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. 
Identify the three sources of evidence, including what each source contributes to evidence-based decision-making.

2. Apply an evidence-based practice hierarchy to determine the level of evidence of a particular research study.

3. Describe the different types of research questions and the clinical information that each type of question elicits for therapists.

KEY TERMS
client-centered practice
control
critically appraised paper
cross-sectional research
evidence-based practice
incidence
internal validity
levels of evidence
longitudinal research
PICO
prevalence
random assignment
randomized controlled trial
reflective practitioner
reliability
replication
scientific method
sensitivity
shared decision-making
specificity
systematic review
validity

INTRODUCTION

"How much water should you drink every day?" Most of us have heard, read, or even adhered to the recommendation that adults should drink at least eight 8-ounce glasses of water each day (abbreviated as "8 × 8"), with caffeinated beverages not counting toward the total. Is this widely accepted recommendation based on scientific evidence? Heinz Valtin (2002) examined the research, consulted with specialists in the field, and found no evidence to support the 8 × 8 advice. In fact, studies suggested that such large amounts of water are not needed by healthy, sedentary adults and revealed that caffeinated drinks are indeed useful for hydration. The 8 × 8 recommendation is an example of practice that is not supported by research, or "evidence." Such practices even creep into our professions. No doubt there are practices that rehabilitation professionals have adopted and accepted as fact that, although not as well known as the 8 × 8 adage, are just as ingrained in practice despite not being supported by evidence.
Let's look at an example: For decades, the recommended treatment for acute low back pain was bedrest, typically for 2 days with no movement other than toileting and eating. A Finnish study examined this recommendation in a well-designed randomized controlled trial that compared 2 days of bedrest with back extension exercises and with ordinary activity (Malmivaara et al, 1995). The study found the best results with ordinary activity. Subsequent research confirmed this finding, or at least found that staying active was as effective as bedrest for treating low back pain and had the obvious advantage of less disruption of daily life (Dahm, Brurberg, Jamtvedt, & Hagen, 2010). Without the research evidence, the recommendation for bedrest would have been difficult to challenge; bedrest did eventually ameliorate low back pain, so clinical and client experience suggested a positive outcome. Only through testing of alternatives was the accepted standard challenged. Questioning what we do every day as health-care practitioners, and making clinical decisions grounded in science, is what evidence-based practice (EBP) is all about. However, the use of scientific evidence has limits; clinical decisions are made within the context of a clinician's experience and an individual client's situation. No profession will ever have enough relevant studies with adequate reliability and validity to answer all practice questions. However, the process of science is a powerful, self-correcting resource. With the accumulation of research, clinicians can continually update their practice knowledge and make better clinical decisions so that clients are more likely to achieve positive results. Evidence-based practitioners are reflective and able to articulate what is being done and why. In evidence-based practice, decisions are not based on hunches, "the way it has always been done," or what is easiest or most expedient.
Rather, in evidence-based practice, the therapist's clinical decisions and instructions can be explained, along with their rationale; evidence-based practice is explicit by nature. This chapter provides an introduction to evidence-based practice. Topics such as sources of evidence, the research process, and levels of evidence are discussed so that the reader can understand the larger context in which evidence-based practice takes place. These topics are then explored in greater detail in subsequent chapters. This chapter focuses on the what, why, and how of evidence-based practice: What is evidence-based practice? Why is evidence-based practice a "best practice"? How do practitioners integrate evidence into their practice?

WHAT IS EVIDENCE-BASED PRACTICE?

Evidence-based practice in rehabilitation stems from evidence-based medicine. David Sackett, a pioneer of evidence-based medicine, and his colleagues provided the following widely cited definition: "Evidence based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients" (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996, p. 71). Evidence-based practice requires an active exchange between researchers and clinicians (Thomas, Saroyan, & Dauphinee, 2011). Researchers produce findings with clinical relevance and disseminate those findings through presentations and publications. Clinicians then use this information in the context of their practice experience. The researchers' findings may be consistent or inconsistent with the clinician's experience and understanding of a particular practice question. Reflective practitioners capitalize on the tension that exists between research findings and clinical experience to expand their knowledge.
In addition, from a client-centered practice perspective, the values and experience of the client, caregivers, and family are essential considerations in the decision-making process. Thus, evidence-based practice is a multifaceted endeavor comprising three components (Fig. 1-1):

1. The external scientific evidence
2. The practitioner's experience
3. The client/family situation and values

Each source of evidence provides information for clinical decision-making. The best decisions occur when all three sources are considered.

External Scientific Evidence

External scientific evidence is the component of evidence-based practice that arises from research. Typically, therapists obtain scientific evidence from research articles published in scientific journals. The scientific evidence provides clinicians with objective information that can be applied to clinical problems. In developing this source of evidence, researchers use the scientific method to attempt to remove bias by designing a well-controlled study, objectively collecting data, and using sound statistical analysis to answer research questions. The steps of the scientific method include:

1. Asking a question
2. Gathering information about that question
3. Formulating a hypothesis
4. Testing the hypothesis
5. Examining and reporting the evidence

From the Evidence 1-1 provides an example of the scientific method from a critically appraised paper. A critically appraised paper selects a published research study, critiques the study, and interprets the results for practitioners. In this paper, Dean (2012) summarizes a study by van de Port, Wevers, Lindeman, and Kwakkel (2012) to answer a question regarding the efficacy of a group circuit training intervention compared with individualized physical therapy for improving mobility after stroke.
Of particular interest is the efficiency of using a group versus an individual approach. As a clinician, you might have a question about group versus individual treatment, and this external scientific evidence can help you answer your clinical question. The hypothesis, although not explicitly stated, is implied: the group circuit training will be as effective as individual physical therapy. The hypothesis is tested by comparing two groups (circuit training and individual physical therapy) and measuring the outcomes using several assessments. Although slight differences existed between the two groups at some time points, the overall conclusion was that the group circuit training was as effective as individual physical therapy. Although this example provides evidence from a single study that uses a strong randomized controlled trial design (described in Chapter 6), all studies have limitations. The results of a single study should never be accepted as proof that a particular intervention is effective. Science is speculative, and findings from research are not final. This concept speaks to another important characteristic of the scientific method: replication. The scientific method is a gradual process that is based on the accumulation of results. When multiple studies produce similar findings, as a practitioner you can have more confidence that the results are accurate or true. Later in this chapter, the hierarchical levels of scientific evidence and the limitations of this approach are presented. Subsequent chapters describe in greater detail the research process that is followed to create evidence to address different types of research questions.

FIGURE 1-1 Components of evidence-based practice (practitioner experience, external evidence, and the client's situation and values): the evidence-based decision is made in collaboration between practitioner and client. Each source of evidence provides information for clinical decision-making.

Practitioner Experience
Initial models of evidence-based practice were criticized for ignoring the important contribution of practitioner experience to the clinical decision-making process. From early on, Sackett and colleagues (1996) have argued

FROM THE EVIDENCE 1-1 Critically Appraised Paper

Dean, C. (2012). Group task-specific circuit training for patients discharged home after stroke may be as effective as individualized physiotherapy in improving mobility. Journal of Physiotherapy, 58(4), 269. doi:10.1016/S1836-9553(12)70129-7

Note A: The research question implies the hypothesis that is tested: Group circuit training is as effective as individualized PT for improving mobility after stroke.

Abstract

QUESTION: Does task-oriented circuit training improve mobility in patients with stroke compared with individualised physiotherapy?

DESIGN: Randomised, controlled trial with concealed allocation and blinded outcome assessment.

SETTING: Nine outpatient rehabilitation centres in the Netherlands.

PARTICIPANTS: Patients with a stroke who had been discharged home and who could walk 10 m without assistance were included. Cognitive deficits and inability to communicate were key exclusion criteria. Randomisation of 250 participants allocated 126 to task-oriented circuit training and 124 to individualised physiotherapy.

INTERVENTIONS: The task-oriented circuit training group trained for 90 min twice weekly for 12 weeks, supervised by physiotherapists and sports trainers, as they completed 8 mobility-related stations in groups of 2 to 8 participants. Individualised outpatient physiotherapy was designed to improve balance, physical conditioning, and walking.

OUTCOME MEASURES: The primary outcome was the mobility domain of the stroke impact scale measured at 12 weeks and 24 weeks. The domain includes 9 questions about a patient's perceived mobility competence and is scored from 0 to 100, with higher scores indicating better mobility.
Secondary outcome measures included other domains of the stroke impact scale, the Nottingham extended ADL scale, the falls efficacy scale, the hospital anxiety and depression scale, comfortable walking speed, 6-minute walk distance, and a stairs test.

RESULTS: 242 participants completed the study. There were no differences in the mobility domain of the stroke impact scale between the groups at 12 weeks (mean difference (MD) -0.05 units, 95% CI -1.4 to 1.3 units) or 24 weeks (MD -0.6, 95% CI -1.8 to 0.5). Comfortable walking speed (MD 0.09 m/s, 95% CI 0.04 to 0.13), 6-minute walk distance (MD 20 m, 95% CI 5.3 to 34.7), and stairs test (MD -1.6 s, 95% CI -2.9 to -0.3) improved a little more in the circuit training group than the control group at 12 weeks. The memory and thinking domain of the stroke impact scale (MD -1.6 units, 95% CI -3.0 to -0.2) and the leisure domain of the Nottingham extended ADL scale (MD -0.74, 95% CI -1.47 to -0.01) improved a little more in the control group than the circuit training group at 12 weeks. The groups did not differ significantly on the remaining secondary outcomes at 12 weeks or 24 weeks.

CONCLUSION: In patients with mild to moderate stroke who have been discharged home, task-oriented circuit training completed in small groups was as effective as individual physiotherapy in improving mobility and may be a more efficient way of delivering therapy.

Note B: Objective measurements provide data supporting the hypothesis that the group intervention is as effective as an individualized approach.

FTE 1-1 Question: How might this external scientific evidence influence your practice?

against "cookbook" approaches and submit that best practices integrate scientific evidence with clinical expertise. Practice knowledge is essential when the scientific evidence is insufficient for making clinical decisions and for translating research into real-world clinical settings (Palisano, 2010).
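The mean differences and 95% confidence intervals reported in From the Evidence 1-1 can be checked mechanically: a between-group difference is compatible with "no difference" whenever its 95% CI contains zero. The following minimal Python sketch applies that rule to several of the intervals transcribed from the abstract (the variable names are mine, not from the study):

```python
# Mean differences (MD) and 95% CIs transcribed from the Dean (2012) abstract.
# Each entry: outcome name -> (MD, CI lower bound, CI upper bound).
outcomes = {
    "mobility domain, 12 wk": (-0.05, -1.4, 1.3),
    "mobility domain, 24 wk": (-0.6, -1.8, 0.5),
    "comfortable walking speed (m/s)": (0.09, 0.04, 0.13),
    "stairs test (s)": (-1.6, -2.9, -0.3),
}

def ci_excludes_zero(low, high):
    """True when the whole interval lies on one side of zero,
    i.e., the data are not compatible with 'no difference'."""
    return low > 0 or high < 0

for name, (md, low, high) in outcomes.items():
    verdict = "difference" if ci_excludes_zero(low, high) else "no clear difference"
    print(f"{name}: MD {md}, 95% CI {low} to {high} -> {verdict}")
```

Running this shows why the reviewers concluded the approaches were equivalent on the primary outcome: both mobility-domain intervals straddle zero, while only some secondary outcomes (walking speed, stairs test) show a small difference favoring circuit training.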
Research will never keep up with clinical practice, nor can it answer all of the specific questions that therapists face every day, given the diversity of clients, the different settings in which therapists practice, and the pragmatic constraints of the real world. There may be studies indicating that a particular approach is effective, but it is much less common to find evidence related to frequency and intensity, or to how to apply an intervention to a complicated client with multiple comorbidities, as is typical in practice. Practitioners will always need to base many of their decisions on expertise gathered through professional education, interaction with colleagues and mentors, and accumulation of knowledge from their own experience and practice. Practitioner expertise is enriched by reflection. The reflective practitioner takes the time to consider what he or she has done, how it turned out, and how to make things even better. Again, it is reflection that makes knowledge more explicit and easier to communicate to others. Reflection becomes even more explicit and methodical when therapists use program evaluation methods. Collecting data on your own clients and evaluating their responses to treatment will provide important information for enhancing the overall effectiveness of your services. An example of integrating research evidence and practice experience is illustrated by Fox et al's work with the Lee Silverman Voice Treatment (LSVT) LOUD and BIG approach for individuals with Parkinson's disease (Fox, Ebersbach, Ramig, & Sapir, 2012). Preliminary evidence indicates that individuals receiving LSVT LOUD and BIG treatment are more likely than individuals in control conditions to increase their vocal loudness and frequency variability and to improve motor performance, including walking speed and coordination.
Yet, when working with an individual client who has Parkinson's disease, the practitioner still faces many questions: At what stage in the disease process is the intervention most effective? Is the intervention effective for clients who also experience depression or dementia? Is the intervention more or less effective for individuals receiving deep brain stimulation? The intervention is typically provided in sixteen 60-minute sessions over a 1-month period. Can a more or less intensive schedule be used for specific clients? What about the long-term effects of the treatment? Because Parkinson's disease is progressive, will there be maintenance issues? This is where clinical reasoning comes into play. You will use your practice experience to make decisions about whether or not to implement the approach with a particular client and how the intervention should be implemented. If you do implement LSVT LOUD and BIG, you will reflect on whether or not it is working and what modifications might be warranted.

Client Situation and Values

Interestingly, client-centered practice has moved to the forefront at the same time that evidence-based practice has been gaining traction. Client-centered practice emphasizes client choice and an appreciation for the client's expertise in his or her own life situation. A client's situation should always be considered in the treatment planning process. An intervention is unlikely to result in successful outcomes if a client cannot carry it out because of life circumstances. Some very intensive therapies may be ineffective if the individual does not have the financial resources, endurance, motivation, or social support necessary to carry them out. For example, a therapist should consider the issues involved in providing a single working mother of three with an intensive home program for her child with autism.
Client preferences and values also play an important part in the decision-making process. For example, an athlete who is eager to return to his or her sport and is already accustomed to intensive training is more likely to respond favorably to a home exercise program than a client who views exercise as painful and tedious. Rarely is there a single option in the treatment planning process. When shared decision-making occurs between clients and health-care providers, clients increase their knowledge, are more confident in the intervention they are receiving, and are more likely to adhere to the recommended therapy (Stacey et al, 2011). Shared decision-making is a collaborative process in which the clinician shares information from research and clinical experience, and the client shares information about personal values and experiences. Different options are presented, with the goal of arriving at an agreement regarding treatment. From a client-centered practice perspective, the client is the ultimate decision maker, and the professional is a trusted advisor and facilitator. The accessibility of Internet resources has increased clients’ involvement in the treatment planning process. Today, clients are more likely to come to the practitioner with their own evidence-based searches (Ben-Sasson, 2011). The practitioner can help the client understand and interpret the evidence in light of the client’s own situation. In addition, the practitioner who is well versed in the research literature may be able to supplement the client’s search with further evidence on the topic of interest and help evaluate the sources the client has already located. Chapter 11 provides additional information about the process of integrating practitioner experience, client values, and research evidence. 
EVIDENCE IN THE REAL WORLD

Client Inclusion in Decision-Making

The following personal example illustrates the inclusion (or lack of inclusion) of the client in the decision-making process. When my daughter was in elementary school, she broke her arm in the early part of the summer. Our typical summer activity of spending time at the pool was disrupted. We understood that the cast would be removed at our scheduled follow-up visit, so we showed up at the appointment wearing our bathing suits underneath our clothes, ready to hit the pool as soon as the appointment was over. However, after the cast was removed and an x-ray was taken, the orthopedic surgeon explained to us that, although the bone was healing well, a small line where the break had occurred indicated that the bone was vulnerable to refracturing if my daughter were to fall again. The orthopedic surgeon was ready to replace the cast and told us that my daughter should wear it for several more weeks to strengthen the bone. I have every confidence that this decision was based on the research evidence and the orthopedic surgeon's practice experience. He was interested in keeping my daughter from reinjuring herself. However, his recommendation was not consistent with our values at that time. I requested that, instead of a cast, my daughter be given a splint that she would wear when she was not swimming. I explained that we understood the risk but were willing to take it after weighing the pros and cons of our personal situation. The orthopedic surgeon complied with the request, yet made it clear that he thought we were making the wrong decision and noted this in his progress note on our visit. As a health-care practitioner, it is easy to appreciate the orthopedic surgeon's dilemma. Furthermore, it is natural to want our expertise to be valued.
In this case, the orthopedic surgeon may have felt that his expertise was being discounted, but the family situation and my opinion as a parent were important as well. If the health-care professional (in this case the orthopedic surgeon) had approached the situation from a shared decision-making perspective, the values of the child and parent would have been determined and considered from the beginning. Best practice occurs when the decision-making process is collaborative from the outset and the perspectives of all parties are appreciated.

EXERCISE 1-1 Strategizing When Client Values and Preferences Conflict With the External Research Evidence and/or the Practitioner's Experience (LO1)

The "Evidence in the Real World" example describes an experience in which there was conflict between the mother's preference on one hand and the research evidence and the orthopedic surgeon's experience on the other. There will likely be situations in your own practice when similar conflicts emerge.

QUESTION

1. Identify three strategies that you might use to address a conflict such as this while still honoring the client's values and autonomy to make decisions.

A.
B.
C.

WHY EVIDENCE-BASED PRACTICE?

In the past, practitioners were comfortable operating exclusively from experience and expert opinion, but best practice in today's health-care environment requires the implementation of evidence-based practice that incorporates the research evidence and the values of the client. It is expected and, in many instances, required. The official documents of professional organizations speak to the importance of evidence-based practice.
For example, the Occupational Therapy Code of Ethics and Ethics Standards (American Occupational Therapy Association [AOTA], 2010) includes this statement in the section addressing beneficence: "use to the extent possible, evaluation, planning, and intervention techniques and equipment that are evidence-based and within the recognized scope of occupational therapy practice." The Position Statement from the American Speech-Language-Hearing Association's (ASHA's) Committee on Evidence-Based Practice includes the following: "It is the position of the American Speech-Language-Hearing Association that audiologists and speech-language pathologists incorporate the principles of evidence-based practice in their clinical decision-making to provide high quality care" (ASHA, 2005). The World Confederation for Physical Therapy's policy statement on evidence-based practice maintains that "physical therapists have a responsibility to use evidence to inform practice and ensure that the management of patients/clients, carers and communities is based on the best available evidence" (WCPT, 2011). Clinical decisions carry more weight and influence when they are supported with appropriate evidence. Imagine participating in a team meeting and being asked to justify your use of mirror therapy for a client recovering from stroke. You respond by telling the team that not only is your client responding favorably to the treatment, but a Cochrane review of 14 studies also found that mirror therapy was effective for reducing pain and improving upper extremity motor function and activities of daily living (Thieme, Mehrholz, Pohl, Behrens, & Dohle, 2012). Use of evidence can increase the confidence of both your colleagues and your client that the intervention is valid. Likewise, payers are more likely to reimburse your services if they are evidence based. Evidence-based practice also facilitates communication with colleagues, agencies, and clients.
As clinical decision-making becomes more explicit, the practitioner is able to support choices with the source(s) of evidence that were used and to explain those choices to other practitioners, clients, and family members. Ultimately, the most important reason to implement evidence-based practice is that it improves the quality of the services you provide. An intervention decision that is justified by scientific evidence, grounded in clinical expertise, and valued by the client will, in the end, be more likely to result in positive outcomes than a decision based on habit or expediency.

THE PROCESS OF EVIDENCE-BASED PRACTICE

The process of evidence-based practice mirrors the steps of the scientific method (Fig. 1-2). It is a cyclical process that includes the following steps:

1. Formulate a question based on a clinical problem.
2. Identify the relevant evidence.
3. Evaluate the evidence.
4. Implement useful findings.
5. Evaluate the outcomes.

FIGURE 1-2 The cycle of evidence-based practice: formulate a question based on a clinical problem, identify the relevant evidence, evaluate the evidence, implement useful findings, evaluate the outcomes, and begin again with a new question.

Formulate a Question Based on a Clinical Problem

The first step in evidence-based practice involves identification of a clinical problem and formulation of a question to narrow the focus. The formulation of a specific evidence-based question is important because it provides the parameters for the next step of searching the literature. Questions can be formulated to address several areas of practice. The most common types of questions address the following clinical concerns: (1) efficacy of an intervention, (2) usefulness of an assessment, (3) description of a condition, (4) prediction of an outcome, and (5) lived experience of a client. Each type of question will lead the practitioner to different types of research.
The process of writing a research question is discussed in more detail later in this chapter.

Identify the Relevant Evidence

After the question has been formulated, the next step is to find relevant evidence to help answer it. Evidence can include information from the research literature, practice knowledge, and client experience and values. Searching the literature for evidence takes skill and practice on the part of practitioners and students. Development of this skill is the focus of Chapter 2. However, as mentioned previously, the research evidence is only one component of evidence-based practice. Therapists should always consider research evidence in light of their previous experience, as well as information gathered about the client and his or her situation.

Evaluate the Evidence

Once evidence is found, evidence-based practitioners must critically appraise that evidence. The design of the study, the size of the sample, the outcome measures used, and many other factors all play a role in determining the strength of a particular study and the validity of its conclusions. In addition, practitioners need to evaluate the applicability of a particular study to their practice situation and the client's life circumstances. Much of this textbook focuses on evaluating research, and additional information is presented in Chapters 5 through 10.

Implement Useful Findings

Clinical decision-making may focus on an intervention or assessment approach, use evidence to better understand a diagnosis or an individual's experience, and/or predict an outcome. Once the evidence has been collected, screened, and presented to the client, the practitioner and client use a collaborative approach and, through shared decision-making, apply the gathered evidence to practice. Chapter 11 provides more information on clinical decision-making and presenting evidence to clients.
Evaluate the Outcomes The process of evidence-based practice is recursive; that is, the process draws upon itself. When a practitioner evaluates the outcomes of implementing evidence-based practice, the evaluation process contributes to practice knowledge. The practitioner determines whether the evidence-based practice resulted in the intended outcomes. For example, did the intervention help the client achieve established goals? Did the assessment provide the therapist with the desired information? Was prediction from the research evidence consistent with the results of clients seen by the therapist? Did the client’s lived experience resonate with the research evidence? Evidence-based practitioners reflect on the experience as well as gather information directly from their clients to evaluate outcomes. The step of evaluating the outcomes helps the practitioner to make clinical decisions in the future and ask new questions to begin the evidence-based process over again. EVIDENCE IN THE REAL WORLD Steps in Evidence-Based Practice The following example shows all of the steps in the process of evidence-based practice. You are working with an 8-year-old boy, Sam, who has a diagnosis of autism. During therapy, Sam’s parents begin discussing issues related to sleep. Sam frequently awakens in the night and then, when encouraged to go back to sleep, typically becomes very upset, sometimes throwing temper tantrums. You explain to the parents that you will help them with this concern, but first you would like to examine the evidence. First, you formulate the question: “Which interventions are most effective for reducing sleep problems (specifically nighttime awakening) in children with autism?” Second, you conduct a search of the literature and identify relevant evidence in the form of a systematic review by Vriend, Corkum, Moon, and Smith in 2011. (A systematic review provides a summary of many studies on the same topic.) 
You talk with the parents about what approaches they have tried in the past. The parents explain that, when Sam awakens, one of the parents typically stays in his room until he eventually falls asleep again. They are unhappy with this tactic, but have not found a technique that works any better. The Vriend et al (2011) systematic review is evaluated. Although a systematic review is considered a high level of evidence, this review finds that the studies addressing sleep issues for children with autism are limited. The review discusses the following approaches: extinction, scheduled awakening, faded bedtime, stimulus fading for co-sleeping, and chronotherapy. The approach with the most research support for addressing nighttime awakenings is extinction. Standard extinction (i.e., ignoring all negative behaviors) was examined in three studies and resulted in a decrease in night awakening that was maintained over time. You work to put the findings in an appropriate language for the parents to understand and present the evidence to them. Sam’s parents decide to implement these useful findings and try standard extinction. This technique can be challenging for parents to implement because, in the short term, it is likely to result in an increase in tantrums and agitation. However, the parents are desperate and willing to give it a try. You provide them with basic instruction, and together you develop a plan. The parents decide to start the intervention on a long weekend so that they will have time to adjust to the new routine before returning to work. After the initial weekend trial, you talk to the parents about the extinction process and evaluate the outcomes for Sam. They report that, although it was initially very difficult, after 1 week they are already seeing a significant reduction in their son’s night awakenings and an improvement in his ability to self-settle and get back to sleep. 
CHAPTER 1 ● Evidence-Based Practice

WRITING AN EVIDENCE-BASED QUESTION

This section helps you begin to develop the skills of an evidence-based practitioner by learning to write an evidence-based question. As mentioned previously, there are different types of questions; the appropriate type depends on the information you are seeking. The five types of questions relevant to this discussion include:

1. Efficacy of an intervention
2. Usefulness of an assessment
3. Description of a condition
4. Prediction of an outcome
5. Lived experience of a client

Table 1-1 provides examples of questions for each category, as well as the research designs that correspond to the question type. Subsequent chapters describe the research designs in much greater detail.

TABLE 1-1 Examples of Different Types of Evidence-Based Clinical Questions

Efficacy of an intervention
  Examples:
  • In individuals with head and neck cancer, what is the efficacy of swallowing exercises versus usual care for preventing swallowing problems during chemotherapy?
  • In infants, what is the efficacy of swaddling (versus no swaddling) for reducing crying?
  • For wheelchair users, what is the best cushion to prevent pressure sores?
  Common designs/research methods:
  • Randomized controlled trials
  • Nonrandomized controlled trials
  • Pretest/posttest without a control group

Usefulness of an assessment
  Examples:
  • What is the best assessment for measuring improvement in ADL function?
  • How reliable is goniometry for individuals with severe burns?
  • What methods increase the validity of health-related quality of life assessment?
  Common designs/research methods:
  • Psychometric methods
  • Reliability studies
  • Validity studies
  • Sensitivity and specificity studies

Description of a condition
  Examples:
  • What motor problems are associated with cerebral palsy?
  • What are the gender differences in sexual satisfaction issues for individuals with spinal cord injury?
  Common designs/research methods:
  • Incidence and prevalence studies
  • Group comparisons (of existing groups)
  • Surveys and interviews

Prediction of an outcome
  Examples:
  • What predictors are associated with successful return to employment for individuals with back injuries?
  • What childhood conditions are related to stuttering in children?
  Common designs/research methods:
  • Correlational and regression studies
  • Cohort studies

Lived experience of a client
  Examples:
  • What is the impact of multiple sclerosis on parenting?
  • How do athletes deal with career-ending injuries?
  Common designs/research methods:
  • Qualitative studies
  • Ethnography
  • Phenomenology
  • Narrative

Questions on Efficacy of an Intervention

Questions related to the efficacy of an intervention are intended to help therapists make clinical decisions about implementing interventions. Efficacy questions are often structured using the PICO format:

P = population
I = intervention
C = comparison or control condition
O = outcome

The following is an example of a PICO question: "For individuals with schizophrenia (population), is supported employment (intervention) more effective than transitional employment (comparison) for work placement, retention, and income (outcomes)?" The order of the wording is less important than inclusion of all four components. PICO questions are useful when you are familiar with the available approaches and have specific questions about a particular approach. However, it may be necessary to start with a more general question that explores intervention options. For example, you might ask, "What approach/es is/are most effective for increasing adherence to home exercise programs?" Searching for answers to this question may involve weeding through a substantial amount of literature; however, identification of the possible interventions is your best starting place.
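PICO is simply a four-part template. As a purely illustrative sketch (the class, field, and method names below are my own and do not come from the text), the components can be held in a small record and assembled into a searchable question:

```python
from dataclasses import dataclass

@dataclass
class PICOQuestion:
    population: str    # P: who the evidence should apply to
    intervention: str  # I: the approach being considered
    comparison: str    # C: the alternative or control condition
    outcome: str       # O: the result that matters to the client

    def as_question(self) -> str:
        # Assemble the four components into one searchable question.
        return (f"For {self.population}, is {self.intervention} more effective "
                f"than {self.comparison} for {self.outcome}?")

# The supported-employment example from the text:
q = PICOQuestion(
    population="individuals with schizophrenia",
    intervention="supported employment",
    comparison="transitional employment",
    outcome="work placement, retention, and income",
)
print(q.as_question())
```

The point of the sketch is only that a well-formed efficacy question has all four slots filled; if any field were left empty, the resulting question would be too vague to search efficiently.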
In other situations, you may know a great deal about an intervention and want to ask a more specific or restricted efficacy question, such as, "How does depression affect outcomes for individuals receiving low vision rehabilitation?" or "What are the costs associated with implementing a high-intensity aphasia clinic on a stroke unit?"

Research Designs for Efficacy Questions and Levels of Evidence

Evidence-based practitioners need a fundamental understanding of which research designs provide the strongest evidence. An introduction to designs used to answer efficacy questions is provided here so that you can begin to make basic distinctions. Designs used to answer efficacy questions are discussed in greater detail in Chapter 6. The concept of levels of evidence establishes a hierarchical system used to evaluate the strength of the evidence for research that is designed to answer efficacy questions. Determining if a particular approach is effective implies a cause-and-effect relationship: that is, the intervention resulted in or caused a particular outcome. Certain research designs are better suited for determining cause and effect. Hence, knowing that a researcher used an appropriate type of study design means the practitioner can have more confidence in the results. There is no universally accepted hierarchy of levels of evidence; several exist in the literature. However, all hierarchies are based on principles reflecting strong internal validity. Controlled studies with random assignment result in the highest level of evidence for a single study. Table 1-2 gives examples of evidence hierarchies and their references. For a study to be deemed a randomized controlled trial, three conditions must be met:

1. The study must have at least two groups, an experimental and a control or comparison condition.
2. The participants in the study must be randomly assigned to the conditions.
3. An intervention (which serves as the manipulation) must be applied to the experimental group.

Stronger than a single study is a systematic review, which identifies, appraises, and analyzes (synthesizes) the results of multiple randomized controlled trials on a single topic using a rigorous set of guidelines.

TABLE 1-2 Examples of Evidence Hierarchies and Supporting References

Hierarchy | Reference | Number of Levels
Oxford Centre for Evidence-Based Medicine 2011 Levels of Evidence | OCEBM Levels of Evidence Working Group (2011) | 5
Sackett and colleagues | Sackett, Straus, and Richardson (2000) | 10
AOTA hierarchy (adaptation of Sackett) | Arbesman, Scheer, and Lieberman (2008) | 5
Research Pyramid for Experimental, Outcome, and Qualitative Evidence | Tomlin and Borgetto (2011) | 4*
ASHA hierarchy | ASHA (n.d.) | 6
*Includes four levels each for experimental, outcome, and qualitative evidence.

Other factors taken into consideration in some level-of-evidence hierarchies are issues such as sample size, confidence intervals, and blinding (these topics are addressed in Chapters 4, 5, and 6). As a result, different hierarchies include varying numbers of levels. Table 1-3 outlines an example of a standard levels-of-evidence hierarchy that can be used for the purpose of evaluating studies that examine the efficacy of an intervention. Because different hierarchies exist, it is important to recognize that a Level II as described in this table may differ from a Level II in another hierarchy. In the hierarchy shown in Table 1-3, the highest level of evidence is a systematic review of randomized controlled trials. Because a systematic review involves analysis of an accumulation of studies, this level-of-evidence hierarchy supports the value of replication. Although this is the highest level of evidence, it does not mean that, just because a systematic review has been conducted, the practice is supported.
At all levels, the research may or may not result in statistically significant findings to support the conclusion that the intervention caused a positive outcome. In other words, it is possible that a systematic review may find strong evidence that the intervention of interest is not effective. Also, it is important to consider the studies included in the review. A systematic review may not provide the highest level of evidence if the studies in the review are not randomized controlled trials. If randomized controlled trials have not been conducted in the area under study and therefore could not be included in the review, the systematic review would not meet the criteria for Level I evidence. Systematic reviews are described in more detail in Chapter 10.

TABLE 1-3 Example of a Standard Levels-of-Evidence Hierarchy

Level | Description
I | Systematic review of randomized controlled trials
II | Randomized controlled trial
III | Nonrandomized controlled trial
IV | One-group trial with pretest and posttest
V | Case reports and expert opinion

Level II evidence comes from randomized controlled trials. The strength of the randomized controlled trial lies in its ability to indicate that the intervention, rather than another influence or factor, caused the outcome. This quality of a study is also known as internal validity. Factors that contribute to internal validity include the use of a control group and random assignment. Random assignment to a control group eliminates many biases that could confound the findings of a study. This is discussed in much greater detail in Chapter 4. A Level III study is similar to a Level II study, with the exception that the assignment to groups is not random. This is a fairly common occurrence in studies reported in the research literature. There may be pragmatic or ethical reasons for avoiding random assignment.
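The hierarchy in Table 1-3 is essentially a decision rule, and it can help to see it written out as one. The sketch below is one illustrative reading of that rule, not an established algorithm; the function and parameter names are my own assumptions:

```python
def level_of_evidence(is_systematic_review_of_rcts: bool = False,
                      has_control_group: bool = False,
                      randomized: bool = False,
                      has_pre_and_posttest: bool = False) -> str:
    """Map study characteristics onto the example hierarchy in Table 1-3."""
    if is_systematic_review_of_rcts:
        return "I"    # systematic review of randomized controlled trials
    if has_control_group and randomized:
        return "II"   # randomized controlled trial
    if has_control_group:
        return "III"  # nonrandomized controlled trial
    if has_pre_and_posttest:
        return "IV"   # one-group trial with pretest and posttest
    return "V"        # case report or expert opinion

# Fall-prevention scenarios like those in this chapter's exercise:
print(level_of_evidence(has_pre_and_posttest=True))                # one group, pre/post -> IV
print(level_of_evidence(has_control_group=True))                   # two wings, not randomized -> III
print(level_of_evidence(is_systematic_review_of_rcts=True))        # several RCTs synthesized -> I
print(level_of_evidence())                                         # single case report -> V
print(level_of_evidence(has_control_group=True, randomized=True))  # random assignment -> II
```

Note that real appraisals weigh far more than these four flags (sample size, blinding, confidence intervals), which is exactly why different published hierarchies have different numbers of levels.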
A typical example of nonrandomized group comparisons is a study in which, in one setting, individuals receive the intervention and in the other setting they do not. In another design, individuals in one setting may receive one intervention, which is compared with a different intervention provided in an alternate setting. Similarly, in a school setting, one classroom may receive an intervention, while another classroom receives standard instruction. Problems with the validity of conclusions arise in such situations. For example, the teacher in one classroom may be more effective than the teacher in the other classroom; therefore, the results of the study are due not to the intervention, but to the skills of the teacher. In the situation of different hospital settings, one hospital may tend to have patients with more severe conditions who are thus less responsive to treatment, which introduces a different type of bias. Sometimes there is confusion about the term control, because a control group is typically considered to be a group that receives no intervention. However, individuals receiving an alternate intervention or a standard intervention often serve as the control. In a nonrandomized group comparison, the control group provides the primary support for the internal validity of the design. Studies yielding evidence at Level IV typically have no control group. These studies are often referred to as pre-experimental. In these types of studies, individuals serve as their own control through the use of a pretest and a posttest. A pretest is administered, the intervention is applied, and then the posttest is administered. The conclusion that differences from pretest to posttest are due to the intervention must be made with great caution, as is discussed further in Chapter 4. In the vast majority of Level II and III studies, pretests and posttests are also used. 
Because it lacks a control group, a Level IV study is much less robust for drawing conclusions about cause-and-effect relationships. The improvements that occur from pretest to posttest could be due to general maturation or healing rather than the intervention; in other words, the client would get better if left alone. Another reason for improvements could be a placebo effect, whereby the individual gets better because he or she expects to get better and not because of actions specific to the intervention. Study designs that use pretests and posttests without a control group are often used as initial pilot studies so that minimal resources are used to identify whether an approach has potential benefits. If the results are promising, future research will typically use stronger designs, such as a randomized controlled trial. From the Evidence 1-2 provides an example of a Level IV study. Level V evidence includes case reports and expert opinion. Level V studies do not use statistical analyses to draw conclusions. This level of evidence is based on a single case study or expert opinion.

FROM THE EVIDENCE 1-2 Example of a Level IV Study

Case-Smith, J., Holland, T., Lane, A., & White, S. (2012). Effect of a coteaching handwriting program for first graders: One-group pretest–posttest design. American Journal of Occupational Therapy, 66(4), 396–405. doi:10.5014/ajot.2012.004333.

Note A: This study used a pretest-posttest design without a control group. Although children improved, without a comparison group one cannot know if the children would have improved without an intervention.

We examined the effects of a co-taught handwriting and writing program on first-grade students grouped by low, average, and high baseline legibility. The program's aim was to increase legibility, handwriting speed, writing fluency, and written expression in students with diverse learning needs.
Thirty-six first-grade students in two classrooms participated in a 12-wk handwriting and writing program co-taught by teachers and an occupational therapist. Students were assessed at pretest, posttest, and 6-mo follow-up using the Evaluation Tool of Children's Handwriting-Manuscript (ETCH-M) and the Woodcock-Johnson Writing Fluency and Writing Samples tests. Students made large gains in ETCH-M legibility (η² = .74), speed (η²s = .52-.65), Writing Fluency (η² = .58), and Writing Samples (η² = .59). Students with initially low legibility improved most in legibility; progress on the other tests was similar across low-, average-, and high-performing groups. This program appeared to benefit first-grade students with diverse learning needs and to increase handwriting legibility and speed and writing fluency.

FTE 1-2 Question: Which conditions that must be present for Level II evidence are lacking in this study?

EXERCISE 1-2 Using the Levels-of-Evidence Hierarchy to Evaluate Evidence (LO2)

QUESTIONS

For the following examples, identify the level of evidence that best matches the study description for the question, "For individuals in long-term care facilities, what is the efficacy of strength and balance training when compared with usual care for reducing falls?"

1. Forty individuals in a long-term care facility receive strength and balance training to reduce falls. The number of falls for a 1-month period before the intervention is compared with the number of falls for a 1-month period after the intervention. Level of evidence:

2. Twenty individuals on one wing of a long-term care facility receive strength and balance training, while 20 individuals on another wing of the same long-term care facility receive usual care. The number of falls for the two groups is compared for 1 month before the intervention and 1 month after the intervention. Level of evidence:

3. The results of three randomized controlled trials examining the efficacy of strength and balance training to reduce falls in long-term care facilities are analyzed and synthesized. Level of evidence:

4. A resident in a long-term care facility receives individualized strength and balance training. The number of falls for this individual is determined at different time points and presented in a report. Level of evidence:

5. Individuals in three long-term care facilities are randomly assigned to receive either strength and balance training or usual care. The number of falls for the two groups is compared for 1 month before the intervention and 1 month after the intervention. Level of evidence:

Questions for Usefulness of an Assessment

When practitioners have questions about assessments and assessment tools, psychometric methods are used. The primary focus of psychometric methods is to examine the reliability and validity of specific assessment instruments. Reliability addresses the consistency of a measure; that is, dependability of scores, agreement of scoring for different testers, and stability of scoring across different forms of a measure. For example, a practitioner may want to know which measure of muscle functioning is the most consistent when scored by different therapists. Validity is the ability of a measure to assess what it is intended to measure. For example, a practitioner may be interested in identifying which measure most accurately assesses speech intelligibility. Questions about the usefulness of assessments guide practitioners in finding evidence to determine if the measures they are currently using have sufficient reliability and validity, and help practitioners identify the best measure for a particular client need. Chapter 7 discusses assessment studies in greater detail.

Research Designs Used in Assessment Studies

In reliability studies, the emphasis is on determining the consistency of the scores yielded by the assessment.
In reliability studies, the measure may be administered more than one time to the same individual, or different evaluators may evaluate the same individual. A reliable measure will produce comparable scores across these different conditions. In validity studies, the measure of interest is often compared with other similar measures to determine if it measures the same construct. Validity studies may also use the measure with different populations to determine if it distinguishes populations according to the theoretical basis of the measure. For example, a measure of depression should result in higher scores for individuals with a diagnosis of depression when compared with scores of individuals in the general population. Another important consideration in the selection of a measure for use in a clinical setting is sensitivity and specificity. Sensitivity of a test refers to the proportion of the individuals who are accurately identified as possessing the condition of interest. Specificity is the proportion of individuals who are correctly identified as not having the condition. It is possible for a measure to be sensitive but not specific, and vice versa. For example, a measure may correctly identify all of the individuals with a balance problem (highly sensitive), but misidentify individuals without balance problems as having a problem (not very specific). Ideally, measures should be both sensitive and specific. Another feature of measures that is important in gauging the efficacy of an intervention is sensitivity to change. If a measure lacks the precision necessary to detect a change, it will not be useful as an outcome measure. A validity study provides an example of how practitioners can use evidence to make decisions about what assessment to use in practice.
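Sensitivity and specificity are simple proportions over a screening table, so the balance-screen example above can be worked numerically. A minimal sketch with invented counts (the function names and numbers are illustrative only, not from any cited study):

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    # Proportion of people WITH the condition whom the test correctly flags.
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    # Proportion of people WITHOUT the condition whom the test correctly clears.
    return true_neg / (true_neg + false_pos)

# Hypothetical balance screen: it catches all 20 people with a true balance
# problem, but also wrongly flags 30 of the 80 people without one.
print(sensitivity(true_pos=20, false_neg=0))   # 1.0   -> highly sensitive
print(specificity(true_neg=50, false_pos=30))  # 0.625 -> not very specific
```

This is exactly the pattern described in the text: a test can be perfectly sensitive yet still mislabel many healthy individuals, which is why both numbers matter when choosing a measure.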
For example, a study comparing the shortened Fugl-Meyer Assessment with the streamlined Wolf Motor Function Test for individuals with stroke found that the Fugl-Meyer Assessment was more sensitive to changes in rehabilitation and a better predictor of outcomes (Fu et al., 2012). The results from a reliability study show how practitioners can use evidence to make decisions about the method of administration of a measure. Researchers compared direct observation with an interview of parents to gather information for the WeeFIM (Sperle, Ottenbacher, Braun, Lane, & Nochajski, 1997). The results indicated that the scores were highly correlated for the two methods of administration, suggesting that parental interviews are comparable to direct observation. The clinical application is that, in situations in which the therapist is unable to directly observe the child, the evidence supports collecting information from the parent to score the WeeFIM.

Questions for Description of a Condition

As a practitioner, you will often have questions about the conditions that you commonly see, or you may encounter a client with an atypical condition that you know very little about. There is a great deal of research evidence available to answer questions about different health-care conditions. Perhaps you notice that many of your clients with chronic obstructive pulmonary disease (COPD) also experience a great deal of anxiety, so you have a question about the comorbidity of COPD and anxiety disorders. Perhaps you are interested in gender differences as they relate to symptoms in attention deficit disorder. By gathering evidence from descriptive questions, you can better understand the people you treat. Chapter 8 discusses descriptive studies in greater detail.
Research Designs Used in Descriptive Studies

Research designs intended to assist in answering descriptive questions do not involve the manipulation of variables, as in efficacy studies. Instead, the phenomena are depicted as they occur. Hence, these designs use observational methods. One type of descriptive research that is common in health care is the prevalence and incidence study. Prevalence is the proportion of individuals within a population who have a particular condition, whereas incidence is the risk of developing a condition within a period of time. For example, the prevalence of Alzheimer's disease in people aged 65 and older living in the United States is 13% (Alzheimer's Association, 2012). The same report indicates that the lifetime risk of developing Alzheimer's disease is 17.2% for women and 9.1% for men. The difference is likely attributable to the longer life span for women. Rehabilitation practitioners are often interested in incidence and prevalence statistics within a particular diagnostic group. For example, what is the incidence of pressure sores in wheelchair users? What is the prevalence of swallowing disorders among premature infants? Observational methods may also be used to compare existing groups of individuals to describe differences, rather than assigning individuals to groups. For example, cognitive assessments may be administered to describe differences in cognition for individuals with and without schizophrenia; or the social skills of children with and without autism may be compared. This type of study allows researchers to describe differences and better understand how individuals with a particular condition differ from individuals without the condition. Survey methods are also used to answer descriptive questions. With surveys, the participating individuals themselves (respondents) are asked descriptive questions.
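The prevalence/incidence distinction above comes down to two different proportions. A sketch with invented numbers (the function names and counts are illustrative only, not drawn from the studies cited):

```python
def prevalence(existing_cases: int, population: int) -> float:
    # Proportion of a population that HAS the condition at a point in time.
    return existing_cases / population

def incidence_proportion(new_cases: int, at_risk: int) -> float:
    # Risk of DEVELOPING the condition over a defined period, among
    # people who begin the period without it.
    return new_cases / at_risk

# Hypothetical cohort of 200 wheelchair users: 26 have a pressure sore today,
# and 18 of the remaining 174 develop one over the following year.
print(prevalence(existing_cases=26, population=200))    # 0.13 -> 13% have a sore now
print(incidence_proportion(new_cases=18, at_risk=174))  # about 0.10 -> ~10% develop one
```

The key design point is the denominator: prevalence divides by everyone in the population, while incidence divides only by those initially at risk over a stated time window.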
One of the advantages of survey methods is that many questions can be asked, and a significant amount of data can be collected in a single session. From the Evidence 1-3 is based on a survey of individuals with multiple sclerosis. It illustrates the rates and levels of depression among a large sample of individuals with multiple sclerosis.

FROM THE EVIDENCE 1-3 Research Using a Survey

Jones, K. H., Ford, D. V., Jones, P. A., John, A., Middleton, R. M., Lockhart-Jones, H., Osborne, L. A., & Noble, J. G. (2012). A large-scale study of anxiety and depression in people with multiple sclerosis: A survey via the web portal of the UK MS Register. PLoS One, 7(7), Epub. doi:10.1371/journal.pone.0041910.

This pie chart from a large, descriptive study of 4,178 individuals with multiple sclerosis used survey research to identify levels of depression. This study's finding of high rates of depression among people with multiple sclerosis is important information for practitioners who work with this population.

[Pie chart, "Depression": Normal, 34.6%; the remaining 65.4% of respondents reported mild, moderate, or severe depression (segments of 32.7%, 26.4%, and 6.3%).]

FTE 1-3 Question: Assume that you are a health professional treating someone with multiple sclerosis. Based on the results of this study, how likely would it be that the individual would be experiencing some level of depression?

Questions for Prediction of an Outcome

With predictive questions, associations are made between different factors. Some predictive questions are framed in terms of a prognosis. In other words, what factors contribute to the prognosis, response, or outcome in a particular condition? Chapter 8 discusses predictive studies in greater detail.

Research Designs Used in Predictive Studies

Predictive questions are similar to descriptive questions in terms of the methods used. Observational or survey data are collected to examine relationships or predictors. As in descriptive studies, individuals are not randomly assigned to groups, but instead are studied in terms of the groupings or characteristics that naturally occur. Data may be collected at a single point in time (cross-sectional research) or over a period of time (longitudinal research). For example, in a cross-sectional study, Hayton and colleagues (2013) found that the predictors of adherence and attendance for pulmonary rehabilitation included smoking status (smokers were less likely to attend), social support (greater support was associated with better attendance), and severity of disease (more severity was associated with lower attendance). In assessing the quality of predictive studies, sample size is an important consideration. A larger sample will be more representative of the population of interest, and the findings will be more stable and consistent. With a small sample, there is a greater likelihood of bias or that outliers will influence the results. Predictive studies find associations, but their designs make it more difficult to imply causation. "Correlation does not equal causation" is an important axiom of the evidence-based practitioner that will be discussed in greater detail in subsequent chapters.

Questions About the Client's Lived Experience

The evidence-based questions and research designs described thus far heavily emphasize objective data collection, analysis, and interpretation. However, numerical data do not tell the full story, because each individual's experience is unique and subjective. Questions about the lived experience provide practitioners with evidence from the client's perspective.
These questions tend to be more open-ended and exploratory, such as, "What is the meaning of recovery for people with serious mental illness?" and "How do caregivers of individuals with dementia describe their experience of managing difficult behaviors?" More information about lived experience studies is provided in Chapter 9.

Research Designs Addressing the Client's Lived Experience

Questions about lived experience are answered using qualitative methods. Based on a different paradigm than quantitative research, qualitative research is concerned with meaning and explanation. The individual is appreciated for his or her understanding of a particular phenomenon. Instead of using numbers and statistics, qualitative research is typically presented in terms of themes that emerge during the course of the research; in reports of such studies, these themes are exemplified with quotations. The purpose of qualitative research is to uncover new ideas or develop insights into a phenomenon. In qualitative research, participants are interviewed extensively and/or observed extensively in natural contexts. Conditions are not manipulated or contrived. Norman and colleagues (2010) used qualitative methods to identify questions that individuals with spinal cord injury have related to experiences of pain. Extensive interviews indicated a general theme of dissatisfaction with the information participants had received about pain from health-care professionals. The following quote was one of many used to illustrate the findings: "I should have been told that I could be in pain. . . . I kind of wished I had known that because then I would have been better prepared mentally" (p. 119).

EXERCISE 1-3 Writing Evidence-Based Clinical Questions (LO3)

QUESTIONS

Imagine that you are a practitioner at a rehabilitation hospital with a caseload that includes many individuals who have experienced a stroke. Consider the questions you might have and write one research question for each of the types of questions discussed in this chapter. Use the PICO format when writing the efficacy question. Remember that these questions are used for searching the literature, so think about wording and content that would make your question searchable. A question that is too vague or open-ended will be difficult to search, whereas specific questions are typically easier to search. For example, it would be very difficult to answer the following question with a search of the research evidence: "What are the functional impairments associated with neurological conditions?"

1. Efficacy of an intervention question (using PICO format to ask a question related to interventions for stroke):

2. Usefulness of an assessment question (reliability, validity, sensitivity, or specificity of an assessment for people with strokes):

3. Description of a condition question (consider questions that will reveal more about people with strokes):

4. Prediction of an outcome question (these questions generally contain the words relationship, association, or predictors of an outcome):

5. Lived experience of a client question (focus on understanding and explaining from the perspective of the client):

CRITICAL THINKING QUESTIONS

1. Identify at least three reasons why practitioner experience is an important component of evidence-based practice.

2. How might you facilitate the shared decision-making process when the client's preferences conflict with the external evidence?

3. Why is a randomized controlled trial not an appropriate design for many clinical questions?

4. Identify at least three advantages that the evidence-based practitioner has over a practitioner who does not use evidence in practice.

ANSWERS

EXERCISE 1-1

1.
There is no single answer to this exercise, but some strategies you might consider include:

• Be sure to ask clients about their preferences and values before making any clinical decisions.
• Familiarize yourself with possible intervention options and avoid entering a treatment planning situation without consulting with the client first.
• Clearly present information about the external research evidence and your practitioner experience to the client, and explain your rationale for recommending a particular intervention approach.
• Honor the client's opinion regardless of the decision and support the client in that choice.
• If the client's choice presents a significant safety concern, explain why you cannot ethically honor the client's choice.

EXERCISE 1-2

1. IV. In this study, there is a single group (no comparison) that is assessed before and after an intervention.

2. III. A comparison of two groups is made, so this study includes a control condition, but there is no randomization in the assignments to the groups.

3. I. This is an example of a systematic review, in which the results of three randomized controlled trials are compared.

4. V. A case study describes an intervention and outcome for a single individual only; thus, it is considered a lower level of evidence.

5. II. The highest level of evidence for a single study is a randomized controlled trial.

EXERCISE 1-3

There is no single answer to this exercise, but a poor example and a good example are provided for each of the types of questions. Try to judge your questions against these examples.

1. Efficacy of an Intervention Question

Poor example: "Is constraint-induced motor therapy effective?" This question does not include all of the components of the PICO question: the population, comparison, and outcome are missing.
If you were to search the evidence on constraint-induced therapy, you would find many studies, but much of your search would be irrelevant.
Good example: "Is constraint-induced motor therapy more effective than conventional upper-extremity rehabilitation for improving fine and gross motor skills in people with strokes?" Now your question includes all of the components of a PICO question and will lead you to more relevant studies.
2. Usefulness of an Assessment Question
Poor example: "What assessments should I be using when working with people with strokes?" Stroke is a multifaceted condition and the number of available assessments is considerable. A better question would focus on an assessment you are already using or considering using, or on identifying an assessment for a particular purpose.
Good example: "What is the reliability and validity associated with the Wolf Motor Function Test?" This question will help you gather data on the usefulness of a specific test used in stroke rehabilitation.
3. Description of a Condition Question
Poor example: "What are the symptoms of stroke?" This is the sort of question that is answered in a textbook, which combines the writer's clinical expertise with the evidence. However, answering this question with a search of the evidence will be very difficult. Typically, researchers will focus on a narrower aspect of the condition.
Good example: "What is the prevalence of left-sided neglect in individuals with a right-sided cerebrovascular accident (CVA)?" This question will lead you down a more specific path of research. Once the question is answered, you will be better prepared to work with the client who has had a right-sided CVA.
4. Prediction of an Outcome Question
Poor example: "How are placement decisions made after acute stroke rehabilitation?" This question is vague and does not include the common terms of this type of question: associated, related, and predictive.
Experienced clinicians can describe the decision-making process, but this question does not help you predict an outcome.
Good example: "What motor, cognitive, and psychological conditions after acute stroke rehabilitation are most associated with inability to return to independent living?" Research evidence can answer this question and thereby help you inform your clients and their families about their prognosis.
5. Question About the Client's Lived Experience
Poor example: "What percentage of individuals with stroke are satisfied with their rehabilitation experience?" Although this may be an answerable question, it is not a qualitative question because it deals with numbers. Qualitative questions are answered narratively, typically using themes drawn from quotes and observations.
Good example: "How do individuals with stroke describe their rehabilitation experience?" In the case of the qualitative question, sometimes vague is better. When obtaining information from the individual's perspective, it is more effective to let the individual do the talking and remain open to whatever themes emerge. This approach also captures the diversity of experiences as well as the commonalities.

FROM THE EVIDENCE 1-1
This study provides you, as a practitioner, with external evidence that group treatment may be as effective as individual treatment for improving mobility after stroke.

FROM THE EVIDENCE 1-2
The study has only one group; with no comparison group, the participants cannot be randomly assigned to different conditions. An intervention is applied to the single group.

FROM THE EVIDENCE 1-3
If you combine mild, moderate, and severe depression, 65.4% of individuals in this study experienced some level of depression. Therefore, this study suggests that a large proportion of individuals with multiple sclerosis will experience depression.
This is an important consideration for health-care professionals, in that depression will likely have a significant impact on an individual's daily life and ability to engage in the rehabilitation process.

REFERENCES
Alzheimer's Association. (2012). 2012 Alzheimer's disease facts and figures. Alzheimer's Association, 8(2), 1–72.
American Occupational Therapy Association (AOTA). (2010). The occupational therapy code of ethics and ethics standards. American Journal of Occupational Therapy, 64(Suppl.), S17–S26.
American Speech-Language-Hearing Association (ASHA). (n.d.). Assessing the evidence. Retrieved from http://www.asha.org/Research/EBP/Assessing-the-Evidence
American Speech-Language-Hearing Association (ASHA). (2005). Evidence-based practice in communication disorders (Position statement). Retrieved from www.asha.org/policy/PS2005-00221.htm
Arbesman, M., Scheer, J., & Lieberman, D. (2008). Using AOTA's critically appraised topic (CAT) and critically appraised paper (CAP) series to link evidence to practice. OT Practice, 13(5), 18–22.
Ben-Sasson, A. (2011). Parents' search for evidence-based practice: A personal story. Journal of Paediatrics and Child Health, 47, 415–418.
Case-Smith, J., Holland, T., Lane, A., & White, S. (2012). Effect of a coteaching handwriting program for first graders: One-group pretest–posttest design. American Journal of Occupational Therapy, 66, 396–405. doi:10.5014/ajot.2012.004333
Dahm, K. T., Brurberg, K. G., Jamtvedt, G., & Hagen, K. B. (2010). Advice to rest in bed versus advice to stay active for acute low-back pain and sciatica. Cochrane Database of Systematic Reviews.
Dean, C. (2012). Group task-specific circuit training for patients discharged home after stroke may be as effective as individualized physiotherapy in improving mobility: A critically appraised paper. Journal of Physiotherapy, 58, 269. doi:10.1016/S1836-9553(12)70129-7
Fox, C., Ebersbach, G., Ramig, L., & Sapir, S. (2012). LSVT LOUD and LSVT BIG: Behavioral treatment programs for speech and body movement in Parkinson disease. Parkinson's Disease, Epub, 1–13. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3316992/
Fu, T. S., Wu, C. Y., Lin, K. C., Hsieh, C. J., Liu, J. S., Wang, T. N., & Ou-Yang, P. (2012). Psychometric comparison of the shortened Fugl-Meyer Assessment and the streamlined Wolf Motor Function Test in stroke rehabilitation. Clinical Rehabilitation, 26, 1043–1047.
Hayton, C., Clark, A., Olive, S., Brown, P., Galey, P., Knights, E., . . . Wilson, A. M. (2013). Barriers to pulmonary rehabilitation: Characteristics that predict patient attendance and adherence. Respiratory Medicine, 107, 401–407.
Jones, K. H., Ford, D. V., Jones, P. A., John, A., Middleton, R. M., Lockhart-Jones, H., Osborne, L. A., & Noble, J. G. (2012). A large-scale study of anxiety and depression in people with multiple sclerosis: A survey via the web portal of the UK MS Register. PLoS One, 7(7), Epub.
Malmivaara, A., Hakkinen, U., Aro, T., Heinrichs, M. J., Koskenniemi, L., Kuosma, E., . . . Hernberg, S. (1995). The treatment of acute low-back pain: Bed rest, exercises, or ordinary activity. New England Journal of Medicine, 332, 351–355.
Norman, C., Bender, J. L., Macdonald, J., Dunn, M., Dunne, S., Siu, B., . . . Hunter, J. (2010). Questions that individuals with spinal cord injury have regarding their chronic pain: A qualitative study. Disability and Rehabilitation, 32(2), 114–124.
OCEBM Levels of Evidence Working Group. (2011). The Oxford 2011 levels of evidence. Retrieved from http://www.cebm.net/index.aspx?o=5653
Palisano, R. J. (2010). Practice knowledge: The forgotten aspect of evidence-based practice. Physical & Occupational Therapy in Pediatrics, 30, 261–262.
Sackett, D., Rosenberg, W. M. C., Gray, J. A. M., Haynes, R. B., & Richardson, W. S. (1996). Evidence-based medicine: What it is and what it isn't. BMJ, 312, 71–72.
Sackett, D., Straus, S. E., & Richardson, W. S. (2000). How to practice and teach evidence-based medicine (2nd ed.). Edinburgh, Scotland: Churchill Livingstone.
Sperle, P. A., Ottenbacher, K. J., Braun, S. L., Lane, S. J., & Nochajski, S. (1997). Equivalence reliability of the Functional Independence Measure for Children (WeeFIM) administration methods. American Journal of Occupational Therapy, 51, 35–41.
Stacey, D., Bennett, C. L., Barry, M. J., Col, N. F., Eden, K. B., Holmes-Rovner, M., . . . Thomson, R. (2011). Decision aids for people facing health treatment or screening decisions. Cochrane Database of Systematic Reviews, 10, 3.
Thieme, H., Mehrholz, J., Pohl, M., Behrens, J., & Dohle, C. (2012). Mirror therapy for improving motor function after stroke. Cochrane Database of Systematic Reviews, 14, 3.
Thomas, A., Saroyan, A., & Dauphinee, W. D. (2011). Evidence-based practice: A review of theoretical assumptions and effectiveness of teaching and assessment interventions in health professions. Advances in Health Sciences Education, 16, 253–276.
Tomlin, G., & Borgetto, B. (2011). Research pyramid: A new evidence-based practice model for occupational therapy. American Journal of Occupational Therapy, 65, 189–196.
van de Port, I. G., Wevers, L. E., Lindeman, E., & Kwakkel, G. (2012). Effects of circuit training as an alternative to usual physiotherapy after stroke: Randomized controlled trial. BMJ, 344, e2672.
Valtin, H. (2002). "Drink at least eight glasses of water a day." Really? Is there scientific evidence for "8 × 8"? American Journal of Physiology: Regulatory, Integrative and Comparative Physiology, 283, 993–1004.
Vriend, J. L., Corkum, P. V., Moon, E. C., & Smith, I. M. (2011). Behavioral interventions for sleep problems in children with autism spectrum disorders: Current findings and future directions. Journal of Pediatric Psychology, 36, 1017–1029.
World Confederation of Physical Therapy (WCPT). (2011).
Policy statement: Evidence based practice. London, UK: WCPT. Retrieved from www.wcpt.org/policy/ps-EBP

"The search for truth takes you where the evidence leads you, even if, at first, you don't want to go there." —Bart D. Ehrman, Biblical scholar and author

2 Finding and Reading Evidence: The First Steps in Evidence-Based Practice

CHAPTER OUTLINE
LEARNING OUTCOMES
KEY TERMS
INTRODUCTION
IDENTIFYING DATABASES
  PubMed
  Cumulative Index of Nursing and Allied Health Literature
  Cochrane Database of Systematic Reviews
EMPLOYING SEARCH STRATEGIES
  Selecting Key Words and Search Terms
  Combining Terms and Using Advanced Search
  Using Limits and Filters
  Expanding Your Search
ACCESSING THE EVIDENCE
  The Research Librarian
  Professional Organizations
DETERMINING THE CREDIBILITY OF A SOURCE OF EVIDENCE
  Websites
  The Public Press/News Media
  Scholarly Publications
    Impact Factor
    The Peer-Review Process
    Research Funding Bias
    Publication Bias
    Duplicate Publication
READING A RESEARCH ARTICLE
  Title
  Authorship
  Abstract
  Introduction
  Methods
  Results
  Discussion
  References
  Acknowledgments
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Identify the relevant databases for conducting a search to answer a particular research question.
2. Use search strategies to find relevant studies based on a research question.
3. Identify evidence-based practice resources that are available through your institution and professional organizations.
4. Evaluate the credibility of a specific source of evidence.
KEY TERMS
Boolean operator
database
impact factor
institutional animal care and use committee
institutional review board
peer-review process
primary source
publication bias
secondary source

INTRODUCTION
In the past, to keep up with the latest evidence, you would need to go to a library building and look through the card catalog or skim through stacks of journals to find an article of interest. Today, access to scientific literature is simpler than ever, and search engines make the research process faster and easier. Still, the amount of information at your fingertips can be overwhelming. The challenge becomes sorting through all of that information to find what you need, and then being able to identify the most credible sources. Chapter 1 describes the importance of evidence to practice and how to write a research question. This chapter explains how to use your research question as a framework for locating trustworthy evidence, and provides general guidelines for deciphering research articles. It serves only as an introduction to locating evidence, by providing basic information and tips; finding evidence takes skill and creativity that are honed with practice and experience.

IDENTIFYING DATABASES
An electronic database is an organized collection of digital data—a compilation of the evidence within a searchable structure. Each database uses its own criteria to determine what evidence to include; that is, the database will focus on an area of research, which determines which journals, textbook chapters, newsletters, and so on will be catalogued. Once your topic of interest is identified by your research question, you use this information to select a database. Exercise 2-1 guides you through some selection considerations.

EXERCISE 2-1 Selecting Databases (LO1)
QUESTIONS
Using Table 2-1, identify which database(s) you would most likely search to find the given information.
1. PEDro ratings on randomized controlled trials examining interventions for people with Parkinson's disease
2. Systematic reviews examining the efficacy of exercise for depression
3. Identifying effective interventions to improve seat time in elementary school-age children with ADHD
4. Early examples of textbooks in occupational therapy
5. Best approaches for rehabilitation of the knee after ACL surgery

Most health sciences databases are composed primarily of peer-reviewed journal articles. There is often overlap of journals across databases, but what may be found in one database may not be available in another. For this reason it is helpful to know the primary databases that contain content pertinent to rehabilitation research. Three databases—PubMed, CINAHL, and the Cochrane Database of Systematic Reviews—are described in detail in this chapter. Table 2-1 provides a more comprehensive list of databases used by health-care professionals, including information about accessing them. If a database is not free to the public, it may be available through your school's library.

TABLE 2-1 Health-Care Databases
Database | Content Focus | Free Access?
Clinicaltrials.gov | A registry of publicly and privately funded clinical trials. Provides information on studies that are currently underway. May include results if a study has been completed. | Yes
Cochrane Library | Database of systematic reviews and registry of clinical trials. | Yes
Educational Resources Information Center (ERIC) | Includes articles and book chapters of particular interest to school-based practice. | Yes
Google Scholar | A wide-ranging database of peer-reviewed literature. Does not include the extensive limits and functions of most scientific databases. | Yes
Health and Psychosocial Instruments (HAPI) | Provides articles about assessment tools. | No
Medline | Comprehensive database of peer-reviewed medical literature. Medline is included in PubMed. | Yes
National Rehabilitation Information Center (NARIC) | Includes publications from NARIC and other articles with a focus on rehabilitation. | Yes
OT Search | American Occupational Therapy Association's comprehensive bibliography of literature relevant to the profession. | No
OTseeker | Abstracts of systematic reviews and randomized controlled trials relevant to occupational therapy. Includes PEDro ratings. | Yes
PEDro | Physical therapy database with abstracts of systematic reviews, randomized controlled trials, and evidence-based clinical practice guidelines with ratings. | Yes
PsycINFO | American Psychological Association's abstracts of journal articles, book chapters, and dissertations. Useful for research on psychiatric and behavioral conditions. | No
PubMed | The National Library of Medicine's comprehensive database of peer-reviewed literature. Includes Medline and other resources. | Yes
PubMed Central | A subset of PubMed comprising all free full-text articles. | Yes
SpeechBite | An evidence-based database of literature related to speech therapy. Includes PEDro ratings. | Yes
SPORTDiscus | Full-text articles relevant to sports medicine, exercise physiology, and recreation. | No

Although you are likely familiar with Google Scholar, it is not the optimum database to search for professional literature. Google Scholar does not provide many of the features important for evidence-based practice, such as the ability to limit your search to specific research designs. When searching Google Scholar, the "most relevant results" are highly influenced by the number of times a particular study was cited. Hence, you are less likely to find newly published studies.
Also, the most frequently cited studies are not necessarily the studies with the strongest evidence; in fact, in an effort to be comprehensive, Google Scholar even includes journals with questionable reputations (Beall, 2014).

PubMed
PubMed is the free online database of the U.S. National Library of Medicine. It includes Medline, a database comprising thousands of health-care journals, textbooks, and other collections of evidence such as the Cochrane Database of Systematic Reviews. PubMed is the most comprehensive medical database, and all journals included in PubMed are peer reviewed. The abstracts of journal articles are available on PubMed, and in some cases the full text of an article is also free. PubMed Central is a subset of PubMed that includes the full text of all of its articles.
PubMed uses a system called MeSH® (Medical Subject Headings), which is a set of terms or descriptors hierarchically arranged in a tree structure. When you search PubMed using a MeSH term, PubMed will automatically search for synonyms and related terms. To determine the appropriate MeSH term to use, search the MeSH browser at http://www.nlm.nih.gov/mesh/MBrowser.html. For example, when searching orthotic devices, the structure is as follows:

Equipment and Supplies
  Surgical Equipment
  Orthopedic Equipment
    Artificial Limbs
    Canes
    Crutches
    Orthotic Devices
      Athletic Tape
      Braces
      Foot Orthoses

If you would like to broaden the search, use the term "orthopedic equipment," which is higher in the tree, or narrow the search by using the term "athletic tape." In rehabilitation research, the specific intervention or term you wish to examine frequently is not included because it is not a MeSH term. In this case you can still enter the term as a key word, selecting a term that you would expect to find in the title or abstract of the article.
For example, "kinesio tape" is not a MeSH term; however, entering it as a key word brings up many relevant studies. Searching MeSH terms and key words is further discussed later in this section.
The home page of PubMed is shown in Figure 2-1. You would enter the selected MeSH term, or a key word that you would expect to find in the article title or abstract, in the box at the top of the page.

Cumulative Index of Nursing and Allied Health Literature
As its name indicates, the focus of the Cumulative Index of Nursing and Allied Health Literature (CINAHL) is on journals specific to nursing and allied health disciplines. It is useful to search in CINAHL because some journals in the occupational therapy, physical therapy, and speech and language professions that are not referenced in PubMed are included in CINAHL, such as the British Journal of Occupational Therapy, the Journal of Communication Disorders, and the European Journal of Physiotherapy. In addition, CINAHL includes a broader range of publications, such as newsletters and magazines like OT Practice, the Mayo Clinic Health Letter, and PT in Motion. Because of this broader range of publications, it is important to recognize that not all articles in CINAHL are peer reviewed.
CINAHL does not use the MeSH system; instead, it uses its own system of subject headings. The home page of CINAHL is shown in Figure 2-2. You can click "CINAHL Headings" at the top of the page to find terms used in CINAHL.

Cochrane Database of Systematic Reviews
The Cochrane Database of Systematic Reviews is part of the Cochrane Library. The systematic reviews contained in this database are conducted by members of the Cochrane Collaboration using rigorous methodology. As of 2015, the database included more than 5,000 reviews that are regularly updated.
It includes one database of interventions and another for diagnostic test accuracy; therefore, the reviews are designed to answer efficacy and diagnostic questions. However, the Cochrane Database is not an optimal source for descriptive, relationship, or qualitative research. The database is available free of charge, although access to the actual reviews requires a subscription; most medical and university libraries hold subscriptions. Figure 2-3 shows the home page for the Cochrane Database of Systematic Reviews. You can go to the "Browse the Handbook" link to enter the names of interventions or conditions you would like to search.

EMPLOYING SEARCH STRATEGIES
After selecting a database, the next step is to enter the terms or key words into the database; that is, you enter a MeSH term, a CINAHL subject heading, or a key word you expect to find in the article title or abstract. Your initial search may not elicit the relevant studies you are looking for, in which case you need to apply additional strategies to locate the studies that are most applicable to your research question. Although the Internet has made evidence more available to students and practitioners, locating and sorting through the evidence found there can be challenging. For example, a search on PubMed using the MeSH term "dementia" resulted in 142,151 hits on the day this sentence was written, and that number will only increase. Sifting through that much evidence to find the information most relevant to your question would be overwhelming, but familiarity with the databases and search strategies makes the process much more efficient (although this chapter can only include basic guidelines). Most databases include online tutorials, which can be extremely useful for improving your search skills. This section describes search strategies, using the PubMed database in the examples.
The search process may vary with other databases, but if you are familiar with the process of searching on PubMed, you will be able to apply the same or similar search strategies to other databases.

FIGURE 2-1 Home page for PubMed. This is where you enter the MeSH term or key word. (Copyright National Library of Medicine.)
FIGURE 2-2 Home page for the Cumulative Index of Nursing and Allied Health Literature (CINAHL). This is where you find the CINAHL terms (comparable to MeSH terms). CINAHL makes it easy to enter multiple terms from a variety of fields. (Copyright © 2015 EBSCO Industries, Inc. All rights reserved.)
FIGURE 2-3 Home page for the Cochrane Database of Systematic Reviews. This is where you would enter search terms to find systematic reviews at the Cochrane Library. (Copyright © 2015—The Cochrane Collaboration.)

Selecting Key Words and Search Terms
Your research question is the best place to start when beginning your search. Identify the key words used in your question and enter them into the search box. Think about which words are most important. Let's take a question from Chapter 1 as an example: "In infants, what is the efficacy of swaddling (versus no swaddling) for reducing crying?" The word that stands out is swaddling. Swaddling is not listed in the MeSH system; however, if you enter swaddling as a common word, you can expect to find it in a relevant article title or abstract. PubMed provides you with several common combinations. For example, after entering swaddling into the search box, you get the term "crying swaddling" as an option. If you click on this term, the result is a very manageable 18 articles (Fig. 2-4).
FIGURE 2-4 After you enter a term, PubMed will provide you with common combinations, such as "crying swaddling," thereby narrowing down the number of articles. (Copyright National Library of Medicine.)

A more complex search ensues with a different question: "What predictors are associated with successful return to employment for individuals with back injuries?" If you search back injuries in the MeSH browser, you find that back injuries is indeed a MeSH term. However, if you enter the MeSH term back injuries into PubMed, the result is more than 25,000 possibilities, and a quick review of the titles reveals that many articles appear to be irrelevant to the question.

Combining Terms and Using Advanced Search
A useful skill for expanding or narrowing your search involves combining terms. A simple way to combine terms is to use the Advanced Search option in PubMed. This allows you to use the Boolean operators (the words used in a database search for relating key terms) AND, OR, and NOT. You can use AND when you want to find articles that use both terms. Using OR will broaden your search and identify articles that use either term; this may be useful when different terms are used to describe a similar concept. For example, if you are looking for studies on kinesio taping, you might also search for articles that use the term strapping. The NOT operator eliminates articles that use the identified term; perhaps you are interested in kinesio taping for the knee, but NOT the ankle. The All Fields option in Advanced Search means that the terms you enter can fall anywhere in the article's title or abstract. However, you can limit the field to options such as the title, author, or journal. Returning to the original example, enter back injuries AND employment in the Advanced Search option (Fig. 2-5). The outcome of this search is a more manageable 330 articles (Fig. 2-6).
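If you ever want to script this kind of Boolean search rather than use the website, the National Library of Medicine exposes PubMed through its public E-utilities interface, which accepts the same query syntax as the search box. The sketch below only constructs the ESearch request URL; the function name pubmed_search_url is my own, and the example assumes the standard ESearch parameters (db, term, retmax, retmode).

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (publicly accessible).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for a PubMed query.

    `term` takes the same Boolean syntax as the PubMed search box:
    AND, OR, and NOT (uppercase), optionally with field tags such as
    [MeSH Terms] or [Title/Abstract].
    """
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)

# The Advanced Search example from the text, expressed as one query string:
url = pubmed_search_url("back injuries[MeSH Terms] AND employment")
```

Fetching that URL (for example, with urllib.request) returns a JSON document whose esearchresult field holds the total hit count and a list of PubMed IDs; the count should be in the same range as the 330 articles reported above, although it grows as new studies are indexed.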
Using Limits and Filters
Another search strategy involves using limits or filters. In the search results shown in Figure 2-6, several options for limiting your search by using filters are listed on the left side of the screen. You can use filters that will limit your search by type of study (e.g., clinical trials or review articles), availability of the full text, publication date, species, language of the article, age of the participants, and more. For example, you can limit your search to human studies and exclude all animal studies, limit your search to infants, or limit your search to studies published during the past year.
The articles listed in a search are provided in order of publication date. Another useful strategy is to click the "sort by relevance" feature on the right side of the screen (Fig. 2-6). The search then lists articles that are likely to be more closely related to the specific question (Fig. 2-7). A quick scan of the initial results in Figures 2-6 and 2-7 suggests this to be true.

FIGURE 2-5 Search using the Advanced Search option with the entry back injuries AND employment. You can combine terms in the Advanced Search function of PubMed. (Copyright National Library of Medicine.)
FIGURE 2-6 Results from the search in Figure 2-5 and list of filters (left of screen). In this example the number of articles is limited by using the Boolean operator AND; you can add even more filters here. (Copyright National Library of Medicine.)
FIGURE 2-7 The "sort by relevance" feature can be used to locate articles more pertinent to your question. Sorting by relevance instead of date may make it easier to find what you are looking for. (Copyright National Library of Medicine.)

Expanding Your Search
When conducting a search, balance the likelihood of finding what seems to be too many studies to review against the possibility of missing something relevant. When a search results in a small number of articles, consider using additional strategies to locate studies that are more difficult to find but are specific to the research question. One strategy is to draw on a major study as a starting point. Examples of major studies include those with a large number of participants, a review paper that includes multiple studies, or a study published in a major journal such as the Journal of the American Medical Association (JAMA). When you select a particular study, you will get the full abstract and a screen on the right side that lists other studies in which this study was cited, along with a list of related citations (Fig. 2-8). Review these citations to determine if any of the other studies are relevant to your research. Important studies are cited frequently and can lead you to similar work.
Another strategy for expanding your search is to use the reference lists of relevant studies to find other studies of interest. This approach can be time-consuming, but it helps ensure that you do not miss any important information when conducting a comprehensive review to locate the best evidence. Box 2-1 supplies a summary of tips for searching the evidence.

ACCESSING THE EVIDENCE
Looking at the results of a search (as in Fig. 2-7), some articles include a "Free PMC Article" link. Simply click on the link to obtain access to the article. If this link is not available, click on the article title to get the abstract of the study. In Figure 2-8, there are links in the upper right corner for accessing the full text of the study. The links take you to the publisher and, in some cases, to access through your library. Sometimes it is possible to access an article for free after clicking on the link. In other cases, you will need to pay a fee to the publisher.
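The "related citations" and "cited by" lists used in the expanding-your-search strategy are also exposed programmatically through NCBI's E-utilities, via the ELink service. This is a sketch under the assumption that the documented link names pubmed_pubmed (similar articles) and pubmed_pubmed_citedin (citing articles in PubMed Central) are the ones you want; the helper name related_url and the PubMed ID in the example are mine.

```python
from urllib.parse import urlencode

# NCBI E-utilities ELink endpoint.
EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def related_url(pmid: str, citedin: bool = False) -> str:
    """Build an ELink URL listing articles related to one PubMed ID.

    With citedin=False, the link name pubmed_pubmed requests PubMed's
    "similar articles" list; with citedin=True, pubmed_pubmed_citedin
    requests articles in PubMed Central that cite the given study.
    """
    linkname = "pubmed_pubmed_citedin" if citedin else "pubmed_pubmed"
    params = {"dbfrom": "pubmed", "db": "pubmed", "id": pmid,
              "linkname": linkname, "retmode": "json"}
    return EUTILS_ELINK + "?" + urlencode(params)

# A hypothetical PubMed ID standing in for a "major study" found earlier:
url = related_url("12345678", citedin=True)
```

Starting from one important study, requesting its pubmed_pubmed neighbors and then repeating the process on the results is the scripted equivalent of following the "related citations" screen by hand.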
As a student, you will typically have access to a university library with substantial access to online subscriptions. Generally, if the library does not have a subscription to the title, it can still obtain the article through interlibrary loan, which may or may not require payment of a fee. As a practicing therapist, you may not have access to a medical library; however, there may be other ways to access the full text of articles without paying a fee. Some university alumni organizations offer library access to former students who are members of the alumni club or to fieldwork sites that take students. Some articles are available to the general public and are identified as free full text. There is no easy way to know which articles will be free, as many factors are involved. Specific journals may provide limited access, such as to articles published in the most recent edition; conversely, journals may provide access to all articles except those published in the previous year. The National Institutes of Health Public Access Policy (NIH, n.d.) now requires that all studies that receive funding from the NIH be accessible to all individuals through PubMed no later than 12 months after the official date of publication. This policy took effect in April of 2008, so many more studies are now available.
Research librarians can be helpful in both finding and accessing articles. In addition, professional organizations offer another source of articles. More information on using the research librarian or a professional organization is provided later in this chapter.

FIGURE 2-8 Using a "related citations" function to expand a search. The "similar articles" and "cited by" features can be helpful in finding other relevant studies. (Copyright National Library of Medicine.)

BOX 2-1 Tips for Searching the Evidence
• Use your research question as a starting point to identify search terms.
• Use the Boolean operators (AND, OR, NOT) to limit or expand your search.
• Use MeSH terms or subject headings when applicable, but remember that not all topics will be included as MeSH terms.
• Remember to use filters and limits to narrow your search.
• Recognize that finding one relevant study can help you identify other relevant studies by using the related citations or reference lists.
• Become a member of professional organizations that provide access to many full-text articles.
• Use the tutorials associated with databases to improve your search skills.
• Take advantage of the expertise of your research librarian.

EXERCISE 2-2 Using Search Strategies to Locate the Evidence (LO2)
Write a research question and then follow the instructions below and answer the associated questions.
Your research question:
QUESTIONS
Enter the relevant key word(s), MeSH terms, or CINAHL subject headings into a PubMed search.
1. What words did you enter, and how many results did you obtain?
Now go to the Advanced Search option and practice using the Boolean operators AND, OR, and NOT.
2. How did this change your search? Did it result in a reasonable number of articles to search through? Do you think you are missing some of the evidence?
Practice using other strategies, such as searching for a particular author or journal, or limiting the search to clinical trials or review articles, a particular age group, or human studies.
3. What strategies were most helpful?

The Research Librarian
If you have access to a medical library, the research librarian can be an invaluable resource. Research librarians are specifically trained to help people find the resources they are looking for.
You can go to the research librarian with your research question, and he or she can help you conduct a search and access the material. Medical research librarians typically have a master's degree and in-depth training in finding research evidence. Although this chapter gives you the basic tools for searching the evidence, and practice and experience will improve your skills, research librarians are experts who can help when a search proves difficult and you need to ensure that you have located all of the available articles. In most states there is a medical library supported with public funds; you can contact this library to determine whether its research librarian can assist you with a search. The librarian at your public library may also be able to offer assistance. Some large health-care institutions also have a medical library and librarian.

EXERCISE 2-3 Learning About the Resources at Your Institution (LO3)

1. Access your electronic library resources and, using the following list, identify the databases that are available through your institution (place a √ if available):
• Clinicaltrials.gov
• Cochrane Library
• Educational Resources Information Center (ERIC)
• Google Scholar
• Health and Psychosocial Instruments (HAPI)
• Medline
• National Rehabilitation Information Center (NARIC)
• OT Search
• OTseeker
• PEDro
• PsycINFO
• PubMed
• PubMed Central
• SpeechBite
• SPORTDiscus

2. Learn how to use the interlibrary loan system at your institution. When is a cost associated with obtaining full-text articles using interlibrary loan? What is the cost?

3. Use the advanced search option of PubMed or CINAHL to search for the journals listed here. Select a study to determine whether or not the full text is available through your institution. Circle yes or no.
• Archives of Physical Medicine and Rehabilitation — YES / NO
• International Journal of Therapy and Rehabilitation — YES / NO
• Audiology Research — YES / NO

Professional Organizations

As mentioned in Chapter 1, the American Occupational Therapy Association (AOTA), the American Physical Therapy Association (APTA), and the American Speech-Language-Hearing Association (ASHA) are dedicated to increasing evidence-based practice among their members. Consequently, these organizations provide many resources for evidence-based practice, including access to relevant journals. Members of AOTA have access to all full-text articles published in the American Journal of Occupational Therapy, as well as articles in the British Journal of Occupational Therapy and the Canadian Journal of Occupational Therapy. Members of ASHA have access to articles in the American Journal of Audiology, the American Journal of Speech-Language Pathology, Contemporary Issues in Communication Sciences and Disorders, and the Journal of Speech, Language, and Hearing Research. Members of APTA have access to Physical Therapy and PT Now. Membership in professional associations helps you function more efficiently as an evidence-based practitioner.

DETERMINING THE CREDIBILITY OF A SOURCE OF EVIDENCE

One important consideration in evaluating evidence is the credibility of the source. The source of research may be a journal article, professional report, website, the popular press, or another outlet. The primary source of information is the most reliable, as it has not been interpreted or summarized by others.
Primary sources can include original research studies and professional and governmental reports that are based on original research or data collection. Secondary sources are documents or publications that interpret or summarize a primary source. They are one step removed and include someone else's thinking. Websites are typically secondary sources, although they may include original work; the scholarly nature of websites varies immensely. Stories in the news media are secondary sources. When you hear about a study on the radio or television, or read about it in a newspaper, you cannot assume that you are getting complete and accurate information. In such cases, it is wise to access the original study published in a scholarly journal. As an informed and critical evidence-based practitioner, you can read the original publication and evaluate the evidence for yourself. In fact, even work published in a highly respected scholarly journal should be read with a skeptic's eye. The following discussion provides some tips for appraising the credibility of a source of evidence.

Websites

Websites providing health-care information are very popular with health-care providers and the general public; however, much of the information on the Internet is inaccurate. Medline, a service of the U.S. National Library of Medicine, provides a guide for evaluating the quality of health-care information on the Internet, the MedlinePlus Guide to Healthy Web Surfing (Medline, 2012). Box 2-2 outlines the key points of this document. Much of the information provided in the Medline guide is relevant to print sources as well.

BOX 2-2 Highlights of the MedlinePlus Guide to Healthy Web Surfing
• Consider the source: Use recognized authorities
• Focus on quality: Not all websites are created equal
• Be a cyberskeptic: Quackery abounds on the Internet/Web
• Look for the evidence: Rely on medical research, not opinion
• Check for currency: Look for the latest information
• Beware of bias: Examine the purpose of the website
• Protect your privacy: Ensure that health information is kept confidential
Adapted from: Medline. (2012). MedlinePlus guide to healthy web surfing. Retrieved from http://www.nlm.nih.gov/medlineplus/healthywebsurfing.html

Key factors to consider when evaluating information on a website are the source of the information, the timeliness of the information, and the review process of the website. Of primary interest is the provider of the information. Respectable sources, such as professional organizations (AOTA, APTA, and ASHA) and federal government sites (the Centers for Disease Control and Prevention and the National Institutes of Health), are recognized authorities, as opposed to individuals or businesses that maintain websites to promote a product or service. A credible website will clearly describe its mission and provide information about the individuals involved in the organization, such as who makes up the board of directors and the affiliations of those individuals. If you find that a website is linked to a corporation with financial ties to the information, and particularly when the website is selling a product, you should be concerned about bias. Sites that rely on testimonials and individual reports are less reliable than sites that draw on research evidence. The research should be cited so that the reader can access the original work. When extraordinary claims are made, be wary and compare the information with that from other credible sources. Sites that have an editorial board and a review policy/process involving experts in the field are more credible. The approval process for information on a website can often be found in the "About Us" section, and the information posted there should be up to date. In summary, when using information from the Internet, look for sources that cite research from peer-reviewed journals, have up-to-date information, and describe a review process for posting content.

The Public Press/News Media

Stories about health-care research are very common in the news media. These stories are often based on press releases provided by scholarly publications or institutions to alert health-care providers and the lay public to newly released research. It is possible that your clients and their families will approach you with information they obtained from the news media. The news media is a valuable channel for sharing important health-care evidence with the public. However, media coverage of health-care research tends to overemphasize lower levels of evidence and to report findings as more favorable than the actual study results reveal (Yavchitz et al, 2012). As indicated in Chapter 1, replication of research findings is essential to establish a finding; however, much media reporting emphasizes only initial findings and gives little follow-up as the evidence develops or matures over time.

EVIDENCE IN THE REAL WORLD: Clients' Use of Evidence

Just as you have greater access than ever to health-care research, so too does your client. Don't be surprised when a client asks about a specific intervention or treatment based on a news story or Internet search. It is important to listen to your client respectfully and with an open mind. Depending on your familiarity with the topic, you may need to do your own search before offering an opinion. Once you have gathered information, engage in a discussion on the topic. Educate your client as needed, taking care not to disregard or discount the client's opinions. This process of making collaborative decisions, known as shared decision-making, is explained in greater detail in Chapter 11.
Scholarly Publications

Scholarly journals provide the highest level of rigor because the publication process includes protections to enhance the credibility of published research. Scholarly publications can come from professional organizations, such as AOTA, which publishes the American Journal of Occupational Therapy; APTA, which publishes Physical Therapy; and ASHA, which publishes the American Journal of Speech-Language Pathology. They can also come from interdisciplinary organizations that focus on a particular aspect of health care (e.g., the Psychiatric Rehabilitation Association, which publishes the Psychiatric Rehabilitation Journal) and from publishers that specialize in medical research (e.g., Lippincott Williams & Wilkins, which publishes Spine). Scholarly journals often adopt reporting guidelines, such as those proposed by the International Committee of Medical Journal Editors (ICMJE, 2014) or the Enhancing the Quality and Transparency of Health Research network (EQUATOR, n.d.), to ensure high-quality reporting.

Impact Factor

One measure of the importance of a journal is its impact factor, which is based on the number of times articles in that journal are cited in other articles (Garfield, 2006). For example, in 2012 the impact factor of the highly respected Journal of the American Medical Association (JAMA) was 29.9; the Archives of Physical Medicine and Rehabilitation, 2.35; Physical Therapy, 2.77; the American Journal of Occupational Therapy, 1.47; and the Journal of Speech, Language, and Hearing Research, 1.97 (Journal Citation Reports, 2012). The score is calculated from citations to articles published in the two previous years. For example, the Physical Therapy score indicates that articles published during 2010 and 2011 were cited an average of 2.77 times in 2012.
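The impact-factor arithmetic is simple enough to sketch in a few lines of Python; the counts below are hypothetical, chosen only to reproduce a 2.77-style score:

```python
def impact_factor(citations_this_year, citable_items_prior_two_years):
    """Impact factor for year Y: citations received during Y to articles
    published in years Y-1 and Y-2, divided by the number of citable
    articles the journal published in those two years."""
    return citations_this_year / citable_items_prior_two_years

# Hypothetical counts (not any journal's real figures): a journal that
# published 200 citable articles in 2010-2011 and whose articles were
# cited 554 times during 2012 has a 2012 impact factor of 554/200.
print(impact_factor(554, 200))  # 2.77
```

Note that citations to the journal's own articles count in the numerator, which is one reason self-citation practices can inflate the score.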
These numbers reflect only one aspect of a journal's quality, and the impact factor has been criticized for favoring journals that include many self-citations, long articles, and review articles, as well as journals that report on hot topics in research (Seglen, 1997).

The Peer-Review Process

The peer-review process is used by scholarly journals to ensure quality. It involves a critical appraisal of the quality of a study by experts in the field. Peer-reviewed journals have strict guidelines for submission and review, and the peer reviewers and journal editors determine whether a study meets the quality standards of the journal. A research article follows a scientific method of reporting, with detailed methods and results, and its references and citations must follow a specific style manual. It is important to recognize that not all peer-reviewed journals have the same mission and requirements for rigor. The peer-review process provides a level of quality control, but it should not be regarded as a guarantee of credibility or accuracy. Some journals may include a peer-review process but set a low bar for acceptance. Even journals with rigorous standards cannot guarantee that the research reported in an article is completely accurate: errors may exist in the statistical analysis or other sections of the study that go undetected. More problematic is the possibility of fraudulent reporting, in which the researcher intentionally misrepresents the data. Failure to identify fraudulent reporting is perhaps the greatest weakness of peer review. However, when fraud is detected, it is the responsibility of the journal to issue a retraction. One study found that the number of retractions has increased sharply and that the majority of retractions in medical and life sciences journals are due to misconduct (Fang, Steen, & Casadevall, 2012).
This study found that more than 67% of retractions were due to misconduct; fraud was the most common form of misconduct, followed by duplicate publication (described later in this section) and then plagiarism. The number of articles retracted because of fraud has increased tenfold since 1975.

Research Funding Bias

The funding of research has the potential to bias the reporting of outcomes. Much of the published research in health care is funded by either a public or a private organization. Any source of funding, and thus any potential conflict of interest on the part of the authors, is a required disclosure in most peer-reviewed journals. The largest public funding source is the United States government, which funds health-care research through organizations such as the National Institutes of Health and the Centers for Disease Control and Prevention (CDC). Private charitable organizations, such as the Bill and Melinda Gates Foundation, and professional organizations, such as the American Occupational Therapy Foundation, can also serve as funding sources for research. Generally speaking, these organizations do not have a financial interest in the outcomes of a particular study and fund research solely to advance the sciences. In contrast, private research funding that is linked to the product under study can present a conflict of interest. This does not mean that all industry-sponsored research is biased, but a healthy skepticism is advised. For example, a Cochrane review (Lundh, Sismondo, Lexchin, Busuioc, & Bero, 2012) found that studies sponsored by drug and medical device companies were more likely to report favorable results than studies sponsored by other sources. Weak study designs and misrepresentation of the data do not appear to account for this difference, as industry-sponsored research is often well designed and reported. However, it has been suggested that industry-supported research is more likely to test the desired product under the most ideal conditions against a less favorable comparison or control condition (e.g., a low dose of the comparison drug; Bastian, 2006). Most journals require authors to include a full disclosure of their financial interests and relationships with sponsors. Look for this information (often in the acknowledgments section) when reading research studies and, when possible, compare the results of industry-sponsored research with research conducted with a less-biased sponsor.

Even without funding, a researcher may be biased in favor of an intervention or assessment that he or she developed. It is commonplace for initial studies to be carried out by the developer of the intervention or assessment. Not only do these researchers have a vested interest in positive outcomes, but they are also likely to have expertise in administering the approach that would not be expected of the typical clinician. It will usually be apparent from the methods section or the reference list whether the researcher is also the developer of the intervention or assessment. Evidence for a particular approach is strengthened when research findings are replicated by individuals who are not the developers.

Publication Bias

Journals are more likely to accept studies that report a positive finding (e.g., in the case of an intervention study, that the intervention was effective). Simply put, positive findings are more interesting than negative findings. The result can be publication bias: the inclination of journals to publish positive outcomes more frequently than negative outcomes. When collecting evidence on a single topic, it is possible to conclude that an intervention is more effective than it truly is because you do not have access to the studies that were not published due to negative outcomes.
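The distorting effect of publication bias can be made concrete with a small simulation (an illustrative sketch, not data from any study cited here): imagine many small trials of a treatment whose true effect is zero, where only the trials that happen to observe a positive effect reach print.

```python
import random

random.seed(1)  # reproducible illustration

def observed_effect(n=20, true_effect=0.0):
    # One trial: observed mean difference = true effect + sampling noise
    return true_effect + random.gauss(0, 1) / n ** 0.5

all_trials = [observed_effect() for _ in range(1000)]
published = [e for e in all_trials if e > 0]  # only "positive" results get published

mean_all = sum(all_trials) / len(all_trials)
mean_published = sum(published) / len(published)
print(f"mean effect, all trials:     {mean_all:+.3f}")        # hovers near zero
print(f"mean effect, published only: {mean_published:+.3f}")  # looks clearly beneficial
```

A reader who sees only the published trials would conclude the treatment works, even though the true effect here is exactly zero, which is why good systematic reviews try to track down unpublished studies.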
This is particularly relevant when considering compilations of studies such as systematic reviews. The best reviews make an effort to obtain unpublished results by contacting researchers in the field or following up on grants and study protocols. However, locating unpublished research is a challenging process that may still not uncover all of the research conducted on a topic. Publication bias is discussed in greater detail in Chapter 10.

Duplicate Publication

A research article should represent original work unless it is identified as a republication. Duplicate publication of the same work in multiple journals is viewed as an unethical practice. It is not unusual for researchers to publish several articles from a single research study; however, each article should include unique data. Some researchers will "stretch" the data to create as many articles as possible. If you come upon studies with the same or similar authors and a similar number of participants and procedures, examine the results to determine whether the same findings are reported in both articles or the authors are providing unique information.

EXERCISE 2-4 Evaluating the Credibility of the Evidence (LO4)

There is a great deal of controversy surrounding the issue of childhood vaccinations and their potential link to autism. Consider the research question, "Do childhood vaccinations increase the risk for developing autism?" Evaluate the following websites and their information about autism risk and vaccines. Consider the source and purpose of each website, and think about the evidence that is used to support the information from each source.
• http://www.cdc.gov/vaccinesafety/Concerns/Autism/Index.html
• http://www.infowars.com/do-vaccines-cause-autism/
• http://autismsciencefoundation.org/what-is-autism/autism-and-vaccines/
• http://www.generationrescue.org/resources/vaccination/

After evaluating the websites, also consider the following systematic review: Gerber, J. S., & Offit, P. A. (2009). Vaccines and autism: A tale of shifting hypotheses. Clinical Infectious Diseases, 48, 456–461.

QUESTIONS
After reading the information, what conclusion do you draw, and why? What recommendation would you make to a parent trying to decide whether or not to vaccinate a child?

READING A RESEARCH ARTICLE

Research articles follow a standard format that includes the title, authorship, an abstract, and introduction, methods, results, discussion, references, and acknowledgments sections. When you are reading a study, it is helpful to know what each section contains. Each section is described here, along with question prompts for evaluating the general readability and meaningfulness of the article, as well as key aspects of that section. Subsequent chapters address how to critique the quality of the research itself.

Title

The title should reflect the content of the paper. A well-worded title will include key words that readers are likely to use in a search. Consider:
• Does the title accurately reflect the content of the article?

Authorship

The authors of an article are listed in order of the importance of their contribution to the work. The lead author is the individual who contributed most to the writing of the article. All listed authors should make a significant contribution to the writing of the article; in fact, some journals require that the authors identify their particular contributions. Each author's affiliation (i.e., where he or she works) will be listed, and typically there will be contact information in the form of an address and e-mail for the lead author. Consider:
• Information about individual authors may not be meaningful to beginning readers, but once you are familiar with a topic, you will begin to recognize key contributors.

Abstract

The abstract provides a summary of the entire paper. Professional journals impose a word limit that results in a very brief synopsis; therefore, many details will be missing from the abstract. However, the abstract is a useful place to start to get oriented to the study and to help determine whether you wish to read further. Consider:
• Does the abstract provide enough information to determine if the article is relevant to the evidence-based question? Read the full text if the article is relevant.

Introduction

The introduction provides background information, explains why the study is important, and includes a literature review of related articles. A good introduction provides a strong justification of the need for the study. The introduction should include up-to-date references as well as seminal or classic studies that may be older. It typically ends with a purpose statement that explains the aims of the study, followed by the research question or hypothesis. Consider:
• Is there enough information to understand why the study is important?
• Are there adequate references to support the background?
• Will the research question identified in this study answer your own evidence-based question?

Methods

The methods section describes the process the researcher(s) used when conducting the study. It should provide enough detail to allow the study to be replicated. The methods section is typically divided into subsections. The first subsection describes the design of the study; for example, the author may state that the study design was a randomized controlled trial and go on to describe the experimental and control conditions.
Another subsection describes the participants; typically this explains how the participants were selected, the inclusion/exclusion criteria, and the human subjects consent process. Reporting guidelines require authors to identify the ethical board that approved the research, such as the institutional review board (IRB) for research involving human subjects and the institutional animal care and use committee (IACUC) for animal research (see Box 2-3). Some articles describe the demographic characteristics of the participants in the methods section, whereas others include this information in the results. The methods section also includes a subsection describing the measures used in the study and any other instruments or materials. If it is an intervention study, the intervention itself is described in the methods section. The methods section typically ends with an explanation of the data analysis procedures.

BOX 2-3 IRB and IACUC

Because the focus of this textbook is on evidence-based practice rather than the methods of conducting research, an extensive discussion of research ethics is not practical. However, practitioners should be aware of the importance of the institutional review board (IRB) for research involving human subjects and the institutional animal care and use committee (IACUC) for animal research. Institutions such as universities, hospitals, and research centers that conduct health-care research must have an IRB and IACUC in place to review the research that takes place within the institution or is conducted by its employees. These boards review the research before it begins to ensure that the benefits of the research outweigh the risks; that ethical principles are adhered to; in the case of human research, that participants are fully and adequately informed of the research process before providing consent; and, in the case of animal research, that the animals are cared for and not subjected to unnecessary pain or suffering. Most journals now require that researchers identify the IRB or IACUC that approved the project within the methods section of the paper. Although the IRB and IACUC provide safeguards, it is still possible for individual researchers to become involved in unethical practices, such as coercing participants or misrepresenting the data. Institutions that receive federal funding can be audited if unethical practices are suspected; if misconduct is found, research at that institution may be discontinued. In 2001, the federal Office for Human Research Protections suspended human research at Johns Hopkins after a healthy volunteer died in an asthma study: the young woman suffered lung damage and eventually died after inhaling a drug that was intended to induce an asthma attack ("Family of fatality," 2001).

Consider:
• Can you ascertain the design of the study?
• Are the methods for selecting participants and assigning them to groups specified?
• If groups are compared, is the control or comparison condition sufficiently described?
• Are the measures described, including information about their reliability and validity?
• If it is an intervention study, is the intervention described in enough detail to allow you to determine its applicability in a typical practice setting?
• Are the specific statistical analyses identified?

Results

The results section of a journal article describes the findings of the study. If the participants are not described in the methods section, the results section will begin with an accounting and description of the participants. In an intervention study that involves two or more groups, the demographics and baseline scores on the outcome measures are compared across the groups to determine whether there are any systematic differences in the characteristics of the subjects in each group.
In an intervention study, the researcher would like there to be no differences between the groups in demographics or baseline measures, because such differences introduce potential confounding factors into the study. Confounding factors are discussed in greater detail in Chapter 5. After the participants are described, the results of the study as they relate to the research question/hypothesis are presented. Additional findings that were not addressed by the original question may also be presented. The results section describes the findings in statistical terms and often includes tables and graphs. The results are presented factually, without speculation; speculation is generally reserved for the discussion section. The results section can be intimidating for individuals who are unfamiliar with research design and statistics, and this textbook is intended to decrease that level of intimidation. Although scholarly work should be presented in an objective manner, the interpretation of the results in the discussion section will be influenced by the author's point of view. Sophisticated evidence-based practitioners can read the results section and draw their own conclusions about the findings. Consider:
• Are the results presented in such a way that the research question is answered and, if relevant, the hypothesis is identified as supported or not supported?
• Are tables and figures used to help elucidate the results?
• Are the results presented in an objective manner without obvious bias?

Discussion

The discussion section summarizes and explains the findings. It should begin by restating the primary aims of the study and answering the research question or describing how the results relate to the original hypothesis. In this section the author includes an interpretation of the results and may speculate as to why a particular finding was obtained.
However, a good discussion section does not go beyond the data. For example, if the results of an intervention study suggest a trend toward beneficial outcomes of a particular intervention, yet there is no statistically significant difference between the groups, it would be premature and inappropriate to conclude that the intervention is the preferred approach. In addition, the discussion should include citations from the existing literature to support the interpretation and should specify whether the results of the study are consistent with previous research. It should also describe the limitations of the study and include implications for practice. Consider:
• Does the discussion help you understand the findings?
• Does the discussion accurately reflect the results (e.g., it does not overstate the findings)?
• Is there an accounting of the limitations of the study?
• Are references provided to help substantiate the conclusions of the author(s)?
• Does the author explain how the results are meaningful and/or applicable to practice?

References

The end of the article will display a reference list including all of the studies cited in the article. Different journals use different styles for references. Journals that use the American Psychological Association's style list the references in alphabetical order, whereas journals that follow the American Medical Association's style list references in the order in which they appear in the article. Although the same basic information is presented, the order and format of reference citations may vary considerably depending on the publication manual followed. Consider:
• Are the dates of the references recent?

Acknowledgments

The acknowledgments section recognizes individuals who contributed to the work of the research but are not authors of the paper. This section also identifies the funding source of the research. Consider:
• Does the acknowledgments section provide enough information to determine possible conflicts of interest on the part of the researcher(s)?

CRITICAL THINKING QUESTIONS

1. Why is a scientific database search more efficient than a Google search when trying to locate research evidence?
2. In what ways has technology made evidence-based practice more practical? How has it made evidence-based practice more challenging?
3. If an initial database search results in too many articles to manage, what strategies can you use to pare the number down to fewer articles that are more relevant to your topic?
4. How does the peer-review process enhance the credibility of the evidence, and what are its limitations?
5. Many novice readers of the evidence skip over the results section and go directly to the discussion of a research article. Why should evidence-based practitioners read the results section in addition to the discussion?

ANSWERS

EXERCISE 2-1
1. OTseeker, PEDro, Speechbite
2. Cochrane, PubMed, Psychlit
3. ERIC, PubMed
4. OT Search
5. SPORTDiscus, PubMed

EXERCISE 2-2
Every search will result in unique answers. The more you practice, the more familiar you will become with the databases and the most effective methods for locating the evidence.

EXERCISE 2-3
The answers depend on your particular institution. Check your answers with those of your classmates to determine if you agree.

EXERCISE 2-4
The websites supported by better-known and more respected organizations (e.g., the Centers for Disease Control and Prevention and the Autism Science Foundation) also provide links to scientific evidence supporting their position that vaccinations do not cause autism. The systematic review of multiple studies came to the same conclusion.

REFERENCES

Bastian, H. (2006). They would say that, wouldn't they? A reader's guide to author and sponsor biases in clinical research.
3
Research Methods and Variables
Creating a Foundation for Evaluating Research

"The plural of anecdote is not data." —Roger Brinner, economist

CHAPTER OUTLINE
LEARNING OUTCOMES
KEY TERMS
INTRODUCTION
TYPES OF RESEARCH
  Experimental Research
  Nonexperimental Research
  Quantitative Research
  Qualitative Research
  Cross-Sectional and Longitudinal Research
  Basic and Applied Research
HYPOTHESIS TESTING: TYPE I AND TYPE II ERRORS
VARIABLES
  Independent Variables
  Dependent Variables
  Control Variables
  Extraneous Variables
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Categorize the major types of research.
2. Determine the most appropriate type of research method to answer a given research question.
3. Given a research abstract or study, classify the variables.

KEY TERMS
applied research, basic research, categorical variable, continuous variable, control group, control variable, correlational studies, cross-sectional research, dependent variable, directional hypothesis, efficacy studies, experimental research, extraneous variable, factorial design, hypothesis, independent variable, intervention studies, longitudinal research, nondirectional hypothesis, nonexperimental research, nonrandomized controlled trial, observational studies, pre-experimental research, qualitative research, quantitative research, quasi-experimental study, randomized controlled trial, research design, third variable problem, translational research, true experiment, Type I error, Type II error, variable

INTRODUCTION
The language of research can be intimidating to new evidence-based practitioners. There are many ways in which research can be categorized, and different people often use different terminology. In addition, there are many variations within each type of research. These variations within a research type may be considered a research design. A research design is the more specific plan for how a study is organized. For example, the terms true experiment and randomized controlled trial both refer to the research design, but that may not be evident from the terminology. A randomized controlled trial is a research design that compares at least two groups, with participants randomly assigned to a group. However, there are numerous variations within this design. In one randomized controlled trial, there may be a comparison of an intervention group with a control group that does not receive treatment; another may compare two different intervention approaches; and still another may include three groups, with a comparison of two interventions with a control group that does not receive treatment. Or perhaps it has been determined that it is unethical to withhold treatment, so the control group receives "treatment as usual," which can mean many different things. This chapter provides an overview of the major types of research so that you can begin to make these distinctions. It also introduces you to some important research concepts, including hypothesis testing and variables. As you progress through your research course, you will be able to distinguish several research designs commonly used in rehabilitation research.

TYPES OF RESEARCH
In Chapter 1 you learned that there are different types of research questions, and that different types of research methods are used to answer different types of questions. As an evidence-based practitioner, it is important to know what type of research will best address a particular practice question. Different types of research include:
• Experimental
• Nonexperimental
• Quantitative
• Qualitative
• Cross-sectional
• Longitudinal
• Basic
• Applied

Experimental Research
The terms research and experiment are often used interchangeably, yet experimental research is just one type of research—a very important one for evidence-based practitioners. Experimental research examines cause-and-effect relationships. In evidence-based practice, clinicians want to know whether or not an intervention resulted in a positive outcome for the client. These studies are also described as efficacy studies or intervention studies; in other words, they answer the research question, "Was the intervention effective?" For a causal relationship to be inferred, a specific methodology must be followed. In a typical experiment, participants are assigned to one of two groups, and the groups are manipulated. One group receives the intervention of interest. The second group is the control group; participants in the control group may or may not receive the intervention. Control is the operative word here. By controlling for alternate explanations, the researcher can infer that differences between the intervention and control group are due to (or caused by) the intervention. The strengths and weaknesses of different control conditions are discussed in Chapters 4 and 5. Experimental studies may also be referred to as difference studies or group comparison studies because the researcher seeks to determine if there is a difference between two or more groups (e.g., an efficacy study determines whether there is a difference between the group that received the intervention of interest and the group that did not). From the Evidence 3-1 is an example of experimental research answering an efficacy question.
In this study, an intervention group receiving constraint-induced movement therapy (CIMT) is compared with a control group receiving traditional rehabilitation. The study is designed to examine the efficacy of CIMT for improving reaching, grasping, and functional movement. The researchers conclude that CIMT caused the improvements that were discovered. At first glance, this abstract from PubMed may be intimidating, but with a little practice you can quickly recognize an experimental study when you see one. There are several reasons why the abstract in From the Evidence 3-1 suggests an experimental design:
• It is an intervention study examining efficacy.
• A comparison is being made between groups.
• The purpose of the study is to determine if an intervention caused an improvement.
It is always important to read research with a critical eye, which includes being careful about accepting the author's language. In this case, because the study used an experimental design, the author is indeed accurate in inferring that the intervention caused the improvement. The CIMT study is not identified as an experimental study, but as a randomized controlled trial. A randomized controlled trial is the same as a true experiment. In a true experiment, as in a randomized controlled trial, at least two groups are compared, the groups are manipulated, and participants are randomly assigned to a group. In experimental research there are true experiments and quasi-experiments. Quasi-experimental studies are also designed to answer cause-and-effect questions. The major difference is that in a true experiment, participants are randomly assigned to groups, whereas in a quasi-experimental study participants are not randomly assigned. The lack of random assignment results in a decrease in the researcher's ability to draw cause-and-effect conclusions. The limitations of quasi-experimental studies are discussed in greater detail in Chapters 5 and 6.
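The logic of random assignment is easy to demonstrate in a few lines of code. The sketch below is purely illustrative; the participant labels, the function name, and the fixed seed are invented for this example and are not drawn from any study discussed here:

```python
import random

def randomize(participants, seed=None):
    """Randomly assign participants to two equal-sized groups.

    Shuffling before splitting gives every participant the same chance
    of landing in either group, so pre-existing differences tend to
    balance out across the groups.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}

# Hypothetical roster of 32 participants (the same sample size as the
# CIMT trial described above, but the IDs here are invented)
groups = randomize([f"P{i:02d}" for i in range(1, 33)], seed=2017)
```

Because assignment depends only on the shuffle, neither the researcher nor the participants influence who ends up in which group; that is what licenses the causal language used in randomized controlled trials.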
Pragmatic and ethical issues may arise in the decision to use a quasi-experimental study. For example, studies of school-aged children frequently compare two classrooms. Perhaps a therapist is interested in studying the efficacy of a new handwriting program. Because it is difficult to randomly assign children in the same classroom to two different approaches, one classroom receives the new handwriting program, and the second classroom acts as a control group and receives traditional instruction. In health-care research, the term nonrandomized controlled trial is often used; it can be equated with quasi-experiment. In this research design, two groups are compared, but the participants are not randomly assigned to groups. Often the groups are pre-existing (e.g., clients in two different clinics, classrooms, or housing units). In other nonrandomized controlled trials, groups are comprised of individuals who volunteer to receive the new intervention; they are then compared with individuals who receive an existing intervention. With all other factors being equal, a randomized controlled trial (as compared with a nonrandomized trial) provides stronger evidence that the intervention caused the outcome. This means that, although both types of experiments compare groups to determine cause-and-effect relationships, you can have greater confidence in the results of a randomized controlled trial. The primary limitation of a quasi-experimental study is that the lack of randomization may bias the selection process, such that there is a difference between the two groups at the outset. Using the two-classroom example, one teacher may be more effective and/or more experienced than the other teacher, so the outcomes may reflect the teachers' abilities rather than the intervention. In other quasi-experimental studies, volunteers for an intervention are identified first, and a control group is identified later.
Individuals who volunteer for an intervention are likely to be different from individuals who volunteer to participate as controls. For example, the intervention volunteers may be more motivated to change, have more self-efficacy or belief that they can change, or generally show more initiative toward following through with an extensive intervention process. These personal characteristics may contribute to improvements outside of the effects of the intervention. When a single group is compared before and after an intervention, the study is referred to as pre-experimental research; in health-care research, this is typically referred to as a pretest-posttest design. In this case, there is only one group, nothing is manipulated (i.e., all participants receive the same intervention), and there is no random assignment. Therefore, it stands to reason that this design is much less powerful than a design with a control or comparison group. The different types of experimental designs are discussed in greater detail in Chapters 5 and 6. In general, experimental research is intended to answer practice questions that involve causation. It is one of the most important types of research in evidence-based practice because it results in evidence-based decisions about the efficacy of interventions.

Nonexperimental Research
In contrast to experimental research, nonexperimental research cannot determine causal relationships;

FROM THE EVIDENCE 3-1 An Experimental Study That Answers an Efficacy Question
Lin, K.C., Wu, C.Y., Wei, T.H., Gung, C., Lee, C.Y., & Liu, J.S. (2007). Effects of modified constraint-induced movement therapy on reach-to-grasp movements and functional performance after chronic stroke: A randomized controlled study. Clinical Rehabilitation, 21(12), 1075–1086. doi:10.1177/0269215507079843.
Note A: Hint: "Effects of" suggests an efficacy study.
Note B: There are two groups, indicating that a comparison is being made.
Clin Rehabil. 2007 Dec;21(12):1075-86. Effects of modified constraint-induced movement therapy on reach-to-grasp movements and functional performance after chronic stroke: a randomized controlled study. Lin KC, Wu CY, Wei TH, Lee CY, Liu JS. Source: School of Occupational Therapy, College of Medicine, National Taiwan University and Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Taipei, Taiwan.
Abstract
OBJECTIVE: To evaluate changes in (1) motor control characteristics of the hemiparetic hand during the performance of a functional reach-to-grasp task and (2) functional performance of daily activities in patients with stroke treated with modified constraint-induced movement therapy.
DESIGN: Two-group randomized controlled trial with pretreatment and posttreatment measures.
SETTING: Rehabilitation clinics.
SUBJECTS: Thirty-two chronic stroke patients (21 men, 11 women; mean age = 57.9 years, range = 43-81 years) 13-26 months (mean 16.3 months) after onset of a first-ever cerebrovascular accident.
INTERVENTION: Thirty-two patients were randomized to receive modified constraint-induced movement therapy (restraint of the unaffected limb combined with intensive training of the affected limb) or traditional rehabilitation for three weeks.
MAIN MEASURES: Kinematic analysis was used to assess motor control characteristics as patients reached to grasp a beverage can. Functional outcomes were evaluated using the Motor Activity Log and Functional Independence Measure.
RESULTS: There were moderate and significant effects of modified constraint-induced movement therapy on some aspects of motor control of reach-to-grasp and on functional ability.
The modified constraint-induced movement therapy group preplanned reaching and grasping (P = 0.018) more efficiently and depended more on the feedforward control of reaching (P = 0.046) than did the traditional rehabilitation group. The modified constraint-induced movement therapy group also showed significantly improved functional performance on the Motor Activity Log (P < 0.0001) and the Functional Independence Measure (P = 0.016).
CONCLUSIONS: In addition to improving functional use of the affected arm and daily functioning, modified constraint-induced movement therapy improved motor control strategy during goal-directed reaching, a possible mechanism for the improved movement performance of stroke patients undergoing this therapy.
Note C: The authors conclude that the constraint-induced therapy resulted in (caused) the improvements.
FTE 3-1 Question: This study is described as a group comparison between the CIMT group and the traditional rehabilitation group. What specifically about these two groups is being compared?

nevertheless, this type of research is essential for evidence-based practice. Not all research questions are causal in nature; nonexperimental research can answer descriptive and relationship questions. Many different methods can be used to collect data and information in nonexperimental research. Common approaches include surveys, observation of behavior, standardized measures, and existing data from medical records. Group comparison studies may be conducted to answer descriptive questions when comparing existing groups. For example, in health-care research, researchers are often interested in the differences between people with and without a particular condition. When making these comparisons, it is not possible to arbitrarily or randomly assign individuals to groups.
For example, when comparing people with and without schizophrenia, the researcher will recruit individuals with schizophrenia and compare the schizophrenia group with a similar group of individuals without schizophrenia. The research may elucidate how these groups differ in terms of particular cognitive abilities, employment rates, and quality of life. In addition, there is no manipulation. For this reason, these studies are called observational studies. In an observational study, the naturally occurring circumstances are studied, as opposed to assigning individuals to an intervention or research condition. From the Evidence 3-2 provides an example of a group comparison study that is intended to answer a descriptive question. No intervention is included in this study; instead, the researchers compare people with and without Parkinson's disease. More specifically, they examine whether there is a difference in gait between the two groups when participants are asked to walk while performing a cognitive task. Nonexperimental studies can describe people with a particular diagnosis, identify the incidence of a condition, or predict an outcome. For example, in a nonexperimental study of youth with autism, 55% were employed, and fewer than 35% had attended college six years after graduating from high school (Shattuck et al, 2012). This study provides helpful information to the practitioner because it identifies significant areas of need that can be addressed by health-care providers. In another example, Metta et al (2011) found that individuals with Parkinson's disease were more likely to experience fatigue in the later stages of the disorder, or when they were also experiencing depression, anxiety, or difficulty sleeping. In this study, the cause of the fatigue is unclear, but health-care practitioners can use the information to predict when fatigue may need to be addressed.
Dalvand et al (2012) found a strong relationship between gross motor function and intellectual function in children with cerebral palsy. It is improbable that the gross motor impairments cause the intellectual impairments; more likely, the causal factor affecting both intellect and motor skills is related to brain functioning. Yet knowing that this relationship exists has implications for understanding a client who has cerebral palsy and developing appropriate intervention plans. Some nonexperimental studies examine relationships. These studies, which are also known as correlational studies, seek to determine whether a relationship exists between two constructs and, if so, assess the strength of that relationship. For example, rehabilitation studies often examine the relationship between a particular impairment, such as cognition or muscle strength, and activities of daily living. In correlational studies, the third variable problem always presents a potential alternative explanation. In other words, the two constructs may indeed be related, but a third variable could account for that relationship or influence the relationship. The relationship between cold weather and flu season is a case in point. It is often stated that getting a chill or spending too much time outdoors in cold weather will cause the flu. However, scientifically it is known that viruses are the culprit. Why the connection? The third (or fourth) variable probably lies in one's behavior during the winter; that is, instead of being outside, people spend extra time inside, around and exposed to other sick people, and it is that exposure that results in catching the flu. Figure 3-1 illustrates the third variable problem in the context of developing the flu. Other well-known relationships that are probably more complex than a simple one-to-one cause and effect include video games and violent behavior, and lower weight and eating breakfast.
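The third variable problem can be made concrete with a small simulation. In this hypothetical sketch (the numbers and the `pearson_r` helper are invented for illustration), cold weather never causes the flu directly; it only drives time spent indoors, which drives exposure. Yet cold and flu still end up strongly correlated:

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)
cold = [rng.uniform(0, 10) for _ in range(500)]   # how cold the weather is

# Third variable: colder weather pushes people to spend more time indoors
indoors = [c + rng.uniform(0, 2) for c in cold]

# Flu exposure depends only on indoor crowding, never on cold itself
flu = [i + rng.uniform(0, 2) for i in indoors]

# Strong positive correlation despite no direct causal link
r = pearson_r(cold, flu)
```

A practitioner who saw only the `cold` and `flu` columns could easily, and wrongly, conclude that cold causes flu; the simulation shows how a confounder manufactures exactly that pattern.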
Still, in the media, everyday conversation, and even in the scientific literature, correlations are sometimes presented in terms of causation. As an evidence-based practitioner, you know that correlation does not mean causation. In nonexperimental research, one cannot draw conclusions related to causation; however, these studies answer questions that cannot be answered with experimental research. For example, seeking information about incidence or prevalence, identifying coexisting conditions, and predicting outcomes are practice issues for which nonexperimental research can provide the answers. Table 3-1 outlines the differences between experimental and nonexperimental research and some of the more specific designs within these types. (See Chapter 8 for more details about nonexperimental research.)

Quantitative Research
Research that uses statistics and describes outcomes in terms of numbers is quantitative research; in fact, most research is quantitative in nature. Quantitative research is centered on testing a hypothesis. The researcher develops a hypothesis based on prior knowledge that informs an idea or question to be answered. Consequently, the hypothesis is related to the research question, and the study is designed to test the hypothesis; the data collected will either support or fail to support the hypothesis.

FROM THE EVIDENCE 3-2 A Nonexperimental Group Comparison Study That Answers a Descriptive Question
Plotnick, M., Giladi, N., & Hausdorff, J. M. (2009). Bilateral coordination of gait and Parkinson's disease: The effects of dual tasking. Journal of Neurology, Neurosurgery & Psychiatry, 80(3), 347–350. doi:10.1136/jnnp.2008.157362.
Note A: The question is not about the effectiveness of an intervention, but is descriptive in nature, asking if gait is affected in people with Parkinson’s disease when they are engaged in a cognitive task. Note B: Individuals with and without Parkinson’s disease are compared. Because of the condition, participants cannot be randomly assigned to groups. The etiology of gait disturbances in Parkinson's disease (PD) is not fully understood. Recently, it was shown that in patients with PD, bilateral coordination of gait is impaired and that walking while being simultaneously engaged in a cognitive task is detrimental to their gait. To assess whether cognitive function influences the bilateral coordination of gait in PD, this study quantified left-right stepping coordination using a phase coordination index (PCI) that evaluates both the variability and inaccuracy of the left-right stepping phase (phi) generation (where the ideal phi value between left and right stepping is 180 degrees). This report calculated PCI values from data obtained from force sensitive insoles embedded in subjects' shoes during 2 min of walking in a group of patients with PD (n = 21) and in an age matched control group (n = 13). All subjects walked under two walking conditions: usual walking and dual tasking (DT) (ie, cognitive loading) condition. For patients with PD, PCI values were significantly higher (ie, poorer coordination) during the DT walking condition compared with usual walking (p < 0.001). In contrast, DT did not significantly affect the PCI of the healthy controls (p = 0.29). PCI changes caused by DT were significantly correlated with changes in gait variability but not with changes in gait asymmetry that resulted from the DT condition. These changes were also associated with performance on a test of executive function. 
The present findings suggest that in patients with PD, cognitive resources are used in order to maintain consistent and accurate alternations in left-right stepping.
Note C: The findings suggest that Parkinson's disease requires individuals to use cognitive resources when walking and that the person is compromised when these cognitive resources are overloaded.
FTE 3-2 Question: Why must the researchers avoid drawing cause-and-effect conclusions from the results of a study comparing individuals with and without Parkinson's disease?

FIGURE 3-1 Cold and flu: An example of why practitioners need to be careful when interpreting relationships. [The figure contrasts two diagrams: "Exposure to cold → Develop the flu" versus the intervening pathway "Exposure to cold → Spend more time inside → Infected by others inside who have the flu → Develop the flu."]

TABLE 3-1 Experimental and Nonexperimental Research
Experimental research (efficacy questions):
• Randomized controlled trial: at least two groups; participants randomly assigned; groups are manipulated. Other term: true experiment.
• Nonrandomized controlled trial: at least two groups; participants are not randomly assigned; groups are manipulated. Other term: quasi-experimental study.
• Pretest/posttest: one group; all participants receive the same intervention or condition. Other term: pre-experiment.
Nonexperimental research (descriptive questions):
• Group comparison: two or more existing groups compared to identify differences on one or more characteristics. Other term: observational.
• Incidence/prevalence: the occurrence of one or more characteristics is calculated. Other term: observational.
Nonexperimental research (relationship questions):
• Correlation: the relationship between two or more constructs is calculated. Other term: observational.
• Predictive: multiple predictors are considered in terms of their impact on a particular outcome. Other term: observational.

Starting with a hypothesis strengthens the research design and is much more effective than collecting data on a topic and hoping to find something interesting in the data. It is important to emphasize that the results of a study can only support a hypothesis; the hypothesis is never proven. With replication and similar results, our confidence that a hypothesis is correct is bolstered. Similarly, a hypothesis cannot be disproven, but it may be discarded for lack of evidence to support it. Just as with research types, hypotheses are described using various classifications, such as directionality or support. A hypothesis can be either directional or nondirectional. A directional hypothesis indicates that the researcher has an assumption or belief about a particular outcome. A nondirectional hypothesis is exploratory, suggesting that the researcher does not have a prior notion about what the study results may be, but may assume that a difference or relationship exists. For example, in a correlational study a researcher may set out to examine the relationship between being bilingual and particular cognitive abilities such as memory, problem-solving, cognitive flexibility, and divided attention. The researcher may have a directional hypothesis that bilingualism is associated with greater cognitive flexibility and divided attention. In contrast, the researcher may go into the study expecting a relationship, but not speculating about the particular associations (a nondirectional hypothesis). Generally speaking, it is preferable from a scientific method perspective to begin a study with a directional hypothesis. A strong hypothesis is one that is clear and testable. When the researcher has an expected outcome, the study can be better designed to collect data that will test the hypothesis.
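The difference between a directional and a nondirectional hypothesis shows up directly in how a statistical test is run: a directional hypothesis is tested one-tailed, a nondirectional hypothesis two-tailed. The sketch below illustrates this with a simple permutation test; the scores, the function name, and the number of permutations are all invented for this example:

```python
import random

def permutation_test(a, b, n_iter=20000, seed=0):
    """Permutation test on the difference in group means.

    Returns (one_tailed_p, two_tailed_p). The one-tailed p-value tests
    the directional hypothesis "group a scores higher than group b";
    the two-tailed p-value tests the nondirectional hypothesis
    "the group means differ in either direction."
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    one = two = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b)
        one += diff >= observed
        two += abs(diff) >= abs(observed)
    return one / n_iter, two / n_iter

# Invented outcome scores for an intervention and a control group
intervention = [14, 15, 13, 16, 15, 14, 17, 15]
control = [12, 13, 12, 14, 13, 11, 13, 12]
p_one, p_two = permutation_test(intervention, control)
```

For the same data, the one-tailed p-value is never larger than the two-tailed one, which is one reason a clear directional hypothesis, stated before the data are collected, makes for a sharper test than hunting through the data afterward.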
A researcher who goes into a study blind may be described as going on a "fishing expedition"; he or she is likely to find something, but that something may be difficult to explain. In addition, findings gathered from a "shotgun approach," one in which the researcher hunts for interesting findings among the data, are more likely to represent a chance result that is difficult to replicate. In a hypothesis-driven study, statistical analysis is used to determine whether or not the hypothesis is supported. However, not everything can be quantified, and numbers tend to obscure the uniqueness and individuality of the participants. A drawback of quantitative research is the loss of the individual when data are combined in means, standard deviations, and other statistics.

Qualitative Research
Qualitative research is an important type of research for answering questions about meaning and experience. It provides a more personal and in-depth perspective of the person or situation being studied than quantitative research does. In that way, qualitative research is able to provide information that quantitative research cannot. Although qualitative research is nonexperimental research, it is included as a separate type in this text because it is so different. In fact, qualitative research operates from a unique paradigm, or way of thinking. Qualitative research utilizes inductive reasoning. Instead of beginning with a hypothesis and working down to determine if the evidence supports the hypothesis (deductive reasoning), qualitative research moves from the specific to the general (inductive reasoning). Very specific information is collected from interviews, observations, and the examination of documents or other artifacts, and the qualitative researcher looks for patterns in the data. These patterns are identified as themes and, in some qualitative research, a theory is developed out of the patterns that have been discovered.
In qualitative research, the focus is on the experience of the individual and, most importantly, on the individual's own perspective. Qualitative research looks for meaning in individual stories and experiences. Extensive data are collected on a few individuals (sometimes only one). It is important to gather extensive information so that the researcher has a thorough understanding of the phenomenon in question. There is an emphasis on discovery as opposed to confirmation. Once the data, which may include photographs or diagrams, are collected, the analysis identifies recurring themes within the data. Although the research methods vary, qualitative research articles utilize the same format as quantitative research, with introduction, methods, results, and discussion sections. Table 3-2 outlines the differences between quantitative and qualitative research. Qualitative research is a broad term that encompasses several different designs, such as ethnography, grounded theory, phenomenology, and participatory action research. These topics are described in greater detail in Chapter 9. An example of a qualitative study is provided in From the Evidence 3-3. This study looks at the meaning of multiple sclerosis from the perspective of the individual with the condition. In-depth information is collected on a relatively small number of individuals, and the results are presented as themes instead of being summarized in a numerical format. The abstract of this qualitative study, particularly the results section, looks very different from most quantitative research. Some of the unique features of qualitative research are identified in this study, which delves into the lived experience of people with multiple sclerosis. Although qualitative research is not intended to be generalizable to the larger population, this form of evidence can inform health-care practice.
For example, after reading this study, practitioners may become more sensitive to how the illness requires a reformulation of self-concept. It is important to distinguish qualitative research from descriptive survey research, which is quantitative and nonexperimental. A survey could be developed to answer questions about the emotional, social, and practical impact of multiple sclerosis. However, if numbers are used and combined among the participants (e.g., the percentage of respondents who report feeling depressed), the study becomes quantitative research. In qualitative research, open-ended questions are used, and participants are typically interviewed multiple times. Many other methods, such as focus groups, observations, and reviews of documents and artifacts, are incorporated in qualitative research to ensure that a deep understanding is reached. The results of qualitative research are presented narratively and can take the form of stories and/or themes. Sometimes researchers use a mixed-methods approach and combine qualitative and quantitative methods.

TABLE 3-2  Differences Between Quantitative and Qualitative Research

Purpose
  Quantitative: Tests theory and/or hypotheses; focus is on confirmation
  Qualitative: Builds theory and/or explores a phenomenon; focus is on discovery
Point of view
  Quantitative: Outsider; objective
  Qualitative: Insider; subjective
Reasoning
  Quantitative: Deductive
  Qualitative: Inductive
Data collection
  Quantitative: Use of quantifiable, typically standardized measures with many participants
  Qualitative: Interviews and observations of a few individuals in their natural environments
Data analysis
  Quantitative: Descriptive and inferential statistics
  Qualitative: Identification of themes using text or pictures
Evaluating the rigor of the research
  Quantitative: Reliability and validity: Are the data accurate and consistent?
  Qualitative: Trustworthiness: Are the data believable?
Cross-Sectional and Longitudinal Research

Research can also be categorized by the time period over which data are collected. In cross-sectional research, data are collected at a single point in time, whereas longitudinal research requires that data be collected over at least two time points and typically covers an extended period, such as several years or decades. Cross-sectional studies use nonexperimental methods; they are observational in nature, meaning that the researcher does not manipulate a situation (e.g., provide an intervention). Descriptive and correlational studies frequently use cross-sectional research. Longitudinal research is intended to examine the effect of time (such as development, aging, or recovery) on some phenomenon (such as cognition, independent living, or language). Most longitudinal studies examine naturalistic changes, making them observational. However, occasionally intervention studies examine the impact of an intervention over an extended time period. Although there is no time frame criterion for determining whether a study is longitudinal, a simple pretest-posttest intervention study is not considered longitudinal. Therefore, most intervention studies are neither cross-sectional nor longitudinal. Cross-sectional studies often compare different groups of individuals at the same point in time, whereas longitudinal studies compare the same people over several time points. One advantage of cross-sectional studies is that they are efficient because all of the data are collected at once, or at least over a short period of time. The results of cross-sectional studies are available quickly. Because longitudinal research examines changes over time, the studies are more time consuming, and it is more likely that researchers will lose participants, compromising follow-up. The results of longitudinal studies come more slowly than those of cross-sectional studies.
Imagine studying the same group of individuals for 30 years. Although the results will be extremely valuable, such research requires a lot of patience from researchers and participants. When cross-sectional and longitudinal methods are used to answer the same questions about the impact of time on some phenomenon, longitudinal findings generally have greater credibility. In a cross-sectional study, individuals of different ages, developmental stages, or points in recovery are compared, whereas in a longitudinal study the same individuals are followed as they get older, develop, or recover, making it possible to determine whether people actually change over time. Consider a study that examines cooking practices in the home. A cross-sectional study may find that older adults prepare more food at home than younger adults. However, a longitudinal study that has followed individuals over 30 years finds that older adults actually cook at home less than they did when they were younger and had children at home. A longitudinal study is better at providing evidence as to how people change over time, whereas a cross-sectional study identifies differences in people at one point in time.

FROM THE EVIDENCE 3-3  A Qualitative Study Examining the Lived Experience of Multiple Sclerosis

Mozo-Dutton, L., Simpson, J., & Boot, J. (2012). MS and me: Exploring the impact of multiple sclerosis on perceptions of self. Disability and Rehabilitation, 34(14), 1208–1217. [Epub December 13, 2011]. doi:10.3109/09638288.2011.638032

Note A: Twelve participants are included in the study, which is actually a fairly large number for qualitative research.
Note B: The phenomenon of self-perception is analyzed thematically with words.
Note C: The researchers are "exploring" rather than determining, confirming, or proving.

Abstract
Purpose: The aim of this qualitative study was to explore the impact of multiple sclerosis (MS) on perceptions of self as well as the emotional, social, and practical implications of any self-reported changes.
Method: Twelve participants were interviewed, and interpretative phenomenological analysis was used to analyze the data. Participants were recruited from an MS hospital clinic in the north-west of England.
Results: Four themes were identified, although for reasons of space and novelty three are discussed: (i) "my body didn't belong to me": the changing relationship to body; (ii) "I miss the way I feel about myself": the changing relationship to self; and (iii) "let's just try and live with it": incorporating yet separating MS from self.
Conclusions: The onset of MS was seen to impact upon self, yet impact did not necessarily equate with a loss of self but rather a changed self. Self-related changes did, however, carry the potential to impact negatively upon a person's mood and psychological functioning; consequently, clinicians are encouraged to consider issues relating to self as standard.

FTE 3-3 Question: Compare the abstract of this qualitative study to the abstracts in FTE 3-1 and FTE 3-2. Beyond the difference in reporting thematically as opposed to numerically, what other differences do you notice in this qualitative study?

EVIDENCE IN THE REAL WORLD  Cross-Sectional and Longitudinal Research in Older Drivers

In one example, a cross-sectional study suggested that older drivers (above age 65) are more likely to make safety errors than younger drivers (ages 40 to 64) (Askan et al., 2013). However, the cause-and-effect relationship was ambiguous. Does age alone affect driving, or do the conditions associated with aging impact driving? Longitudinal studies can provide more information about the cause of the difficulty. Two studies, one that examined driver safety over 2 years (Askan et al., 2012) and one that looked at driving cessation over 10 years (Edwards, Bart, O'Connor, & Cissell, 2010), followed a group of older adults over time. Both studies concluded that diminished cognitive functioning, particularly speed of processing and visual attention, was associated with driving problems. Taken together, these studies suggest that it is important to assess cognitive function in older adults when making decisions about driving, rather than relying on age as a determinant of fitness to drive. Gaining this information was possible only because the same individuals were followed over time.

Basic and Applied Research

The concepts of basic and applied research are better conceptualized as a continuum rather than as absolute and discrete categories. What appears to be basic research to a clinician may be categorized as applied research by a scientist working in a laboratory. Generally speaking, basic research is used to investigate fundamental questions that are directed at better understanding individual concepts. Using brain imaging techniques to determine the purposes of different brain regions, identifying environmental conditions that contribute to a stress reaction, and examining the role of proteins in antibody responses are examples of research on the basic end of the continuum. Applied research, in contrast, has direct application to health-care practices. Studies that determine the efficacy of a fall prevention program, describe the prevalence of foot drop in multiple sclerosis, and ascertain the strength of the relationship between diet and ADHD symptoms would be considered examples of research on the applied end of the continuum. One might ask, "If applied research has a clear application and basic research does not, then why are clinicians interested in basic research at all?" One reason is that we never know what the implications of basic research might be. Basic research may lead to unintended discoveries.
However, even scientists conducting basic research who are far removed from clinical practice typically have some real-world application in mind. The information obtained from a basic scientist who studies cellular differences in aging brains can lead to both an understanding of and treatment for Alzheimer's disease; nevertheless, additional research will have to take place to make the link. In contrast, an applied researcher who studies caregiver training for people with dementia will be able to make direct recommendations for practice. Both basic and applied research are important endeavors and contribute to rehabilitation practice. When the two come together, the result is translational research. Translational research takes place when findings from the laboratory are used to generate clinical research. The newest institute of the National Institutes of Health (NIH) is the National Center for Advancing Translational Sciences, which has a mission to promote more translational research (National Institutes of Health, n.d.). Consider the example of neuroplasticity and constraint-induced movement therapy (CIMT), as illustrated in the continuum shown in Figure 3-2. The early researchers, who used basic cell-based experiments, likely did not predict that CIMT would be an application, yet the applied research was based on earlier cellular and then animal studies (Gleese & Cole, 1949). Basic animal research indicates that the brain can adapt to injury and that injury can lead to non-use (Taub, 1980). Early developers of CIMT theorized that, by forcing individuals to use the affected limb, they would prevent learned non-use from occurring and promote brain reorganization (Buonomano & Merzenich, 1998). Research on people with stroke provides evidence in support of these theories.
One applied study indicates that CIMT is an effective intervention for improving function of the affected limb (Wolf et al., 2006), and another applied study using brain imaging indicates changes in the brain after CIMT intervention (Laible et al., 2012).

EXERCISE 3-1  Identifying Types of Studies (LO1)

QUESTIONS
Classify the studies described in this exercise as experimental or nonexperimental, quantitative or qualitative, cross-sectional or longitudinal, and basic or applied.

1. Hu, G., Bidel, S., Jousilahti, P., Antikainen, R., & Tuomilehto, J. (2007). Coffee and tea consumption and the risk of Parkinson's disease. Movement Disorders, 15, 2242-2248.

Several prospective studies have assessed the association between coffee consumption and Parkinson's disease (PD) risk, but the results are inconsistent. We examined the association of coffee and tea consumption with the risk of incident PD among 29,335 Finnish subjects aged 25 to 74 years without a history of PD at baseline. During a mean follow-up of 12.9 years, 102 men and 98 women developed incident PD. The multivariate-adjusted (age, body mass index, systolic blood pressure, total cholesterol, education, leisure-time physical activity, smoking, alcohol and tea consumption, and history of diabetes) hazard ratios (HRs) of PD associated with the amount of coffee consumed daily (0, 1-4, and ≥5 cups) were 1.00, 0.55, and 0.41 (P for trend = 0.063) in men; 1.00, 0.50, and 0.39 (P for trend = 0.073) in women; and 1.00, 0.53, and 0.40 (P for trend = 0.005) in men and women combined (adjusted also for sex). In both sexes combined, the multivariate-adjusted HR of PD for subjects drinking ≥3 cups of tea daily compared with tea nondrinkers was 0.41 (95% CI 0.20-0.83). These results suggest that coffee drinking is associated with a lower risk of PD. More tea drinking is associated with a lower risk of PD.

2. Troche, M. S., Okun, M. S., Rosenbek, J. C., Musson, N., Fernandez, H. H., Rodriguez, R., . . . Sapienza, C. M. (2010). Aspiration and swallowing in Parkinson disease and rehabilitation with EMST: A randomized trial. Neurology, 23, 1912-1919.

Objective: Dysphagia is the main cause of aspiration pneumonia and death in Parkinson disease (PD), with no established restorative behavioral treatment to date. Reduced swallow safety may be related to decreased elevation and excursion of the hyolaryngeal complex. Increased submental muscle force generation has been associated with expiratory muscle strength training (EMST), and subsequent increases in hyolaryngeal complex movement provide a strong rationale for its use as a dysphagia treatment. The current study objective was to test the treatment outcome of a 4-week device-driven EMST program on swallow safety and define the physiologic mechanisms through measures of swallow timing and hyoid displacement.
Methods: This was a randomized, blinded, sham-controlled EMST trial performed at an academic center. Sixty participants with PD completed EMST for 4 weeks, 5 days per week, for 20 minutes per day, using a calibrated or sham handheld device. Measures of swallow function, including judgments of swallow safety (penetration-aspiration [PA] scale scores), swallow timing, and hyoid movement, were made from videofluoroscopic images.
Results: No pretreatment group differences existed. The active treatment (EMST) group demonstrated improved swallow safety compared to the sham group, as evidenced by improved PA scores. The EMST group demonstrated improvement of hyolaryngeal function during swallowing, findings not evident for the sham group.
Conclusions: EMST may be a restorative treatment for dysphagia in those with PD. The mechanism may be explained by improved hyolaryngeal complex movement.
Classification of Evidence: This intervention study provides Class I evidence that swallow safety as defined by PA score improved post-EMST.

3. Todd, D., Simpson, J., & Murray, C. (2010). An interpretative phenomenological analysis of delusions in people with Parkinson's disease. Disability and Rehabilitation, 32, 1291-1299.

Purpose: The aim of this study was to explore what delusional experiences mean for people with Parkinson's disease (PD) and to examine how psychosocial factors contribute to the development and maintenance of delusional beliefs.
Method: Eight participants were interviewed, and interpretative phenomenological analysis was used to identify themes within their accounts. Participants were either recruited from a hospital-based outpatient movement disorder clinic or from a PD support group in the northwest of England.
Results: Four themes emerged from the analysis: (1) "I got very frightened": the emotional experience associated with delusions; (2) "Why the hell's that happening?": sense of uncertainty and of losing control; (3) "I feel like I'm disintegrating": loss of identity and sense of self; (4) "I've just tried to make the best of things": acceptance and adjustment to the experience of delusions. These interconnected themes in participants' accounts of delusional beliefs were reflected in their descriptions of living with, and adjusting to, PD.
Conclusions: The results of this study add to the evidence base indicating the need for urgent examination of psychological alternatives to conventional, medication-based approaches to alleviating distress caused by delusions in people with PD.

4. Van der Schuit, M., Peeters, M., Segers, E., van Balkom, H., & Verhoeven, L. (2009). Home literacy environment of pre-school children with intellectual disabilities. Journal of Intellectual Disability Research, 53, 1024-1037.

Background: For preschool children, the home literacy environment (HLE) plays an important role in the development of language and literacy skills.
As little is known about the HLE of children with intellectual disabilities (IDs), the aim of the present study was to investigate the HLE of children with IDs in comparison to children without disabilities.
Method: Parent questionnaires concerning aspects of the HLE were used to investigate differences between 48 children with IDs, 107 children without disabilities of the same chronological age, and 36 children without disabilities of the same mental age (MA). Furthermore, for the children with IDs, correlations were computed between aspects of the HLE and children's nonverbal intelligence, speech intelligibility, language, and early literacy skills.
Results and Conclusions: From the results of the multivariate analyses of variance, it could be concluded that the HLE of children with IDs differed from that of children in the chronological age group on almost all aspects. When compared with children in the MA group, differences in the HLE remained. However, differences mainly concerned child-initiated activities and not parent-initiated activities. Correlation analyses showed that children's activities with literacy materials were positively related with MA, productive syntax and vocabulary age, and book orientation skills. Also, children's involvement during storybook reading was related with their MA, receptive language age, productive syntax and vocabulary age, book orientation, and rapid naming of pictures. The amount of literacy materials parents provided was related to a higher productive syntax age and level of book orientation of the children. Parent play activities were also positively related to children's speech intelligibility. The cognitive disabilities of the children were the main cause of the differences found in the HLE between children with IDs and children without disabilities. Parents also adapt their level to the developmental level of their child, which may not always be the most stimulating for the children.

FIGURE 3-2  An example of translational research: a continuum from basic to applied research in the area of stroke intervention.
- Rat and primate studies indicate that brain components reorganize to recover function after brain injury (Gleese & Cole, 1949)
- Research with monkeys leads to the concept of learned non-use (Taub, 1980)
- The somatosensory cortex of the brain is modifiable and can be mapped as a marker of recovery (Buonomano & Merzenich, 1998)
- The EXCITE RCT of 222 participants finds constraint-induced movement therapy (CIMT) effective in improving upper extremity function (Wolf et al., 2006)
- Changes in hand function from CIMT are related to functional MRI changes in activation in the primary sensory cortex in individuals with CVA (Laible et al., 2012)

HYPOTHESIS TESTING: TYPE I AND TYPE II ERRORS

When conducting a quantitative study, the researcher typically decides to accept or reject the hypothesis based on the p value obtained from the statistical analysis. If p is less than or equal to 0.05, the hypothesis is accepted. A p value below 0.05 means that, from a statistical point of view, there is less than a 5% probability that a result at least this large would occur by chance alone. When p is greater than 0.05, the hypothesis is rejected. How do you know that the conclusion that is reached is accurate? Well, you don't. It is possible that the results are misleading and that the statistical conclusion is actually incorrect. Statistical conclusion validity is discussed in greater detail in Chapter 4. Mistakes that occur when interpreting the results of a study fall into two categories: Type I and Type II errors. A Type I error occurs when the hypothesis is accepted, yet the hypothesis is actually false. This error might occur because of chance. Although by convention we generally accept the research hypothesis when p ≤ 0.05, there is still a 5% chance that the hypothesis is incorrect.
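The two decision errors named in this section's title can be made concrete with a short simulation; the population values below are invented for illustration and do not come from any real study. When two groups are drawn from the same population, roughly 5% of studies still find a "significant" difference (Type I errors); when a real but modest difference exists and the sample is small, most studies miss it (Type II errors):

```python
import random
import statistics

random.seed(7)

T_CUTOFF = 1.96  # approximate two-tailed 5% critical value for large samples

def t_statistic(a, b):
    # Welch's t statistic for two independent samples
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

def rejection_rate(mean_a, mean_b, n_per_group, n_sims=1000):
    # Proportion of simulated studies declaring a "significant" difference
    significant = 0
    for _ in range(n_sims):
        a = [random.gauss(mean_a, 15) for _ in range(n_per_group)]
        b = [random.gauss(mean_b, 15) for _ in range(n_per_group)]
        if abs(t_statistic(a, b)) > T_CUTOFF:
            significant += 1
    return significant / n_sims

# Type I error: both groups come from the SAME population, so every
# "significant" result is a false positive; the rate sits close to 0.05.
type_i_rate = rejection_rate(100, 100, n_per_group=50)

# Type II error: a real 6-point difference exists, but with only 15
# participants per group most simulated studies fail to detect it.
type_ii_rate = 1 - rejection_rate(100, 106, n_per_group=15)

# The same real difference studied with 150 per group is usually detected.
power_large_n = rejection_rate(100, 106, n_per_group=150)

print(type_i_rate, type_ii_rate, power_large_n)
```

The last two lines illustrate the point made in this section: the risk of a Type II error is driven largely by sample size, so the identical real difference that small studies usually miss is reliably detected once the groups are large enough.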
A Type II error occurs when the hypothesis is rejected, yet the hypothesis is true. Perhaps the most common reason for a Type II error is a sample size that is too small: The smaller the sample, the more difficult it is to detect a difference between two groups. Therefore, a study with too few participants will often draw the incorrect conclusion that an intervention was ineffective. Table 3-3 illustrates the decision-making process associated with hypothesis testing.

TABLE 3-3  Hypothesis Testing

                        Hypothesis Is True    Hypothesis Is False
Hypothesis accepted     Correct decision      Type I error
Hypothesis rejected     Type II error         Correct decision

EXERCISE 3-2  Matching Research Questions and Categories of Research (LO2)

QUESTIONS
Using these questions from Chapter 1, indicate whether they would best be answered using qualitative or quantitative research. Then consider whether the type of research will most likely be experimental or nonexperimental.
1. For wheelchair users, what is the best cushion to prevent pressure sores?
2. What are the gender differences in sexual satisfaction issues for individuals with spinal cord injury?
3. How do athletes deal with career-ending injuries?
4. What childhood conditions are related to stuttering in children?

VARIABLES

Variables are characteristics of people, activities, situations, or environments that are identified and/or measured in a study and have more than one value. As the name implies, a variable is something that varies. For example, if a research sample is composed of children with autism, autism is not a variable. However, if you compare girls and boys with autism, gender is a variable. As in this example, some variables are categorical; in research, categorical variables may be assigned a number and compared. Other examples of categorical variables are group assignment (such as control and intervention groups), race, and geographical region.
Continuous variables are ones in which the numbers have meaning in relation to one another; that is, a higher number means there is more of something. Using the autism example, severity of autism would be a continuous variable, as would age and grade level. In research there are different types of variables, and familiarity with them is important for understanding research designs and the choice of statistics. Four important types of variables in research are independent, dependent, control, and extraneous.

Independent Variables

Independent variables are the variables that are manipulated or compared in a study. In intervention research, the independent variable is often identified as the intervention being studied. What varies is whether individuals are assigned to the intervention or control (no-intervention) group. In this example, the independent variable has two levels: intervention and control. Of course, a study can involve more than two groups: A researcher can compare two different types of interventions and then use a third group that is a no-treatment control. In this case the independent variable has three levels. The levels of an independent variable are the number of categories or groups that make up the variable. You can also think of the independent variable as the variable by which a comparison is made between two or more groups. In some cases the independent variable is not manipulated; a comparison is simply made. For example, if you compare boys and girls, or people with and without autism, these independent variables are not manipulated. Researchers often consider more than one independent variable in a study.
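When a study includes more than one independent variable, the combinations of their levels define the cells of a factorial design. A minimal sketch (the group labels are hypothetical, chosen only for illustration) makes the counting concrete:

```python
from itertools import product

# Hypothetical 3 x 2 factorial design: two independent variables, one with
# three levels and one with two levels (labels are illustrative only).
intervention = ["social skills", "play therapy", "no treatment"]  # 3 levels
gender = ["boys", "girls"]                                        # 2 levels

# Every combination of levels forms one cell of the factorial design.
cells = list(product(intervention, gender))
for cell in cells:
    print(cell)

print(len(cells))  # 3 levels x 2 levels = 6 cells
```

Multiplying the number of levels of each independent variable gives the number of cells, which is exactly what the shorthand "3 x 2 design" communicates.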
An intervention study of children with autism could compare a social skills intervention group, a play therapy group, and a no-treatment control group, as well as determine if there are differences in how boys and girls respond to the interventions. This study would be identified as a 3 × 2 design. There are two independent variables (intervention type and gender). One independent variable (intervention type) has three levels (social skills intervention, play therapy, and no treatment), and the other independent variable (gender) has two levels (boys and girls). When more than one independent variable is included in a study, the study is described as a factorial design. In a factorial design, the interaction or impact of both independent variables can be examined simultaneously. Factorial designs are discussed in more detail in Chapter 6.

Dependent Variables

Dependent variables are observed and, in the case of an experimental study, are intended to measure the result of the manipulation. Returning to the preceding example, suppose that the researcher is interested in the effect of the intervention on the communication skills of the participants. Communication skills would be the dependent variable. In an intervention study, the dependent variable is what is measured and may also be referred to as the outcome or outcome variable. In a weight loss study, one dependent variable would likely be weight. In a descriptive group comparison study, the dependent variable is the characteristic(s) measured for differences between the groups. For example, individuals with and without obesity may be compared to identify differences in sleeping patterns; in this case, sleeping patterns would be the dependent variable.

Control Variables

Control variables are those variables that remain constant. These variables could potentially affect the outcome of a study, but they are controlled by the design of the study or the statistical procedures used.
Again using the preceding example, age may be a variable that could affect social skills, because you would expect that older children would have better social skills than younger children. This variable could be controlled in a number of ways: all children in the study could be the same age, the ages of children could be the same across the three groups, or age could be controlled statistically through a procedure such as analysis of covariance. Analysis of covariance is explained in greater detail in Chapter 4. When groups are compared, it is important that they be as similar as possible in all factors other than the independent variable(s) of interest. The more control that is in place, the more confident you can be that the independent variable caused the change in the dependent variable. Methods used to promote similarity across groups include random assignment and matching. When groups are not similar and a variable is not controlled (for example, when children in one group are older than the children in another group), the study may be confounded. This means that the difference in the uncontrolled variable (in this example, age) could account for any differences found between the groups on the dependent variable. Again, older children would be expected to have better social skills than younger children.

Extraneous Variables

It is impossible to control everything, and researchers find it difficult, if not impossible, to imagine all of the potential variables that can influence the outcome of a study. However, sometimes extraneous variables are tracked and later examined to determine their influence. Using the autism study as an example, the number of siblings could influence the outcome, because having siblings in the household could support the development of social skills. Therefore, the researcher may keep track of the number of siblings for each participant and then determine if the number of siblings affected the outcome.
The researcher would report this influence if it did indeed exist, and future research might then take it into account and try to control for this variable. From the Evidence 3-4 illustrates the different types of variables found in an experimental study investigating the efficacy of a psychoeducational intervention for people with schizophrenia. This study compares a psychoeducational intervention group with a control group (independent variable) on insight and willingness to engage in treatment (dependent variables). The researcher suggests that the short length of treatment may have affected the findings (a possible extraneous variable).

FROM THE EVIDENCE 3-4  Abstract of an Experimental Study Examining an Intervention for People With Schizophrenia to Illustrate Types of Variables

Medalia, A., Saperstein, A., Choi, K. H., & Choi, J. (2012). The efficacy of a brief psycho-educational intervention to improve awareness of cognitive dysfunction in schizophrenia. Psychiatry Research, 199(3), 164–168. doi:10.1016/j.psychres.2012.04.042

Note A: The two conditions (psychoeducational intervention and control) are manipulated and therefore make up the independent variable.
Note B: Participants are the same at baseline on several variables. Because they are the same, these variables are controlled.

Abstract
People with schizophrenia have neuro-cognitive deficits that are associated with poor functional outcome, yet their awareness of their cognitive deficiencies is variable. As new treatments for cognition are developed, it will be important that patients are receptive to the need for more therapy.
Since insight into symptoms has been associated with treatment compliance, it may be of value to provide psycho-education to improve understanding about cognition in schizophrenia. We report a randomized controlled trial that enrolled 80 subjects in either a brief psycho-education intervention about cognition, or a control condition. Subjects in the two conditions did not differ at baseline in insight or receptiveness to treatment, or on demographic, cognitive, or psychiatric variables. Current cognitive impairment of subjects was evidenced by indices of working memory, attention, and executive functioning abilities (x̄ = 77.45, intervention group; x̄ = 82.50, control condition) that were significantly below both the normative mean and estimated average premorbid IQs (x̄ = 101.3, intervention group; x̄ = 104.57, control condition). Multivariate repeated measures ANOVAs indicated that subjects who received the psycho-education did not improve insight into their cognitive deficits or willingness to engage in treatment for cognitive dysfunction. While the failure to find a significant impact of this intervention on awareness of cognitive deficit and receptiveness to cognitive treatment raises questions about the malleability of insight into neuro-cognitive deficits, the intervention was briefer than most reported psycho-education programs and multi-session formats may prove to be more effective.

Note C: The outcome variables in this study are insight and willingness to engage in treatment. These are the dependent variables of the study.

FTE 3-4 Question: Write a hypothesis that would pertain to this research study. The hypothesis should contain the independent and dependent variables and be testable. Did the findings of the study support the hypothesis?

(the dependent variable).
A study describing the social skill impairments in autism or correlating social skill impairments with school performance would not have dependent and independent variables; rather, social skill impairments and school performance would be the variables of interest. Control and extraneous variables still exist in descriptive research. For example, controlling for age and considering the number of siblings as an extraneous variable would be relevant and important in descriptive studies of autism.

EXERCISE 3-3 Identifying Variables (LO3)

The following study is an experiment that uses only one group. Also called a crossover design, all individuals receive the same treatments, but in a random order. This design is typically considered comparable to a randomized controlled trial with more than one group.

Logemann, J. A., Gensler, G., Robbins, J., Lindblad, A. S., Brandt, D., Hind, J. A., . . . Miller Gardner, P. J. (2008). A randomized study of three interventions for aspiration of thin liquids in patients with dementia or Parkinson's disease. Journal of Speech Language Hearing Research, 51, 173–183.

Purpose: This study was designed to identify which of three treatments for aspiration of thin liquids—chin-down posture, nectar-thickened liquids, or honey-thickened liquids—results in the most successful immediate elimination of aspiration of thin liquids during the videofluorographic swallow study in patients with dementia and/or Parkinson's disease.

Method: This randomized clinical trial included 711 patients ages 50 to 95 years who aspirated thin liquids as assessed videofluorographically. All patients received all three interventions in a randomly assigned order during the videofluorographic swallow study.

Results: Immediate elimination of aspiration of thin liquids occurred most often with honey-thickened liquids for patients in each diagnostic category, followed by nectar-thickened liquids and then chin-down posture. Patient preference was best for chin-down posture, followed closely by nectar-thickened liquids.

Conclusion: To identify the best short-term intervention to prevent aspiration of thin liquids in patients with dementia and/or Parkinson's disease, a videofluorographic swallow assessment is needed. Evidence-based practice requires taking patient preference into account when designing a dysphagic patient's management plan. The longer-term impact of short-term prevention of aspiration requires further study.

Identify the following variables as:
A. Independent
B. Dependent
C. Control
D. Extraneous

1. Dementia and Parkinson's disease
2. Individuals with aspiration of thin liquids
3. Degree of aspiration
4. Three treatments: chin-down posture, nectar-thickened liquids, honey-thickened liquids
5. Patient preference
6. Patient's anxiety level

CRITICAL THINKING QUESTIONS

1. What are the differences between experimental, quasiexperimental, and pre-experimental research?
2. How is an experimental longitudinal study different from a nonexperimental longitudinal study?
3. How does basic research relate to applied research?
4. What is the major limitation of cross-sectional studies when it comes to understanding differences across time?
5. What contributions does qualitative research make to evidence-based practitioners?
6. Why is it important to have a directional hypothesis in quantitative research?
7. Why is it important to operate without a hypothesis in qualitative research?
8. Explain the difference between control variables and extraneous variables.

ANSWERS

EXERCISE 3-1
1. Nonexperimental, quantitative, longitudinal, applied
2. Experimental, quantitative; neither cross-sectional (because it is experimental and variables are manipulated rather than observed) nor longitudinal (follow-up is not long enough); applied
3. Nonexperimental, qualitative; neither cross-sectional nor longitudinal, as these terms are applied only to quantitative research; applied
4. Nonexperimental, quantitative, cross-sectional, applied

EXERCISE 3-2
1. Quantitative and experimental—this is an efficacy study that would likely compare either groups of individuals using different cushions, or compare the same people using different cushions.
2. Quantitative and nonexperimental—because this is a descriptive question, the researcher would likely identify the percentage of individuals with different sexual satisfaction issues and then determine if there were differences between men and women (a nonexperimental group comparison).
3. Qualitative and nonexperimental—this question is more exploratory in nature and does not imply hypothesis testing. Instead, the researcher is searching for the answer(s).
4. Quantitative and nonexperimental—with this question, predictors are examined to determine relationships between particular conditions and stuttering.

EXERCISE 3-3
1. C
2. C
3. B
4. A
5. B
6. D

FROM THE EVIDENCE 3-1
The manipulation in this study is the two groups: the experimental group receiving CIMT and the control group receiving traditional rehabilitation. These two groups are compared to determine if there is a difference in the reach-to-grasp performance of the hemiparetic hand and if there is a difference in the performance of activities of daily living. More importantly, the researcher wants to know if the improvements from pretest to posttest for the CIMT group are greater than improvements from pretest to posttest for the traditional rehabilitation group. This type of comparison is discussed in much greater detail in Chapter 5.

FROM THE EVIDENCE 3-2
In this nonexperimental group comparison study, individuals cannot be randomly assigned to groups based on their condition of Parkinson's disease or no Parkinson's disease.
Although it may be sensible to conclude that the Parkinson's disease causes the difference in outcomes, there could be other factors common to this group that explain the difference. For example, the medications taken for Parkinson's disease could make the difference, rather than the disease itself.

FROM THE EVIDENCE 3-3
There is no hypothesis; instead, the study is open to what might be found. One difference you might have noted is that there are no groups. Even among the 12 participants, the data are not aggregated, but rather presented individually. In a qualitative study, a subjective approach with an interview is used to both collect and interpret the data. The results are presented from the point of view of the participants. The conclusions are based on inductive reasoning. Specific quotations are used to reveal a more general experience of MS that involved a change in self.

FROM THE EVIDENCE 3-4
Hypothesis: A psychoeducational intervention for people with schizophrenia will increase insight into cognitive deficits and willingness to engage in treatment. The hypothesis was not supported, as the intervention group did not have better outcomes than the control group on the dependent variables of insight and willingness to engage in treatment.

REFERENCES

Aksan, N., Anderson, S. W., Dawson, J. D., Johnson, A. M., Uc, E. Y., & Rizzo, M. (2012). Cognitive functioning predicts driver safety on road tests 1 and 2 years later. Journal of the American Geriatrics Society, 60, 99–105.

Aksan, N., Dawson, J. D., Emerson, J. L., Yu, L., Uc, E. Y., Anderson, S. W., & Rizzo, M. (2013). Naturalistic distraction and driving safety in older drivers. Human Factors, 55, 841–853.

Buonomano, D. V., & Merzenich, M. M. (1998). Cortical plasticity: From synapses to maps. Annual Review of Neuroscience, 21, 149–186.

Dalvand, H., Dehghan, L., Hadian, M. R., Feizy, A., & Hosseini, S. A. (2012).
Relationship between gross motor and intellectual function in children with cerebral palsy: A cross-sectional study. Archives of Physical Medicine and Rehabilitation, 93, 480–484.

Edwards, J. D., Bart, E., O'Connor, M. L., & Cissell, G. (2010). Ten years down the road: Predictors of driving cessation. Gerontologist, 50, 393–399.

Glees, P., & Cole, J. (1949). The reappearance of coordinated movement of the hand after lesions in the hand area of the motor cortex of the rhesus monkey. Journal of Physiology, 108, 33.

Hu, G., Bidel, S., Jousilahti, P., Antikainen, R., & Tuomilehto, J. (2007). Coffee and tea consumption and the risk of Parkinson's disease. Movement Disorders, 15, 2242–2248.

Laible, M., Grieshammer, S., Seidel, G., Rijntjes, M., Weiller, C., & Hamzei, F. (2012). Association of activity changes in the primary sensory cortex with successful motor rehabilitation of the hand following stroke. Neurorehabilitation and Neural Repair, 26, 881–888.

Logemann, J. A., Gensler, G., Robbins, J., Lindblad, A. S., Brandt, D., Hind, J. A., . . . Miller Gardner, P. J. (2008). A randomized study of three interventions for aspiration of thin liquids in patients with dementia or Parkinson's disease. Journal of Speech Language Hearing Research, 51, 173–183.

Medalia, A., Saperstein, A., Choi, K. H., & Choi, J. (2012). The efficacy of a brief psycho-educational intervention to improve awareness of cognitive dysfunction in schizophrenia. Psychiatry Research, 199(3), 164–168.

Metta, V., Logishetty, K., Martinez-Martin, P., Gage, H. M., Schartau, P. E., Kaluarachchi, T., . . . Chaudhuri, K. R. (2011). The possible clinical predictors of fatigue in Parkinson's disease: A study of 135 patients as part of international nonmotor scale validation project. Parkinson's Disease, 124271. PMID: 22191065.

Mozo-Dutton, L., Simpson, J., & Boot, J. (2012). MS and me: Exploring the impact of multiple sclerosis on perceptions of self.
Disability and Rehabilitation, 34(14), 1208–1217.

National Institutes of Health. (n.d.). National Center for Advancing Translational Sciences. Retrieved from http://www.ncats.nih.gov/

Plotnik, M., Giladi, N., & Hausdorff, J. M. (2009). Bilateral coordination of gait and Parkinson's disease: The effects of dual tasking. Journal of Neurology, Neurosurgery & Psychiatry, 80(3), 347–350. doi:10.1136/jnnp.2008.157362

Shattuck, P. T., Narendorf, S. C., Cooper, B., Sterzing, P. R., Wagner, M., & Taylor, J. L. (2012). Postsecondary education and employment among youth with an autism spectrum disorder. Pediatrics, 129, 1042–1049.

Taub, E. (1980). Somatosensory deafferentation research with monkeys: Implications for rehabilitation medicine. In L. P. Ince (Ed.), Behavioral psychology in rehabilitation medicine: Clinical implication. New York, NY: Williams & Wilkins.

Todd, D., Simpson, J., & Murray, C. (2010). An interpretative phenomenological analysis of delusions in people with Parkinson's disease. Disability and Rehabilitation, 32, 1291–1299.

Troche, M. S., Okun, M. S., Rosenbek, J. C., Musson, N., Fernandez, H. H., Rodriguez, R., Romrell, J., . . . Sapienza, C. M. (2010). Aspiration and swallowing in Parkinson disease and rehabilitation with EMST: A randomized trial. Neurology, 75, 1912–1919.

Van der Schuit, M., Peeters, M., Segers, E., van Balkom, H., & Verhoeven, L. (2009). Home literacy environment of pre-school children with intellectual disabilities. Journal of Intellectual Disability Research, 53, 1024–1037.

Wolf, S. L., Winstein, C. J., Miller, J. P., Taub, E., Uswatte, G., Morris, D., . . . EXCITE Investigators. (2006). Effect of constraint-induced movement therapy on upper extremity function 3 to 9 months after stroke: The EXCITE randomized clinical trial. Journal of the American Medical Association, 296, 2095–2104.
“Life is not just a series of calculations and a sum total of statistics, it's about experience, it's about participation, it is something more complex and more interesting than what is obvious.”
—Daniel Libeskind, architect, artist, and set designer

4 Understanding Statistics: What They Tell You and How to Apply Them in Practice

CHAPTER OUTLINE
INTRODUCTION
SYMBOLS USED WITH STATISTICS
DESCRIPTIVE STATISTICS
Frequencies and Frequency Distributions
Measure of Central Tendency
Measures of Variability
INFERENTIAL STATISTICS
Statistical Significance
Inferential Statistics to Analyze Differences
The t-Test
Analysis of Variance
Analysis of Covariance
Inferential Statistics for Analyzing Relationships
Scatterplots for Graphing Relationships
Relationships Between Two Variables
Relationship Analyses With One Outcome and Multiple Predictors
Logistic Regression and Odds Ratios
EFFECT SIZE AND CONFIDENCE INTERVALS
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Understand descriptive statistics and the accompanying graphs and tables that are used to explain data in a study.
2. Interpret inferential statistics to determine if differences or relationships exist.
3. Interpret statistics in tables and text from the results section of a given research study.
4. Determine which types of statistics apply to specific research designs or questions.
KEY TERMS
analysis of covariance (ANCOVA), Cohen's d, confidence interval (CI), correlation, dependent sample t-test, descriptive statistics, effect size (ES), frequencies, frequency distribution, independent sample t-test, inferential statistics, level of significance, linear regression, logistic regression, mean, measure of central tendency, median, mixed model ANOVA, mode, normal distribution, odds ratio, one-way ANOVA, Pearson product-moment correlation, range, regression equation, repeated measures ANOVA, scatterplot, skewed distribution, Spearman correlation, standard deviation, statistical significance, variability

INTRODUCTION

For many people, statistics are intimidating. All of those numbers, symbols, and unfamiliar terms can be daunting. The field of statistical analysis is complex, and even though numbers may seem to be a very concrete subject, there are many controversies surrounding the use, application, and interpretation of particular statistical tests. Neither a single chapter on statistics nor several classes will make you an expert on the subject. However, as an evidence-based practitioner, it is important to be able to read the results section of an article and make sense of the numbers there. Some basic knowledge of and practice in reading tables, graphs, and results narratives will provide you with a better understanding of the evidence and allow you to be a more critical consumer of research. This chapter serves as an overview of descriptive and inferential statistics. Subsequent chapters include a feature titled “Understanding Statistics,” in which statistical tests are matched with particular research designs. Within this feature, examples are provided with additional explanation. Health care is a science and an art. Statistics are a key component of the science of clinical practice.
However, it is important to avoid reducing clients to numbers and thus failing to recognize the uniqueness of each individual and situation. The artful evidence-based practitioner is able to meld the numbers with personal experience and client needs.

SYMBOLS USED WITH STATISTICS

One of the reasons statistics can seem like a foreign language is that the results sections of research articles often include unfamiliar symbols and abbreviations. Table 4-1 provides a key to many symbols commonly used in statistics.

DESCRIPTIVE STATISTICS

As the name implies, descriptive statistics describe the data in a study. Descriptive statistics provide an analysis of data that helps describe, show, or summarize it in a meaningful way such that, for example, patterns can emerge from the data. Descriptive statistics are distinguished from inferential statistics, which are techniques that allow us to use study samples to make generalizations that apply to the population (described in greater detail later). At this point, it is useful to understand that descriptive statistics are used in the calculation of inferential statistics. When summarizing data, you always lose some of the important details. Consider a student's grade point average, which provides a summary of grades received in all of the classes taken, but excludes some important information, such as which classes were more difficult and how many credit hours were taken at the time. Although descriptive statistics are useful for condensing large amounts of data, it is important to remember that details and individual characteristics are lost in the process. Different types of descriptive statistics are used to summarize data, including frequencies and frequency distributions, measures of central tendency, and measures of variability.

Frequencies and Frequency Distributions

Frequencies are used to describe how often something occurs.
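Frequencies like these are typically reported as a count plus a percentage. A minimal Python sketch using the standard library's collections.Counter (the data are made up, chosen to mirror a 70/130 split):

```python
# Sketch: frequencies reported as count plus percentage.
# The sample below is hypothetical: 130 women and 70 men.
from collections import Counter

sample = ["woman"] * 130 + ["man"] * 70
counts = Counter(sample)
n = len(sample)

for category, count in counts.most_common():
    print(f"{category}: {count} ({count / n:.0%})")
# woman: 130 (65%)
# man: 70 (35%)
```

The same counting step is what a frequency distribution graph depicts visually.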
Typically, the actual number or count is provided, along with the percentage. For example, a study may indicate that 70 (35%) men and 130 (65%) women were enrolled in the study. When a graph is used to depict the count, the graph is called a frequency distribution. In this type of graph, the vertical axis (or y axis) identifies the frequency with which a score occurs. The horizontal axis (or x axis) presents the values of the scores. From the Evidence 4-1 provides an example of a frequency distribution graph; this one represents the pretest and posttest scores for children on the Student Performance Profile. With this measure, children are rated in terms of the degree to which they meet their individualized education plan (IEP), according to percent ability, in 10% increments, with 0% indicating no ability and 100% indicating total ability.

TABLE 4-1 Statistical Symbols Used to Describe the Sample and Statistics in Research Articles

Symbol: Meaning
x̄: mean of a sample
s, sd, σ: standard deviation of a sample
s²: variance of a sample
N, n: number of participants in a study or number of participants in a group in a study
α: alpha level; the level of significance that is used to determine whether or not an analysis is statistically significant
p: probability value (p value); the calculated likelihood of making a Type I error
df: degrees of freedom; the number of values that are free to vary in a statistical analysis based on the number of participants and number of groups
r: correlation coefficient
r²: variance; the amount of variance accounted for in a correlational study
t: the critical value in a t-test
F: the critical value in an ANOVA
ES: effect size
η²: eta squared; an effect-size statistic
ω²: omega squared; an effect-size statistic
CI: confidence interval

A normal distribution is a type of frequency distribution that represents many data points distributed in a symmetrical, bell-shaped curve.
In a normal distribution, the two halves are mirror images, with the largest frequency occurring at the middle of the distribution and the smallest occurring at the ends, or “tails.” A skewed distribution is one in which one tail is longer than the other. Some distributions may have more than one peak and are described in relation to the number of peaks; for example, a distribution with two peaks is classified as a bimodal distribution. Figure 4-1 illustrates the different types of distributions. Distributions of scores differ in terms of their measures of central tendency and variability. Knowing the central tendency and variability of a set of data provides you with important information for understanding a particular distribution of scores. The connection between distributions and other descriptive statistics is explained in the following sections.

Measure of Central Tendency

A measure of central tendency describes the location of the center of a distribution. The three most commonly used measures of central tendency are the mode, median, and mean.

1. The mode is simply the score value that occurs most frequently in the distribution. The mode provides information about the distribution, but it is generally of less use than other measures of central tendency. The mode is greatly influenced by chance and does not take into account other scores in the distribution.

2. The median is the score value that divides the distribution into the lower and upper halves of the scores. The median is most useful when distributions are skewed, because it is less sensitive to extreme scores. If there is an odd number of participants in a distribution, the median is the score value of the participant exactly in the middle of the distribution. When there is an even number of participants in a distribution, the median is the score halfway between the scores of the two middle participants.

3. The mean (x̄) is the same as the average and balances the scores above and below it.
The mean is calculated by summing the scores and dividing the sum by the number of participants. The symbol for the sample mean is x̄. The relationship between the different measures of central tendency depends on the frequency distribution. If the scores are normally distributed, the values of the mode, mean, and median are equal. In a positively skewed distribution, the mode is a lower score than the mean, while the median falls between the mode and mean. In a negatively skewed distribution, the mode is a higher score than the mean, and once again the median falls between the mode and mean. Figure 4-2 depicts the measures of central tendency with the different distributions. The mean is the measure of central tendency used most often in research, particularly when calculating inferential statistics. The fact that the mean balances the distribution is a desirable quality because it takes into account the values of all participants.

FROM THE EVIDENCE 4-1 Frequency Distribution

Watson, A. H., Ito, M., Smith, R. O., & Andersen, L. T. (2010). Effect of assistive technology in a public school setting. American Journal of Occupational Therapy, 64(1), 18–29. doi:10.5014/ajot.64.1.18

[Figure: frequency distribution of the averaged Student Performance Profile (SPP) scores on the pretest and posttest forms, in percentages, of current ability level of individualized education program goals and objectives (N = 13). The y axis shows frequency (0 to 6); the x axis shows scores on the SPP (% current ability, 0 to 100). Separate distributions are shown for pretest and posttest scores.]

Note A: The number of children (frequency) who received a particular score is indicated on the y axis.

Note B: The score received on the SPP in 10% increments is indicated on the x axis.

FTE 4-1 Question: How many children score at 60% ability at pretest, and how many children score at 60% ability at posttest?
However, in some cases, a few outliers can influence the mean to such a degree that the median becomes a more accurate descriptor of the central tendency of the distribution. For example, when describing the demographics of a sample, the median is often used for income. If most individuals have incomes around $70,000, but a few millionaires are included in the sample, income will be distorted if the mean is reported. However, if the median is reported, these outliers will not misrepresent the majority.

FIGURE 4-1 Types of frequency distributions: A. Normal distribution. B. Negatively skewed distribution. C. Positively skewed distribution. D. Bimodal distribution.

Measures of Variability

Variability refers to the spread of scores in a distribution. Distributions with the same central tendencies can still be very different because of the variability in the scores. The range is one measure of variability that simply indicates the lowest and highest scores. For example, the age range of participants in a study may be expressed as 18 to 49. The most common measure of variability is the standard deviation, which is the expression of the amount of spread in the frequency distribution and the average amount of deviation by which each individual score varies from the mean. Standard deviation is abbreviated as s, sd, or σ, and is often shown in parentheses next to the mean in a table. A large standard deviation means that there is a high degree of variability, whereas a small standard deviation means that there is a low degree of variability. If all of the scores in a distribution are exactly the same, the standard deviation is zero.
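The income example above is easy to reproduce with Python's statistics module. The numbers are hypothetical; the point is the behavior of the statistics, not the data:

```python
# Sketch: a few outliers pull the mean, but the median still
# represents the majority (made-up incomes, one millionaire).
from statistics import mean, median, pstdev

incomes = [68_000, 70_000, 71_000, 72_000, 74_000, 2_500_000]

print(round(mean(incomes)))      # 475833 (distorted by the outlier)
print(median(incomes))           # 71500.0 (still describes the majority)
print(pstdev([50, 50, 50, 50]))  # 0.0 (identical scores have zero spread)
```

Reporting the median here, rather than the mean, avoids misrepresenting the five typical earners.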
When standard deviations are different, the frequency distributions are also different, even with the exact same mean. Figure 4-3 illustrates three frequency distributions with the same mean but dissimilar variability.

FIGURE 4-2 Measure of central tendency in different distributions: A. Normal distribution. B. Positively skewed distribution. C. Negatively skewed distribution.

FIGURE 4-3 Frequency distributions with the same mean but different amounts of variability. Dark blue line indicates large standard deviation; dashed line indicates small standard deviation; light blue line indicates a moderate standard deviation.

EVIDENCE IN THE REAL WORLD: Applying the Statistical Concept of Variability to the Term Spectrum

When individuals with autism are described as being on a “spectrum,” this term is meant to characterize their behavioral repertoire as being heterogeneous (i.e., as a population, this group of individuals has a lot of variability). The Autism Speaks organization explains autism spectrum disorder (ASD) as follows: Each individual with autism is unique. Many of those on the autism spectrum have exceptional abilities in visual skills, music, and academic skills. About 40 percent have intellectual disability (IQ less than 70), and many have normal to above-average intelligence. Indeed, many persons on the spectrum take deserved pride in their distinctive abilities and “atypical” ways of viewing the world. Others with autism have significant disability and are unable to live independently. About 25 percent of individuals with ASD are nonverbal but can learn to communicate using other means (Autism Speaks, n.d.).

The concept of a perfectly normal distribution is only theoretical; with actual data it is rare that a distribution will be exactly symmetrical. That said, many distributions are
approximately normal, and understanding the characteristics of the normal distribution is helpful in understanding the role of standard deviations in both descriptive and inferential statistics. With a normal distribution, it is easy to predict the number of individuals who will fall within the range of a particular standard deviation: 34% of the scores will fall within one standard deviation above the mean, and another 34% will fall within one standard deviation below the mean, for a total of 68% of the scores (Fig. 4-4). Therefore, in a normal distribution, most people will have scores within one standard deviation of the mean, approximately 14% will have scores between one and two standard deviations on either side of the mean, and a little more than 2% will have scores more than two standard deviations from either side of the mean. For example, in a distribution in which the mean is 50 and the standard deviation is 10, 68% of individuals will have scores between 40 and 60, 34% will score between 40 and 50, and the remaining 34% will score between 50 and 60. If an individual has a score that is 0.5 standard deviations above the mean, his or her score is 55.

From the Evidence 4-2 displays a bar graph from a descriptive study comparing individuals with Parkinson's disease (PD), multiple sclerosis (MS), and healthy controls. Each bar includes indications of the mean and standard deviations. In both conditions, the PD and MS groups had fewer appropriate pauses than the controls.

EXERCISE 4-1 Using Statistics to Describe Samples (LO1)

QUESTIONS

1. What different types of information would be provided by the mean and standard deviation for the following two groups?
A. A group composed exclusively of basketball players
B. A mixed group of athletes including basketball players, jockeys, and baseball players

2.
What measure of central tendency would be most useful in describing length of stay in a hospital where most people stay less than two weeks, but a few patients stay more than three months? Explain why.

3. What type of graph illustrates both the central tendencies and variability of a sample?

FIGURE 4-4 Standard deviations in a normal distribution. (About 34.1% of scores fall between the mean and one standard deviation on each side, 13.6% between one and two standard deviations, and 2.1% between two and three standard deviations; roughly 68.3% of scores fall within one standard deviation of the mean, 95.5% within two, and 99.73% within three.)

FROM THE EVIDENCE 4-2 Bar Graph Comparing Speech Patterns in Three Different Groups

Tjaden, K., & Wilding, G. (2011). Speech and pause characteristics associated with voluntary rate reduction in Parkinson's disease and multiple sclerosis. Journal of Communication Disorders, 44(6), 655–665. doi:10.1016/j.jcomdis.2011.06.003

[Figure: bar graph of the proportion of grammatically appropriate pauses (0.4 to 1.2) for the MS, PD, and control groups under habitual and slow speaking conditions.]

Note A: The top of the bar indicates the mean, and the extension represents the standard deviation.

FTE 4-2 Question: When looking at the standard deviations in the habitual condition, what useful information is provided beyond what you learn by just looking at the mean differences?

INFERENTIAL STATISTICS

The important element of inferential statistics is “infer,” in that this type of statistic is used when the researcher wants to infer something about a larger population based on the sample used in the study. Consider researchers who are interested in a particular social skill training program for individuals with autism. If those researchers conduct a study, they will be unable to study all children with autism, so a sample is selected. The results of the study are calculated with inferential statistics and then used to infer or suggest that the same or similar findings would be expected with another sample of children with autism.
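The idea that a sample only approximates its population can be illustrated with a short simulation. This is a sketch with synthetic data; real studies obviously cannot resample their population at will:

```python
# Sketch: sample statistics vary by chance around the population value,
# which is why inference always carries some risk (synthetic data).
import random

random.seed(1)
population = [random.gauss(50, 10) for _ in range(10_000)]
pop_mean = sum(population) / len(population)

for _ in range(3):
    sample = random.sample(population, 30)
    sample_mean = sum(sample) / len(sample)
    print(round(sample_mean, 1))  # close to, but rarely equal to, pop_mean
```

Each run of 30 participants gives a slightly different mean, which is exactly the sampling error that inferential statistics are designed to account for.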
Of course, the representativeness of the sample is an important consideration when making inferences (see Chapter 5). Most results sections that report inferential statistics also provide the descriptive statistics (e.g., means and standard deviations) that were used in the calculation. Inferential statistics are often divided into two categories: (1) tests of differences (e.g., t-tests and analysis of variance) and (2) tests of relationships (e.g., correlations and regressions). With inferential statistics, a test is conducted to determine if the difference or relationship is statistically significant.

Statistical Significance
Even impeccably designed and conducted research involves chance. For example, there are many reasons why a sample may not perfectly represent a population, introducing sampling error. Statistical significance is a number that expresses the probability that the result of a given experiment or study could have occurred purely by chance. In other words, significance testing reflects the amount of risk you are willing to assume when conducting a study. It is based on an agreed-upon level of significance (also called alpha, or α). Typically, 0.05 is the standard level of significance that most researchers use. This means there is a 5% risk that the difference between two groups, or the difference in scores from pretest to posttest, is not a true difference but instead occurred by chance. Conversely, there is a 95% chance that the difference is a true difference. In correlational studies, the level of significance is interpreted as the amount of risk that the relationship has occurred by chance and is not a true relationship. Sometimes, when it is important to be more certain, the level of significance is set at 0.01, or 1%.
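The meaning of the 0.05 level can be made concrete with a simulation: when two groups are drawn from the same population, so that no true difference exists, a t-test at α = 0.05 still declares a "significant" difference in roughly 5% of studies. (A sketch; scipy and numpy are assumed, and the simulated "studies" are invented.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_studies = 2000

false_positives = 0
for _ in range(n_studies):
    # Both groups come from the same population: any "difference" is chance.
    group_a = rng.normal(50, 10, size=25)
    group_b = rng.normal(50, 10, size=25)
    t, p = stats.ttest_ind(group_a, group_b)
    if p < alpha:
        false_positives += 1

# Roughly 5% of the null studies come out "significant" purely by chance.
print(f"false-positive rate: {false_positives / n_studies:.3f}")
```

This is exactly the 5% risk described above: setting α = 0.01 instead would shrink the false-positive rate to about 1%, at the cost of missing more true differences.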
Occasionally, researchers conducting exploratory research will accept a 10% chance, but there is reason to be skeptical of findings when researchers are willing to accept greater levels of risk.

Inferential Statistics to Analyze Differences
Many rehabilitation studies are interested in differences. Intervention studies seek to determine whether there is a difference between the intervention and control groups, or whether there are differences before and after treatment. A descriptive study may examine the differences between people with and without a condition. For example, to characterize the cognitive impairments associated with traumatic brain injury, a group of people with brain injury and a group of people without brain injury may be compared to determine if there is a difference in cognition between the two groups.

There are many statistical tests for comparing differences, but the most common are the t-test and analysis of variance (ANOVA). In these tests, the means of the two groups and/or time periods are compared while taking into account the standard deviations associated with those means. If a difference is found within the sample, an inference is made that a difference would exist within the larger population. For example, if a study finds that there is a difference (improvement) in strength after an exercise intervention, the implication is that this difference would also occur with similar clients who receive the same intervention. Many of the same tests are used in both intervention and descriptive studies. Table 4-2 summarizes the purpose of each difference test and provides examples of research designs that would use the statistic.

The t-Test
The t-test is the most basic of the inferential difference statistics because it examines the difference between two groups at a single time point, or one group at two time points.
There are two types of t-tests: (1) the t-test for independent samples, or between-group analysis (also called a two-group or two-sample t-test), and (2) the t-test for dependent samples, or within-group analysis (also called a paired-sample t-test). An independent sample t-test compares the difference in the mean scores of two groups, the two groups being independent of, or unrelated to, each other. It is called an independent sample t-test because the two groups make up the independent variable in the study. In a dependent sample t-test, the comparison is made within the same group on a dependent variable. Most often, this type of t-test is used to compare the pretest and posttest scores of a single group. However, two different tests that have the same metric could also be compared. For example, a t-test for dependent samples could be used to compare students' percentile scores on the final exams of two different courses.

When a t-test is reported in the results section of a research paper, the following three statistics are often included: t, df, and p. For example, a study may report that there was a statistically significant difference from pretest to posttest (t = 2.39, df = 15, p = 0.03). The t denotes the critical value of the t-test, a number that cannot be interpreted out of context. The df represents degrees of freedom, which indicates the number of values in a study that are free to vary, based on the number of participants in the study and, in some analyses, the number of groups. The p stands for probability and indicates the likelihood that the difference is not a true difference (that is, the likelihood of making a Type I error). In this example, there is a 3% chance that the difference is not a true difference. From the Evidence 4-3 presents the results of a t-test comparing pretest and posttest scores.
Three separate t and p values are provided, indicating that three separate t-tests were conducted, one for each of three different outcomes of a caregiver training program for individuals with dementia.

TABLE 4-2 Application of Inferential Statistics That Analyze Differences

Independent sample t-test (also known as a two-sample or two-group t-test)
Purpose: Compare the differences between two groups (independent variables) at one point in time on a single dependent measure.
Sample research designs: A descriptive study comparing two groups; an efficacy study comparing two groups that uses a change score (i.e., the difference between pretest and posttest) as the single dependent variable; an efficacy study that uses a posttest-only design.

Dependent sample t-test (also known as a paired-sample t-test)
Purpose: Compare the differences within a group at two time points on a single dependent measure, or compare the differences within a group on two different dependent measures.
Sample research designs: An efficacy study using a single-group pretest-posttest design; a descriptive study with a single group comparing the same individuals on two measures with the same metric; a developmental (descriptive) study comparing changes in individuals over two time points.

One-way analysis of variance (ANOVA)
Purpose: Compare the differences among three or more groups on a single dependent measure.
Sample research designs: A descriptive study comparing three different groups on a single outcome.

Repeated measures ANOVA
Purpose: Compare the differences within one group at three or more time points.
Sample research designs: An efficacy study using a single-group design that includes more than two time points (e.g., pretest, posttest, follow-up); a developmental (descriptive) study comparing changes over more than two time points.

Mixed model ANOVA
Purpose: Compare both between-group and within-group differences simultaneously, providing an interaction effect; provide separate results for the between-group and within-group differences (i.e., main effects).
Sample research designs: An efficacy study that compares two or more groups at two or more time points (e.g., randomized controlled trial, nonrandomized controlled trial).

Analysis of covariance (ANCOVA)
Purpose: Compare differences between and/or within groups while statistically controlling for (equalizing groups on) a variable (the covariate).
Sample research designs: An efficacy study that compares two or more groups and uses the pretest as a covariate (randomized or nonrandomized controlled trial); a descriptive study that compares two or more groups but controls for an extraneous variable by using that variable as the covariate.

FROM THE EVIDENCE 4-3 Comparing Pretest and Posttest Results Using a t-Test
DiZazzo-Miller, R., Samuel, P. S., Barnas, J. M., & Welker, K. M. (2014). Addressing everyday challenges: Feasibility of a family caregiver training program for people with dementia. American Journal of Occupational Therapy, 68(2), 212–220. doi:10.5014/ajot.2014.009829.
Note A: You know a t-test was used because a t value is reported.

Activities of Daily Living Knowledge Pre- and Posttest Results
Communication and nutrition: Pretest N = 53, M = 73.87, SD = 19.75; Posttest N = 53, M = 92.13, SD = 14.49; t(52) = 7.05, p (2-tailed) = .000**
Transfers and toileting: Pretest N = 46, M = 88.02, SD = 11.50; Posttest N = 46, M = 94.56, SD = 12.21; t(45) = 3.10, p (2-tailed) = .003*
Bathing and dressing: Pretest N = 45, M = 86.22, SD = 16.42; Posttest N = 45, M = 92.89, SD = 11.41; t(44) = 2.71, p (2-tailed) = .010*
Note. M = mean; SD = standard deviation. *p < .05. **p < .001.
FTE 4-3 Questions
1. What type of t-test was used in this analysis?
2. What does the N stand for in the table?

Analysis of Variance
When more than two means are compared, an analysis of variance (ANOVA) is the appropriate test. With a t-test, the t statistic is used; with an ANOVA, an F-test is used to compare the means, and the F statistic is reported.
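Reporting like that in From the Evidence 4-3 can be reproduced with standard statistical software. The sketch below runs a dependent (paired-sample) t-test on invented pretest and posttest scores and prints t, df, and p; scipy is assumed.

```python
import numpy as np
from scipy import stats

# Hypothetical knowledge scores for one group measured twice.
pretest  = np.array([74, 68, 81, 70, 77, 65, 72, 79, 69, 75])
posttest = np.array([85, 80, 88, 78, 90, 74, 83, 86, 79, 84])

# Dependent (paired-sample) t-test: within-group comparison over two time points.
result = stats.ttest_rel(pretest, posttest)
df = len(pretest) - 1  # degrees of freedom for a paired t-test: n - 1

print(f"t = {result.statistic:.2f}, df = {df}, p = {result.pvalue:.3f}")
```

Because the same ten people supply both scores, the paired test works on the individual pretest-to-posttest changes rather than treating the two columns as unrelated groups.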
There are many variations of the ANOVA, including the one-way ANOVA, repeated measures ANOVA, and mixed model ANOVA. A one-way ANOVA is similar to the independent sample, or between-group, t-test; however, instead of comparing only two groups, three or more groups are compared at a single point in time. For example, Pegorari, Ruas, and Patrizzi (2013) compared three groups of elderly individuals (prefrail, frail, and nonfrail) in terms of respiratory function. Using a one-way ANOVA, the researchers found differences among the groups. When more than two comparisons are made, follow-up tests are necessary to determine where the differences lie. In this example, three follow-up analyses would be necessary to compare (1) the prefrail and frail groups, (2) the prefrail and nonfrail groups, and (3) the frail and nonfrail groups.

The repeated measures ANOVA is similar to a dependent sample, or within-group, t-test and is used when means are compared over more than two time periods (i.e., the within-group analysis is repeated). For example, a repeated measures ANOVA may determine if there is a difference at baseline, at 3 months, and at 6 months. Follow-up analyses are necessary to examine the difference for each pair: (1) baseline and 3 months, (2) baseline and 6 months, and (3) 3 months and 6 months.

In intervention research, within- and between-group analyses are often made simultaneously to determine if there is an interaction effect (i.e., do the two groups show different patterns of change over time?). A mixed model ANOVA is used when between- and within-group analyses are conducted simultaneously; however, in the literature you will often see this referred to simply as a repeated measures ANOVA, with the between-group analysis implied. In these analyses, two or more groups are compared over two or more time points. In a mixed model ANOVA, it is possible to examine both main effects and interaction effects.
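A one-way ANOVA with follow-up pairwise comparisons, in the spirit of the frailty example above, can be sketched as follows (scipy assumed; the three groups' scores are invented, and in practice the follow-up tests would be corrected for multiple comparisons):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical respiratory-function scores for three groups of 20.
nonfrail = rng.normal(80, 8, size=20)
prefrail = rng.normal(72, 8, size=20)
frail    = rng.normal(60, 8, size=20)

# One-way ANOVA: is there any difference among the three group means?
f_stat, p = stats.f_oneway(nonfrail, prefrail, frail)
print(f"F = {f_stat:.2f}, p = {p:.4f}")

# A significant F only says *some* difference exists; follow-up pairwise
# tests locate it.
for name, a, b in [("nonfrail vs prefrail", nonfrail, prefrail),
                   ("nonfrail vs frail", nonfrail, frail),
                   ("prefrail vs frail", prefrail, frail)]:
    t, p_pair = stats.ttest_ind(a, b)
    print(f"{name}: t = {t:.2f}, p = {p_pair:.4f}")
```

The single F-test answers "is anything different?"; the three follow-up comparisons answer "which pairs differ?", mirroring the three follow-up analyses described in the text.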
One main effect would look at time differences and determine whether there is a difference from pretest to posttest with both groups combined. However, when trying to determine if an intervention is effective, it is typically most important to determine whether there is an interaction effect. The interaction tells you whether there is a difference in how the two groups perform over time. When there is no interaction, the groups in the study perform similarly over time. The graph in Figure 4-5 shows that both groups improved from pretest to posttest, but there was no difference in how much they improved. When there is no interaction, the mixed model ANOVA calculated with the means and standard deviations of the two groups at both time points would indicate an interaction effect of p > 0.05. When an interaction effect exists (p < 0.05), there is a difference in how the groups perform over time.

Analysis of Covariance
An analysis of covariance (ANCOVA) is used when the researcher wants to statistically control a variable that may affect the outcome of a study. An ANCOVA is useful when a demographic variable differs between groups and that variable is related to the outcome (dependent) variable. For example, suppose a study of cognitive remediation has one group with older participants, and age is related to (i.e., correlated with) the outcome variable of memory. In this case, age may be covaried in the analysis. In doing so, variance or error associated with age is removed, and the researcher is more likely to find a difference between the intervention and control groups. Some analyses use the baseline scores as the covariate. In this instance, the analysis is similar to a repeated measures ANOVA; however, instead of comparing individuals before and after an intervention, the error or difference at baseline is equalized.
The baseline (pretest) is used as the covariate, and then only the posttest scores are compared. For example, in a study of kinesiotaping for shoulder impingement, an ANCOVA was used to control for pretesting; researchers found that the change in pain level was greater for the experimental group than for the control group (Shakeri, Keshavarz, Arab, & Ebrahimi, 2013).

FIGURE 4-5 Graph showing no interaction (p > 0.05); both the intervention and control groups improved from pretest to posttest, but there was not a difference in how much they improved.
FIGURE 4-6 Graph showing an interaction effect (p < 0.05); both groups improved, but the intervention group improved to a much greater extent.

The graph in Figure 4-6 illustrates an interaction effect. Both groups improved, but the intervention group improved to a much greater extent. When there is an interaction, the mixed model ANOVA calculated with the means and standard deviations of the two groups at both time points would indicate an interaction effect of p < 0.05.

From the Evidence 4-4 provides an example of a mixed model ANOVA. In this study, physical fitness was compared in children at 9 years of age and again at 12 years of age (Haga, 2009). The children were grouped into a low motor competence (LMC) group and a high motor competence (HMC) group. Several physical fitness tests were performed, with similar results for most of the tests. The following example provides the results from the standing broad jump.

FROM THE EVIDENCE 4-4 Main and Interaction Effect
Haga, M. (2009). Physical fitness in children with high motor competence is different from that in children with low motor competence. Physical Therapy, 89, 1089–1097. doi:10.2522/ptj.20090052.
Note A: Effect size statistics described at the end of the chapter provide an estimate of the magnitude of the difference.
Note B: In the statistical analysis, the subscript with the F (in this case the numbers 1 and 15) indicates the degrees of freedom.

Standing Broad Jump
A significant main effect was obtained for time (F1,15 = 7.707, p < .05), with a moderate effect size (partial η² = .339). A significant main effect also was obtained for group (F1,15 = 12.700, p < .05), with a moderate effect size (partial η² = .458). There was no significant interaction effect (F1,15 = 0.135, p > .05; partial η² = .009).
The article also provides the following descriptive statistics, mean (standard deviation), for the standing broad jump. At 9 years of age, the LMC group had a mean score of 1.18 (0.20), and the HMC group had a mean score of 1.50 (0.21). At 12 years, the LMC score was 1.30 (0.34), and the HMC score was 1.69 (1.50).
Note C: The main effect for time indicates that children in both groups improved from ages 9 to 12. The main effect for group indicates that there were significant differences between the groups at ages 9 and 12.
FTE 4-4 Question: How would you interpret the interaction effect?

Multiple statistical tests are often used in a single study. For example, three different independent sample t-tests may be used, each with a different outcome measure. Or a one-way ANOVA may be used initially to compare three groups, followed by three separate independent sample t-tests to compare each pair of groups (1 and 2, 2 and 3, and 1 and 3).
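One way to make the interaction idea concrete is to compare each group's change score (posttest minus pretest): if the groups' patterns over time differ, their change scores differ. The sketch below uses an independent t-test on change scores as a simple stand-in for the interaction term of a mixed model ANOVA; the data are invented, and scipy is assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 30

# Hypothetical pre/post scores: both groups improve, but the intervention
# group improves more (the pattern shown in Figure 4-6).
control_pre  = rng.normal(50, 8, n)
control_post = control_pre + rng.normal(3, 4, n)    # small improvement
interv_pre   = rng.normal(50, 8, n)
interv_post  = interv_pre + rng.normal(12, 4, n)    # larger improvement

# Change scores collapse the two time points into one number per person.
control_change = control_post - control_pre
interv_change  = interv_post - interv_pre

# A between-group t-test on change scores asks the interaction question:
# do the groups differ in how much they changed over time?
t, p = stats.ttest_ind(interv_change, control_change)
print(f"t = {t:.2f}, p = {p:.4f}")
```

If both groups improved by the same amount (the Figure 4-5 pattern), the change scores would not differ and p would tend to exceed 0.05.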
EXERCISE 4-2 Describing the Results From a Study (LO2 and LO3)
The graphs in Figures 4-7 through 4-11, which present data that can be analyzed with difference statistics, come from Hall et al's 1999 study, "Characteristics of the Functional Independence Measure in Traumatic Spinal Cord Injury." In all cases, the dependent variable is motor scores from the Functional Independence Measure. For each graph, write a sentence to describe which differences are compared. Consider whether the comparison involves a difference between groups, a difference between time points, or a combination.

QUESTIONS
1. FIGURE 4-7 Independent sample t-test (two groups, one time point): C6 and C8 spinal cord injury groups at admission.
2. FIGURE 4-8 Dependent sample t-test (one group, two time points): C6 spinal cord injury group at admission and discharge.
3. FIGURE 4-9 One-way ANOVA (four groups, one time point): C4, C6, C8, and thoracic spinal cord injury groups at admission.
4. FIGURE 4-10 Repeated measures ANOVA (one group, four time points): C6 spinal cord injury group at admission, discharge, 1 year post, and 2 years post.
5. FIGURE 4-11 Mixed model ANOVA (three groups, four time points): C4, C6, and C8 spinal cord injury groups at admission, discharge, 1 year post, and 2 years post.

Inferential Statistics for Analyzing Relationships
Statistics for analyzing relationships are based on correlations, or the degree to which two or more variables fluctuate together. The different types of correlational statistics are described here. Correlations are described in terms of strength, direction, and significance.
Correlations range in strength from 0 to 1.0 and are reported as r values (e.g., r = 0.35). The direction of a correlation can be either positive or negative. A positive correlation means that two variables are related in the same direction: as one variable increases, so does the other. For example, speed and distance are generally positively correlated, meaning that the faster you go, the more distance you can cover. In contrast, speed and time are negatively correlated: the faster you go, the less time it takes to reach a destination. As with other inferential statistics, a p value indicates whether the strength of the relationship is statistically significant.

Returning to the speed and distance example, if all other variables were held constant, these pairs would be perfectly correlated at r = 1.0 and r = –1.0, respectively. In other words, you could perfectly predict one variable if you knew the other. For example, if time is held constant at one half-hour and someone runs at 6 miles per hour, the distance covered would be 3 miles. If another person runs 3.4 miles in the same time period, that person's speed is 6.8 miles per hour. Similarly, if distance is held constant at 3 miles and someone runs 4 miles per hour, it will take that person 45 minutes to run the 3 miles. If another individual can run 6 miles per hour, that individual will cover 3 miles in only 30 minutes.

However, in most rehabilitation research, the relationships being examined are not so simple. Two variables may be related, but the relationship is less than perfect, and some of the variance is unaccounted for. For example, Kim, Kim, and Kim (2014) examined the relationship between activities of daily living and quality of life for individuals after stroke.
When correlating the Functional Independence Measure (FIM) and the Stroke Specific Quality of Life (SS-QOL) measure, they found a strong positive correlation of r = 0.80, which was statistically significant (p < 0.01). The total variance accounted for is simple to calculate: the shared variance equals the correlation squared (in this case, 0.80² = 0.64). Because correlations are less than 1.0, the shared variance is always smaller than the correlation itself. These findings suggest that functional independence accounts for 64% of the variance in quality of life, and the remaining 36% is related to other, unknown factors. A simple Venn diagram helps explain the meaning of the correlation: the overlapping area is the variance (64%) shared between the two variables, and the remaining 36% is unaccounted for and due to other factors (see Fig. 4-12).

FIGURE 4-12 Correlations and variance (relationship statistics): the overlap between functional independence and quality of life represents the shared variance (64%); the rest (36%) is variance unaccounted for.

Scatterplots for Graphing Relationships
Graphically, the relationship between two variables can be illustrated using a scatterplot, a graph of plotted points that shows the relationship between two sets of data. You can visually examine a scatterplot to determine whether a relationship is weak or strong and whether it is positive or negative. Using the preceding example, each person's scores on the two measures are plotted as a single point. Figure 4-13 illustrates different scatterplots.

FIGURE 4-13 Examples of scatterplot diagrams: A. Perfect positive relationship (r = +1.0). B. No relationship (r = 0.0). C. Perfect negative relationship (r = –1.0). D. Positive relationship (r ≈ +0.6). E. Negative relationship (r ≈ –0.6).

The simplest correlations ascertain the relationship between two variables.
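The r-to-shared-variance arithmetic above is easy to check in code. The sketch below correlates two invented measures and squares r to get the shared variance (scipy is assumed; the variable names and data are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n = 60

# Hypothetical scores: "quality of life" is built partly from "functional
# independence" plus noise, so the two are strongly but imperfectly related.
independence = rng.normal(100, 15, n)
quality_of_life = 0.6 * independence + rng.normal(0, 10, n)

r, p = stats.pearsonr(independence, quality_of_life)
shared_variance = r ** 2  # proportion of variance the two measures share

print(f"r = {r:.2f}, p = {p:.4f}, shared variance = {shared_variance:.0%}")
```

Squaring r converts the correlation into the proportion of overlap in the Venn diagram; whatever is left over is the variance due to other, unmeasured factors.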
When two continuous variables are correlated (e.g., speed and distance, age and grip strength, memory and attention), the statistic most commonly used is the Pearson product-moment correlation, an inferential statistic that examines the strength of the relationship between two continuous variables. A similar statistic is the Spearman correlation, an inferential statistic that examines the strength of the relationship between two variables when one or both of the variables is rank-ordered (e.g., burn degree). The results of correlations are often displayed in a table called a correlation matrix; From the Evidence 4-5 provides an example. In this study by Jeon (2013), the relationship between body mass index and several other variables was calculated.

FROM THE EVIDENCE 4-5 Correlation Matrix Examining Relationships Between Variables and Fall Efficacy
Jeon, B. J. (2013). The effect of obesity on fall efficacy in elderly people. Journal of Physical Therapy Science, 25(11), 1485–1489. doi:10.1589/jpts.25.1485.
[Correlation matrix among BMI, visceral fat area (VFA), pain, mobility, balance, age, gender, and falls efficacy (N = 351); significance is flagged at *p < 0.05, **p < 0.01, and ***p < 0.001.]
Note A: The strongest correlation is between BMI and VFA.
Note B: Many of the correlations are negative.
FTE 4-5 Question: Why is the relationship between BMI and VFA so much stronger than any of the other correlations?

Relationship Analyses With One Outcome and Multiple Predictors
Predicting outcomes is a major area of study in rehabilitation research. Researchers often want to know what factors have the greatest impact on a particular outcome.
In this case, multiple variables may be entered as predictors (sometimes referred to as independent variables) to determine which factors carry the most weight. The weight of each variable indicates the extent to which that variable is useful in predicting the outcome. Studies that predict outcomes use regression equations. A regression equation calculates the extent to which two or more variables predict a particular outcome. In linear regression, several predictors are entered into a regression equation to determine how well they predict an outcome of interest (sometimes referred to as a criterion). The outcome variable is typically continuous, such as scores on the Functional Independence Measure (FIM), degrees of range of motion, or scores on an assessment of social skills. The results are described in terms of the total contribution of all of the predictors, as well as the unique contribution of individual predictors.

For example, a linear regression may be used to predict the outcome of grades in a course. Predictors might include interest in the topic, number of hours studied, and previous research courses taken. The linear regression will indicate the strength of these predictors taken together in determining the outcome (i.e., do they account for a large proportion of the variance?) and how much each predictor uniquely contributes to the outcome. If the number of hours studied is the strongest predictor, you would recommend that students study more to do better in the class. In contrast, if the strongest predictor is previous research courses taken, perhaps an additional course should be required as a prerequisite. An important contribution of linear regression is that it allows for the removal of shared variance among predictors so that you can determine which predictors are most important.
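A linear regression along the lines of the course-grade example can be sketched with ordinary least squares. Everything here is invented for illustration (predictor names, coefficients, and data), and numpy's lstsq stands in for dedicated regression software:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 80

# Hypothetical predictors of a course grade.
interest      = rng.normal(5, 2, n)
hours_studied = rng.normal(10, 3, n)
prior_courses = rng.integers(0, 4, n).astype(float)

# Hours studied is built in as the strongest influence on the outcome.
grade = (50 + 1.0 * interest + 2.5 * hours_studied
         + 1.5 * prior_courses + rng.normal(0, 5, n))

# Least-squares fit: design matrix with an intercept column.
X = np.column_stack([np.ones(n), interest, hours_studied, prior_courses])
coefs, *_ = np.linalg.lstsq(X, grade, rcond=None)

# R^2: proportion of outcome variance the predictors account for together.
predicted = X @ coefs
r_squared = 1 - np.sum((grade - predicted) ** 2) / np.sum((grade - grade.mean()) ** 2)

print("coefficients (intercept, interest, hours, prior):", np.round(coefs, 2))
print(f"R^2 = {r_squared:.2f}")
```

R² reports the predictors' combined contribution; the size of each coefficient (relative to its scale and standard error) is what a full regression analysis would use to judge each predictor's unique weight.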
For example, a researcher may find that upper-body strength and range of motion together contribute 30% of the variance in FIM scores in individuals with stroke; however, strength contributes only 5% uniquely, and range of motion contributes only 7% uniquely. This indicates that strength and range of motion are highly intercorrelated. Perhaps the researcher adds cognition to the predictors and finds that this predictor contributes 15% of unique variance to the equation. As a therapist, you would then appreciate that both combined motor abilities and cognition are important predictors of function, suggesting that you should work on both motor skills and cognition to improve FIM scores.

The challenge of linear regression lies in choosing predictors. The amount of total variance accounted for provides a clue as to how effective the researcher was in the selection of predictors. When much of the variance is left unaccounted for, you can only speculate as to what those unknown predictors might be. In the preceding example, perhaps it is not the skills of the individual (motor and cognition) but something outside the person that is a better predictor of FIM outcomes, such as the amount of social support or the type of rehabilitation facility providing services.

Ambrose, Fey, and Eisenberg (2012) studied phonological awareness and print knowledge in children with cochlear implants. Phonological awareness is the ability to break words down into component syllables and sounds and to manipulate these abstract components, such as by building them back into words. These are essential skills for learning to read words. Print awareness involves recognition of letters, the sounds they produce, and their names. Ambrose et al conducted a linear regression analysis to determine if these skills were related to speech production and language abilities.
A separate regression equation was computed for each of the two criteria: phonological awareness and print knowledge. In other words, the researchers wanted to determine how much of the variability in the phonological awareness and print knowledge skills of preschoolers with cochlear implants could be predicted by knowledge of their ability to speak intelligibly and in grammatical sentences. Language composite, speech production, and speech perception were entered as predictors. The table and the narrative in From the Evidence 4-6 describe the contribution of the three predictors taken together, as well as the individual contribution of each predictor.

Logistic Regression and Odds Ratios
Logistic regression is a statistical method for analyzing a dataset in which the outcome is measured with a dichotomous variable (i.e., there are only two possible outcomes), such as employed vs. not employed or experienced a fall vs. did not fall. In this case, the study examines what predicts the dichotomous outcome, and the number that is reported is an odds ratio (an estimate of the odds when the presence or absence of one variable is associated with the presence or absence of another variable). When a single predictor is used that is also dichotomous, the odds ratio calculation is a simple one that uses a 2 × 2 table and the formula OR = AD/BC. In a hypothetical study that compares supported and transitional employment and has 50 individuals in each group, the 2 × 2 table looks like that shown in Table 4-3. The odds ratio is then calculated as OR = (40 × 25)/(10 × 25) = 1000/250 = 4.0.

FROM THE EVIDENCE 4-6 Linear Regression Table
Ambrose, S. E., Fey, M. E., & Eisenberg, L. S. (2012). Phonological awareness and print knowledge of preschool children with cochlear implants. Journal of Speech, Language, and Hearing Research, 55(3), 811–823. doi:10.1044/1092-4388(2011/11-0086).
Note A: The total contribution of all three variables and each individual predictor is expressed in terms of variance (R²). As expected, the combined variance is greater than the contribution of each individual predictor, which in this case is very small.

Regression 1: Phonological awareness (R² = .34)
Language composite: β = .324, t = 1.06, p = .301, sr² = .04
Speech production: β = .066, t = 0.23, p = .820, sr² < .01
Speech perception: β = .287, t = 1.33, p = .197, sr² = .06

Regression 2: Print knowledge (R² = .23)
Language composite: β = .243, t = 0.74, p = .468, sr² = .02
Speech production: β = .271, t = 0.88, p = .390, sr² = .03
Speech perception: β = –.006, t = –0.03, p = .978, sr² < .01

Note. For both regressions, all predictor variables were entered simultaneously in one step. sr² is the squared semipartial correlation coefficient, which represents the percentage of total variance in the criterion variable accounted for by each individual predictor variable with the other variables controlled.

For the first regression, TOPEL Phonological Awareness was entered as the criterion variable, and the language composite, speech production, and speech perception variables were entered simultaneously as predictors. This model was significant (F(3, 20) = 3.43, p = .037), with the predictor variables accounting for 34% of the variance in the CI group's phonological awareness abilities. None of the predictor variables contributed unique variance to phonological awareness after accounting for the variance that was shared with the other predictors. A second regression was conducted with TOPEL Print Knowledge entered as the criterion variable. Again, the language composite, speech production, and speech perception variables were entered simultaneously as predictors. Although the language and speech variables correlated significantly with TOPEL scores on their own, this model combining all three variables was not significant, F(3, 20) = 2.00, p = .146.
FTE 4-6 Question
In the phonological awareness linear regression, how can the total model accounting for 34% of the variance be statistically significant if none of the individual predictors contributes a unique amount of variance?

TABLE 4-3 Example of a Hypothetical Odds Ratio

Type of Employment Services    Employed    Not Employed
Supported                      A  40       B  10
Transitional                   C  25       D  25

Odds ratios are interpreted as the odds that a member of a group will have a particular outcome compared with the odds that a member of another group will have that outcome. In this example, individuals in supported employment are four times more likely than individuals in transitional employment to be employed. With an odds ratio, a value of 1 indicates that the odds are equal or no different for either condition. Odds greater than 1 indicate an increased chance, and odds less than 1 indicate a decreased chance of the outcome of interest. In logistic regression, many predictors can be examined as potentially affecting the outcome. Odds ratios are calculated for each predictor to determine which are most impactful. From the Evidence 4-7 shows a table from a study of children with physical disabilities that attempted to identify predictors associated with whether the child received postsecondary education (Bjornson et al., 2011). The different relationship statistics described in this chapter are presented in Table 4-4.

EFFECT SIZE AND CONFIDENCE INTERVALS

In addition to statistical significance, other statistics such as effect size and confidence intervals are useful in interpreting the results of a study and determining the importance of the findings in clinical practice or decision-making. Effect size (ES) describes the magnitude or strength of a statistic (i.e., the magnitude of the difference or the relationship).
Confidence interval (CI) is a reliability estimate that suggests the range of outcomes expected if an analysis were repeated. It is important to understand that statistical significance is not the same thing as clinical significance. Statistical significance is highly dependent on sample size; you are likely to find a difference between two groups and/or before and after an intervention when you have a large sample size, even if that difference is very small. If a study found a statistically significant improvement on the Functional Independence Measure (FIM), but that improvement was only a mean of 1.5 points, a therapist familiar with the measure would recognize that 1.5 points is not clinically relevant.

Another way to evaluate the magnitude of a difference is to look at its effect size. Not all researchers report the effect size, but it is becoming more valued as an important statistic, with some researchers asserting that it is more important than significance testing. A common effect size statistic that is easy to understand is Cohen's d. This effect size measures the difference between two group means reported in standard deviation units. The overall standard deviation is calculated by pooling the standard deviations from each group. For example, a comparison of a pretest to posttest difference with an effect size of d = 0.5 means the group changed one-half of a standard deviation. Similarly, if the effect size for an intervention and control group comparison was reported as 1.0, this difference could be interpreted as the intervention group improving one standard deviation more than the control group. Cohen (1988), who developed this statistic, indicates that 0.2 to 0.5 is a small effect, 0.5 to 0.8 is a medium effect, and anything greater than 0.8 is a large effect. Because Cohen's d is expressed in terms of standard deviations, the d value can be greater than 1.
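To make the arithmetic concrete, here is a minimal sketch of computing Cohen's d with a pooled standard deviation. The scores, group labels, and function name are invented for illustration and are not taken from any study cited in this chapter.

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d: the difference between two group means in pooled-SD units."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * stdev(group1) ** 2 +
                  (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5

# Hypothetical outcome scores for an intervention and a control group
intervention = [14, 16, 15, 18, 17, 16]
control = [12, 13, 14, 12, 15, 13]
d = cohens_d(intervention, control)  # positive d favors the intervention group
```

By Cohen's (1988) benchmarks, a d of 0.8 or greater for these hypothetical scores would be read as a large effect.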
Other effect sizes that you are likely to encounter are eta squared (η2) and omega squared (ω2). Usually, the author will provide an interpretation of the magnitude of the effect. In relationship studies, the r value is the effect size, as it expresses the strength of the relationship. If r = 0.87, this would indicate a large effect and a strong relationship, whereas r = 0.20 is a small effect and a weak relationship.

Another statistic that is useful in interpreting the meaningfulness of inferential statistics is the confidence interval. Whenever a statistic is calculated, such as a t- or F-statistic, an odds ratio, or an effect size, the result is an imperfect estimate that contains error. The potential impact of this error can be expressed by calculating the confidence interval of the statistic. Remember that with inferential statistics, the calculated mean from the sample is an estimate of the true value of the population, whereas a confidence interval is the range of values estimated for the population. A 95% confidence interval, which is most commonly reported in research studies, suggests that you can be 95% confident that the true mean of the population lies between those two values. Similarly, the 95% confidence interval of an odds ratio tells you the range within which you would expect the true population odds ratio to fall. As such, a smaller confidence interval is better than a large confidence interval because clinicians can more reliably predict an outcome when the range of scores is smaller. The confidence interval may also be interpreted as the range of scores you would expect to find if you were to conduct the study again.

FROM THE EVIDENCE 4-7 Predictors of Postsecondary Education for Children With Physical Disabilities

Bjornson, K., Kobayashi, A., Zhou, C., & Walker, W. (2011). Relationship of therapy to postsecondary education and employment in young adults with physical disabilities. Pediatric Physical Therapy, 23(2), 179–186.
doi:10.1097/PEP.0b013e318218f110.

Note A: The ability to use both hands is the strongest predictor. Children able to use both hands were 4.8 times more likely to obtain postsecondary education. Receiving OT or PT services was another strong predictor, whereas the use of expressive language was not a strong predictor.

Table 2. A Logistic Regression Model Describing the Relationship Between Therapy Services During Secondary Education Years and Report of Postsecondary Education Among Participants in the National Longitudinal Transition Study 2 Who Have Physical Disabilitiesa

Predictor of Interest                                      OR (95% CI)b             P
Intercept                                                  0.021 (3.1E–6–144.1)     .386
Has received any OT or PT                                  3.2 (1.133–9.151)        .029
Gender                                                     0.5 (0.198–1.286)        .149
Age                                                        0.8 (0.542–1.21)         .297
Is able to use both arms and hands normally for
  holding things like a pencil or spoon                    4.8 (1.863–12.174)       .001
Has increased self-care skills                             1.3 (1.065–1.705)        .014
Higher parental education                                  2.1 (1.494–3.029)        <.001
Has increased expressive language                          1.1 (0.608–1.93)         .783
Uses a wheelchair for mobility                             1.4 (0.599–3.14)         .449
Has any health insurance                                   1.4 (0.309–6.178)        .667
Child has a high school diploma or GED                     3.1 (1.157–8.328)        .025
Has increased overall social skills                        1.2 (1.075–1.329)        .001

Abbreviations: CI, confidence interval; GED, General Educational Development test; OR, odds ratio; OT, occupational therapy; PT, physical therapy.
aValues in bold represent factors that are significantly associated with the outcomes of interest.
bOdds ratios are the odds of exposure (eg, to PT or OT) to each predictor among people who did develop the outcome (eg, who did receive postsecondary education) divided by the odds of exposure among people who did not develop the outcome. The odds ratio expresses the likelihood that an exposure of interest is related to the outcome. The 95% confidence interval (CI) expresses the precision of this estimate.

FTE 4-7 Question
How would you interpret the predictor "child has a high school diploma or GED"?
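The hand calculation OR = AD/BC from Table 4-3 can be sketched in a few lines of code. This is a minimal illustration using the hypothetical employment counts from the table; the function name is my own.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2 x 2 table: OR = (A * D) / (B * C)."""
    return (a * d) / (b * c)

# Hypothetical counts from Table 4-3:
# supported employment:    A = 40 employed, B = 10 not employed
# transitional employment: C = 25 employed, D = 25 not employed
or_value = odds_ratio(40, 10, 25, 25)  # → 4.0
```

A value of 4.0 reproduces the interpretation in the text: the odds of being employed are four times greater with supported employment than with transitional employment.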
TABLE 4-4 Application of Relationship Statistics

Statistic: Pearson product-moment correlation
Purpose: Describe the relationship between two continuous variables.
Sample research designs that would use the statistic: Nonexperimental relationship studies

Statistic: Spearman correlation
Purpose: Describe the relationship between two variables when at least one variable is rank-ordered.
Sample research designs that would use the statistic: Nonexperimental relationship studies

Statistic: Linear regression
Purpose: Identify predictors of a continuous outcome measure.
Sample research designs that would use the statistic: Nonexperimental predictive studies

Statistic: Logistic regression
Purpose: Identify predictors of a categorical outcome measure.
Sample research designs that would use the statistic: Nonexperimental predictive studies

EXERCISE 4-3 Interpreting Confidence Intervals (LO3)
Although the odds ratio for use of both hands and postsecondary education is very strong at 4.8, the 95% confidence interval is also large, ranging from 1.86 to 12.17. How might the confidence interval affect your interpretation of the findings?

EXERCISE 4-4 Matching the Statistics with the Research Question (LO4)
QUESTIONS
What statistic would likely be used in studies designed to answer the following research questions?
1. What is the difference in memory impairment for individuals with early Alzheimer's disease and same-aged adult peers?
2. Which is a better predictor of earnings for individuals with schizophrenia: cognition, symptoms, or educational background?
3. Is mirror therapy or acoustic feedback more effective in reducing neglect in individuals with stroke?

CRITICAL THINKING QUESTIONS
1. Why is it more difficult to find a significant difference between groups when the standard deviations are large?
2. Explain the difference between an independent sample and a dependent sample t-test.
3. What types of research results would be analyzed using the different types of ANOVAs?
4. What are some examples from rehabilitation research in which you would expect to find negative correlations?
5. What information can a regression equation provide (both linear and logistic) that goes beyond the information you get from a simple two-variable correlation?
6. Why are effect sizes and confidence intervals sometimes preferred over statistical significance when it comes to making clinical interpretations of a study?

ANSWERS

EXERCISE 4-1
1. The mean will show the difference in height for the two groups (the basketball players being a taller group). However, the standard deviation will be important to illustrate the variability of the samples, with the basketball players as a group having less variability because they will tend to all be tall, while the mixed group of basketball players, jockeys, and baseball players will include shorter individuals, thus increasing variability.
2. The median will be a more useful statistic to describe length of stay if there are a few outliers that would skew the mean.
3. A frequency distribution provides a visual display that illustrates the variability of the sample and also provides a general perspective as to the central tendencies of the distribution.

EXERCISE 4-2
There may be some differences in the interpretations of the graphs, but in general the conclusions should be similar.
1. At admission, individuals with C8 SCI have higher FIM scores than individuals with C6 SCI.
2. Individuals with C6 SCI more than double their FIM scores from admission to discharge.
3. The lower the SCI, the greater the FIM score at admission. There is a particularly marked difference between C6 and C8 and between C8 and thoracic, with admission scores for C4 and C6 being more similar.
4. The greatest improvement in FIM scores for C6 SCI occurs from admission to discharge, but there is continuous improvement through 2 years after discharge.
5. Individuals with C4, C6, and C8 SCI improve over time, but there are some differences in the patterns of improvement. For example, those with C8 SCI make the greatest amount of improvement, and most of their improvement occurs between admission and discharge and then levels off. Patients with C4 SCI have a slight decrease in FIM scores from 1 year to 2 years after discharge.

EXERCISE 4-3
The large confidence interval suggests that the results could be unstable or imprecise and that, if a second study were conducted, you might get very different results. Although the odds ratio may be even larger, the low end at 1.86 suggests that this predictor may not actually be so important. These results speak to the importance of replication and why another study would be useful to determine if the results from the first study hold up.

EXERCISE 4-4
1. Independent sample t-test, because you are comparing two groups of individuals at one point in time.
2. Multiple linear regression, because you are looking at several predictors for a continuous outcome measure. If the outcome was a dichotomous variable of employment, you could analyze the results using logistic regression.
3. Mixed model ANOVA, because you would want to know the interaction effect of the between-group comparison of groups across time from baseline to posttesting.

FROM THE EVIDENCE 4-1
At pretest, one child had 60% ability; at posttest, 5 children had 60% ability.

FROM THE EVIDENCE 4-2
The standard deviations for the Parkinson's disease and multiple sclerosis groups extend to almost the same point as the control group. Although the means are different between the groups, this indicates that many individuals in the PD and MS groups will perform similarly to the control group.

FROM THE EVIDENCE 4-3
1. It must be a dependent sample t-test because the pretest and posttest scores are compared. This study utilized a one-group pretest-posttest design.
2. The N stands for the number of participants in the study. The N varies because there were different numbers of individuals for whom data were available for the different tests.

FROM THE EVIDENCE 4-4
Although the high-competence group had better scores than the low-competence group at both time points, there was a similar pattern of scores, in that both groups improved at a comparable rate from 9 years of age to age 12.

FROM THE EVIDENCE 4-5
Logically, you would expect a person's body mass index (BMI) and the amount of fat on that individual to be closely related; the other variables, such as BMI and balance, may be related, but you would expect there to be many other variables that remain unaccounted for, such as level of physical activity, medications taken, or comorbid conditions associated with balance problems.

FROM THE EVIDENCE 4-6
You could interpret this finding to mean that language composite, speech production, and speech perception are all important for phonological awareness, but that these abilities are highly intercorrelated and are not distinct in terms of their ability to explain phonological awareness.

FROM THE EVIDENCE 4-7
If the child has a high school education or GED, he or she is more than three times as likely to receive postsecondary education. This is understandable, as one typically must finish high school before going on to postsecondary education. However, sometimes we want the research to verify what we assume is apparent.

REFERENCES
Ambrose, S. E., Fey, M. E., & Eisenberg, L. S. (2012). Phonological awareness and print knowledge of preschool children with cochlear implants. Journal of Speech, Language, and Hearing Research, 55, 811–823.
Autism Speaks. (n.d.).
Frequently asked questions. Retrieved from http://www.autismspeaks.org/what-autism/faq
Bjornson, K., Kobayashi, A., Zhou, C., & Walker, W. (2011). Relationship of therapy to postsecondary education and employment in young adults with physical disabilities. Pediatric Physical Therapy, 23, 179–186.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
DiZazzo-Miller, R., Samuel, P. S., Barnas, J. M., & Welker, K. M. (2014). Addressing everyday challenges: Feasibility of a family caregiver training program for people with dementia. American Journal of Occupational Therapy, 68(2), 212–220. doi:10.5014/ajot.2014.009829
Haga, M. (2009). Physical fitness in children with high motor competence is different from that in children with low motor competence. Physical Therapy, 89, 1089–1097.
Hall, K. M., Cohen, M. E., Wright, J., Call, M., & Werner, P. (1999). Characteristics of the Functional Independence Measure in traumatic spinal cord injury. Archives of Physical Medicine and Rehabilitation, 80, 1471–1476.
Jeon, B. J. (2013). The effect of obesity on fall efficacy in elderly people. Journal of Physical Therapy Science, 24, 1485–1489.
Kim, K., Kim, Y. M., & Kim, E. K. (2014). Correlation between the activities of daily living of stroke patients in a community setting and their quality of life. Journal of Physical Therapy Science, 26, 417–419.
Pegorari, M. S., Ruas, G., & Patrizzi, L. J. (2013). Relationship between frailty and respiratory function in the community-dwelling elderly. Brazilian Journal of Physical Therapy, 17, 9–16.
Shakeri, H., Keshavarz, R., Arab, A. M., & Ebrahimi, I. (2013). Clinical effectiveness of kinesiological taping on pain and pain-free shoulder range of motion in patients with shoulder impingement syndrome: A randomized, double-blinded, placebo-controlled trial. International Journal of Sports and Physical Therapy, 3, 800–810.
Tjaden, K., & Wilding, G. (2011).
Speech and pause characteristics associated with voluntary rate reduction in Parkinson's disease and multiple sclerosis. Journal of Communication Disorders, 44, 655–665.
Watson, A. H., Ito, M., Smith, R. O., & Andersen, L. T. (2010). Effect of assistive technology in a public school setting. American Journal of Occupational Therapy, 64, 18–29.

CHAPTER 5
Validity: What Makes a Study Strong?

"The world will not stop and think—it never does, it is not its way; its way is to generalize from a single sample." —Mark Twain

CHAPTER OUTLINE
LEARNING OUTCOMES
KEY TERMS
INTRODUCTION
VALIDITY
STATISTICAL CONCLUSION VALIDITY
  Threats to Statistical Conclusion Validity
    Fishing
    Low Power
INTERNAL VALIDITY
  Threats to Internal Validity
    Assignment and Selection Threats
    Maturation Threats
    History Threats
    Regression to the Mean Threats
    Testing Threats
    Instrumentation Threats
    Experimenter and Participant Bias Threats
    Attrition/Mortality Threats
EXTERNAL VALIDITY
  Threats to External Validity
    Sampling Error
    Ecological Validity Threats
INTERNAL VERSUS EXTERNAL VALIDITY
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Detect potential threats to statistical conclusion validity in published research.
2. In a given study, determine if the researcher adequately managed potential threats to statistical conclusion validity.
3. Detect potential threats to internal validity in published research.
4. In a given study, determine if the researcher adequately managed potential threats to internal validity.
5. Detect potential threats to external validity in published research.
6. In a given study, determine if the researcher adequately managed potential threats to external validity.
KEY TERMS

alternative treatment threat, assignment threat, attrition, Bonferroni correction, compensatory demoralization, compensatory equalization of treatments, convenience sampling, covary, ecological validity, effectiveness study, efficacy study, experimenter bias, external validity, fishing, Hawthorne effect, history threat, instrumentation threat, internal validity, matching, maturation threat, mortality, order effect, participant bias, power, practice effect, Pygmalion effect, random assignment, random sampling, regression to the mean, replication, response rate, Rosenthal effect, sampling error, selection threat, statistical conclusion validity, testing effect, validity

INTRODUCTION

The purpose of this chapter is to make sure that you don't do what Mark Twain suggests in the opening quote and generalize from single research samples. You will learn how to stop and think about data both from a single sample and multiple samples, with the goal of using the information in evidence-based practice. When evaluating the strength of evidence, there are certain axioms that practitioners tend to rely on, such as "Randomized controlled trials are the strongest design" and "Large sample sizes provide more reliable results." Although true in many cases, there can be exceptions. To be a critical consumer of research, it is essential to understand the whys behind these assertions. Why is a large sample size desirable? Why are protections inherent in randomized controlled trials? Even with a large-sample, randomized, controlled trial, other factors may compromise the validity of the study. This chapter explains the concept of validity, describes different threats to validity, and identifies possible solutions to these threats. This information will increase your ability to critically appraise research.
If you have a good grasp of the possible threats to validity and the ways in which these threats can be managed, you will be able to evaluate the strength of evidence and become an evidence-based professional.

VALIDITY

When thinking about the validity of a study, consider the terms truthfulness, soundness, and accuracy. Validity is an ideal in research that guides the design, implementation, and interpretation of a study. The validity of a study is enhanced when sound methods allow the consumer to feel confident in the findings. The validity of a study is supported when the conclusions drawn are based on accurate interpretations of the statistics and not confounded with alternative explanations. The inferences that are drawn from a study will have greater validity if they are believable and reflect the truth. This chapter describes three types of research validity: (1) statistical conclusion validity, (2) internal validity, and (3) external validity. Chapter 6 addresses a different type of validity concerned with assessments used by researchers.

STATISTICAL CONCLUSION VALIDITY

Statistical conclusion validity refers to the accuracy of the conclusions drawn from the statistical analysis of a study. Recall that with most inferential statistics, a p value is calculated; conventionally, if the p value is < 0.05, the conclusion is one of statistical significance (i.e., there is a statistically significant difference or there is a statistically significant relationship). As an evidence-based practitioner, you may have reasons to question the researchers' conclusions that are presented in a research article.

Threats to Statistical Conclusion Validity

In Chapter 3, mistaken statistical conclusions were described in terms of Type I and Type II errors. As an evidence-based practitioner, you can identify potential errors by increasing your awareness of research practices that lead to error.
Specific threats to statistical conclusion validity, their relationship to error type, and methods researchers use to protect research from those threats are described in this chapter. Table 5-1 outlines the threats to statistical conclusion validity, confounding factors that interfere with statistical conclusion, and methods for protecting against these threats.

TABLE 5-1 Threats to Statistical Conclusion Validity and Their Protections

Threat: Fishing (Type I error)
Confounding factor that interferes with statistical conclusion:
• Researcher searches data for interesting findings that go beyond the initial hypotheses.
• Conclusions may be due to chance.
Protection:
• Use statistical methods that adjust for multiple analyses.
• Conduct a second study to test the new hypothesis with different participants.

Threat: Low power (Type II error)
Confounding factor that interferes with statistical conclusion:
• A difference or relationship exists, but there is not enough statistical power to detect it.
Protection:
• Increase alpha level.
• Ensure that intervention is adequately administered to obtain optimal effect size.
• Increase sample size.

Fishing

Fishing is a euphemism that refers to looking for findings that the researcher did not originally plan to explore. Ideally, when a researcher conducts a study, a hypothesis is developed before collecting data. Once the data are collected, a statistical analysis is applied to test the hypothesis. However, not infrequently, researchers will explore existing data in what is sometimes called a "fishing expedition" or "mining for data." In other words, the researcher is letting the data lead the way toward interesting findings. Although there are legitimate reasons for delving into the data, the risk in fishing is that the researcher will see interesting differences or relationships that may not be true and instead are due only to chance. In other words, the researcher has committed a Type I error by finding a difference that does not exist.
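The inflation of Type I error risk with repeated analyses can be quantified. Assuming the tests are independent (a simplification), the chance of at least one false positive across k tests is 1 − (1 − α)^k, and the standard Bonferroni correction divides alpha by k. A minimal sketch; the function names are my own:

```python
def familywise_error(k, alpha=0.05):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(k, alpha=0.05):
    """Bonferroni-corrected per-test alpha for k comparisons."""
    return alpha / k

# With six analyses each run at alpha = .05, the chance of at least one
# spurious "significant" finding is roughly 26%, not 5%.
risk = familywise_error(6)
corrected = bonferroni_alpha(6)  # 0.05 / 6, about 0.0083
```

The jump from 5% to roughly 26% with only six analyses is why unplanned, repeated testing of the same dataset invites Type I errors.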
Typically many analyses are conducted in a researcher's search for findings. Previously, you learned that when alpha is set at 0.05, the researcher is willing to take a 5% risk that the difference or relationship is not true but is due to chance. However, this applies only to a single analysis. Each time another analysis is performed, there is a greater risk that the finding is due to chance. Researchers often explore their data for unexpected findings, which can lead to important discoveries. However, protections should be in place so that chance findings are not misleading.

Protection Against Fishing Threats

As an evidence-based practitioner, you may suspect that a fishing expedition has occurred when the results of the study are not presented in terms of answers to a research hypothesis. A straightforward researcher may acknowledge the exploration and, if it is a robust study, will describe how threats to Type I error were addressed. One way that researchers can protect against fishing threats is to use statistical procedures that take into account multiple analyses. There are many such procedures, but the simplest one conceptually is the Bonferroni correction. With the Bonferroni correction, the alpha level of 0.05 is adjusted by dividing it by the number of comparisons. For example, if six comparisons were made, 0.05/6 = 0.0083, meaning that the acceptable alpha rate is much lower and much more conservative than the initial 0.05. Another method that protects against fishing threats involves conducting another study to test the new hypothesis discovered when the data were explored. For example, consider a researcher who tested a new intervention and found that the initial analyses did not show the intervention to be more effective than the control. However, upon deeper analysis, the researcher discovered that men experienced a significant benefit, whereas women stayed the same.
A new study could be conducted to test this hypothesis. If the second study resulted in the same findings, there would be stronger evidence to conclude that only men benefit from the intervention.

Low Power

Power is the ability of a study to detect a difference or relationship. Power is based on three things: sample size, effect size, and alpha level. The larger the sample is, the more powerful the study is. It is easier to detect a difference when you have many participants. Likewise, if you have a large effect, you will have greater power. If an intervention makes a major difference in the outcome, it will be easier to detect that difference than if an intervention makes only a minor difference. Recall from Chapter 3 that a Type II error occurs when no difference is found, but in actuality a difference is present. This occurs because of low power and is most often the result of small sample size. When you review a study with a small sample size that does not find a difference or a relationship, low power is a potential threat to statistical conclusion validity. However, it is also possible that, even with a large sample, the researcher does not find a difference or relationship.

Protection Against Low Power Threats

Power can be increased by changes in the alpha level, effect size, or sample size. In exploratory analyses, the researcher may utilize a higher alpha level, such as 0.10 instead of 0.05; however, in doing so, the researcher takes a greater chance of making a Type I error. It is more difficult to change the effect size, but the researcher needs to ensure that everything is in place to test whether the intervention is effective (e.g., trained individuals administer the intervention, strategies that foster adherence are used, etc.). The simplest way to increase the power of a test is to increase sample size. However, it can be costly in terms of both time and resources to conduct a study with a large sample.
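The dependence of power on sample size can be illustrated with a rough normal-approximation formula for a two-group comparison at a two-tailed alpha of .05. This is a simplified sketch for illustration (a real power analysis would typically use dedicated software); the numbers are invented:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def approx_power(n_per_group, effect_size):
    """Approximate power to detect Cohen's d with n participants per group."""
    z_crit = 1.96  # two-tailed critical value for alpha = .05
    # How far the expected test statistic falls from zero:
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return 1 - norm_cdf(z_crit - noncentrality)

# For a medium effect (d = 0.5), 20 per group is badly underpowered,
# while about 64 per group reaches the conventional 80% power.
low = approx_power(20, 0.5)       # roughly 0.35
adequate = approx_power(64, 0.5)  # roughly 0.80
```

Holding the effect size and alpha fixed, power rises only with sample size, which is why small studies that report "no difference" should be read with a Type II error in mind.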
Researchers often conduct a power analysis to determine the smallest sample possible to detect an effect given a set alpha level and estimated effect size. The potential for Type II errors provides a strong rationale for using large samples in studies. With a large sample, a researcher is unlikely to make a Type II error. However, there are additional benefits to having a large sample size. With a large sample, outliers are less likely to skew the results of a study. For example, consider the average of the following six scores on an assessment: 5 ⫹ 4 ⫹ 5 ⫹ 3 ⫹ 26 ⫹ 5 ⫽ 48/6 ⫽ 8 The score of 26 is an outlier, when considering the other scores. The mean score for this sample of 6 is 8. When you look at each individual participant, 8 is a much higher score than the majority of the participants received. A single outlier misrepresents the group as a whole. Now consider a sample of 40 participants: sample represent the population, but it shows that more individuals are likely to respond when invited to complete the survey. The response rate is the number of individuals who respond to a request to participate in a survey or other research endeavor. The larger the response, the more accurate the results. In the case of survey research, the response rate is determined by dividing the number of surveys that were completed by the number of surveys that were administered. For example, if 200 surveys were sent out, and 150 people completed and returned them, the response rate would be 150/200 = 75%. Individuals who choose not to participate in a study may opt out of the study for a particular reason and, in doing so, bias the results. For example, if you are conducting a satisfaction survey for your therapy program and only 25% of your clients respond, it is possible that the individuals who responded are either highly dissatisfied or highly satisfied and therefore more motivated to voice their opinions. 
EXERCISE 5-1 Identifying Threats to Statistical Conclusion Validity (LO1 and LO2) Read the following scenario and identify which practices present potential threats to statistical conclusion validity. Suggest methods for controlling these threats. A new researcher who is a therapist wants to collect data to examine the efficacy of three different orthoses for a specific hand condition. The researcher plans to recruit clients from her clinic and expects that approximately 10 individuals will have the hand condition of interest. The following outcomes will be measured: pain, range of motion, fine motor control, and function. The researcher has no expectation as to which orthosis will provide the better outcome. QUESTIONS 1. Why are there fishing threats? 5 ⫹ 4 ⫹ 5 ⫹ 3 ⫹ 26 ⫹ 4 ⫹ 3 ⫹ 4 ⫹ 4 ⫹ 5 ⫹ 4 ⫹ 5⫹5⫹4⫹5⫹3⫹3⫹4⫹3⫹4⫹4⫹5⫹ 4⫹5⫹5⫹4⫹5⫹3⫹4⫹3⫹3⫹4⫹5⫹ 4 ⫹ 3 ⫹ 5 ⫹ 4 ⫹ 3 ⫹ 3 ⫹ 4 ⫽ 183/40 ⫽ 4.58 The outlier has a weaker effect on the group as a whole, and the mean for this sample is more in line with the typical scores. Another benefit of a large sample is that, the larger the sample, the more likely it is that the sample will represent the population. This fact is particularly relevant for survey research. Not only will a large 4366_Ch05_081-102.indd 84 2. How could the researcher address the fishing threats? 27/10/16 5:14 pm CHAPTER 5 ● Validity 3. Why are there threats due to low power? 4. How could the researcher address the low power threats? 85 However, there is always the possibility that there is an alternative explanation for the study results. Perhaps the difference was due to chance. Or it could be that the attention the children received is what made the difference, and not the intervention itself. Perhaps the individuals who administered the outcome assessments were biased and tended to give higher scores to the individuals in the intervention group. 
INTERNAL VALIDITY

When evaluating a study for evidence, it is necessary to consider internal validity and how it may affect the study outcomes. A study has internal validity when the conclusions drawn from the results are accurate and true. Validity is not an either/or situation, but rather a matter of degree. For example, suppose a study that examines the effectiveness of social stories in children with autism concludes that children in the intervention group had greater improvement in their social skills than children in the control group. If the study were internally valid, this would mean that it was truly the social stories intervention that improved the social skills. Although you can never be certain that the results of a study are entirely accurate, certain features of the study can greatly increase your confidence in its accuracy and validity.

When examining internal validity, ask yourself, "Is there an alternative explanation for these study results?" Alternative explanations are often referred to as "threats" to internal validity.

Threats to Internal Validity

This section of the chapter characterizes common threats to internal validity, describes protections or solutions to avoid or minimize those threats, and identifies types of research situations in which these threats are most likely to occur. Table 5-2 summarizes the threats to internal validity and their protections.

Assignment and Selection Threats

Threats to internal validity can occur when a bias is present during the process of assigning or selecting study participants.

TABLE 5-2 Threats to Internal Validity and Their Protections

Maturation
  Confounding factor/alternative explanation: Changes occur over time in participants as a result of development or healing.
  Protections: Use control groups. Ensure baseline equivalence through random assignment or participant matching.

Assignment/Selection
  Confounding factor/alternative explanation: Groups are not equal on some important characteristics.
  Protections: Random assignment. Participant matching. Statistical procedures such as analysis of covariance.

History
  Confounding factor/alternative explanation: Events occur between the pretest and posttest.
  Protections: Use control groups. Ensure a short time between pretest and posttest. Protect against exposure to alternative therapies.

Regression to the mean
  Confounding factor/alternative explanation: Extreme scores change and move toward the mean with repeated testing.
  Protections: Use control groups. Exclude outliers. Take the average of multiple measurements.

Testing/practice/order effects
  Confounding factor/alternative explanation: Performance on measures changes due to exposure or some other feature of the testing experience.
  Protections: Use control groups. Use measures with good test-retest reliability. Use alternate forms of measures. Counterbalance the order of measures. Take breaks if fatigue is anticipated.

Instrumentation
  Confounding factor/alternative explanation: Invalid or unreliable measures, tester error, or poor condition of the instrument result in inaccurate outcomes.
  Protections: Use measures with good reliability and validity. Use measures that are sensitive to change. Train the testers. Maintain the instruments. Blind the tester.

Participant and Experimenter Bias Threats:

Rosenthal/Pygmalion effect
  Confounding factor/alternative explanation: Intervention leaders expect participants to improve, or participants expect to improve. When a control group is involved, the expectation is that the experimental group will perform better than the control group.
  Protections: Blind the intervention leaders and/or participants. Limit contact between intervention and control leaders and participants. Establish and follow protocols for interventions.

Compensatory equalization of treatment
  Confounding factor/alternative explanation: The intervention leader's behavior encourages participants in the control group to improve to equal the intervention group, or control group participants are motivated to compete with the intervention group.
  Protections: Blind intervention leaders and/or participants. Limit contact between intervention and control leaders and participants.

Compensatory demoralization
  Confounding factor/alternative explanation: The intervention leader's behavior discourages the control group, or participants are discouraged because of being in the control group.
  Protections: Blind intervention leaders and/or participants. Limit contact between intervention and control leaders and participants.

Hawthorne effect
  Confounding factor/alternative explanation: Participants improve because of attention received from being in a study.
  Protections: Blind intervention leaders and/or participants. Ensure equal attention for intervention and control groups.

Attrition/Mortality
  Confounding factor/alternative explanation: Participants who drop out affect the equalization of the groups or other characteristics of participants.
  Protections: Employ strategies to encourage attendance or participation. Use statistical estimation for missing data. Perform an intent-to-treat analysis.

If there are differences between the groups on a baseline characteristic, this difference might affect the outcomes of the study. Demographic characteristics, illness/condition issues, medication, and baseline scores on the outcome measures are examples of variables that should be considered when examining assignment and selection threats, because these characteristics can account for differences in outcome.
An assignment threat indicates that the bias occurred when groups were assigned, whereas a selection threat indicates that the bias occurred during selection—either the selection of participants or the selection of sites. For example, Killen, Fortmann, Newman, and Varady (1990) found that men responded better than women to a particular smoking cessation program. If at baseline there were more men in the intervention group and more women in the control group, the study would be biased toward finding more positive results than would exist with an equal distribution of gender across the two groups. In another example, a splinting study for cerebral palsy may find that one group is receiving more antispasticity medication than another. The medication could then account for some or all of the differences in the outcomes.

Comparisons of the intervention and control groups at baseline should be provided at the beginning of the results section of a study so the reader can determine if there are differences between the groups. A table is typically provided that compares the groups on important demographic characteristics as well as the baseline scores of the outcome variables. An example of this type of comparison is shown in From the Evidence 5-1. The table comes from an intervention study examining the efficacy of an education and exercise program to reduce the chronicity of low back pain; it compares the intervention and control groups at baseline (del Pozo-Cruz et al, 2012).

Protection Against Assignment Threats

Random assignment is the primary protection method used by researchers against assignment threats. In random assignment to groups, each research participant has an equal chance of being assigned to any of the available intervention or control groups.
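As an illustration of the idea (not code from any study cited here), random assignment can be sketched in a few lines: shuffle the recruitment list so every participant has the same chance of landing in either group, then split it:

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the recruitment list and split it into two equal-chance
    groups: intervention and control."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)                  # every ordering is equally likely
    half = len(pool) // 2
    return pool[:half], pool[half:]    # (intervention, control)

participants = [f"P{i:02d}" for i in range(1, 21)]   # 20 hypothetical clients
intervention, control = randomly_assign(participants, seed=42)
print(intervention)
print(control)
```

Matching works the same way at the level of pairs: after matching two participants on the key variable, a coin flip (one call to the random generator) sends one member of each pair to each group.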
There are times when researchers are reluctant to use random assignment to a no-intervention control group, because it may be considered unethical to withhold treatment from a group. Sometimes this concern is managed by using a wait-list control group. The control group eventually receives the intervention, but not until the intervention group has completed the treatment.

Random assignment does not guarantee equal distribution. This is particularly true with small samples, in which extreme values from a few individuals can greatly influence the group results. Random assignment works particularly well with larger samples, because larger samples are more likely to form equivalent groups. Therefore, when evaluating evidence, examine the group comparisons presented in the results section of the study.

Sometimes researchers use strategies such as matching study participants to ensure equal distribution on a particularly important characteristic. For example, if the researcher knows that the outcomes are likely to be influenced by a characteristic such as level of education, symptom severity, or medication, potential participants are identified and matched on the variable of interest, with one member of each pair randomly assigned to the intervention group and the other randomly assigned to the control group.

Another procedure that can be used to minimize assignment threats is statistical equalization of groups. If one group is older than another group, age can be covaried in the statistical analysis so that it does not influence the outcomes. If, when reading the beginning of the results section, you find that the groups are not equal on one or more important characteristics (which sometimes occurs, even with random assignment), check to see if the researcher handled this by covarying that variable, or at least acknowledged the difference in the limitations section of the discussion.

FROM THE EVIDENCE 5-1 Table Comparing Study Groups

del Pozo-Cruz, B., Parraca, J. A., del Pozo-Cruz, J., Adsuar, J. C., Hill, J., & Gusi, N. (2012). An occupational, Internet-based intervention to prevent chronicity in subacute lower back pain: A randomized controlled trial. Journal of Rehabilitation Medicine, 44(7), 581–587. doi:10.2340/16501977-0988.

Table I. Baseline Characteristics of Participants in the Study (n = 90)

                                         Control group (n = 44)   Intervention group (n = 46)   p
Age (years), mean (SD)                   45.50 (7.02)             46.83 (9.13)                  0.44
Sex, % male/female                       11.4/88.6                15.2/84.8                     0.59
Smokers, yes/no, %                       50/50                    56.5/43.5                     0.53
Roland Morris Questionnaire, points      11.65 (2.14)             12.28 (2.63)                  0.22
TTO, points                              0.78 (0.08)              0.75 (0.11)                   0.23
SBST total score, points                 4.38 (1.67)              4.36 (1.28)                   0.95
SBST psychological score, points         2.36 (1.03)              2.28 (0.98)                   0.70

p-values from t-test for independent measures or chi-square test. TTO: Time Trade Off; SBST: STarT Back Tool; SD: standard deviation.

Note A: The SBST is an outcome measure for the study.
Note B: The p value is above 0.05 for all comparisons, indicating that the two groups are comparable (i.e., there are no statistically significant differences) at baseline. This is particularly true of the SBST total score.

FTE 5-1 Question 1: Are the two groups equivalent on all key characteristics—both demographic variables and outcome variables? How do you know?

Maturation Threats

Maturation is a potential threat in intervention research involving health-care practitioners. Maturation refers to changes that occur over time in research participants.
Two major types of maturation threats are particularly common in health-care research: (1) changes that occur as part of the natural growth process, which is particularly relevant for research with children; and (2) changes that occur as a result of the natural healing process, which is particularly relevant for research related to diseases and conditions in which recovery is expected. In other words, is it possible that, if left alone, the research participants would have changed on their own? Maturation is of greatest concern when the time period between the pretest and posttest is prolonged, such as during longitudinal studies or studies with long-term follow-up.

To illustrate the maturation threat, consider a study that examines an intervention for children with language delays. The study finds an improvement in language from the pretest to the posttest; however, without adequate protection from other influences, it is difficult to determine whether the intervention caused the improvement or the change occurred as a result of developmental changes in language. Maturation would be an even greater concern if the study extended over a significant period of time, such as throughout a school year. Similarly, an intervention study examining changes in mobility for individuals after hip replacement would need to take the maturation threat into account, because individuals can experience improved mobility without therapy.

Maturation is in play whether the natural changes are positive or negative. When conditions result in a natural decline, the goal of therapy is often to reduce the speed with which that decline occurs. For example, if a therapist is using a cognitive intervention for individuals with Alzheimer's disease, it would be challenging to determine if a decline were less severe than would have occurred naturally over the course of the illness.
However, studies can be designed with the proper protections to determine whether a particular intervention slows the natural course of a decline in functioning.

Protections Against Maturation Threats

The primary protection against maturation threats is the use of a control group. If the intervention group improves more than the control group, the difference between the two groups is more likely to be due to the intervention, even if both groups improve over time. The degree of improvement that the intervention group makes above and beyond the control group is likely due to the intervention.

Another protection against maturation threats is outcome scores that are similar at baseline for the control and intervention groups. This allows you to be more confident that the groups start out at a similar place, and it makes interpretations of changes from pretest to posttest more straightforward. Random assignment and matching of participants are additional strategies that increase the likelihood that the groups will be equal at baseline. (Random assignment and matching are described in detail in the section on assignment and selection threats.) In the results section of a research study, typically the first report of results is the comparison of the intervention and control groups; this includes pretest scores on the outcomes of interest and demographic variables that could affect the findings. Finally, you can be more certain that maturation is not a factor when the time between the pretest and posttest is short and when it is unlikely that changes would occur without an intervention.

History Threats

A history threat involves changes in the outcome or dependent variable due to events that occur between the pretest and posttest, such as a participant receiving an unanticipated treatment or being exposed to an activity that affects the study outcome. In this case, the threat may also be referred to as an alternative treatment threat.
For example, participants in a fall prevention program may start attending a new senior center that provides exercise classes with an emphasis on strength and balance. In fact, any external event that can affect the dependent variable is a potential threat. A new teacher in a classroom who uses an innovative approach, participation in a health survey that draws attention to particular health practices, or a new fitness center opening in the participants' neighborhood could pose a threat to internal validity. History can also have a negative effect on outcomes. A snowstorm might affect attendance, or scheduling a weight-loss program around the Thanksgiving and Christmas holidays could interfere with desired outcomes and act as a threat to internal validity.

Protections Against History Threats

History threats are avoided by many of the same strategies that are used to protect against maturation effects. The use of a control group provides protection, as long as both groups have the same potential exposure to the historical event. Likewise, history is reduced as a threat when there is a shorter time between pretest and posttest. Researchers can also put protections in place to reduce exposure to alternative treatments, such as requiring participants to avoid alternative exercise programs or drug therapies while involved in the study. The researcher can include questionnaires or observations to help determine whether events occurred that might affect the outcome.

FROM THE EVIDENCE 5-1 (CONT.)

FTE 5-1 Question 2: Using this example, why is it important that participants in the intervention and control groups have similar scores at baseline on the STarT Back Tool? How does equivalence at baseline protect against maturation threats?
Regression to the Mean Threats

Regression to the mean refers to a phenomenon in which extreme scores are likely to move toward the average when a second measurement is taken; extremely high scores will become lower, and extremely low scores will become higher. When taking a test for a second time, it is always possible—even likely—that you will not receive the exact same score. This phenomenon is especially predictable in individuals who initially score at the extremes of the distribution. At the ends of the distribution, it is less likely that a second test score will become even more extreme; instead, extreme scores tend to regress toward the mean.

The "Sports Illustrated curse" serves as a case in point. It is often observed that after someone is featured in Sports Illustrated, that individual has a decline in performance. Regression to the mean would explain this observation, because the individual athlete is likely featured when he or she is at a peak of performance and superior to most if not all other athletes in that sport. Consequently, subsequent performance is likely to move toward the average, rather than improve.

In health-care research, study participants often start with extreme scores because of their condition. Therefore, when extreme scores are involved, regression to the mean should be considered a potential threat. Figure 5-1 depicts the normal curve and illustrates the propensity for extreme scores to regress toward the mean; the extreme scores toward both ends of the continuum move toward the middle.

Protection Against Regression to the Mean Threats

Similar to history and maturation threats, regression to the mean is protected against by the use of a control group. Once again, if the treatment group outperforms the control group, the difference between the groups is most likely due to the intervention. The importance of a control group should by now be apparent; control groups are valuable because they address multiple threats to validity.
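Regression to the mean is easy to demonstrate with a small simulation (illustrative, made-up parameters): each observed score is a stable true score plus random measurement error, and the top scorers at the first testing tend to score closer to the group mean when retested:

```python
import random
from statistics import mean

rng = random.Random(7)

def observed(true_score):
    """One test administration: the stable true score plus random error."""
    return true_score + rng.gauss(0, 10)

# 1,000 hypothetical participants with true scores centered at 50
true_scores = [rng.gauss(50, 10) for _ in range(1000)]
test1 = [observed(t) for t in true_scores]
test2 = [observed(t) for t in true_scores]

# Select the 50 highest scorers on the first test...
top = sorted(range(1000), key=lambda i: test1[i], reverse=True)[:50]

first = mean(test1[i] for i in top)
second = mean(test2[i] for i in top)
print(round(first, 1), round(second, 1))  # the retest mean falls back toward 50
```

The extreme group's first-test mean is inflated partly by lucky error; on retest the error re-randomizes, so the group's mean slides back toward the population mean even though no one's true score changed.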
One other option is for researchers to exclude outliers from a study, although this tactic is not feasible when large numbers of participants could be classified as outliers. When small samples are necessary, the threat posed by an extreme score at baseline may be reduced by taking multiple pretest measures and using the average. For example, waist circumference can be challenging to measure accurately, so during testing, three measures may be taken and then averaged.

Testing Threats

Testing as an internal validity threat occurs when changes in test performance are a result of the testing experience itself. A testing effect is present when an earlier experience somehow affects a later testing experience. There are many different ways in which this can occur. The testing experience often sensitizes participants to a desirable outcome. For example, the pretest may ask questions about following a home program, so the individual becomes sensitized to this behavior and begins following the program (as a result of the test, not the intervention). In another example, pedometers and other devices are often used as a measure of physical activity. The simple act of wearing the pedometer can influence how far an individual walks, because the presence of the pedometer motivates the person to walk more, especially when the participant can see the readings. In this case, it is the wearing of the pedometer and not the intervention that causes the change. The tester can also influence the outcomes of the testing with behaviors such as providing cues to enhance performance ("Try harder, you can do a few more").

Practice effects are a type of testing threat that occurs when exposure to the pretest allows the individual to perform better on the posttest.
Prior exposure can mean the test is more familiar, the participant is less anxious, and the participant can adopt a strategy for improved performance at posttest. For example, students who receive a handwriting test before and after a handwriting intervention may do better on the second test simply because of exposure and practice from the first test. In the case of order effects, there is a change in performance based on the order in which the tests are presented. For example, in a long testing session there may be a decline in performance due to fatigue.

FIGURE 5-1 Normal curve, showing standard deviations and the propensity for extreme scores to regress toward the mean. The shape of the curve suggests that individuals who score toward the ends are more likely to move toward the middle on a second testing.

Protection Against Testing Threats

A measure with strong test-retest reliability is a good start for protecting against testing threats. Standardization, scripts, and training for the tester also can reduce biases that the tester may introduce. Instead of logs or diaries that can act as prompts to engage in a behavior, time sampling can be used to protect against testing threats. With time sampling, individuals receive a prompt, but the prompt is unexpected and random, and the individual or an instrument records what the person is doing at that point in time.

Control groups are also beneficial with testing threats. If both groups receive some benefit from exposure to the testing situation, the difference between the control and intervention groups still represents the intervention effect. Some measures are more vulnerable to practice effects than others. For example, a list learning test that assesses memory is less reliable the second time it is administered because the participant may remember words from the first time.
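Counterbalancing the order of measures, one of the protections listed in Table 5-2 for order effects, can be sketched as a simple rotation (hypothetical test battery): each successive participant starts the battery one test later, so no measure always falls at the fatigued end of the session:

```python
tests = ["grip strength", "9-hole peg", "handwriting sample"]

def rotated_orders(tests):
    """Yield the test battery in rotated (Latin-square style) orders."""
    n = len(tests)
    while True:
        for shift in range(n):
            yield tests[shift:] + tests[:shift]

participants = [f"P{i}" for i in range(1, 7)]   # 6 hypothetical participants
schedule = dict(zip(participants, rotated_orders(tests)))
for pid, order in schedule.items():
    print(pid, "->", order)
```

With six participants and three tests, each test appears in each position exactly twice, so fatigue effects are spread evenly across the measures rather than always penalizing the last test.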
In contrast, range-of-motion testing is less amenable to practice effects. Alternate forms are often used when a measure can be learned, such as a list of words. In this case, alternate lists are made available so that the individual is tested with a different set of words.

Instrumentation Threats

Instrumentation threats occur when a measure itself, or the individual administering the measure, is unreliable or invalid. Instrumentation threats are a common problem in research; a well-designed study is rendered useless if instrumentation poses a threat to its validity. When using mechanical or electronic measures, the quality, condition, and calibration of the instruments can affect outcomes. When an instrument is in poor condition, the measurements may be inaccurate. For example, it is recommended that the Jamar dynamometer be professionally calibrated on a yearly basis (Sammons Preston, n.d.). Human error can also play a role in instrumentation threats. For example, a tester may provide incorrect instructions to a participant or make poor judgments when scoring an observational measure.

The test itself may be a poor choice for the study, such as when it does not accurately measure the intended outcome; this represents an issue with the validity of the test, which is a very important feature for an intervention study. For example, many therapy studies use self-report measures to assess an individual's functional performance. Although these measures are easy to administer, they may not provide accurate results. Sabbag et al (2012) found that performance measures were more accurate than self-report in assessing daily living skills in individuals with schizophrenia.

Another aspect of the instrumentation threat is the "ceiling effect" or "floor effect" of a measure. If ceiling effects are in play, the participants may have such a high score at the beginning that there is no room for improvement. Alternatively, the test may be so difficult (floor effects) that it is unlikely the researcher will be able to detect a significant change.

Protection Against Instrumentation Threats

Proper selection of measures, made before participants are recruited, is an essential protection against instrumentation threats. It is essential to know the reliability and validity of measures used in a study and their sensitivity to change. Godi et al (2013) compared two balance measures—the Mini-BESTest and the Berg Balance Scale—in terms of their sensitivity in detecting change. They found that the Berg Balance Scale had greater ceiling effects, suggesting that the Mini-BESTest may be the better instrument to use in a study examining the efficacy of an intervention to improve balance.

Training of testers also provides protection against instrumentation threats. If multiple testers are used, inter-rater reliability among the testers should be established. Electronic and mechanical measures should receive the necessary maintenance and calibration. For example, audiologists are particularly cognizant of the importance of calibration, because testing of hearing impairment would be significantly compromised by a poorly calibrated instrument.

Experimenter and Participant Bias Threats

Experimenter bias is introduced when the research process itself affects the outcomes of the study, whether intentionally or unintentionally. Experimenter bias can be introduced by the person(s) providing the intervention. A classic experimenter bias, known as the Rosenthal effect or the Pygmalion effect, occurs when the researcher sets up different expectations for the intervention and control groups. The term Rosenthal effect comes from an experiment by Rosenthal and Jacobson (1968) that involved teachers.
Rosenthal communicated to some teachers that they should expect a strong growth spurt in intellectual ability from their students, whereas other teachers were not given this information. Students performed better when the teachers expected them to perform better. That study was deliberately designed to produce this effect, but the same phenomenon can occur unintentionally when an intervention leader communicates an expectation of better outcomes from the intervention group. The higher expectations become a self-fulfilling prophecy. Perhaps the leader provides more attention or enthusiasm, or works harder at providing the intervention, or the participants pick up on the leader's expectations and respond in kind.

Just being assigned to a particular group can introduce an experimenter bias. For example, without the leader's prompting, the control participants may want to compensate for not being picked for the intervention. In many rehabilitation studies, the control group receives standard treatment or "treatment as usual." If the control group is aware that the intervention group is receiving something new, they may try to compensate for this difference.

Another bias that can be introduced by the experimenter is compensatory equalization of treatments. In this case, the intervention leaders for the control group may feel compelled to work harder to compensate for the fact that the control group is not receiving the intervention. This type of bias is similar to the Rosenthal effect, but directed toward the control group. The control group may also respond in the other direction and feel discouraged because they are not receiving the intervention. In response, they may not try as hard, or they may give up. This threat to validity is called compensatory demoralization. The threats of compensatory equalization and demoralization are more likely to occur when the leaders and/or participants of the control and treatment groups interact with one another.
Participant bias threats come into play when the participant's involvement in the study affects the outcomes. The Hawthorne effect occurs when participants respond to the fact that they are participating in a study rather than to the actual intervention (Mayo, 1949). The term comes from research conducted at the Hawthorne electric plant. Many variables were studied to determine what factors might affect productivity, such as lighting or changes in workstations. No matter what was studied, and however insignificant the change, productivity changed. It was concluded that the change itself, and not the actual condition, was producing the greater productivity. The Hawthorne effect may occur because participants behave as expected or want to please the researcher.

In an interesting study examining the efficacy of ginkgo biloba in Alzheimer's disease, McCarney and colleagues (2007) examined follow-up as a confounding variable influenced by the Hawthorne effect. Some participants had minimal follow-up, whereas others had intensive follow-up; in other words, the intensive follow-up group received more attention from the researchers. The results indicated that participants receiving intensive follow-up had greater cognitive improvement than participants receiving minimal follow-up. This result is a particularly remarkable example of the Hawthorne effect, given that cognition was measured using an objective, standardized assessment (the ADAS-cog) that included 11 cognitive tasks, such as word recall, orientation, and the ability to follow commands.

Protection Against Experimenter and Participant Bias Threats

Unlike many of the previous threats to validity, random assignment to an intervention and control group does not protect against experimenter and participant bias. However, blinding of the intervention leaders and participants provides a strong protection against these threats.
If the leaders and participants do not know whether they are providing or receiving the intervention, it is more difficult to introduce a bias. This is a common approach in drug trials, in which a placebo is used in place of the actual medication. In rehabilitation research, it is often difficult to blind intervention leaders and participants; therapists will know that they are providing an intervention and will typically know when they are providing the experimental intervention.

A form of placebo is provided in some rehabilitation studies when the control group receives an intervention that equalizes attention. When a new intervention is compared with standard therapy, and when both groups receive the same amount of intervention time, experimenter and participant bias is less of a concern. Therefore, equal attention to groups is generally preferable to a no-treatment control. However, participants may know through the informed consent process whether they are receiving the experimental intervention. Typically, when individuals volunteer to be in a study, they want to receive the new intervention, so they may be disappointed if they are not assigned to the experimental condition.

Other methods can be used to minimize experimenter and participant bias. Clear protocols for the administration of the interventions can reduce bias. In some cases the therapists may be expected to follow the protocol and, ideally, not know which intervention is expected to yield superior results. It is also helpful to limit interactions between intervention leaders to further prevent the development of bias. Similarly, keeping the participants in the intervention and control groups separate diminishes bias on the part of those receiving the treatment. In addition, it is often possible to blind the individuals who administer the outcome assessments in order to reduce or eliminate bias in scoring the assessments.

EVIDENCE IN THE REAL WORLD: How Lack of Blinding Participants Can Lead to Compensatory Equalization

In rehabilitation and therapy practices, it is difficult and in many cases impossible to blind the intervention leaders and participants to which group is receiving the experimental intervention. If you are the intervention leader, you have to know what you are leading (as opposed to offering a placebo pill); if you are a participant, you will likely know what intervention you are participating in. In a real-life example, I was administering a weight-loss program for individuals with psychiatric disabilities. Participants were randomly assigned to either an intervention or a no-treatment control group. After the informed consent process, participants were told which groups they were assigned to. Some control participants voiced a desire to show the researchers that they could lose weight on their own, apparently compensating for not being assigned to the intervention. In some cases it worked, and indeed several control participants were successful in losing weight during the time they participated in the study. To control for this confound, it would have been helpful for both groups to receive some form of intervention, so that neither group felt compelled to prove something based on their group assignment. As a reader of evidence, it is often hard to discern when individuals are responding to experimenter or participant bias; however, it is useful to know whether the protections discussed in this section are in place to guard against potential problems.

Attrition/Mortality Threats

Threats due to attrition, also called mortality, involve the withdrawal or loss of participants during the course of the study. The process of informed consent acknowledges that participants can withdraw from a study at any point in time. Individuals withdraw from studies for many reasons, which may or may not relate to the study itself.
Individuals may move or experience other personal issues that require withdrawal. Others may withdraw because they are no longer interested in the study, find the time commitment too great, or feel disappointed in the intervention. When people withdraw from a study, it can affect the equalization of groups that was achieved at the outset. When substantial numbers of participants withdraw from a study, group differences can emerge that confound the results of the study. When attrition occurs, it is important to identify any characteristics of the individuals who dropped out of the study that might make them different from the individuals who remained in the study. Perhaps the individuals who dropped out were experiencing a more severe condition, in which case you would not know if the intervention was effective for that group of individuals. Attrition may also result in an uneven number of participants in the groups.

Protections Against Attrition/Mortality Threats
Depending on the length of the study and access to participants, a researcher may be able to recruit additional participants to replace individuals who drop out and thus maintain the overall power of the study. In addition, strategies such as reminder phone calls and e-mails can be used to promote good attendance for an intervention or follow-through with therapy. Characteristics of the individuals who withdraw should be compared with those of the individuals who remain. If differences exist, this factor should be identified as a limitation of the study. Statistical procedures can be used to account for attrition/mortality threats, such as using estimates for missing data, but this approach is less desirable than having actual participant scores. An "intent to treat" analysis can be used in which the data of individuals who did not receive the intervention are still included in the analysis.
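The effect of an intent-to-treat analysis can be seen with a small, entirely hypothetical set of change scores (the numbers below are illustrative only and do not come from any study):

```python
# Hypothetical change scores (higher = more improvement) for the ten
# people randomized to an intervention. Assume the three participants
# who were not improving (negative scores) withdrew before the final
# assessment, but their data were still collected at withdrawal.
changes = [6, 8, -2, 5, 7, -1, 4, 9, 3, -3]
completer_scores = [c for c in changes if c > 0]

# Completers-only analysis: the withdrawers' data are discarded.
completers_only_mean = sum(completer_scores) / len(completer_scores)  # 42 / 7 = 6.0

# Intent-to-treat analysis: everyone who was randomized stays in.
intent_to_treat_mean = sum(changes) / len(changes)                    # 36 / 10 = 3.6

print(completers_only_mean, intent_to_treat_mean)
```

Discarding the three withdrawers makes the intervention look markedly better (a mean change of 6.0 rather than 3.6); the intent-to-treat estimate is the more conservative one and better reflects what happens when the intervention is offered in everyday practice, where not everyone completes treatment.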
This analysis is similar to real-life practice, in which some individuals do not complete or follow through with all aspects of their treatment. It also maintains the integrity of the randomization process and the baseline equality of groups.

EXERCISE 5-2
Detecting Potential Threats to Internal Validity in a Research Study (LO3 and LO4)
Analyze the following two study abstracts and determine which threats to internal validity (among the options provided) are likely to be present. Before looking at the answers, write down a rationale for why you do or do not think a particular threat may confound the interpretation of the results. In other words, would the threat suggest that something other than the intervention resulted in the improvement? You can find the answers at the end of the chapter.

STUDY #1
Hohler, A. D., Tsao, J. M., Katz, D. I., Dipiero, T. J., Hehl, C. L., Leonard, A., . . . Ellis, T. (2012). Effectiveness of an inpatient movement disorders program for patients with atypical parkinsonism. Parkinson's Disease, 2012, Article ID 871974. doi:10.1155/2012/871974 (Epub 2011 Nov 10).

Abstract
This paper investigated the effectiveness of an inpatient movement disorders program for patients with atypical parkinsonism, who typically respond poorly to pharmacologic intervention and are challenging to rehabilitate as outpatients. Ninety-one patients with atypical parkinsonism participated in an inpatient movement disorders program. Patients received physical, occupational, and speech therapy for 3 hours/day, 5 to 7 days/week, and pharmacologic adjustments based on daily observation and data. Differences between admission and discharge scores were analyzed for the functional independence measure (FIM), timed up and go test (TUG), two-minute walk test (TMW), Berg balance scale (BBS), and finger tapping test (FT), and all showed significant improvement at discharge (P < .001).
Clinically significant improvements in total FIM score were evident in 74% of the patients. Results were similar for ten patients whose medications were not adjusted. Patients with atypical parkinsonism benefit from an inpatient interdisciplinary movement disorders program to improve functional status.

Consider this:
• Not included in this abstract is the length of treatment. Participants' length of stay varied from 1 to 6 weeks, with an average of 2.5 weeks. Also, the intervention leaders administered the assessments.
• A working knowledge of atypical parkinsonism symptoms, course, and treatment will be useful in identifying threats to validity. You can obtain more information at http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0001762/
• If you would like more information about the study, you may want to use your library resources to obtain the full text of the article.

QUESTIONS
1. Based on your reading of the abstract and any additional resources, which of the following would you consider noteworthy threats to internal validity?
A. Maturation
B. History
C. Testing
Explain your answer.

STUDY #2
Frankel, F., Myatt, R., Sugar, C., Whitham, C., Gorospe, C. M., & Laugeson, E. (2010, July). A randomized controlled study of parent-assisted Children's Friendship Training with children having autism spectrum disorders. Journal of Autism and Developmental Disorders, 40(7), 827-842. doi:10.1007/s10803-009-0932-z

Abstract
This study evaluated Children's Friendship Training (CFT), a manualized parent-assisted intervention to improve social skills among second to fifth grade children with autism spectrum disorders. Comparison was made with a delayed treatment control group (DTC). Targeted skills included conversational skills, peer entry skills, developing friendship networks, good sportsmanship, good host behavior during play dates, and handling teasing. At posttesting, the CFT group was superior to the DTC group on parent measures of social skill and play date behavior, and child measures of popularity and loneliness. At 3-month follow-up, parent measures showed significant improvement from baseline. Post-hoc analysis indicated more than 87% of children receiving CFT showed reliable change on at least one measure at posttest and 66.7% after 3 months follow-up.

Consider this:
• Ten participants did not complete the intervention and therefore were not included in the follow-up data.
• The following table was included in the study, comparing the two groups at baseline.
• If you would like more information about the study, you may want to use your library resources to obtain the full-text article.

Sample Characteristics for Children's Friendship Training (CFT) and Delayed Treatment Control (DTC) Conditions
Variable              | CFT M (SD), n = 35 | DTC M (SD), n = 33 | p
Age (months)          | 103.2 (15.2)       | 101.5 (15.0)       | ns
Grade                 | 3.2 (1.0)          | 3.4 (1.2)          | ns
SES(a)                | 44.6 (10.6)        | 50.6 (11.8)        | ns
Percent male          | 85.7               | 84.8               | ns
Percent Caucasian     | 77.1               | 54.5               | ns
WISC-III Verbal IQ    | 106.9 (19.1)       | 100.5 (15.7)       | ns
ASSQ                  | 22.4 (7.3)         | 22.0 (9.3)         | ns
VABS(b) Communication | 84.3 (20.5)        | 79.8 (15.3)        | ns
VABS Daily living     | 67.0 (18.2)        | 62.4 (15.7)        | ns
VABS Socialization    | 66.3 (10.8)        | 66.1 (10.8)        | ns
VABS Composite        | 68.1 (16.4)        | 64.4 (11.0)        | ns
# sessions attended   | 11.3 (0.8), n = 34 | 10.7 (1.9), n = 32 | ns

QUESTIONS
2. Based on your reading of the abstract and any additional resources, which of the following would you consider noteworthy threats to internal validity?
A. Maturation
B. Selection
C. Instrumentation
D. Attrition
Explain your answer.

EXTERNAL VALIDITY
External validity is the extent to which the results of a study can be applied to other people and other situations. External validity speaks to the generalizability of a study. A study has more external validity when it reflects real-world practice. As an evidence-based practitioner, you can apply with greater confidence the results of studies whose conditions are similar to those of your own practice, because those studies have more external validity for you.

Threats to External Validity
Threats to external validity occur when the situation or study participants are different from the real world or the clinical setting. As with internal validity, external validity is a continuum; a study may have better or worse external validity, but it will never be perfectly valid. As an evidence-based practitioner, it is important that you evaluate the characteristics of the people and situations in a study to determine how similar those characteristics are to your own practice. Table 5-3 summarizes threats to external validity and protections against those threats.

TABLE 5-3 Threats to External Validity and Their Protections
Threat: Sampling error
Confounding factor affecting generalizability: Sample does not represent the population.
Protections: Use random selection. Use large samples. Select participants from multiple sites. Replicate the study with new samples.

Threat: Poor ecological validity
Confounding factor affecting generalizability: Conditions of the study are very different from real-world practice (when the research is administered in a manner that closely mirrors real-life practice, the results will be more generalizable).
Protections: Ensure the researcher is sensitive to issues of real-world practice. Replicate with effectiveness studies.

Sampling Error
A primary principle of quantitative research involves generalizing the results from a sample to the population. Sampling error is any difference that exists between the population and the sample. The exact nature of sampling error is not always known, because many characteristics of the population are unknown.
However, among known characteristics it is possible to compare the sample with the population to identify similarities and differences. For example, boys are approximately five times more likely than girls to be diagnosed with autism (CDC, 2012), although even this is an estimate from a sample. In a study that intends to represent children with autism, a more representative sample would be one with a similar gender distribution.

Protections Against Sampling Error
Sampling methods influence external validity. Although the best method for obtaining a representative sample is to randomly select a large sample from the target population, this is not always possible. In random sampling, every individual in the population has an equal chance of being selected. With a large random sample, you are likely to select a group of participants that is representative of the population. Unfortunately, true random sampling rarely happens in health-care research, because it is usually impractical to sample from an entire population. For example, in an intervention study of people with multiple sclerosis, the population would be all individuals with multiple sclerosis; sampling worldwide and then administering the intervention would be next to impossible. True random sampling does occur in some research when the population is smaller and accessible. For example, a sample of members of the American Occupational Therapy Association could be obtained through random sampling of the membership list. The most common method of sampling in health-care research is convenience sampling. In this method, participants are selected because they are easily available to the researcher. When conducting a study of individuals with a particular condition or disability, it is likely that the researcher will go to a treatment facility that provides services to those individuals.
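The contrast between random sampling and convenience sampling can be sketched in a few lines of Python (the population, clinic names, and sizes here are made up purely for illustration):

```python
import random

rng = random.Random(3)

# Hypothetical sampling frame: 500 clients spread across five clinics.
population = [{"id": i, "clinic": f"clinic_{i % 5}"} for i in range(500)]

# Random sampling: every client has an equal chance of selection,
# so the sample tends to mirror the whole population.
random_sample = rng.sample(population, 50)

# Convenience sampling: enroll whoever is available at one clinic.
# Easy to do, but the sample can only reflect that single setting.
convenience_sample = [p for p in population if p["clinic"] == "clinic_0"][:50]

print(len({p["clinic"] for p in random_sample}))       # usually all 5 clinics represented
print(len({p["clinic"] for p in convenience_sample}))  # always just 1
```

The random sample almost always includes clients from every clinic, whereas the convenience sample can never tell you anything about the other four settings, which is exactly the generalizability problem described above.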
Then the researcher might ask for volunteers, or a clinician might approach each person who meets the study criteria when that person is admitted. The lack of randomness in the process presents a high potential for introducing bias or sampling error. When samples are selected from one school, one neighborhood, or one clinic, for example, they are more likely to have characteristics that are different from the population as a whole; depending on the setting, they may be poorer, older, or more symptomatic. One method for reducing sampling error is to select a large sample from multiple settings. A larger sample is more likely to approximate the population. In addition, multiple settings can be selected to represent the heterogeneity of a population. For example, in considering the generalizability of a study of children with attention deficit hyperactivity disorder (ADHD), a sample is more likely to represent the population if children from both urban and suburban locations in different areas of the country are recruited, to better capture the racial and socioeconomic characteristics of the population. In the results section of a study, it is important for the researcher to provide a detailed description of the study participants. Many journals require that gender, age, and race at a minimum be included. As an evidence-based practitioner, you can review this information to determine if the sample is representative of the population. However, more important to you is whether the sample in the study is similar to the clients you work with. When a study sample is similar to your clientele, you are more justified in generalizing the findings.

Ecological Validity Threats
Ecological validity refers to the environment in which a study takes place and how closely it represents the real world.
The treatment or method by which a study is administered, the time during which a study takes place, and where a study takes place are all important considerations affecting the external validity, or generalizability, of a study. Sometimes the administrators of the intervention in a study are highly trained, more so than the typical practitioner. The study time period may last longer than the length of stay covered by most insurance companies, or the intervention may be more intense than standard practice. The study may take place in an inpatient setting, although most individuals with the condition are actually treated on an outpatient basis. Any differences between the study conditions and real-world practice represent threats to external validity. The generalizability of a particular study will be good in situations that are similar to those in the study and poor in those that are different.

Protections Against Ecological Validity Threats
Practitioners are more likely to apply research that is relevant and practical to real-world practice, and studies have greater ecological validity when they are sensitive to typical practice situations. For example, a researcher may ensure that the intervention takes place in the typical time frame during which clients receive therapy, or that the therapists providing the intervention are those who already work in a particular type of hospital. As a practitioner, it is important to apply the results of a study cautiously and consider the similarity to your own situation. A study is more generalizable to your practice and clients when the characteristics of the study are similar. For example, a study by Sutherland et al (2012) that examined exposure therapy for posttraumatic stress disorder (PTSD) in veterans will be more applicable to practice situations that involve treating veterans with PTSD.
The results of the study will be less applicable and have less external validity for treating PTSD in women who have experienced sexual abuse. In another example, a well-designed study showed that 24 weeks of Tai Chi was effective in improving balance for individuals with Parkinson's disease (Tsang, 2013). However, if you are unable to see clients for a 24-week time period, this study is less relevant for your practice setting. Nevertheless, you may be able to use the results of this study to justify to your administrators and/or insurance companies why a longer length of stay is warranted.

Replication to Promote Generalizability
Replication, or reproducibility, is essential to the generalization of research and a primary principle of the scientific method. It is important in the generalization of both samples and situations. Study findings must be capable of being repeated to ensure generalizability and applicability of the results. If several studies yield similar findings about the efficacy of an intervention or a predictor of an outcome, clinicians can be more confident in those results. Another consideration in replication is the researchers themselves. Even if several studies support a particular approach, if all of those studies were conducted by the same researcher, there should be some concern that the results will not generalize to other situations. There may be reasons why one researcher is able to garner more positive findings than another. Perhaps that researcher and his or her team are exceptional therapists, and it is their general competence, as opposed to the actual intervention, that makes the difference. In addition, when findings are particularly surprising or remarkable, replication is important. These kinds of findings are interesting and thus have a higher likelihood of being published. However, replication will reveal whether the findings were a result of chance (i.e., a Type I error).
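The link between chance findings and Type I errors can be demonstrated with a short simulation (a sketch only, assuming a simple z-test with known variance; the numbers are not from any study discussed here). When an intervention truly does nothing, roughly 1 study in 20 will still cross the conventional significance threshold:

```python
import math
import random

def false_positive_rate(n_per_group=30, n_replications=2000, critical_z=1.96, seed=1):
    """Replicate a null study many times: both groups are drawn from the
    same population (the 'intervention' has no effect), yet a z-test will
    occasionally declare a significant difference -- a Type I error."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_group)  # standard error of a difference in means (sigma = 1)
    false_positives = 0
    for _ in range(n_replications):
        mean_a = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        mean_b = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        if abs((mean_a - mean_b) / se) > critical_z:
            false_positives += 1
    return false_positives / n_replications

print(false_positive_rate())  # close to 0.05, i.e., roughly 1 null study in 20
```

This is why replication matters: a single surprising result has a real chance of being noise, but the probability that several independent studies all land in that 5% tail by chance is far smaller.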
Replication is often a matter of degree. Studies rarely follow the exact procedures of a previous study to determine whether the same results are obtained. Typically, variables are manipulated to extend the findings of previous research. A replication study may shorten an intervention period, utilize a different outcome measure, apply the approach to a different sample, or administer the intervention in a new setting. The ability of research to build upon previous work is part of the power of the scientific method.

EVIDENCE IN THE REAL WORLD
How Replication Changed the Perception of Facilitated Communication
In the early 1990s, there was great interest in facilitated communication for individuals with autism. Much of this interest came from the work of Biklen and colleagues (1992), who reported surprising and remarkable results with this technique. In facilitated communication, the facilitator provides physical assistance to help a person with autism type out a message on a keyboard. The assumption is that facilitated communication overcomes neuromotor difficulties that interfere with the ability of a person with autism to communicate. In an uncontrolled study of 43 individuals with autism, Biklen and colleagues reported startling outcomes. Previously nonverbal individuals were writing grammatically correct sentences and paragraphs, and even poetry. Skepticism about these findings led other researchers to conduct controlled studies. A review by Green (1994) found that when the facilitator's influence was controlled, the technique was no longer useful. The review suggested that the facilitator's belief in the potential of facilitated communication and the client's untapped capabilities led him or her to unconsciously or unintentionally guide the communication process. Now organizations such as the American Speech-Language-Hearing Association and the American Psychological Association assert that there is no evidence to support facilitated communication for individuals with autism.

INTERNAL VERSUS EXTERNAL VALIDITY
When designing a study, the researcher must find a balance between internal and external validity. Studies that are tightly controlled to maximize internal validity will have less external validity. For example, inclusion criteria that produce a homogeneous sample, a strict protocol for administering the intervention, expert intervention leaders, and limited exposure to alternative treatments will yield results that can be interpreted in the context of a cause-and-effect relationship, yet do not reflect everyday practice. In contrast, studies that are conducted under real-world conditions are "messier"; that is, there are not as many controls in place, and real-world conditions introduce more alternative explanations of the outcome: greater external validity comes at the expense of internal validity. When accumulating research evidence regarding a particular intervention, you may use a process that begins with studies with high internal validity and moves to studies with greater external validity. First, it is important to know if an intervention is effective under ideal conditions; that is, in a highly controlled study with strong internal validity. Once the efficacy of the intervention is established, future studies can examine the same intervention in more typical practice conditions. The difference between these studies can be referred to as efficacy versus effectiveness. An efficacy study is one that emphasizes internal validity and examines whether an intervention is effective under ideal conditions. With efficacy studies, you can be more confident that the intervention is what made the difference; however, the conditions of the study are likely to differ from real-world conditions.
In an effectiveness study, the study conditions are more reflective of real-world practice; however, the untidy nature of practice means that more threats to internal validity may be in play. Studies about therapy practices will always have threats to validity. Researchers face significant challenges in designing a study and must find a balance that involves minimizing threats, being pragmatic, and operating ethically. From the Evidence 5-2 provides an example of an effectiveness study that carries out a strength training intervention in existing community fitness facilities.

EXERCISE 5-3
Managing Threats to Validity in a Particular Research Study (LO3, LO4, LO5, and LO6)
Childhood obesity is a major public health risk, and many efforts have been made to address the problem. A researcher is interested in studying a new intervention designed to increase the amount and intensity of physical activity for children in primary grades 1 through 3. Two schools have agreed to participate in the study. One school is located in an urban setting with children from mostly low socioeconomic and racially diverse backgrounds. The other school is located in a suburban setting with children from mostly high socioeconomic and Caucasian backgrounds.

QUESTIONS
Consider the following issues, which address both internal and external validity, and describe how the researcher might reduce the threats to validity.
1. The schools in which the researcher plans to implement the study will not allow the researcher to randomly assign children to groups. What is the threat to validity, and how can the researcher manage this threat?
2. The researcher plans to increase interest in physical activity by including a climbing wall and other new but expensive equipment as part of the activity program. What is the threat to validity, and how can the researcher manage this threat?
3. To determine if the activity at school carries over to home, parents are asked to keep a log for one week of their child's participation in activity, including type of activity, time engaged, and level of intensity. What is the threat to validity, and how can the researcher manage this threat?

FROM THE EVIDENCE 5-2
An Example of an Effectiveness Study
Minges, K. E., Cormick, G., Unglik, E., & Dunstan, D. W. (2011). Evaluation of a resistance training program for adults with or at risk of developing diabetes: An effectiveness study in a community setting. International Journal of Behavioral Nutrition and Physical Activity, 8, 50. doi:10.1186/1479-5868-8-50

Note A: The researchers were interested in taking an intervention with efficacy in a controlled condition and assessing its effectiveness in existing fitness facilities.
Note B: The large number of dropouts (there were 86 participants at 2 months but only 32 at 6 months) is not unexpected in a fitness center. People often discontinue their fitness programs.

BACKGROUND: To examine the effects of a community-based resistance training program (Lift for Life®) on waist circumference and functional measures in adults with or at risk of developing type 2 diabetes.

METHODS: Lift for Life is a research-to-practice initiative designed to disseminate an evidence-based resistance training program for adults with or at risk of developing type 2 diabetes to existing health and fitness facilities in the Australian community. A retrospective assessment was undertaken on 86 participants who had accessed the program within 4 active providers in Melbourne, Australia. The primary goal of this longitudinal study was to assess the effectiveness of a community-based resistance training program, thereby precluding a randomized, controlled study design. Waist circumference, lower body (chair sit-to-stand) and upper body (arm curl test) strength, and agility (timed up-and-go) measures were collected at baseline and repeated at 2 months (n = 86) and again at 6 months (n = 32).

RESULTS: Relative to baseline, there was a significant decrease in mean waist circumference (-1.9 cm, 95% CI: -2.8 to -1.0) and the timed agility test (-0.8 sec, 95% CI: -1.0 to -0.6), and significant increases in lower body (number of repetitions: 2.2, 95% CI: 1.4-3.0) and upper body (number of repetitions: 3.8, 95% CI: 3.0-4.6) strength at the completion of 8 weeks. Significant differences remained at the 16-week assessment. Pooled time series regression analyses adjusted for age and sex in the 32 participants who had complete measures at baseline and 24-week follow-up revealed significant time effects for waist circumference and functional measures, with the greatest change from baseline observed at the 24-week assessment.

CONCLUSIONS: These findings indicate that an evidence-based resistance training program administered in the community setting for those with or at risk of developing type 2 diabetes can lead to favorable health benefits, including reductions in central obesity and improved physical function.

Note C: Without a control group, you can be less certain that Lift for Life made the difference. Perhaps individuals attending the fitness center took advantage of other programs or were more likely to exercise outside the program. Because the assessors were not blind to group assignment, they may consciously or unconsciously show a bias in scoring individuals whom they hoped were improving. This example demonstrates how improving external validity can sometimes compromise internal validity.

FTE 5-2 Question: Using the Lift for Life study example, how did improving external validity compromise internal validity?

CRITICAL THINKING QUESTIONS
1. Why is a large sample generally more desirable than a small sample in research (give at least three reasons)?
2. Why is a randomized controlled trial considered the strongest single study design?
3. Why might random assignment to groups result in ethical concerns?
4. Although pretests are generally desirable, how can they potentially pose a threat to validity?
5. For each of the following three situations, how can the researcher manage threats to validity to determine whether the new intervention is effective?
• Comparing a new intervention with a no-treatment control group
• Comparing a new intervention with a treatment-as-usual control group
• Comparing a new intervention with another evidence-based intervention
6. Explain the differences between random selection and random assignment. What aspects of validity are addressed by these research practices, and how?
7. Why is it difficult to design a study that is strong in both internal and external validity? How can you balance the two types of validity?

ANSWERS
EXERCISE 5-1
1. There are two reasons why fishing threats exist: The researcher does not have a research hypothesis, and four outcomes are being studied.
2. The study would be stronger if the researcher had a prior hypothesis about which orthoses would be best for which outcomes. This could be based on existing research or the researcher's clinical experience. To address the fact that multiple outcomes are studied, the researcher should adjust the alpha level of the statistical analysis or use a statistic that controls for multiple comparisons.
3. Ten people divided into three groups will result in a study with very low power.
4. The researcher will want to recruit additional participants and may need to use another clinic or conduct the study over a longer period of time.
Power can also be increased by reducing the number of groups, so the researcher could compare two orthoses (although it would still be best to have more than 5 participants per group) or use a crossover design in which all of the participants try all of the orthoses.

EXERCISE 5-2
1. Study #1
A. Maturation—No. Although there is no control group and the study goes on for several weeks, consider the normal course of the disorder in determining whether maturation is a threat to validity. The normal course of Parkinson's disease is one of progressive deterioration, so you would not expect improvement without treatment.
B. History—Yes. The conclusion of the study is that an interdisciplinary movement disorders program was effective in improving movement problems for people with Parkinson's disease. The medication adjustments could be a history threat: 81 of the 91 participants received a medication adjustment during the intervention. Although participants without medication adjustments had similar improvements, which provides some support for the therapy, the design of this study makes it difficult to determine whether it was the therapy or the medications (or both) that made the difference.
C. Testing—Yes. There are a few issues with testing. The Timed Up and Go Test uses time as an outcome, and the two-minute walk test is measured in terms of distance covered. With these objective outcomes, you would not be as concerned about biased assessments. However, the FIM and Berg Balance Scale do involve judgment on the part of the therapist, and a therapist who has been involved in the intervention and wants to see improvement may tend to rate the participants, albeit unintentionally, higher than an unbiased rater would. In addition, medication effectiveness in Parkinson's disease varies across the day, so the time of day at which assessments were administered could affect the outcome.
2. Study #2
A. Maturation—No. In this case there is a no-treatment control group with random assignment. Even if there was improvement in the control group, the intervention group's improvement was greater, which suggests that the intervention group improved over and above any typical development.
B. Selection—No. Random assignment helps to promote equal groups, and the table provides additional support that the two groups were comparable at the outset.
C. Instrumentation—Yes. The self, parent, and teacher reports are problematic. The children, parents, and teachers knew about the intervention and may have been biased toward providing a more positive report. The use of observational methods (i.e., observing the child in social situations) would enhance this aspect of the study.
D. Attrition—Yes. Ten children in the intervention group were unavailable at the follow-up testing period: eight of these children dropped out, and two were removed for behavioral reasons. It is possible that these 10 children were not responding as well to the intervention and that could be why they dropped out. The two who were removed were not benefiting. If these children were included in the findings, it is possible, even likely, that the results would be less positive. This should be taken into account when evaluating how effective the intervention is and for how many.

EXERCISE 5-3
1. The major concern with lack of randomization is that there will be selection threats to validity. To address this concern, it is important to use strategies that will reduce any differences that might occur between the groups. In school-based research, it is common for one classroom to receive an intervention while another classroom does not. You would not want to make one school an intervention setting and the other school a control setting, because the distinct differences between the schools might account for any differences you find with the intervention.
Instead, you could randomly assign classrooms at each school to receive or not receive the intervention. You might address ethical concerns for the control group not receiving the intervention by using a waitlist control design (i.e., you will eventually provide the intervention to the control group). A drawback to this approach is that there is the potential for greater experimenter and participant bias. Blinding the testers and reducing the exposure of the students and teachers, particularly during physical activity, would help address these concerns. It would also be useful to provide the control group with equal attention to something new without introducing additional physical activity. For example, you might have the control group participate in board games.
2. The inclusion of expensive equipment makes it less likely that other schools will be able to implement this intervention, thereby making it less generalizable. The researcher should consider redesigning the intervention to use equipment that is typically available in most school settings; however, in doing so the researcher may lose the novelty or excitement that would be created by the new equipment.
3. Asking parents to keep a log introduces instrumentation threats. Maintaining a log for a week is asking a great deal of the parents, and it is unlikely that you will receive complete data. The researcher could use more objective means, such as an accelerometer that the children wear to record the time spent engaged in activity. Another method that is less burdensome to the parents is time sampling, in which, usually at random intervals, a timer indicates that a log entry should be made. Using this method, the parent only has to respond to the timer rather than keep records at all times. Both of these methods still present concerns. For example, the parents may forget to put the accelerometer on, the child may lose the accelerometer, or the parent still may not respond to a time-sampling approach.
All of these examples speak to the challenges of designing a study. It is virtually impossible to design a perfect study with no threats to validity. Researchers typically weigh their options and make choices given the particular research question, the ethical concerns presented, and pragmatic issues.

FROM THE EVIDENCE 5-1

1. Yes, the p value for all of the comparisons is > 0.05, indicating there is no statistically significant difference between the groups at baseline.
2. If the groups start out at different levels of back pain, this could confound the results of the study. For example, if the control group had less pain and the intervention group had more pain at baseline, even without the intervention, maturation may result in the intervention group showing a greater recovery, because there is more room for improvement in the intervention group. The control group may not be able to improve much because they already are not experiencing a great deal of pain. In this example from the evidence, it is a good thing that the groups are equivalent on the outcome measure of back pain as well as on other demographic variables.

FROM THE EVIDENCE 5-2

Without a control group, you could be less certain that Lift for Life made the difference. Perhaps individuals attending the fitness center took advantage of other programs or were more likely to exercise outside of the program. Also, you would expect less precision among the assessors, who would not be blind and might vary from site to site.

REFERENCES

Biklen, D., Morton, M. W., Gold, D., Berrigan, C., & Swaminathan, S. (1992). Facilitated communication: Implications for individuals with autism. Topics in Language Disorders, 12(4), 1–28.
Centers for Disease Control and Prevention (CDC). (2012). Prevalence of autism spectrum disorders: Autism and developmental disability monitoring network, 14 sites, US, 2008.
Retrieved from http://www.cdc.gov/mmwr/preview/mmwrhtml/ss6103a1.htm?s_cid=ss6103a1_w
del Pozo-Cruz, B., Parraca, J. A., del Pozo-Cruz, J., Adsuar, J. C., Hill, J., & Gusi, N. (2012). An occupational, internet-based intervention to prevent chronicity in subacute lower back pain: A randomized controlled trial. Journal of Rehabilitation Medicine, 44, 581–587.
Frankel, F., Myatt, R., Sugar, C., Whitham, C., Gorospe, C. M., & Laugeson, E. (2010, July). A randomized controlled study of parent-assisted Children's Friendship Training with children having autism spectrum disorders. Journal of Autism and Developmental Disorders, 40(7), 827–842. doi:10.1007/s10803-009-0932-z
Godi, M., Franchignoni, F., Caligari, M., Giordano, A., Turcato, A. M., & Nardone, A. (2013). Comparison of reliability, validity, and responsiveness of the Mini-BESTest and Berg Balance Scale in patients with balance disorders. Physical Therapy, 93, 158–167.
Green, G. (1994). The facilitator's influence: The quality of the evidence. In H. C. Shane (Ed.), Facilitated communication: The clinical and social phenomenon (pp. 157–226). San Diego, CA: Singular.
Killen, J. D., Fortmann, S. P., Newman, B., & Varady, A. (1990). Evaluation of a treatment approach combining nicotine gum with self-guided behavioral treatments for smoking relapse prevention. Journal of Consulting and Clinical Psychology, 58, 85–92.
Mayo, E. (1949). Hawthorne and the Western Electric Company: The social problems of an industrial civilisation. London, UK: Routledge.
McCarney, R., Warner, J., Iliffe, S., van Haselen, R., Griffin, M., & Fisher, P. (2007, July 3). The Hawthorne effect: A randomised, controlled trial. BMC Medical Research Methodology, 7, 30.
Minges, K. E., Cormick, G., Unglik, E., & Dunstan, D. W. (2011, May 25). Evaluation of a resistance training program for adults with or at risk of developing diabetes: An effectiveness study in a community setting.
International Journal of Behavioral Nutrition and Physical Activity, 8, 50. doi:10.1186/1479-5868-8-50
Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom. New York, NY: Holt, Rinehart & Winston.
Sabbag, S., Twamley, E. W., Vella, L., Heaton, R. K., Patterson, T. L., & Harvey, P. D. (2012). Predictors of the accuracy of self assessment of everyday functioning in people with schizophrenia. Schizophrenia Research, 137, 190–195.
Sammons Preston. (n.d.). Jamar hand dynamometer owner's manual. Retrieved from https://content.pattersonmedical.com/PDF/spr/Product/288115.pdf
Sutherland, R. J., Mott, J. M., Lanier, S. H., Williams, W., Ready, D. J., & Teng, E. J. (2012). A pilot study of a 12-week model of group-based exposure therapy for veterans with PTSD. Journal of Traumatic Stress, 25(2), 150–156.
Tsang, W. W. (2013). Tai Chi training is effective in reducing balance impairments and falls in patients with Parkinson's disease. Journal of Physiotherapy, 59, 55.

"Research is formalized curiosity. It is poking and prying with a purpose." —Zora Neale Hurston, writer

6 Choosing Interventions for Practice: Designs to Answer Efficacy Questions

CHAPTER OUTLINE
LEARNING OUTCOMES
KEY TERMS
INTRODUCTION
RESEARCH DESIGN NOTATION
BETWEEN- AND WITHIN-GROUP COMPARISONS
RESEARCH DESIGNS FOR ANSWERING EFFICACY QUESTIONS
  Designs Without a Control Group
  Randomized Controlled Trials
  Crossover Designs
  Nonrandomized Controlled Trials
  Factorial Designs
  Single-Subject Designs
  Retrospective Intervention Studies
SAMPLE SIZE AND INTERVENTION RESEARCH
USING A SCALE TO EVALUATE THE STRENGTH OF A STUDY
COST EFFECTIVENESS AS AN OUTCOME
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Use scientific notation to explicate the design of a given study.
2. Differentiate between-group comparisons, within-group comparisons, and interaction effects.
3. Identify the design of a given intervention study.
4.
Use levels of evidence, threats to validity, and the PEDro Scale to evaluate the strength of the evidence for a given study.

KEY TERMS
between-group comparison
cluster randomized controlled trial
control group
crossover study design
factorial design
interaction effect
nonequivalent control group design
nonrandomized controlled trial
pre-experimental design
prospective
quality-adjusted life year (QALY)
quasi-experimental study
randomized controlled trial (RCT)
research design notation
retrospective cohort study
retrospective intervention study
single-subject design
within-group comparison

INTRODUCTION

It is important for evidence-based practitioners to evaluate the strength of the evidence. This point has been well established. That said, if a randomized controlled trial (RCT) provides the strongest evidence for intervention studies, why don't all researchers use this design? One reason other designs are employed relates to the use of resources: A randomized controlled trial can be expensive in terms of actual cost, time, and people power. When evaluating a new intervention, it may be unwise to invest a lot of time, energy, and money in a study when the outcomes are as yet unknown. Researchers frequently decide to first test an intervention using a simpler design (e.g., a pretest-posttest without a control group) as a pilot study. This approach informs the researcher whether it is prudent to proceed with a more complex study design. In addition, funding agencies such as the National Institutes of Health often expect preliminary results that show promise for an intervention before they will fund a larger-scale RCT. If a number of smaller studies demonstrate that a specific intervention appears to have merit, an RCT is often the next logical step. Additional reasons for using other research designs include ethical and pragmatic considerations.
In the case of intervention research, the researcher is motivated to design a study that best answers questions about the efficacy of the intervention. In doing so, individual needs are not considered when participants are randomly assigned to groups. This is particularly true when one group receives an intervention and the other group does not. Knowledge may be acquired from the study that helps future clients, but the individuals participating in the study may not receive the best treatment. Therefore, an ethical decision may require use of a study design in which all participants receive the preferred intervention. In other situations, there may not be enough individuals with a particular condition to conduct a two-group comparison, or the intervention provided is so individualized that it is unreasonable to combine participant results. This chapter introduces various research designs that can be used to answer questions about the efficacy of interventions. The randomized controlled trial is described, as are several other design options. Familiarity with these designs will provide information to use when selecting interventions for practice.

RESEARCH DESIGN NOTATION

Research design notation is a system that uses characters to diagram the design of intervention studies. Although research articles rarely display the notation, familiarity with research design notation is useful because it can help a reader break down and analyze a particular study. The primary characters used in the notation system include:

R = randomization
N = nonrandomization
X = treatment
O = dependent variable or outcome

Consider a simple RCT of two groups in which one group receives the experimental treatment and the other group acts as a control group, receiving no treatment. There is a pretest and posttest.
The research design notation for this study would look like this, with the first O representing the pretest and the last O representing the posttest:

R O X O
R O   O

In studies that compare treatments, the X can be further specified with a subscript, such as X1 and X2, or with letters, as in Xa and Xb. In fact, designating the treatments with letters that represent the intervention can be even more effective. For example, Keser et al (2013) compared Bobath trunk exercises with routine neurorehabilitation for individuals with multiple sclerosis; they used the Trunk Impairment Scale (TIS), Berg Balance Scale (BBS), International Cooperative Ataxia Rating Scale (ICARS), and Multiple Sclerosis Functional Composite (MSFC) as outcome measures before and after the intervention. In this case, the treatments could be notated as Xb (Bobath exercises) and Xr (routine neurorehabilitation). In addition, this study utilized multiple outcome measures, which can be identified in the notation. A comprehensive notation of this study might look like this:

R  OTIS, BBS, ICARS, MSFC  Xb  OTIS, BBS, ICARS, MSFC
R  OTIS, BBS, ICARS, MSFC  Xr  OTIS, BBS, ICARS, MSFC

The specificity with which a study is notated depends on your need to describe the study design. If you wish to better understand when different measures are administered during a study, you may choose to abbreviate and specify the particular measures, as done in the preceding example. However, if you simply want to know at what points the measures were administered, you may simply identify the testing time with an O, but use no further abbreviation. This chapter describes several different designs that are commonly used in intervention research, each of which includes the research design notation with the description. In addition, you have the opportunity to practice creating notations with examples from the research.
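The notation system itself can also be sketched in a few lines of code, which makes the row-per-group structure explicit. The following is a minimal illustration only; the `render_arm` helper and the arm tuples are invented for this example and are not part of the book's system:

```python
# Sketch of research design notation: each arm of a study is an assignment
# character (R = randomized, N = nonrandomized) plus a sequence of events
# (O = observation/outcome, X = treatment). render_arm is an invented helper.

def render_arm(assignment, events):
    """Render one row of design notation, e.g. ('R', ['O', 'X', 'O'])."""
    return " ".join([assignment] + events)

# A simple two-group RCT with pretest and posttest:
rct = [("R", ["O", "X", "O"]),   # intervention group
       ("R", ["O", "O"])]        # no-treatment control group

# The Keser et al (2013) comparison, with treatment letters:
keser = [("R", ["O", "Xb", "O"]),  # Bobath trunk exercises
         ("R", ["O", "Xr", "O"])]  # routine neurorehabilitation

for assignment, events in rct:
    print(render_arm(assignment, events))  # prints "R O X O" then "R O O"
```

Reading each arm as one row, top to bottom, reproduces the diagrams shown in this chapter.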
BETWEEN- AND WITHIN-GROUP COMPARISONS

In intervention research, most designs include comparisons both "between" and "within" groups. A between-group comparison is one in which a comparison is made to identify the differences between two or more groups. In most efficacy studies, the between-group comparison examines the differences between the intervention group and control group. The control group may receive no treatment, standard treatment, or some form of alternate treatment. As the name implies, a within-group comparison makes a comparison inside the same group, most often a comparison of scores before and after an intervention. An interaction effect combines the between- and within-group comparisons, so an interaction is said to occur when there is a difference in how a group performs over time. These distinctions are important when reading and analyzing the results of research studies. The previous example from the multiple sclerosis study (Keser et al, 2013) can be used to demonstrate between- and within-group comparisons. In this study, the two groups included the Bobath treatment group and the routine treatment group. Consider one outcome measure, the Trunk Impairment Scale (TIS):

R OTIS Xb OTIS
R OTIS Xr OTIS

A between-group comparison might compare the two groups before the intervention takes place (the pretest OTIS in each row). The study could also compare the two groups after the intervention takes place (the posttest OTIS in each row). In contrast, a within-group comparison examines differences in the pretest and posttest scores for each group separately. However, the most important comparison in an intervention study is the combination of the between and within comparisons, or the interaction effect. It is important to determine whether there was a difference in the way that one group performed from pretest to posttest, compared with the other group's performance from pretest to posttest.
When there is a difference in how two groups respond, an interaction has occurred; if that difference is large enough, the interaction effect is statistically significant. When the intervention group improves more than the control group, the intervention was more effective than the control condition in causing a positive change in the outcome. Often, a study concludes that the intervention group improved (a within-group comparison), but so did the control or comparison group. If the degree of improvement is comparable for both groups, no interaction effect is present. Researchers use different types of analyses (many still based on the t-test and ANOVA statistics) to determine if two groups differ over time. One common approach involves treating the pretest as a covariate and analyzing the difference in the posttest scores of the two groups using an analysis of covariance (ANCOVA). In other studies, the researcher might first compute the difference in pretest and posttest scores and then compare the two groups in terms of their difference scores. If there are only two groups, the comparison can be made using an independent sample t-test; if there are three or more groups, the comparison can be made with a one-way ANOVA. It is helpful to graph the pretest and posttest scores of two groups to illustrate the interaction. Figures 6-1 through 6-4 show several different examples of potential outcomes and identify whether or not an interaction effect occurred. Each graph provides a visual to help determine whether an interaction occurred and the degree to which the two groups differed. The between and within comparisons are also described as main effects; the combination of main effects becomes the interaction.

Understanding Statistics 6-1

Difference statistics are used to analyze the within, between, and interaction effects described in this section.
In Chapter 4, t-tests and ANOVAs were identified as common statistics for analyzing differences. Several approaches can be used, but a basic and common method of analyzing a two-group comparison with a pretest and posttest utilizes t-tests and ANOVAs. The between-group comparison at pretest or posttest can be made using an independent sample t-test; a separate t-test would be conducted for the pretest and for the posttest. If the t-test results in a p value of less than 0.05, the difference between the two groups is statistically significant. (In the accompanying table, the arrows mark these comparisons: intervention versus control within the pretest column, and again within the posttest column.)

The within-group comparison can be made using a dependent sample t-test. Again, two separate t-tests would be done, one for the intervention group and one for the control group. A p value of less than 0.05 would mean that there is a difference between the pretest and posttest scores. (Here the arrows mark pretest versus posttest within each group's row.)

A mixed-model ANOVA is used to combine the within and between comparisons to determine if an interaction effect has occurred. If p < 0.05, an interaction effect has occurred. In the literature, a mixed-model ANOVA is sometimes described as a repeated measures ANOVA, even when it includes both between and within comparisons. It is important for evidence-based practitioners to know that the interaction effect is the way to determine whether the intervention group improved more than the control group. (In this table, the arrows indicate that the analysis compares the intervention and control groups at both pretest and posttest.)

Figure 6-1 shows no effects. In this case, there are neither main effects nor interaction effects. There is no difference between or within groups, which makes it impossible for an interaction effect to occur. In this example, neither group improved after the intervention.
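The within-group, between-group, and difference-score comparisons described in Understanding Statistics 6-1 can be sketched with small invented data. This is a minimal illustration using only the standard library and reporting t statistics rather than p values; the scores and helper functions are made up for this example, and a real analysis would use a statistics package:

```python
import math

def paired_t(pre, post):
    """Dependent-sample t statistic for a within-group (pre vs. post) comparison."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

def independent_t(x, y):
    """Independent-sample t statistic (pooled variance) for a between-group comparison."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled * (1 / nx + 1 / ny))

# Invented example scores: the intervention group improves, the control does not.
tx_pre,  tx_post  = [10, 11, 12, 13, 14], [15, 15, 18, 18, 19]
ctl_pre, ctl_post = [10, 12, 11, 13, 14], [10, 13, 10, 13, 14]

# Within-group comparisons (one dependent t-test per group):
t_within_tx  = paired_t(tx_pre, tx_post)    # large: clear pre-to-post change
t_within_ctl = paired_t(ctl_pre, ctl_post)  # near zero: no change

# Difference-score approach to the interaction: compare the two groups' gains.
tx_gains  = [b - a for a, b in zip(tx_pre, tx_post)]
ctl_gains = [b - a for a, b in zip(ctl_pre, ctl_post)]
t_interaction = independent_t(tx_gains, ctl_gains)

print(round(t_within_tx, 2), round(t_within_ctl, 2), round(t_interaction, 2))
# prints: 15.81 0.0 11.18
```

With 4 degrees of freedom, |t| greater than 2.776 corresponds to p < 0.05 two-tailed, so the intervention group's within-group change (t ≈ 15.8) would be significant while the control group's (t = 0) would not; the difference-score comparison (t ≈ 11.2, 8 degrees of freedom, critical value 2.306) points to an interaction.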
Figure 6-2 shows main effects within groups, but no interaction effect. There is a main effect within a group (pre- to post-improvement), but there is only a small difference between groups (no main effect between groups), and the parallel lines indicate that there is no interaction effect. In this case, both groups improved in a similar fashion.

[FIGURE 6-1: Example of a graph showing no effects.]
[FIGURE 6-2: Example of a graph showing main effects within groups, but no interaction effect.]

Figure 6-3 is a graph showing main and interaction effects. In this case, there is a clear interaction effect (i.e., the lines cross). The control group actually gets worse, whereas the intervention group improves. There are also main effects both between and within groups. There is a difference between the groups at pretest and at posttest, and there is a difference within the groups from pretest to posttest, although the difference is in the opposite direction for the two groups.

Figure 6-4 is a graph showing some main effects and an interaction effect. In this example, the groups start out at the same place, so there is no between-group main effect at pretest. However, the groups separate at posttest, indicating a between-group main effect. Both groups improved to some degree, resulting in a within-group main effect. In addition, there is an interaction effect because the pattern of improvement is different: The intervention group improves more than the control group. A well-written research article will include a graph of the interaction effect when relevant; however, when it is not included, you can graph the interaction yourself.
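When no plotting tool is at hand, a quick numeric check of these graphs is to compare each group's pretest-to-posttest gain: roughly equal gains mean parallel lines (no interaction), while clearly unequal gains suggest one. Here is a minimal sketch with invented group means that mirror the patterns of Figures 6-2 and 6-3; the `describe_pattern` helper and its tolerance are assumptions for illustration, not a statistical test:

```python
# Classify a two-group pretest/posttest pattern by comparing gains.
# The group means below are invented to mirror Figures 6-2 and 6-3.

def describe_pattern(tx_pre, tx_post, ctl_pre, ctl_post, tolerance=0.5):
    """Compare each group's gain; near-equal gains imply parallel lines."""
    tx_gain, ctl_gain = tx_post - tx_pre, ctl_post - ctl_pre
    if abs(tx_gain - ctl_gain) <= tolerance:
        return "parallel lines: no interaction suggested"
    return "non-parallel lines: possible interaction"

# Like Figure 6-2: both groups improve by about the same amount.
parallel = describe_pattern(2.0, 4.0, 1.8, 3.9)

# Like Figure 6-3: the intervention group improves while the control worsens.
crossing = describe_pattern(2.0, 4.0, 3.0, 1.5)

print(parallel)  # parallel lines: no interaction suggested
print(crossing)  # non-parallel lines: possible interaction
```

Note that this only describes the visual pattern; whether an interaction is statistically significant still requires an analysis such as the mixed-model ANOVA described in Understanding Statistics 6-1.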
EXERCISE 6-1

Graphing and Describing Between-Group Comparisons, Within-Group Comparisons, and Interaction Effects From a Study (LO2)

This exercise returns to the multiple sclerosis study (Keser et al, 2013). The following are actual data from the study measuring the pretest and posttest scores for both groups on the Trunk Impairment Scale:

Group     Pretest Score    Posttest Score
Bobath    13.90            19.20
Routine   12.60            18.00

QUESTIONS
1. Graph these results using a line graph similar to those shown in Figures 6-1 through 6-4. Next, identify whether there are main effects for the between-group and within-group comparisons and whether there is an interaction effect. Finally, write a sentence summarizing the results of this study for the Trunk Impairment Scale.

[FIGURE 6-3: Example of a graph showing main and interaction effects.]
[FIGURE 6-4: Example of a graph showing some main effects and an interaction effect.]

RESEARCH DESIGNS FOR ANSWERING EFFICACY QUESTIONS

This section provides more detailed information about the specific designs typically used to answer efficacy questions about an intervention. Each section includes the scientific notation for that design and an example of an actual study from the research literature.

Designs Without a Control Group

Technically speaking, a study without a control group is not considered an experimental design, because one of the requirements of experimental design is that there be a comparison between two groups. However, it is important to understand the pretest-posttest design without a control group, because that design is frequently encountered in the literature.
Researchers may use this design initially to determine whether an intervention has the potential to make a difference, before investing the time and money in a more extensive RCT. This design is sometimes referred to as a pre-experimental design because the intent is to examine a cause-and-effect relationship. However, the lack of a control group significantly limits the researcher's ability to conclude that such a relationship exists. Recall that, according to the levels-of-evidence hierarchy, this design type yields a lower level of evidence: Level IV, meaning it is weaker than either an RCT or a nonrandomized controlled trial. The notation for a pre-experimental design looks like this:

O X O

The absence of a control group is a significant limitation of this study type. As described in Chapter 5, many threats to validity, such as maturation and regression to the mean, are managed with the use of a control group. In an example, Seo and Park (2016) examined the efficacy of gyrokinesis exercises to improve gait for women with low back pain and found improvements on a number of gait outcomes; however, this study provides only preliminary evidence, and future studies are needed to compare the intervention with a control group to eliminate threats to validity such as maturation. From the Evidence 6-1 presents a table from the study showing the differences in pretest and posttest scores.

Randomized Controlled Trials

Randomized controlled trials (RCTs) are extremely valuable in intervention research. The RCT is the highest level of evidence for a single study, rated a Level II in the level-of-evidence hierarchy used in this text. A well-designed RCT with a large sample size that finds an advantage for the target intervention provides strong support for the conclusion that the intervention caused the positive outcome. The use of the term control merits some additional discussion. Strictly speaking, a control group is one in which the participants do not receive an intervention. A drug study in which one group receives a drug and the other group receives a placebo is a true control condition. The notation for a "true" randomized controlled trial is as follows:

R O X O
R O   O

However, no-treatment control groups are often avoided for ethical reasons, because denying treatment can have adverse consequences. In addition, in rehabilitation research it is naturally more difficult to provide a true placebo. An exception is a study by Chang and colleagues (2010) that examined the efficacy of Kinesio taping to increase grip strength in college athletes. In this study, three conditions were compared: (1) Kinesio taping, (2) no treatment, and (3) placebo Kinesio taping. The results indicated no difference in grip strength across the three conditions. Because a no-treatment control condition is often avoided in therapy studies, an alternative design that is frequently used compares a new intervention with treatment as usual, or standard therapy. The term randomized controlled trial typically is still used to describe these studies. In other studies, two or more specific interventions may be compared to identify the preferred approach. The notation for the basic design of these studies looks like this:

R O Xa O
R O Xb O

One limitation of a study design without a true control is the existence of a maturation threat to validity. Without a no-intervention group, you cannot know what would have happened as a result of natural healing or development. An example of standard versus new treatment is provided by a study that compared a dynamic exercise program with conventional joint rehabilitation for individuals with arthritis (Baillet et al, 2009). The results found short-term benefits for the dynamic exercise program; however, these benefits were not sustained in the long term.
A protocol for another study describes a head-to-head comparison of Rapid Syllable Transition Treatment and the Nuffield Dyspraxia Programme to address childhood apraxia (Murray, McCabe, & Ballard, 2012). These studies have the potential to assist therapists in making clinical decisions about the preferred approach. Yet another research approach is to compare a standard intervention with standard intervention plus a new intervention. The notation includes the combined treatment:

R O Xs O
R O Xs + Xa O

For example, in a study of individuals with eating disorders, both groups received standard outpatient treatment, but the intervention group also received Basic Body Awareness (Catalan-Matamoros et al, 2011). The Basic Body Awareness intervention involved exercises with focused attention on the experience of the movement. Participants receiving Basic Body Awareness had greater decreases in symptoms of eating disorders, such as drive for thinness and body dissatisfaction, than the group that only received standard treatment. This study design provides the benefit of showing whether the additional treatment offered an additional advantage. However, the Hawthorne effect can threaten the validity of such a design, because those individuals receiving the additional treatment are also receiving additional attention. (Recall from Chapter 5 that the Hawthorne effect occurs when participants respond to the fact that they are participating in a study rather than to the actual intervention.) From the Evidence 6-2 is a flowchart of the procedures used in the study.

FROM THE EVIDENCE 6-1  Pretest-Posttest Without a Control Group

Whitson, H. E., Whitaker, D., Potter, G., McConnell, E., Tripp, F., Sanders, L. L., Muir, K. W., Cohen, H. J., & Cousins, S. W. (2013). A low-vision rehabilitation program for patients with mild cognitive deficits. JAMA Ophthalmology, 131, 912–919.

Table 4. Comparison of Functional and Cognitive Measures Before and After Participation in MORE-LVR

Measure | Mean (SD) Before MORE-LVR | Mean (SD) After MORE-LVR | p Value(a)
Vision-Related Function: Self-reported by Patient
VFQ-25 Composite score | 47.2 (16.3) | 54.8 (13.8) | .01(b)
VFQ-25 Near-activities score | 21.5 (14.0) | 41.0 (23.1) | .02(b)
VFQ-25 Social functioning score | 56.3 (37.6) | 80.3 (21.7) | .06
VFQ-25 Distance-activities score | 27.8 (17.8) | 31.8 (15.9) | .75
VFQ-25 Dependency score | 45.1 (22.6) | 53.5 (28.8) | .53
VFQ-25 Role difficulties score | 39.5 (22.6) | 33.4 (24.7) | .57
Individual goal attainment score (range, 0 to 9) | 0.4 (0.5) | 5.7 (2.8) | .001(b)
Patient-reported satisfaction with IADL ability (range, -24 to 24) | –1.4 (12.0) | 6.1 (9.9) | .05(b)
Timed Performance Measures (No. of Seconds to Complete Each Task)
Filling in a crossword puzzle answer | 205 (103) | 123 (92) | .003(b)
Making a 4-item grocery list | 155 (116) | 99 (62) | .03(b)
Looking up a telephone number in a telephone book | 221 (99) | 228 (79) | .30
Answering questions about a recipe in a cookbook | 230 (99) | 240 (80) | .99
Neurocognitive Scores: Logical Memory
Immediate recall | 19.7 (9.7) | 22.9 (9.9) | .07
Delayed recall | 13.0 (8.5) | 18.7 (12.4) | .02(b)

Abbreviations: IADL, instrumental activities of daily living; MORE-LVR, Memory or Reasoning Enhanced Low Vision Rehabilitation; VFQ-25, Vision Function Questionnaire.
a Comparison based on Wilcoxon signed rank test.
b Values that are significant at an error level of P < .05.

Note A: This table illustrates a within-group comparison of the pretest and posttest scores on several measures. There is no between-group comparison, as this study is made up of only one group.

FTE 6-1 Question: On which outcome was the improvement the greatest?
When the efficacy of a particular intervention is established, subsequent studies may use RCTs to further specify the optimal conditions of administration of the intervention, such as intensity, training of the providers, and setting of administration. A follow-up study may also compare variations of the intervention. A notation is not provided here, because the designs can vary considerably. However, to meet the criteria for an RCT, the design must include randomization to at least two conditions. A study of constraint-induced therapy provides an example. In this study, both groups received the same type and amount of practice, but participants were randomly assigned to wear a sling during the practice or to voluntarily constrain (not use) their nonaffected arm during practice (Krawczyk et al, 2012). The results indicated that both groups improved, but there was no difference between the two conditions. In some randomized controlled trials, a pretest is not used. This can present a threat to validity in that, without a pretest, one does not know whether the groups were equivalent on the outcome of interest at the start. In addition, this design does not provide information regarding the extent of change that occurred. With a large sample, the randomization to group will likely lead to equivalence, but that is not a certainty. Posttest-only studies are typically those in which a pretest would influence the outcome of the posttest (a testing threat to validity) or those in which it is expected that all participants will start out at a similar point. For example, if the outcome of interest is rate of rehospitalization, a pretest is not possible. As another example, a fall prevention program may enroll participants who have not experienced a fall but assess the number of falls after the intervention period as an outcome.
The notation for a posttest-only RCT is:
R  X  O
R     O
Crossover Designs
Crossover study designs, in which participants are randomly assigned to groups, are considered to be at the same level of evidence as randomized controlled trials (Level II). In a crossover study design, all participants receive the same treatments, but in a different order. In some crossover studies, a no-treatment condition is compared with a treatment condition. In this case, one group starts with the treatment and then receives the control condition; the other group begins with the control and moves to the intervention. The notation provides a useful illustration of this design:
R  O  X  O     O
R  O     O  X  O
In other crossover studies, two different interventions are compared in different orders:
R  O  Xa  O  Xb  O
R  O  Xb  O  Xa  O
Crossover designs are most useful for interventions in which a permanent change is not expected. Otherwise, the second condition will be affected, which would present a history threat to validity. Crossover designs are often used in studies of assistive devices/technologies. For example, in one study of individuals with brain injury, two reminder systems were compared: typical reminder systems such as calendars, lists, and reminders from service providers, versus Television Assisted Prompting (TAP), in which reminders were programmed and then provided audiovisually through the client's at-home television (Lemoncello, Sohlberg, Fickas, & Prideaux, 2011). One group received TAP prompting first, followed by typical prompting; the other group received the interventions in the reverse order. From the Evidence 6-3 shows the order in which the interventions were presented for each group. The study found that prompting using TAP was effective in increasing memory.
Nonrandomized Controlled Trials
As the name implies, the only difference between a randomized controlled trial and a nonrandomized controlled trial relates to the allocation of subjects to groups.
In a nonrandomized controlled trial, participants do not have an equal chance of being assigned to a condition. The lack of randomization can lead to bias or differences between the two groups. For this reason a nonrandomized controlled trial yields a lower level of evidence than a randomized controlled trial. The nonrandomized controlled trial is a Level III in this text's evidence hierarchy. Instead of random assignment, allocation to groups can occur by asking for volunteers for the intervention first and then matching individuals to a control group. Individuals who volunteer for an intervention may be more amenable to treatment and thereby differ from individuals who participate in the control group that does not receive the intervention. The nonrandomized approach often is used for pragmatic or ethical reasons, with one setting receiving the intervention and the other serving
FROM THE EVIDENCE 6-2 Randomized Controlled Trial
Catalan-Matamoros, D., Helvik-Skjaerven, L., Labajos-Manzanares, M. T., Martínez-de-Salazar-Arbolea, A., & Sánchez-Guerrero, E. (2011). A pilot study on the effect of Basic Body Awareness Therapy in patients with eating disorders: A randomized controlled trial. Clinical Rehabilitation, 25(7), 617–626.
N = 102 met the inclusion criteria; N = 74 refused to participate
N = 28 were pretested and randomly allocated into two groups
Experimental group (N = 14): received BBAT intervention; losses: N = 0; N = 14 completed the posttest 10 weeks after the pretest
Control group (N = 14): losses: N = 6 (n = 2, lack of time; n = 2, lack of transportation; n = 2, other reasons); N = 8 completed the posttest 10 weeks after the pretest
Note A: This flowchart illustrates the selection process and random assignment to groups.
Note B: The control group lost six participants, whereas all participants in the intervention group were available at follow-up.
Mortality is a threat to validity; the fact that all of the dropouts were in the control group suggests that control participants may have experienced compensatory demoralization as a threat to internal validity. FTE 6-2 Question: Would the threat to mortality in this study be even greater because of the small sample size?
Understanding Statistics 6-2
Even with randomization to groups, it is still possible for groups to differ on important demographic characteristics or outcome variables. This is particularly true when sample sizes are small. If the researcher identifies differences between groups, the variable in which the difference lies can be covaried so that the groups are made equivalent statistically. In this case an ANCOVA is typically used. For example, in a study of children, suppose the subjects in the intervention group are older than the children in the control group; hence, differences in development and education that may be related to age could affect the outcomes of the study. In this case, the researcher may choose to covary age; the statistic removes the variability associated with age from the analysis.
as the control or standard treatment group. In this case, differences in the settings can bias the results and present a selection threat to validity. When the settings are randomly assigned to group, the study design may be called a cluster randomized controlled trial; however, this is not a true RCT, as individuals do not have an equal chance of being assigned to the study conditions. Although there are advantages associated with randomly assigning the settings to a condition, the potential exists for setting bias, making these designs nonrandomized. A nonrandomized controlled trial is strengthened by efforts to ensure that there are as few differences as possible between the two groups.
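The covariate-adjustment logic described in Understanding Statistics 6-2 can be made concrete with a short Python sketch. All numbers are invented for illustration: scores rise 3 points per year of age, plus a built-in 5-point treatment effect, and the intervention group happens to be older. Averaging the within-group slopes stands in for the pooled slope that a full ANCOVA would estimate.

```python
# Sketch of the logic behind covarying age (as in ANCOVA), using
# hypothetical data: the intervention group is older, so the raw group
# difference mixes the treatment effect with an age effect.

def mean(xs):
    return sum(xs) / len(xs)

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def adjusted_means(ages_by_group, scores_by_group):
    """Adjust each group's mean score to the grand mean age, using the
    age slope averaged within groups (so the treatment effect itself
    does not bias the slope) -- the core idea of ANCOVA."""
    all_ages = [a for ages in ages_by_group for a in ages]
    grand_age = mean(all_ages)
    b = mean([slope(a, s) for a, s in zip(ages_by_group, scores_by_group)])
    return [mean(s) - b * (mean(a) - grand_age)
            for a, s in zip(ages_by_group, scores_by_group)]

# Hypothetical children: score = 3 points per year of age,
# plus a 5-point treatment effect in the intervention group.
control_ages, control_scores = [6, 7, 8], [18, 21, 24]
interv_ages, interv_scores = [9, 10, 11], [32, 35, 38]

raw_diff = mean(interv_scores) - mean(control_scores)   # 14.0
adj_control, adj_interv = adjusted_means(
    [control_ages, interv_ages], [control_scores, interv_scores])
adjusted_diff = adj_interv - adj_control                # 5.0
```

Once the age effect is removed, the 14-point raw difference shrinks to the 5-point treatment effect that was built into the data.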
Testing for any systematic differences in the groups that could threaten the validity of findings should be done prior to the start of the study, to avoid conducting a study that is seriously flawed from the outset. All of the designs described in the randomized controlled trial section could apply to the nonrandomized controlled trial, with the difference occurring at the initial group assignment, such that nonrandomized designs can include a true control:
N  O  X  O
N  O     O
or a comparison treatment:
N  O  Xa  O
N  O  Xb  O
FROM THE EVIDENCE 6-3 Crossover Design
Lemoncello, R., Sohlberg, M. M., Fickas, S., & Prideaux, J. (2011). A randomised controlled crossover trial evaluating Television Assisted Prompting (TAP) for adults with acquired brain injury. Neuropsychological Rehabilitation, 21, 825–826.
Assessment and goal-setting sessions → Random assignment
Group A: TAP ×2 weeks → TYP ×2 weeks → TAP ×2 weeks → TYP ×2 weeks
Group B: TYP ×2 weeks → TAP ×2 weeks → TYP ×2 weeks → TAP ×2 weeks
Both groups → Final data collection and interview
Note A: In a crossover design, both groups receive both treatments, but in a different order. TAP = television-assisted prompting; TYP = typical reminders
FTE 6-3 Question: What is the benefit of providing TAP and typical reminders twice to each group?
Nonrandomized controlled trials may also be referred to as quasi-experimental studies or nonequivalent control group designs, indicating that the two groups may be different due to the lack of randomization. For example, Ferguson and colleagues (Ferguson, Jelsma, Jelsma, & Smits-Engelsman, 2013) compared neuromotor training with a Wii fitness program to improve motor skills in children with developmental coordination disorder.
The children’s allocation to the treatment group depended on the school they attended, so that children at two schools received neuromotor training, and children at the other school received the Wii fitness program. Therefore, children did not have an equal chance of being assigned to either intervention. From the Evidence 6-4 provides a more detailed description of the rationale for FROM THE EVIDENCE 6-4 Nonrandomized Controlled Trial Ferguson, G. D., Jelsma, D., Jelsma, J., & Smits-Engelsman, B. C. (2013). The efficacy of two task-orientated interventions for children with developmental coordination disorder: Neuromotor task training and Nintendo Wii Fit training. Research in Developmental Disabilities, 34(9), 2449–2461. ISSN 0891-4222, http://dx.doi.org/10.1016/j.ridd.2013.05.007. 2.1. Research design and setting A pragmatic, single blinded, quasi-experimental design was used to compare the effect of two intervention programmes. Cluster sampling was used to select three mainstream primary schools (i.e., A, B and C) located within a low-income community in Cape Town, South Africa. Allocation to treatment group was determined by school of attendance. Children attending schools A and B received NTT while children attending school C received Nintendo Wii training. A non-randomized approach was used as it was not possible to provide Nintendo Wii training at either school A or B over the study period due to damage to the power supply at both schools. Apart from the functioning power supply, there were no significant differences between schools in terms of playground facilities, socioeconomic backgrounds of the learners, school fees, staff ratios or curriculum. NTT = neuromotor task training FTE 6-4 Question The study is described as single-blinded. Based on the description, who would be blind, and how does this blinding strengthen the validity of the study? 
the design, with an excerpt from the methods section of the journal article. The study found that both interventions were effective, but neuromotor training resulted in greater improvements in more areas of motor performance. The authors explain why randomization was not possible and provide some evidence to support the contention that the classrooms are equivalent. However, not all differences can be accounted for, so there is still the potential for bias or differences in the classroom receiving the Wii training.
Factorial Designs
Factorial designs use either a randomized or nonrandomized approach to group assignment. However, factorial designs are distinguished from other designs by including more than one independent variable. In a factorial design for an intervention study, one independent variable is the intervention condition. The additional independent variable is typically included to determine if the intervention had differential effects across the levels of that additional variable. For example, a study may include gender as a second independent variable and then determine if the intervention was more effective in males or females. In the following notation, two interventions are designated as a and b, and gender is designated as m and f. The notation looks like this:
R  O  Xam  O
R  O  Xaf  O
R  O  Xbm  O
R  O  Xbf  O
The notation illustrates the additional complexity of a factorial design. In this instance, there are four conditions: intervention A with males, intervention A with females, intervention B with males, and intervention B with females. For the researcher, this complexity also translates into additional challenges for recruitment, because a larger sample size is necessary to allow for additional comparisons. Each condition requires enough participants to detect a difference and represent the population.
Factorial designs are also described in terms of the number of levels within each independent variable. The preceding example is a 2 × 2 design because there are two levels of the intervention, A and B, and two levels of gender, male and female. A factorial design that compares two different interventions (first independent variable) and three different settings (second independent variable) would be described as a 2 × 3 design. Factorial designs can include a third independent variable, although doing so greatly increases the complexity of the analysis and interpretation. The two preceding examples could be combined, such that two interventions are compared, along with gender and setting, creating a 2 × 2 × 3 design. Factorial designs can also be used to compare intervention conditions (the first independent variable) and health conditions (the second independent variable). For example, a memory intervention can be compared with a control condition for healthy older adults, older adults with Alzheimer's disease, and older adults with Parkinson's disease (2 × 3 factorial design). In still another variation of the factorial design, two interventions can be examined together. As an example of this design, a study compared whole body vibration therapy (WBV) with a control condition (the first independent variable) and two doses of vitamin D, a conventional dosage and a higher dosage, for women over age 70 (Verschueren et al, 2011). This 2 × 2 factorial design found improvements in all groups; however, the group with the conventional dosage of vitamin D and without WBV had musculoskeletal outcomes comparable to those of the other three groups, suggesting that there was no benefit to higher doses of vitamin D or WBV therapy. From the Evidence 6-5 is a table of the findings comparing the four groups.
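The way conditions multiply in a factorial design is easy to see programmatically. In this short Python sketch, the setting labels and the 20-participants-per-cell figure are illustrative assumptions, not values from the text:

```python
# Enumerating the cells of a factorial design with itertools.product.
# The number of cells (and so the required sample) multiplies with
# each added independent variable.
from itertools import product

def factorial_cells(*factors):
    """Return one tuple per cell (every combination of factor levels)."""
    return list(product(*factors))

interventions = ["A", "B"]                   # first independent variable
genders = ["male", "female"]                 # second independent variable
settings = ["home", "clinic", "community"]   # hypothetical third variable

cells_2x2 = factorial_cells(interventions, genders)              # 4 cells
cells_2x2x3 = factorial_cells(interventions, genders, settings)  # 12 cells

# Each cell needs enough participants on its own, so sample-size
# demands grow multiplicatively; e.g., at 20 participants per cell:
needed_2x2 = 20 * len(cells_2x2)       # 80
needed_2x2x3 = 20 * len(cells_2x2x3)   # 240
```

The four cells of `cells_2x2` correspond directly to the four rows of the notation above (Xam, Xaf, Xbm, Xbf).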
EXERCISE 6-2 Identifying the Research Design of a Study Using Scientific Notation (LO1)
Use your knowledge of scientific notation to diagram the study described here, recognizing that many real-world studies will include variations and/or combinations of the designs described in this chapter.
Kreisman, B. M., Mazevski, A. G., Schum, D. J., & Sockalingam, R. (2010, March). Improvements in speech understanding with wireless binaural broadband digital hearing instruments in adults with sensorineural hearing loss. Trends in Amplification, 14(1), 3-11. [Epub 2010 May 10]. doi:10.1177/1084713810364396
"This investigation examined whether speech intelligibility in noise can be improved using a new, binaural broadband hearing instrument system. Participants were 36 adults with symmetrical, sensorineural hearing loss (18 experienced hearing instrument users and 18 without prior experience). Participants were fit binaurally in a planned comparison, randomized crossover design study with binaural broadband hearing instruments and advanced digital hearing instruments. Following an adjustment period with each device, participants underwent two speech-in-noise tests: the QuickSIN and the Hearing in Noise Test (HINT). Results suggested significantly better performance on the QuickSIN and the HINT measures with the binaural broadband hearing instruments, when compared with the advanced digital hearing instruments and unaided, across and within all noise conditions."
Understanding Statistics 6-3
The statistical analysis of a factorial design can get very complicated. In a factorial design, there are both interaction effects and main effects.
In the immediately preceding example, the interaction effect would examine the pattern of differences for a memory intervention and control condition for healthy older adults, older adults with Alzheimer's disease, and older adults with Parkinson's disease. This 2 × 3 factorial design would have six groups. An ANOVA can reveal if there is an interaction between the independent variables of intervention condition and health condition. In other words, does the difference between the intervention and control conditions depend on which health group receives them? Figure 6-5 is a hypothetical graph that illustrates the number of names remembered as the outcome of the intervention. The interaction effect would show if there is a difference in the health conditions and intervention conditions, and the graph does in fact appear to indicate a difference (i.e., the healthy condition benefits more than the Alzheimer's and Parkinson's conditions). However, the interaction effect is only part of the story. You would still want to know if there is a difference between the intervention and control groups (a main effect), and if there are differences between the three health conditions (another main effect). These follow-up tests are typically referred to as post hoc analyses. There are many different ways in which these analyses can be done. In this case it would be possible to compare the intervention and control group using a between-group t-test. In this main effect, all of the health conditions are combined into one group.
Main Effect of Treatment Condition (Independent-Sample t-test): Intervention vs. Control
You could compare the three groups (healthy, Alzheimer's, and Parkinson's) with their intervention and control groups combined, using a one-way ANOVA to examine the main effect of health condition.
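The interaction and main effects discussed here are, at bottom, comparisons of cell means, which a short Python sketch can make explicit. The cell means below are hypothetical numbers of names remembered, chosen so that the intervention helps the healthy group most:

```python
# Hypothetical cell means for the 2 x 3 example (names remembered).
cells = {
    ("intervention", "healthy"): 9.0,
    ("intervention", "Alzheimer's"): 5.0,
    ("intervention", "Parkinson's"): 5.5,
    ("control", "healthy"): 6.0,
    ("control", "Alzheimer's"): 4.5,
    ("control", "Parkinson's"): 5.0,
}

def mean(xs):
    return sum(xs) / len(xs)

# Main effect of treatment condition: average over the health conditions.
def treatment_mean(condition):
    return mean([v for (c, _), v in cells.items() if c == condition])

# Main effect of health condition: average over treatment conditions.
def health_mean(health):
    return mean([v for (_, h), v in cells.items() if h == health])

# One simple way to see an interaction: the intervention-minus-control
# gain within each health condition. If these gains differ, the effect
# of treatment depends on health condition.
gains = {h: cells[("intervention", h)] - cells[("control", h)]
         for h in ["healthy", "Alzheimer's", "Parkinson's"]}
```

With these numbers the healthy group gains 3.0 names while the two clinical groups gain only 0.5 each, which is the difference-of-differences pattern an interaction test would flag.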
Main Effect of Health Condition (One-Way ANOVA): Healthy vs. Alzheimer's vs. Parkinson's
FIGURE 6-5 Interaction effect of health condition and treatment condition (ANOVA analysis). [Graph: number of names remembered (0-10) by health condition (healthy, Alzheimer's, Parkinson's), with separate lines for the intervention and control conditions.]
This analysis will reveal if there is a difference among the three groups. There are still other post hoc tests that could be done. For example, you may want to know if there is a difference between the Alzheimer's and Parkinson's groups for individuals who received the intervention, which could be determined with a between-group t-test. There are many ways that post hoc analyses can be carried out, and in many cases researchers use statistics that help control for Type I error due to multiple analyses. The particular statistics are beyond the scope of this chapter, but evidence-based practitioners who understand all of the different comparisons that can be made should be better able to interpret the results sections of research articles.
FROM THE EVIDENCE 6-5 Factorial Design
Verschueren, S. M., Bogaerts, A., Delecluse, C., Claessens, A. L., Haentjens, P., Vanderschueren, D., & Boonen, S. (2011). The effects of whole-body vibration training and vitamin D supplementation on muscle strength, muscle mass, and bone density in institutionalized elderly women: A 6-month randomized, controlled trial. Journal of Bone and Mineral Research, 26, 42–49. doi:10.1002/jbmr.181.
Note A: The 2 × 2 design of this study uses two different levels of two different interventions (WBV and vitamin D). Table 5.
Separate Group Results and Interaction Effects
Groups: whole body vibration (WBV) training programme vs. no WBV training programme, crossed with high-dose vitamin D (1600 IU daily; n = 26 and n = 29) vs. conventional-dose vitamin D (880 IU daily; n = 28 in each group).
Percentage change from baseline (SE) in each group, with the p value for an interaction effect between the WBV training programme and vitamin D medication:
Isometric muscle strength (Nm): +6.07% (2.14), +3.01% (2.67), +1.10% (2.44), +0.11% (3.18); interaction p = .330
Dynamic muscle strength (Nm): +11.41% (4.42), +4.71% (2.13), +4.94% (2.66), +8.07% (3.17); interaction p = .600
Muscle mass (cm3): −0.36% (0.72), −0.16% (0.57), +0.02% (0.72), −0.25% (0.38); interaction p = .350
Hip bone mineral density (g/cm2): +0.78% (0.39), +0.71% (0.42), +0.78% (0.39), +0.99% (0.51); interaction p = .179
Serum vitamin D level (nmol/L): +200.01% (46.89), +146.80% (35.78), +172.25% (37.91), +183.02% (38.34); interaction p = .668
Note B: Because none of the p values is < .05, there is no difference between the four groups on any of the outcome measures.
FTE 6-5 Question: Although the table does not indicate any differences among the groups, which of the following questions can be answered by the study? 1. Is high-dose vitamin D more effective than conventional-dose vitamin D? 2. Is whole body vibration training plus vitamin D treatment more effective than vitamin D treatment alone? 3. Is whole body vibration training more effective than no treatment?
Single-Subject Designs
Unlike the study designs described earlier, single-subject designs do not aggregate the scores of participants. Instead, the results of each individual are examined separately to determine if the intervention was effective.
Because average scores are not calculated and groups are not compared, a different methodology is employed to infer cause-and-effect relationships. The basis of the single-subject design is to compare an individual's response under different conditions. Cause-and-effect relationships can be inferred from a strong single-subject design study when there is a clear difference between behavior that occurs when an intervention is present and that which occurs when the intervention is absent. The methodology of the single-subject study differs from group designs, but still answers questions of causality. Single-subject research is a within-participant design; that is, each participant is his or her own control. Repeated measures must be taken over time, while certain conditions are held constant. Typically the conditions involve a baseline in which no treatment is applied, which is compared with a treatment phase. A different notation system is used for a single-subject design. A simple example is an ABA design. The first A represents the initial phase of the study and comprises a baseline observation and collection of data. Then an intervention is provided, which is represented by the B phase. The same observation and collection of data continue. Finally, the intervention is removed, and observation and data collection continue, indicated by the second A. If a cause-and-effect relationship exists, a change in behavior occurs during the intervention. Just as importantly, when the intervention is removed, the improvement should either disappear or decrease to some degree. Without the last phase and a change from the intervention phase, it is more difficult to attribute the change to the intervention. The expectation that behaviors will return to the previous level once the intervention is discontinued, or that the improvement will wane, suggests that this design is most useful for interventions in which a permanent change in behavior is not expected.
The single-subject design is useful for conditions in which there are few individuals to study or it is undesirable to aggregate the results because there is great variability in the outcomes measured or the intervention provided. For example, Collins and Dworkin (2011) studied the efficacy of a weighted vest for increasing time on task for typically developing second graders. From the Evidence 6-6 shows a graph from the study depicting the response of each child to the weighted vest. The participant response was variable, with some children increasing their time on task during the wearing of the vest and other children decreasing their time on task. Therefore, the results did not support the use of the weighted vest in this situation. A major limitation of single-subject research is the problem of generalizability from a small sample. Using the previous example, it is possible that if only two students were included and those students happened to be the ones with dramatic positive responses, the conclusions of the study would have been different. Replication is an important concept in single-subject research. Replication occurs in the form of participants; each participant replicates the design. With multiple participants, if the results are similar for all participants, you can have greater confidence that the intervention caused the outcome. Furthermore, the use of multiple baseline designs can strengthen the cause-and-effect conclusion. It is not unusual to see an ABAB design or even an ABABAB design in single-subject studies. If improvement during the intervention and a decline at baseline are consistently shown, evidence-based practitioners can be more assured that the intervention was effective and caused the change in behavior. Frequently a second intervention period is added to single-subject designs, in which case the study would be described as ABAB. Multiple baselines help support the intervention as the change agent.
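The visual logic of an ABAB analysis (compare each phase's level with the adjacent phases) can be sketched in a few lines of Python. The time-on-task percentages below are hypothetical, and the check assumes phases alternate A, B, A, B:

```python
# Inspecting single-subject ABAB data for one participant: compute the
# mean of each phase and check whether behavior rises in every B phase
# and falls back in every A phase. Data are hypothetical percentages
# of time on task; phases are assumed to alternate A, B, A, B, ...
def phase_means(observations):
    """observations: list of (phase_label, value) in time order."""
    values, order = {}, []
    for phase, value in observations:
        if phase not in values:
            values[phase] = []
            order.append(phase)
        values[phase].append(value)
    return [(p, sum(values[p]) / len(values[p])) for p in order]

def consistent_effect(observations):
    """True if every B-phase mean exceeds the adjacent A-phase means."""
    ms = phase_means(observations)
    rises = all(ms[i][1] > ms[i - 1][1] for i in range(1, len(ms), 2))
    falls = all(ms[i][1] > ms[i + 1][1]
                for i in range(1, len(ms) - 1, 2))
    return rises and falls

abab = ([("A1", v) for v in [42, 45, 40]] +   # baseline
        [("B1", v) for v in [68, 72, 70]] +   # intervention
        [("A2", v) for v in [48, 44, 46]] +   # withdrawal
        [("B2", v) for v in [74, 71, 69]])    # reintroduction
```

A consistent rise in each B phase and decline in each A phase is exactly the pattern the text describes as supporting a cause-and-effect conclusion.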
Any number of baseline and intervention periods may be included. For example, some single-subject designs use more than one type of intervention, resulting in a design notated as ABACA.
Retrospective Intervention Studies
The study designs presented up to this point have been prospective in nature; that is, the researcher designs the study and then administers the intervention and collects the data. In a retrospective intervention study, the researcher looks back at something that has already occurred and uses existing records to collect the data. Retrospective studies are not experimental because the independent variable is not manipulated. Instead, these studies are described as observational, because existing conditions are observed. Sometimes these studies are called retrospective cohort studies because they utilize and compare existing groups (cohorts) of individuals. The primary disadvantage of a retrospective study is that conditions cannot be controlled, and numerous threats to internal validity are likely present in the existing conditions. However, because a retrospective intervention study typically examines practices that have taken place in real-world clinical situations, the study can have greater external validity. From the Evidence 6-7 is the abstract of a retrospective study that compared individuals with stroke receiving less than 3 hours and more than 3 hours of rehabilitation therapies (Wang et al, 2013). The study, which was conducted after individuals were discharged from the rehabilitation center, found benefits for individuals who received more therapy.
FROM THE EVIDENCE 6-6 Single-Subject Design
Collins, A., & Dworkin, R. J. (2011). Pilot study of the effectiveness of weighted vests. American Journal of Occupational Therapy, 65(6), 688–694. doi:10.5014/ajot.2011.000596.
[Graph: % of time on task (0-100) across the baseline, intervention, and withdrawal phases for Participants 1, 3, 4, 7, 8, 9, and 11.]
Note A: The graph illustrates the performance of each participant in the intervention group. This study differs from many single-subject design studies in that it includes a control group. As depicted in the graph, the children's data were analyzed individually rather than using a group average.
FTE 6-6 Question: What conclusion would you draw as to the efficacy of the weighted vest for improving time on task?
EXERCISE 6-3 Identify the Study Design (LO3)
Locate the abstracts on PubMed for the following studies examining the efficacy of fall prevention programs. After reading each abstract, determine the study design and identify the independent variable(s) in the study. Is the intervention compared with a control group, usual care, or another intervention? Is it a factorial study with more than one independent variable?
1. Li, F., Harmer, P., Stock, R., Fitzgerald, K., Stevens, J., Gladieux, M., . . . Voit, J. (2013). Implementing an evidence-based fall prevention program in an outpatient clinical setting. Journal of the American Geriatrics Society, 61(12), 2142-2149. 2. Bhatt, T., & Pai, Y. C. (2009). Prevention of slip-related backward balance loss: The effect of session
FROM THE EVIDENCE 6-7 Retrospective Cohort Study
Wang, H., Camiciam, N. M., Terdiman, J., Mannava, M. K., Sidney, S., & Sandel, M. E. (2013). Daily treatment time and functional gains of stroke patients during inpatient rehabilitation. Physical Medicine and Rehabilitation, 5, 122–128.
Note A: The study began after the patients had received treatment and were discharged. OBJECTIVE: To study the effects of daily treatment time on functional gain of patients who have had a stroke. DESIGN: A retrospective cohort study.
SETTING: An inpatient rehabilitation hospital (IRH) in northern California. PARTICIPANTS: Three hundred sixty patients who had a stroke and were discharged from the IRH in 2007. INTERVENTIONS: Average minutes of rehabilitation therapy per day, including physical therapy, occupational therapy, speech and language therapy, and total treatment. MAIN OUTCOME MEASURES: Functional gain measured by the Functional Independence Measure, including activities of daily living, mobility, cognition, and the total of the Functional Independence Measure (FIM) scores. RESULTS: The study sample had a mean age of 64.8 years; 57.4% were men and 61.4% were white. The mean total daily therapy time was 190.3 minutes, and the mean total functional gain was 26.0. A longer daily therapeutic duration was significantly associated with total functional gain (r = .23, P = .0094). Patients who received a total therapy time of <3.0 hours per day had significantly lower total functional gain than did those treated ≥3.0 hours. No significant difference in total functional gain was found between patients treated ≥3.0 but <3.5 hours and ≥3.5 hours per day. The daily treatment time of physical therapy, occupational therapy, and speech and language therapy also was significantly associated with corresponding subscale functional gains. In addition, hemorrhagic stroke, left brain injury, earlier IRH admission, and a longer IRH stay were associated with total functional improvement. CONCLUSIONS: The study demonstrated a significant relationship between daily therapeutic duration and functional gain during IRH stay and showed treatment time thresholds for optimal functional outcomes for patients in inpatient rehabilitation who had a stroke. Note B: Retrospectively, the researchers divided the patients into groups of those who received < 3 hours of therapy per day and those who received > 3 hours of therapy per day. 
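The grouping step that Note B describes is simple to express in code. Here is a Python sketch with invented records (average daily therapy minutes paired with FIM gain; these are not the study's data), split at the 3-hour (180-minute) threshold:

```python
# A retrospective analysis starts from records that already exist.
# Hypothetical discharge records: (average daily therapy minutes, FIM gain).
records = [
    (150, 18), (165, 22), (170, 20), (175, 21),   # under 3 hours/day
    (185, 25), (200, 27), (210, 30), (240, 29),   # 3 hours/day or more
]

def group_means(records, threshold_minutes=180):
    """Split records at the threshold and return each group's mean gain."""
    low = [gain for minutes, gain in records if minutes < threshold_minutes]
    high = [gain for minutes, gain in records if minutes >= threshold_minutes]
    return sum(low) / len(low), sum(high) / len(high)

low_mean, high_mean = group_means(records)   # 20.25 and 27.75
```

Because group membership here was never assigned by a researcher, any difference between the two means may reflect confounders such as stroke severity, which is exactly the concern Note C raises.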
Note C: Although the researchers imply that more treatment resulted in greater functional gain, the retrospective design means it is likely that individuals in the two groups are different on other factors. For example, those receiving less therapy may have greater impairments and therefore be less responsive to intervention.
FTE 6-7 Question: This abstract indicates one possible reason why the groups may not be equivalent (i.e., individuals with more severe conditions may not receive as much therapy). What are other possible differences between the groups that could have influenced the outcomes?
intensity and frequency on long-term retention. Archives of Physical Medicine and Rehabilitation, 90, 34-42. 3. Donat, H., & Ozcan, A. (2007). Comparison of the effectiveness of two programmes on older adults at risk of falling: Unsupervised home exercise and supervised group exercise. Clinical Rehabilitation, 21, 273-283. 4. Kerse, N., Butler, M., Robinson, E., & Todd, M. (2004). Fall prevention in residential care: A cluster, randomized controlled trial. Journal of the American Geriatric Society, 52, 524-531. 5. Beling, J., & Roller, M. (2009). Multifactorial intervention with balance training as a core component among fall-prone older adults. Journal of Geriatric Physical Therapy, 32, 125-133.
SAMPLE SIZE AND INTERVENTION RESEARCH
As explained in previous chapters, sample size is an important consideration in evaluating the strength of a study. A larger sample reduces the likelihood of making a Type II error (i.e., when the researcher finds no difference between groups, but actually a difference exists) and in general reduces sampling error so that the results of the study are more likely to reflect the true population. However, it is expensive to use large samples, and researchers must always balance sample size and pragmatics.
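The link between sample size and Type II error can be illustrated with a rough power calculation. This Python sketch uses a normal approximation for a two-group comparison; the medium effect size (d = 0.5) and two-sided alpha of .05 are illustrative assumptions, not values from the text:

```python
# How sample size affects power (1 - probability of a Type II error),
# using a normal approximation for a two-sample comparison of means.
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def approx_power(n_per_group, d=0.5, z_alpha=1.96):
    """Approximate power of a two-sided two-sample test at alpha = .05."""
    noncentrality = d * math.sqrt(n_per_group / 2.0)
    return normal_cdf(noncentrality - z_alpha)

power_20 = approx_power(20)    # roughly 0.35
power_100 = approx_power(100)  # roughly 0.94
```

At 20 participants per group, a real medium-sized effect would be missed about two times in three; at 100 per group it would be detected roughly 94% of the time, which is why the 100-participant study provides stronger evidence.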
Chapter 4 provides more information about determining sample size based on statistical formulas termed "power estimates." Nevertheless, when evaluating the strength of the evidence, all other things being equal, a study with a larger number of participants provides more reliable data than a study with a smaller number of participants. When considering studies with similar designs (e.g., comparing two well-designed RCTs), a study with 100 participants provides stronger evidence than a study with 20 participants. Furthermore, in group comparison studies it is important that each group be adequately represented. Therefore, every time a researcher adds a group or, in the case of a factorial study, a new independent variable, the researcher must increase the number of participants.

USING A SCALE TO EVALUATE THE STRENGTH OF A STUDY
The randomized controlled trial is the highest level of evidence for a single study, yet the quality of RCTs can vary. As explained in Chapter 5, several threats to validity can

EVIDENCE IN THE REAL WORLD
When Is a Practice Really Evidence-Based?
Medical reports in the media often highlight findings from a newly published research study and can imply that the intervention under study is the next best thing. Hopefully, by now you are well aware that a single study, regardless of how well it is designed, does not provide sufficient evidence to warrant certainty that an intervention is effective. Because evidence-based practice is highly valued in health care, it is common to hear the term evidence-based used as an adjective to describe a particular intervention. Critical consumers of research know that for an intervention to truly be evidence-based, there must be an accumulation of strong evidence across multiple studies.
The best evidence for the efficacy of an intervention requires that several conditions be in place:
(1) multiple, well-designed randomized controlled trials
(2) using large numbers of participants
(3) finding the intervention to be more effective than the control and other comparison conditions
(4) on outcomes that matter to the client, family, and society, and
(5) with an intervention that is generalizable to real-world practice.
When sufficient evidence is available, it is now commonplace for professional organizations and other relevant groups to develop practice guidelines that summarize the evidence and make recommendations for practice. The use of practice guidelines is covered in greater detail in Chapter 10. However, particularly with new approaches, the evidence is often sparse. The practitioner's ability to evaluate the quality and strength of the individual studies that do exist is extremely useful when making clinical decisions about an intervention of interest.

exist, even with a randomized controlled trial. Even more threats can be present when assignment to groups is not randomized. For example, when assessors are not blind to group assignment, they may exhibit bias when scoring participants who are known to be in the intervention group. The PEDro Scale (Maher et al, 2003) was developed to provide a numerical rating that objectively assesses the methodological quality of an individual study. The name PEDro was used because the scale was initially developed to rate the quality of studies on the Physiotherapy Evidence Database. Box 6-1 lists the 11 items of the PEDro Scale. Each item is rated as present (1) or not present (0). The 11-item scale includes one item that assesses external validity (Item #1), eight items that assess internal validity (Items #2–9), and two items that assess the reporting of outcomes (Items #10 and #11).
Points are awarded only when a criterion is clearly satisfied. For criteria 4 and 7–11, key outcomes are those outcomes that provide the primary measure of the effectiveness (or lack of effectiveness) of the therapy. In most studies, more than one variable is used as an outcome measure. The PEDro Scale is often used in rehabilitation to assess the quality of a study design. The PEDro database for physical therapy and the OTseeker database for occupational therapy include abstracts of relevant studies and their corresponding PEDro ratings. Some systematic reviews also use the PEDro Scale to assess the quality of the existing evidence.

BOX 6-1 PEDro Scale Items
1. Eligibility criteria were specified.
2. Subjects were randomly allocated to groups (in a crossover study, subjects were randomly allocated an order in which treatments were received).
3. Allocation was concealed.
4. The groups were similar at baseline regarding the most important prognostic indicators.
5. There was blinding of all subjects.
6. There was blinding of all therapists who administered the therapy.
7. There was blinding of all assessors who measured at least one key outcome.
8. Measurements of at least one key outcome were obtained from more than 85% of the subjects initially allocated to groups.
9. All subjects for whom outcome measurements were available received the treatment or control condition as allocated or, where this was not the case, data for at least one key outcome was analyzed by "intent to treat."
10. The results of between-group statistical comparisons are reported for at least one key outcome.
11. The study provided both point measurements and measurements of variability for at least one key outcome.

EXERCISE 6-4 Considering Design, PEDro Ratings, and Threats to Validity in Evaluating the Strength of a Study (LO4)
The abstract and PEDro rating for a randomized controlled trial are presented here.
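As an illustrative sketch (hypothetical helper names, abbreviated item labels), the Box 6-1 scoring rule, in which each item is marked present (1) or absent (0) and Item #1 is recorded but excluded from the 10-point total, can be written as:

```python
# PEDro items in Box 6-1 order; item 1 assesses external validity and
# is recorded but does not count toward the 10-point total.
PEDRO_ITEMS = [
    "eligibility criteria specified",        # 1 (not scored)
    "random allocation",                     # 2
    "concealed allocation",                  # 3
    "baseline comparability",                # 4
    "blind subjects",                        # 5
    "blind therapists",                      # 6
    "blind assessors",                       # 7
    "adequate follow-up (>85%)",             # 8
    "intention-to-treat analysis",           # 9
    "between-group comparisons reported",    # 10
    "point estimates and variability",       # 11
]

def pedro_score(ratings):
    """Total PEDro score out of 10; ratings holds 1 (clearly satisfied)
    or 0 (not satisfied) for all 11 items in Box 6-1 order."""
    assert len(ratings) == len(PEDRO_ITEMS)
    return sum(ratings[1:])  # item 1 is excluded from the total

# Ratings reported for the Eggermont et al (2009) trial in this
# exercise, coded Yes = 1 and No = 0.
eggermont = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1]
```

`pedro_score(eggermont)` returns 7, matching the 7/10 rating quoted with the abstract that follows.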
Based on this information, identify which of the threats to validity in the accompanying table were likely controlled and which were a potential problem. Provide a brief rationale for your responses.

Walking the line: a randomised trial on the effects of a short term walking programme on cognition in dementia
Eggermont LH, Swaab DF, Hol EM, Scherder EJ
Journal of Neurology, Neurosurgery, and Psychiatry 2009 Jul;80(7):802-804
clinical trial 7/10 [Eligibility criteria: Yes; Random allocation: Yes; Concealed allocation: No; Baseline comparability: Yes; Blind subjects: No; Blind therapists: No; Blind assessors: Yes; Adequate follow-up: Yes; Intention-to-treat analysis: Yes; Between-group comparisons: Yes; Point estimates and variability: Yes. Note: Eligibility criteria item does not contribute to total score] *This score has been confirmed* (These are the PEDro ratings.)

Background: Walking has proven to be beneficial for cognition in healthy sedentary older people. The aim of this study was to examine the effects of a walking intervention on cognition in older people with dementia.
Methods: Ninety-seven older nursing home residents with moderate dementia (mean age 85.4 years; 79 female participants; mean Mini-Mental State Examination 17.7) were randomly allocated to the experimental or control condition. Participants assigned to the experimental condition walked for 30 min, 5 days a week, for 6 weeks. To control for personal communication, another group received social visits at the same frequency. Neuropsychological tests were administered at baseline, directly after the 6-week intervention, and again 6 weeks later. Apolipoprotein E (ApoE) genotype was determined.
Results: Differences in cognition between the groups at the three assessments were calculated using a linear mixed model. Outcome measures included performance on tests that formed three domains: a memory domain, an executive function domain, and a total cognition domain.
Results indicate that there were no significant time × group interaction effects or any time × group × ApoE4 interaction effects.
Conclusion: Possible explanations for the lack of a beneficial effect of the walking programme on cognition could be the level of physical activation of the intervention or the high frequency of comorbid cardiovascular disease in the present population of older people with dementia.

Threat to Validity | Yes | No | Rationale
Maturation | | |
Assignment | | |
Rosenthal effect | | |
Hawthorne effect | | |

COST EFFECTIVENESS AS AN OUTCOME
An important consideration in the evaluation of an intervention is cost effectiveness. Policy makers want to apply resources sensibly by spending money on things that influence health the most; likewise, clients want to use their health-care dollars wisely. In cost-effectiveness studies, different interventions can be compared as to their cost and efficacy, or the cost of a single intervention can be calculated so that interested consumers can assess its value. For example, a systematic review of treatments for aphasia found a cost of $9.54 for each percentage point of improvement on the targeted outcome (Ellis, Lindrooth, & Horner, 2013). In addition, this study found that initial sessions yielded greater benefits than later sessions. The first three sessions had a cost of $7 for a percentage point of improvement, whereas later sessions cost more than $20 to achieve the same improvement. In a different type of analysis, Jutkowitz et al (2012) studied the cost effectiveness of a particular intervention called Advancing Better Living for Elders. This intervention, aimed at reducing functional difficulties and mortality using occupational and physical therapy approaches in the home, had a cost of $13,179 for each year of extended life.

Cost-effectiveness studies often use the quality-adjusted life year (QALY) to assess the impact of an intervention. A QALY combines an assessment of quality of life and the number of years of life added by an intervention. Quality of life is measured on a scale of 1 to 0, with 1 being the best possible health and 0 representing death. A negative score is possible, which suggests a quality of life worse than death. The QALY is calculated by multiplying the number of years of extra life by the quality-of-life indicator. For example, if an intervention extended life by 4 years with a 0.6 quality of life, the QALY value would be 4 × 0.6 = 2.4. A systematic review examining numerous approaches to fall prevention used QALY to determine the approach with the highest economic benefit (Frick et al, 2010). Although vitamin D was the least expensive approach, home modifications resulted in the greatest increase in QALY points per dollar spent.

CRITICAL THINKING QUESTIONS
1. What factors should you take into account when determining if a study's findings are strong enough to warrant your use of a particular intervention?
2. What is the difference between a randomized controlled trial and a cluster randomized controlled trial? What threats to internal validity are more likely to be present in a cluster randomized controlled trial?
3. Under what circumstances might a single-subject design be preferable to a group comparison?
4. Why is the interaction effect most important in determining if an intervention is more effective than a control or comparison condition?
5. Describe a hypothetical study that uses a 3 × 2 design.
6. Why are some intervention approaches not amenable to study with a crossover design?
7. What does the PEDro Scale contribute beyond the levels-of-evidence hierarchy?

ANSWERS
EXERCISE 6-1
1. The graph from the data should look similar to the graph in Figure 6-2. There are no main effects between groups. Although there are slight differences between groups both before and after treatment, the differences are small (less than 1.5 points) on a scale of scores that ranges up to 19.2. Therefore, it is unlikely that these differences are statistically significant. In contrast, the differences within groups from pretest to posttest are quite large: 5.3 points for the intervention group and 5.4 points for the control group. Thus, there is a main effect within each group. No interaction effect is present. Both groups improved, and they improved almost the same amount (5.3 points vs. 5.4 points). Therefore, you might conclude that both interventions were effective in improving trunk balance and coordination, but neither intervention was more effective than the other.

EXERCISE 6-2
e = experienced users; i = inexperienced users; b = broadband; a = advanced; q = QuickSIN; h = HINT
You may have chosen different abbreviations, but the scientific notation should combine crossover and factorial designs. The two factors include the two types of hearing devices (broadband and advanced) and the level of experience (experienced and inexperienced users).
Oqh Xeb Oqh Xea Oqh
Oqh Xea Oqh Xeb Oqh
Oqh Xib Oqh Xia Oqh
Oqh Xia Oqh Xib Oqh

EXERCISE 6-3
Study | Design Type | Independent Variables
Li et al (2013) | Pretest-posttest without a control | No groups are compared; this is a within-group comparison
Bhatt & Pai (2009) | Factorial design with a randomized controlled trial | Intensity of the intervention and number of sessions
Donat & Ozcan (2007) | Randomized controlled trial | Setting of intervention (unsupervised home vs. supervised group)
Kerse et al (2004) | Nonrandomized controlled trial (the homes are randomly assigned but the individuals are not) | Treatment (intervention vs. true control)
Beling & Roller (2009) | Randomized controlled trial | Treatment (intervention vs. true control)

EXERCISE 6-4
Threat to Validity | Yes/No | Rationale
Maturation | No | Randomization to groups should protect against this threat because a difference between groups would indicate that something has occurred other than a passage of time.
Assignment | No | Random assignment and the fact that the groups were equal at baseline suggest that there are no important differences between the groups at the outset of the study.
Rosenthal effect | Yes | The therapists were not blind to group assignment, and their expectations for the participants could influence the outcomes.
Hawthorne effect | No | This is not a likely threat because the researchers controlled for this by making sure both groups received equal amounts of attention.

Sometimes a study finds no significant difference between the groups. The strength of the study design is still important when no difference is found, as a stronger study suggests that the finding is accurate. In the case of insignificant results, sample size is an important consideration. When the sample is small, there is a possibility of a Type II error. In this study the sample size was relatively large, so a Type II error is unlikely. The researchers suggest that the insignificant results may have been due to inadequate intervention intensity or the participants' disease condition.

FROM THE EVIDENCE 6-1
Although not typically considered an effect size measure, the percentage of improvement provides an estimate of the magnitude of improvement. Step width had the greatest percentage of improvement in this study.

FROM THE EVIDENCE 6-2
Yes, the threat of mortality in this study would be even greater because the proportion of participants lost is greater. In this study, almost 50% of the participants dropped out of the control condition. If 6 individuals dropped out of a control group of 100, you would be less concerned about mortality as a threat.
FROM THE EVIDENCE 6-3
This design is particularly strong in showing cause-and-effect relationships. If there is a marked difference with TAP compared with TYP (typical reminders) within each group, and this occurs both times and occurs similarly for both groups, you can be relatively confident that the difference is due to the intervention.

FROM THE EVIDENCE 6-4
It is unreasonable to assume that the participants are blind, because they would know if they are receiving the Wii treatment, and the intervention leaders would know what treatment they are providing. However, the testers could be blind to group assignment, which eliminates any bias the testers might bring when administering the outcome measures.

FROM THE EVIDENCE 6-5
Question 1 could be answered by examining the results of the comparison of the two groups that received vitamin D treatment without whole body vibration. Question 2 could be answered by comparing each of the whole body vibration plus vitamin D groups with the vitamin D alone groups. Question 3 cannot be answered because there is no control/comparison group that does not receive treatment; however, you can examine the within-group results to determine if any group improves from pretest to posttest. If not, you know that the treatments were not effective; however, if the within-group improvement is significant, you do not know if this would be better than no treatment.

FROM THE EVIDENCE 6-6
Only 2 of the 7 participants improved their time on task during the intervention, and many actually had poorer performance during the intervention, so overall the intervention does not appear to be effective.

FROM THE EVIDENCE 6-7
There are many possible answers, but some alternative explanations include: Individuals who were more motivated to participate in therapy may have received more therapy minutes.
It could be their motivation instead of the therapy minutes that influenced the outcomes. The therapists who evaluated the clients on the FIM may have provided higher ratings to those individuals who spent more time in therapy. The therapists could be biased toward wanting therapy to result in better outcomes.

REFERENCES
Baillet, A., Payraud, E., Niderprim, V. A., Nissen, M. J., Allenet, B., Francois, P., . . . Gaudin, P. (2009). A dynamic exercise programme to improve patients' disability in rheumatoid arthritis: A prospective randomized controlled trial. Rheumatology, 48, 410–415.
Beling, J., & Roller, M. (2009). Multifactorial intervention with balance training as a core component among fall-prone older adults. Journal of Geriatric Physical Therapy, 32, 125–133.
Bhatt, T., & Pai, Y. C. (2009). Prevention of slip-related backward balance loss: The effect of session intensity and frequency on long-term retention. Archives of Physical Medicine and Rehabilitation, 90, 34–42.
Catalan-Matamoros, D., Helvik-Skjaerven, L., Labajos-Manzanares, M. T., Martínez-de-Salazar-Arbolea, A., & Sánchez-Guerrero, E. (2011). A pilot study on the effect of Basic Body Awareness Therapy in patients with eating disorders: A randomized controlled trial. Clinical Rehabilitation, 25(7), 617–626. doi:10.1177/0269215510394223
Chang, H. Y., Chou, K. Y., Lin, J. J., Lin, C. F., & Wang, C. H. (2010). Immediate effect of forearm Kinesio taping on maximal grip strength and force sense in healthy collegiate athletes. Physical Therapy in Sport, 11, 122–127.
Collins, A., & Dworkin, R. J. (2011). Pilot study of the effectiveness of weighted vests. American Journal of Occupational Therapy, 65, 688–694.
Donat, H., & Ozcan, A. (2007). Comparison of the effectiveness of two programmes on older adults at risk of falling: Unsupervised home exercise and supervised group exercise. Clinical Rehabilitation, 21, 273–283.
Ellis, C., Lindrooth, R.
C., & Horner, J. (2013). Retrospective cost-effectiveness analysis of treatments for aphasia: An approach using experimental data. American Journal of Speech-Language Pathology, 23(2), 186–195. doi:10.1044/2013_AJSLP-13-0037
Ferguson, G. D., Jelsma, D., Jelsma, J., & Smits-Engelsman, B. C. (2013). The efficacy of two task-orientated interventions for children with developmental coordination disorder: Neuromotor task training and Nintendo Wii Fit training. Research in Developmental Disabilities, 34, 2449–2461.
Frick, K. D., Kung, J. Y., Parrish, J. M., & Narrett, M. J. (2010). Evaluating the cost-effectiveness of fall prevention programs that reduce fall-related hip fractures in older adults. Journal of the American Geriatric Society, 58, 136–141.
Jutkowitz, E., Gitlin, L. N., Pizzi, L. T., Lee, E., & Dennis, M. P. (2012). Cost effectiveness of a home-based intervention that helps functionally vulnerable older adults age in place at home. Journal of Aging Research, 2012, 680265. doi:10.1155/2012/680265
Kerse, N., Butler, M., Robinson, E., & Todd, M. (2004). Fall prevention in residential care: A cluster, randomized controlled trial. Journal of the American Geriatric Society, 52, 524–531.
Keser, I., Kirdi, N., Meric, A., Kurne, A. T., & Karabudak, R. (2013). Comparing routine neurorehabilitation program with trunk exercises based on Bobath concept in multiple sclerosis: Pilot study. Journal of Rehabilitation Research and Development, 50, 133–140.
Krawczyk, M., Sidaway, M., Radwanska, A., Zaborska, J., Ujma, R., & Czlonkowska, A. (2012). Effects of sling and voluntary constraint during constraint-induced movement therapy for the arm after stroke: A randomized, prospective, single-centre, blinded observer rated study. Clinical Rehabilitation, 26, 990–998.
Kreisman, B. M., Mazevski, A. G., Schum, D. J., & Sockalingam, R. (2010, March).
Improvements in speech understanding with wireless binaural broadband digital hearing instruments in adults with sensorineural hearing loss. Trends in Amplification, 14(1), 3–11. doi:10.1177/1084713810364396
Lemoncello, R., Sohlberg, M. M., Fickas, S., & Prideaux, J. (2011). A randomised controlled crossover trial evaluating Television Assisted Prompting (TAP) for adults with acquired brain injury. Neuropsychological Rehabilitation, 21, 825–826.
Li, F., Harmer, P., Stock, R., Fitzgerald, K., Stevens, J., Gladieux, M., . . . Voit, J. (2013). Implementing an evidence-based fall prevention program in an outpatient clinical setting. Journal of the American Geriatrics Society, 61(12), 2142–2149.
Maher, C. G., Sherrington, C., Herbert, R. D., Moseley, A. M., & Elkins, M. (2003). Reliability of the PEDro Scale for rating quality of randomized controlled trials. Physical Therapy, 83, 713–721.
Murray, E., McCabe, P., & Ballard, K. J. (2012). A comparison of two treatments for childhood apraxia of speech: Methods and treatment protocol for a parallel group randomised control trial. BMC Pediatrics, 12, 112.
Seo, K. E., & Park, T. J. (2016). Effects of gyrokinesis exercise on the gait pattern of female patients with chronic low back pain. Journal of Physical Therapy Science, 28, 511–514.
Verschueren, S. M., Bogaerts, A., Delecluse, C., Claessens, A. L., Haentjens, P., Vanderschueren, D., & Boonen, S. (2011). The effects of whole-body vibration training and vitamin D supplementation on muscle strength, muscle mass, and bone density in institutionalized elderly women: A 6-month randomized, controlled trial. Journal of Bone and Mineral Research, 26, 42–49.
Wang, H., Camiciam, N. M., Terdiman, J., Mannava, M. K., Sidney, S., & Sandel, M. E. (2013). Daily treatment time and functional gains of stroke patients during inpatient rehabilitation. Physical Medicine and Rehabilitation, 5, 122–128.
"In school, you're taught a lesson and then given a test. In life, you're given a test that teaches you a lesson." —Tom Bodett, author and radio host

7 Using the Evidence to Evaluate Measurement Studies and Select Appropriate Tests

CHAPTER OUTLINE
INTRODUCTION
TYPES OF SCORING AND MEASURES
Continuous Versus Discrete Data
Norm-Referenced Versus Criterion-Referenced Measures
Norm-Referenced Measures
Criterion-Referenced Measures
TEST RELIABILITY
Standardized Tests
Test-Retest Reliability
Inter-Rater Reliability
Internal Consistency
TEST VALIDITY
Construct Validity
Sensitivity and Specificity
Relationship Between Reliability and Validity
RESPONSIVENESS
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Distinguish between continuous and discrete data.
2. Distinguish between norm-referenced and criterion-referenced measures.
3. Evaluate sensitivity and specificity for a given measure.
4. Identify the types of reliability, validity, and/or responsiveness examined in a particular study.
5. Match the psychometric properties of a measure with the necessary qualities of a measure, given a specific clinical situation.
KEY TERMS
ceiling effect; clinically significant difference; concurrent validity; construct validity; continuous data; convergent validity; criterion-referenced; Cronbach's alpha; dichotomous data; discrete data; discriminant validity; divergent validity; external responsiveness; floor effect; internal consistency; internal responsiveness; inter-rater reliability; intra-class correlation coefficient (ICC); Likert scale; measurement error; method error; minimally clinically important difference (MCID); norm-referenced; predictive validity; provocative test; psychometric properties; reliability; responsive measure; sensitivity; specificity; standardized test; statistically significant difference; test-retest reliability; trait error; validity

INTRODUCTION
Occupational, physical, and speech therapists administer assessments to determine the needs and problems of clients and to determine the outcomes of interventions. However, what if the score a therapist obtains when administering a test is inaccurate? Or what if the therapist administers a measure of motor performance, but the client's cognition influences the results more than his or her motor skills? In these cases, the therapist could fail to identify an impairment and/or incorrectly assess a client's abilities. In addition, the therapist may not learn the precise outcomes of an intervention. The ability to develop an effective intervention plan and evaluate an intervention depends on the accuracy of the results obtained from assessment measures. In selecting assessments, therapists need to consider many psychometric properties, such as reliability and validity of the assessment, sensitivity/specificity, and the ability of an assessment to detect change. Psychometric properties are the quantifiable characteristics of a test that reflect its consistency and accuracy.
Studies examining the psychometric properties of an assessment provide therapists with the information they need to select the best available test for a specific client. This chapter describes how to evaluate the evidence associated with assessments that therapists use every day in practice and then use this information to select the right tool.

TYPES OF SCORING AND MEASURES
During the assessment process, therapists typically obtain a score and then interpret that score to give it meaning. The process of interpreting scores varies with the type of scoring (continuous or discrete) and the type of measure (norm-referenced or criterion-referenced).

Continuous Versus Discrete Data
Continuous data result from a test in which the score can be any value within a particular continuum. For example, range of motion is continuous data expressed in terms of the number of degrees of movement within a 360-degree range (although each joint has its own restriction of range). Discrete data, also referred to as categorical data, are obtained when classifying individuals or their performance into groups, such as gender or diagnosis. Discrete data can also be given numerical values, although the numbers assigned reflect a category more than a quantity; that is, the numbers typically indicate a construct. For example, in manual muscle testing, a client's strength is categorized as fair and assigned a grade of 3 when that client can complete the range of motion against gravity. In this case, the numbers do indicate a range of less to greater muscle strength, but they do not reflect a quantified value (i.e., a manual muscle test score of 4 is not twice a manual muscle test score of 2). Dichotomous data are a type of discrete data with only two categories: Typically, something exists or does not exist.
This type of data may describe a condition (e.g., a child does or does not have autism, or an athlete does or does not have a ligament tear) or the status of an individual (e.g., a client is or is not hospitalized, or a client does or does not recover). With continuous data, the numbers reflect a real value, such as the quantity of correct answers, the time taken to complete a task, or the decibels of hearing loss. With discrete data, categories are assigned and possibly given a numerical value. In part, it is important to know the difference between continuous and discrete data because different statistical methods are used to analyze the different types of data. These methods are described in more detail later in this chapter.

Some scoring methods, such as Likert scales, can be difficult to classify as discrete or continuous. In a Likert scale, individuals respond along a range of responses, most typically on a continuum such as "strongly disagree" to "strongly agree," or "never" to "always" (Fig. 7-1). These responses are then assigned numerical ratings. Multiple items using this type of response rating are added together to obtain a score for the scale. Although the numbers indicate increasingly greater amounts of some construct, such as agreement, the difference between a 1 and a 2 may not be equal to the difference between a 2 and a 3. For example, a healthy lifestyle measure may include an item about eating breakfast (to which the respondent indicates "rarely") and eating a serving of fruit (to which the respondent indicates "sometimes"), yet the respondent could have eaten breakfast one day of the week and fruit four days of the week. Although "rarely" and "sometimes" seem reasonable responses, in this example "rarely" occurred 1 day more than "never," and "sometimes" occurred 3 days more than "rarely." Nevertheless, in practice Likert scales are commonly considered to provide continuous data.
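Summing Likert items into a scale score, including the reverse-scoring of negatively worded items discussed in this chapter, can be sketched as follows. The function is a hypothetical illustration; the item wordings in the comments follow the social-play example used in this section of the text:

```python
def likert_total(responses, reverse_items=()):
    """Sum five-point Likert responses (1 = strongly disagree/never,
    5 = strongly agree/always); items listed in reverse_items are
    flipped so that a 5 counts as 1 and a 1 counts as 5."""
    total = 0
    for i, r in enumerate(responses):
        assert 1 <= r <= 5, "responses must use the 1-5 rating"
        total += (6 - r) if i in reverse_items else r
    return total

# Hypothetical two-item social-play scale: item 0 is positively worded
# ("approaches unfamiliar children"); item 1 is negatively worded
# ("avoids unfamiliar children") and is therefore reverse-scored.
score = likert_total([5, 5], reverse_items={1})  # 5 + (6 - 5) = 6
```

Marking every item "always" does not maximize the total when opposing items are present, which is exactly how reverse-scored items help flag respondents who are not attending to item content.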
Although the practice of treating Likert scales as continuous data is somewhat controversial, there is logical and statistical support indicating that Likert scales perform similarly to scales with equal intervals (Carifio & Perla, 2007). This is particularly true when the scale comprises more items. When creating an assessment measure with a Likert scale, test developers often include items that are written to suggest both the presence and the absence of the construct being measured. For example, on a scale addressing social play, the following two items may be included:
• "My child approaches unfamiliar children in play situations."
• "My child avoids unfamiliar children in play situations."
These items would then be reverse-scored, so that on a five-point scale, if both items were marked as "always," the first item would receive a score of five and the second item a score of one. Including opposing items can prevent the respondent from ignoring the content of the question and marking all items with the same response. When all of the items are marked at the same end of the scale, it suggests that the respondent was not paying attention to the items.

FIGURE 7-1 Example of a Likert scale (response options scored 1 through 5).

Norm-Referenced Versus Criterion-Referenced Measures
It is important for evidence-based practitioners to understand the differences between norm- and criterion-referenced measures so that they can properly evaluate the evidence when determining how to use and interpret a particular measure. In general, the differences hinge on whether you want to know how an individual performs in comparison to others (norm-referenced) or how well an individual has mastered a particular skill (criterion-referenced).

Norm-Referenced Measures
With a norm-referenced measure, a client's scores are compared with those of other individuals.
The purpose of a norm-referenced test is to discriminate between individuals so as to determine whether an individual's abilities fall within or outside a typical range. Each client's scores are compared with those of a larger group. Generally a raw score is obtained; it is then converted to a standard score or percentile rank, which can be interpreted more readily. For example, an individual at the 90th percentile earns scores higher than 90 percent of the population. Norms have been established for many physical measures, such as grip strength and developmental milestones. Norm-referenced tests are often used to identify a particular impairment and, in some cases, to determine if a client is eligible for services. For example, the Patterned Elicitation Syntax Test is used to identify speech impairments. Norm-referenced tests are less likely to provide information on whether an individual has achieved a particular skill, and they may be limited in terms of sensitivity to change. For example, IQ tests are norm-referenced; because IQ is a relatively stable construct, an IQ test would not be a good measure of improvement or change in cognitive functioning. The most accurate norms are those derived from a large sample. Norms are always sample-dependent; for example, norms derived from a sample of adults should not be applied to children, and norms from males should not be applied to females. When selecting a measure, it is important to consider the similarity between the normative sample and the client on important factors that could affect the outcomes of the test, such as age, gender, education, and culture. In other words, consider the population from which the norms arose and how closely the client matches that sample.
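The conversion from raw score to percentile rank can be sketched in code. This is an illustrative sketch only; it assumes the normative scores are approximately normally distributed, and the normative mean and standard deviation below are invented values:

```python
import math

# Convert a raw score to a standard (z) score and a percentile rank,
# assuming the normative sample is roughly normally distributed.
# The normative mean (50) and SD (10) are hypothetical, for illustration.

def z_score(raw, norm_mean, norm_sd):
    """How many standard deviations the raw score sits from the norm mean."""
    return (raw - norm_mean) / norm_sd

def percentile_rank(raw, norm_mean, norm_sd):
    """Percent of the normative sample expected to score below `raw`."""
    z = z_score(raw, norm_mean, norm_sd)
    # Standard normal cumulative distribution via the error function.
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# A score about 1.28 SD above the normative mean sits near the 90th percentile.
print(round(percentile_rank(62.8, 50, 10)))  # 90
```

A client at that percentile earns a score higher than roughly 90 percent of the normative sample, which is exactly the interpretation described above.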
For example, if the norms are based on data gathered in forensic settings and your client has no history of legal trouble, it would be inappropriate to apply these norms to your client.

Criterion-Referenced Measures
When therapists are interested in an individual's specific ability or response to an intervention (versus how that person compares with others), a criterion-referenced measure is appropriate. A criterion-referenced test is based on a standard or fixed point, which is established by experts. The individual is then tested to determine how his or her performance compares with the established benchmark. The widely used Functional Independence Measure (FIM) is an example of a criterion-referenced test. The individual is rated on items such as eating, locomotion, and social interaction on a scale of 0 (activity does not occur) to 7 (complete independence), and the criterion is complete independence. The measure is typically administered in rehabilitation settings at admission and discharge so that improvement can be measured. Another example of a criterion-referenced test is the School Function Assessment, which examines the performance of children with disabilities in nonacademic activities, such as eating in the lunchroom and being transported to and from school.

EXERCISE 7-1 Distinguishing Between Discrete and Continuous Data and Norm-Referenced and Criterion-Referenced Measures (LO1 and LO2)

QUESTIONS
For each of the following descriptions, identify whether the measure uses discrete or continuous data and whether it is a norm- or criterion-referenced measure.

1. Berg Balance Scale: Individuals are assessed on 14 functional tasks. For each task they can receive a score of 0 to 4, with a total possible score ranging from 0 to 56. Community-dwelling females aged 60 to 69 without balance problems are expected to score between 54 and 56.

2. Glasgow Coma Scale: Individuals are rated on a six-point scale in the areas of eye opening and verbal and motor responses. The total score is the cumulative score of the three responses and ranges from 3 (deep coma) to 15 (fully conscious).

Modified Glasgow Coma Scale
Eye response: 1 = Does not open eyes; 2 = Opens eyes in response to painful stimuli; 3 = Opens eyes in response to voice; 4 = Opens eyes spontaneously.
Verbal response: 1 = Makes no sounds; 2 = Incomprehensible sounds; 3 = Utters inappropriate words; 4 = Confused, disoriented; 5 = Oriented, converses normally.
Motor response: 1 = Makes no movements; 2 = Extension to painful stimuli (decerebrate response); 3 = Abnormal flexion to painful stimuli (decorticate response); 4 = Flexion/withdrawal to painful stimuli; 5 = Localizes painful stimuli; 6 = Obeys commands.
Data from: Teasdale, G. M., & Jennett, B. (1976). Assessment and prognosis of coma after head injury. Acta Neurochirurgica, 34, 45–55.

3. The Phonological Awareness Test 2 assesses a student's awareness of the oral language segments that comprise words (i.e., syllables and phonemes). Students receive a score of 1 for a correct response and 0 for an incorrect response. There are eight subscales (e.g., rhyming discrimination and production), which are scored by adding the number of correct responses for each item on a subscale. The raw score for each subscale can be compared with age equivalents and percentile ranks.

4. Phalen's Test: A provocative test for diagnosing carpal tunnel syndrome. The individual is placed in 90 degrees of wrist flexion under the influence of gravity and holds the position for 1 minute. If the symptoms of carpal tunnel syndrome are elicited, the test is positive.

TEST RELIABILITY
Reliability describes the stability of a test score. A reliable assessment measure is one for which the scores are expected to be trustworthy and consistent. From a measurement theory perspective, a reliable test is one in which measurement error (i.e., the difference between a true score and an individual's actual score) is reduced. Theoretically, a true score exists in which some construct is perfectly measured for an individual. In the real world, however, measurement is flawed; some degree of error will occur, so that an individual's actual score reflects his or her true score plus error (which can either lower or raise the true score). Some error is to be expected in the context of administering an assessment in the real world.

One type of measurement error is method error. Method error occurs when there is a discrepancy between an individual's true potential test score and his or her actual test score due to an aspect of the testing situation or the test itself. For example, distractions in the room or a biased or inexperienced test administrator would result in method error. Trait error occurs when aspects of the test taker, such as fatigue or poor test-taking skills, interfere with his or her true score.

As with validity, reliability is measured on a continuum. No assessment is perfect, and therefore no assessment is perfectly reliable; however, reliability can be enhanced by reducing systematic error. One method for increasing reliability is to increase the number of test items or the number of skills or behaviors to be evaluated. With more items, it is less likely that a single item will unduly influence the outcome of the test. As a student with test-taking experience, you understand that missing one item on a five-point test has a much greater impact on your overall score than missing one item on a 50-point test. A test with five well-constructed items is still better than a test with 50 poorly constructed items, but with all other characteristics being equal, a test with more items is more reliable than a test with fewer items. In practice, of course, there are often time constraints, and a long assessment may be impractical, but this aspect of reliability should always be considered. When a brief measure is used, it is even more important that the components of the test have good reliability.

Understanding Statistics 7-1
Reliability is measured with a reliability coefficient. There are different types of reliability coefficients (see Understanding Statistics 7-2), but these coefficients can be interpreted similarly. Reliability estimates range from 0 to 1.0. A reliability coefficient of 1.0 means that all variability is due to true differences, and a reliability coefficient of 0 indicates that all variability is due to error. The reliability coefficient represents the amount of variability that is due to the true score. For example, a reliability coefficient of 0.85 means that 85 percent of the variability is due to the true score and 15 percent is due to error. Unlike the relationship statistics explained in Chapter 4, the correlation coefficient for reliability is not squared to determine variance. Adequate reliability is always a judgment call (Lance, Butts, & Michels, 2006). Although a standard of .80 is sometimes used as a minimal requirement (Nunnally, 1978), in some instances a much higher standard may be necessary (e.g., when using a test to determine whether someone should be given a risky treatment). In another situation, a lower standard may be logical if meeting the reliability standard would require selecting an invalid test.

Standardized Tests
A standardized test is one in which the administration of the test is the same for all clients. In a standardized test there are specific procedures to be followed, such as specifications regarding the environment in which the test is administered, the tools that are used, the instructions that are provided to the client, and the manner in which the test is scored. For example, the Executive Function Performance Test (EFPT) includes four tasks (cooking, telephone use, medication management, and bill paying) that are administered using specific instructions with graded cues provided by the therapist (Baum, Morrison, Hahn, & Edwards, 2003). A scoring rubric is applied to rate executive function components, capacity for independent functioning, and type of assistance needed. A standardized test can be either norm-referenced or criterion-referenced. The process of administering a standardized test enhances reliability because it reduces variability in the testing situation across clients. When a test is standardized, it is important to follow the specific instructions for administration; if a therapist modifies the instructions, the scoring (normative or criterion) is no longer valid. Standardization is a desirable characteristic of a test; however, in some cases the standardized environment can limit the therapist's ability to assess real-world performance. For example, assessing driving skills in a standardized driving simulator provides a more consistent method of testing, but the simulator creates an unnatural environment that may not represent some aspects of on-the-road driving. In many clinical situations, it is desirable to also include nonstandardized methods of observation in the assessment process. For example, observing a child at play or an adult performing an activity of daily living at home will capture information that may not be obtained in a more structured testing situation.
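The variance interpretation of the reliability coefficient in Understanding Statistics 7-1 can be illustrated with a small simulation under classical test theory. All of the numbers here are invented: true scores are generated with 85 units of variance and errors with 15, so the true-score share of the observed variance (the reliability) should land near 0.85:

```python
import random
import statistics

# Classical test theory sketch: observed score = true score + error.
# True-score variance 85 and error variance 15 are invented so that
# reliability = true variance / observed variance comes out near 0.85.
random.seed(1)

true_scores = [random.gauss(100, 85 ** 0.5) for _ in range(100_000)]
observed = [t + random.gauss(0, 15 ** 0.5) for t in true_scores]

reliability = statistics.variance(true_scores) / statistics.variance(observed)
print(round(reliability, 2))  # close to 0.85
```

Note that the coefficient is read directly as the true-score share of variability; unlike an ordinary correlation, it is not squared.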
Test-Retest Reliability
If there are no changes in the client, the re-administration of a test should result in the same score as the original administration; if so, this suggests that the test provides consistent results. Test-retest reliability provides an estimate of the stability of a test over time. In a study examining test-retest reliability, the same test is given to a group of individuals at two points in time, and the analysis examines the similarity in scores for each individual at the two time points. Although there are no definitive rules for the length of time between test administrations, the interval is a critical issue in test-retest reliability. Enough time should elapse that memory or practice effects do not influence the results, but not so much time that history or maturation affects the scores. The type of measure and the specific individual being tested will affect this decision. For example, the retesting interval for a developmental measure such as the Denver Developmental Screening should be much briefer for infants than for older children, as developmental changes occur more quickly in infants.

Inter-Rater Reliability
The instructions and methods for administering and scoring a measure should be clear enough that multiple testers can perform the procedures consistently and obtain similar results. When a test has strong inter-rater reliability, you can assume that different raters will arrive at comparable conclusions. In studies that examine inter-rater reliability, it is useful for the raters to conduct the assessment with the same client at the same point in time, so that other factors that could influence measurement error are reduced. This is sometimes accomplished by several raters watching an assessment, or even a video recording of an assessment. A measure with poor inter-rater reliability is less likely to result in dependable scores.
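Before looking at the statistics used for inter-rater reliability, it helps to see why a plain correlation can overstate agreement. In the sketch below (ratings invented), the two raters never give the same score, yet their Pearson correlation is perfect because their disagreement is perfectly systematic; this is one reason agreement-sensitive statistics such as the ICC are preferred for reliability:

```python
# Two raters who disagree systematically but move in lockstep:
# every time rater 1 goes up 2 points, rater 2 goes up 4 (ratings invented).
rater1 = [2, 4, 6, 8, 10]
rater2 = [4, 8, 12, 16, 20]

def pearson_r(x, y):
    """Pearson product-moment correlation, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# r is 1.0 (up to floating-point rounding) even though the raters never agree.
print(pearson_r(rater1, rater2))
```

A statistic that rewards actual agreement, rather than mere predictability, would penalize this pair of raters.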
For example, if two raters evaluating driving performance give very different scores, the more accurate score is unknown. When using a categorical measure, Cohen's kappa is typically used to assess inter-rater reliability. Kappa depicts the amount of agreement between raters and, like the other reliability coefficients, is interpreted on a scale from 0 to 1.0. For example, Phalen's test determines categorically whether someone does or does not have carpal tunnel syndrome. One review of inter-rater reliability studies of Phalen's test found kappa values of 0.52 to 0.79 (Walker-Bone, Palmer, Reading, & Cooper, 2003).

Understanding Statistics 7-2
Pearson correlations may be used for test-retest or inter-rater reliability, but more frequently researchers use the intra-class correlation coefficient (ICC). When calculating reliability coefficients, there must be multiple administrations of the test: with test-retest reliability, the test is given at least twice to the same person, and with inter-rater reliability, at least two testers administer the same test. Similar to other correlation coefficients, the value of the ICC ranges from 0 to 1.0, with a higher number indicating greater stability. Note that a Pearson correlation of 1.0 does not require perfect agreement; it means only that the difference between raters or testing times is perfectly predictable (e.g., every time rater 1's scores increase by 2 points, rater 2's scores increase by 4 points). The ICC, in contrast, is sensitive to such systematic differences between raters or occasions, which is one reason it is preferred for reliability. Remember, when interpreting the amount of variability accounted for by the true score, the reliability coefficient is not squared.

From the Evidence 7-1 provides an example of a study that examines both test-retest reliability and inter-rater reliability.

Internal Consistency
Internal consistency refers to the unity or similarity of items on a multi-item measure.
When examining internal consistency, each item is correlated with the total score or with the other items on the measure. An internally consistent measure shows consistency in how the items perform, suggesting that each item measures the same thing. This type of reliability is relevant when all of the items of a measure are expected to assess the same construct. For example, a measure of tactile defensiveness should only include items measuring that specific construct; it should exclude items that are more indicative of anxiety or irritability. Internal consistency is also pertinent for measures composed of multiple subscales. In this case, internal consistency is examined within each subscale; it is desirable for items within a subscale to be more highly correlated with the total score of that subscale and less correlated with other subscales.

FROM THE EVIDENCE 7-1 Test-Retest and Inter-Rater Reliability
Gailey, R. S., Gaunaurd, I. A., Raya, M. A., Roach, K. E., Linberg, A. A., Campbell, S. M., Jayne, D. M., & Scoville, C. (2013). Development and reliability testing of the Comprehensive High-Level Activity Mobility Predictor (CHAMP) in male servicemembers with traumatic lower-limb loss. Journal of Rehabilitation Research & Development, 50(7), 905–918. doi:10.1682/JRRD.2012.05.0099
The opportunity for wounded servicemembers (SMs) to return to high-level activity and return to duty has improved with advances in surgery, rehabilitation, and prosthetic technology. As a result, there is now a need for a high-level mobility outcome measure to assess progress toward high-level mobility during and after rehabilitation. The purpose of this study was to develop and determine the reliability of a new outcome measure, the Comprehensive High-Level Activity Mobility Predictor (CHAMP). The CHAMP consists of the Single Limb Stance, Edgren Side Step Test, T-Test, and Illinois Agility Test.
CHAMP reliability was determined for SMs with lower-limb loss (LLL) (inter-rater: n = 118; test-retest: n = 111) and without LLL (n = 97). A linear system was developed to combine the CHAMP items and produce a composite score ranging from 0 to 40, with higher scores indicating better performance. Inter-rater and test-retest intraclass correlation coefficient values for the CHAMP were 1.0 and 0.97, respectively. A CHAMP score equal to or greater than 33 points is within the range for SMs without LLL. The CHAMP was found to be a safe and reliable measure of high-level mobility in SMs with traumatic LLL.
Note A: ICC values were calculated for both inter-rater reliability and test-retest reliability.
Note B: Reliability was strong, with perfect agreement for inter-rater reliability.
FTE 7-1 Question: Provide an interpretation of the ICC for both test-retest reliability and inter-rater reliability.

Understanding Statistics 7-3
Internal consistency is often measured using Cronbach's alpha; like the ICC, Cronbach's alpha is measured on a scale of 0 to 1.0, with higher numbers indicating greater internal consistency. Unlike the ICC, a Cronbach's alpha of 1.0 may be considered too high: when items perform too similarly, this can indicate that items are redundant and may be unnecessary. With internal consistency, the measure need only be given one time; within this single administration, the items are correlated with one another. When using Cronbach's alpha, it is helpful to have a large sample size in the study, which will result in more stable findings.

As an example, Boyle (2013) developed the Self-Stigma of Stuttering Scale. The measure is made up of three subscales: stigma awareness, stereotype agreement, and self-stigma concurrence. The three subscales and the overall measure were assessed for internal consistency.
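Cronbach's alpha itself is straightforward to compute. The sketch below (response data invented) uses the standard formula: alpha = k/(k - 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores):

```python
import statistics

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
# Each inner list is one respondent's ratings on a 4-item scale (data invented).
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 4, 3, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
]

def cronbach_alpha(rows):
    k = len(rows[0])                 # number of items
    items = list(zip(*rows))         # one tuple of scores per item
    item_vars = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_vars / total_var)

print(round(cronbach_alpha(responses), 2))  # 0.97
```

Because these invented items rise and fall together across respondents, alpha is high; an item that behaved idiosyncratically relative to the others would drive it down.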
The values obtained can be found in From the Evidence 7-2.

FROM THE EVIDENCE 7-2 Internal Consistency
Boyle, M. P. (2013). Assessment of stigma associated with stuttering: Development and evaluation of the Self-Stigma of Stuttering Scale (4S). Journal of Speech, Language, and Hearing Research, 56, 1517–1529. doi:10.1044/1092-4388(2013/12-0280)
Table 2. Reliability statistics for the 4S and subscales.
Variable | Cronbach's α | Test-retest correlation
Overall 4S | .87 | .80
Stigma awareness | .84 | .62
Stereotype agreement | .70 | .55
Stigma self-concurrence | .89 | .82
Note: The time between test and retest was approximately 2 weeks for 41 individuals who stutter.
Note A: The internal consistency values, as measured by Cronbach's alpha, are relatively high for the three subscales and the overall measure.
FTE 7-2 Question: Which of the Self-Stigma of Stuttering Scale subscales has the weakest internal consistency? For which subscale is internal consistency the highest?

EVIDENCE IN THE REAL WORLD Improving the Internal Consistency of the Adolescent/Adult Sensory Profile
I examined internal consistency when developing items for the Adolescent/Adult Sensory Profile (Brown & Dunn, 2002), which includes four subscales: low registration, sensation seeking, sensory sensitivity, and sensation avoiding. An early item included on the sensation-avoiding subscale was, "I like to wear sunglasses," with the rationale that individuals who tended to avoid sensation would wear sunglasses more frequently than individuals who did not, because sunglasses reduce the intensity of visual input. However, the internal consistency analysis revealed that the item was more highly correlated with the sensation-seeking total score than with the sensation-avoiding total score. The reason for this mismatch is not known, but people likely wear sunglasses for reasons other than blocking the sun or other visual input; sensation seekers may wear sunglasses for fashion, or sunglasses may simply be worn more often during sensation-seeking activities. Whatever the reason, the internal consistency analysis indicated that the item did not perform as expected, so it was removed from the scale.

TEST VALIDITY
Validity refers to the ability of a test to measure what the test is intended to measure. A test of strength should measure strength and not coordination; a test of capacity to perform activities of daily living (ADLs) should measure the actual ability to carry out ADLs rather than someone's perception of that ability. Validity, as discussed in this chapter, refers to measurement validity and thus is different from research validity, as described in Chapter 5. Historically, test validity has been divided into different types, including concurrent, convergent, divergent, discriminant, construct, and predictive. However, more modern views of test theory suggest that all validity is construct validity. There are different methods of providing evidence to support the validity of a measure (Messick, 1995); therefore, the types of validity identified previously are now considered different approaches that come together to support the construct validity of a test. In short, concurrent, convergent, divergent, discriminant, and predictive validity together provide evidence for the construct validity of a test. The greater the cumulative evidence, the more confident one can be that an assessment measures the intended construct.

Construct Validity
As with reliability, determining the validity of a test is a process.
A single study does not prove the validity of a test; rather, evidence is accumulated to support its validity. A classic method for examining construct validity is to correlate the scores of the index measure with an established gold standard. For example, the validity of the dynamometer for assessing grip strength has been compared with the gold standard of isokinetic testing (Stark et al., 2011), in which sophisticated equipment applies resistance at a controlled, constant speed throughout the range of motion. Recall that validity is a continuum, and some constructs are more difficult to measure than others. A true gold standard may not exist in some areas of testing (e.g., diagnosing autism or assessing quality of life). When a gold standard does not exist, the new measure is judged against the best measure available or one that is widely accepted in practice. For example, the validity of the Stroke Rehabilitation Assessment of Movement (STREAM) was evaluated by correlating the results of the STREAM with two established measures in stroke rehabilitation: the Functional Independence Measure and the Stroke Impact Scale (Ward, Pivko, Brooks, & Parkin, 2011). The terms concurrent validity and convergent validity are sometimes used to describe the process of correlating measures that are expected to yield similar results. Both use the same process of relating the index measure to other measures of the same construct to support construct validity; the difference lies in their purpose. Concurrent validity is used to predict scores on another measure, whereas convergent validity is used to find evidence that the new measure performs similarly to an existing measure of the same construct. In the research literature, it can be difficult to distinguish between the two, and in the end doing so is not crucial.
For example, in a study described as testing concurrent validity, a new, computer-administered version of the Pediatric Evaluation of Disability Inventory (PEDI-CAT) was compared with the original PEDI. There was strong support for concurrent validity, with r = 0.82 (Dumas & Fragala-Pinkham, 2012). It would not be a mistake to also describe this study as a convergent validity study. A measure should correlate with similar measures; likewise, a measure should not correlate with measures of constructs that are purported to be unrelated. In other words, a test with strong validity should show weak relationships with measures of dissimilar constructs. For example, a test of motor function should not require visual skills to earn a good score. This process of examining validity is sometimes described as divergent validity. Giesinger, Kuster, Behrend, and Giesinger (2013) found problems with the divergent validity of the Western Ontario and McMaster University Osteoarthritis Index (WOMAC). This widely used measure of pain, stiffness, and function for individuals with hip and knee osteoarthritis was highly related to measures of psychological functioning. This relationship indicates that the WOMAC measures the physical functions it professes to measure but also captures some aspects of emotion experienced by individuals with osteoarthritis, thereby suggesting some limits to the construct validity of the measure. Another way to assess the construct validity of a measure involves comparing scores for two groups of individuals: those who would be expected to possess the construct being measured and those who would not. A valid measure should be able to discriminate between the two groups; hence, this process is often described as discriminant validity.
For example, a discriminant validity study of the Sensory Profile found that children with autism had lower sensory processing scores than typically developing children on all subscales (Brown, Leo, & Austin, 2008). Predictive validity is a very important type of validity for clinical practice. A study of predictive validity examines the accuracy of a measure in determining future performance. Pankratz (2007) examined the usefulness of the Renfrew Bus Story, which assesses language skills in children through the retelling of a story; the study found that the measure was useful for predicting language impairments three years later.

The accumulation of all types of validity evidence provides support for or against the collective construct validity of a measure. Figure 7-2 illustrates how the different types of validity come together to make up construct validity.

FIGURE 7-2 Different types of validity contribute to the overall construct validity of a measure:
• Concurrent: the measure can predict performance on another measure of the same construct.
• Convergent: the measure performs similarly to another measure of the same construct.
• Divergent: the measure performs differently from a measure of a different construct.
• Discriminant: the measure discriminates between different groups of people, one expected to possess the construct and another that is not.
• Predictive: the measure predicts future performance in a direction consistent with the construct.

Understanding Statistics 7-4
When examining the relationship between two measures, a correlation coefficient is calculated (typically a Pearson product-moment correlation or a Spearman correlation). The r value indicates the strength of the relationship between the two measures, with values ranging from -1.0 to 1.0.

Sensitivity and Specificity
Sensitivity and specificity are considered aspects of validity because they relate to the predictive validity of an assessment. Sensitivity is the ability of a test to detect a condition when it is present; a correct detection is known as a true positive. A sensitive test will accurately identify individuals who have a specific condition or diagnosis. However, in doing so, overdiagnosis may occur, such that people are identified as having a condition they do not have; this is known as a false positive. Specificity is the ability of a test to rule out a condition when it does not exist; a correct rejection is known as a true negative. Likewise, mistakes can occur with specificity, and some individuals who do have the condition may be missed, resulting in a false negative. Figure 7-3 illustrates the concepts of sensitivity and specificity.

FIGURE 7-3 Sensitivity and specificity. A highly sensitive test produces few false negatives; when sensitivity is low, false negatives are common. A highly specific test produces few false positives; when specificity is low, false positives are common.

There is often a trade-off between sensitivity and specificity: when a test is made more sensitive, the changes will likely be detrimental to specificity, and vice versa. The choice of greater sensitivity versus greater specificity often depends on the situation. For example, in a study addressing fall prevention, researchers may strive for greater sensitivity because they are willing to identify more people at risk than is actually the case, given that the intervention is relatively noninvasive and they do not want to miss people who might fall. In contrast, in a study related to autism diagnosis, the researcher might desire greater specificity, because misdiagnosing individuals carries the potential for greater negative consequences.

Understanding Statistics 7-5
Sensitivity and specificity are relevant only for tests that result in dichotomous decisions (i.e., the condition does or does not exist). There are formulas for sensitivity and specificity based on a 2 × 2 table (Table 7-1):
Sensitivity = a/(a + c)
Specificity = d/(b + d)

TABLE 7-1 Sensitivity and Specificity
Test Result | Condition Present | Condition Not Present
Positive (failed test) | True positive (a) | False positive (b)
Negative (passed test) | False negative (c) | True negative (d)

Sensitivity and specificity data are useful when examining the validity of provocative tests used to diagnose physiological abnormalities. With a provocative test, an abnormality is induced through a manipulation that provokes the condition. For example, O'Brien's Test, which compares the pain experienced with resistance to shoulder flexion with the thumb up and with the thumb down, is used to diagnose a labral tear of the shoulder. If pain is present when the thumb is down, but not when the thumb is up, the result is considered positive for a tear. Sensitivity and specificity were examined in a study of O'Brien's Test (McFarland, Kim, & Savino, 2012).

EXERCISE 7-2 Determining Sensitivity and Specificity (LO3)
McFarland, Kim, and Savino (2012) calculated sensitivity and specificity by comparing results of O'Brien's Test to labral tears identified with diagnostic arthroscopy. There were 371 controls (individuals without a tear) and 38 individuals with a tear. The 2 × 2 table from the study would look like this:

Sensitivity and Specificity for O'Brien's Test
O'Brien's Test Result | Diagnostic Arthroscopy: Tear | Diagnostic Arthroscopy: No Tear
Positive | 18 | 168
Negative | 20 | 203
Data from: McFarland, E. G., Kim, T. K., & Savino, R. M. (2012). Clinical assessment of three common tests for superior labral anterior-posterior lesions. Archives of Clinical Neuropsychology, 27, 781–789.
QUESTIONS
1. Use the formula to calculate: Sensitivity = ____; Specificity = ____
2. Based on the results of the McFarland et al (2012) study, what conclusions would you draw about the sensitivity and specificity of O'Brien's Test?

Tests with continuous data can still provide dichotomous results by establishing a cut-off score at which the condition is classified as present. For example, Wang and colleagues (2012) developed the Route Map Recall Test (RMRT) to predict "getting lost" behavior in individuals with dementia. The test involves using a pen to designate the route on a paper map. The total possible score is 104, and the developers established a cut-off score of 93.5, meaning that individuals who scored below the "cut" score were at risk of getting lost. Scores on the RMRT were then compared to actual getting-lost behavior, as reported by caregivers of individuals with mild dementia. The results indicated a sensitivity of 100% and specificity of 67%; all of the individuals who did get lost were accurately identified, but many individuals who did not get lost were misclassified. From the Evidence 7-3 provides another example of sensitivity/specificity analysis with a swallow test (Crary et al, 2013).

Relationship Between Reliability and Validity

It is possible for a test to have excellent reliability but poor validity. The test may provide consistent results (i.e., have good inter-rater and test-retest reliability), but still not test what it is intended to test. However, a test with poor reliability can never have good validity, because reliability affects validity. If a test lacks consistency and stability, it cannot assess what it is intended to assess.
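The calculations in this section lend themselves to a quick computational check. The sketch below is mine, not from the book: the function names are illustrative, the first half applies the Understanding Statistics 7-5 formulas to the McFarland et al (2012) counts, and the cut-off scores in the second half are invented for illustration rather than taken from the RMRT study.

```python
def sensitivity_specificity(a, b, c, d):
    """Sensitivity and specificity from a 2 x 2 table:
    a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    return a / (a + c), d / (b + d)

# Counts from the McFarland et al (2012) O'Brien's Test table
sens, spec = sensitivity_specificity(a=18, b=168, c=20, d=203)
print(f"Sensitivity = {sens:.0%}, Specificity = {spec:.0%}")  # 47% and 55%

# Trade-off at different cut-off scores (invented, illustrative scores;
# scoring below the cut-off counts as a positive result).
cases = [72, 80, 85, 88, 91]          # people who later got lost
non_cases = [84, 90, 92, 95, 97, 99]  # people who did not

def sens_spec_at_cutoff(cutoff):
    tp = sum(s < cutoff for s in cases)       # cases correctly flagged
    fp = sum(s < cutoff for s in non_cases)   # non-cases wrongly flagged
    return tp / len(cases), (len(non_cases) - fp) / len(non_cases)

for cutoff in (86, 93.5):
    s, p = sens_spec_at_cutoff(cutoff)
    print(f"cut-off {cutoff}: sensitivity {s:.2f}, specificity {p:.2f}")
```

Raising the cut-off flags more people as positive, so sensitivity rises while specificity falls; this is the same pattern as the RMRT's 100% sensitivity but only 67% specificity.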
RESPONSIVENESS

Therapists frequently use measures that determine whether or not an individual has changed or improved during an intervention period or as a part of the natural healing or development process. In this case, it is important to use a measure that is responsive: a responsive measure is one that can detect change. Some characteristics of measures, known as floor and ceiling effects, can interfere with responsiveness. A floor effect means that a test is so difficult or the construct is so rare that almost all individuals receive the very lowest score. In this case, even when change occurs, the test may not be capable of identifying that a difference exists. For example, if an assessment measures clients' driving ability by the number of car collisions they had during the previous year, a researcher may identify many individuals with no accidents; however, this does not necessarily mean that these individuals have no driving problems. With a ceiling effect, the test is too easy; everyone gets a very high score to begin with, and there is no room for improvement. For example, if a measure of independent living only includes the simpler tasks, such as dressing and feeding, and excludes more complex tasks, such as cooking and money management, depending on the sample, many individuals might get a perfect score, even though they have impairments in performing more complex activities.

When children with disabilities are compared to typically developing children on a norm-referenced test, it may be difficult to detect change, because the child's improvement will likely not keep pace with typical development. Even with consistent progress, children with a disability may still fall behind their typically developing peers. The inability to detect change in the children with disabilities on certain norm-referenced tests is due to the floor effect of the instrument.
In this situation and others like it, criterion-referenced tests are more useful for detecting improvement over time. Another reason why criterion-referenced tests are more sensitive to change is that therapy is often focused on the criterion as opposed to a more general skill area. For example, a speech-language therapist may work on specific vocabulary words (the criterion) and then test on the words that were taught. In another example, an occupational therapist may work on cooking skills (independence in cooking is the criterion) and then specifically test improvement in cooking ability.

There is no general consensus as to what constitutes a responsive measure. Husted, Cook, Farewell, and Gladman (2000) distinguished between internal responsiveness and external responsiveness. Internal responsiveness means that a measure is able to detect change when a known effective treatment is assessed. For example, a systematic review found that the Fugl-Meyer Assessment of Motor Recovery After Stroke consistently detected differences in clients receiving constraint-induced movement therapy (Shi, Tian, Yang, & Zhao, 2011). In the studies reviewed, there was a statistically significant difference in Fugl-Meyer scores before and after the intervention. A statistically significant difference reflects internal responsiveness and means that, when scores were compared before and after treatment, there was a statistical difference. However, the magnitude of the difference may not be great enough for the clinician or client to find the difference meaningful. A clinically significant difference is a change that would be regarded by clinicians and the client as meaningful and important.

FROM THE EVIDENCE 7-3 Example of Sensitivity/Specificity Analysis
Crary, M. A., Carnaby, G. D., Sia, I., Khanna, A., & Waters, M. F. (2013).
Spontaneous swallowing frequency has potential to identify dysphagia in acute stroke. Stroke, 44(12), 3452–3457. doi:10.1161/STROKEAHA.113.003048.

BACKGROUND AND PURPOSE: Spontaneous swallowing frequency has been described as an index of dysphagia in various health conditions. This study evaluated the potential of spontaneous swallow frequency analysis as a screening protocol for dysphagia in acute stroke.

METHODS: In a cohort of 63 acute stroke cases, swallow frequency rates (swallows per minute [SPM]) were compared with stroke and swallow severity indices, age, time from stroke to assessment, and consciousness level. Mean differences in SPM were compared between patients with versus without clinically significant dysphagia. Receiver operating characteristic curve analysis was used to identify the optimal threshold in SPM, which was compared with a validated clinical dysphagia examination for identification of dysphagia cases. Time series analysis was used to identify the minimally adequate time period to complete spontaneous swallow frequency analysis.

RESULTS: SPM correlated significantly with stroke and swallow severity indices but not with age, time from stroke onset, or consciousness level. Patients with dysphagia demonstrated significantly lower SPM rates. SPM differed by dysphagia severity. Receiver operating characteristic curve analysis yielded a threshold of SPM 0.40 that identified dysphagia (per the criterion referent) with 0.96 sensitivity, 0.68 specificity, and 0.96 negative predictive value. Time series analysis indicated that a 5- to 10-minute sampling window was sufficient to calculate spontaneous swallow frequency to identify dysphagia cases in acute stroke.

Note A: The established cut-off value of 0.40 is based on an average number of swallows/minute during a 30-minute time period.

CONCLUSIONS: Spontaneous swallowing frequency presents high potential to screen for dysphagia in acute stroke without the need for trained, available personnel.
Note B: Sensitivity is higher than specificity; thus, there are more likely to be false positives than false negatives.

FTE 7-3 Question: In this study, sensitivity is better than specificity, with a cut-off score of less than 0.40. Would you raise or lower the cut-off score to improve specificity? Why?

External responsiveness describes a measure that is able to detect a clinically significant difference. One term used to describe external responsiveness is the minimal clinically important difference (MCID), or the amount of change on a particular measure that is deemed clinically important to the client. There are different methods for computing the MCID, but typically the MCID index includes ratings from the client as to what constitutes meaningful change. For example, the MCID was computed for the Six-Minute Walk test for individuals with chronic obstructive pulmonary disease (COPD) by administering the test before and after a rehabilitation program and asking participants to rate the amount of change in their own walking after participating in the program (Holland et al, 2010). Individuals were then identified as having expressed that they made no change, some change, or substantial change. Using several methods, the researchers estimated an MCID of 25 meters; that is, someone would need to improve at least 25 meters on the Six-Minute Walk test for the change to be deemed clinically important.

EXERCISE 7-3 Identifying Reliability, Validity, and Responsiveness Data From Studies of Measures (LO4)

QUESTIONS
Based on your knowledge of how studies of assessments are designed and the statistical analyses that are done, identify the type of psychometric properties (test-retest reliability, inter-rater reliability, internal consistency, validity, sensitivity and specificity, responsiveness) evaluated in the following studies:
1. In a study of the Bruininks-Oseretsky Test of Motor Proficiency-Second Edition, Wuang and Su (2009) tested 100 children with intellectual disability at three points in time. They found an ICC of 0.99 and an alpha of 0.92.
2. A study of the Tinnitus Questionnaire (Adamchic et al, 2012) found that a change of −5 points was enough improvement to be clinically meaningful.
3. Sears and Chung (2010) found that, when using the Michigan Hand Outcomes Questionnaire (MHQ) as the gold standard, the Jebsen-Taylor Hand Function Test (JTT) tended to miss problems with hand function. Many patients with high scores on the JTT did not have high scores on the MHQ.

EXERCISE 7-4 Matching the Psychometric Properties of a Measure With the Necessary Qualities of the Measure (LO5)

QUESTIONS
For the following situations, identify which psychometric property is most important when selecting the measure, and explain why.
1. Different therapists will be using the same measure to make judgments about client progress.
2. It is most important that the practitioner not incorrectly diagnose someone as having a cognitive impairment.
3. Identify small changes that occur from before to after an intervention program.
4. Accurately assess cognition without involvement of motor abilities.

CRITICAL THINKING QUESTIONS
1. In what situations would a therapist select a norm-referenced test, and when is a criterion-referenced test more appropriate?
2. How does standardization increase the reliability of a test?
3. What types of measures are most relevant for studies of internal consistency?
4. What are the features of the three primary psychometric properties of reliability, validity, and responsiveness?
5. Explain the statement: "All validity is construct validity."
6. Can a test be equally sensitive and specific? Why, or why not?
7. Why is sensitivity/specificity a type of validity?
8. Is it important for all measures to have good responsiveness? Why, or why not?

ANSWERS

EXERCISE 7-1
1. Continuous data: Individuals are scored in terms of how well they perform on a number of tasks. Norm-referenced measure: The scores are compared against a standard, such as the community-dwelling adults presented in the example.
2. Discrete data: Although different numbers are used to label each rating, the numbers represent a particular type of response and not a continuous score. The rating is more of a rank order than a total score. Criterion-referenced measure: The level of coma is described as criteria met, and the individual is not compared with a normative sample.
3. Continuous data: Although each item is rated dichotomously, a total subscale score is calculated based on the number correct. Norm-referenced measure: Whenever age equivalents are used, the raw score is compared to a normative sample.
4. Discrete data: This is a dichotomous ranking of whether the diagnosis is present or not present. Criterion-referenced measure: A testee is not compared to a normative sample; rather, the individual meets the criteria for the diagnosis or does not meet the criteria for the diagnosis.

EXERCISE 7-2
1. Sensitivity: 18/(18 + 20) = 47%; Specificity: 203/(203 + 168) = 55%
2. These numbers indicate that the O'Brien's test was neither very sensitive nor very specific. Many people were misclassified as false positives and false negatives.

EXERCISE 7-3
1. Test-retest reliability and internal consistency
2. Responsiveness
3. Validity

EXERCISE 7-4
1. Inter-rater reliability: Getting consistent scores from different therapists requires a measure with good inter-rater reliability.
2.
Specificity: The practitioner wants to avoid false positives, and a measure with good specificity will limit the false positives.
3. Responsiveness: To detect changes or improvement, one needs a measure that is responsive to change.
4. Validity (specifically convergent and divergent validity): Practitioners want the measure to be related to other cognitive measures, but not to measures of motor ability.

FROM THE EVIDENCE 7-1
There was perfect consistency between the raters for every individual who was assessed; that is, the raters scored the servicemembers in the same way. This does not necessarily mean that they provided the exact same score, but if rater 1 gave a high score, then rater 2 gave the same individual a high score relative to the other individuals that rater assessed. For test-retest reliability, 97 percent of the variability is accounted for by the true score and only 3 percent is due to error variance.

FROM THE EVIDENCE 7-2
The subscale for stereotype agreement has the weakest internal consistency, at 0.70; the subscale for stigma self-concurrence has the strongest, at 0.89. All levels are considered adequate if using the criteria of Nunnally (1978), where the minimum standard for Cronbach's alpha is 0.70.

FROM THE EVIDENCE 7-3
You would lower the cut-off score, because fewer swallows suggest more dysphagia. Requiring an even lower swallowing rate before flagging dysphagia will reduce sensitivity (i.e., identify fewer people with dysphagia) but increase specificity (i.e., avoid identifying people as having dysphagia who do not really have it).

REFERENCES
Adamchic, I., Tass, P. A., Langguth, B., Hauptmann, C., Koller, M., Schecklmann, M., Zeman, F., & Landgrebe, M. (2012). Linking the Tinnitus Questionnaire and the subjective Clinical Global Impression: Which differences are clinically important? Health and Quality of Life Outcomes, 10, 79.
Baum, C. M., Morrison, T., Hahn, M., & Edwards, D. (2003).
Executive Function Performance Test: Test protocol booklet. St. Louis, MO: Washington University School of Medicine, Program in Occupational Therapy.
Boyle, M. P. (2013). Assessment of stigma associated with stuttering: Development and evaluation of the Self-Stigma of Stuttering Scale (4S). Journal of Speech, Language, and Hearing Research, 56, 1517–1529.
Brown, C., & Dunn, W. (2002). Adolescent/Adult Sensory Profile. San Antonio, TX: Psychological Corp.
Brown, T., Leo, M., & Austin, D. W. (2008). Discriminant validity of the Sensory Profile in Australian children with autism spectrum disorder. Physical and Occupational Therapy in Pediatrics, 28, 253–266.
Carifio, J., & Perla, R. J. (2007). Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes. Journal of Social Sciences, 3, 106–116.
Crary, M. A., Carnaby, G. D., Sia, I., Khanna, A., & Waters, M. F. (2013). Spontaneous swallowing frequency has potential to identify dysphagia in acute stroke. Stroke, 44(12), 3452–3457. doi:10.1161/STROKEAHA.113.003048
Dumas, H. M., & Fragala-Pinkham, M. A. (2012). Concurrent validity and reliability of the Pediatric Evaluation of Disability Inventory–Computer Adaptive Test mobility domain. Pediatric Physical Therapy, 24, 171–176.
Gailey, R. S., Gaunaurd, I. A., Raya, M. A., Roach, K. E., Linberg, A. A., Campbell, S. M., Jayne, D. M., & Scoville, C. (2013). Development and reliability testing of the Comprehensive High-Level Activity Mobility Predictor (CHAMP) in male servicemembers with traumatic lower-limb loss. Journal of Rehabilitation Research & Development, 50(7), 905–918. doi:10.1682/JRRD.2012.05.0099
Giesinger, J. M., Kuster, M. S., Behrend, H., & Giesinger, K. (2013). Association of psychological status and patient-reported physical outcome measures in joint arthroplasty: A lack of divergent validity. Health and Quality of Life Outcomes, 11, 64.
Holland, A. E., Hill, C. J., Rasekaba, T., Lee, A., Naughton, M. T., & McDonald, C. F. (2010). Updating the minimal important difference for six-minute walk distance in patients with chronic obstructive pulmonary disease. Archives of Physical Medicine and Rehabilitation, 91, 221–225.
Husted, J. A., Cook, R. J., Farewell, V. T., & Gladman, D. D. (2000). Methods for assessing responsiveness: A critical review and recommendation. Journal of Clinical Epidemiology, 53, 459–468.
Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods, 9, 202–220.
McFarland, E. G., Kim, T. K., & Savino, R. M. (2012). Clinical assessment of three common tests for superior labral anterior-posterior lesions. Archives of Clinical Neuropsychology, 27, 781–789.
Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5–8.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.
Pankratz, M. E. (2007). The diagnostic and predictive validity of the Renfrew Bus Story. Language, Speech & Hearing Services in Schools, 38(4), 390.
Sears, E. D., & Chung, K. C. (2010). Validity and responsiveness of the Jebsen-Taylor Hand Function Test. Journal of Hand Surgery, 35, 30–37.
Shi, Y. X., Tian, J. H., Yang, K. H., & Zhao, Y. (2011). Modified constraint-induced movement therapy versus traditional rehabilitation in patients with upper-extremity dysfunction after stroke: A systematic review and meta-analysis. Archives of Physical Medicine and Rehabilitation, 92, 972–982.
Stark, T., Walker, B., Phillips, J. K., Feier, R., & Beck, R. (2011). Handheld dynamometry correlation with the gold standard isokinetic dynamometry: A systematic review. Physical Medicine and Rehabilitation, 3, 472–479.
Teasdale, G. M., & Jennett, B. (1976). Assessment and prognosis of coma after head injury. Acta Neurochirurgica, 34, 45–55.
Walker-Bone, K. E., Palmer, K. T., Reading, I., & Cooper, C. (2003). Criteria for assessing pain and nonarticular soft-tissue rheumatic disorders of the neck and upper limb. Seminars in Arthritis and Rheumatism, 33, 168–184.
Wang, T. Y., Kuo, Y. C., Ma, H. I., Lee, C. C., & Pai, M. C. (2012). Validation of the Route Map Recall Test for getting lost behavior in Alzheimer's disease patients. Archives of Clinical Neuropsychology, 27, 781–789.
Ward, J., Pivko, S., Brooks, G., & Parkin, K. (2011). Validity of the Stroke Rehabilitation Assessment of Movement Scale in acute rehabilitation: A comparison with the Functional Independence Measure and Stroke Impact Scale-16. Physical Medicine and Rehabilitation, 3, 1013–1021.
Wuang, Y. P., & Su, C. Y. (2009). Reliability and responsiveness of the Bruininks-Oseretsky Test of Motor Proficiency-Second Edition in children with intellectual disability. Research in Developmental Disabilities, 30, 847–855.
"The fact that an opinion has been widely held is no evidence whatever that it is not utterly absurd; indeed, in view of the silliness of the majority of mankind, a widespread belief is more likely to be foolish than sensible."
—Bertrand Russell (1872–1970), winner of the Nobel Prize in Literature (Russell was a philosopher, mathematician, and social activist)

8 Descriptive and Predictive Research Designs: Understanding Conditions and Making Clinical Predictions

CHAPTER OUTLINE
INTRODUCTION
DESCRIPTIVE RESEARCH FOR UNDERSTANDING CONDITIONS AND POPULATIONS
  Incidence and Prevalence Studies
  Group Comparison Studies
  Survey Research
STUDY DESIGNS TO PREDICT AN OUTCOME
  Predictive Studies Using Correlational Methods
    Simple Prediction Between Two Variables
    Multiple Predictors for a Single Outcome
  Predictive Studies Using Group Comparison Methods
    Case-Control Studies
    Cohort Studies
EVALUATING DESCRIPTIVE AND PREDICTIVE STUDIES
LEVELS OF EVIDENCE FOR PROGNOSTIC STUDIES
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Identify research designs that answer clinical descriptive and predictive questions.
2. Identify the appropriate statistical analysis for specific clinical descriptive and predictive research questions.
3. Evaluate the strength of the evidence of a given published study.
As a body of research intended to describe a population, these studies often use large numbers of research participants. To obtain large samples of data for epidemiological studies, researchers may access major health-care databases such as the National Health and Nutrition Examination Surveys (NHANES) (Centers for Disease Control and Prevention [CDC], n.d.) and the Centers for Medicare & Medicaid Services (CMS, n.d.). NHANES includes a large national sample of individuals who have participated in health interviews and physical examinations, and CMS maintains data on the health services and outcomes of Medicare and Medicaid recipients. Rehabilitation professionals are particularly interested in epidemiological studies that consider the functional aspects of health conditions, such as the incidence and prevalence of language impairments, ambulation problems, and activity limitations. Incidence is the frequency of new occurrences of a condition during a specific time period. Incidence is calculated as the number of new cases during a time period, divided by the total population at risk. The focus on “new” cases is a distinguishing characteristic of incidence studies. For example, Huang et al (2014) studied the incidence of dysphagia after surgery in children with traumatic brain injury. They collected data for 8 years, from 2000 to 2008, and found an incidence of 12.3 percent. This means that, of all of the children in the study sample who experienced surgery for traumatic brain injury, 12.3 percent developed dysphagia. This information is useful to therapists for anticipating the number of children with traumatic brain injury who may need swallowing assessment and treatment. Prevalence refers to the number of individuals in a population who have a specific condition at a given point in time, regardless of onset. Prevalence is a measure of how widespread a condition is, whereas incidence provides an estimation of the risk of developing the condition. 
Prevalence is calculated as the number of cases at a given time point, divided by the total population at risk. For example, Lingam et al (2009) used data from a larger risk ratio (RR) survey research systematic review multiple logistic regression INTRODUCTION uppose you have been working for several years in an orthopedic practice, primarily with clients who have experienced fractures, traumatic and repetitive use injuries, and joint replacements. You have decided to take a new job in an inpatient neurological rehabilitation unit and wish to prepare yourself by learning about this new population. Although it will be important to examine the evidence regarding interventions, first it would be helpful to learn more about the population itself. Descriptive studies explain health conditions and provide information about the incidence and prevalence of certain conditions within a diagnostic group, such as the prevalence of left-sided neglect in stroke and the incidence of skin breakdown in hospitalized patients with spinal cord injury. Descriptive studies can also provide practitioners with information about comorbidities that are common with a particular condition. Equally important are predictive studies, which provide information about factors that are related to a particular outcome. For example, a predictive study might identify what activities of daily living are most important for a successful discharge to home and which motor abilities are most likely to result in independence in mobility. This chapter describes specific nonexperimental designs and statistical analyses used in descriptive and predictive studies. In addition, it identifies characteristics that indicate a strong design and provides a levels-of-evidence hierarchy for predictive studies. 4366_Ch08_145-162.indd 146 Studies that are undertaken to understand conditions and populations are descriptive in nature. They are intended to observe the naturally occurring characteristics of individuals. 
Because no variables are manipulated in these studies, they are considered nonexperimental. response bias multiple linear regression S DESCRIPTIVE RESEARCH FOR UNDERSTANDING CONDITIONS AND POPULATIONS 28/10/16 12:11 pm CHAPTER 8 ● Descriptive and Predictive Research Designs Understanding Statistics 8-1 Incidence is measured using frequency statistics, typically expressed in terms of a percentage or proportion. It is calculated using the following formula: Number of new cases during a time period Incidence = Total population at risk For example, from the study of dysphagia (Huang et al, 2014), the denominator in the equation was based on the 6,290 children who constituted the total study sample. The study found that 775 had severe dysphagia, which became the numerator in the equation. The incidence of 12.3% was determined with this simple calculation: Incidence of dysphagia = 775/6,290 Understanding Statistics 8-2 Like incidence, prevalence uses frequency statistics such as percentages and proportions. The formula is expressed as: Number of cases at a given time point Prevalence = Total population at risk Using the Lingam et al (2009) study as an example, the researchers had a total sample of 6,990 children and found that 119 met the criteria for developmental coordination disorder. Prevalence = 119/6,990, or 1.7% (17 out of 1,000 children). project called the Avon Longitudinal Study of Parents and Children to identify children with developmental coordination disorder. The study collected data on manual dexterity, ball skills, and balance, and found that 1.7 percent, or 17 out of 1,000 children, met the criteria for developmental coordination disorder. Prevalence is often expressed as a whole number of individuals with the condition, such as 17 out of 1,000 children. Incidence and prevalence studies help practitioners know how widespread a particular condition is and how likely someone is to develop it. 
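The calculations in Understanding Statistics 8-1 and 8-2 reduce to the same simple proportion. A minimal Python check of the two published figures (the function name is mine, chosen for illustration):

```python
def rate(cases, population_at_risk):
    """Incidence and prevalence are both proportions; they differ only in
    whether the numerator counts new cases during a time period (incidence)
    or all existing cases at a point in time (prevalence)."""
    return cases / population_at_risk

# Huang et al (2014): 775 new dysphagia cases among 6,290 children
incidence = rate(775, 6290)
# Lingam et al (2009): 119 of 6,990 children met DCD criteria
prevalence = rate(119, 6990)

print(f"Incidence of dysphagia: {incidence:.1%}")  # 12.3%
print(f"Prevalence of DCD: {prevalence:.1%}")      # 1.7%, i.e., 17 per 1,000
```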
This information can then be used in intervention planning and implementation. For example, knowing that the incidence of developing pressure sores is high in individuals with spinal cord injuries, practitioners can provide education and seating to help prevent the occurrence of this problem. Subgroups within a population, such as gender and age, are often analyzed in prevalence and incidence studies. For example, the CDC (2012) reported that in 2010 the prevalence for stroke was 2.6 percent; however, the rate varied by race and was highest among American Indians at 5.9 percent and lowest among Asians at 1.5 percent.

Prevalence is influenced by both incidence and the duration of the disease; prevalence is much higher than incidence with chronic conditions because of the extended duration of the disease. Consider the difference between incidence and prevalence rates when comparing hip fractures with spinal cord injury. If you have a hip fracture in 2015, it will heal, and you will no longer have a hip fracture in 2017. However, if you have a spinal cord injury in 2015, in most cases you will still have a spinal cord injury in 2017. Incidence and prevalence will be more similar for hip fractures, because of the acute nature of the condition, and less similar for spinal cord injury.

With incidence and prevalence studies it is important to recognize that, as with all nonexperimental studies, it is not possible to draw conclusions regarding causation. For example, even though one study found that the occupational categories with the highest prevalence of obesity included health-care (49.2 percent) and transportation workers (46.6 percent), it would not be logical or correct to assume that these jobs cause obesity (Gu et al, 2014).

The large samples needed in epidemiological studies often result in the use of efficient data collection methods such as surveys.
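The dependence of prevalence on disease duration described above is often summarized with a steady-state approximation: for a stable, relatively rare condition, prevalence ≈ incidence rate × average duration. The sketch below uses invented numbers solely to illustrate the hip fracture versus spinal cord injury contrast; none of the figures come from a study.

```python
# Steady-state approximation: prevalence ≈ incidence rate × average duration.
# All numbers below are invented for illustration only.
incidence_rate = 0.002    # 2 new cases per 1,000 people per year

acute_duration = 0.25     # e.g., a fracture that heals in about 3 months
chronic_duration = 40.0   # e.g., a condition that persists for decades

print(f"Acute:   prevalence ~ {incidence_rate * acute_duration:.4f}")
print(f"Chronic: prevalence ~ {incidence_rate * chronic_duration:.3f}")
# With identical incidence, the chronic condition is far more prevalent,
# which is why prevalence and incidence diverge for chronic conditions.
```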
The methods by which data are collected should be carefully considered in incidence and prevalence studies; for example, physical and/or objective measures are more likely to provide reliable data than self-reported data that come from a survey. The table in From the Evidence 8-1 comes from a study that compared different reporting methods to collect data on the prevalence of lymphedema following breast cancer (Bulley et al, 2013). Perometry, an objective measure of swelling, was compared with two self-reported measures. The study found different prevalence rates depending on the data collection method used. Group Comparison Studies In health-care research, descriptive studies often compare a disability group to a group of individuals without a disability. These cross-sectional designs that compare two or more groups at one point in time, sometimes referred to as ex post facto comparisons, can answer important questions about the differences between groups. These designs are not experimental because the independent variable of group assignment is not manipulated; instead, the difference in groups is based on pre-existing conditions. Therapists are particularly interested in descriptive studies that characterize the functional aspects of a 28/10/16 12:11 pm 148 CHAPTER 8 ● Descriptive and Predictive Research Designs FROM THE EVIDENCE 8-1 Prevalence Rates by Reporting Method Bulley, C., Gaal, S., Coutts, F., Blyth, C., Jack, W., Chetty, U., Barber, M., & Tan, C. W. (2013). Comparison of breast cancer-related lymphedema (upper limb swelling) prevalence estimated using objective and subjective criteria and relationship with quality of life. Biomedical Research International, 807569. doi:10.1155/2013/807569. Table 2 Prevalence of Lymphedema According to Each Measurement Tool. Method of Measurement Prevalence: all available data for each tool % frequency Perometry 26.2 LBCQa 23.9 MSTb 20.5 bMST: aLBCQ: Note A: Difference in prevalence rates based on measurement method. 
(b) MST: Morbidity screening tool. (a) LBCQ: Lymphedema and breast cancer questionnaire.

FTE 8-1 Question: How might the different findings from the Bulley et al study influence your clinical decision-making about assessment in lymphedema?

disability. Studies that compare individuals with and without a disability on measures of, for example, mobility, instrumental activities of daily living, and language provide information on what to expect with a given diagnosis. The table in From the Evidence 8-2 compares three age groups of healthy controls and individuals with stroke using a one-way ANOVA. Developmental research with a longitudinal design may also use a group comparison approach. As an example,

Understanding Statistics 8-3

When means are compared between groups, a t-test for independent samples is used for two groups, whereas a one-way ANOVA is used to compare three or more groups. In the results section of a study, the t-test may be described in terms of the t-value, degrees of freedom, and p value (statistical significance). The ANOVA provides similar results, but instead of a t-value, an F-value is used. A strong study results section will also provide the mean scores and standard deviations for each group. A chi-square analysis is used to compare frequencies between groups; in this case, the frequency (percentage) with which something happens is compared between groups. The results are described using a chi-square value (χ²), degrees of freedom (df), and p value. For example, McKinnon (1991) looked at assistance needed for activities of daily living in elderly Canadians. Individuals were categorized as needing or not needing assistance. In an age and gender comparison, elderly women (61.5%) were much more likely than elderly men (41.8%) to require assistance with yardwork (χ² = 550.7, df = 5, p < 0.00001), whereas elderly men (81.7%) were more likely than elderly women (40.4%) to require assistance with housework (χ² = 739.3, df = 3, p < 0.00001).
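A minimal sketch of the three analyses described in Understanding Statistics 8-3, using SciPy. All numbers below are invented for illustration; they are not the McKinnon (1991) data or any other study's data.

```python
from scipy.stats import ttest_ind, f_oneway, chi2_contingency

# Two groups -> t-test for independent samples (reports t and p).
group_a = [4.8, 5.3, 4.1, 5.6, 4.9, 5.2]
group_b = [5.5, 6.2, 5.8, 6.6, 5.1, 6.0]
t, p_t = ttest_ind(group_a, group_b)

# Three or more groups -> one-way ANOVA (reports F instead of t).
group_c = [10.9, 11.5, 12.2, 10.1, 11.8, 11.0]
F, p_f = f_oneway(group_a, group_b, group_c)

# Frequencies (counts) between groups -> chi-square.
# Rows: women, men; columns: needs assistance, does not need assistance.
counts = [[120, 80],
          [60, 140]]
chi2, p_c, df, expected = chi2_contingency(counts)

print(f"t = {t:.2f}, F = {F:.2f}, chi-square = {chi2:.1f} (df = {df})")
```

Note the design logic the box describes: means for two groups go to the t-test, means for three or more groups go to the ANOVA, and counts (frequencies) go to the chi-square.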
FROM THE EVIDENCE 8-2 Comparison of Groups Using One-Way ANOVA

Rand, D., Katz, N., & Weiss, P. L. (2007). Evaluation of virtual shopping in the VMall: Comparison of post-stroke participants to healthy control groups. Disability Rehabilitation, 29, 1719–1720.

Table III. Time to complete the four-item shopping task: comparison between groups

Group             Time to shop for four items (min), Mean ± SD (range)
Children          4.6 ± 1.9 (2.7–10.6)
Young adults      5.1 ± 2.4 (2.3–15.3)
Older adults      5.9 ± 2.7 (3.24–15.3)
Stroke patients   11.2 ± 3.3 (5.6–16.20)

One-way ANOVA: F(3,97) = 23.28, P < 0.000.

Note A: Individuals with stroke took much longer to shop than healthy controls.
Note B: The numbers in parentheses are the degrees of freedom, which are based on the number of groups – 1 and the number of participants.

FTE 8-2 Question: What does the table tell you about individuals with stroke and their ability to shop?

therapists working with children need to know how diseases and disabilities can affect patterns of growth and development. Studies that compare typical children with a disability group over time can answer these descriptive questions. In a 2013 study, Martin et al compared different areas of language production in boys with fragile X, boys with Down syndrome, and typically developing boys. Data collected over three years suggested that the disability groups had different types of language issues. Boys with Down syndrome had more problems related to syntax, whereas boys with fragile X experienced more challenges related to pragmatic language.

In ex post facto group comparisons, the lack of random assignment and manipulation of the independent variable presents potential threats to validity. Differences between the groups may exist outside of the dependent variable of interest and confound the comparison. For example, when
comparing individuals with and without schizophrenia on a fine motor measure, factors other than the health condition, such as medication side effects, could explain the differences. In group comparison studies with preassigned groups, it is important to match the groups on as many potentially confounding variables as possible. However, in some cases that may be difficult to do, as in the preceding example regarding medication issues. Nevertheless, group comparison studies of existing groups help therapists better understand the characteristics of disability groups.

Survey Research

Survey research is a common approach to gathering descriptive information about health conditions. In survey research a questionnaire is administered via mail, electronic media, telephone, or face-to-face contact. A major advantage of survey research is the ease with which large amounts of data can be collected, particularly when surveys are administered electronically. Surveys may be used to gather incidence and prevalence data, but they can be extended beyond these areas of research to describe virtually any phenomenon that can be investigated using a questionnaire. Another advantage of survey research is the opportunity to use random sampling methods, because it is possible to reach individuals in different geographic locations.

A major consideration in survey research is the response rate, or the percentage of individuals who return a survey out of the total number of surveys administered. If large numbers of individuals choose not to respond to a survey, it is likely that a response bias exists. Response bias is a general term for measurement error that creates inaccuracy in the survey results. When few people respond to a survey, the responders may differ from the nonresponders. Perhaps the nonresponders do not want to share sensitive information, or maybe they are less motivated to participate.
For example, a survey to identify binge drinking in college students may have a poor response rate among the binge drinkers themselves, which would make the findings unreliable; surveys related to alcohol use have notoriously high nonresponse rates (Meiklejohn, Connor, & Kypri, 2012). Survey researchers typically make multiple contacts to try to reach nonresponders, but even these efforts can be unsuccessful. Although there is no universally accepted response rate, some journals now require a certain level of response before they will publish a study. For example, the American Journal of Pharmaceutical Education requires a response rate of 60 percent for most research, and 80 percent representation of all colleges for educational research (Fincham, 2008). However, when response rates are low, researchers may be able to provide data demonstrating that the sample is similar to the population by comparing important demographic characteristics.

Another factor affecting the reliability of survey research involves self-reporting issues. People generally have a desire to present themselves in a favorable light, and thus may underreport certain conditions. For example, in a study of self-reported body mass index, researchers found that many individuals underreported or refused to report their body mass index (Chau et al, 2013). Methods to reduce nonresponse and underreporting include face-to-face interviews, training interviewers in establishing rapport, well-designed questionnaires, and the use of objective measures to verify self-reports.

Most surveys used in research are cross-sectional in nature, meaning they gather data at a single point in time. Other surveys are longitudinal and collect data over periods of years or decades. The National Health and Nutrition Examination Survey (NHANES; CDC, n.d.) is a large health-care database that uses multiple strategies to increase the reliability and validity of the data.
Thousands of individuals are included in the study, and sampling is carefully done to represent the U.S. population. Face-to-face interviews are conducted in participants' homes by trained interviewers, and the survey data are supported by physical examinations conducted in mobile stations. More recently, data collection has included physical activity monitoring with an accelerometer. A portion of the participants are followed for years. The NHANES data have contributed to many important research findings. One example is the first significant measurement of physical activity in the United States, which found that actual physical activity was much lower than self-reported physical activity, and that less than 5 percent of Americans get the recommended 30 minutes of physical activity per day. Of course, the NHANES project requires large resources in time, money, and people to gather the data.

STUDY DESIGNS TO PREDICT AN OUTCOME

Several different research designs can be used to answer questions about predicting an outcome. The two major categories of such studies are (1) studies that use correlational methods and (2) studies that use group comparison methods. In all cases, the purpose is to identify the factors that are most predictive of an outcome. For example, therapists working with young adults with autism who are trying to find a job would be interested in factors that predict employment. They may wonder whether social skills or cognitive skills are more important in obtaining and maintaining employment. Both correlational studies and group comparison studies can answer such questions.

Predictive Studies Using Correlational Methods

One way of predicting outcomes is through the use of correlational studies that examine the relationships between variables. These designs include predictor and outcome variables within a target sample and offer one approach for making a prognosis or predicting an outcome.
Group comparison studies are another approach, as discussed in the following text.

Simple Prediction Between Two Variables

In a simple correlation, the association between two variables is determined. Correlational studies are cross-sectional, with data collected at a single point in time. At least two measures are administered and related; however, multiple measures can be administered, with the results presented in a correlation matrix. For example, a study examining factors associated with quality of life in clients with multiple sclerosis could correlate quality of life with several other variables, such as fatigue, depression, and strength. Each relationship between two variables results in a correlation; that is, correlations for quality of life and fatigue, quality of life and depression, and quality of life and strength. Often researchers examine the relationships among the predictors as well. Using the same example, the study could also produce correlations for fatigue and depression, fatigue and strength, and depression and strength. Figure 8-1 illustrates the relationship between fatigue and quality of life. As mentioned previously, correlational studies may administer multiple measures and explore the relationships among all of these measures. In that case, the results should be presented in a correlation matrix. A hypothetical correlation matrix based on the example of predictors of quality of life in multiple sclerosis is illustrated in Table 8-1.

Multiple Predictors for a Single Outcome

Complex correlational designs look at multiple predictors for a single outcome. The same design is used to collect the data as with a simple correlation, but the analysis is conducted differently. When sufficient cases and a robust design are in place, a regression analysis is used. Larger sample sizes (often in the hundreds) are needed with multiple predictors so that the results are stable.
With small samples, the predictors found in one study are not likely to be the same in another study. Using the previous example, multiple measures (i.e., quality of life, fatigue, depression, and strength) are administered to the target population. Rather than looking at two variables at a time, multiple variables are entered into the regression equation. These studies have the advantage of examining the total amount of variance accounted for by multiple predictors as well as the relative importance of each individual variable as a predictor. Take quality of life as the outcome, for example. One would expect that many factors contribute to quality of life, and that many factors taken together (e.g., fatigue, depression, and strength) would be a better predictor than

Understanding Statistics 8-4

As discussed in Chapter 4, correlation statistics describe the relationship between two variables. The Pearson product moment correlation is the most common correlation statistic used to examine the strength of the relationship between two continuous variables (e.g., speed and distance, or age and grip strength). If one or both of the variables is measured on an ordinal (rank-ordered) scale, such as manual muscle testing grades, a Spearman correlation should be used. In the results section of a research article, the correlation is presented as an r value. r values can be considered effect sizes. Cohen (1992) provided a rule of thumb for interpreting the strength of r values within the applied sciences: 0.10 = weak effect/relationship, 0.30 = moderate effect/relationship, and 0.50 = large effect/relationship. Correlation statistics provide three types of information: the strength of the relationship, the direction of the relationship, and statistical significance. From the preceding example, consider the relationship of fatigue and quality of life.
If the correlation of these two variables is r = –0.30, the correlation is moderate, the direction is negative (i.e., higher fatigue means lower quality of life), and p < 0.05 indicates that the relationship is statistically significant. Another way to evaluate the strength of the relationship is to consider the amount of variance accounted for by the relationship, which is calculated by squaring the correlation coefficient. Using the preceding example, squaring the correlation gives r² = 0.09; 9 percent of the variance is accounted for, and 91 percent is left unaccounted for; in other words, 91 percent must be attributed to factors other than fatigue.

FIGURE 8-1 Variance accounted for in the relationship between fatigue and quality of life in multiple sclerosis. The overlap represents the amount of variance accounted for by the relationship between fatigue and quality of life.

TABLE 8-1 Correlation Matrix

             Quality of life   Fatigue   Depression
Fatigue      –0.40*
Depression   –0.60*            0.30*
Strength     0.12              –0.25*    0.02

* Statistically significant relationships, p < 0.05.

just one factor (e.g., strength). Perhaps the greatest challenge in regression involves the selection of predictors. It is critical to include the important predictors. Another important consideration in selecting predictors is the phenomenon of multicollinearity, a term that refers to the circumstance in which variables (or, in the case of regression, predictors) are correlated with one another. In regression, it is preferable that multicollinearity be kept to a minimum and that each predictor contribute as much unique variance as possible. An overlap in predictors suggests that they are measuring the same or a similar thing.
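The r, direction, and r-squared interpretation described in Understanding Statistics 8-4 can be sketched with SciPy. The fatigue and quality-of-life scores below are invented for illustration; they are not the hypothetical Table 8-1 values.

```python
from scipy.stats import pearsonr

# Invented fatigue and quality-of-life scores (illustration only).
fatigue         = [2, 5, 3, 8, 7, 1, 6, 4, 9, 5]
quality_of_life = [8, 5, 7, 3, 4, 9, 4, 6, 2, 6]

r, p = pearsonr(fatigue, quality_of_life)

# Strength and direction come from r; variance accounted for is r squared.
variance_accounted_for = r ** 2
print(f"r = {r:.2f}, p = {p:.4f}, r^2 = {variance_accounted_for:.2f}")
```

The negative r mirrors the fatigue example in the text (higher fatigue, lower quality of life), and squaring r gives the proportion of shared variance, with the remainder attributable to other factors.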
In the example, only a moderate amount of multicollinearity appears: fatigue is likely associated with strength, because individuals who lack strength will fatigue more quickly, and, similarly, depression and fatigue often co-occur. To determine the unique variance of a predictor, it is entered last into the regression equation. Returning to the example of predicting quality of life, if fatigue were entered last into the equation, after depression and strength, one could determine the unique variance of fatigue. In other words, the unique variance of fatigue is the amount of variance by which fatigue predicts quality of life when all of the other predictors are taken into account. Because of multicollinearity, the unique variance will always be less than the variance accounted for in a simple correlation. In the example, the variance shared between fatigue and quality of life was 0.16. In the multiple regression, the unique variance of fatigue as a predictor would be r² < 0.16, because fatigue overlaps with depression and strength when predicting quality of life. Figure 8-2 illustrates this more complex relationship. There are two primary types of regression analyses that examine multiple predictors: multiple linear regression and multiple logistic regression.

FIGURE 8-2 Multiple predictors of quality of life. The diagram shows the unique variance of fatigue and quality of life, the unique variance of depression and quality of life, and the shared variance of fatigue, depression, and quality of life.

Understanding Statistics 8-5

Multiple linear regression is an extension of bivariate correlation (i.e., the relationship between two variables). When the results are presented for multiple linear regression, often the bivariate correlations are presented as well. In the case of multiple linear regression, consider a set of predictors and an outcome or criterion.
The analysis reveals the degree to which the set of predictors accounts for the outcome, as well as the relative importance of the individual predictors. The results are provided as a multiple correlation, R, and a squared multiple correlation, R², or variance accounted for. The R² change is the difference in variance accounted for when a second set of predictors, or a single predictor, is added after the previous predictors are taken into account. R² change is the unique variance of the last predictor entered into the equation.

Multiple Linear Regression

In multiple linear regression, the outcome is a continuous variable. Quality of life as measured by a questionnaire is an example of a continuous variable that would be analyzed with linear regression. The multiple refers to multiple predictors. The multiple linear regression reveals the total amount of variance accounted for by all of the predictors, as well as which predictors are most important. From the Evidence 8-3 provides an example of a multiple linear regression. This study examined predictors of speech recognition in older adults and included both a correlation matrix and results from a linear regression analysis (Krull, Humes, & Kidd, 2013).

Multiple Logistic Regression

When the outcome is categorical, a multiple logistic regression analysis is used, and the results are reported in terms of an odds ratio. The odds ratio is a probability statistic that expresses the likelihood that, if one condition occurs, a specific outcome will also occur. An odds ratio of 1.0 means there is no difference between the groups. When the odds ratio is greater than 1.0, there is a greater chance of experiencing the outcome; when the odds ratio is less than 1.0, there is a lower chance of experiencing the outcome. Table 8-2 shows an odds ratio based on a 2 × 2 table in which OR = AD/BC. This example is drawn from a study of elderly individuals that considered falls as the outcome and alcohol consumption as the predictor.
To calculate the odds ratio:

OR = (50 × 40)/(10 × 20) = 2,000/200 = 10.00

FROM THE EVIDENCE 8-3 Predictors of Speech Recognition

Krull, V., Humes, L. E., & Kidd, G. R. (2013). Reconstructing wholes from parts: Effects of modality, age and hearing loss on word recognition. Ear and Hearing, 34(2), e14–e23. doi:10.1097/AUD.0b013e31826d0c27.

Relationships (Pearson correlation coefficients) between age, high-frequency pure tone average (HFPTA), auditory (a_SSN: Speech-shaped noise; a_INT: Interrupted speech; a_FILT: Filtered speech), visual (v_SSN: Text in Gaussian noise; v_INT: Text masked by bar pattern), and cognitive measures (MU: Memory updating; SS: Sentence span; SSTM: Spatial short-term memory) in elderly subjects (pooled data). Statistically significant relationships are indicated by asterisks.

         HFPTA   a_SSN    a_INT    a_FILT   v_SSN   v_INT   MU      SS     SSTM
Age      0.20    –0.39*   –0.38*   –0.32*   0.11    –0.11   –0.21   0.04   –0.15
HFPTA            –0.08    0.11     0.20     0.16    –0.24   –0.11   0.11   0.05
a_SSN                     0.68**   0.82**   –0.02   0.08    0.13    0.24   0.15
a_INT                              0.81**   0.10    0.11    0.22    0.27   0.19
a_FILT                                      0.21    0.17    0.31    0.12   0.43**
v_SSN                                               –0.01   0.12    0.02   0.14
v_INT                                                       0.25    0.18   0.56**
MU                                                                  0.40*  0.19
SS                                                                         0.33*

**P < 0.01 (2-tailed); *P < 0.05 (2-tailed).

Note A: Age is negatively correlated with the speech recognition measures, and the speech measures are intercorrelated; however, the cognitive measures are not related to the speech measures.

A standard forward stepwise linear regression analysis (SPSS) was used to analyze each of the three dependent auditory measures (a_SSN: Speech-shaped noise; a_INT: Interrupted speech; a_FILT: Filtered speech) in elderly adults (pooled data). Age, high-frequency pure tone average (HFPTA), visual (v_INT: Text masked by bar pattern; v_SSN: Text in Gaussian noise), and cognitive measures (SS: Sentence span; SSTM: Spatial short-term memory; MU: Memory updating) were included as independent variables in each analysis.
The independent variables entered the model according to their statistical contribution (F-to-enter criterion) in explaining the variance in the dependent variable (speech recognition); only significant independent variables are included in the table.

Dependent Variable   Independent Variable(s)   β        R Square   P
a_SSN                Age                       –0.411   0.169      0.009**
a_INT                Age                       –0.388   0.151      0.015*
a_FILT               Age                       –0.333   0.111      0.038*

**P < 0.01 (2-tailed); *P < 0.05 (2-tailed).

Note B: The dependent variables would be considered the outcome or criterion, and the independent variables would be considered the predictors.

Note C: Only age is a significant predictor of speech recognition.

FTE 8-3 Question: How would you interpret the findings from the correlation matrix and linear regression? In other words, what predicts speech recognition?

This example indicates that elderly individuals who consume more than one drink per day have ten times the odds of falling compared with those who consume less than one drink per day. Ninety-five percent confidence intervals are typically presented along with the odds ratio. The confidence interval is the range in which you would expect the odds ratio to fall if the study were conducted again. If the 95 percent confidence interval includes the value 1.0, the odds ratio is not statistically significant. From the Evidence 8-4 provides an example of predicting outcomes for back pain.

TABLE 8-2 Sample Odds Ratio

Number of Drinks Per Day   No Falls   Falls
< 1 drink                  A: 50      B: 10
> 1 drink                  C: 20      D: 40

FROM THE EVIDENCE 8-4 Predictors of Improvement in Back Pain

Hicks, G. E., Benvenuti, F., Fiaschi, V., Lombardi, B., Segenni, L., Stuart, M., . . . Macchi, C. (2012). Adherence to a community-based exercise program is a strong predictor of improved back pain status in older adults: An observational study. Clinical Journal of Pain, 28(3), 195–203. doi:10.1097/AJP.0b013e318226c411.
Multivariate logistic regression analysis of factors associated with improved back pain status

Factor                                    Odds Ratio   95% CI       p-value
Demographic/Social Factors
  Lives alone                             0.60         0.31–1.18    .140
  Currently working                       2.14         0.81–5.63    .125
Health Status Factors
  High depressive symptoms (GDS > 5)      0.47         0.25–0.89    .019
  SPPB score > 8                          1.71         0.88–3.34    .114
  Poor self-rated health                  0.20         0.08–0.51    .001
Back Pain Factors
  Numeric Pain Rating Scale               0.94         0.85–1.05    .273
  Roland Morris Scale                     1.01         0.95–1.07    .786
Accessibility/Attendance Factors*
  Adherent to APA program                 13.88        8.17–23.59   < .001
APA Satisfaction Factors*
  Positive rating of trainer              1.13         0.63–2.02    .675
  Satisfied with hours of operation       1.36         0.67–2.75    .391

GDS = Geriatric Depression Scale; SPPB = Short Physical Performance Battery
*These factors are from the follow-up telephone interview; all other factors are from the baseline assessment.

Note A: Only depression, self-rated health, and adherence to a back pain program were predictive of improvements in back pain.

FTE 8-4 Question: Which is the strongest predictor in determining outcomes for back pain?

Predictive Studies Using Group Comparison Methods

Studies using correlational methods typically involve a single group of participants. However, predictive studies may also use group comparison designs. In health-care research, case-control designs and cohort studies are commonly used for answering questions about prognosis and predicting outcomes.

Case-Control Studies

A case-control design is an observational, retrospective, cross-sectional study that can be used to answer prognostic research questions concerning which risk factors predict a condition. This design is commonly used in epidemiological research that is conducted after a condition has developed.
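Case-control and logistic regression results are reported as ratio statistics, and the odds-ratio arithmetic shown earlier for Table 8-2 (OR = AD/BC) can be sketched directly. The counts come from the falls-and-alcohol example in the text.

```python
# 2x2 table from Table 8-2 (falls and alcohol consumption in elderly adults):
#                  No Falls   Falls
#  < 1 drink/day    A = 50    B = 10
#  > 1 drink/day    C = 20    D = 40
A, B, C, D = 50, 10, 20, 40

odds_ratio = (A * D) / (B * C)    # OR = AD/BC = 2,000/200
print(odds_ratio)                 # 10.0

# Equivalent view: the ratio of the odds of falling in each exposure group.
odds_fall_heavier = D / C         # 40/20 = 2.0
odds_fall_lighter = B / A         # 10/50 = 0.2
assert odds_fall_heavier / odds_fall_lighter == odds_ratio
```

The second form makes the interpretation concrete: the odds of falling among those consuming more than one drink per day are ten times the odds among those consuming less.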
In this type of research design, individuals who already have a condition constitute one group; they are matched and compared with individuals without the condition. There is no random assignment. The landmark example of a case-control design is research that compared individuals with lung cancer with individuals without lung cancer and identified smoking as a predictor (Wynder & Graham, 1950; Doll & Hill, 1950). Because randomization to group was not possible, cigarette companies used the argument that the research design could not confirm causation. However, the use of very large samples, replication of the findings, and cohort studies (described later) eventually led to confidence in the link.

In a rehabilitation example, Brennan and colleagues (2014) examined rheumatoid arthritis (RA) as a risk factor for fracture in women. In a large study of several thousand women aged 35 and older, 1.9 percent of individuals with RA had suffered a fracture, compared with 1.2 percent of women without RA. This slight increase in risk could warrant interventions to increase bone strength and/or reduce falls in women with RA.

The case-control research design uses the same group comparison described previously for descriptive research. The difference lies in the research question. Case-control designs answer prognostic and predictive questions, and may use analyses that are more suitable for answering predictive questions (which differs from simply describing a condition). In the preceding RA example, the groups were compared in terms of differences in frequencies. However, case-control designs are often reported in terms of hazard ratios, or the likelihood of a particular event occurring in one group compared with the likelihood of that event occurring in another group.

Cohort Studies

A cohort study is also observational, but differs from a case-control design in that participants are followed over time, making this design longitudinal.
In cohort studies, a hypothesized risk factor is identified, and then the study follows individuals with and without the risk factor. At a certain time point in the future, the risk factor is analyzed to determine its impact on the outcome. For example, Xiao et al (2013) examined lack of sleep as a risk factor for developing obesity. For 7.5 years, the researchers followed a large sample of individuals who were not obese at the onset of the study. This is considered a prospective cohort study because the research question was identified before the study began, and individuals were followed over time to determine who did and who did not develop the condition. The individuals who consistently got less than 5 hours of sleep were 40 percent more likely to become obese than individuals who got more sleep. These results suggest that sleep interventions may be useful in preventing obesity. Because this was a prospective cohort study in which problems with sleep occurred before obesity developed, there is some evidence to support a cause-and-effect relationship.

Understanding Statistics 8-6

Hazard ratios (HRs) are estimates of risk over time and are conceptually similar to odds ratios. A hazard ratio greater than 1.0 indicates a greater risk, whereas a hazard ratio less than 1.0 indicates a lesser risk. Hazard ratios (as opposed to risk ratios, which are described later) are used in designs in which the risk factor is analyzed retrospectively. For example, a large study that retrospectively examined the records of 66,583 individuals regarding the risk of ischemic stroke after total hip replacement (Lalmohamed et al, 2012) reported HR = 4.69, 95% CI = 3.12–7.06, indicating a 4.7-fold increased risk of experiencing an ischemic stroke in individuals who undergo total hip replacement.
However, caution should be exercised when drawing conclusions from prospective data, because cohort studies do not use experimental designs and are therefore weaker in terms of supporting causative conclusions. Another type of cohort design is the retrospective cohort study, in which existing records or the client's report of past behavior is used to determine whether changes occurred over time. One cohort study retrospectively examined an existing database of individuals with moderate to severe traumatic brain injury and found that educational attainment was predictive of "disability-free recovery" (Schneider et al, 2014). More specifically, more of the individuals with disability-free recovery had greater than 12 years of education prior to the injury. In cohort studies (both retrospective and prospective), risk ratios (RRs), also referred to as relative risks, are used to express the degree of risk for an outcome over time.

EVIDENCE IN THE REAL WORLD: Predicting Outcomes Is Not an Exact Science

Practitioners must be cautious when interpreting predictive study results for their clients. Quantitative studies consolidate the findings of many individuals and, in the process, can lose the distinctiveness of the individual's situation. When predicting an outcome, the evidence provides important information about the relationship between variables, but a study can never include all of the factors that contribute to a particular individual's outcome. Factors such as financial resources, the client's level of engagement in therapy, degree of social support, other psychiatric or medical conditions, and the client's physical condition may not be factored into a particular study. Therefore, practitioners should avoid making absolute predictions for clients. A case in point involves an individual whose serious cycling accident resulted in a fractured scapula, clavicle, and two ribs.
She is an avid road cyclist who enjoys participating in hill-climbing races. She asked her physician if she would be able to participate in an upcoming hill-climb event scheduled for 9 weeks after her accident. Based on clinical experience and research evidence, the physician said it was unrealistic for her to expect even to be back on the bike in 9 weeks. Not only did she end up participating in the event, but she improved her time by 6 minutes over the previous year! It is likely that you know similar stories of individuals who have defied the odds. In other circumstances, individuals with relatively minor conditions may have outcomes that are more severe than the research would predict. Predictive studies can provide direction, but remember that each client is an individual with a unique life situation.

Understanding Statistics 8-7

The calculation of risk ratios differs from that of odds ratios (ORs) and hazard ratios (HRs) and is expressed as:

RR = [A/(A + B)] / [C/(C + D)]

Risk ratios are generally lower than odds ratios, particularly when the events being measured occur frequently. When the occurrence of an event is rare, the RR and OR will be similar. For example, returning to the previous research example involving falls and alcohol consumption, the odds ratio was very large, at 10.0. However, if a risk ratio were calculated from the same data (assuming the study used a longitudinal design), the RR would be much smaller, as shown in Table 8-3. To explain the differences in interpretation of odds ratios, hazard ratios, and risk ratios, it is important to remember the designs in which they are used. In cross-sectional research such as case-control designs, the odds ratio or hazard ratio describes the odds of having one condition if the person has another. It does not take time into account and is similar to a correlation coefficient.
In cohort studies that use a longitudinal design, risk ratios interpret the risk of developing one condition if exposed to another, and they take time into account.

Applying the formula to the data in Table 8-3:

RR = [50/(50 + 10)]/[20/(20 + 40)] = (50/60)/(20/60) = 0.83/0.33 = 2.5

TABLE 8-3 Sample Risk Ratio
Number of Drinks Per Day   No Falls   Falls
< 1 drink                  A = 50     B = 10
> 1 drink                  C = 20     D = 40

Table 8-4 summarizes the differences among odds ratios, hazard ratios, and risk ratios. Because one event occurs before the other, risk ratio analysis provides stronger support for causation; that is, for conclusions that the earlier event caused the later event. For example, Strand, Lechuga, Zachariah, and Beaulieu (2013) studied the risk of concussion for young female soccer players and found RR = 2.09. As a risk ratio, this statistic suggests that playing soccer is not just associated with a concussion, but plays a role in causing the concussion.

TABLE 8-4 Differences Among Odds Ratios, Hazard Ratios, and Risk Ratios

Odds ratio
  Research design: Case-control study
  Interpretation: Degree to which the presence of one condition is associated with another condition
  Example: The degree to which having diabetes is associated with carpal tunnel syndrome

Hazard ratio
  Research design: Retrospective cohort study
  Interpretation: Risk of developing a condition in one group over another group (at one point in time)
  Example: Risk of readmission for individuals with stroke who received high-intensity therapy versus those who received low-intensity therapy

Risk ratio
  Research design: Prospective cohort study
  Interpretation: Risk of developing a condition in one group over another group (cumulatively over time)
  Example: Risk of having a concussion over a season for individuals who play soccer versus those who do not

EXERCISE 8-1 Identifying Study Designs That Answer Clinical Descriptive and Predictive Questions (LO1)

QUESTIONS
Consider the following list of research questions from a hypothetical clinician working in a Veterans Administration hospital with individuals who have traumatic brain injury. For each question, identify the type of study in which the therapist should look for the answer.
1. What cognitive impairments are most closely associated with unemployment after traumatic brain injury?
2. How many individuals with traumatic brain injury have sleep disturbances?
3. Is depression a risk factor for homelessness in individuals with traumatic brain injury?

EXERCISE 8-2 Matching the Research Design With the Typical Statistical Analysis (LO2)

QUESTIONS
Match the four research designs with the corresponding typical statistical analysis.
1. Prevalence study
2. Predictive study with a continuous outcome
3. Case-control study
4. Prospective cohort study
A. odds ratio
B. risk ratio
C. multiple linear regression
D. frequencies/percentages

EVALUATING DESCRIPTIVE AND PREDICTIVE STUDIES

The quality of descriptive and predictive studies cannot be analyzed using the levels-of-evidence hierarchy applied to efficacy studies, because the lack of random assignment to groups and absence of manipulation of the independent variable create the opportunity for alternative explanations. For example, in the cohort design study described earlier concerning sleep and obesity, alternative explanations for the results are possible. Perhaps it is not lack of sleep per se that causes increased risk for obesity, but the fact that people who sleep less may also exercise less or have more time to eat. An important consideration in evaluating predictive studies is controlling for alternative explanations. One way to put controls in place is to gather data on other predictors and include those alternatives in a regression analysis or odds ratio. These additional variables can be compared as predictors or used as control variables and removed from the variance.
Matching is another strategy to improve the strength of a nonrandomized design. When two or more groups are compared, as in a descriptive group comparison, case-control, or cohort design, matching the groups on important variables at the outset will reduce the possibility that differences other than the grouping variable are affecting the outcome. For example, when comparing runners with and without stress fractures, matching individuals in the two groups on characteristics such as running distance, intensity, frequency, and terrain would be useful. The challenge in statistically controlling (e.g., through linear regression or odds ratios) or matching is identifying the relevant controlling variables ahead of time. Researchers will never be able to account for every factor and can miss important variables; however, knowledge of the research literature and experience with the condition are helpful in selecting the variables to include.

Generally speaking, prospective studies are stronger than retrospective analyses. When the researcher identifies the question ahead of time, there is less opportunity to capitalize on chance findings, and the researcher can put more controls in place. For example, with a prospective study the researcher can match groups when selecting participants; however, with archival data from existing records, the researcher must rely on the participants for whom existing records are available.

Sample size is another important consideration in evaluating descriptive and predictive studies. It is particularly important to include large samples in epidemiological research, because the intention of description and prediction is to represent naturally occurring phenomena in a population. Epidemiological research that is intended to describe incidence and prevalence or to identify risk factors of a population is more trustworthy if it uses large samples of hundreds or thousands of participants.
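The matching strategy described above can be sketched in a few lines of code. This is a hypothetical illustration; the runner identifiers, distances, and 5-km tolerance are invented for the example. Each case (a runner with a stress fracture) is paired with the closest unused control on one matching variable, weekly running distance:

```python
# Hypothetical sketch of 1:1 case-control matching on one variable
# (weekly running distance, in km). All data values are invented.

cases = [("R1", 60), ("R2", 35), ("R3", 80)]                  # runners with stress fractures
controls = [("C1", 33), ("C2", 58), ("C3", 90), ("C4", 79)]   # runners without

def match_controls(cases, controls, tolerance=5):
    """Greedy matching: pair each case with the nearest unused control,
    accepting the pair only if the difference is within the tolerance."""
    available = list(controls)
    pairs = []
    for case_id, case_km in cases:
        best = min(available, key=lambda c: abs(c[1] - case_km), default=None)
        if best is not None and abs(best[1] - case_km) <= tolerance:
            pairs.append((case_id, best[0]))
            available.remove(best)
    return pairs

print(match_controls(cases, controls))
# → [('R1', 'C2'), ('R2', 'C1'), ('R3', 'C4')]
```

In practice, researchers match on several variables at once (distance, intensity, frequency, terrain), which is harder precisely because, as the chapter notes, the relevant variables must be identified ahead of time.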
Findings from small samples are more difficult to replicate and less likely to be representative of the population. As with all research, sampling bias should be considered. The description of the sample should match the known characteristics of the population. For example, a sample of married older adults may present a bias and misrepresent the general population of older adults.

The measurement methods used in descriptive and predictive studies should also be evaluated. As mentioned previously, response rate is a critical consideration in survey research. Studies with low response rates are more likely to be biased and less representative of the responses that would have been obtained with a higher response rate. In addition, survey methods and self-reported measures are frequently used in descriptive and predictive studies. When participants must rely on their memory or describe undesirable behaviors, problems with the reliability of the data should be expected. Research that includes more objective measurement methods produces more reliable results. For example, accelerometers present a more objective measure of physical activity than does self-report of physical activity. Nevertheless, many constructs of interest to rehabilitation practitioners, such as fatigue and anxiety, rely heavily on self-report. In these instances, it is most useful when established measures with good validity and reliability estimates are used.

LEVELS OF EVIDENCE FOR PROGNOSTIC STUDIES

The levels-of-evidence hierarchy for efficacy studies should not be applied to prognostic studies, which observe existing phenomena. Variables are not manipulated; therefore, a randomized controlled trial is not appropriate for answering prognostic questions. For example, a researcher interested in predicting concussions among soccer players could not practically or ethically assign individuals to receive or not receive a concussion.
Instead, in a predictive study, the relationship between certain potential predictors (e.g., history of prior concussion, position on the team, gender, number of games played) and concussions can be studied (correlational design); people with and without concussions can be compared (case-control design) to identify potential predictors; or individuals can be followed over time to identify differences (predictors) between those who do and do not experience a concussion (retrospective and prospective cohort designs). The criteria discussed in this chapter—replication, prospective designs, and longitudinal approaches—are applied to the levels-of-evidence hierarchy for prognostic studies. Table 8-5 provides a hierarchy adapted from the Oxford Centre for Evidence-Based Medicine (2009). A systematic review that includes two or more prospective cohort studies provides the highest level of evidence for studies that focus on prediction. A systematic review combines the results from multiple studies, and replication of results strengthens the findings. The prospective cohort study provides Level II evidence. This design follows individuals over time and, with a prospective approach, is able to put desirable controls into place. The retrospective cohort study also uses a longitudinal approach, which is preferable when attempting to identify risks or make a prognosis. However, looking back at existing data makes it more difficult to control potentially confounding variables. The case-control design, at Level IV, represents a lower level of evidence because the cross-sectional nature of the study does not allow for the identification of temporal relationships; that is, the researcher does not know if the condition or risk factor occurred before the outcome. Finally, expert opinion and case studies are considered to be the lowest level of evidence.

TABLE 8-5 Levels of Evidence for Prognostic Studies
Level I: Systematic reviews of prospective cohort studies
Level II: Individual prospective cohort study
Level III: Retrospective cohort study
Level IV: Case-control design
Level V: Expert opinion, case study

EXERCISE 8-3 Evaluating the Strength of the Evidence (LO3)

QUESTION
Read the following research abstract and evaluate the study based on the criteria described in this chapter. What are your general conclusions about the strength of the evidence?

Buchman, A. S., Boyle, P. A., Yu, L., Shah, R. C., Wilson, R. S., & Bennett, D. A. (2012). Total daily physical activity and the risk of AD and cognitive decline in older adults. Neurology, 78(17), 1323–1329. [Epub 2012, April 18]. doi:10.1212/WNL.0b013e3182535d35

Objective: Studies examining the link between objective measures of total daily physical activity and incident Alzheimer disease (AD) are lacking. We tested the hypothesis that an objective measure of total daily physical activity predicts incident AD and cognitive decline.

Methods: Total daily exercise and nonexercise physical activity was measured continuously for up to 10 days with actigraphy (Actical®; Philips Healthcare, Bend, OR) from 716 older individuals without dementia participating in the Rush Memory and Aging Project, a prospective, observational cohort study. All participants underwent structured annual clinical examination including a battery of 19 cognitive tests.

Results: During an average follow-up of about 4 years, 71 subjects developed clinical AD. In a Cox proportional hazards model adjusting for age, sex, and education, total daily physical activity was associated with incident AD (hazard ratio = 0.477; 95% confidence interval 0.273-0.832). The association remained after adjusting for self-report physical, social, and cognitive activities, as well as current level of motor function, depressive symptoms, chronic health conditions, and APOE allele status. In a linear mixed-effect model, the level of total daily physical activity was associated with the rate of global cognitive decline (estimate 0.033, SE 0.012, p = 0.007).

Conclusions: A higher level of total daily physical activity is associated with a reduced risk of AD.

CRITICAL THINKING QUESTIONS
1. What are differences and similarities between descriptive and predictive studies?
2. What is the primary distinction between incidence and prevalence?
3. Would the following factors increase or decrease prevalence rates for a particular condition? (a) an increase in survival rates; (b) an increase in cure rates; (c) an increase in mortality rates.
4. A comparison between two or more groups with and without a condition can answer both descriptive and predictive questions. What is the difference in how the two types of group comparison studies are analyzed?
5. What strategies may be useful for increasing the response rate of a survey?
6. How does a low response rate affect both the internal and external validity of a study?
7. What are the primary characteristics of strong descriptive and predictive studies?
8. What are the differences between correlational designs and group comparison designs for answering questions about prognosis?
9. Cohort and case-control studies may also be used to study the efficacy of an intervention. What types of groups would be compared in these nonexperimental designs?
10. What are the levels of evidence for predictive studies?

ANSWERS

EXERCISE 8-1
1. Correlational study that uses logistic regression, because the outcome is categorical in nature: employed versus unemployed.
Odds ratios can then reveal which cognitive impairments are most predictive of unemployment.
2. Prevalence, because the study concerns how many individuals have a specific condition, rather than the number of new cases, which would be incidence.
3. Case-control design, because the question is written in a way that suggests a comparison of individuals with TBI, some with depression and some without, and then examination to determine if there is a difference in homelessness rates between those two groups. If you were to follow people over time (e.g., those who are depressed and those who are not) and identify who becomes homeless, it would be a cohort study.

EXERCISE 8-2
1. D. frequencies/percentages
2. C. multiple linear regression
3. A. odds ratio
4. B. risk ratio

EXERCISE 8-3
This cohort study illustrates many of the desirable qualities of a nonexperimental study: Many potential confounds or predictors were taken into account when assessing the impact of physical activity, such as involvement in activities, motor functioning, and depression. The study was prospective, had a relatively large sample of more than 700 participants, and utilized many objective measures.

FROM THE EVIDENCE 8-1
Although the reported prevalence is not dramatically different, use of the objective measure of perometry seems preferable, if possible, because some individuals who could benefit from treatment may be missed with self-report. If self-report is the only option, the data from this study suggest that the more complete Lymphedema and Breast Cancer Questionnaire provides estimates closer to perometry.

FROM THE EVIDENCE 8-2
This study indicates that individuals with stroke take considerably longer to shop than individuals without stroke; however, little is known about why or what takes them longer to shop. That is, because stroke is a complex condition with different impairments, the cause of the difference is unknown.
Cognitive concerns, motor problems, or another factor could be interfering with performance. Still, knowing that shopping is challenging is useful information for therapists. The therapist would have to use other sources to determine the specific limitations that make it challenging for an individual with stroke to shop.

FROM THE EVIDENCE 8-3
Cognition was not related to speech recognition in this study of elderly adults, but age was negatively correlated with the speech outcomes. Not surprisingly, these results indicate that the older you get, the more problems you have with speech recognition. However, it is important to note that the R2, or variance accounted for by age, remains relatively small, at 0.11-0.17, suggesting that factors/predictors not included in this study account for most of the variance.

FROM THE EVIDENCE 8-4
By far the strongest predictor in determining outcomes for back pain is adherence to the adaptive physical activity program, with an odds ratio of 13.88. Depression and self-reported health are also identified as significant predictors, with depression and poor health associated with less recovery. However, the magnitude of these predictors, at 0.47 and 0.20 respectively, is much less than that of adherence. None of the other predictors are statistically significant, as all have confidence intervals that include 1.0.

REFERENCES
Brennan, S. L., Toomey, L., Kotowicz, M. A., Henry, M. J., Griffiths, H., & Pasco, J. A. (2014). Rheumatoid arthritis and incident fracture in women: A case-control study. BMC Musculoskeletal Disorders, 15, 13.
Buchman, A. S., Boyle, P. A., Yu, L., Shah, R. C., Wilson, R. S., & Bennett, D. A. (2012). Total daily physical activity and the risk of AD and cognitive decline in older adults. Neurology, 78(17), 1323–1329. [Epub 2012, April 18]. doi:10.1212/WNL.0b013e3182535d35
Bulley, C., Gaal, S., Coutts, F., Blyth, C., Jack, W., Chetty, U., Barber, M., & Tan, C. W. (2013).
Comparison of breast cancer-related lymphedema (upper limb swelling) prevalence estimated using objective and subjective criteria and relationship with quality of life. Biomedical Research International, 2013, 807569. [Epub 2013 June 18]. doi:10.1155/2013/807569
Centers for Disease Control and Prevention (CDC). (n.d.). National Health and Nutrition Examination surveys. Retrieved from http://www.cdc.gov/nchs/nhanes.htm
Centers for Disease Control and Prevention (CDC). (2012). Prevalence of stroke in the US 2006-2010. Morbidity and Mortality Weekly Report, 61(20), 379–382.
Centers for Medicare & Medicaid Services (CMS). (n.d.). Research, statistics, data & systems [home page]. Retrieved from http://www.cms.gov/Research-Statistics-Data-and-Systems/Research-Statistics-Data-and-Systems.html
Chau, N., Chau, K., Mayet, A., Baumann, M., Legleve, S., & Falissard, B. (2013). Self-reporting and measurement of body mass index in adolescents: Refusals and validity, and the possible role of socioeconomic and health-related factors. BMC Public Health, 13, 815.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Doll, R., & Hill, A. B. (1950). Smoking and carcinoma of the lung. British Medical Journal, 2, 740–748.
Fincham, J. E. (2008). Response rates and responsiveness for surveys, standards and the Journal. American Journal of Pharmaceutical Education, 72, 42–46.
Gu, J. K., Charles, L. E., Bang, K. M., Ma, C. C., Andrew, M. E., Biolanti, J. M., & Burchfiel, C. M. (2014). Prevalence of obesity by occupation among US workers: The National Health Interview Survey 2004-2011. Journal of Occupational and Environmental Medicine, 56, 516–528.
Hicks, G. E., Benvenuti, F., Fiaschi, V., Lombardi, B., Segenni, L., Stuart, M., . . . Macchi, C. (2012). Adherence to a community-based exercise program is a strong predictor of improved back pain status in older adults: An observational study. Clinical Journal of Pain, 28, 195–203.
Huang, C. T., Lin, W. C., Ho, C. H., Tung, L.
C., Chu, C. C., Chou, W., & Wang, C. H. (2014). Incidence of severe dysphagia after brain surgery in pediatric traumatic brain injury: A nationwide population-based retrospective study. Journal of Head Trauma Rehabilitation, 28, 1–6.
Krull, V., Humes, L. E., & Kidd, G. R. (2013). Reconstructing wholes from parts: Effects of modality, age and hearing loss on word recognition. Ear and Hearing, 32, e14–e23.
Lalmohamed, A., Vestergaard, P., Cooper, C., deBoer, A., Leufkens, H. G., van Staa, T. P., & de Vries, F. (2012). Timing of stroke in patients undergoing total hip replacement and matched controls: A nationwide cohort study. Stroke, 43, 3225–3229.
Lingam, R., Hunt, L., Golding, J., Jongmans, M., & Emond, A. (2009). Prevalence of developmental coordination disorder using the DSM-IV at 7 years of age: A UK population-based study. Pediatrics, 123, 693–700.
Martin, G. E., Losh, M., Estigarribia, B., Sideris, J., & Roberts, J. (2013). Longitudinal profiles of expressive vocabulary, syntax and pragmatic language in boys with fragile X syndrome or Down syndrome. International Journal of Language and Communication Disorders, 48, 432–443.
McKinnon, J. (1991). Occupational performance of activities of daily living among elderly Canadians in the community. Canadian Journal of Occupational Therapy, 58, 60–66.
Meiklejohn, J., Connor, J., & Kypri, K. (2012). The effect of low survey response rates on estimates of alcohol consumption in a general population survey. PLoS One, 7, e35527.
Oxford Centre for Evidence-Based Medicine. (2009). Levels of evidence. Retrieved from http://www.cebm.net/index.aspx?o=1025
Rand, D., Katz, N., & Weiss, P. L. (2007). Evaluation of virtual shopping in the VMall: Comparison of post-stroke participants to healthy control groups. Disability Rehabilitation, 29, 1719–1720.
Schneider, E. B., Sur, S., Raymont, V., Duckworth, J., Kowalski, R.
G., Efron, D. T., . . . Stevens, R. D. (2014). Functional recovery after moderate/severe traumatic brain injury: A role for cognitive reserve. Neurology, 82(18), 1636–1642. [Epub 2014 April 23]. doi:10.1212/WNL.0000000000000379
Strand, S., Lechuga, D., Zachariah, T., & Beaulieu, K. (2013). Relative risk for concussion in young female soccer players. Applied Neuropsychology—Child, 4(1), 58–64. [Epub 2013 December 2]. doi:10.1080/21622965.2013.802650
Wynder, E. L., & Graham, E. A. (1950). Tobacco smoking as a possible etiological factor in bronchogenic carcinoma: A study of six hundred and eighty-four proved cases. JAMA, 143, 329–336.
Xiao, Q., Arem, H., Moore, S. C., Hollenbeck, A. R., & Matthews, C. E. (2013). A large prospective investigation of sleep duration, weight change, and obesity in the NIH-AARP Diet and Health Study cohort. American Journal of Epidemiology, 178, 1600–1610.

CHAPTER 9: Qualitative Designs and Methods: Exploring the Lived Experience

"No question is so difficult to answer as that to which the answer is obvious." —George Bernard Shaw (1856-1950), an Irish playwright who explored social problems through his art, especially issues of social class.

CHAPTER OUTLINE
INTRODUCTION
THE PHILOSOPHY AND PROCESS OF QUALITATIVE RESEARCH
  Philosophy
  Research Questions
  Selection of Participants and Settings
  Methods of Data Collection
  Data Analysis
QUALITATIVE RESEARCH DESIGNS
  Phenomenology
  Grounded Theory
  Ethnography
  Narrative
  Mixed-Methods Research
PROPERTIES OF STRONG QUALITATIVE STUDIES
  Credibility
  Transferability
  Dependability
  Confirmability
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Describe the philosophy and primary characteristics of qualitative research.
2. Distinguish the types of qualitative designs and their purposes.
3. Determine the characteristics of trustworthiness in a given qualitative study.
KEY TERMS
artifacts, audit trail, axial coding, bracketing, code-recode procedure, confirmability, connecting data, constant comparative method, constructivism, credibility, dependability, embedding data, ethnography, field notes, focus group, grounded theory, inductive reasoning, informant, life history, member checking, merging data, mixed-methods research, narrative research, naturalistic inquiry, naturalistic observation, open coding, open-ended interview, participant observation, phenomenology, prolonged engagement, purposive sampling, qualitative research, reflexive journal, reflexivity, saturation, selective coding, snowball sampling, themes, thick description, transferability, triangulation, trustworthiness

INTRODUCTION

Although well-designed, randomized controlled trials are highly valued in evidence-based practice, the research framework and scientific methods employed in these studies are not the only way of thinking about or conducting research. Quantitative research comes from a positivist worldview that is based on deductive reasoning. From this perspective, by reducing phenomena to their smallest parts and objectively measuring those parts, we can better understand the world. Nevertheless, there are limitations to this type of research design. In a randomized controlled trial, the average or mean of a group is used for the purposes of comparison. However, practitioners do not provide services to an average; rather, they work with individuals whose stories and circumstances may be similar to or very different from those of the individuals who participated in the study. Consider the "average" American family: In the 2010 census, there were 2.58 members per household (U.S. Census Bureau, 2012), a characterization that does not accurately describe any actual household and could be very different from your own.
In contrast, qualitative research acknowledges that there are multiple realities among individuals, and that the human experience is complex and diverse. Qualitative research uses a naturalistic tradition with an emphasis on understanding phenomena in the real world. This chapter provides information that evidence-based practitioners can use to better understand qualitative research and better understand their clients. The chapter describes the philosophy and processes of qualitative research and outlines common qualitative research designs. In qualitative research, the criteria for determining the strength of the evidence are based on the concept of trustworthiness. After completing this chapter, you will understand the concept of trustworthiness in this context and be able to evaluate qualitative studies.

THE PHILOSOPHY AND PROCESS OF QUALITATIVE RESEARCH

Like quantitative research, qualitative research studies are framed around one or more research questions, and the resulting reports, articles, or papers include the same major sections of introduction, methods, results, and discussion. Narrative research articles can be an exception to this rule, as they usually follow a format that resembles a story. In qualitative research, the steps of writing and answering the research question, designing the study, and collecting and analyzing the data differ substantially from those steps in quantitative research. A basic understanding of the qualitative research process is also useful for reading, interpreting, and evaluating qualitative research.

Philosophy

The philosophical underpinnings of qualitative research come from naturalistic inquiry, which suggests that a phenomenon is only understood in context and that multiple perspectives can and do exist and differ among individuals. Imagine your own experience as a student.
If someone interviewed one of your classmates about his or her motivations regarding a career choice and reflections on being a student, you would not expect the answers to mirror your own. Similarly, one student might think a particular test was difficult, whereas another student found it easy. Although the conclusions about the test are very different, they can both still be true. Qualitative research uses inductive reasoning, in which data are collected and, based on those data, an understanding is reached. Health-care professionals frequently use inductive reasoning. Therapists interview clients and administer assessments and, based on this information, begin to understand their clients and develop intervention plans. In another example, therapists use inductive reasoning when they evaluate the progress a client makes to determine if an intervention is working. Qualitative researchers use data to drive the research process. In contrast, quantitative research begins with a hypothesis to be tested (a form of deductive reasoning).

Another important belief in qualitative research is related to constructivism, which is the philosophy that our understanding of the world is "constructed." From this perspective, an objective reality either does not exist or cannot be known. Instead, reality is filtered by our experiences and our past knowledge, so it is always subjective and interpreted. The constructions that exist to explain our world are inseparable from the people who give those constructions meaning. Therefore, from a qualitative perspective, both study participants and researchers are influenced by their worldview and cannot be totally unbiased. Thus, the philosophy that forms the foundation of qualitative research also influences its processes and practices.
Research Questions

Qualitative research is often used to explore aspects of practice that are not well understood and for which theory has not been developed. For this reason, qualitative research questions are very different from quantitative questions. Qualitative research is aimed at discovering new information, so hypotheses are avoided; it is essential that there be no preconceived notions about the phenomenon to be researched. Instead of specific, quantifiable questions, qualitative research questions are broad and general. In addition, qualitative questions are open to revision and often change during the course of a study, exemplifying the discovery perspective that is central to qualitative research. Qualitative questions steer clear of terminology such as cause or relate, because these words suggest an expected outcome. Instead, they are more likely to use words such as discover, inquire, describe, and explore. Also, consistent with an exploratory perspective, qualitative questions often begin with "what" or "how," rather than "why." For example, in a study of driving perceptions of veterans with traumatic brain injury (TBI) and posttraumatic stress disorder (PTSD), Hannold et al (2013) asked the following qualitative questions:

1. How do Veterans describe their current driving habits, behaviors, and experiences?
2. What do Veterans identify as influences on their driving habits and behaviors?
3. How insightful are Veterans regarding their driving behavior?
4. What, if any, driving strategies do Veterans report that are related to Battlemind driving, mild TBI, or PTSD issues (p. 1316)?

Selection of Participants and Settings

Each individual's lived experience is unique and highly influenced by the real-world environments in which he or she lives. For this reason, qualitative research takes a naturalistic approach to the selection of participants and settings for study.
The description of sample and setting selection is included in the methods section of a qualitative research article. Instead of random sampling or convenience sampling, qualitative research uses purposive sampling, in which the participants included in the study and the settings in which the research takes place are selected for a purpose or a specific reason. The selection of participants will vary depending on the specific purpose of the research. In some cases, the research question may necessitate including participants who represent what is typical. In other instances, the researcher may decide to include individuals with extreme experiences, or may select participants who represent a variety of experiences. It depends on the purpose of the study. Generally speaking, qualitative studies are not designed for generalizability and involve an in-depth and intense data collection process; consequently, small numbers of participants are the norm. The veterans' driving study described earlier included only five individuals, who were selected based on specific criteria related to combat experience and TBI/PTSD diagnosis (Hannold et al, 2013). Sometimes qualitative researchers use a method known as snowball sampling, in which the initial participants are asked to recruit additional participants from their own social networks. This method is useful because carefully selected initial participants may be able to reach and recruit individuals whom the researcher does not have a way to identify and/or contact.

If naturalistic inquiry is the philosophy underlying qualitative research, naturalistic observation is one of its methods. Instead of manipulating variables or environments, as is done in quantitative research, qualitative research takes place in a naturalistic setting, or real-world environment, and events are observed as they naturally occur.
For example, if a study is examining work, at least some of the research would be expected to take place in the workplace. Spending extended time in the natural environment allows the researcher to gain access to the daily life of the participant and experience reality as that individual experiences it. The description in From the Evidence 9-1 comes from the methods section of a study (Arntzen & Elstad, 2013) and describes a naturalistic setting. In this study, which examined the bodily experience of apraxia, participants were observed for extended periods of time during typical therapy sessions and at home. Videotape was used to capture the data from these sessions.

FROM THE EVIDENCE 9-1 Example of a Naturalistic Setting

Arntzen, C., & Elstad, I. (2013). The bodily experience of apraxia in everyday activities: A phenomenological study. Disability and Rehabilitation, 35, 63–72.

Observation and video recording: Apractic difficulties, as experiences of bodily movement, are silent, fluctuating, and frequently puzzling. The use of video observations of rehabilitation sessions provided access to the phenomena and context for the dialogue with the participants about their apraxia experiences. In phenomenological studies of body movements, video observations do not aim to establish an objective basis for validating subjective material, but a close and detailed description of ongoing processes, subjected to a triangulation that integrates interview and observational data [25,26]. The observational data were particularly valuable for capturing the person's immediate response to the disturbed actions. The ADL-activities observed in the rehabilitation units were ordinary morning routines, breakfast preparation, making coffee or meals. We did not influence the choice of activity, or when and where it should be carried out; it was as close to ordinary practice as possible.
The activities were performed in the patients' rooms and bathrooms, in the therapy kitchen, or other therapy rooms. Four participants were followed up in their own homes and one in a nursing home. If possible, the same activities were video filmed at the hospital and at home. Some participants offered to demonstrate other activities, such as needlework and dinner preparations. These were also videotaped. The video camera was handheld for greater flexibility of position, adjustment of angle, and zooming to capture details in the interaction. It was considered important to capture the therapist and the tools in use as well as the patient. The video sequences varied from fifteen minutes to one hour, in total twelve hours of edited video tape.

Note A: Observations took place during typical practice and in some cases at home.

FTE 9-1 Question 1: From the standpoint of a qualitative research philosophy, is it a problem that different participants were observed in different settings? Explain your answer.

Methods of Data Collection

Qualitative data collection uses methods that capture the unique experience of the participant. These methods are described in the methods section of qualitative research articles. Rather than depending on questionnaires and surveys, a fundamental method of data collection in qualitative research is the open-ended interview, in which there are no set questions, so the process directs the questioning. Interviews are typically done face-to-face, so the interviewer is free to probe, ask follow-up questions, and take the lead of the interviewee to pursue new and possibly unexpected areas of inquiry.

Another method, focus groups, allows multiple individuals to be interviewed at once. In focus groups, an interview is conducted with a group of individuals to target a particular topic.
Although it may be more difficult to get in-depth information from each individual, the dynamic nature of a focus group provides participants with the opportunity to bounce ideas and thoughts off one another. The group members may confirm an individual's experience or provide a different point of view.

The collection of artifacts is another method used in qualitative research. Artifacts are objects that provide information about the subject of interest. For example, in a study exploring factors that facilitate return to work for individuals with spinal cord injury, Wilbanks and Ivankova (2014/2015) asked to see assistive technology that was used in the workplace. Actual photographs of these artifacts are included in the study report.

As noted earlier, naturalistic observation is a common component of qualitative data collection. This type of observation can take different forms. Qualitative studies should specifically describe the observational process so that the reader clearly understands how data collection occurred. In some studies the researcher is removed from the experience and observes from afar, typically recording observations in the form of field notes, the most unobtrusive approach to recording observations. In this instance, the focus is on watching and listening. Field notes often describe both what is seen and the researcher's impressions. For example, a researcher who is interested in the ways in which peers provide support to one another in a peer-operated drop-in center for individuals with serious mental illness might spend time at the drop-in center recording observations; however, the researcher does not personally engage in the activities at the drop-in center. Photographs, videotape, and audiotape may also be used to capture the experience, as these methods have the potential to provide greater detail and depth of data, and they may capture information that the researcher missed.
When a study uses multiple methods of data collection, it is more likely to capture the complexity of an experience.

A more immersive form of data collection is participant observation. With participant observation, the researcher engages with the participants in their naturally occurring activities in order to gain a more in-depth appreciation of the situation. For example, Mynard, Howie, and Collister (2009) described the process of participant observation in a study examining the benefits of participating in an Australian Rules Football team for individuals with disadvantages such as mental illness, addiction, unemployment, and homelessness:

The first author joined the team for an entire season, attending 11 training sessions, 10 of 11 games and the Presentation Dinner. As a woman, she did not train or play, but assisted as required with babysitting, preparing the barbeque and first aid. During games her role was to fill water bottles and run drinks to players on the field. This enabled her to experience games from both on-field and off-field perspectives (pp. 268–269).

Qualitative researchers may also collect data by involving participants in creative means of expression, such as photography, poetry, and music. For example, Tomar and Stoffel (2014) used photovoice to explore the lived experience of returning veterans who had gone back to college. Photovoice is a specific methodology in which participants take photographs and write accompanying narratives to answer particular questions. Figure 9-1 displays an example of a photovoice created by an individual with mental illness.

Data Analysis

Large amounts of data are generated from all of the methods described in this chapter. A lengthy analysis process involves identifying patterns within the data that can be categorized for easy retrieval. These key passages in text, photographs, or artifacts are coded; in other words, excerpts of the data are given a name.
For example, in a study of physical activity, the category of "barriers" may be used to identify data that related to obstacles to exercise. The patterns within the codes are then further analyzed to identify the underlying meaning among the categories. These patterns are labeled or described in terms of themes. The themes are what the reader of qualitative research sees. The results section of a qualitative study reports the themes, describes what the themes mean, and illustrates the themes through actual quotations from the study participants. For example, the exercise barriers may be further analyzed with a theme such as "exercise is difficult when life feels out of control." The results section would go on to explain the theme and include quotes that support or illustrate the theme. From the Evidence 9-2 provides an example of one theme identified in the results section of a qualitative study.

The process of analyzing qualitative data is often accomplished through specialized software. Different computer programs are available that code and identify patterns among the data captured in transcripts of interviews and field notes. The software also sets up the data in a format that is easy to use, by numbering each line and providing a system for annotating the data.

FTE 9-1 Question 2: Why is video a particularly useful method of data collection in qualitative research?

Photovoice narrative accompanying Figure 9-1: "When I became homeless, I went to C.A.S.S. I told them I have issues with behavioral and medical health. They suggested I go to a different shelter. I was sad, mad, and confused. So I called four different shelters, and they all turned me away. I believe that this is the norm here in Phoenix. There needs to be more outreach to our community so that this doesn't happen to 'one more person.'"
FIGURE 9-1 Example of a photovoice. This photovoice describes the lived experience of an individual in recovery from mental illness.

EXERCISE 9-1 Distinguishing Characteristics of Qualitative and Quantitative Data (LO1)

QUESTIONS
Using the following letter choices, identify each characteristic as most likely:
A. Qualitative
B. Quantitative
C. Both qualitative and quantitative

1. A positivistic philosophy
2. In-depth open-ended interviews
3. Articles organized around introduction, methods, results, and discussion
4. Hypothesis-driven
5. May use computers for data analysis
6. Sampling by identifying individuals who will serve a particular purpose
7. Results reported as themes
8. Often includes manipulation of the environment
9. Utilizes inductive reasoning

QUALITATIVE RESEARCH DESIGNS

Qualitative research is a broad term that encompasses many different research designs. This section describes the qualitative designs that are most commonly employed in health-care research, including phenomenology, grounded theory, ethnography, and narrative. Table 9-1 summarizes the characteristics of the different qualitative designs. In addition, the mixed-methods approach, which combines qualitative and quantitative methods, is described.

TABLE 9-1 Qualitative Designs*

Phenomenology. Purpose: Describe the lived experience. Methods: Interviews, focus groups, observation, bracketing. Results: Description of the phenomenon.
Grounded theory. Purpose: Develop a theory that is derived from the data. Methods: Interviews, focus groups, observation, constant comparative methods. Results: Theory to explain the data.
Ethnography. Purpose: Describe a group of people or a culture. Methods: Immersion in the field, participation in the culture, examination of artifacts. Results: Description of a culture and/or theory about that culture.
Narrative. Purpose: Tell a story. Methods: In-depth interview, collection of artifacts. Results: Construction of a coherent story.

* Note that these methods are not mutually exclusive; there is often overlap of the designs.

Phenomenology

The purpose of phenomenology is to understand and describe the lived experience from the point of view of the research participant. As an approach, phenomenology is particularly useful for situations that are poorly defined or potentially misunderstood, such as the process of coming to terms with a particular diagnosis or the client's perspective of the therapy experience. The emphasis is on description rather than explanation.

FROM THE EVIDENCE 9-2 Example of Themes, Description, and Quotations

Leroy, K., Boyd, K., De Asis, K., Lee, R. W., Martin, R., Teachman, G., & Gibson, B. E. (2014). Balancing hope and realism in family-centered care: Physical therapists' dilemmas in negotiating walking goals with parents of children with cerebral palsy. Physical and Occupational Therapy in Pediatrics, epub.

In this study, which examined the negotiation of goals between physical therapists and parents of children with cerebral palsy, the following theme and supporting information were provided:

Note A: The theme.

Balancing Hope and Realism

A consistent theme evident across participants' accounts was the notion that physical therapists actively worked to balance their beliefs about the value of walking and their expert knowledge about children's walking prognoses with families' goals and hopes. Finding a balance meant considering families' goals and weighing these against their own beliefs and professional knowledge in order to negotiate a treatment plan that participants felt was in the best interests of the children. Participants underscored their beliefs regarding how walking facilitated accessibility in the environment and enabled social participation in daily activities. Generally, all participants were consistent in believing that walking held, as Olivia stated, a "very high value" in society. They agreed that walking has a number of benefits including making it easier to function in society and various physiological benefits.

" … saying that everybody has to be able to walk and we have to find some way for everyone to be able to walk, I don't agree with that. I don't think that everyone has to be able to. We know that there are benefits though, to walking in terms of weight bearing and … another way to explore your environment." –Emily

Participants defined walking primarily by its functionality and they were less concerned about the appearance of gait or whether gait aids or assistance were required. They expressed that the benefits to walking could be achieved through different techniques including walking with a gait aid or push toy. When setting goals with the family, these beliefs served as a backdrop to their clinical knowledge and experience, as they considered whether walking was realistically achievable for each child. Physical therapists used available predictors of prognosis, such as the Gross Motor Function Classification System (GMFCS) (Palisano et al., 1997), and their clinical experience to predict outcomes. Participants were confident that the GMFCS was a good predictor of walking ability, but not perfect as they had witnessed some children surpassing their initial prognosis. Hence, they were hesitant to definitively predict a child's walking outcomes or convey these to parents.

"The GMFCS classification system is great, you have a pretty good idea of whether the child is going to walk … but I'd never say never, cause there are kids that you really thought would never walk and they do walk." –Riley

Note B: Descriptions of the theme.
Note C: Quotes to illustrate themes.

FTE 9-2 Question: The quotes used in qualitative research could be equated with what type of data often reported in quantitative research?
In phenomenology, the insider's perspective is of primary importance. Therefore, researchers must not impose their own beliefs on the situation. Because it is impossible to remain totally unbiased, in phenomenological research the researcher's assumptions are identified and "bracketed." When bracketing, the researcher uses methods such as keeping a diary or concept mapping to identify preconceived ideas about a phenomenon. Then the researcher uses strategies to keep these biases in abeyance while collecting and interpreting qualitative data. For example, a researcher who is also a therapist may have personal experiences with providing the specific therapy experience that is being studied. The therapist would identify pre-existing assumptions about the therapy, such as "This therapy is well tolerated," or "Most clients respond well to this therapy." The therapist would acknowledge the positive bias and remain open to alternative views from the research participants.

In phenomenology, typically one person or a small, selected group of participants is included in the study. The primary methods of data collection include in-depth interviews, discussion, and observation. The transcripts and observations are then analyzed for themes. The reporting of the results includes the identified themes supported by individual statements or observations. For example, Gramstad, Storli, and Hamran (2014) used a phenomenological design to examine the experience of older adults during the process of receiving an assistive technology device. From the Evidence 9-3 provides an illustration of one theme identified in the study: "Taking charge or putting up."

Grounded Theory

The purpose of the grounded theory qualitative research design is to develop new theory from the data collected. This type of research was developed partially as a reaction to the positivist perspective of developing theory first and then collecting data to support or refute that theory.
Grounded theory takes the opposite approach, beginning without a hypothesis or assumption. Instead, data are collected concerning a general question or topic. The theory comes out of the data or, in other words, is "grounded" in the data. Grounded theory research uses the constant comparative method. Instead of waiting until all of the data have been collected to do an analysis, some data are collected, and an analysis is performed to determine how more data should be collected. For example, should additional people be interviewed? What questions should be asked next? Multiple iterations of data collection and analysis are conducted until saturation occurs. Saturation means that no new ideas or information are emerging from the data.

Data analysis in grounded theory research is a multistep process that starts with open coding, or identifying simple categories within the data. Next, the categories are brought together in a process called axial coding, which identifies relationships between categories. Finally, selective coding involves the articulation of a theory based on the categories and their relationships.

In a study of the sensory experiences of children with and without autism and their effect on family occupations, Bagby, Dickie, and Baranek (2012) interviewed parents of 12 children, 6 with autism and 6 without autism. In the study, researchers identified the following simple categories through open coding: (1) shared meaning within the family, (2) preparation for what the family was going to do, (3) the child's sensory experiences, and (4) family occupations. Axial coding indicated that family occupations were affected by the shared meaning, preparation for the experience, and the child's sensory experiences. This led to a theory that explained how family occupations are impacted differently by children with and without autism.

In another study using grounded theory, the experience of a rotator cuff injury was studied (Minns Lowe, Moser, & Barker, 2014). The figure in From the Evidence 9-4 illustrates the theory derived from the research and demonstrates that the pain associated with this injury has a wide-ranging impact on all aspects of daily life.

FROM THE EVIDENCE 9-3 Example of a Theme

Gramstad, A., Storli, S. L., & Hamran, T. (2014). Older individuals' experience during assistive device service delivery process. Scandinavian Journal of Occupational Therapy, 21, 305–312.

Knowing who to call or where to go did not guarantee that the participant contacted someone to correct difficulties. Putting up with a difficult situation included hesitating to contact the occupational therapist when the ATD did not work the way the participant expected it would. For some participants, contacting the occupational therapist because they needed additional help was considered to mean that they would be perceived as rude, ungrateful, and subject to negative consequences. These concerns could make the participants delay or entirely omit contacting the professional to correct the situation. One of the participants explained her reluctance to contact the occupational therapist by saying, "I am not one of those to complain. I never was. Complaining is just not my style." [H]. To explain why she delayed contacting the occupational therapist with her ATD that did not work to her satisfaction another woman said, "I am not like that…I am not pushy and aggressive. I can't…nag to get what I want. That is the worst thing I know." [E]

Note A: Quotes are used to illustrate the theme of taking charge or putting up.

FTE 9-3 Question: What important information from the perspective of individuals receiving assistive devices does this study provide to a therapist?
Ethnography

Perhaps the oldest of the qualitative research designs, ethnography has its roots in anthropology. The purpose of ethnography is to describe a group of people, their behaviors, and/or their culture. In ethnography, the insider's viewpoint is sought in the context of understanding the larger culture or social structure, with the "insider" being the individuals who are being studied. Ethnographic researchers become immersed within the culture being studied. The ethnographer observes and, once invited, participates in the routines, rituals, and activities of the group. This process is referred to as participant observation. Data are collected with field notes and recorded via audiotape and/or videotape. Participants are generally referred to as informants because they provide the researcher with an insider perspective.

FROM THE EVIDENCE 9-4 Example of Grounded Theory

Minns Lowe, C. J., Moser, J., & Barker, K. (2014). Living with a symptomatic rotator cuff tear "bad days, bad nights": A qualitative study. BMC Musculoskeletal Disorders, 15, 228. http://doi.org/10.1186/1471-2474-15-228.

[Figure 1. Diagrammatic summary of living with a rotator cuff tear. Concentric rings spread out from "intense, shocking, surprising pain" at the center: first the physical effects (limited movement, reduced strength, audible sounds, broken sleep, waking, night pain, daytime tiredness and irritability); then the impact on ADL, leisure, occupation, emotions, finance, and social support; and, outermost, coping strategies (getting on with it, acceptance, the other shoulder, analgesia, aids and adaptations).] This diagram shows how, like ripples spreading out from a stone thrown into a pool, pain from a symptomatic rotator cuff tear can impact on, and change, all areas of a participant's life.

FTE 9-4 Question: Using the examples from From the Evidence 9-3 and From the Evidence 9-4, explain how grounded theory differs from phenomenology in its purpose.
Ethnography typically involves spending extended time in the field until the researcher has collected enough data to adequately understand the culture or situation. Through participation and an extended presence, the researcher can learn the language, behaviors, activities, and beliefs of the people being studied. Unlike phenomenology, which describes, ethnography explains. After collecting data, the ethnographer spends time outside of the field to reflect on the experience and analyze the data. Explanations and in some cases theory arise from the data. Although theory developed from ethnography may be specific to the culture studied, some ethnographic theory is developed to explain the more general human experience. The researcher may return to the field to verify the interpretations or theory by asking informants to review and provide feedback on the analyses.

Using ethnographic methods, Hackman (2011) explored the meaning of rehabilitation for people with life-threatening central nervous system tumors, specifically examining their perspective on the purpose of rehabilitation. As a physical therapist, Hackman was already embedded within the culture, but asked participants to write narratives about their experience with rehabilitation and obtained additional data through interviews and observations. From the Evidence 9-5 presents a diagram from the study, which used the analogy of an umbrella to represent the themes. The primary theme involved equating rehabilitation to a return to normalcy, with the physical therapist providing supports to different dimensions of that return.

FROM THE EVIDENCE 9-5 Themes Represented With an Analogy

Hackman, D. (2011). "What's the point?" Exploring rehabilitation for people with CNS tumours using ethnography: Patient's perspectives. Physiotherapy Research International, 16, 201–217.

Note A: The overarching theme.

[Figure: an umbrella diagram. The canopy represents Quality of Life and the central idea that Rehabilitation = "getting back to normal," supported by panels labeled Hope, Confidence, Independence, and Profession, with contributing elements including talk, chat, touch, self-efficacy, role, trust in the professional, equipment, physical therapy, structure, holism, emotional support, emotion and function, and environment.]

Note B: Subthemes involved in "getting back to normal" and the central role of the physical therapist.

FTE 9-5 Question: Why might analogies be a useful technique for presenting themes in qualitative research?

Narrative

Narrative research can be characterized as storytelling. People are natural storytellers, and this method takes advantage of that propensity. Storytelling involves remembrances, retrospectives, and constructions that may focus on the recounting of an event or series of events, often in chronological order. Many narratives feature epiphanies, or turning points, in an individual's life. Narrative research often appears more literary than other forms of qualitative research, and it is not unusual for a narrative research article to contain a story line or plot. Another feature of narrative research is the complexity and depth of the data collection. Narratives typically provide many details about the individual and present the information like a short story. Although narrative research can be biographical or autobiographical, with a focus on the story of a single individual, some narrative research collects stories from several individuals who have had a common experience.

Similar to other forms of qualitative research, unstructured or semi-structured interviews are a major source of data in narrative research. The collection of artifacts, in the form of journals and letters, is also common.
Some extensive narratives take the form of a life history, in which an individual's life over an extended period of time is examined. With this method, the researcher works to uncover significant events and explore their meaning to the individual. From the Evidence 9-6 provides an example of a narrative that uses a life history approach to tell the story of an individual with a disability who becomes a Paralympian (Kavanagh, 2012).

FROM THE EVIDENCE 9-6 Example of a Narrative Study

Kavanagh, E. (2012). Affirmation through disability: One athlete's personal journey to the London Paralympic Games. Perspectives in Public Health, 132, 68–74.

Note A: Identification of life history approach.

AIMS: This article explores the personal narrative of a British Paralympic wheelchair tennis player who experienced a spinal cord injury (SCI) following a motorcycle accident in 2001 that left her paralyzed from the waist down. The study responds to the call by Swain and French, among others, for alternative accounts of disability that demonstrate how life following impairment need not be empty and meaningless, but can actually reflect a positive, if different, social identity.

METHODS: This study draws on life history data to investigate the journey of one athlete who has managed to achieve international sporting success following a life-changing accident. A pseudonym has not been used for this study as the athlete wanted to be named in the research account and for her story to be shared.

RESULTS: A chronological approach was adopted to map the pre- and post-accident recovery process. The account examines life before the trauma, the impact of the accident, the process of rehabilitation, and the journey to athletic accomplishment.

CONCLUSIONS: Negative views of disability can be challenged if disability is viewed in the context of positive life narratives.
The story of one Paralympian demonstrates how an "ordinary" person has made the most of an extraordinary situation and become a world-class athlete. This paper demonstrates that in contrast to typical discourse in disability studies, becoming disabled or living with a disability need not be a tragedy but may on the contrary enhance life and lead to positive affirmation.

Note B: Described chronologically, highlighting major life events.

FTE 9-6 Question: Describe the potential epiphany presented in the research abstract.

EVIDENCE IN THE REAL WORLD: Storytelling Can Be Useful for Clients

In his important book, The Wounded Storyteller: Body, Illness and Ethics, Frank (2013) discusses the importance of storytelling as a way for the individual to understand his or her own suffering. So, not only is storytelling important to the researcher or clinician, but it is also useful for the individual who tells the story. Frank describes three narrative genres: In the restitution narrative, the individual is ill but will be cured and return to a pre-illness state. In the chaos narrative, there is no linear story, but a sense that illness leads to lack of control and a life of suffering. In the quest narrative, the individual learns from the experience and develops a sense of purpose.

When individuals experience chronic illness or disability, the condition will not be resolved, so a restitution narrative is not possible. The quest narrative allows for a story that gives meaning as opposed to despair. Frank describes three subtypes of quest narratives. In the quest memoir, the story is characterized by acceptance, and the illness or condition is incorporated into the individual's daily life. In the quest manifesto, the individual uses the insights gained from the experience to make things better for others through social reform.
The quest automythology is a story of rebirth and reinvention of the self. As therapists, we can facilitate storytelling by asking open-ended questions and being good listeners. In addition, we can help individuals find meaning from their experiences and identify their own personal quests.

Mixed-Methods Research

In mixed-methods research, both quantitative and qualitative methods are used within a single study or a series of studies to increase the breadth and depth of understanding of a research problem. Given that the two methods arise from seemingly opposing philosophies, you may question how positivism and constructivism can be combined. Some mixed-methods researchers adopt a pragmatic philosophy that takes a "whatever works" stance and values both subjective and objective information (Morgan, 2007). With mixed-methods research, there is not simply a collection of qualitative and quantitative data, but a mixing of the data. Creswell and Plano Clark (2011) suggest that there are three ways in which data can be combined:

1. The first approach, merging data, involves reporting quantitative and qualitative data together. For example, the themes and quotes are supported by quantitative statistics.
2. With connecting data, one set of data is used to inform a second set of data, often chronologically. For example, qualitative data may be used to develop items for a quantitative measure, which is then examined for reliability and validity.
3. The third approach, embedding data, involves one dataset as the primary source of information and a second dataset that serves as a secondary supplement. For example, a quantitative efficacy study may be supplemented with qualitative data on the experience of the participants.

EXERCISE 9-2 Matching Purpose Statements With Qualitative Designs (LO2)

QUESTIONS
The following purpose statements derive from actual qualitative studies. Identify the design suggested by each of the following statements.

1.
This study used the telling of life stories to examine how engagement in creative occupations informed six older retired people’s occupational identities (Howie, Coulter, & Feldman, 2004).
2. This study aimed to describe and explain the responses of young people to their first episode of psychosis (Henderson & Cock, 2014).
3. This study answered the question, “What kinds of rituals, contextual circumstances, and personal health beliefs are operating in the use of music as self-care?” (Ruud, 2013).
4. The aim of this study was to explore parental learning experiences to gain a better understanding of the process parents use in learning to feed their preterm infant (Stevens, Gazza, & Pickler, 2014).

From the Evidence 9-7 provides an example of a mixed-methods study that examines a telerehabilitation dysphagia assessment.

FROM THE EVIDENCE 9-7 Example of a Mixed-Methods Study
Ward, E. C., Burns, C. L., Theodoros, D. G., & Russell, T. G. (2013). Evaluation of a clinical service model for dysphagia assessment via telerehabilitation. International Journal of Telemedicine and Applications, Article ID 918526. http://doi.org/10.1155/2013/918526
Emerging research supports the feasibility and viability of conducting clinical swallow examinations (CSE) for patients with dysphagia via telerehabilitation. However, minimal data have been reported to date regarding the implementation of such services within the clinical setting or the user perceptions of this type of clinical service. A mixed methods study design was employed to examine the outcomes of a weekly dysphagia assessment clinic conducted via telerehabilitation and examine issues relating to service delivery and user perceptions. Data were collected across a total of 100 patient assessments. Information relating to primary patient outcomes, session statistics, patient perceptions, and clinician perceptions was examined. Results revealed that session durations averaged 45 minutes, there was minimal technical difficulty experienced, and clinical decisions made regarding primary patient outcomes were comparable between the online and face-to-face clinicians. Patient satisfaction was high, and clinicians felt that they developed good rapport, found the system easy to use, and were satisfied with the service in over 90% of the assessments conducted. Key factors relating to screening patient suitability, having good general organization, and skilled staff were identified as facilitators for the service. This trial has highlighted important issues for consideration when planning or implementing a telerehabilitation service for dysphagia management.
Note A: Quantitative data describe the results from assessments and surveys.
Note B: Qualitative data from clinicians are used to identify factors that contributed to patient satisfaction and positive outcomes.
FTE 9-7 Question
Which approach to mixing the data is used in this study: merging, connecting, or embedding data? Explain why.

PROPERTIES OF STRONG QUALITATIVE STUDIES
Because qualitative and quantitative research are based in different philosophies and paradigms, different criteria are used to evaluate their assets. Whereas quantitative research is judged by its internal and external validity (see Chapter 4), qualitative research is more concerned with trustworthiness, that is, the accurate representation of a phenomenon. Because qualitative research is a reflection of the unique experiences of the participants and the meaning they attribute to those experiences, it is critical for qualitative research to provide an accurate representation. This section describes the major attributes associated with trustworthiness.
As identified in the classic book by Lincoln and Guba (1985), there are four characteristics of qualitative research that reflect its trustworthiness: credibility, transferability, dependability, and confirmability. Evidence-based practitioners should consider these four characteristics when reading a qualitative study and determining whether to trust in the findings. Table 9-2 summarizes information about these criteria for evaluating qualitative studies. Although each of the methods listed in the table is primarily associated with a single criterion, many methods support more than one aspect of trustworthiness.

TABLE 9-2 Criteria Used to Establish Trustworthiness of Qualitative Research
Criteria: Credibility
  Description: Accurate representation of the phenomenon from the perspective of the participants
  Quantitative counterpart: Internal validity
  Methods: Prolonged engagement, extensive open-ended interviews, triangulation, member checking
Criteria: Transferability
  Description: Application of information from a study to other situations
  Quantitative counterpart: External validity
  Methods: Thick descriptions
Criteria: Dependability
  Description: Consistency in the data across time, participants, and researchers
  Quantitative counterpart: Reliability
  Methods: Code-recode, independent coding by multiple researchers, collection of data over several time points
Criteria: Confirmability
  Description: Corroboration of the data
  Quantitative counterpart: Objectivity
  Methods: Reflexivity

Credibility
A qualitative study is considered to have credibility when it is authentic; that is, when it accurately reflects the reality of the research participants. One measure of credibility lies in the sample selection. In purposive sampling, the participants are selected for a purpose. Thus, one evaluation of credibility involves questioning whether the participants who were selected were consistent with the study’s purpose. In addition, credibility requires the researcher to use methods to ensure that participants respond honestly and openly.
One such method, referred to as prolonged engagement, involves the researcher spending enough time getting to know individuals that a sense of trust and familiarity is established. Recall that qualitative research relies on open-ended interviews that allow participants to direct the questioning. Credibility is enhanced when the interviews are extensive and conducted over multiple time periods. Triangulation is another strategy that can enhance the credibility of a study. With triangulation, multiple resources and methods are employed to verify and corroborate data; that is, use of several methods leads to the same results in each case. The inclusion of multiple participants and/or multiple observers is one way to accomplish triangulation. Triangulation can also be achieved by collecting data using different methods, such as interviews, focus groups, and participant observation. One of the most useful methods for producing credibility is member checking, in which participants are regularly and repeatedly queried to ensure that the researcher’s impressions are accurate. For example, during initial interviews, the researcher asks follow-up questions and/or repeats statements back to the participant. During data analysis, the researcher shares the themes with the participant to ensure that the interpretations of the data are accurate. Qualitative research is a recursive process that involves back-and-forth dialogue between the researcher’s interpretation of the data and the participant’s input to promote authenticity. Through member checking, research participants can expand upon the data and/or correct errors.

Transferability
With qualitative research there is less emphasis on generalizability, particularly in the statistical manner of quantitative research.
Its emphasis on the uniqueness of each participant and context has led some to take the position that the information in qualitative research is most important in its own right (Myers, 2000). Others take a less strict position and suggest that, when research situations are similar, information acquired from qualitative research may illuminate related situations. In the case of qualitative research, the term transferability is more often used. Transferability is the extent to which the information from a qualitative study may be extended, or applied, to other situations. Regarding transferability, the burden lies primarily on the practitioner who is interested in applying the results. The practitioner must examine the research to determine how similar the research conditions are to the practitioner’s situation. The researcher may facilitate this process by providing a thick description, meaning that enough details are provided about the people, situations, and settings that readers can determine the transferability to their own situations. For example, Bertilsson, Von Koch, Tham, and Johansson (2015) studied the experience of the spouses of stroke survivors. There is a fair amount of detail about the individuals. The study took place in Sweden over a one-year time period, both during and after client-centered activities of daily living (ADL) rehabilitation was provided. The researchers studied six women and one man; three lived in a city, and four lived in small villages. However, there is little detail about the time frame during which the individuals received rehabilitation services. Therapists interested in applying the results could expect the study findings to be more transferable if the spouses they were interested in supporting were also primarily female and if the intervention applied client-centered principles.
It would be more difficult to know how the Swedish experience transfers to the country in which the therapists are providing services.

Dependability
Dependability is the extent to which qualitative data are consistent. In qualitative research, it is recognized that perspectives change over time and different individuals have different perspectives. Although these changes and differences are described and acknowledged, efforts can be made to show that findings are supported and steady across individuals and time. One way in which consistency can be examined is in terms of time. When a researcher collects data over multiple time points, patterns across time can be identified. In terms of dependability, multiple time points are preferable to data collected at only one time point. Another way to examine consistency is across multiple coders. When coding transcripts, two or more researchers can code independently and then compare their results. For example, when reading a transcript, one rater may code a set of comments from a research participant as “resignation,” whereas another rater may code the same comments as “acceptance.” Although related, these two terms have different connotations. The raters could then have a discussion and arrive at a conclusion as to which code is more accurate, or whether a different code would be more effective in characterizing the statements. If two coders see the same things, there is more consistency in the identification of themes. A code-recode procedure may be used with a single researcher. With this process, transcripts are coded and then set aside for a time. When the researcher returns, he or she then recodes and evaluates the findings.

Confirmability
Confirmability is the extent to which the findings of a qualitative study can be corroborated by others.
Because there is inherent bias in any human interpretation, efforts must be taken to ensure that the researcher’s perspectives do not distort the voice of the participants. Reflexivity, or the process of identifying a researcher’s personal biases and perspectives so that they can be set aside, is necessary to make the views of the researcher transparent. One strategy to make biases known involves keeping a reflexive journal, in which entries are made throughout the research process to describe decisions made by the researcher and to expose the researcher’s own values and beliefs. These journals are a type of diary that typically remains private, but reminds the researcher of potential biases. The researcher may then report these perspectives in the manuscript so that readers are made aware of the researcher’s positions. By taking a reflexive approach, it is more likely that the results reflect the experiences and beliefs of the participants and not the desires and biases of the researcher. An audit trail, the collection of documents from a qualitative study that can be used to confirm the data analysis of the researcher, promotes confirmability by making the data available to outside sources. Although it is not feasible to include all data in a manuscript, the researcher keeps the transcripts, field notes, coding documents, diaries, and so on, and describes the process of data collection and analysis in the manuscript. These materials are then made available if someone is interested in auditing the study. Triangulation also supports confirmability, as multiple sources with similar findings enhance verification of the results.

EXERCISE 9-3 Identifying Strategies to Promote Trustworthiness (LO3)
QUESTIONS
Read the following excerpt from the methods section of a study of 275 parents examining their hopes for their children receiving sensory integration therapy (Cohn, Kramer, Schub, & May-Benson, 2014).
Data Analysis
Content and comparative analyses were used to analyze parents’ responses to three questions on developmental-sensory histories intake forms. . . . First, parents’ responses were entered word for word into a database. On the basis of a preliminary review of the entire data set, an occupational therapy student identified initial codes for each question to preserve the parents’ exact words for their primary concerns. Two occupational therapists with pediatric and qualitative research experience and familiarity with social participation and sensory integration research literature, along with the occupational therapy student, reviewed the initial codes and compared the codes across the data set for common themes of meaning. The research team identified four condensed, encompassing categorical codes that described parents’ concerns and hopes and developed specific definitions for each code. Consolidating initial codes to develop categorical codes involved comparing categorical codes with exemplar quotes for each category and ensuring that all team members agreed on categorization of the data. Three parental explanatory models (EMs), based on combinations of the four categorical codes, were developed to describe how parents conceptualized and linked concerns and hopes about their children’s occupational performance. These models were then reviewed and modified for conceptual congruence by the entire research team. Further, to check for theoretical relevancy, the first author (Cohn) conducted a member check with a mother whose child had sensory processing and learning challenges, participated in this study, and was receiving occupational therapy at the member check time.
The EMs were presented to this mother, who was particularly insightful, provided the authors with additional data, and confirmed that, in her experience as a parent and through her interactions with other parents of children with sensory processing disorder, the EMs captured what parents care about.

1. What approaches are used to address trustworthiness in this study? What are its strengths and limitations?

CRITICAL THINKING QUESTIONS
1. Why are quotes used most often to illustrate themes in qualitative research articles?
2. What is the difference in how theory is used in quantitative versus qualitative research?
3. Which sampling methods are commonly used in qualitative research?
4. What characteristics most distinguish the different designs in qualitative research?
5. What are the four criteria for judging the strength of qualitative research, and how does each one make a study more trustworthy?
6. Although the four criteria for qualitative research are associated with four criteria for quantitative research, they are not the same thing. How do they differ?

ANSWERS
EXERCISE 9-1
1. B  2. A  3. C  4. B  5. C  6. A  7. A  8. B  9. A

EXERCISE 9-2
1. Narrative (clue: life stories)
2. Grounded theory (clue: explain)
3. Ethnography (clue: beliefs, rituals)
4. Phenomenology (clue: understand)

EXERCISE 9-3
This study included strategies to address dependability and confirmability, but was less strong in terms of transferability and credibility. Unlike most qualitative research, this study involved a large number of participants (275), which enhances its dependability and confirmability; the reports can be corroborated, and consistencies can be examined. The use of multiple coders also enhanced the dependability of the data, and member checking improved the study’s credibility.
However, the large number of participants also means that the data were not “thick”; it may be more difficult to determine the transferability of the results. Although member checking improves the study’s credibility, the use of only three open-ended questions in a written format, with no follow-up, makes the data less dependable.

FROM THE EVIDENCE 9-1
1. No, this is not a problem. In qualitative research, controlling variables is not a goal when recruiting individuals or settings. Instead, participants are selected who meet the characteristics of a study. Sampling is not random, but deliberate or intentional.
2. Video recordings provide a qualitative researcher with a permanent record of the person-environment interaction. The researcher can capture the person’s experience through his or her words and actions within a real-world context.

FROM THE EVIDENCE 9-2
The quotes in qualitative research could be equated with the reporting of means and standard deviations in quantitative research. Both would be considered more basic units of data that make up a larger whole. Both also provide readers with a level of transparency so that they may draw some of their own conclusions.

FROM THE EVIDENCE 9-3
The individual’s perspective provides you with knowledge that some clients are reluctant to contact you when they are having trouble with a device. This suggests that follow-up appointments or phone calls could be useful to identify potential problems.

FROM THE EVIDENCE 9-4
Phenomenology is focused on simply describing the lived experience or phenomenon without interpretation, whereas grounded theory becomes much more involved in explaining the phenomenon or situation through identifying a theory.

FROM THE EVIDENCE 9-5
Analogies are consistent with a constructivist (qualitative) point of view because the analogy provides a description that is based on the researcher’s perspective and allows readers to make their own meaning from the analogy.
In the case of the umbrella diagram, the picture might be comparable to a chart in quantitative research, which pulls multiple findings together in an illustration.

FROM THE EVIDENCE 9-6
The epiphany might be described as moving from seeing the injury as tragedy to seeing the experience as one that enhanced the individual’s life.

FROM THE EVIDENCE 9-7
This is an example of embedding data. The qualitative data are used to supplement and help explain the quantitative data, which present the outcomes and level of satisfaction of clients.

REFERENCES
Arntzen, C., & Elstad, I. (2013). The bodily experience of apraxia in everyday activities: A phenomenological study. Disability and Rehabilitation, 35, 63–72.
Bagby, M. S., Dickie, V. A., & Baranek, G. T. (2012). How sensory experiences of children with and without autism affect family occupations. American Journal of Occupational Therapy, 66, 78–86.
Bertilsson, A. S., Von Koch, L., Tham, K., & Johansson, U. (2015). Client-centered ADL intervention after stroke: Significant others’ experience. Scandinavian Journal of Occupational Therapy, 22(5), 377–386.
Cohn, E. S., Kramer, J., Schub, J. A., & May-Benson, T. (2014). Parents’ explanatory models and hopes for outcomes of occupational therapy using a sensory integration approach. American Journal of Occupational Therapy, 68, 454–462.
Creswell, J. W., & Plano Clark, V. L. (2011). Designing and conducting mixed methods research (2nd ed.). Thousand Oaks, CA: Sage.
Frank, A. W. (2013). The wounded storyteller: Body, illness, and ethics (2nd ed.). Chicago, IL: University of Chicago Press.
Gramstad, A., Storli, S. L., & Hamran, T. (2014). Older individuals’ experience during assistive device service delivery process. Scandinavian Journal of Occupational Therapy, 21, 305–312.
Hackman, D. (2011). “What’s the point?” Exploring rehabilitation for people with CNS tumours using ethnography: Patients’ perspectives.
Physiotherapy Research International, 16, 201–217.
Hannold, E. M., Classen, S., Winter, S., Lanford, D. N., & Levy, C. E. (2013). Exploratory pilot study of driving perceptions among OIF/OEF veterans with mTBI and PTSD. Journal of Rehabilitation Research and Development, 50, 1315–1330.
Henderson, A. R., & Cock, A. (2014). The responses of young people to their experiences of first-episode psychosis: Harnessing resilience. Community Mental Health Journal, 51(3), 322–328. doi:10.1007/s10597-014-9769-9
Howie, L., Coulter, M., & Feldman, S. (2004). Crafting the self: Older persons’ narratives of occupational identity. American Journal of Occupational Therapy, 58, 446–454.
Kavanagh, E. (2012). Affirmation through disability: One athlete’s personal journey to the London Paralympic Games. Perspectives in Public Health, 132, 68–74.
Leroy, K., Boyd, K., De Asis, K., Lee, R. W., Martin, R., Teachman, G., & Gibson, B. E. (2015). Balancing hope and realism in family-centered care: Physical therapists’ dilemmas in negotiating walking goals with parents of children with cerebral palsy. Physical and Occupational Therapy in Pediatrics, 35(3), 253–264.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Newbury Park, CA: Sage.
Minns Lowe, C. J., Moser, J., & Barker, K. (2014). Living with a symptomatic rotator cuff tear “bad days, bad nights”: A qualitative study. BMC Musculoskeletal Disorders, 15, 228.
Morgan, D. L. (2007). Paradigms lost and paradigms regained: Methodological implications of combining qualitative and quantitative methods. Journal of Mixed Methods Research, 1(1), 48–76.
Myers, M. (2000). Qualitative research and the generalizability question: Standing firm with Proteus. The Qualitative Report, 4(3/4).
Mynard, L., Howie, L., & Collister, L. (2009). Belonging to a community-based football team: An ethnographic study. Australian Occupational Therapy Journal, 56, 266–274.
Ruud, E. (2013).
Can music serve as a “cultural immunogen”? An explorative study. International Journal of Qualitative Studies on Health and Well-being, 8. doi:10.3402/qhw.v8i0.20597
Stevens, E. E., Gazza, E., & Pickler, R. (2014). Parental experience learning to feed their preterm infants. Advances in Neonatal Care, 14(5), 354–361. doi:10.1097/ANC.0000000000000105
Tomar, N., & Stoffel, V. (2014). Examining the lived experience and factors influencing education of two student veterans using photovoice methodology. American Journal of Occupational Therapy, 68, 430–438.
U.S. Census Bureau. (2012). Households and families: 2010. Retrieved from http://www.census.gov/prod/cen2010/briefs/c2010br-14.pdf
Ward, E. C., Burns, C. L., Theodoros, D. G., & Russell, T. G. (2013). Evaluation of a clinical service model for dysphagia assessment via telerehabilitation. International Journal of Telemedicine and Applications, Article ID 918526. http://dx.doi.org/10.1155/2013/918526
Wilbanks, S. R., & Ivankova, N. V. (2015). Exploring factors facilitating adults with spinal cord injury rejoining the workforce: A pilot study. Disability and Rehabilitation, 37(9), 739–749. doi:10.3109/09638288.2014.938177
CHAPTER 10
Tools for Practitioners That Synthesize the Results of Multiple Studies: Systematic Reviews and Practice Guidelines

“Research is creating new knowledge.”
—Neil Armstrong, an American astronaut and first person to walk on the moon

CHAPTER OUTLINE
LEARNING OUTCOMES
KEY TERMS
INTRODUCTION
SYSTEMATIC REVIEWS
  Finding Systematic Reviews
  Reading Systematic Reviews
  Evaluating the Strength of Systematic Reviews
    Replication
    Publication Bias
    Heterogeneity
DATA ANALYSIS IN SYSTEMATIC REVIEWS
  Meta-Analyses
  Qualitative Thematic Synthesis
PRACTICE GUIDELINES
  Finding Practice Guidelines
  Evaluating the Strength of Practice Guidelines
THE COMPLEXITIES OF APPLYING AND USING SYSTEMATIC REVIEWS AND PRACTICE GUIDELINES
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Locate and interpret the findings of systematic reviews and practice guidelines.
2. Interpret effect-size statistics and forest plots.
3. Evaluate the strengths and weaknesses of systematic reviews and practice guidelines.

KEY TERMS
effect size, forest plot, grey literature, meta-analysis, narrative review, practice guidelines, primary research, publication bias, replication, secondary research, study heterogeneity, systematic review, thematic synthesis

INTRODUCTION
The amount of information available on a given topic can make clinicians feel overwhelmed. If you are looking for the answer to a clinical question, how do you know how many studies exist, and whether you have found the best unbiased studies with which to answer the question? Evidence-based practice involves locating, reading, evaluating, and interpreting those studies. Fortunately, for many clinical questions some of that work has already been done through reports known as systematic reviews and practice guidelines.
You will remember that systematic reviews provide the highest level of evidence when they include strong individual studies. Practice guidelines are specifically designed with the practitioner in mind and include clinical recommendations. However, as with individual studies, it is important to be able to evaluate the quality of systematic reviews and practice guidelines. This chapter introduces you to different types of documents that synthesize a body of research and help clinicians evaluate and apply the information to their practice.

SYSTEMATIC REVIEWS
Before evidence-based practice was widely adopted, it was common for experts in the field of research to publish narrative reviews that summarized the literature and in some instances included clinical recommendations. The limitation of the narrative review format is that the reader must have a great deal of confidence in the author, trusting that the author has done a thorough and unbiased report on the subject. Narrative reviews can still be found in professional literature, but today it is more common to use a systematic review as the mechanism for synthesizing a body of research. A systematic review uses a scientific approach to answer a research question by synthesizing existing research rather than collecting new data. For this reason, systematic reviews are sometimes referred to as secondary research; the primary research consists of the individual studies included in the review. Systematic reviews are typically conducted after a body of research has developed around a topic and by authors who are not the primary authors of the primary research studies. Systematic reviews are not limited to a single type of study, but can be used to assimilate the research for any type of research question. Reviews of assessment tools, intervention approaches, and descriptive and predictive studies are common. Systematic reviews are also conducted on qualitative research.
The word systematic describes the ordered process that is followed to conduct a systematic review, which mirrors the steps of primary research: A research question is written, the methodology is defined, data are collected, results are analyzed, and the findings are reported. The reporting of systematic reviews has become more standardized with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA, 2009). The purpose of PRISMA is to provide guidelines for authors to increase the transparency and complete reporting of systematic reviews. Many journals now require that authors use the PRISMA standards when writing their systematic review articles.

Finding Systematic Reviews
Familiarity with certain databases and search strategies makes it easier to find systematic reviews. Conducting a search of PubMed or CINAHL using the limitation of “systematic review” will constrain your search to relevant articles. The physiotherapy evidence database known as PEDro (http://www.pedro.org.au/), OTseeker (OTseeker.com), and the Evidence Map from the American Speech-Language-Hearing Association (ASHA) (http://www.asha.org/Evidence-Maps/) provide searchable databases specific to the physical therapy, occupational therapy, and speech-language pathology disciplines, respectively. In PEDro and OTseeker, once a topic is entered into the database (e.g., a diagnosis, treatment approach, or assessment), the relevant articles are listed in order of the evidence hierarchy, with systematic reviews identified first. The database provides the reference and abstract, but the user must use another source to obtain the full text of the article. For example, at the time of publication, when the term cognitive remediation was entered into the search for OTseeker, 14 systematic reviews were identified.
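Filters like the ones described above can also be applied programmatically. As an illustration that goes beyond the chapter itself, NCBI’s E-utilities esearch endpoint accepts PubMed’s systematic-review subset filter, written as systematic[sb]. The sketch below only constructs the query URL for the chapter’s cognitive remediation example; no request is sent, and the parameter values (such as the result limit) are illustrative assumptions:

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_search(topic, max_results=20):
    """Build an E-utilities esearch URL restricted to PubMed's systematic-review subset."""
    params = {
        "db": "pubmed",
        # systematic[sb] is PubMed's subset filter for systematic reviews
        "term": f"{topic} AND systematic[sb]",
        "retmax": max_results,   # illustrative cap on returned IDs
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"

url = build_pubmed_search("cognitive remediation")
```

Opening the resulting URL in a browser returns a JSON list of matching PubMed IDs, which can then be looked up individually; the same filter can be typed directly into the PubMed search box.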
In this example, there are several reasons why there were so many systematic reviews on the same topic: (1) they addressed different diagnoses (e.g., brain injury, schizophrenia, bipolar disorder); (2) some were specific to computer-assisted cognitive remediation; and (3) others were specific to certain cognitive functions (e.g., executive function). In addition, since all systematic reviews are included, searchers will find many that are updates of earlier reviews. The Cochrane Collaboration is a key source of systematic reviews in all areas of medical practice. As of 2015, there were close to 6,000 Cochrane systematic reviews available at the Cochrane Library (www.cochrane.org). The organization conducts rigorous, high-quality reviews that are made available to the public free of charge. Reviews included in the Cochrane Collaboration must follow published guidelines for performing the review, such as those established by PRISMA. Cochrane reviews typically limit inclusion of studies to randomized controlled trials (RCTs). Cochrane reviews are regularly updated, and outdated reviews are removed when newer reviews become available. For rehabilitation practitioners, the limitations in the available research often result in Cochrane reviews indicating that few conclusions can be drawn and more research is needed. For example, a Cochrane review by Dal Bello-Haas and Florence (2013) examining exercise in amyotrophic lateral sclerosis (ALS) found only two randomized controlled trials. Although both studies found greater improvement in function for exercise when compared with stretching, the review indicates that, because the studies were small (a total of 43 participants), more research is needed to determine the extent to which exercise is beneficial for ALS patients.
In this case, rehabilitation practitioners may decide to try using exercise for ALS but continue to look for new research while they monitor their clients’ progress.

EXERCISE 10-1 Locating Systematic Reviews (LO1)
Conduct a search for systematic reviews that examine the efficacy of vestibular rehabilitation for vestibular dysfunction at the Cochrane Library, CINAHL, OTseeker, and PEDro.
QUESTIONS
How many systematic reviews did you find from each source, and generally what were the conclusions?
1. Cochrane Library
2. CINAHL
3. OTseeker
4. PEDro

Reading Systematic Reviews
Systematic reviews contain the same major sections as individual research studies: abstract, introduction, methods, results, and discussion. The abstract provides a summary of the review. The Cochrane Library and some other databases sometimes provide a more extensive summary than is typical of an abstract. This summary is useful for practitioners because it often includes clinical recommendations based on the results of the review. The introduction is similar to that for an individual study, presenting background information on the topic, the need for the review, and typically a purpose statement or research question. The methods section is significantly different from individual studies, primarily because the data collected in a systematic review are gleaned from other studies, not from individuals/clients. The sample described in the methods section is the number of studies and characteristics of those studies. Inclusion and exclusion criteria are stated to clarify what studies were included in the review. For example, many reviews of efficacy studies only include randomized controlled trials. The way that the data were collected is specified, including the databases searched and key words used in the search. Other methods of finding studies are included, such as contacting prominent researchers in a field and reviewing the reference lists of relevant studies.
The methods section also describes who collected the data and what information was abstracted. Ideally, the methods section is written in such a way that the review is reproducible; if you were to utilize the author's methods, you would identify the same studies. The results section begins by describing how many studies were identified that met the criteria. A table that summarizes each individual study is an important part of this section and provides a measure of transparency. By identifying the specific studies included in the review, the review author allows the reader to review the primary sources to verify the interpretation. The results section often describes the individual studies in a narrative form, but most importantly it synthesizes the information to explain what the evidence as a whole shows with regard to a research question. For example, the majority of the studies may support an intervention, or they may not yield evidence for its efficacy. It is not uncommon to read a synthesis that includes both positive and negative findings. An effective synthesis gives the reader an overall impression of the research (e.g., the intervention is or is not supported by the evidence). This synthesis is based on several factors, including the number of studies available, the number of participants in the studies, the quality of the individual studies, and the outcomes of each study. Many results sections of systematic reviews include data analysis in the form of a meta-analysis. This specific form of systematic review is explained later in this section. From the Evidence 10-1 comes from a review of descriptive studies examining speech impairments in Down syndrome (Kent & Vorperian, 2013). The table lists each of the studies in the review and summarizes the results.
The discussion section of a systematic review summarizes the results and, most importantly for evidence-based practitioners, often provides clinical recommendations. The limitations of the review are also acknowledged. The abstract presented in From the Evidence 10-2 provides a summary of a systematic review comparing unilateral with bilateral cochlear implants. It concludes that, although the evidence has limitations, bilateral cochlear implants appear to have superior outcomes to unilateral implants.

Evaluating the Strength of Systematic Reviews

Although systematic reviews have many characteristics of a single study, they possess unique characteristics that are important to understand and consider. When evaluating systematic reviews, take into account their replication, publication bias, and heterogeneity.

Replication

A systematic review provides the highest level of evidence because it synthesizes the results from multiple studies. Recall that replication (the same or similar study conducted more than once) is a principal element of the scientific process. A single study provides one piece of evidence from one researcher or a group of researchers. Multiple studies enhance confidence in the findings because they provide evidence from several perspectives. For example, the developer of an intervention often serves as the lead researcher in an efficacy study. This is desirable because new interventions should be studied. However, the developer of the intervention will likely have a level of commitment to the intervention that would not be expected from other researchers. In addition, the developer will have extensive and intimate knowledge of the intervention and its implementation; although controls may be in place, the developer is bound to have a strong bias toward seeing the intervention succeed. Therefore, it is important that other researchers who are not associated with development of the intervention replicate efficacy studies.
A good example of replication of intervention research beyond the original developer is dialectical behavior therapy (DBT). Marsha Linehan (1993) developed DBT as an intervention for individuals with borderline personality disorder. She conducted the initial studies of the approach and continues to be involved in researching it. One of her earlier studies, a well-designed randomized controlled trial with 101 participants, found that DBT was more effective than community treatment for reducing suicide attempts and hospitalization and lowering medical risk (Linehan et al, 2006). A more recent study by Pasieczny and Connor (2011) examined DBT for borderline personality disorder in routine clinical settings in Australia. These researchers found similar outcomes when comparing DBT with treatment as usual, with DBT resulting in better outcomes, including decreased suicidal and self-injurious behaviors and fewer hospitalizations. The results from these two studies show a consistent pattern, but even more compelling is a Cochrane systematic review of psychological therapies for borderline personality disorder, which found the strongest evidence for DBT. The review included eight studies of DBT: three from the Linehan group, and five from other researchers (Stoffers et al, 2012). Again, the review supported positive outcomes, particularly related to a reduction in suicidal and self-injurious behavior. When multiple studies from a variety of settings replicate the findings, consumers of research can have greater confidence in the outcome. However, the inclusion of multiple studies is not the only factor to consider when determining the strength of evidence provided by a systematic review. As a source of evidence, a systematic review is only as strong as the studies it contains. If a systematic review of efficacy studies contains no randomized controlled trials, it does not offer
Level I evidence. Likewise, predictive studies must also be of high quality if the systematic review is to provide strong evidence. Remember that the highest level of evidence for predictive studies is a prospective cohort study. Therefore, a systematic review that is designed to answer a question about prediction must contain multiple prospective cohort studies to be considered Level I evidence.

Publication Bias

Studies are more likely to be published when they yield positive results, and this is particularly true with efficacy studies. Publication bias suggests that researchers are more likely to submit research, and journals are more likely to publish research, when the findings are positive. A Cochrane systematic review of clinical trials found support for publication bias with an odds ratio = 3.90, 95% CI = 2.68-5.68 (Hopewell et al, 2009), meaning

FROM THE EVIDENCE 10-1 Example of Table of Studies Included in a Systematic Review
Kent, R. D., & Vorperian, H. K. (2013). Speech impairment in Down Syndrome: A review. Journal of Speech, Language, and Hearing Research, 56(1), 178–210. http://doi.org/10.1044/1092-4388(2012/12-0148).
Note A: The source information allows you to consult the original study.
Table 4: Summary of studies of speech intelligibility in individuals with DS.
Source | Participants | Method | Summary of Results
Barnes et al. (2009) | See Table 2 | Phonological assessment: perceptual and acoustic measures of phonological accuracy and processes | DS scored lower in accuracy and processes and used fewer intelligible words.
van Bysterveldt (2009) | See Table 2 | Transcription: determination of percentage of intelligible utterances in narratives and connected speech | DS had mean intelligibility scores of 83.1% for narratives and 80% for connected speech.
Yoder, Hooshyar, Klee, & Schaffer (1996) | N = 8 DS (mean age 83 mos); N = 8 ATD* (mean age 44 mos), matched to DS group on MLU.
(*ATD: no DS, but language delay.) | Perceptual assessment: intelligibility and length determined with SALT transcription program (Miller & Chapman, 1990) | DS had over 3 times as many multi-word partially intelligible utterances; however, overall there were no significant differences in intelligibility.
Bunton, Leddy, & Miller (2007) | N = 5 DS (5M) (26–39 yrs) | Perceptual assessment: intelligibility test and perceptual scoring by listeners and transcribers | DS had wide range of intelligibility scores (41%–75%); errors ranked more highly than others: cluster-singleton production word initial and word final, vowel errors, and place of production for stops and fricatives.
Note B: The table also provides basic information about the participants, measures, and results so you do not have to go to the original source for those details.
FTE 10-1 Question: What information do you learn about each individual study by looking at this table?

FROM THE EVIDENCE 10-2 Components of a Systematic Review
Van Schoonhoven, J., Sparreboom, M., van Zanten, B. G., Scholten, R. J., Mylanus, E. A., Dreschler, W. A., Grolman, W., & Matt, B. (2013). The effectiveness of bilateral cochlear implants for severe-to-profound deafness in adults: A systematic review. Otology and Neurotology, 32, 190–198.
Note A: Identifies databases searched. Includes only published literature.
OBJECTIVE: Assessment of the clinical effectiveness of bilateral cochlear implantation compared with unilateral cochlear implantation or bimodal stimulation in adults with severe-to-profound hearing loss. In 2007, the National Institute for Health and Clinical Excellence (NICE) in the U.K. conducted a systematic review on cochlear implantation. This study forms an update of the adult part of the NICE review.
DATA SOURCES: The electronic databases MEDLINE and Embase were searched for English language studies published between October 2006 and March 2011.
STUDY SELECTION: Studies were included that compared bilateral cochlear implantation with unilateral cochlear implantation and/or with bimodal stimulation in adults with severe-to-profound sensorineural hearing loss. Speech perception in quiet and in noise, sound localization and lateralization, speech production, health-related quality of life, and functional outcomes were analyzed.
DATA EXTRACTION: Data extraction forms were used to describe study characteristics and the level of evidence.
DATA SYNTHESIS: The effect size was calculated to compare different outcome measures.
CONCLUSION: Pooling of data was not possible because of the heterogeneity of the studies. As in the NICE review, the level of evidence of the included studies was low, although some of the additional studies showed less risk of bias. All studies showed a significant bilateral benefit in localization over unilateral cochlear implantation. Bilateral cochlear implants were beneficial for speech perception in noise under certain conditions and several self-reported measures. Most speech perception in quiet outcomes did not show a bilateral benefit. The current review provides additional evidence in favor of bilateral cochlear implantation, even in complex listening situations.
Note B: The full article includes specific information about each study, including the level of evidence. Fourteen studies were included; most were nonexperimental, using existing individuals with unilateral or bilateral implants. One was identified as an RCT but did not include between-group analyses.
FTE 10-2 Question 1: The headings for the abstract do not match the sections described earlier (i.e., introduction, methods, results, and discussion). Connect the headings in the abstract with each of these sections.
that studies with positive findings (i.e., the intervention was effective) were almost four times more likely to be published than studies with negative findings (i.e., the intervention was not effective). One of the challenges of determining if publication bias exists involves finding unpublished research. Five studies were included in the aforementioned review, each of which used a slightly different methodology to identify research that had and had not been published. For example, in one study the researchers contacted an institutional review board; from all of the studies approved by that board, researchers identified which studies had been published and which had not. Using another approach, the researchers identified a group of studies funded through the National Institutes of Health and again determined which of these funded studies resulted in publication. In the review, some studies suggested that authors did not submit studies with negative findings because they thought their study was not interesting enough or the journal was not likely to publish the findings. The effect of publication bias on evidence-based practice is noteworthy. When a systematic review is conducted, the intention of the review is to include all research that meets the inclusion criteria. Although published research will be easily accessible to the reviewer, limiting a systematic review to published studies will likely mean that the findings will be skewed toward the positive. Consequently, evidence-based practitioners may expect an intervention to have more positive outcomes than would reasonably be expected if all studies had been included in the review. Unfortunately for the systematic reviewer and evidence-based practitioners, finding the results of studies that have not been published can be challenging.
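The odds ratio reported by Hopewell et al can be made concrete with a small calculation. The counts below are invented for illustration (the review itself reports only the pooled ratio); the arithmetic simply shows how an odds ratio near 4 arises from a 2 x 2 table of study findings (positive/negative) by publication status (published/unpublished).

```python
# Hypothetical 2 x 2 table (counts invented for illustration):
#                      published   unpublished
# positive findings        80           20
# negative findings        30           30

# Odds of being published for each kind of finding
odds_positive = 80 / 20   # 4.0
odds_negative = 30 / 30   # 1.0

# Odds ratio: how much more likely publication is for positive findings
odds_ratio = odds_positive / odds_negative
print(odds_ratio)  # 4.0 -> positive studies four times as likely to be published
```

With these invented counts the odds ratio is 4.0, close to the 3.90 reported in the review; the confidence interval reported alongside it (2.68-5.68) expresses the uncertainty around that estimate.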
When reading a systematic review, consider whether the reviewers made efforts to collect unpublished studies. If so, the reviewer will report this information in the methods section of the article. Finding unpublished research could necessitate contacting researchers who are known to do work concerning the topic of interest or conducting searches of grey literature. Grey literature refers to print and electronic works that are not published commercially or are difficult to find (Grey Literature Network Service, 1999). Grey literature includes theses, dissertations, conference proceedings, and government reports. Inclusion of grey literature will decrease the likelihood or degree to which publication bias is a factor in a systematic review. Concerns for publication bias have led to efforts to reduce its effect on the outcomes of systematic reviews. One important initiative is the registration of clinical trials. On the website ClinicalTrials.gov, which is sponsored by the National Institutes of Health, researchers register their clinical trials. A clinical trial is defined as any prospective study in which people are assigned to one or more interventions and health outcomes are collected. In 2005, the International Committee of Medical Journal Editors made it a condition of publication that researchers register their trials when beginning a study (ClinicalTrials.gov, 2014). This requirement makes it more likely that researchers will register their studies. Although at the time of this writing most rehabilitation journals do not require registration, it is considered a best practice in the research community. In 2008, ClinicalTrials.gov began to include a results database of registered studies, so individuals conducting systematic reviews can use the registry to locate studies that may not have been published in a journal.
Even if the results are not included in the registry, the reviewer will have access to contact information for the researcher and can follow up on the study findings. Heterogeneity A common criticism of systematic reviews concerns their tendency to synthesize the results of studies that had substantial differences. Study heterogeneity refers to differences that often exist among studies in terms of the samples, interventions, settings, outcome measures, and other important variables. One reason for study heterogeneity is that researchers in different settings rarely collaborate to make their study designs similar. In addition, researchers often have slightly different questions or interest in studying different conditions. If one researcher is building upon previous research, the newer study will often expand on the previous work, including making changes. For example, one study may examine an intervention in a school setting and another study in a private clinic—but the systematic review combines the results of both studies to make a general recommendation. When evidence-based practitioners read the findings of a systematic review, the challenge is interpreting and applying the results to a specific clinical site. For example, if the school setting and private clinic results are combined, the results do not replicate school practice; it becomes a comparison of apples with oranges. Systematic reviews may address study heterogeneity with strict inclusion criteria that only allow very similar studies; however, with this approach the number of potential studies to include can be severely limited. Another approach is to provide subanalyses of the results. Using the previous example, the results may provide a separate analysis for all studies in a school setting and another analysis for all studies in clinic settings. 
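The idea of a subanalysis can be sketched in a few lines. The effect sizes and setting labels below are invented; the point is only that pooling within each setting can reveal a difference that a single combined estimate would hide.

```python
# Hypothetical per-study effect sizes tagged by setting (invented data)
studies = [
    ("school", 0.55), ("school", 0.48), ("school", 0.61),
    ("clinic", 0.20), ("clinic", 0.12), ("clinic", 0.25),
]

def mean_effect(setting):
    """Average the effect sizes of the studies conducted in one setting."""
    effects = [d for s, d in studies if s == setting]
    return sum(effects) / len(effects)

print(round(mean_effect("school"), 2))  # 0.55
print(round(mean_effect("clinic"), 2))  # 0.19
```

Here the school-setting studies suggest a much larger effect than the clinic studies, which a single pooled average would obscure. A real subanalysis would also weight each study (typically by its precision or sample size) rather than taking a simple mean.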
As a reader of a systematic review, the evidence-based practitioner can have more confidence in the conclusions if the studies in the review are very similar or subanalyses were conducted to describe findings within different conditions. Study heterogeneity can be particularly problematic when different outcome measures or statistical analyses are used or reported. In some cases, the same construct may be measured (e.g., pain), but different measures are used. Differences in the reliability and validity of measures can affect the results. In other instances, studies may examine very different outcomes (e.g., one study looks at pain and another at strength), making the combination of outcomes across studies invalid. Again, the reviewers may include subanalyses based on the different outcomes. In still other instances, the reporting of the data in statistical terms can vary appreciably, making it difficult or impossible to combine or compare the results. Evidence-based practitioners take into account the study conditions within the systematic review, bearing in mind the fact that those conditions that most resemble their own practices will be the most applicable. Box 10-1 outlines the factors to consider when evaluating the strength of a particular systematic review.

FTE 10-2 Question 2: From the information provided in the abstract and additional comments, how would you evaluate the strength of the evidence presented in From the Evidence 10-2?

BOX 10-1 Considerations in Evaluating Systematic Reviews
Replication:
• Is there an adequate number of studies available from which to draw a conclusion?
• Do the available studies include researchers other than the original developers of the intervention/assessment?
• Are the studies included in the review of the highest level of evidence for a single study?
Publication bias: Do the reviewers make efforts to obtain unpublished research (e.g., theses, dissertations, research reports from clinical trial registries)?
Heterogeneity of the included studies: Are the studies so different that it is difficult to draw conclusions, or does the reviewer include subanalyses to address differences in the studies?

DATA ANALYSIS IN SYSTEMATIC REVIEWS

A challenge in reporting the results of a systematic review involves integrating the results from multiple studies. Many times the results are summarized narratively and take into account both the findings and the strength of the evidence. For example, a review may report that the majority of the findings indicate that the intervention was effective, but because the strength of the evidence is weak (e.g., studies lacked a control group), it is premature to conclude that the intervention caused the positive outcomes. Systematic reviews may use a quality assessment such as the PEDro scale to evaluate the quality of each individual study. Systematic reviews may also combine the findings of each study to provide a result that reflects the combination of the results. When quantitative results are combined, the systematic review becomes a meta-analysis. Qualitative results are also combined in systematic reviews through a process of thematic synthesis. Meta-analysis and thematic synthesis are discussed in more detail in the following sections.

Meta-Analyses

A meta-analysis is a specific type of systematic review in which the results of similar quantitative studies (i.e., using the same theoretical constructs and measures) are pooled using statistical methods (Fig. 10-1).

FIGURE 10-1 All meta-analyses are systematic reviews, but not all systematic reviews are meta-analyses. (The figure shows "Meta-analysis" nested within "Systematic review.")

The first step in a meta-analysis involves calculating an effect size for each individual study. This statistic may already be reported
in the results section; if not, and if descriptive statistics are available, typically the reviewer can calculate the effect size. The second step is to pool the effect size from all of the studies to generate an overall effect. From the Evidence 10-3 provides an example of a meta-analysis of studies examining yoga for low back pain that includes the pooled effect sizes.

FROM THE EVIDENCE 10-3 Effect Sizes From a Meta-Analysis
Holtzman, S., & Beggs, R. T. (2013). Yoga for chronic low back pain: A meta-analysis of randomized controlled trials. Pain Research and Management: The Journal of the Canadian Pain Society, 18(5), 267–272.
OBJECTIVES: To evaluate the efficacy of yoga as an intervention for chronic low back pain (CLBP) using a meta-analytical approach. Randomized controlled trials (RCTs) that examined pain and/or functional disability as treatment outcomes were included. Post-treatment and follow-up outcomes were assessed.
METHODS: A comprehensive search of relevant electronic databases, from the time of their inception until November 2011, was conducted. Cohen's d effect sizes were calculated and entered in a random-effects model.
RESULTS: Eight RCTs met the criteria for inclusion (eight assessing functional disability and five assessing pain) and involved a total of 743 patients. At post-treatment, yoga had a medium to large effect on functional disability (d = 0.645) and pain (d = 0.623). Despite a wide range of yoga styles and treatment durations, heterogeneity in post-treatment effect sizes was low. Follow-up effect sizes for functional disability and pain were smaller but remained significant (d = 0.397 and d = 0.486, respectively); however, there was a moderate to high level of variability in these effect sizes.
DISCUSSION: The results of the present study indicate that yoga may be an efficacious adjunctive treatment for CLBP. The strongest and most consistent evidence emerged for the short-term benefits of yoga on functional disability. However, before any definitive conclusions can be drawn, there are a number of methodological concerns that need to be addressed. In particular, it is recommended that future RCTs include an active control group to determine whether yoga has specific treatment effects and whether yoga offers any advantages over traditional exercise programs and other alternative therapies for CLBP.
Note A: These are pooled effect sizes for the combined studies.
FTE 10-3 Question: The effect sizes were similar for functional disability and pain in both the short and long term. Why might the authors have concluded: "The strongest and most consistent evidence emerged for the short-term benefits of yoga on functional disability"?

There are several advantages to meta-analysis, including increased statistical power, the ability to estimate an overall effect size, and the ability to systematically compare differences between studies. A meta-analysis increases statistical power simply by increasing the number of participants when compared with a single study. Therefore, a meta-analysis can potentially detect smaller differences among groups. Effect sizes may be more relevant than statistical significance, because they provide evidence-based practitioners with a measure of the impact of the intervention or condition on the client. A pooled effect size may provide a more accurate representation of that impact, because it takes into account multiple studies. Finally, as mentioned earlier regarding heterogeneity, a meta-analysis may be used to try to explain some of the differences between studies.
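The two-step process just described (compute an effect size for each study, then pool) can be sketched in a few lines. The outcome scores below are invented, and the pooling here weights each study by its total sample size, a simplification of the inverse-variance weighting most meta-analyses actually use.

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Step 1: standardized mean difference (Cohen's d) for one study."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = (((n1 - 1) * stdev(group1) ** 2 +
                  (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled_sd

# Invented outcome scores from two small trials of the same intervention
study_a = cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])      # d = 1.26 (rounded)
study_b = cohens_d([14, 15, 16, 17], [13, 14, 15, 16])    # d = 0.77 (rounded)

# Step 2: pool, weighting each study's d by its total sample size
effects = [(study_a, 10), (study_b, 8)]
pooled = sum(d * n for d, n in effects) / sum(n for _, n in effects)
```

The pooled value (about 1.05 here) is the single overall effect a reader would see at the bottom of the review's forest plot.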
For instance, a meta-analysis could compare the results of studies conducted in school settings with studies in clinic settings to determine whether there is a difference in the effect. If indeed there was a difference, and the effect size was greater in school settings, the practitioner could conclude that the intervention was more effective when provided in a school setting than in a clinical setting. Often the reviewers conducting a meta-analysis will weight the studies, so that some studies exert more influence in the statistical analysis. In most cases, weighting is based on sample size; those studies with more participants have greater influence on the calculated effect size of the meta-analysis. This is done intentionally, because larger studies should provide more reliable findings. Nevertheless, the issue of heterogeneity of studies is still a potential limitation of meta-analyses. The evidence-based practitioner examines the differences of included studies to determine if combining the results was logical. Several commonly used effect-size statistics are described in Table 10-1.

EXERCISE 10-2 Interpreting Effect Sizes for Your Clients (LO2)
QUESTIONS
Take the following effect-size conclusions from a meta-analysis and translate them into plain language to interpret/describe this information for a client.
1. A meta-analysis examining the efficacy of eccentric viewing training for age-related macular degeneration. This training involves developing peripheral vision to substitute for loss of central vision: "Overall effect size was equal to 0.660 (95% CI, 0.232-1.088, p < 0.05). All five studies had positive results after eccentric viewing training with individuals with AMD" (Hong, Park, Kwon, & Yoo, 2014, p. 593).
2.
A meta-analysis comparing very early mobilization (VEM; i.e., patients are encouraged to get out of bed and/or start moving within 36 hours of stroke) with standard care for individuals with stroke reached the following conclusion: "VEM patients had significantly greater odds of independence compared with standard care patients (adjusted odds ratio, 3.11; 95% confidence interval, 1.03–9.33)" (Craig, Bernhardt, Langhorne, & Wu, 2010, p. 2632).
3. A meta-analysis examining the relationship of aging on balance found that up to age 70, healthy older adults have scores on the Berg Balance Scale that are close to the maximum. However, after age 70: "The analysis (meta-regression) shows the deterioration of the Berg Balance Scale score with increasing age (R2 = 0.81, p < 0.001)" (Downs, 2014, p. 88).

Understanding Statistics 10-1
The choice of effect size in a meta-analysis will differ depending on the type of study and statistics used in the primary research. There are many different effect-size statistics, but some of the most common in health-care research include:
• Cohen's d for difference studies with continuous variables
• Odds ratios or hazards ratios for difference studies with dichotomous variables
• r values and r2 values for studies that use correlations to examine the strength of the relationship between variables or predictive studies

TABLE 10-1 Common Effect-Size Statistics Used in Meta-Analysis
d-Index
• Description: Represents the magnitude of the difference between and/or within groups based on means and standard deviations.
• Types of studies: Intervention studies comparing groups and/or before and after treatment differences; descriptive studies comparing groups of people with existing conditions.
• Suggested interpretation*: Small = 0.20; Medium = 0.50; Large = 0.80.
Odds ratio/Hazards ratio
• Description: Based on 2 × 2 contingency tables in which two groups are compared on two dichotomous outcomes; explained in terms of the odds of success in one group compared with the other group.
• Types of studies: Intervention studies comparing groups on a dichotomous outcome (success/failure) or prognostic studies predicting an outcome for two different groups.
• Suggested interpretation: Interpreted as probabilities: 1 = no difference; 2 = twice as likely; 3 = three times as likely (etc.).
r
• Description: Correlation coefficient that describes the strength of the relationship, measured between 0.0 and 1.0.
• Types of studies: Correlational and predictive studies examining relationships.
• Suggested interpretation*: Small = 0.10; Medium = 0.30; Large = 0.50.
r2
• Description: Correlation coefficient squared; indicates the amount of variance accounted for in the relationship. The number will always be smaller than the simple r value.
• Types of studies: Correlational and predictive studies examining relationships.
* Strength of the effect size is based on Cohen (1992).

The results of a meta-analysis are often displayed graphically in the form of a forest plot, which graphs the effect size of each individual study against a reference point. The graph allows you to step back from the pooled data and visualize the pattern of results from the meta-analysis. The vertical line in the forest plot indicates the point at which there is no effect.
For intervention studies comparing an experimental with a control group, 0 indicates no effect. For odds ratios, hazard ratios, and risk ratios, 1.0 indicates no effect. In a forest plot, the effect of each study is plotted with a square. The sizes of the squares typically vary, with those studies having greater weight represented with a larger square. The horizontal lines emanating from the square indicate the confidence intervals. Remember that a large confidence interval suggests that the calculated effect size is a less precise estimate. The overall effect with all studies pooled is indicated at the bottom of the forest plot. From the Evidence 10-4 is an example of a forest plot from a meta-analysis of exercise studies focused on increasing balance confidence in older adults. Consider the number of horizontal lines, the placement of the squares in relation to the vertical line, and the sizes of the squares when answering the From the Evidence questions.

FROM THE EVIDENCE 10-4 Forest Plot
Rand, D., Miller, W. C., Yiu, J., & Eng, J. J. (2011). Interventions for addressing low balance confidence in older adults: A systematic review and meta-analysis. Age and Ageing, 40(3), 297–306. http://doi.org/10.1093/ageing/afr037.
Study or Subgroup | Exercise Mean (SD), Total | Control Mean (SD), Total | Weight | Std. Mean Difference, IV, Random, 95% CI
Aral 2007 | -0.1 (2.2), 71 | -0.1 (2.5), 65 | 18.5% | 0.00 [-0.34, 0.34]
Brouwer 2003 | 7.8 (18.7), 71 | 4 (12.6), 17 | 7.4% | 0.21 [-0.32, 0.74]
Campbell 1997 | -2.5 (10.6), 116 | -6.1 (10), 117 | 31.3% | 0.35 [0.09, 0.61]
Devereux 2005 | 0.3 (1.2), 25 | 0.03 (0.5), 25 | 6.7% | 0.29 [-0.27, 0.85]
Lui-Ambrose 2004 | 4.9 (14.5), 34 | 0.7 (23.7), 32 | 8.9% | 0.21 [-0.27, 0.70]
Schoenfelder 2004 | 1.5 (20.3), 30 | -3.5 (22.4), 28 | 7.8% | 0.23 [-0.29, 0.75]
Southard 2006 | 12.01 (21.76), 18 | 11.95 (16.32), 17 | 4.8% | 0.00 [-0.66, 0.67]
Weerdesteyn 2006 | 3.5 (18.6), 75 | -1.48 (20.3), 26 | 10.5% | 0.26 [-0.19, 0.71]
Williams 2002 | 9.3 (19.1), 13 | 5.5 (19.3), 18 | 4.1% | 0.19 [-0.52, 0.91]
Total (95% CI) | 453 | 345 | 100.0% | 0.22 [0.07, 0.36]
Heterogeneity: Tau2 = 0.00; Chi2 = 3.09, df = 8 (P = 0.93); I2 = 0%. Test for overall effect: Z = 2.93 (P = 0.003). (In the plot, effects to the left of 0 favor the control; effects to the right of 0 favor exercise.)
Note A: Descriptive statistics are provided along with the plot of the effect size using the standard mean difference between the groups.
FTE 10-4 Questions
1. Using this visual analysis of the results, how many studies were included in the meta-analysis? Of those studies, how many suggest that the intervention condition was more effective than the control?
2. Which study in the meta-analysis was the most heavily weighted?

Qualitative Thematic Synthesis

The results of multiple qualitative studies may also be synthesized using a systematic review process, although this process can be controversial. Some researchers contend that a review of qualitative research is inappropriate, because qualitative research is specific to context, time, and individual. However, as the importance of qualitative research as a source of evidence is increasingly recognized, the synthesis of such research is also gaining acceptance (Tong et al, 2012). Different methods are emerging for the systematic review of qualitative research, including meta-ethnography, critical interpretive synthesis, and meta-synthesis; however, detailed descriptions of these approaches are beyond the scope of this chapter. Generally speaking, a thematic synthesis is at the core of the process of these approaches.
A thematic synthesis uses a systematic process to identify themes from each individual study and then find similar themes in other studies. Each individual study may use different words to describe analogous ideas. The reviewer looks for corroborating concepts among the studies, which become the synthesized themes. It is important to note that a strong synthesis of qualitative research does more than merge the findings of several studies; rather, it finds novel interpretations in the synthesis of the data (Thomas & Harden, 2008). For example, Lee et al (2015) examined factors that presented barriers to independent active free play in children. This large systematic review included 46 studies from a variety of disciplines. Numerous barriers were identified, but the meta-synthesis found that parents' safety concerns about strangers, bullies, and traffic presented the greatest impediment to free play. From the Evidence 10-5 is an example of a qualitative review that synthesizes studies examining the impact of the environment on youth with disabilities.

PRACTICE GUIDELINES

Practice guidelines, also referred to as clinical guidelines or clinical practice guidelines, provide recommendations for practitioners to address specific clinical situations. For example, practice guidelines may provide intervention recommendations for specific diagnoses or describe the optimal assessment process for a particular condition. Unlike single studies and most systematic reviews, practice guidelines are, as a rule, developed by organizations, such as the American Occupational Therapy Association (AOTA), the American Physical Therapy Association (APTA), and ASHA. As a member of a professional organization, you may have free access to practice guidelines, or they may be available at a cost.
For example, AOTA publishes numerous practice guidelines on topics such as adults with low vision, adults with serious mental illness, and children and adolescents with autism (AOTA, n.d.). The Orthopaedic Section of the American Physical Therapy Association (APTA, n.d.) publishes practice guidelines in the Journal of Orthopaedic and Sports Physical Therapy, including guidelines on neck pain and impairments in knee movement coordination. The recommendations for AOTA's practice guidelines on adults with low vision are contained in From the Evidence 10-6. Practice guidelines may also be developed by nonprofit organizations that focus on a specific condition, such as the Alzheimer's Association, the National Stroke Foundation, and the Brain Trauma Foundation.

Practice guidelines may or may not have a strong evidence base. The best practice guidelines include the highest level of evidence available on the topic of interest. The evidence is then combined with expert opinion, and in some cases the client's perspective, to help translate research to practice. Practice guidelines are often more general than systematic reviews and may provide guidance for practitioners in a discipline to treat a particular condition (e.g., stroke, autism, shoulder pain), rather than examining a specific intervention.

FROM THE EVIDENCE 10-5 A Qualitative Systematic Review

Kramer, J. M., Olsen, S., Mermelstein, M., Balcells, A., & Lijenquist, K. (2012). Youth with disabilities perspectives of the environment and participation: A qualitative meta-synthesis. Child Care and Health Development, 38, 763–777.

Note A: The process of reviewing the studies was meta-synthesis.

Meta-syntheses can enhance our knowledge regarding the impact of the environment on the participation of youth with disabilities and generate theoretical frameworks to inform policy and best practices. The purpose of this study was to describe school-aged youth with disabilities' perspectives regarding the impact of the environment and modifications on their participation. A meta-synthesis systematically integrates qualitative evidence from multiple studies. Six databases were searched and 1287 citations reviewed for inclusion by two independent raters; 15 qualitative articles were selected for inclusion. Two independent reviewers evaluated the quality of each study and coded the results section. Patterns between codes within and across articles were examined using a constant comparative approach. Environments may be more or less inclusive for youth with disabilities depending on others' understanding of individual abilities and needs, youth involvement in decisions about accommodations, and quality of services and policies. Youth implemented strategies to negotiate environmental barriers and appraised the quality of their participation based on the extent to which they engaged alongside peers. This meta-synthesis generated a framework illustrating the relationship between the environment, modifications, and participation, and provided a conceptualization of participation grounded in the lived experiences of youth with disabilities. Findings reveal gaps in current knowledge and highlight the importance of involving youth with disabilities in decision making.

Note B: A new framework was developed from the synthesis, indicating that this review generated something new as opposed to a summary of the available evidence.

FTE 10-5 Question
The abstract indicates that two independent reviewers evaluated the quality of the evidence and coded the results. Why might it be useful to have more than one reviewer of a systematic review?
In addition, practice guidelines are focused on providing recommendations to practitioners (as opposed to simply reporting the evidence); thus, they often incorporate expert opinion and client perspectives into the process of developing the guidelines (Fig. 10-2).

FROM THE EVIDENCE 10-6 Practice Guidelines
Recommendations From the AOTA Regarding Occupational Therapy for Low Vision

Kaldenberg, J., & Smallfield, S. (2013). Occupational therapy practice guidelines for older adults with low vision. Bethesda, MD: AOTA Press.

Recommended
• Use of problem-solving strategies to increase participation in activities of daily living (ADLs) and instrumental activities of daily living (IADLs) tasks and leisure and social participation (A)
• Multicomponent patient education and training to improve occupational performance (A)
• Increased illumination to improve reading performance (B)
• Increased illumination to improve social participation (B)
• Stand-based magnification systems to increase reading speed and duration (B)
• Patient education programs to improve self-regulation in driving and community mobility (B)
• Use of bioptics to improve simulated and on-road driving skills as well as outdoor mobility skills (B)
• Contrast and typeface (sans serif), type size (14–16 points), and even spacing to improve legibility and readability of print (B)
• Use of contrast (for example, yellow marking tape, colored filters, using a white plate on a black placemat) to improve participation in occupations (C)
• Use of low-vision devices (e.g., high-add spectacles, nonilluminated and illuminated handheld magnifiers, nonilluminated and illuminated stand magnifiers, high plus lenses, telescopes, electronic magnifiers [such as closed-circuit TVs]) to improve reading speed and reduce level of disability when performing ADL tasks (C)
• Eccentric viewing training to improve reading performance (C)
• Eccentric viewing in combination with instruction in magnification to improve reading (C)
• Eccentric viewing completed with specific software programs for near vision and ADLs (C)
• Use of optical magnifiers versus head-mounted magnification systems to improve reading speed (C)
• Use of sensory substitution strategies (e.g., talking books) to maintain engagement in desired occupations (C)
• Use of contrast to improve reading performance: colored overlays (I)
• Use of spectacle reading glasses to improve reading performance (I)
• Use of organizational strategies to compensate for vision loss (I)

Not Recommended
• Colored overlays do not improve reading performance (B)

No Recommendation
• Preferential use of either binocular or monocular viewing for reading performance (I)
• Use of a specific light source (I)

Note A: This practice guideline is based on a systematic review. Each recommendation is followed by the level of evidence supporting it.

*Note: Criteria for levels of evidence are based on the standard language from the Agency for Healthcare Research and Quality (2009). Suggested recommendations are based on the available evidence and content experts' clinical expertise regarding the value of using the intervention in practice.

Definitions: Strength of Recommendations
A–There is strong evidence that occupational therapy practitioners should routinely provide the intervention to eligible clients. Good evidence was found that the intervention improves important outcomes and concludes that benefits substantially outweigh harm.
B–There is moderate evidence that occupational therapy practitioners should routinely provide the intervention to eligible clients. At least fair evidence was found that the intervention improves important outcomes and concludes that benefits outweigh harm.
C–There is weak evidence that the intervention can improve outcomes, and the balance of the benefits and harms may result either in a recommendation that occupational therapy practitioners routinely provide the intervention to eligible clients or in no recommendation, as the balance of the benefits and harm is too close to justify a general recommendation.
D–Recommend that occupational therapy practitioners do not provide the intervention to eligible clients. At least fair evidence was found that the intervention is ineffective or that harm outweighs benefits.
I–Insufficient evidence to determine whether or not occupational therapy practitioners should be routinely providing the intervention. Evidence that the intervention is effective is lacking, of poor quality, or conflicting, and the balance of benefits and harm cannot be determined.

FTE 10-6 Question
Which guidelines are based on the strongest research evidence?

EVIDENCE IN THE REAL WORLD
Clinical Uses for EBP Guidelines

Evidence-based practice (EBP) guidelines can serve many clinical purposes. They can be used in quality improvement activities to ensure that best practices are implemented. Insurance companies often use practice guidelines to make decisions about reimbursement for therapy services. In some cases, denials from insurance companies can be addressed and overturned by use of an evidence-based practice guideline. In addition, practice guidelines can be used to educate clients and families about best practice. Familiarity with available practice guidelines allows the evidence-based practitioner to make better decisions about intervention options that are both effective and reimbursable.

Finding Practice Guidelines

As with systematic reviews, databases such as PubMed and CINAHL can be searched with limits applied to find practice guideline documents.
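For readers who prefer to script their searches, the same kind of publication-type limit can be applied through NCBI's E-utilities interface. The sketch below is illustrative and not from the book; the helper function name and search topic are invented, while the esearch endpoint and the [Publication Type] field tag are standard PubMed conventions.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint for PubMed.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_guideline_query(topic: str) -> str:
    """Build an esearch URL restricted to the Practice Guideline publication type.

    Hypothetical helper for illustration; the topic term is a plain-text query.
    """
    term = f'({topic}) AND "practice guideline"[Publication Type]'
    return EUTILS + "?" + urlencode({"db": "pubmed", "term": term, "retmode": "json"})

url = pubmed_guideline_query("low vision rehabilitation")
print(url)
```

Fetching the printed URL (e.g., with urllib.request) returns JSON listing the PubMed IDs of matching guideline documents, which can then be retrieved or screened by hand.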
In PubMed, you can limit your search to "practice guidelines" under article type; in CINAHL, the limit of "practice guidelines" can be found under publication type. A major source of practice guidelines is the National Guideline Clearinghouse (NGC) at www.guidelines.gov. This website, developed by the Agency for Healthcare Research and Quality of the U.S. Department of Health and Human Services, is an easy-to-search database of practice guidelines for all areas of health care. The guidelines included on the NGC site must meet the following criteria:

• Includes recommendations to optimize patient care
• Developed by a relevant health-care organization and not an individual
• Based on a systematic review of the evidence
• Includes an assessment of benefits and harms of recommended care and alternative approaches
• Developed in the previous 5 years (U.S. Department of Health and Human Services, 2014)

FIGURE 10-2 Practice guidelines use research evidence, opinions from experts in the field, and the perspectives of clients to make recommendations for clinical practice.

Evaluating the Strength of Practice Guidelines

The development of practice guidelines is a complex and lengthy process that requires extensive resources. In part for this reason, practice guidelines are characteristically developed by organizations instead of individuals. In addition, when developed by an organization, the practice guidelines carry the endorsement of that organization. One criterion to use when evaluating the quality of a practice guideline is the reputation and resources of the organization that created it. In addition, the author should be free of possible conflicts of interest.
For example, a practice guideline is suspect if the organization that created it can benefit financially from adoption of an assessment or intervention recommended therein.

The time frame within which the practice guideline was developed and published is another important consideration. Practice changes quickly, so practice guidelines can quickly become outdated. In addition, because it takes a long time to develop a good practice guideline, the evidence used to develop the guidelines may become outdated. Review the reference list or table of studies in the practice guideline to evaluate the timeliness of the research. You might also want to conduct your own search on the topic to determine whether any important studies have been completed since the most recent research cited in the guidelines.

The specific population and setting for which a practice guideline is developed are also important. For example, practice guidelines for community-based practice would not be relevant to practice in hospital settings. In addition, practice guidelines for shoulder rehabilitation will differ for athletic injuries and stroke.

Practice guidelines should be transparent about the process used to review and evaluate the evidence, and the best practice guidelines follow the process of a systematic review. The recommendations provided in a practice guideline will carry more weight than the report of a single RCT because the recommendations in a practice guideline are based on multiple high-level studies with similar results. With replication, the strength of the evidence is greatly increased. It is easier to arrive at a consensus for a practice recommendation when there is consistency across numerous, well-designed studies.

Practice guidelines should undergo a rigorous review process that involves multiple components and many individuals.
Experts in the field, often individuals who have published on the topic, typically constitute the initial review board that develops the methodology, gathers and evaluates the evidence, and drafts the initial recommendations. From there, a group of external reviewers who are independent of the initial process should review the draft results and provide critical feedback. The external reviewers should represent a diverse constituency of researchers, practitioners, and clients.

The recommendations provided within practice guidelines should be stated in such a way that practitioners are given information that is helpful in determining whether a recommendation should be adopted in their practices. This includes not only the level of evidence associated with each recommendation, but also the potential impact of the recommendation (e.g., effect size) and the applicability and generalizability of the recommendation.

The Appraisal of Guidelines for Research and Evaluation (AGREE) Instrument II was developed by an international team as a standardized tool for evaluating practice guidelines (Brouwers et al, 2010). This tool examines the scientific rigor of the guidelines, involvement of relevant stakeholders, and transparency and clarity in the report. Scores can range from 12 to 84. Guidelines are not expected to earn perfect scores, but practice guidelines that have been reviewed with the AGREE II instrument and receive a high score can be considered strong guidelines. Box 10-2 reviews the basic considerations in evaluating the strength of practice guidelines.

EXERCISE 10-3 Evaluating a Practice Guideline (LO3)

Access the following practice guideline on alternative approaches to lowering blood pressure. It is available as a free full-text document.
Brook et al and the American Heart Association Professional Education Committee of the Council for High Blood Pressure Research, Council on Cardiovascular and Stroke Nursing, Council on Epidemiology and Prevention, and Council on Nutrition, Physical Activity. (2013). Beyond medications and diet: Alternative approaches to lowering blood pressure: A scientific statement from the American Heart Association. Hypertension, 61, 1360–1383.

QUESTIONS
Examining each of the considerations, how would you evaluate the practice guidelines on the following points?
1. Reputation and resources of the organization
2. Timeliness
3. Strength of the research evidence
4. Review process
5. Recommendations include supporting evidence

BOX 10-2 Considerations in Evaluating Practice Guidelines
• The reputation and resources of the developing organization
• Timeliness in terms of publication of the practice guidelines and the evidence included within the guidelines
• Strength of the research evidence
• Rigorous review process that includes experts and clients, when appropriate
• Each recommendation includes information on level of supporting evidence, potential impact, and generalizability

THE COMPLEXITIES OF APPLYING AND USING SYSTEMATIC REVIEWS AND PRACTICE GUIDELINES

Systematic reviews and practice guidelines are useful tools for health-care practitioners because they condense large amounts of research evidence and provide direction for practitioners. However, the other components of evidence-based practice—practitioner experience and client preference—are still essential to good clinical decision-making. Even with strong research evidence, practitioners need to determine whether the recommendations of systematic reviews and practice guidelines are relevant and applicable to their particular situation.
For example, a systematic review of health promotion programs for people with serious mental illness recommends that weight-loss interventions be provided for at least 3 months (Bartels & Desilets, 2012). If the practitioner works in a setting with an average length of stay of 4 weeks, the evidence-based interventions described in the review will need to be adapted to the situation.

Although clients should be involved in the decision-making process and informed of systematic reviews and practice guidelines, they should also be offered alternative approaches, when available. Guidelines and reviews may simplify the decision-making process for the practitioner, but it is important to remember that providing the best care is both an art and a difficult process. The best practices involve decisions that are made with the practitioner and client working together (Montori, Brito, & Murad, 2013). For example, within the deaf community there is controversy over the use of cochlear implants. Although cochlear implants may provide profound improvements in hearing, some individuals elect not to receive the implants because they find sign language to be a more effective means of communication that is respectful of their deaf culture. When cochlear implants are used in children, decisions still need to be made regarding the use of sign language versus oral language in parent-child interactions (Bruin & Nevoy, 2014). The integration of research evidence, practitioner experience, and client preferences makes for the best clinical decisions. This process is explained in Chapter 11.

CRITICAL THINKING QUESTIONS
1. When is a systematic review not Level I evidence?
2. Why might different systematic reviews come to different conclusions?
3. Why does publication bias exist?
4. Why is heterogeneity a potential problem in systematic reviews?
5. What effect-size statistic would you expect to find in a meta-analysis of an efficacy study? A predictive study?
6. How are data synthesized from qualitative studies?
7. How do practice guidelines differ from systematic reviews?
8. What are the characteristics of a strong practice guideline?
9. Why should practitioners augment information from practice guidelines with practitioner experience and client preferences?

ANSWERS

EXERCISE 10-1
These answers are based on a search conducted in September 2015. You will likely find more reviews.
1. Cochrane—Two reviews were found, but only one focused on vestibular rehabilitation for vestibular dysfunction. The review indicated growing evidence to support its use.
2. CINAHL—Two reviews were found, including the Cochrane review and one other review, which also found positive outcomes for vestibular rehabilitation.
3. OTseeker—There were no reviews specific to vestibular rehabilitation for vestibular dysfunction.
4. PEDro—There were seven reviews in English, one in German, and one in Portuguese; the reviews included the Cochrane review, and the German and Portuguese reviews included English abstracts. Overall, the results indicated positive outcomes. One review compared the Epley maneuver and vestibular rehabilitation and found the Epley maneuver more effective at 1 week but not at 1-month follow-up.

EXERCISE 10-2
Your language may differ, but the following examples provide an accurate interpretation of the corresponding results, stated in plain language for clients.
1. Five studies of eccentric viewing training all found that it was effective. When the results of the five studies were combined, the intervention was found to have made a moderate difference.
2. People who had very early interventions that involved moving within 36 hours of their stroke were three times more likely to be independent at the end of their treatment than individuals who received standard care.
3. Healthy adults are likely to have normal balance until the age of 70, but after that you are more likely to have balance problems. The link between aging and balance problems is very strong.

EXERCISE 10-3
1. Reputation and resources of the organization—The American Heart Association is a well-established and respected organization that is certified by the National Health Council for excellence as an advocacy organization.
2. Timeliness—The practice guidelines were published in 2013, and the primary research included in the review is through 2011. This speaks to one of the issues with practice guidelines: By the time practice guidelines are published, more recent evidence will probably have been published as well.
3. Strength of the research evidence—Only one intervention provided Level A evidence. As defined in Table 1, this means multiple RCTs with different populations. The intervention with Level A evidence was dynamic aerobic exercise.
4. Review process—Reviewers included clinicians and researchers. A systematic review of the evidence was conducted. There was transparency in terms of potential conflicts of interest reported, with no major concerns identified. No clients were included in the review process.
5. Recommendations include supporting evidence—Extensive supporting evidence was included with each recommendation. Effect sizes were included when available.
Overall, this practice guideline meets most of the criteria for a strong practice guideline.
FROM THE EVIDENCE 10-1
The authors and date of publication; for most studies, the number of participants and in some cases gender and age; the measures/assessments used in the study; and a very brief summary of the findings.

FROM THE EVIDENCE 10-2
1. The objective would be found in the introduction of the full article, whereas data sources, selection, and extraction would fit into the category of methods. The conclusion of the abstract includes information that would be found in both the results and discussion sections of a full article.
2. There are many limitations to this review. Many studies were included in the review (14); however, most were at a low level of evidence, with the exception of one weak randomized controlled trial. Therefore, the review could not be considered Level I evidence. The review is also limited in terms of study heterogeneity and the inclusion of only published literature.

FROM THE EVIDENCE 10-3
Eight of the studies in the review assessed functional disability, but only five examined pain. Given the importance of replication in systematic reviews/meta-analyses, this is an important consideration bearing on the strength of the evidence for a particular outcome.

FROM THE EVIDENCE 10-4
1. Seven out of nine studies had a positive effect. Two of the nine found little to no difference between groups.
2. Campbell (1997) was weighted the most heavily in the meta-analysis and assigned a weight of 31.3%.

FROM THE EVIDENCE 10-5
The existence of two independent reviewers helps to protect the review from bias. For example, each reviewer would rate the quality of each individual study using a rating scale. Then the two reviewers could compare their results; if there are any discrepancies, they would discuss those differences and come to a more informed conclusion.
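The answer above describes two reviewers comparing their quality ratings. Agreement between independent reviewers is often quantified with Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance. The sketch below is illustrative and not from the book; the reviewer ratings are invented for demonstration.

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters assigning categorical ratings to the same items."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each rater's marginal proportions, summed.
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2
    return (observed - expected) / (1 - expected)

# Quality ratings ("high"/"moderate"/"low") for 10 studies by two reviewers.
rev1 = ["high", "high", "moderate", "low", "high",
        "moderate", "low", "high", "moderate", "high"]
rev2 = ["high", "moderate", "moderate", "low", "high",
        "moderate", "low", "high", "high", "high"]
print(f"kappa = {cohens_kappa(rev1, rev2):.2f}")
```

In this invented example the reviewers agree on 8 of 10 studies (80%), but because 38% agreement would be expected by chance, kappa is about 0.68, often interpreted as substantial agreement.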
FROM THE EVIDENCE 10-6
The recommendations with Level A evidence include the use of problem-solving strategies for activities of daily living (ADLs) and instrumental activities of daily living (IADLs), and multicomponent patient education and training to improve occupational performance.

REFERENCES
American Occupational Therapy Association (AOTA). (n.d.). AOTA Press catalogue. Retrieved from http://www.aota.org/en/Publications-News/AOTAPress.aspx
American Physical Therapy Association (APTA). (n.d.). Clinical practice guidelines. Retrieved from http://www.apta.org/InfoforPayers/Resources/OrthopaedicClinicalGuidelines/
Bartels, S., & Desilets, R. (2012). Health promotion programs for people with serious mental illness (Prepared by the Dartmouth Health Promotion Research Team). Washington, DC: SAMHSA-HRSA Center for Integrated Health Solutions.
Brook et al; American Heart Association Professional Education Committee of the Council for High Blood Pressure Research, Council on Cardiovascular and Stroke Nursing, Council on Epidemiology and Prevention, and Council on Nutrition, Physical Activity. (2013). Beyond medications and diet: Alternative approaches to lowering blood pressure: A scientific statement from the American Heart Association. Hypertension, 61, 1360–1383.
Brouwers, M., Kho, M. E., Browman, G. P., Burgers, J. S., Cluzeau, F., Feder, G., . . . Zitzelsberger, L., for the AGREE Next Steps Consortium. (2010). AGREE II: Advancing guideline development, reporting and evaluation in healthcare. Canadian Medical Association Journal, 182, E839–E842.
Bruin, M., & Nevoy, A. (2014). Exploring the discourse on communication modality after cochlear implantation: A Foucauldian analysis of parents' narratives. Journal of Deaf Studies and Deaf Education, 19, 385–399.
ClinicalTrials.gov. (2014). History, policy and laws.
Retrieved from http://clinicaltrials.gov/ct2/about-site/history
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Craig, L. E., Bernhardt, J., Langhorne, P., & Wu, O. (2010). Early mobilization after stroke: An example of an individual patient data meta-analysis of a complex intervention. Stroke, 41, 2632–2636.
Dal Bello-Haas, V., & Florence, J. M. (2013). Therapeutic exercise for people with amyotrophic lateral sclerosis or motor neuron disease. Cochrane Database of Systematic Reviews, 5, Art. No. CD005229. doi:10.1002/14651858.CD005229.pub3
Downs, S. (2014). Normative scores on the Berg Balance Scale decline after age 70 years in healthy community-dwelling people: A systematic review. Journal of Physiotherapy, 60(2), 85–89.
Grey Literature Network Service. (1999). 4th International conference on grey literature: New frontiers in grey literature. Washington, DC: Author.
Holtzman, S., & Beggs, R. T. (2013). Yoga for chronic low back pain: A meta-analysis of randomized controlled trials. Pain Research and Management, 18, 267–272.
Hong, S. P., Park, H., Kwon, J. S., & Yoo, E. (2014). Effectiveness of eccentric viewing training for daily visual activities for individuals with age-related macular degeneration: A systematic review and meta-analysis. NeuroRehabilitation, 34, 587–595.
Hopewell, S., Loudon, K., Clarke, M. J., Oxman, A. D., & Dickersin, K. (2009). Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database of Systematic Reviews, 21(1), MR000006. doi:10.1002/14651858.MR000006.pub3
Kaldenberg, J., & Smallfield, S. (2013). Occupational therapy practice guidelines for older adults with low vision. Bethesda, MD: AOTA Press.
Kent, R. D., & Vorperian, H. K. (2013). Speech impairment in Down syndrome: A review. Journal of Speech, Language and Hearing Research, 56, 127–210.
Kramer, J. M., Olsen, S., Mermelstein, M., Balcells, A., & Lijenquist, K. (2012).
Youth with disabilities perspectives of the environment and participation: A qualitative meta-synthesis. Child Care and Health Development, 38, 763–777.
Lee, H., Tamminen, K. A., Clark, A. M., Slater, L., Spence, J. C., & Holt, N. L. (2015). A meta-study of qualitative research examining determinants of children's independent active free play. International Journal of Behavior, Nutrition and Physical Activity, 12, 5.
Linehan, M. M. (1993). Cognitive behavioral treatment of borderline personality disorder. New York, NY: Guilford Press.
Linehan, M. M., Comtois, K. A., Murray, A. M., Brown, M. Z., Gallop, R. J., Heard, H. L., . . . Lindenboim, N. (2006). Two-year randomized controlled trial and follow-up of dialectical behavior therapy vs therapy by experts for suicidal behaviors and borderline personality disorder. Archives of General Psychiatry, 63, 757–766.
Montori, V. M., Brito, J. P., & Murad, M. H. (2013). The optimal practice of evidence-based medicine: Incorporating patient preferences in practice guidelines. JAMA, 310, 2503–2504.
Pasieczny, N., & Connor, J. (2011). The effectiveness of dialectical behaviour therapy in routine public mental health settings: An Australian controlled trial. Behaviour Research and Therapy, 49, 4–10.
PRISMA. (2009). PRISMA statement. Retrieved from http://prisma-statement.org/PRISMAStatement/Checklist.aspx
Rand, D., Miller, W. C., Yiu, J., & Eng, J. J. (2011). Interventions for addressing low balance confidence in older adults: A systematic review and meta-analysis. Age and Ageing, 40, 297–306.
Stoffers, J. M., Vollm, B. A., Rucker, G., Timmer, A., Huband, N., & Lieb, K. (2012). Psychological therapies for borderline personality disorder. Cochrane Database of Systematic Reviews, 8, CD005652. doi:10.1002/14651858.CD005652.pub2
Thomas, J., & Harden, A. (2008). Method for the thematic synthesis of qualitative research in systematic reviews. BMC Medical Research Methodology, 8, 45.
“Social innovation thrives on collaboration; on doing things with others rather than just to them or for them.”
—Geoff Mulgan, Chief Executive of the National Endowment for Science, Technology and the Arts

11 Integrating Evidence From Multiple Sources
Involving Clients and Families in Decision-Making

CHAPTER OUTLINE
Introduction
Client-Centered Practice
Shared Decision-Making
Education and Communication
Components of the Process
People Involved
Engaging the Client in the Process
Consensus Building
Agreement
Decision Aids
Content
Resources for Shared Decision-Making
Critical Thinking Questions
Answers
References

LEARNING OUTCOMES
1. Describe how therapists can engage clients in the process of making treatment decisions, including sharing intervention evidence.
2. Locate or create client-friendly resources to facilitate shared decision-making regarding a particular condition.

KEY TERMS
client-centered practice
decision aids
shared decision-making

INTRODUCTION

In the first chapter of this text, evidence-based practice was described as the integration of information from the research evidence, practitioner experience, and client values and preferences.
However, the bulk of the remaining chapters focused on how to consume research effectively. Much of your current educational experience is informed by practitioner experience, and you will cultivate your own experiences in clinical education and eventually your own practice. Yet the inclusion of client values and preferences is perhaps the least understood component of evidence-based practice and potentially the most difficult to implement (Hoffmann, Montori, & Del Mar, 2014). Including the client’s values and preferences is consistent with a broader approach known as client-centered practice. This chapter looks closely at the components of client-centered practice that are specific to evidence-based practice, and does so by emphasizing a process known as shared decision-making.

CLIENT-CENTERED PRACTICE

Client-centered practice is an approach to practice that attaches importance to the individual’s values and preferences, and respects the expertise that the person brings to the situation in the form of lived experience. Hammell (2013) claims that the central construct of client-centered practice is respect, which includes respect for clients and their experiences, knowledge, and right to make choices about their own lives. Clients want to be listened to, cared for, and valued. Client-centered practice can enhance therapy outcomes because clients must engage in the process in order to benefit from therapy. Engagement includes attending appointments, participating in therapy, and following up with recommendations. When the practitioner engages clients at the outset of the decision-making process, they may be more likely to fully participate in assessment, intervention, and follow-up. Partly because it is a relatively new concept, and partly because it can be difficult to measure, only a limited number of studies are available that examine client-centered practice in therapy.
Furthermore, the exact meaning of client-centered therapy can differ greatly in application. However, the two studies described here illustrate the importance of incorporating client preferences into one’s practice: In both of the studies, the client preference was inconsistent with the research evidence. Cox, Schwartz, Noe, and Alexander (2011) conducted a study to determine the best candidates for bilateral hearing aids. They found that audiometric hearing loss and auditory lifestyle were poor predictors, as many individuals who were good candidates (based on the research evidence) for bilateral hearing aids were found at follow-up to have chosen to wear only one. The study conclusions emphasized the importance of collaboration with the client for determining preferences and making decisions on unilateral versus bilateral hearing aids. Sillem, Backman, Miller, and Li (2011) compared two different splints (custom made and prefabricated) for thumb osteoarthritis, using a crossover study design in which all participants had the opportunity to wear both splints. There was no difference in hand-function outcomes between the two splints, but the custom-made splint had better outcomes for pain reduction. However, a larger number of individuals expressed a preference for the prefabricated splint. The researchers concluded that ultimately the client should make the decision about which splint to use. The therapy process involves many decisions: which intervention approach to pursue, which assistive device to select, what discharge plan to follow, and so on. When therapists and clients approach clinical decisions jointly, better outcomes are possible, because both parties have the opportunity to bring their respective expertise to the situation to make an informed choice. This process, known as shared decision-making, has been explicated such that there are models and methods for sharing evidence with clients.
SHARED DECISION-MAKING

The therapist-client relationship is one of unequal positions, because the therapist is potentially in a position to assume a paternalistic or authoritarian role and determine the course of the client’s therapy. Therapists can be unaware of their position of power and the behaviors they engage in that may exploit that power (Sumsion & Law, 2006). For example, developing an intervention plan without input from the client, or administering an intervention without explaining it to the client, are behaviors that ignore the client’s role in the therapy process. In contrast, the practice of shared decision-making promotes equality and mutuality between therapist and client. Because therapists are in a position of power, it is incumbent upon them to intentionally create an environment in which collaboration can occur. Shared decision-making creates a common ground for the confluence of evidence-based practice and client-centered practice (Fig. 11-1). At the core of shared decision-making is a process of information exchange.

FIGURE 11-1 Shared decision-making occurs when evidence-based practice (clinical expertise and research evidence) and client-centered practice (values, preferences, lifestyle, and knowledge of the unique situation) come together. (ThinkStock/iStock/Katarzyna Bialasiewicz.)

The professional brings information about the clinical condition and options for intervention, including the risks and benefits of each option. Clients bring their own information, which includes values, preferences, lifestyle, and knowledge of their situation. The therapist and client then work collaboratively to arrive at clinical decisions. Some clinical situations are more amenable to shared decision-making than others. For example, in some cases only one treatment is available or the research evidence strongly supports only one approach.
However, even if only one choice exists, choosing no treatment is still an alternative. Many conditions may be addressed with more than one approach and, when options exist, it is important to share those choices with the client. In other situations, the therapist may believe it is unethical to provide a particular intervention; perhaps there is no clear benefit, and accepting payment would be unjustified, or there may be potential adverse consequences. In this case, a principled approach involves discussing concerns with the client. If the issues cannot be resolved, the therapist may opt out of providing therapy for that particular client. Some clients do not wish to participate in the decision-making process, whereas others highly value the opportunity. Edwards and Elwyn (2006) described shared decision-making as involving the individual in decision-making to the extent that he or she so desires. In fact, though, it can be challenging for the therapist to determine the extent to which a client wants to be involved in the process. Figure 11-2 illustrates the continuum of the client’s participation in the decision-making process. At one end of the continuum is the client who cannot or does not want to be involved in the decision-making process. Infants, individuals in a coma, and individuals in the later stages of Alzheimer’s disease are examples of clients who are unable to make their own decisions; in these instances, caregivers may assume that position. Still, there will likely be situations in which a caregiver is not involved or available to participate in the decision-making process. Other clients may be capable of making decisions, but choose for the therapist to be in control of the process. In that case, the preference to not be involved is a choice that should be respected (Gafni & Charles, 2009). At the other end of the continuum is the client who is well informed and empowered to make decisions about his or her own health care.
Most clients will fall somewhere in between these two extremes, with some level of interest in obtaining information about options and a desire to participate in the discussion. There is a place for shared decision-making at all points in the continuum. Even if a client does not want to be involved in the process, the therapist should consider that client’s preferences and unique life experiences and situation.

FIGURE 11-2 Continuum of client involvement in the decision-making process and role of the therapist. The continuum spans four positions: (1) the client/family has no involvement in decision-making, and the therapist gathers information about the client and family and incorporates this into decision-making; (2) the client wants information but does not want to make decisions, and the therapist gathers and shares information and utilizes knowledge about the client and family when making decisions; (3) the client wants information and wants to make some decisions, relying on the therapist for other decisions, and the therapist gathers and shares information and collaborates with the client in the decision-making process; (4) the client is fully involved and makes all decisions, and the therapist gathers and shares information and respects the client’s decisions.

The collaboration between client and therapist differs depending on the client’s desired level of participation. One study suggested that what the client valued most in the shared decision-making process was not the opportunity to make a collaborative decision, but the fact that he or she was listened to and that the health-care professional provided information that was easy to understand (Longo et al, 2006). Consider the example of supervised physical therapy versus a standardized home exercise program for individuals with total knee arthroplasty. There is evidence suggesting that comparable outcomes for range of motion and functional performance are achieved with both interventions (Büker et al, 2014).
For clients who would prefer for the therapist to make the decision, the practitioner might make different recommendations for an individual with a history of regular involvement in physical activity than for one who finds exercise boring and unpleasant. The therapist would also take into account other unique considerations applicable to each particular client, such as availability of transportation to therapy, insurance reimbursement, and family support. For clients who want to make the decision independently, the therapist could present the options of outpatient therapy versus a home program, explaining the research evidence and the pragmatic issues of cost and inconvenience. The client in the middle would likely desire all of this information in addition to input from the therapist as to how best to proceed.

EDUCATION AND COMMUNICATION

Most clients are unfamiliar with the concepts surrounding shared decision-making and will simply expect the therapist to prescribe a course of treatment. For this reason, the process of shared decision-making should begin with education. The therapist may begin by describing some of the decisions that will need to be made during the course of therapy, and emphasize that the decision-making process involves information exchange and discussion concerning these options. The first decision might focus on how involved the client would like to be in the process. Shared decision-making necessitates a progression of communication that follows a specific course: listen, speak, and then listen again. The therapist listens to the client regarding the client’s values, preferences, and particular life circumstances. In the context of the information provided by the client, the therapist presents options and explains the pros and cons of each. The therapist then listens to the client as he or she weighs the options. The therapist and client continue the discussion, which is focused on ultimately making a treatment decision.
In the Evidence in the Real World box below, Judy is managing at work, although it is difficult. She just works through the pain, but finds that after work she has few resources left to take care of her home, enjoy her grandchildren, or engage in previously valued leisure activities such as swimming and gardening. She spends more time than she would like to in bed or watching television. During the speaking phase, you provide Judy with some options for intervention, along with the evidence. You discuss mind/body interventions, such as relaxation and mindfulness, resistance training, aerobic training, and aquatic exercise, and then you provide Judy with the handout shown in Figure 11-3, which outlines the evidence in a simplified form. Overall, the results indicate that the exercise interventions have better outcomes than the mind/body interventions, but in all cases the improvements are relatively modest. You then ask Judy if she has any additional questions about these options and to express what she is thinking. After some discussion, the two of you decide to start with an aquatic exercise program, based on Judy’s interest in swimming and the evidence suggesting that she may experience some pain relief from this activity.

EVIDENCE IN THE REAL WORLD
Shared Decision-Making

The following scenario illustrates the process of shared decision-making, including the types of questions one might ask to facilitate communication. Judy (59 years old) has fibromyalgia and experiences significant pain. You know that she is married and currently working. To gather some information on Judy’s personal preferences and values, you prepare some questions; this is the first listen phase. Some questions you ask include:

1. How does fibromyalgia affect your work performance?
2. What would you like to do that you are currently not doing?
3. How are you currently managing the pain?
4. When you are feeling well, what sort of activities do you like to engage in?
Interventions for Fibromyalgia

The following table summarizes the effectiveness of four different types of interventions for fibromyalgia based on the results of four Cochrane summaries. These include: mind/body interventions such as relaxation, mindfulness, and biofeedback (Theadom et al., 2015); resistance training, such as weight lifting (Busch et al., 2013); aerobic exercise (most of the studies used walking) (Busch et al., 2007); and aquatic exercise (Bidonde et al., 2014).

FIGURE 11-3 Sample handout outlining potential interventions for fibromyalgia. For each of the four interventions, the handout tabulates improvements on a 100-point scale (interventions were compared with usual care) for the outcomes of physical function, well-being, and pain, along with the quality of the research and the frequency of the intervention.

EXERCISE 11-1 Crafting Questions to Elicit Information (LO1)

QUESTIONS
Select one of the three following scenarios and write four questions that you could ask to engage clients in the decision-making process and elicit information from them and/or their families regarding values and preferences. Remember that it is useful to ask open-ended questions that are respectful of the clients’ knowledge and expertise about their own personal experience.

a. A grade-school child with autism who has difficulty making friends at school
b. A young adult with traumatic brain injury who is returning home and needs help with maintaining a schedule
c. An athlete with chronic joint pain

1.
2.
3.
4.
Components of the Process

In their seminal article on shared decision-making, Charles, Gafni, and Whelan (1997) identified four components:

1. The interaction involves at least two people, including the therapist and client.
2. Both individuals take steps to participate in decision-making.
3. Both individuals participate in consensus building.
4. An agreement is reached on the treatment to implement.

Each of these components is described in the following subsections and applied to an example of an individual with multiple sclerosis who is receiving physical therapy.

People Involved

The inclusion of two people, including the therapist and client, is an obvious component, but it is important to recognize that often more than two people will be involved. From the clinical side, therapists typically work on a team, and other team members may be invested in the treatment decision. On the client’s side, there may be family members or other individuals who are concerned about the treatment decision. It is especially important to involve other individuals who are close to the client when they assume caregiving responsibilities and will be directly involved in carrying out the intervention, and when they are impacted by the intervention choice.

Jim, who has multiple sclerosis, recently experienced a flare-up and was referred to physical therapy to address issues related to mobility and balance. He was also referred to occupational therapy for energy conservation and speech-language therapy for problems with swallowing. Jim lives at home with his wife, who is very involved and supportive in his treatment. Optimally, shared decision-making would include the physical therapist, occupational therapist, speech-language therapist, Jim, and his wife.

Engaging the Client in the Process

The second component involves both individuals taking steps to participate in the decision-making process.
However, some clients are not prepared to make decisions about their therapy options. Many will come to the situation expecting or expressly wanting the health-care professional to make all the decisions. In actuality, the health-care professional will likely be the individual to establish the norms of the interaction. Because most clients do not expect to be involved in decision-making, the therapist will need to set the stage. Therefore, it becomes even more important for the therapist to create an environment in which the client is both comfortable and empowered to take an active role. The therapist can use many different strategies to promote collaboration. The physical environment in which discussions take place is important. A space that is free from distraction and allows for private discussion will facilitate greater connection between therapist and client. The manner in which the therapist communicates can go a long way toward promoting inclusion of the client in shared decision-making. The therapist’s nonverbal cues, such as an open and forward-leaning posture, eye contact, and head nods, can indicate warmth and attentiveness. Open-ended questions that promote discussion are more useful than questions that can be answered with single-word responses, and active listening with paraphrasing will likely encourage more openness on the part of the client. Once the therapist educates the client about shared decision-making, he or she can determine the client’s desired level of involvement.

The physical therapist schedules a meeting with the other therapists, Jim, and his wife. They meet in a quiet conference room and spend some time getting to know each other. Shared decision-making is explained, and Jim indicates that he is on board with the idea of participating in the process. After the recent flare-up of his disease, he expresses a desire to take a more active role in his rehabilitation and overall wellness.
Consensus Building

During consensus building, therapist and client share relevant information. The therapist can begin the process by asking the client to talk about his or her values and preferences, including identifying desired outcomes. Outcomes should be discussed broadly and include impairment issues, such as strength, fatigue, pain, and cognition, as well as activity and participation limitations. Particularly important is identifying any activities that the client wants or needs to participate in, but which are currently hampered by the condition. In addition, it is important for the therapist to learn about the client’s situation, including financial resources, social supports, and living arrangements. Once the client shares this information, the therapist can provide a summary to ensure that the client was understood correctly. Then the client and therapist can work on establishing priorities for therapy. The therapist shares information about intervention options, pragmatic issues associated with these options (e.g., costs and time commitment), and the evidence to support the interventions. The information should be presented in language that is accessible to the client and in an unbiased manner. It is easy for the therapist to just present his or her preference—and this should be included if requested by the client—but for true shared decision-making to occur, the client must have a real choice in the matter. This requires therapists to avoid presenting their preferences as the only real option.

Jim explains that he is concerned about falling, and that this fear makes him less likely to leave his home. He is particularly reluctant to be in places where the ground is unstable and where crowds are present. This fear has had a very negative impact on his quality of life, because Jim enjoys the outdoors and bird watching—both activities he shares with his wife.
The physical therapist describes numerous approaches to address balance and mobility, including strengthening exercises, aquatic therapy, Tai Chi, and vestibular rehabilitation. These approaches are discussed in the context of the other therapies that Jim will be receiving, in collaboration with the occupational and speech-language therapists.

EXERCISE 11-2 Sharing Evidence With Clients and Families (LO2)

You are working with a child who has ADHD and motor delays, and the family asks you about Interactive Metronome therapy. You find three studies (see the following abstracts).

STUDY #1
Shaffer, R. J., Jacokes, L. E., Cassily, J. F., Greenspan, S. I., Tuchman, R. F., & Stemmer, P. J., Jr. (2001, March-April). Effect of Interactive Metronome Training on children with ADHD. American Journal of Occupational Therapy, 55(2), 155–162.

Objective
The purpose of this study was to determine the effects of a specific intervention, the Interactive Metronome, on selected aspects of motor and cognitive skills in a group of children with attention deficit hyperactivity disorder (ADHD).

Method
The study included 56 boys who were 6 years to 12 years of age and diagnosed before they entered the study as having ADHD. The participants were pretested and randomly assigned to one of three matched groups. A group of 19 participants receiving 15 hr of Interactive Metronome training exercises were compared with a group receiving no intervention and a group receiving training on selected computer video games.

Results
A significant pattern of improvement across 53 of 58 variables favoring the Interactive Metronome treatment was found. Additionally, several significant differences were found among the treatment groups and between pretreatment and posttreatment factors on performance in areas of attention, motor control, language processing, reading, and parental reports of improvements in regulation of aggressive behavior.
Conclusion
The Interactive Metronome training appears to facilitate a number of capacities, including attention, motor control, and selected academic skills, in boys with ADHD.

STUDY #2
Bartscherer, M. L., & Dole, R. L. (2005). Interactive Metronome Training for a 9-year-old boy with attention and motor coordination difficulties. Physiotherapy Theory & Practice, 21(4), 257–269.

The purpose of this case report is to describe a new intervention, the Interactive Metronome, for improving timing and coordination. A nine-year-old boy with difficulties in attention and developmental delay of unspecified origin underwent a seven-week training program with the Interactive Metronome. Before, during, and after training, timing accuracy was assessed with testing procedures consistent with the Interactive Metronome training protocol. Before and after training, his gross and fine motor skills were examined with the Bruininks-Oseretsky Test of Motor Proficiency (BOTMP). The child exhibited marked change in scores on both timing accuracy and several BOTMP subtests. Additionally, his mother relayed anecdotal reports of changes in behavior at home. This child’s participation in a new intervention for improving timing and coordination was associated with changes in timing accuracy, gross and fine motor abilities, and parent-reported behaviors. These findings warrant further study.

STUDY #3
Cosper, S. M., Lee, G. P., Peters, S. B., & Bishop, E. (2009). Interactive Metronome Training in children with attention deficit and developmental coordination disorders. International Journal of Rehabilitation Research, 32(4), 331–336.
doi:10.1097/MRR.0b013e328325a8cf

The objective of this study was to examine the efficacy of Interactive Metronome (Interactive Metronome, Sunrise, Florida, USA) training in a group of children with mixed attentional and motor coordination disorders to further explore which subcomponents of attentional control and motor functioning the training influences. Twelve children who had been diagnosed with attention deficit hyperactivity disorder, in conjunction with either developmental coordination disorder (n=10) or pervasive developmental disorder (n=2), underwent 15 1-h sessions of Interactive Metronome training over a 15-week period. Each child was assessed before and after the treatment using measures of attention, coordination, and motor control to determine the efficacy of training on these cognitive and behavioral realms. As a group, the children made significant improvements in complex visual choice reaction time and visuomotor control after the training. There were, however, no significant changes in sustained attention or inhibitory control over inappropriate motor responses after treatment. These results suggest Interactive Metronome training may address deficits in visuomotor control and speed, but appears to have little effect on sustained attention or motor inhibition.

QUESTIONS
How might you present these findings accurately and accessibly to the parents? Consider the levels of evidence when assessing the studies. Write a few statements that you could use when discussing the evidence with the family and client.

Agreement

Shared decision-making results in an agreement about the direction that intervention will take. This does not necessarily mean that both parties conclude that the intervention chosen is the best one or that both parties are equally involved in the decision.
For example, after the process of sharing information, the client may still ask the therapist to make the final decision. Or, the client may choose an approach that is not the therapist’s preferred choice, although the intervention fits well with the client’s situation and is one in which the client will more likely engage. In a less desirable scenario, the client may choose an option that the therapist is not equipped to provide or one that the therapist feels will result in negative outcomes, in which case the therapist may decline to provide the intervention. Shared decision-making is a fluid process that differs with each client encounter.

Jim was initially very interested in aquatic therapy, but decided against this option when he learned that the evidence was only fair for its efficacy in improving balance for individuals with multiple sclerosis (Marinho-Buzelli, Bonnyman, & Verrier, 2014). In addition, the clinic that provided aquatic therapy was far from his home. Using information from practice guidelines (Latimer-Cheung et al, 2013), Jim and his physical therapist decide on an intervention that incorporates aerobic and strength training two times a week, which Jim can eventually do on his own at home with the support of his wife. He is also interested in using a cane for everyday mobility and is considering a wheelchair for those situations in which he feels the most insecure.

Decision Aids

Decision aids are materials that provide information to the client to support the decision-making process. They may take the form of written materials, videos, interactive Internet presentations, or other resources. A Cochrane review (Stacey et al, 2011) of the use of decision aids found that they increased client involvement, improved knowledge, created a more realistic perception of outcomes, and reduced discretionary surgery. In addition, the review found no adverse effects from use of decision aids.
At the time of this writing, decision aids are just beginning to be developed for health-care decisions, and there are few decision aids available in the field of rehabilitation; however, existing aids in other areas of health care can serve as a model. Therapists can develop their own decision aids, in the form of written materials to provide to clients when engaging in shared decision-making. The abstract in From the Evidence 11-1 describes one research team’s process of developing and evaluating decision aids for individuals with multiple sclerosis. In addition to increasing clients’ involvement in the decision-making process, the authors indicate that their research will examine the quality of the decisions made.

Content

Decision aids generally provide several types of information, including:

1. Explanation of the condition, to help the client understand his or her condition and how interventions can work to target specific aspects of it.
2. Identification of the decision that needs to be made, including the different interventions that are available and the option of no intervention.
3. Options and potential outcomes based on scientific evidence. This will likely be the most lengthy section of the decision aid. It includes a summary of the evidence for each option and often identifies the number of individuals who are expected to improve given a particular situation or the percent of improvement that can be expected.
4. Questions to help clients clarify their values. This is often written in a workbook format, with space for clients to write their responses.
In rehabilitation, the questions might focus on activities the individual client wants to return to, desired outcomes, the most troublesome aspects of the condition, living arrangements, social support, financial resources, insurance coverage, and so on.

Resources for Shared Decision-Making

An excellent source of information about shared decision-making is the Ottawa Hospital Research Institute, which provides an online inventory of decision aids that can be searched by health topic (http://decisionaid.ohri.ca/decaids.html). It also includes information on the importance of shared decision-making and offers online tutorials for developing a decision aid based on the Ottawa development process.

At the time of this writing, the Mayo Clinic is in the process of developing decision aids. One example focuses on bone health: http://osteoporosisdecisionaid.mayoclinic.org/index.php/site/index. This aid helps the individual assess the risk of having a fracture and evaluate different treatment options for preventing fractures. As with many decision aids, this one presents the benefits and risks associated with deciding not to intervene.

FROM THE EVIDENCE 11-1 Article Discussing Development of Decision Aids

Heesen, C., Solari, A., Giordano, A., Kasper, J., & Kopke, S. (2011). Decisions on multiple sclerosis immunotherapy: New treatment complexities urge patient engagement. Journal of Neurological Sciences, 306, 2192–2197.

For patients with multiple sclerosis (MS), involvement in treatment decisions becomes ever more imperative. Recently new therapeutic options have become available for the treatment of MS, and more will be licensed in the near future. Although more efficacious and easier to administer, the new drugs pose increased risks of severe side effects. Also, new diagnostic criteria lead to more and earlier MS diagnoses. Facing increasingly complex decisions, patients need up-to-date evidence-based information and decision support systems in order to make informed decisions together with physicians based on their autonomy preferences. This article summarizes recently terminated and ongoing trials on MS patient education and decision aids conducted by the authors' study groups. Programs on relapse management, immunotherapy, and for patients with suspected and early MS have been developed and evaluated in randomized controlled clinical trials. It could be shown that the programs successfully increase knowledge and allow patients to make informed decisions based on their preferences. For the near future, we aim to develop a modular program for all relevant decisions in MS to increase patients' self-management and empower patients to develop their individual approach with the disease. Faced by a disease with many uncertainties, this should enhance patients' sense of control. Still, it remains a challenge to adequately assess decision quality. Therefore, a study in six European and one Australian centers will start soon aiming to establish adequate tools to assess decision-making quality.

Note A: This abstract indicates that decision aids are important but also notes that the process of developing them is complex and in its early stages.

FTE 11-1 Question: Identify a particular area of practice in which you think a decision aid would be helpful. Explain why you chose this area.

Common Ground is a comprehensive web application program that prepares individuals with psychiatric conditions to meet with their psychiatrist and treatment team and arrive at decisions (https://www.patdeegan.com/commonground/about). It includes applications for shared decision-making and decision aids with a focus on decisions about psychiatric medications.
It provides an excellent example of a process for gathering information about client preferences and values and engaging in collaborative decision-making.

These sites are useful to explore for examples of decision aids, and existing decision aids can provide a useful template for developing one's own.

EXERCISE 11-3 Creating Decision Aids for Clients and Families (LO2)

Read the conclusions from the following Cochrane review regarding interventions to reduce falls in older adults.

Gillespie, L. D., Robertson, M. C., Gillespie, W. J., Sherrington, C., Gates, S., Clemson, L. M., & Lamb, S. E. (2012). Interventions for preventing falls in older people living in the community. Cochrane Database of Systematic Reviews, 9, CD007146. doi:10.1002/14651858.CD007146.pub3

Background: Approximately 30% of people over 65 years of age living in the community fall each year. This is an update of a Cochrane review first published in 2009.

Objectives: To assess the effects of interventions designed to reduce the incidence of falls in older people living in the community.

Search Methods: We searched the Cochrane Bone, Joint and Muscle Trauma Group Specialised Register (February 2012), CENTRAL (The Cochrane Library 2012, Issue 3), MEDLINE (1946 to March 2012), EMBASE (1947 to March 2012), CINAHL (1982 to February 2012), and online trial registers.

Selection Criteria: Randomised trials of interventions to reduce falls in community-dwelling older people.

Data Collection and Analysis: Two review authors independently assessed risk of bias and extracted data. We used a rate ratio (RaR) and 95% confidence interval (CI) to compare the rate of falls (e.g., falls per person year) between intervention and control groups. For risk of falling, we used a risk ratio (RR) and 95% CI based on the number of people falling (fallers) in each group. We pooled data where appropriate.
Main Results: We included 159 trials with 79,193 participants. Most trials compared a fall prevention intervention with no intervention or an intervention not expected to reduce falls. The most common interventions tested were exercise as a single intervention (59 trials) and multifactorial programmes (40 trials). Sixty-two per cent (99/159) of trials were at low risk of bias for sequence generation, 60% for attrition bias for falls (66/110), 73% for attrition bias for fallers (96/131), and only 38% (60/159) for allocation concealment. Multiple-component group exercise significantly reduced rate of falls (RaR 0.71, 95% CI 0.63 to 0.82; 16 trials; 3622 participants) and risk of falling (RR 0.85, 95% CI 0.76 to 0.96; 22 trials; 5333 participants), as did multiple-component home-based exercise (RaR 0.68, 95% CI 0.58 to 0.80; seven trials; 951 participants and RR 0.78, 95% CI 0.64 to 0.94; six trials; 714 participants). For Tai Chi, the reduction in rate of falls bordered on statistical significance (RaR 0.72, 95% CI 0.52 to 1.00; five trials; 1563 participants) but Tai Chi did significantly reduce risk of falling (RR 0.71, 95% CI 0.57 to 0.87; six trials; 1625 participants). Multifactorial interventions, which include individual risk assessment, reduced rate of falls (RaR 0.76, 95% CI 0.67 to 0.86; 19 trials; 9503 participants), but not risk of falling (RR 0.93, 95% CI 0.86 to 1.02; 34 trials; 13,617 participants). Overall, vitamin D did not reduce rate of falls (RaR 1.00, 95% CI 0.90 to 1.11; seven trials; 9324 participants) or risk of falling (RR 0.96, 95% CI 0.89 to 1.03; 13 trials; 26,747 participants), but may do so in people with lower vitamin D levels before treatment. Home safety assessment and modification interventions were effective in reducing rate of falls (RaR 0.81, 95% CI 0.68 to 0.97; six trials; 4208 participants) and risk of falling (RR 0.88, 95% CI 0.80 to 0.96; seven trials; 4051 participants).
These interventions were more effective in people at higher risk of falling, including those with severe visual impairment. Home safety interventions appear to be more effective when delivered by an occupational therapist. An intervention to treat vision problems (616 participants) resulted in a significant increase in the rate of falls (RaR 1.57, 95% CI 1.19 to 2.06) and risk of falling (RR 1.54, 95% CI 1.24 to 1.91). When regular wearers of multifocal glasses (597 participants) were given single lens glasses, all falls and outside falls were significantly reduced in the subgroup that regularly took part in outside activities. Conversely, there was a significant increase in outside falls in intervention group participants who took part in little outside activity. Pacemakers reduced rate of falls in people with carotid sinus hypersensitivity (RaR 0.73, 95% CI 0.57 to 0.93; three trials; 349 participants) but not risk of falling. First eye cataract surgery in women reduced rate of falls (RaR 0.66, 95% CI 0.45 to 0.95; one trial; 306 participants), but second eye cataract surgery did not. Gradual withdrawal of psychotropic medication reduced rate of falls (RaR 0.34, 95% CI 0.16 to 0.73; one trial; 93 participants), but not risk of falling. A prescribing modification programme for primary care physicians significantly reduced risk of falling (RR 0.61, 95% CI 0.41 to 0.91; one trial; 659 participants). An anti-slip shoe device reduced rate of falls in icy conditions (RaR 0.42, 95% CI 0.22 to 0.78; one trial; 109 participants). One trial (305 participants) comparing multifaceted podiatry including foot and ankle exercises with standard podiatry in people with disabling foot pain significantly reduced the rate of falls (RaR 0.64, 95% CI 0.45 to 0.91) but not the risk of falling.
There is no evidence of effect for cognitive behavioural interventions on rate of falls (RaR 1.00, 95% CI 0.37 to 2.72; one trial; 120 participants) or risk of falling (RR 1.11, 95% CI 0.80 to 1.54; two trials; 350 participants). Trials testing interventions to increase knowledge/educate about fall prevention alone did not significantly reduce the rate of falls (RaR 0.33, 95% CI 0.09 to 1.20; one trial; 45 participants) or risk of falling (RR 0.88, 95% CI 0.75 to 1.03; four trials; 2555 participants). No conclusions can be drawn from the 47 trials reporting fall-related fractures. Thirteen trials provided a comprehensive economic evaluation. Three of these indicated cost savings for their interventions during the trial period: home-based exercise in over 80-year-olds, home safety assessment and modification in those with a previous fall, and one multifactorial programme targeting eight specific risk factors.

Authors' Conclusions: Group and home-based exercise programmes, and home safety interventions reduce rate of falls and risk of falling. Multifactorial assessment and intervention programmes reduce rate of falls but not risk of falling; Tai Chi reduces risk of falling. Overall, vitamin D supplementation does not appear to reduce falls but may be effective in people who have lower vitamin D levels before treatment.

QUESTIONS

Complete the following table, with the goal of including it in a decision aid that interprets the rate ratios and risk ratios for clients and families for a multicomponent group exercise program, a multicomponent home-based exercise program, Tai Chi, and home safety assessment and modifications provided by an occupational therapist. Identify the percentage of fall reduction for each category. (You might want to refer to Chapter 8 to review the information on risk ratios. Risk ratios and rate ratios can be especially challenging to interpret when they are less than 1.0. You must subtract the ratio from 1.0 to get the percentage. For example, if the risk ratio = 0.85, there is a 15% decrease in the risk of falling.)

Intervention / Reduced the Rate of Falling (the Number of Times Individuals Fell Within a Year) / Reduced the Risk of Falling (the Risk That an Individual Would Have a Fall/Be a Faller)
• Multicomponent group exercise program: ___ / ___
• Multicomponent home-based exercise program: ___ / ___
• Tai Chi: ___ / ___
• Home safety assessment and modification by an OT: ___ / ___

CRITICAL THINKING QUESTIONS

1. What is the relationship between evidence-based practice, client-centered practice, and shared decision-making?
2. How can therapists modify the shared decision-making process for clients who do not want to be involved in decision-making?
3. What steps might therapists take to ensure that they do not introduce their own biases into the decision-making process? When is it acceptable to inform a client of one's own preferences?
4. Why are decision aids useful for facilitating shared decision-making?
5. What barriers do therapists face when using shared decision-making and decision aids?

ANSWERS

EXERCISE 11-1

Here are samples of potential answers. Remember that they should be open-ended.

a) A grade-school child with autism who has difficulty making friends at school
To ask of the child:
• What kinds of things do you like to do for fun?
• Who do you like to play with?
To ask of the parent/caregiver:
• What does a typical day for you and your child look like?
• How much time do you have for providing therapy at home?
• What do you know about your child's experiences with other children at school?

b) A young adult with traumatic brain injury who is returning home and needs help with maintaining a schedule
• What kind of technology do you currently use?
• What do you like and not like about using technology?
• How important is being on time and having a regular schedule to you?
• When is maintaining a schedule most difficult for you?
• What other people are involved in your life and schedule?

c) An athlete with chronic joint pain
• How important is it to you to be able to return to your sport?
• What other athletic endeavors are you interested in?
• When you start to feel pain, how do you respond?
• How easy is it for you to follow a prescribed treatment program?
• How do you like to exercise (for example, alone or with others)?
• How does your injury affect other aspects of your daily life?

EXERCISE 11-2

Examples of statements that could be made to present evidence to the client's parents include: There are very few studies that look at the effectiveness of Interactive Metronome training. Only one study compared Interactive Metronome training to other treatments. There is very limited evidence that it may improve some areas of attention and motor skills, but the studies were very small, and at this time there is not enough evidence to recommend or not recommend it as an option. Are you interested in hearing about other approaches with more research evidence?

EXERCISE 11-3

The percentages are derived from the rate ratios and risk ratios provided in the study summary. Because you consider the intervention to be effective if it reduces the rate of falling or risk of falling, a ratio of less than 1.0 (where the confidence interval does not include 1.0) indicates an effective intervention. To determine the amount of reduction, you subtract the ratio from 1.0. For example, the rate ratio for group exercise was 0.71, so 1.0 – 0.71 = 0.29, or 29%.
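For readers who like to verify such conversions programmatically, the arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch only; the function names are my own, and the figures come from the Gillespie et al. (2012) summary quoted in the exercise.

```python
def rate_ratio(events_tx, exposure_tx, events_ctl, exposure_ctl):
    """Rate ratio (RaR): the event rate in the intervention group divided by
    the event rate in the control group (e.g., falls per person-year)."""
    return (events_tx / exposure_tx) / (events_ctl / exposure_ctl)

def percent_reduction(ratio):
    """Interpret a rate/risk ratio below 1.0 as a percentage reduction by
    subtracting it from 1.0 (e.g., 0.85 -> 15% decrease)."""
    return round((1.0 - ratio) * 100)

def is_significant(ci_low, ci_high):
    """A ratio is conventionally treated as statistically significant when
    its 95% confidence interval does not include 1.0."""
    return not (ci_low <= 1.0 <= ci_high)

# Multicomponent group exercise (Gillespie et al., 2012): RaR 0.71, 95% CI 0.63-0.82
rar, lo, hi = 0.71, 0.63, 0.82
if is_significant(lo, hi):
    print(f"Rate of falls reduced by about {percent_reduction(rar)}%")  # 29%

# Hypothetical counts, purely to show where a rate ratio comes from:
# 71 falls over 100 person-years vs. 100 falls over 100 person-years
print(round(rate_ratio(71, 100.0, 100, 100.0), 2))  # 0.71
```

Note that Tai Chi's rate ratio (0.72, 95% CI 0.52 to 1.00) fails the `is_significant` check because the interval reaches 1.0, which matches the review's description of a result that "bordered on statistical significance."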
Intervention / Reduced the Rate of Falling (the Number of Times Individuals Fell Within a Year) / Reduced the Risk of Falling (the Risk That an Individual Would Have a Fall/Be a Faller)
• Multicomponent group exercise program: 29% / 15%
• Multicomponent home-based exercise program: 32% / 22%
• Tai Chi: 28% / 29%
• Home safety assessment and modification by an OT: 19% / 12%

FROM THE EVIDENCE 11-1

There is no single correct answer to this question. Decision aids are particularly useful when decisions are complex and more than one option is available.

REFERENCES

Bartscherer, M. L., & Dole, R. L. (2005). Interactive Metronome Training for a 9-year-old boy with attention and motor coordination difficulties. Physiotherapy Theory & Practice, 21(4), 257–269.
Bidonde, J., Busch, A. J., Webber, S. C., Schacter, C. L., Danyliw, A., Overend, T. J., Richards, R. S., & Rader, T. (2014). Aquatic exercise training for fibromyalgia. Cochrane Database of Systematic Reviews, 10, CD011336. doi:10.1002/14651858
Büker, N., Akkaya, S., Akkaya, N., Gökalp, O., Kavlak, E., Ok, N., Kiter, A. E., & Kitis, A. (2014). Comparison of effects of supervised physiotherapy and a standardized home program on functional status in patients with total knee arthroplasty: A prospective study. Journal of Physical Therapy Science, 26, 1531–1536.
Busch, A. J., Barber, K. A. R., Overend, T. J., Peloso, P. M. J., & Schachter, C. L. (2007). Exercise for treating fibromyalgia syndrome. Cochrane Database of Systematic Reviews, 4, CD003786.pub2.
Busch, A. J., Webber, S. C., Richards, R. S., Bidonde, J., Schachter, C. L., Schafer, L. A., . . . Rader, T. (2013). Resistance exercise training for fibromyalgia. Cochrane Database of Systematic Reviews, 12, CD010884. doi:10.1002/14651858
Charles, C., Gafni, A., & Whelan, T. (1997). Shared decision-making in the medical encounter: What does it mean? (or it takes at least two to tango).
Social Science and Medicine, 44, 681–692.
Cosper, S. M., Lee, G. P., Peters, S. B., & Bishop, E. (2009). Interactive Metronome Training in children with attention deficit and developmental coordination disorders. International Journal of Rehabilitation Research, 32(4), 331–336. doi:10.1097/MRR.0b013e328325a8cf
Cox, R. M., Schwartz, K. S., Noe, C. M., & Alexander, G. C. (2011). Preference for one or two hearing aids among adult patients. Ear and Hearing, 32, 181–197.
Edwards, A., & Elwyn, G. (2006). Inside the black box of shared decision making: Distinguishing between the process of involvement and who makes the decision. Health Expectations, 9, 307–320.
Gafni, A., & Charles, C. (2009). The physician-patient encounter: An agency relationship? In A. Edwards & G. Elwyn (Eds.), Shared decision-making in health care: Achieving evidence-based patient choice (2nd ed., pp. 73–78). Oxford, UK: Oxford University Press.
Gillespie, L. D., Robertson, M. C., Gillespie, W. J., Sherrington, C., Gates, S., Clemson, L. M., & Lamb, S. E. (2012). Interventions for preventing falls in older people living in the community. Cochrane Database of Systematic Reviews, 9, CD007146. doi:10.1002/14651858.CD007146.pub3
Hammell, K. R. W. (2013). Client-centered occupational therapy in Canada: Refocusing our core values. Canadian Journal of Occupational Therapy, 80, 141–149.
Heesen, C., Solari, A., Giordano, A., Kasper, J., & Kopke, S. (2011). Decisions on multiple sclerosis immunotherapy: New treatment complexities urge patient engagement. Journal of Neurological Sciences, 306, 2192–2197.
Hoffmann, T. C., Montori, V. M., & Del Mar, C. (2014). The connection between evidence-based medicine and shared decision making. Journal of the American Medical Association, 312(13), 1295–1296. doi:10.1001/jama.2014.10186
Latimer-Cheung, A. E., Martin Ginis, K. A., Hicks, A. L., Motl, R. W., Pilutti, L. A., Duggan, M., . . . Smith, K. M. (2013).
Development of evidence-informed physical activity guidelines for adults with multiple sclerosis. Archives of Physical Medicine and Rehabilitation, 94, 1829–1836.
Longo, M. F., Cohen, D. R., Hood, K., Edwards, A., Robling, M., Elwyn, G., & Russell, I. T. (2006). Involving patients in primary care consultations: Assessing preferences using discrete choice experiments. British Journal of General Practice, 56, 35–42.
Marinho-Buzelli, A. R., Bonnyman, A. M., & Verrier, M. C. (2014). The effects of aquatic therapy on mobility of individuals with neurological diseases: A systematic review. Clinical Rehabilitation. PubMed PMID: 26987621 [Epub ahead of print].
Shaffer, R. J., Jacokes, L. E., Cassily, J. F., Greenspan, S. I., Tuchman, R. F., & Stemmer, P. J., Jr. (2001, March-April). Effect of Interactive Metronome Training on children with ADHD. American Journal of Occupational Therapy, 55(2), 155–162.
Sillem, H., Backman, C. L., Miller, W. C., & Li, L. C. (2011). Comparison of two carpometacarpal stabilizing splints for individuals with thumb osteoarthritis. Journal of Hand Therapy, 24, 216–225.
Stacey, D., Bennett, C. L., Barry, M. J., Col, N. F., Eden, K. B., Holmes-Rovner, M., . . . Thomson, R. (2011). Decision aids for people facing health treatment or screening decisions. Cochrane Database of Systematic Reviews, 10, CD001431. doi:10.1002/14651858
Sumsion, T., & Law, M. (2006). A review of evidence on the conceptual elements informing client-centred practice. Canadian Journal of Occupational Therapy, 73(3), 153–162.
Theadom, A., Cropley, M., Smith, H. E., Feigin, V. L., & McPherson, K. (2015). Mind and body therapy for fibromyalgia. Cochrane Database of Systematic Reviews, 4, CD001980.pub3. doi:10.1002/14651858.CD001980.pub3

Glossary

alternative treatment threat—a type of history threat in which an unintended treatment is provided to participants in a study and accounts for differences over time.
analysis of covariance (ANCOVA)—a statistic used when a researcher wants to examine differences while statistically controlling for a variable that may affect the outcome of a study.
applied research—research that has direct application to health-care practices.
artifact—an object collected during qualitative research to be used as data (e.g., pictures, documents, journals).
assignment threat—a problem created when differences between groups due to assignment account for the differences between groups in a study.
attrition—loss of participants who have enrolled in a study. Attrition can occur for numerous reasons, including voluntary drop-out, relocation, or death. Also called mortality.
audit trail—the collection of documents from a qualitative study that can be used to confirm a researcher's data analysis.
axial coding—the process of identifying relationships between categories during analysis of the data in qualitative research.
basic research—investigation of fundamental questions directed toward better understanding of individual concepts.
between-group comparison—comparison of the results of two or more groups.
Bonferroni correction—adjustment of an alpha level to prevent a Type I error, performed by dividing the alpha level (typically 0.05) by the number of comparisons.
Boolean operators—the words used in a database search to relate key terms. Common Boolean operators include AND, OR, and NOT.
bracketing—a process in qualitative research in which researchers set aside their own preconceptions and judgments.
case-control design—an observational, retrospective, cross-sectional study that can be used to answer prognostic research questions concerning which risk factors predict a condition.
categorical variable—a variable that describes attributes but does not have a quantitative value (e.g., gender, political affiliation).
ceiling effect—a condition in which many individuals achieve the highest possible score, leaving little to no room to detect improvement.
client-centered practice—practice in which the client is considered the expert regarding his or her situation and is the primary decision maker for health-care choices.
clinically significant difference—a change that would be regarded by clinicians and the client as meaningful and important.
cluster randomized controlled trial—a study design in which the groups are randomly assigned (e.g., one setting receives an intervention and another does not), but the individuals within the groups are not randomized.
code-recode procedure—the process of coding data, taking a break, returning to recode the same data, and then comparing the results of the two efforts.
Cohen's d—a statistic that measures the strength of the difference between two group means, reported in standard deviation units.
compensatory demoralization—occurs when group leaders of the control condition or participants in the control group give up due to disappointment at not receiving the experimental treatment.
compensatory equalization of treatment—occurs when group leaders of the control condition or participants in the control group work harder to compensate for not receiving the experimental treatment.
concurrent validity—the process of supporting construct validity by finding relationships between the index measure and other measures of the same construct.
confidence interval (CI)—a reliability estimate that suggests the range of outcomes expected when an analysis is repeated.
confirmability—the extent to which qualitative data can be corroborated.
connecting data—in mixed-methods research, when one method (either quantitative or qualitative) informs another. Typically one method is used first in one study and the results are used in designing the second study.
constant comparative method—the back-and-forth process of collecting and analyzing data in qualitative research, in which new data are compared with data already collected.
constructivism—a philosophical perspective which suggests that reality is subjective and is determined by the individual based on experience and context.
construct validity—the ability of a test to measure the construct it is intended to measure.
continuous data—data with values that fall on a continuum from less to more.
continuous variable—a variable in which the numbers have meaning in relation to one another.
control—a condition that remains constant or the same between groups or situations.
control group—a group in which participants receive an alternate intervention, a standard intervention, or no intervention.
control variable—a variable that remains constant or the same between groups or situations.
convenience sampling—a type of sampling in which potential participants are selected based on the ease with which they can be included in a study.
convergent validity—the process of supporting construct validity by finding relationships between the index measure and other measures of the same construct.
correlation—the degree to which two or more variables fluctuate together.
correlational study—a study designed to determine whether a relationship exists between two constructs and, if so, to assess the strength of that relationship.
covary—a statistical process of controlling for a variable that may differ between groups.
credibility—the extent to which qualitative data are authentic.
criterion-referenced—a measure in which an individual's scores are compared against some established standard.
critically appraised paper—a type of analysis that critiques a published research study and interprets the results for practitioners.
Cronbach's alpha—a measure of internal consistency that ranges from 0.0 to 1.0 and indicates the degree to which the items of a measure are unidimensional.
crossover study design—a design in which participants are randomly assigned to groups and receive the same treatments, but in a different order.
cross-sectional research—a type of study that collects data at a single point in time.
database—an organized collection of data. In the case of evidence-based practice, a bibliographic citation database is a compilation of articles, books, and book chapters.
decision aids—materials (often in the form of pamphlets, workbooks, or computer programs) that clients can use in the process of making health-care decisions.
dependability—the extent to which qualitative data are consistent.
dependent sample t-test—a statistic that compares a dependent variable within the same group, either a pretest and posttest or two different measures.
dependent variable—the variable that is intended to measure the outcome of a study.
descriptive statistics—statistics that provide an analysis of data to help describe, show, or summarize the data in a meaningful way such that, for example, patterns can emerge from the data.
descriptive study—research that explains health conditions and provides information about the incidence and prevalence of certain conditions within a diagnostic group.
dichotomous data—data that can take on only two values; often used to distinguish the presence or nonpresence of some construct.
directional hypothesis—a hypothesis in which the researcher makes an assumption or expresses belief in a particular outcome.
discrete data—data that can have only certain values. Discrete data are often used to classify categories into numerical data.
discriminant validity—the process of supporting construct validity by finding that a test is able to differentiate between groups of individuals.
divergent validity—the process of supporting construct validity by finding that the index measure is not related to measures of irrelevant constructs.
ecological validity—the degree to which the environment in which a study takes place represents the real world.
effectiveness study—research that examines the usefulness of an intervention under real-world conditions. Effectiveness research has strong external validity.
effect size (ES)—a statistic that describes the magnitude or strength of a statistic. It can refer to the magnitude of a difference, relationship, or outcome.
efficacy study—a study that examines whether or not an intervention has a positive outcome. Efficacy studies are typically conducted under ideal conditions in an effort to improve internal validity.
embedding data—in mixed-methods research, when one method (qualitative or quantitative) assumes a primary role and the other is used to support the results.
epidemiology—the study of the frequency and distribution of health conditions and factors that contribute to incidence and prevalence.
ethnography—a qualitative research design that focuses on understanding a culture.
evidence-based practice—the process of using the research evidence, practitioner experience, and the client's values and desires to make the best clinical decisions.
experimental research—studies that examine cause-and-effect relationships.
experimenter bias—a threat to the validity of a study that is caused by the involvement of the researcher in some aspect of the study.
ex post facto comparisons—group comparisons that utilize existing groups rather than assigning groups, as in an experimental study.
external responsiveness—the ability of a test to detect change that is clinically meaningful.
external validity—the extent to which the results of a study can be generalized to a particular population, setting, or situation.
extraneous variable—a variable that is tracked and then later examined to determine its influence.
factorial design—a study with more than one independent variable, in which the interaction or impact of both independent variables can be examined simultaneously.
field notes—remarks and descriptions made and collected by researchers while observing and/or interviewing participants.
fishing—searching for findings that the researcher did not originally plan to explore.
floor effect—a situation in which many individuals receive the lowest possible score on a test due to its difficulty or the rarity of the condition.
focus group—an interview with a group of individuals that gathers information through a targeted discussion regarding a specific topic.
forest plot—a graph used to illustrate the effect sizes of individual studies in a meta-analysis and the pooled effect.
frequencies—statistics that describe how often something occurs.
frequency distribution—a graph used to depict a count.
grey literature—literature that is unpublished or difficult to obtain.
grounded theory—a qualitative design that has the purpose of developing theory from the data collected; that is, the theory is grounded in the data.
Hawthorne effect—the phenomenon in which research participants respond favorably to attention, regardless of the intervention approach.
hazard ratio—the chance of a particular event occurring in one group compared with the chance of the event occurring in another group.
history threat—the possibility that changes occurring over time, which influence the outcome of a study, are due to external events.
hypothesis—a proposed explanation for some phenomenon.
incidence—the risk of developing a condition within a period of time, or the frequency of new cases of a health condition within a specified time period.
independent sample t-test—a statistic that compares the difference in the mean score for two groups that are independent of, or unrelated to, each other.
independent variable—a variable that is manipulated or compared in a study.
inductive reasoning—the process of drawing conclusions from the data; sometimes referred to as moving from the specific to the general.
inferential statistics—statistical techniques that use study samples to make generalizations that apply to an entire population. informant—the individual providing data in a study. institutional animal care and use committee—the organization that reviews and approves the ethical practices associated with animal research. institutional review board—the organization that reviews and approves the ethical practices of research studies that use human participants. instrumentation threat—arises when problems with the instrument itself (e.g., reliability, calibration) affect the outcome of a study. interaction effect—the pattern of differences of the dependent variable for at least two independent variables. In intervention research, often one of the variables is a within-group variable of time and the other variable is the between-group variable of intervention vs. control. internal consistency—unity or similarity of items on a multi-item measure. internal responsiveness—the ability of a test to detect change. internal validity (of a study)—the ability to draw conclusions about causal relationships; in the case of intervention research, the ability to draw conclusions as to whether or not the intervention was effective. inter-rater reliability—consistency in scores among two or more testers or raters. intervention study—a study that examines whether or not an intervention has a positive outcome. intra-class correlation coefficient (ICC)—a measure of relationship, typically used in reliability studies, that ranges from 0.0 to 1.0 and indicates the degree to which two administrations of a measure are related. level of significance—also called alpha or α, the level at which a statistic is identified as statistically significant; most typically set at less than 0.05. levels of evidence—a hierarchical approach that rates research evidence from strongest to weakest. life history—the story of an individual’s life or a specific aspect of the individual’s life.
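Internal consistency is often quantified with Cronbach's alpha. The sketch below shows one way to compute it; the item scores are invented for illustration:

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item measure.
    `items` holds one list of scores per item, all over the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    item_variance_sum = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))

# Three hypothetical questionnaire items answered by four respondents
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 2, 4, 4], [1, 3, 3, 5]])
print(round(alpha, 2))
```

Values closer to 1.0 indicate that the items behave similarly and measure a unified construct.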
Likert scale—a response scale, commonly used in questionnaires, that provides several options on a continuum, such as “strongly disagree” to “strongly agree.” linear regression—a statistical method in which several values are entered into a regression equation to determine how well they predict a continuous outcome. logistic regression—a statistical method in which several values are entered into a regression equation to determine how well they predict a categorical outcome. longitudinal research—a study in which data are collected over at least two time points, typically covering an extended period of time, such as several years or decades, with the purpose of examining changes over time. matching—a process of assigning participants to groups in which individuals are matched on some characteristic (e.g., gender, diagnosis) to ensure that equal numbers are assigned to each group. maturation threat—a validity problem that arises when changes that occur over time are due to natural changes in the individual, such as developmental changes or the healing process, rather than to the variable of interest. mean—same as average; a descriptive statistic that balances the scores above and below it. measurement error—the difference between a true score and an individual’s actual score. measure of central tendency—the location of the center of a distribution. median—a descriptive statistic indicating the score value that divides the distribution into equal lower and upper halves of the scores. member checking—returning results to the research participants so that they can check and verify or correct the data and analysis. merging data—name used in mixed-methods research when qualitative and quantitative research results are reported together. meta-analysis—a quantitative synthesis of multiple studies.
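The meta-analysis entry can be made concrete with a fixed-effect, inverse-variance pooling of study effect sizes. This is a simplified sketch with invented numbers; real meta-analyses must also assess heterogeneity and may use random-effects models:

```python
def pooled_effect(effects, variances):
    """Fixed-effect meta-analysis: weight each study by the inverse of its variance."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    standard_error = (1 / sum(weights)) ** 0.5
    return pooled, standard_error

# Three hypothetical studies: standardized mean differences and their variances
pooled, se = pooled_effect([0.4, 0.6, 0.5], [0.04, 0.02, 0.08])
# 95% confidence interval around the pooled effect
ci = (pooled - 1.96 * se, pooled + 1.96 * se)
print(round(pooled, 2), [round(x, 2) for x in ci])
```

The pooled estimate and its confidence interval are what a forest plot displays as the summary diamond beneath the individual study effects.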
method error—measurement error that is due to some inaccuracy inherent in the assessment itself or the manner in which it was administered or scored. minimally clinically important difference (MCID)—the amount of change on a particular measure that is deemed clinically important to the client. mixed-methods research—a research design that combines qualitative and quantitative methods. mixed model ANOVA—a difference statistic used when between-group and within-group analyses are conducted simultaneously; two or more groups are compared over two or more time points. mode—the score value that occurs most frequently in the distribution. mortality—also called attrition; loss of participants who have enrolled in a study. Mortality can occur for numerous reasons, including voluntary drop-out, relocation, or death. multicollinearity—the circumstance in which variables (or, in the case of regression, predictors) are correlated with one another. multiple linear regression—a statistical analysis that examines the relationship of two or more predictors and a continuous outcome. multiple logistic regression—a statistical analysis that examines the relationship of two or more predictors and a categorical outcome. narrative research—a qualitative design that uses a storytelling approach. narrative review—a descriptive review of the literature on a particular topic. naturalistic inquiry—research done from the point of view that multiple perspectives exist that can only be understood within their natural context. naturalistic observation—data collection that involves observing individuals in real-world circumstances. nondirectional hypothesis—an exploratory method of study, suggesting that the researcher does not have preconceptions about what the study results may be, but may assume that a difference or relationship exists.
nonequivalent control group design—also known as a nonrandomized controlled trial; a comparison of two or more groups without randomization to group. nonexperimental research—research that does not manipulate the conditions but rather observes a condition as it exists; used to answer descriptive and relationship questions. nonrandomized controlled trial—a study in which at least two groups are compared, but participants are not randomly assigned to groups; also called a quasi-experiment. normal distribution—a type of frequency distribution that represents many data points distributed in a symmetrical, bell-shaped curve. norm-referenced—a measure in which an individual’s score is compared against the scores of other individuals. observational study—research in which only the naturally occurring circumstances are studied, as opposed to assigning individuals to an intervention or research condition. odds ratio—an estimate of the odds when the presence or absence of one variable is associated with the presence or absence of another variable. one-way ANOVA—a difference statistic that compares three or more groups at a single point in time. open coding—the process of identifying simple categories within qualitative data. open-ended interview—an interview without set questions that allows the process to direct the questioning. order effect—a type of testing effect in which the order in which a test is administered affects the outcome. participant bias—a threat to internal validity caused by the research participant consciously or unconsciously influencing the outcome. participant observation—a method of data collection in qualitative research in which the researcher becomes a participant and engages in activity with the research participant(s). Pearson product-moment correlation—an inferential statistic that examines the strength of the relationship between two continuous variables.
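The odds-ratio entry reduces to simple arithmetic on a 2 × 2 table. A minimal sketch with hypothetical counts:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
    a = exposed with the condition,   b = exposed without it
    c = unexposed with the condition, d = unexposed without it"""
    return (a / b) / (c / d)

# Hypothetical counts: 20/80 among the exposed vs. 10/90 among the unexposed,
# so the odds of the condition are about 2.25 times higher with exposure
print(round(odds_ratio(20, 80, 10, 90), 2))
```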
peer-reviewed—an appraisal process that uses experts in the field to determine whether or not an article should be published in a scientific journal. phenomenology—a design in qualitative research that seeks to understand the experience of others from their perspective. PICO—a type of research question used to answer efficacy questions; includes the population or problem, intervention, comparison, and outcome. power—the ability of a study to detect a difference or relationship. practice effect—a type of testing effect in which exposure to the pretest affects the outcomes of subsequent testing. practice guidelines—also called clinical guidelines or clinical practice guidelines; recommendations to practitioners for addressing specific clinical situations, based on research evidence and opinions of experts. predictive study—a type of research that provides information about factors related to a particular outcome. predictive validity—the process of supporting construct validity by finding that a test is capable of predicting an expected outcome. pre-experimental design—a research design that examines the outcomes of an effect (such as an intervention) on a group of individuals without comparing that effect to a control or comparison group. pre-experimental research—a research design in which a single group is compared before and after an intervention. prevalence—the proportion of individuals within a population who have a particular condition. primary research—the original studies that are contained in a systematic review. primary source—research studies and professional and governmental reports that are based on original research or data collection. prolonged engagement—spending extended time in the field to collect data to establish familiarity and trust. prospective—a study that is designed before data collection takes place. prospective cohort study—a research design that follows two groups of individuals with different conditions over time. 
provocative test—a diagnostic test in which an abnormality is induced through a manipulation that provokes the condition. psychometric properties—quantifiable characteristics of a test that speak to an instrument’s consistency and accuracy; include reliability, validity, and responsiveness. publication bias—the tendency of scientific journals to publish positive findings and reject negative findings. purposive sampling—identifying participants for a study because they will serve a particular purpose. Pygmalion effect—also called Rosenthal effect; occurs when the intervention leader’s positive expectations for an outcome lead the participants to respond more favorably. qualitative research—a type of research that studies questions about meaning and experience. quality-adjusted life year (QALY)—combines an assessment of quality of life and the number of years of life added by an intervention. quantitative research—a type of research that uses statistics and describes outcomes in terms of numbers. quasi-experimental study—a research design that compares at least two groups, but does not randomly assign participants to groups; also called a nonrandomized controlled trial. random assignment—when research participants are assigned to groups in such a way that each participant has an equal chance of being assigned to the available groups. randomized controlled trial (RCT)—a type of research design that includes at least two groups (typically an experimental group and a control group), and participants are randomly assigned to the groups. random sampling—a type of sampling in which each potential participant has an equal chance of being selected for participation in a study. range—a descriptive statistic that indicates the lowest and highest scores. reflective practitioner—a practitioner who intentionally approaches clinical situations with an inquisitive mind and considers past experiences when making decisions.
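Random assignment can be sketched in a few lines: shuffling the participant list and dealing people to groups in turn gives every participant an equal chance of each group. The participant names, group labels, and fixed seed below are illustrative only:

```python
import random

def randomly_assign(participants, group_names=("intervention", "control"), seed=42):
    """Shuffle participants, then deal them to groups in rotation."""
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    pool = list(participants)
    rng.shuffle(pool)
    assignment = {name: [] for name in group_names}
    for i, person in enumerate(pool):
        assignment[group_names[i % len(group_names)]].append(person)
    return assignment

groups = randomly_assign(["P1", "P2", "P3", "P4", "P5", "P6"])
print({name: len(members) for name, members in groups.items()})  # 3 per group
```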
reflexive journal—a diary maintained by a qualitative researcher that identifies personal biases and perspectives. reflexivity—the process in which a researcher identifies personal biases and perspectives so that they can be set aside. regression equation—a calculation that determines the extent to which two or more variables predict a particular outcome. regression to the mean—the tendency of individuals who have extreme scores on a measure to regress toward the mean when the measure is re-administered. reliability—the consistency and accuracy of a measure. repeated measures ANOVA—a difference statistic, similar to a dependent sample or within-group t-test, that is used when the means are compared over more than two time periods or more than two different tests. replication—conducting a study that duplicates most or all of the features of a previous study. research design—the specific plan for how a study is to be organized and carried out. research design notation—a symbolic representation of the design of a research study. response bias—a measurement error that creates inaccuracy in survey results. response rate—the number of individuals who respond to a survey divided by the number of individuals who initially received the survey. responsive measure—a measure that is able to detect change, typically before and after an intervention. retrospective cohort study—a research design that looks back at existing data to compare different groups of individuals over a specified time period. retrospective intervention study—a study that looks at the efficacy of an intervention, but does so after data collection has already taken place and relies on the use of existing data. risk ratio (RR)—also called relative risk; the probability of an event happening to one group of exposed individuals as compared with another group that is not exposed to a particular condition.
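The risk-ratio entry is likewise a direct computation: the incidence in the exposed group divided by the incidence in the unexposed group. A minimal sketch using invented counts:

```python
def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Relative risk: incidence among the exposed over incidence among the unexposed."""
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)

# Hypothetical: 30 of 100 exposed individuals develop the outcome vs. 10 of 100 unexposed,
# so the exposed group's risk is about 3 times higher
rr = risk_ratio(30, 100, 10, 100)
print(round(rr, 1))
```

Unlike the odds ratio, which compares odds, the risk ratio compares probabilities directly, so the two diverge when the outcome is common.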
Rosenthal effect—also called Pygmalion effect; occurs when the intervention leader’s positive expectations for an outcome lead the participants to respond more favorably. sampling error—the degree to which a sample does not represent the intended population. saturation—the point in qualitative data collection at which no new data or insights are emerging. scatterplot—a graph of plotted points that shows the relationship between two sets of data. scientific method—an approach to inquiry that requires the measurement of observable phenomena. secondary research—research that combines the results of previously conducted studies, such as in a systematic review. secondary source—documents or publications that interpret or summarize a primary source. selection threat—differences in groups that occur due to the selection process. selective coding—the process of articulating a theory based on the categories and relationships between categories identified during data analysis. sensitivity—the accurate identification of individuals who possess the condition of interest. shared decision-making—process of making health-care decisions that includes the clinician and the client as equal partners. single-subject design—a research design in which each participant’s response to an intervention is analyzed individually. skewed distribution—a frequency distribution in which one tail is longer than the other. snowball sampling—the selection of participants by having already-identified participants nominate people they know. Spearman correlation—an inferential statistic that examines the strength of the relationship between two variables when one or both of the variables is rank-ordered. specificity—the correct identification of individuals who do not have a condition.
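The sensitivity and specificity entries can be computed directly from a diagnostic test's confusion matrix. The counts below are hypothetical:

```python
def sensitivity_specificity(true_pos, false_pos, false_neg, true_neg):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical screening results: 45 true positives, 10 false positives,
# 5 false negatives, 90 true negatives
sens, spec = sensitivity_specificity(45, 10, 5, 90)
print(sens, spec)  # 0.9 0.9
```

High sensitivity means few affected individuals are missed; high specificity means few unaffected individuals are wrongly flagged.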
standard deviation—a descriptive statistic that expresses the amount of spread in the frequency distribution and the average amount of deviation by which each individual score varies from the mean. standardized test—a test that has established methods of administration and scoring. statistical conclusion validity—the accuracy of the conclusions drawn from the statistical analysis of a study. statistically significant difference—when the difference between two or more groups is not likely due to chance. statistical significance—expresses the probability that the result of a given experiment or study could have occurred purely by chance. study heterogeneity—condition that exists when studies within a systematic review differ on one or more important characteristics, such as interventions used, outcomes studied, or settings. survey research—descriptive study that uses a questionnaire to collect data. systematic review—a methodical synthesis of the research evidence regarding a single topic that can be replicated by others. testing effects—changes that occur as a result of the test administration process. test-retest reliability—consistency in scores across two or more test administrations. thematic synthesis—an approach to synthesizing the results of multiple qualitative studies into a systematic review. themes—patterns identified within qualitative data. thick description—detailed accounts of qualitative data. third variable problem—arises when two constructs may be related, but a third variable could account for the relationship or influence the relationship. trait error—measurement error that is due to some characteristic of the individual taking the test. transferability—the extent to which information from a qualitative study may apply to another situation. translational research—when findings from the laboratory are used to generate clinical research. triangulation—the use of multiple resources and methods in qualitative research to verify and corroborate data. 
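The standard-deviation entry can be illustrated with Python's statistics module. The scores are invented; note that `pstdev` divides by n (the population formula) while `stdev` uses the n − 1 sample formula, so the sample value is slightly larger:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical test scores
mean = statistics.mean(scores)               # 5.0
population_sd = statistics.pstdev(scores)    # spread around the mean: 2.0
sample_sd = statistics.stdev(scores)         # n-1 version, slightly larger
print(mean, population_sd, round(sample_sd, 2))
```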
true experiment—a randomized controlled trial. trustworthiness—describes how accurately qualitative research represents a phenomenon. Type I error—a statistical conclusion error that occurs when a difference or relationship is concluded to exist (the null hypothesis is rejected) when in fact it does not; a false positive. Type II error—a statistical conclusion error that occurs when no difference or relationship is detected (the null hypothesis is retained) when in fact one exists; a false negative. validity—in the case of a measure, the ability of that measure to assess what it is intended to measure. In the case of a study, when the conclusions drawn are based on accurate interpretations of the study findings and not confounded by alternative explanations. variability—the spread of scores in a distribution. variable—characteristic of people, activities, situations, or environments that is identified and/or measured in a study and has more than one value. within-group comparison—a comparison of changes or differences within the same groups, such as a pretest and posttest comparison. Index Page numbers followed by b denote boxes; f denote figures; t denote tables. Abstract in evidence evaluation, 35 of systematic review, 185, 186, 188b Acknowledgments, in evidence evaluation, 34, 37 Activities of daily living (ADLs), 134, 166b Adams, John, 1 ADHD. See Attention deficit hyperactivity disorder ADLs. See Activities of daily living Adolescent/Adult Sensory Profile, 135b Advanced search terms, 26–29, 27f, 28f, 30, 30b, 31 AGREE. See Appraisal of Guidelines for Research and Evaluation Agreement, in shared decision-making, 210 ALS.
See Amyotrophic lateral sclerosis Alternative treatment threat, 89 Alzheimer’s Association, 196 Alzheimer’s disease, 14, 49, 78, 115b, 115t, 205 ApoE allele in, 159 Ginkgo biloba in, 92 American Journal of Audiology, 31 American Journal of Occupational Therapy, 31 American Journal of Pharmacy Education, 150 American Journal of Speech-Language Pathology, 31 American Occupational Therapy Association (AOTA), 6–7, 31, 34, 195, 196b–197b American Physical Therapy Association (APTA), 31, 195 American Psychological Association, 24t, 37, 97b American Speech-Language-Hearing Association (ASHA), 7, 31, 97b, 184, 195 Amyotrophic lateral sclerosis (ALS), 185 Analogy, in representing themes, 173b, 180 Analysis of covariance (ANCOVA), 69–70, 105 Analysis of variance (ANOVA) critical thinking question of, 79 in design efficacy, 105, 106b, 112b, 115b, 115f in inferential statistics, 66–69, 67t, 69f, 70b ANCOVA. See Analysis of covariance ANOVA. See Analysis of variance Anterior cruciate ligament surgery, 22 Anxiety study, 14, 15b AOTA. See American Occupational Therapy Association Apolipoprotein E (ApoE), 121–122 in Alzheimer disease, 159 Applied research, 48–51, 51f, 56 Appraisal of Guidelines for Research and Evaluation (AGREE), 198, 199b Apraxia, 108, 165, 166b APTA. See American Physical Therapy Association Archives of Physical Medicine and Rehabilitation, 31 Artifacts, 167 ASD. See Autism spectrum disorder ASHA. See American Speech-Language-Hearing Association Assessment studies, 13–14, 16 Assignment threat, 85t, 87, 122, 124 Attention deficit hyperactivity disorder (ADHD), 22 applied research in, 49 sampling error of, 96 in shared decision-making, 209 Attrition, 87t, 93 Audiology Research, 31 Audiotape, in data collection, 167 Audit trail, 178 Authorship, in evidence evaluation, 35 Autism, 8b, 34–35, 136 facilitated communication in, 97b in shared decision-making, 207 Autism Speaks, 64b Autism spectrum disorder (ASD), 64b. 
See also Autism Avon Longitudinal Study of Parents and Children, 147 Axial coding, 170 Back pain, 2, 88b, 109b, 154b as MeSH search term, 26 Bar graph, 65b Basic Body Awareness intervention, 11b, 108–110 Basic research, 48–51, 51f, 56 BBS. See Berg Balance Scale Bedrest, for back pain, 2 Berg Balance Scale (BBS), 91, 93–94, 104, 130, 192 Between-group comparison, 105–107, 106b, 106f, 107f Bilateral hearing aids, 204 Bill and Melinda Gates Foundation, 34 Bimodal distribution, 61, 63f Binge drinking, 150 Bodett, Tom, 127 Bonferroni correction, 83 Boolean operators, 26–27, 28f, 30, 30b BOTMP. See Bruininks-Oseretsky Test of Motor Proficiency Bracketing, 170 Brain injury, 51f, 112b, 207 Brain Trauma Foundation, 196 Breast cancer, 147, 148b British Journal of Occupational Therapy, 25, 31 Bruininks-Oseretsky Test of Motor Proficiency (BOTMP), 140, 209 Caffeinated beverages, 2 Canadian Journal of Occupational Therapy, 31 Carotid sinus hypersensitivity, 212 Carpal tunnel syndrome, 131 Case-control design, 155, 157–159, 159t Case reports, in evidence hierarchy, 11t, 12 Categorical data, 128 Categorical variables, 52 CDC. See Centers for Disease Control and Prevention Ceiling effect, 138 Centers for Disease Control and Prevention (CDC), 32, 34, 38, 146 Centers for Medicare & Medicaid Services (CMS), 146 CENTRAL database, 212 Central tendency measure, 61–62, 63f Cerebral palsy, 43, 169b Cerebrovascular accident (CVA), 17 CFT. See Children’s Friendship Training CHAMP. See Comprehensive High-Level Activity Mobility Predictor Childhood obesity, 98 Children’s Friendship Training (CFT), 94–95 Chi-squared analysis, 148b Chronic joint pain, 207 Chronic low back pain (CLBP), 191b. See also Back pain Chronic obstructive pulmonary disease (COPD), 14, 140 CI. See Confidence interval CIMT. See Constraint-induced movement therapy CINAHL. See Cumulative Index of Nursing and Allied Health Literature CLBP.
See Chronic low back pain Client-centered practice, 5, 6b, 204–206, 205f Client’s lived experience, 15–16, 18 Client’s use of evidence, 33b Clinical decision-making, 3, 3f. See also Shared decision-making Clinical guidelines. See Practice guidelines Clinically significant difference, 138 ClinicalTrials.gov, 23t, 189 Cluster randomized controlled trial, 112. See also Randomized controlled trial CMS. See Centers for Medicare & Medicaid Services Cochlear implants, 75b, 188b, 199 Cochrane Bone, Joint and Muscle Trauma Group Specialised Register, 212 Cochrane Collaboration, 185 Cochrane Database of Systematic Reviews, 7, 24, 25, 26f Cochrane Library, 23t, 185 Code-recode procedure, 178 Cognitive remediation, 184 Cohen’s d, 76, 192b Cohen’s kappa, 132 Cohort study, 117, 119b, 155, 157–158, 159t Cold and flu, 43, 45f Common Ground website, 211 Communication, in shared decision-making, 206–211, 206b, 207f, 211b Compensatory demoralization, 86t, 92 Compensatory equalization, 86t, 92, 92b Comprehensive High-Level Activity Mobility Predictor (CHAMP), 133b Concurrent validity, 135, 136f Condition question, 16–17 Confidence interval (CI), 76–79, 154, 154b, 212–213 Confirmability, 177t, 178 Conflict of interest, 34 Connecting data, 175, 176b Consensus building, 208 Constant comparative method, 170 Constraint-induced movement therapy (CIMT), 41, 42b, 49, 51f Constructivism, 165 Construct validity, 135–136, 136f Contemporary Issues in Communication Sciences and Disorders, 31 Continuous data, 128–131 Continuous variables, 52 Continuum of client involvement, 205, 205f Control group, 11, 12b, 40, 41, 94–95, 108, 109b. See also Nonequivalent control group Control variables, 53–55 Convenience sampling, 96 Convergent validity, 135, 136f Coordination disorder, 113b COPD. See Chronic obstructive pulmonary disease Correlational studies.
See Nonexperimental research Correlation coefficient, 136b Correlation matrix, 73, 73b, 151, 151t, 153b Cost effectiveness, 122 Covariation, of age, 87–88 Crafting questions, in shared decision-making, 207 Credibility. See also Reliability; Validity determination of, 31–34 duplicate publication in, 34 exercise in, 34–35 funding bias in, 34 impact factor in, 33 participant bias in, 86t, 91–93 peer-review process in, 33 threats to experimenter bias in, 86t, 91–93 publication bias in, 34, 186–189, 190b, 200 in qualitative studies, 177, 177t response bias in, 150 Criterion-referenced measure, 129–131 Critically appraised paper, 3 Critical thinking questions of ANOVA, 79 of cross-sectional research, 56 of decision aids, 214 of EBP, 16–17, 37 of hypothesis, 56 of internal validity, 122 of publication bias, 200 of qualitative research, 56, 179 of RCT, 17 of reliability, 141 of research design efficacy, 122–123 of shared decision-making, 16, 213–214 of statistics, 78–79 of systematic review, 199–200 of validity, 100, 141 of variables, 56 Cronbach’s alpha, 134b, 142 Crossover study design, 55, 110, 112b, 123 Cross-sectional research, 47–48 critical thinking question of, 56 data collection in, 15 exercise in, 49–51 Cumulative Index of Nursing and Allied Health Literature (CINAHL), 25, 26f, 31, 212 CVA. See Cerebrovascular accident Data analysis, 167–168, 169b Databases. 
See Health-care databases Data collection of cross-sectional research, 15 of longitudinal research, 15 methods of, 166–167, 167f, 168f saturation in, 170 Decision aids, 210–213, 211b critical thinking question of, 214 Decision-making, client inclusion in, 5, 6b Degrees of freedom (df), 148b Delayed treatment control group (DTC), 94–95 Dementia, 121–122, 138 Demographic variables, 87, 88b, 89, 102 Denver Developmental Screening, 132 Department of Health and Human Services, U.S., 197–198 Dependability, 177t, 178 Dependent sample t-test, 66, 67t, 71f Dependent variables, 53–55, 54b Depression, 14, 15b, 154b Descriptive research evaluation of, 157–158 group comparison in, 147–149, 148b, 149b incidence and prevalence in, 146–147, 147b, 148b survey research in, 149–150 Descriptive statistics, 60–64 Descriptive study design, 14 Design efficacy, 105, 106b, 112b, 115b, 115f, 122–123 Developmental coordination disorder, 147 df. See Degrees of freedom Diabetes, 99b Diagnostic arthroscopy, 137 Dichotomous data, 128–129 Difference studies. See Experimental research d-Index, 193t Directional hypothesis, 46, 56 Discrete data, 128–131 Discriminant validity, 136, 136f Discussion section in evidence evaluation, 37 of systematic review, 186 Divergent validity, 135–136, 136f Down syndrome, 149, 187b DTC. See Delayed treatment control group Duplicate publication, 34 Dysphagia, 147b, 176b Eating disorders, 111b EBP. See Evidence-based practice Ecological validity, 95t, 96–97. See also Validity Educational Resources Information Center (ERIC), 23t Effectiveness study, 98, 99b Effect size (ES), 70b, 76, 79, 190–192, 191b, 192b, 193t, 200 Efficacy questions, 16 construction of, 10–11 experimental study of, 42b levels of evidence in, 10–12, 10t, 11t, 12b research designs for, 10–12 Efficacy studies. See Experimental research EFPT. See Executive Function Performance Test Ehrman, Bart, 21 Electronic databases.
See Health-care databases Eliciting information, in shared decision-making, 207 EMBASE database, 212 Embedding data, 175, 176b EMs. See Explanatory models EMST. See Expiratory muscle strength training Epidemiology, 146–147 ERIC. See Educational Resources Information Center ES. See Effect size Eta squared (η2), 61t, 76 ETCH-M. See Evaluation Tool of Children’s Handwriting-Manuscript Ethical issues, 36b, 41 Ethnography, 170t, 171–173, 173b European Journal of Physiotherapy, 25 Evaluation Tool of Children’s Handwriting-Manuscript (ETCH-M), 12b Evidence-based practice (EBP) components of, 3, 3f critical thinking question of, 16–17, 37 evaluating evidence in, 7–8, 8b evaluating outcomes in, 8, 8b implement findings in, 8, 8b introduction to, 2–6 process of, 7–8, 7f professional organizations in, 31 question formulation in, 7, 8b, 9–16, 9t reasons for, 6–7 shared decision-making in, 204–206, 205f Evidence evaluation, 31–37, 36b Evidence hierarchy case reports in, 11t, 12 control group in, 11, 12b expert opinion in, 11t, 12 pretest-posttest in, 11, 11t, 12b of research design, 10–13, 10t, 11t, 12b Executive Function Performance Test (EFPT), 132 Exercises in confidence intervals, 78 in continuous data, 130–131 in credibility, 34–35 in cross-sectional research, 49–51 in decision aids, 211–213 in discrete data, 130–131 in EBP question formulation, 16 in effect size, 192 in eliciting information, 207 in group comparison studies, 107 in health-care database, 22, 31, 118–120, 185 in internal validity, 93–95 in level-of-evidence, 13 in practice guidelines, 198–199 in psychometric properties, 140 in PubMed, 118–120 in qualitative research, 49–51, 168, 175 in quantitative research, 49–51 in research design, 49–51, 114, 118–120, 121–122, 157 in search strategies, 30 in sensitivity and specificity, 137–138 in shared decision-making, 209–210, 211–213 in statistical conclusion validity, 84–85 in statistics, 64, 71–72 in systematic
review, 185 in trustworthiness, 178–179 in validity, 84–85, 93–95, 98 in variables, 55 Experimental designs. See Research designs Experimental research comparison to nonexperimental, 43, 45t control group in, 40 efficacy question of, 42b nonrandomized controlled trial in, 41 Experimenter bias, 86t, 91–93 Expert opinion in evidence hierarchy, 11t, 12 as Level V evidence, 159, 159t Expiratory muscle strength training (EMST), 50 Explanatory models (EMs), 179 Ex post facto comparisons, 147–149 External responsiveness, 140 External scientific evidence, 3, 3f External validity compared to internal, 97–98, 99b ecological validity in, 95t, 96–97 sampling error in, 95t, 96 threats to, 95–98, 95t, 97b, 99b Extraneous variables, 53–55 Facilitated communication, 97b Factorial design, 53, 114, 115b, 116b, 118, 123 Falling risk, in older adults, 154, 154t, 211–213 False negative, 136, 137f, 137t, 139b False positive, 136, 137f, 137t, 139b Fibromyalgia, 206b, 207f Field notes, 167 Filters, in database searches, 27–29, 28f, 30b FIM. See Functional Independence Measure Finger tapping test (FT), 93–94 Fishing, 83, 83t Floor effect, 138 Focus groups, 167 Forest plot, 192–193, 194b Fragile X syndrome, 149 Frank, Arthur, 175b Frequency distribution, 60–61, 62b FT. See Finger tapping test F-test. See Analysis of variance Fugl-Meyer Assessment of Motor Recovery After Stroke, 14, 138 Functional Independence Measure (FIM), 72, 74, 93–94, 119b, 130, 135 Funding bias, 34. See also Publication bias Gates Foundation, 34 General Educational Development (GED), 77b Ginkgo biloba, 92 Glasgow Coma Scale, 130, 130b Google Scholar, 23t, 24, 37 Grade point average, 60 Grey literature, 189 Grounded theory, 170–171, 170t, 172b Group comparison studies, 147–149, 148b, 149b.
See also Experimental research of between-group, 105–107, 106b, 106f, 107f exercise in, 107 in predictive research, 155–156 of within-group, 105–107, 106b, 106f, 107f, 109b Guide to Healthy Web Surfing, 32b Gyrokinesis, 109b HAPI. See Health and Psychosocial Instruments Hawthorne effect, 87t, 92, 110, 122, 124 Hazard ratio (HR), 50, 156, 156b, 157t, 192b, 193t Health and Psychosocial Instruments (HAPI), 23t Health-care databases accessing evidence in, 29–30, 29f exercise in, 22, 31, 118–120, 185 expanding searches for, 29, 30b introduction to, 22 search strategies of, 25–29, 26f, 27f, 28f, 29f, 30b selection of, 22–25, 23t–24t Hearing in Noise Test (HINT), 114 Heterogeneity, 189–190, 190b, 200 Hierarchy of evidence, 158–159, 159t of research design, 10–13, 10t, 11t, 12b High motor competence (HMC), 69, 70b HINT. See Hearing in Noise Test History threats, 85t, 89–90 HLE. See Home literacy environment HMC. See High motor competence Home literacy environment (HLE), 51 HR. See Hazard ratio Hurston, Zora, 103 Hypothesis critical thinking question of, 56 in quantitative research, 43–46 variables in, 54b Hypothesis testing, 52, 52t IACUC. See Institutional animal care and use committee IADL. See Instrumental activities of daily living ICARS. See International Cooperative Ataxia Rating Scale ICC. See Intra-class correlation coefficient ID. See Intellectual disability IEP. See Individualized education plan Impact factor, 33 Incidence, 146–147, 147b, 148b Independent sample t-test, 66, 67t, 71f, 115b Independent variables, 52–55, 54b Individualized education plan (IEP), 61 Inductive reasoning, 164–165 Inferential statistics, 60.
See also Statistics for analyzing relationships, 72–76, 72f ANCOVA in, 67t, 69–70 ANOVA in, 66–69, 67t, 69f, 70b correlation matrix in, 73, 73b introduction to, 65 one outcome with multiple predictors in, 74, 75b scatterplots in, 72, 73f significance in, 66 t-test in, 66, 67t, 68b, 71f, 78, 106b, 115b, 148b Informants, 171–172 Institutional animal care and use committee (IACUC), 35–36, 36b Institutional review board (IRB), 35–36, 36b Instrumental activities of daily living (IADLs), 196b, 201 Instrumentation threats, 86t, 91 Intellectual disability (ID), 51 Interaction effect, 69, 69f, 70b, 115f in group comparisons, 105–107, 106b, 106f, 107f Interactive Metronome, 209, 214 Interlibrary loan system, 31 Internal consistency, 133–134, 134b, 135b, 141 Internal responsiveness, 138 Internal validity assignment in, 85t, 87, 122, 124 attrition in, 87t, 93 compared to external validity, 97–98, 99b critical thinking question of, 122 exercise in, 93–95 experimenter bias in, 86t, 91–93 history in, 85t, 89–90 instrumentation in, 86t, 91 maturation in, 85t, 88–89, 89b, 122, 124 mortality in, 87t, 93 participant bias in, 86t, 91–93 in RCT, 11 regression to the mean in, 86t, 90, 90f selection in, 85t, 87 testing in, 86t, 90–91 threats to, 85–98, 85t–87t, 88b International Committee of Medical Journal Editors, 189 International Cooperative Ataxia Rating Scale (ICARS), 104 International Journal of Therapy and Rehabilitation, 31 Inter-rater reliability, 132–133, 133b Intervention question, 9, 16–17 Intervention studies. See Experimental research Intra-class correlation coefficient (ICC), 132b, 133b, 134b Introduction section in evidence evaluation, 35 of systematic review, 185 IRB. See Institutional review board JAMA. 
See Journal of the American Medical Association Jebson Taylor Hand Function Test (JTT), 140 Johns Hopkins University, 36b Journal of Communication Disorders, 25 Journal of Intellectual Disability Research, 51 Journal of Orthopaedic and Sports Physical Therapy, 195–196 Journal of Speech, Language, and Hearing Research, 31 Journal of the American Medical Association (JAMA), 29 JTT. See Jebson Taylor Hand Function Test Key word search, 25–26, 25f, 26f, 30, 30b Kinesio taping, 24, 27, 108 Knee injury, 22 Labral tear, of the shoulder, 137 Lee Silverman Voice Treatment (LSVT), 5 Level I evidence, 11, 11t, 158, 159t, 186, 199 Level II evidence, 11, 11t, 108, 158, 159t Level III evidence, 11, 11t, 110, 158, 159t Level IV evidence, 11–12, 11t, 12b, 108, 158–159, 159t Level of significance, 66 Levels of evidence, 10–13, 10t, 11t, 12b, 158–159, 159t Level V evidence, 12, 159, 159t Librarian, 30 Lift for Life, 99b Likert scale, 129, 129f Limits, in database searches, 27–29, 28f, 30b, 37 Linear regression, 74, 75b, 78t, 152, 153b Literature searches accessing evidence in, 29–30, 29f expanding of, 29, 30b introduction to, 22 selection of databases in, 22–25, 23t–24t strategies of, 25–29, 26f, 27f, 28f, 29f, 30b Lived experience, 15–16, 18 LLL. See Lower-limb loss LMC. See Low motor competence Logistic regression, 74–76, 75t, 77b, 78t. See also Multiple linear regression Longitudinal research, 15, 47–48, 48b, 49–51 Lower back pain, 88b, 109b. See also Back pain Lower-limb loss (LLL), 133b Low motor competence (LMC), 69, 70b Low power, 83–84, 83t Low vision, 196b–197b LSVT. See Lee Silverman Voice Treatment MA. See Mental age Matching study, 87 Maturation threats, 85t, 88–89, 89b, 122, 124 Mayo Clinic, decision aids by, 211–212 Mayo Clinic Health Letter, 25 MCID. See Minimally clinically important difference Mean, 61, 63f, 65b Measurement error, 131.
See also Reliability Measure of central tendency, 61–62, 63f Median, 61, 63f Medical research librarian, 30 Medical Subject Headings (MeSH®), 24–26, 25f, 30, 30b Medline database, 23t, 32, 32b, 212 Member checking, 177 Memory or Reasoning Enhanced Low Vision Rehabilitation (MORE-LVR), 109b Mental age (MA), 51 Merging data, 175, 176b MeSH®. See Medical Subject Headings Meta-analysis, 190–192, 190f, 191b, 200 Method error, 131. See also Reliability Methods section, 35–36, 36b, 185 Michigan Hand Outcomes questionnaire (MQ), 140 Mind/body interventions, 206, 207f Mini-BESTest, 91 Minimally clinically important difference (MCID), 140 Mirror therapy, 7, 78 Misconduct, 36b Mixed-method research, 175, 176b Mixed model analysis of variance, 67t, 69 Mode, 61, 63f MORE-LVR. See Memory or Reasoning Enhanced Low Vision Rehabilitation Mortality threats, 11b, 87t, 93 Motion coordination disorder, 209 MQ. See Michigan Hand Outcomes questionnaire MSFC. See Multiple Sclerosis Functional Composite Mulgan, Geoff, 203 Multicollinearity, 152, 152f Multiple linear regression, 152, 152b, 153b Multiple logistic regression, 152–154, 154b, 154t Multiple sclerosis, 14, 15b, 18, 65b decision aids in, 211b qualitative research of, 46, 48b shared decision-making in, 208 Multiple Sclerosis Functional Composite (MSFC), 104 NARIC.
See National Rehabilitation Information Center Narrative research, 170t, 173–174, 174b Narrative reviews, 184 National Center for Advancing Translational Sciences, 49 National Guideline Clearinghouse (NGC), 197–198 National Health and Nutrition Examination Surveys (NHANES), 146, 150 National Institute for Health and Clinical Excellence (NICE), U.K., 188b National Institutes of Health (NIH), 34, 49 Public Access Policy of, 30 National Library of Medicine, 24, 24t, 32 National Rehabilitation Information Center (NARIC), 23t National Stroke Foundation, 196 Naturalistic inquiry, 164 Naturalistic observation, 165, 166b Negatively skewed distribution, 61–64, 63f Neuromotor task training (NTT), 113b Neuroplasticity, 49, 51f News media, in evaluating evidence, 32 NGC. See National Guideline Clearinghouse NHANES. See National Health and Nutrition Examination Surveys NICE. See National Institute for Health and Clinical Excellence NIH. See National Institutes of Health Nintendo Wii Fit training, 113b Nondirectional hypothesis, 46 Nonequivalent control group, 113–114, 113b Nonexperimental research, 41–43, 44b, 45f, 45t. See also Descriptive research; Predictive research Nonrandomized controlled trial, 11, 11t, 41, 110–114, 113b Normal distribution, 61–64, 63f, 64f Norm-referenced measure, 129–131, 141 NTT. See Neuromotor task training Nuffield Dyspraxia Programme, 108 Obesity, 147 O’Brien’s Test, 137–138 Observational study. See Nonexperimental research Occupational therapy (OT), 22, 77b Occupational Therapy Code of Ethics and Ethics Standards, 6–7 Odds ratio (OR) in effect size, 192b, 193t hazard ratio compared to, 156, 156b, 157t in logistic regression, 74–76, 75t, 77b in multiple logistic regression, 152–154, 154b, 154t risk ratio compared to, 156, 156b, 157t Omega squared (ω2), 61t, 76 One-way analysis of variance, 67t, 68, 71f, 115b, 148–149, 148b, 149b Open coding, 170 Open-ended interview, 166, 175b OR.
See Odds ratio Order effects, 86t, 91 Orthopedic surgeon, 6b Orthotic devices, 24 OT. See Occupational therapy OT Practice, 25 OT Search, 23t. See also American Occupational Therapy Association OTseeker database, 23t, 184–185 Ottawa Hospital Research Institute, 210 Outcome question, 16–18 Oxford Centre for Evidence-Based Medicine, 158–159, 159t Pacemakers, 212 Parkinson’s disease, 5, 49–50, 65b, 97, 115b nonexperimental research in, 43, 44b Participants bias of, 86t, 91–93 observation by, 171–172 selection of, 165 Patterned Elicitation Syntax Test, 129 PCI. See Phase coordination index Pearson correlation coefficients, 153b Pearson product-moment correlation, 73, 78t, 132b, 136b, 151b PEDI. See Pediatric Evaluation of Disability Inventory Pediatric Evaluation of Disability Inventory (PEDI), 135 PEDro database of, 23t, 184–185 ratings of, 22b, 24t scale of, 121–122, 121b, 123 Peer-review process, 33, 37 Phalen’s Test, 131, 132 Phase coordination index (PCI), 44b Phenomenology, 168–170, 170t, 172b Phonological Awareness Test, 74, 75b, 131 Photovoice, 167, 168f Physical Therapy Journal, 31 Physiotherapy Evidence Database, 121 PICO question format, 10, 16 Positively skewed distribution, 61–64, 63f Post hoc analysis, 115b Posttest, in evidence hierarchy, 11, 11t, 12b Posttraumatic stress disorder (PTSD), 97, 165 Power. 
See Low power Practice effects, 86t, 90–91 Practice guidelines, 195–196, 196b–197b, 198f applying and using, 199 evaluating strength of, 198–199, 199b exercise in, 198–199 finding of, 184–185, 197–198 as Level I evidence, 158, 159t reading of, 185–186, 187b, 188b Practitioner experience, 3–5 Predictive research case-control design in, 155, 157–159, 159t cohort studies in, 155, 157–158, 159t correlation matrix in, 73, 73b, 151, 151t, 153b evaluation of, 157–158 group comparison studies in, 155–156 hazard ratios in, 50, 156, 156b, 157t, 192b, 193t multiple linear regression in, 152, 152b, 153b multiple logistic regression in, 152–154, 154b, 154t odds ratios in, 152–154, 154b, 154t, 156, 156b, 157t research design in, 14–15 risk ratios in, 155–156, 156b, 156t, 157t using correlational methods, 150–154, 151b Predictive validity, 136, 136f Pre-experimental research, 41 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), 184 Pretest-posttest ANOVA in, 69, 69f control group in, 109b in group comparisons, 105–107, 106b, 106f, 107f research design of, 11, 11t, 12b, 41 t-test comparison in, 68b Prevalence, 146–147, 147b, 148b, 157 Primary source in evidence evaluation, 31–32 in systematic review, 184 Print awareness, 74, 75b PRISMA. See Preferred Reporting Items for Systematic Reviews and Meta-Analyses Professional organizations, 31 Prolonged engagement, 177 Prospective cohort study, 155, 157, 158, 159t Provocative test, 137–138 Psychometric properties, 128 exercise in, 140 internal consistency of, 133–134, 134b, 135b, 141 inter-rater reliability of, 132–133, 133b responsiveness of, 138–140, 141 sensitivity and specificity of, 136–138, 137b, 137f, 137t, 139b test-retest reliability of, 132, 133b validity in, 134–138, 136f, 139b Psychosis, 175 PsycINFO database, 24t PT in Motion, 25 PT Now, 31 PTSD.
See Posttraumatic stress disorder Public Access Policy, 30 Publication bias, 34, 186–189, 190b critical thinking question of, 200 Public Press, 32 PubMed, 24–25, 24t, 25f exercise in, 118–120 search terms for, 27–29, 27f, 28f, 29f, 30, 30b PubMed Central, 24t Purposive sampling, 165 p value, 52, 148b Pygmalion effect, 86t, 91–92 QALY. See Quality-adjusted life year Qualitative research analysis in, 167–168 critical thinking questions of, 56, 179 data collection in, 166–167, 167b, 168f designs of, 168–176, 170t differences to quantitative research, 46–47, 47t ethnography in, 170t, 171–173, 173b exercise in, 49–51, 168, 175 grounded theory in, 170–171, 170t, 172b mixed-methods research in, 175, 176b of multiple sclerosis, 46, 48b narrative in, 170t, 173–174, 174b phenomenology in, 168–170, 170t, 172b philosophy of, 164–165, 166b properties of, 176–178, 177t questions in, 165 selection in, 165, 166b themes of, 46, 167–168 example of, 169b, 171b synthesis of, 193–195, 195b Quality-adjusted life year (QALY), 122 Quantitative research, 43–46 differences to qualitative research, 47t exercise in, 49–51 Quasi-experimental study, 41, 113–114, 113b QuickSIN, 114 Quotation, in research, 167, 169b, 171b, 175 RA. See Rheumatoid arthritis Random assignment, 11, 85t, 87–88, 95t Randomized controlled trial (RCT) critical thinking question of, 17 in evidence-based questions, 10, 11, 11t, 13b, 22, 40 in research design, 104–105, 108–110, 111b Random sampling, 95t, 96 Range, 62–64, 63f Rapid Syllable Transition Treatment, 108 Rate ratio (RaR), 212–213 RCT. See Randomized controlled trial Reading articles for evidence, 35–37 References, in evidence evaluation, 37 Reflective practitioner, 3–5 Reflexive journal, 178 Regression analysis, 74, 151–154, 152b, 153b, 154b, 154t. See also Linear regression; Logistic regression Regression to the mean, 86t, 90, 90f Related citations function, 29f Reliability.
See also Credibility; Validity critical thinking question of, 141 internal consistency in, 133–134, 134b, 135b, 141 inter-rater conduct in, 132–133, 133b relationship to validity of, 138 of standardized tests, 131–132 test-retest score in, 132, 133b understanding statistics in, 131b, 132b Renfrew Bus Story, 136 Repeated measures analysis of variance, 67t, 69, 71f, 72f Replication to promote generalizability, 97 in scientific method, 3 in systematic review, 186, 190b Research designs ANOVA in, 105, 106b, 112b, 115b, 115f in assessment studies, 13–14, 16 case-control design in, 155, 157–159, 159t for client’s lived experience, 16 control group in, 108, 109b crossover study in, 55, 110, 112b, 123 for efficacy questions, 10–12 evaluation by scale of, 120–122, 121b evidence hierarchy of, 10–13, 10t, 11t, 12b exercise in, 49–51, 114, 118–120, 121–122, 157 factorial-type of, 53, 114, 115b, 116b, 118, 123 nonequivalent control group in, 113–114, 113b in predictive studies, 14–15 pretest-posttest in, 11, 11t, 12b, 41 in qualitative research, 168–176, 170t RCT in, 104–105, 108–110, 111b of single-subject, 117, 118b, 122 Researcher bias. See Experimenter bias Research librarian, 30 Response bias, 150. See also Participants Response rate, 84, 150. See also Sample size Responsiveness measure, 138–140, 141 Results section in evidence evaluation, 36, 37 of systematic review, 185–186, 187b Retrospective cohort study, 117, 119b, 155 as Level III evidence, 158, 159t Retrospective intervention study, 117 Rheumatoid arthritis (RA), 155 Risk ratio (RR), 155–156, 156b, 156t, 157t, 212–213 RMRT. See Route Map Recall Test Rosenthal effect, 86t, 91–92, 122, 124 Rotator cuff injury, 171, 172b Route Map Recall Test (RMRT), 138 RR. See Risk ratio Rush Memory and Aging Project, 159 Russell, Bertrand, 145 r value, 192b, 193t Sackett, David, 2, 3–5 Sample size, 120, 158.
See also Effect size Sampling error, 95t, 96 Saturation, in data collection, 170 Scatterplots, 72, 73f Schizophrenia, 14, 54b, 78 Scholarly publication credibility, 33–34 Scientific method, 3 Scoring and measures, 128–129, 129–130 Search strategies, 25–30, 26f, 27f, 28f, 29f, 30b Search terms, 26–29, 26f, 27f, 28f Secondary research. See Systematic review Secondary source evaluation, 32 Selection threat, 85t, 87 Selective coding, 170 Self-reporting issues, 150 Self-Stigma of Stuttering Scale, 134, 134b Sensitivity and specificity, 13–14, 136–138, 137b, 137f, 137t, 139b, 141 Shared decision-making, 5, 33b ADHD in, 209 agreement in, 210 autism in, 207 Common Ground website in, 211 communication in, 206–211, 206b, 207f, 211b components of, 208–211, 211b crafting questions in, 207 critical thinking question of, 16, 213–214 decision aids in, 210–213, 211b in EBP, 204–206, 205f eliciting information in, 207 exercise in, 209–210, 211–213 multiple sclerosis in, 208 Shaw, George, 163 Simple correlation, 150–151, 151f Single-blinded study, 113b Single-subject designs, 117, 118b, 122 Six-Minute Walk test, 140 Skewed distribution, 61, 63f Snowball sampling, 165 Sort by relevance search strategy, 27–29, 28f Spearman correlation, 73, 78t, 136b, 151b Specificity. See Sensitivity and specificity SpeechBite, 24t Speech impairment, 187b Speech recognition, predictors of, 153b Spinal cord injury, 16 SPORTDiscus, 24t SPP. See Student Performance Profile SS-QOL. See Stroke Specific Quality of Life Standard deviation, 62–64, 63f, 64f, 65b Standardized test, 131–132 Statistical conclusion validity, 82–83, 83t exercise in, 84–85 type I and II errors of, 52, 52t, 97, 115b Statistical significance (or Statistically significant), 66, 138 Statistics.
See also Inferential statistics; Understanding statistics central tendency in, 61–62, 63f confidence intervals in, 76–79 critical thinking questions of, 78–79 effect size in, 70b, 76, 79, 190–192, 191b, 200 exercise in, 64, 71–72 frequency distribution in, 60–61, 62b standard deviation in, 62–64, 63f, 64f, 65b symbols of, 60, 61t variability in, 62–64, 63f, 64f Storytelling, in data collection, 175b STREAM. See Stroke Rehabilitation Assessment of Movement Strength and balance training, 13b Stroke, 3, 16–18, 72, 135, 149b, 192 mirror therapy for, 7, 78 retrospective cohort study of, 119b Stroke Impact Scale, 135 Stroke Rehabilitation Assessment of Movement (STREAM), 135 Stroke Specific Quality of Life (SS-QOL), 72 Student Performance Profile (SPP), 62b Study heterogeneity, 189–190 Study setting selection, 165, 166b Survey research, 149–150 Swaddling, as database search term, 26, 27f Symbols, of statistics, 60, 61t Systematic review, 10–11 abstract in, 185, 186, 188b applying and using, 199 components of, 188b, 190b, 201 critical thinking questions of, 199–200 data analysis in, 190–195, 190f, 191b, 193t, 194b discussion section in, 186 evaluating strength of, 186–190, 190b exercise in, 185 introduction section in, 185 primary source in, 184 replication in, 186, 190b results section in, 185–186, 187b Tai Chi, 97, 208, 212–213, 215 TAP. See Television assisted prompting TBI. See Traumatic brain injury Television assisted prompting (TAP), 122b Test designs. 
See Research designs Testing threats, 86t, 90–91 Test reliability critical thinking question of, 141 internal consistency in, 133–134, 134b, 135b, 141 of inter-rater conduct, 132–133, 133b relationship to validity of, 138 of standardized tests, 131–132 of test-retest score, 132, 133b understanding statistics in, 131b, 132b Themes, of qualitative research, 46, 167–168 example of, 169b, 171b synthesis of, 193–195, 195b The Wounded Storyteller: Body, Illness and Ethics (Frank), 175b Thick description, 177–178 Third variable problem, 43, 45f Threats, to credibility of research of alternative treatment, 89 of assignment, 85t, 87, 122, 124 of attrition, 87t, 93 of ecological validity, 95t, 96–97 of experimenter bias, 86t, 91–93 of external validity, 95–98, 95t, 97b, 99b of funding bias, 34 of history, 85t, 89–90 of instrumentation, 86t, 91 of internal validity, 85–98, 85t–87t, 88b of maturation, 85t, 88–89, 89b, 122, 124 of mortality, 11b, 87t, 93 of participant bias, 86t, 91–93 of publication bias, 34, 186–189, 190b, 200 of response bias, 150 of selection, 85t, 87 of testing, 86t, 90–91 Thumb osteoarthritis, 204 Timed up and go test (TUG), 93–94 TIS. See Trunk Impairment Scale Title, in evidence evaluation, 35 TMW. See Two-minute walk test Trait error, 131 Transferability, 177–178, 177t Translational research, 49, 51f Traumatic brain injury (TBI), 165 Triangulation, 177 True experiment. See Randomized controlled trial Trunk Impairment Scale (TIS), 104, 105, 107 Trustworthiness, 176–179, 177t t-test, 66, 67t, 68b, 71f, 78, 106b, 115b, 148b TUG.
See Timed up and go test Twain, Mark, 81 Two-minute walk test (TMW), 93–94 Type I error, 52, 52t, 97, 115b Type II error, 52, 52t Understanding statistics of ANOVA, 115b of correlation, 136b, 151b of factorial design, 115b of group comparisons, 106b, 148b incidence and prevalence in, 147b of meta-analysis, 192b of reliability, 131b, 132b in research design, 112b of sensitivity and specificity, 137b Unethical practices, 36b Vaccination, 34–35 Validity, 82. See also Credibility; Ecological validity; Reliability of assessment tools, 13 critical thinking questions of, 100, 141 exercise in, 84–85, 93–95, 98 external threats to, 95–98, 95t, 97b, 99b internal threats to, 85–98, 85t–87t, 88b of statistical conclusions, 82–85, 83t types of, 134–138, 136f, 139b understanding statistics in, 136b, 137b Variability, measures of, 62–64, 63f, 64f Variable problem, 43, 45f Variables, 52–55, 54b critical thinking question of, 56 Variance, 75b, 151f. See also Analysis of covariance; Analysis of variance Vatlin, Heinz, 2 Very early mobilization (VEM), 192 Video, 166b, 167 Virtual shopping, 149b Vitamin D supplementation, 114, 116b, 212–213 Water, daily intake of, 2 WBV. See Whole body vibration therapy Websites, in evidence credibility, 32, 32b WeeFIM, 14 Western Ontario and McMaster University Osteoarthritis Index (WOMAC), 136 Whole body vibration therapy (WBV), 114, 116b Wii Fit training, 113b Within-group comparison, 105–107, 106b, 106f, 107f, 109b Wolf Motor Function Test, 14, 17 WOMAC. See Western Ontario and McMaster University Osteoarthritis Index Woodcock-Johnson Writing Fluency and Writing Samples test, 12b World Confederation of Physical Therapy, 7 Yoga, 191, 191b