The Evidence-Based
Practitioner: Applying
Research to Meet Client Needs
Catana Brown, PhD, OTR/L, FAOTA
Midwestern University
Department of Occupational Therapy
Glendale, Arizona
F. A. Davis Company
1915 Arch Street
Philadelphia, PA 19103
www.fadavis.com
Copyright © 2017 by F. A. Davis Company. All rights reserved. This product is protected by copyright. No part of it may be reproduced, stored
in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without written
permission from the publisher.
Printed in the United States of America
Last digit indicates print number: 10 9 8 7 6 5 4 3 2 1
Senior Acquisitions Editor: Christa A. Fratantoro
Director of Content Development: George W. Lang
Developmental Editor: Nancy J. Peterson
Content Project Manager: Julie Chase
Art and Design Manager: Carolyn O’Brien
As new scientific information becomes available through basic and clinical research, recommended treatments and drug therapies undergo changes.
The author(s) and publisher have done everything possible to make this book accurate, up to date, and in accord with accepted standards at the time
of publication. The author(s), editors, and publisher are not responsible for errors or omissions or for consequences from application of the book, and
make no warranty, expressed or implied, in regard to the contents of the book. Any practice described in this book should be applied by the reader
in accordance with professional standards of care used in regard to the unique circumstances that may apply in each situation. The reader is advised
always to check product information (package inserts) for changes and new information regarding dose and contraindications before administering
any drug. Caution is especially urged when using new or infrequently ordered drugs.
Library of Congress Cataloging-in-Publication Data
Names: Brown, Catana, author.
Title: The evidence-based practitioner : applying research to meet client
needs / Catana Brown.
Description: Philadelphia : F.A. Davis Company, [2017] | Includes
bibliographical references and index.
Identifiers: LCCN 2016046032 | ISBN 9780803643666 (pbk. : alk. paper)
Subjects: | MESH: Occupational Therapy | Evidence-Based Practice | Physical
Therapy Modalities | Speech Therapy | Language Therapy | Problems and
Exercises
Classification: LCC RM735.3 | NLM WB 18.2 | DDC 615.8/515—dc23 LC record available at https://lccn.loc.gov/2016046032
Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by F. A. Davis Company for
users registered with the Copyright Clearance Center (CCC) Transactional Reporting Service, provided that the fee of $.25 per copy is paid directly
to CCC, 222 Rosewood Drive, Danvers, MA 01923. For those organizations that have been granted a photocopy license by CCC, a separate system
of payment has been arranged. The fee code for users of the Transactional Reporting Service is: 978080364366/17 0 + $.25.
For Lauren—
You’re an astonishing teacher, despite the fact that you won’t read this book.
But then I probably won’t read your opera book, either.
CB
Foreword
The introduction of evidence-based medicine by David
Sackett and other researchers in the 1990s (Sackett,
1997) initiated a radical shift in the approach to instruction in research methods and the application of
research findings to health-care practice. Until then,
practitioners learned about research through standard
academic research methods courses in which they were
taught to read and critique journal articles using the
well-established criteria of reliability and validity. They
were then expected to use those skills to “keep up” with
the research literature relevant to their area of practice
and apply the results to patient care. Unfortunately, for
the most part, they didn’t.
Sackett and his colleagues determined that the traditional approach to applying research to practice was
ineffective, and they proposed a radically different
approach—what we now recognize as evidence-based
practice. What was so different? Sackett and colleagues
recognized that research was relevant and useful to the
practitioner only to the extent that it addressed a clinical
question of importance to practice and provided a useful
guide to clinical decision-making. From this perspective, reading journal articles just to “keep current” and
without a particular question in mind was unfocused and
unproductive.
The alternative method they proposed taught practitioners to use research evidence as one of three integral
components of clinical reasoning and decision-making.
This method is reflected in the now-familiar definition of
evidence-based practice: integration of the clinician’s expertise and the best available scientific evidence with the client’s preferences and values to determine an appropriate
course of action in a clinical encounter.
To support the use of evidence-based practice as an
integral part of clinical reasoning, a different method of
instruction was developed, which is exemplified in The
Evidence-Based Practitioner: Applying Research to Meet
Client Needs. Evidence-based practice (EBP) is a process to be learned, not a content area to be mastered
the way we learn to identify the bones of the body or
the details of a particular assessment. Although it does
require learning about research methods and design,
measurement, and statistics, this knowledge is mastered
in the context of appraising evidence in relation to a
particular clinical question regarding a particular clinical scenario. The EBP process involves a specific set of
steps to formulate an answerable question, and then to
search, select, appraise, and apply the evidence to answer the clinical decision at hand. Ideally, students will
have multiple opportunities to practice these steps so
that ultimately the process can be initiated and carried
out smoothly and efficiently in occupational therapy
practice.
One of the valuable features of this text is that it is
designed to be used with team-based learning. This approach supports another important element of Sackett’s
(Sackett, 1997) and others’ original recommendations for
how to conduct EBP: that is, the importance of distributing the work and learning from one another’s insights.
Team-based learning models a method that can be carried
forward into the “real world” to continue to implement
EBP in practice.
Here’s what this can look like: Each of the five practitioners in a group prepares and shares an appraisal of one
key study that addresses a clinical question of importance
to the group. In less than an hour of discussion, the group
synthesizes the findings and reaches a decision on the best
answer (known as the “clinical bottom line” in EBP) to a
clinical question at hand. One busy practitioner working
alone might find that amount of work daunting. In addition, he or she would miss the crucial insights that other
group participants provide.
There’s another important advantage to team-based
EBP: it’s much more fun. Group members energize one
another, and examining the evidence becomes an interesting exploration and lively discussion of how best to
balance strengths and limitations, clinical relevance and
feasibility, and similarities and differences in the evidence.
The outcome of that lively discussion will help ensure
that your clinical decisions are guided by the best evidence available to help your clients.
In The Evidence-Based Practitioner: Applying Research to
Meet Client Needs, Catana Brown provides occupational
therapy, physical therapy, and speech-language pathology
students with a clear and concise overview of research designs, methodology, use of statistical analysis, and levels
of evidence, as well as the tools with which to evaluate
and apply evidence. Interesting and engaging features
such as From the Evidence lead the readers through the
steps to becoming effective consumers of evidence. Exercises and Critical Thinking Questions motivate learners
to explore how this knowledge can be applied to their
clinical practice.
I hope that you will approach learning EBP as a great
adventure and that you and your fellow students make
exciting discoveries.
Wendy Coster, PhD, OTR/L FAOTA
Professor and Chair, Department of Occupational Therapy
Director, Behavior and Health Program
Boston University
Boston, Massachusetts, USA
Sackett, D. L. (1997). Evidence-based medicine: How to
practice and teach EBM. New York/Edinburgh: Churchill
Livingstone.
Preface
Evidence-based practice is no longer a new idea: it’s
a mandate from third-party payers, accrediting bodies, health-care institutions, and clients. Although the
majority of therapists will become practitioners and
consumers of research rather than academic researchers, good consumers of research must still understand
how a study is put together and how to analyze the results. Occupational therapists, physical therapists, and
speech-language pathologists are expected to use evidence when discussing intervention options with clients
and their families, and when making clinical decisions.
The skills required to be an effective evidence-based
practitioner are complex; for many therapists, finding
and reviewing research is considered a daunting or
tedious endeavor. In addition, evidence-based practice
is still new enough that many working therapists were
not trained in the methods, and some work settings have
not yet adopted a culture of evidence-based practice
that provides sufficient resources.
GUIDING PRINCIPLE: CONSUMING
VS. CONDUCTING RESEARCH
The Evidence-Based Practitioner: Applying Research to Meet
Client Needs is designed for entry-level graduate students
in occupational therapy, physical therapy, and speech-language pathology, particularly those in courses that
focus on evidence-based practice versus the performance
of research. Its emphasis is on providing therapists with
the knowledge and tools necessary to access evidence, critique its strength and applicability, and use evidence from
all sources (i.e., research, the client, and clinical experience)
to make well-informed clinical decisions.
This textbook was designed with multiple features
that allow students and practitioners not only to acquire knowledge about evidence-based practice, but also
to begin to apply that knowledge in the real world.
Numerous examples and excerpts of published journal
articles from occupational therapy, physical therapy, and
speech-language pathology are used throughout the text.
In addition to learning about evidence-based practice,
students are exposed to research in their own disciplines
and the disciplines of their future team members.
The text contains 11 chapters and is intended to fit
within a single entry-level course in a health-care program. It will fit ideally into programs offering a course on
evidence-based practice, and can also be used to support a
traditional research methods text in research courses that
cover evidence-based practice.
The content of the initial chapters focuses on explaining basic research concepts, including describing qualitative and quantitative approaches. A separate chapter on
statistics is included in this introductory material. Subsequent chapters explain the different designs used in health-care research, including separate chapters for each of the
following types of research: intervention, assessment,
descriptive/predictive, and qualitative, as well as a chapter
on systematic reviews. These chapters prepare students to
match their own evidence-based questions with the correct type of research. In addition, students will acquire the
knowledge and skills necessary to understand research articles, including those aspects of the research article that can
be particularly befuddling: statistics, tables, and graphs.
Importantly, the chapters provide students with an understanding of how to evaluate the quality of research studies.
The text ends with a chapter on integrating evidence from
multiple sources, which highlights the importance of involving clients and families in the decision-making process
by sharing the evidence.
A TEAM-BASED LEARNING WORKTEXT
This text uses a unique team-based learning (TBL)
approach. TBL is a specific instructional strategy that
facilitates the type of learning that helps students solve
problems. It is a method that requires active involvement
of the student in the learning process from the outset.
Ideally, students work in small teams, using methods that
enhance accountability for both individual and team work;
this can result in a deeper level of understanding that is
more relevant to real-life practice. Still, this textbook is
useful for all types of instructional strategies and is relevant even with approaches that do not use a TBL format.
Nevertheless, TBL provides the pedagogy for applying
information, and therefore one strength of this text is its
emphasis on application.
To facilitate application, the text is presented as a
worktext that interweaves narrative with exercises, critical thinking questions, and other means of engaging
students and helping them comprehend the information. When appropriate, answers to these questions are
included at the end of the chapter. An advantage of the
worktext approach is that it gets students engaged with
the material from the beginning. In courses that use a
TBL format, the worktext prepares students to be effective team members.
TERMINOLOGY
A challenging aspect of evidence-based practice for students and instructors alike is terminology. In fact, this was
one of the greatest challenges for the author of this text.
In evidence-based practice, several different terms can be
used to describe the same or similar concepts. Making
matters more difficult, there are several issues with terminology that can make deciphering the research literature
perplexing. For example:
• Different terms are used to describe the same or similar concepts.
• There are disagreements among experts as to the proper
use of some terms.
• Terms are used incorrectly, even in peer-reviewed
articles.
• Labels and terms are sometimes omitted from research
articles.
Because deciphering research terminology is challenging, a significant effort was directed toward using
the most common terms that are likely to appear in the
literature. When multiple terms are routinely used, this
is explained in the text. For example, what some call a
nonrandomized controlled trial may be described by others as a quasi-experimental study.
Due to the challenges with terminology, students
need to read actual articles and excerpts of articles during
the learning process so that these terminology issues can
become transparent. When students have a more thorough understanding of a concept and the terms involved,
they can better interpret the idiosyncrasies of individual
articles.
Fortunately many journals are creating standard formats for reporting research, and with time some terminology issues will be resolved, although differences in
opinion and disciplines (e.g., school-based practice vs.
medicine) will likely continue to exist.
SPECIAL FEATURES
The special features developed for this text will enable
students to better understand content, develop the
advanced skills needed for assessing the strength and
applicability of evidence, and apply the material to practice. The Evidence-Based Practitioner: Applying Research to
Meet Client Needs includes several special features.
Key Terms
An alphabetical list of key terms appears at the beginning
of each chapter. These terms are also bolded where they
are first described in the chapter and fully defined in the
end-of-book glossary.
From the Evidence
Students often have trouble applying research concepts
to reading a research article. This key feature helps students make the link by providing real-world examples
from research articles in occupational therapy, physical therapy, and speech-language pathology. From the
Evidence visually walks the student through graphic examples such as abstracts, tables, and figures to illustrate
key concepts explained in the chapter. Arrows and text
boxes are used to point out and elucidate the concept
of interest.
From the Evidence features are included in each
chapter. Each has at least one corresponding question to
ensure that the student fully understands the material.
Answers to these questions are provided at the end of
each chapter.
Exercises
Exercises are distributed throughout the chapters to help
students learn to apply information in context. In TBL
courses, the exercises are intended to prepare students for
the in-class team assignments; similarly, in flipped classrooms, students would complete the exercises at home
and arrive at class prepared for discussions and activities.
Each exercise is tied directly to a Learning Outcome and
includes questions requiring students to apply the knowledge acquired in the chapter. There is space in the text for
the student to complete the exercise, and the answers are
provided at the end of the chapter.
Understanding Statistics
After Chapter 4, “Understanding Statistics: What They
Tell You and How to Apply Them in Practice,” the
Understanding Statistics feature is included in chapters
in which specific statistical procedures are described.
Understanding Statistics boxes provide an example of a
statistic with additional explanation to reinforce information that is typically challenging. The feature also helps
put the information in context for students by associating
the statistic with a specific research design.
Evidence in the Real World
The Evidence in the Real World feature uses a storytelling or case scenario approach to demonstrate how
theoretical research concepts apply to real-life practice.
It serves as another method of demystifying research
concepts—such as how the concept of standard deviations can be used to understand the autism spectrum—
and showing students the relevance/practical application
of what they are learning.
Critical Thinking Questions
Each chapter ends with Critical Thinking Questions.
These questions require higher-level thinking and serve
as prompts for students to evaluate their comprehension
of the chapter concepts.
CLOSING THOUGHTS
In today’s health-care environment, occupational therapists, physical therapists, and speech-language pathologists must be proficient in accessing, critiquing, and
applying research in order to be effective evidence-based
practitioners. With solid foundational information and
engaging application exercises, this text provides the
framework for developing the evidence-based practice
skills that allow practitioners to best meet their clients’
needs.
Acknowledgment
Although it is now widely valued, evidence-based practice
is not the favorite topic of most rehabilitation therapy students. When I began this process, I knew that I wanted a
very different sort of textbook that would require students
to actively engage with the material; hence, the use of a
team-based learning format. However, doing something
different required a lot of help along the way.
First, I would like to acknowledge the fantastic editorial support provided by F.A. Davis. In particular I would
like to thank Christa Fratantoro, the acquisitions editor,
who grasped my vision for a new evidence-based textbook
and believed in my ability to pull it off. I appreciate her
friendship and backing. Nancy Peterson, developmental
editor extraordinaire, was with me through every step of
the process. All the things that are good about this text are
better because of Nancy. In addition, Nancy is my sounding board, my counselor, motivator, and guide.
I owe a debt of gratitude to the occupational therapy
and physical therapy students at Midwestern University–
Glendale in Arizona, who used different variations of the
rough drafts of the text and provided invaluable feedback,
resulting in the addition, clarification, and improvement
of the content. I would especially like to thank Morgan
Lloyd, who helped me with some of the content that was
the most difficult to explain.
Larry Michaelsen, who developed the team-based
learning approach, inspired me to try a new way of teaching, which ultimately led to my insight that a new type of
textbook was needed. Furthermore, I would like to thank
Bill Roberson and Larry Michaelsen for contributing a
marvelous team-based learning primer as part of the instructor resources.
Finally, a big thanks to those who offered support, both
professional and personal, providing me with the time,
space, and encouragement to make this text a reality. This
includes my chair, Chris Merchant; my husband, Alan
Berman; and my friend, Bob Gravel.
Catana Brown, PhD, OTR/L, FAOTA
Reviewers
Evelyn Andersson, PhD, OTR/L
Associate Professor
School of Occupational Therapy
Midwestern University
Glendale, AZ

Suzanne R. Brown, PhD, MPH, PT
Educational Consultant
Mesa, AZ

April Catherine Cowan, OTR, OTD, CHT
Assistant Professor
Occupational Therapy
The University of Texas Medical Branch
Galveston, TX

Denise K. Donica, DHS, OTR/L, BCP
Associate Professor, Graduate Program Director
Occupational Therapy
East Carolina University
Greenville, NC

Marc E. Fey, PhD, CCC-SLP
Professor
Department of Hearing and Speech
University of Kansas Medical Center
Kansas City, KS

Thomas F. Fisher, PhD, OTR, CCM, FAOTA
Professor and Chair
Occupational Therapy
Indiana University
Indianapolis, IN

Sharon Gutman, PhD, OTR, FAOTA
Associate Professor
Programs in Occupational Therapy
Columbia University
New York, NY

Elisabeth L. Koch, MOT, OTR/L
Faculty and Clinical Coordinator
Occupational Therapy Assistant Program
Metropolitan Community College of Kansas City–Penn Valley, Health Science Institute
Kansas City, MO

Teresa Plummer, PhD, OTR/L, CAPS, ATP
Assistant Professor
School of Occupational Therapy
Belmont University
Nashville, TN

Patricia J. Scott, PhD, MPH, OT, FAOTA
Associate Professor
Occupational Therapy
Indiana University
Indianapolis, IN
Contents in Brief
Chapter 1  Evidence-Based Practice: Why Do Practitioners Need to Understand Research?  1
Chapter 2  Finding and Reading Evidence: The First Steps in Evidence-Based Practice  21
Chapter 3  Research Methods and Variables: Creating a Foundation for Evaluating Research  39
Chapter 4  Understanding Statistics: What They Tell You and How to Apply Them in Practice  59
Chapter 5  Validity: What Makes a Study Strong?  81
Chapter 6  Choosing Interventions for Practice: Designs to Answer Efficacy Questions  103
Chapter 7  Using the Evidence to Evaluate Measurement Studies and Select Appropriate Tests  127
Chapter 8  Descriptive and Predictive Research Designs: Understanding Conditions and Making Clinical Predictions  145
Chapter 9  Qualitative Designs and Methods: Exploring the Lived Experience  163
Chapter 10  Tools for Practitioners That Synthesize the Results of Multiple Studies: Systematic Reviews and Practice Guidelines  183
Chapter 11  Integrating Evidence From Multiple Sources: Involving Clients and Families in Decision-Making  203
Glossary  217
Index  225
Contents
Chapter 1  Evidence-Based Practice: Why Do Practitioners Need to Understand Research?  1
INTRODUCTION  2
WHAT IS EVIDENCE-BASED PRACTICE?  2
  External Scientific Evidence  3
  Practitioner Experience  3
  Client Situation and Values  5
WHY EVIDENCE-BASED PRACTICE?  6
THE PROCESS OF EVIDENCE-BASED PRACTICE  7
  Formulate a Question Based on a Clinical Problem  7
  Identify the Relevant Evidence  7
  Evaluate the Evidence  7
  Implement Useful Findings  8
  Evaluate the Outcomes  8
WRITING AN EVIDENCE-BASED QUESTION  9
  Questions on Efficacy of an Intervention  9
  Research Designs for Efficacy Questions and Levels of Evidence  10
  Questions for Usefulness of an Assessment  13
  Research Designs Used in Assessment Studies  13
  Questions for Description of a Condition  14
  Research Designs Used in Descriptive Studies  14
  Questions for Prediction of an Outcome  14
  Research Designs Used in Predictive Studies  14
  Questions About the Client’s Lived Experience  15
  Research Designs Addressing the Client’s Lived Experience  16
CRITICAL THINKING QUESTIONS  16
ANSWERS  17
REFERENCES  18

Chapter 2  Finding and Reading Evidence: The First Steps in Evidence-Based Practice  21
INTRODUCTION  22
IDENTIFYING DATABASES  22
  PubMed  24
  Cumulative Index of Nursing and Allied Health Literature  25
  Cochrane Database of Systematic Reviews  25
EMPLOYING SEARCH STRATEGIES  25
  Selecting Key Words and Search Terms  26
  Combining Terms and Using Advanced Search  26
  Using Limits and Filters  27
  Expanding Your Search  29
ACCESSING THE EVIDENCE  29
  The Research Librarian  30
  Professional Organizations  31
DETERMINING THE CREDIBILITY OF A SOURCE OF EVIDENCE  31
  Websites  32
  The Public Press/News Media  32
  Scholarly Publications  33
  Impact Factor  33
  The Peer-Review Process  33
  Research Funding Bias  34
  Publication Bias  34
  Duplicate Publication  34
READING A RESEARCH ARTICLE  35
  Title  35
  Authorship  35
  Abstract  35
  Introduction  35
  Methods  35
  Results  36
  Discussion  37
  References  37
  Acknowledgments  37
CRITICAL THINKING QUESTIONS  37
ANSWERS  38
REFERENCES  38

Chapter 3  Research Methods and Variables: Creating a Foundation for Evaluating Research  39
INTRODUCTION  40
TYPES OF RESEARCH  40
  Experimental Research  40
  Nonexperimental Research  41
  Quantitative Research  43
  Qualitative Research  46
  Cross-Sectional and Longitudinal Research  47
  Basic and Applied Research  48
HYPOTHESIS TESTING: TYPE I AND TYPE II ERRORS  52
VARIABLES  52
  Independent Variables  52
  Dependent Variables  53
  Control Variables  53
  Extraneous Variables  53
CRITICAL THINKING QUESTIONS  55
ANSWERS  56
REFERENCES  57

Chapter 4  Understanding Statistics: What They Tell You and How to Apply Them in Practice  59
INTRODUCTION  60
SYMBOLS USED WITH STATISTICS  60
DESCRIPTIVE STATISTICS  60
  Frequencies and Frequency Distributions  60
  Measure of Central Tendency  61
  Measures of Variability  62
INFERENTIAL STATISTICS  65
  Statistical Significance  66
  Inferential Statistics to Analyze Differences  66
    The t-test  66
    Analysis of Variance  66
    Analysis of Covariance  69
  Inferential Statistics for Analyzing Relationships  72
    Scatterplots for Graphing Relationships  72
    Relationships Between Two Variables  73
    Relationship Analyses With One Outcome and Multiple Predictors  74
    Logistic Regression and Odds Ratio  74
EFFECT SIZE AND CONFIDENCE INTERVALS  76
CRITICAL THINKING QUESTIONS  78
ANSWERS  79
REFERENCES  80

Chapter 5  Validity: What Makes a Study Strong?  81
INTRODUCTION  82
VALIDITY  82
STATISTICAL CONCLUSION VALIDITY  82
  Threats to Statistical Conclusion Validity  82
    Fishing  83
    Low Power  83
INTERNAL VALIDITY  85
  Threats to Internal Validity  85
    Assignment and Selection Threats  85
    Maturation Threats  88
    History Threats  89
    Regression to the Mean Threats  90
    Testing Threats  90
    Instrumental Threats  91
    Experimenter and Participant Bias Threats  91
    Attrition/Mortality Threats  93
EXTERNAL VALIDITY  95
  Threats to External Validity  95
    Sampling Error  96
    Ecological Validity Threats  96
INTERNAL VERSUS EXTERNAL VALIDITY  97
CRITICAL THINKING QUESTIONS  100
ANSWERS  100
REFERENCES  102

Chapter 6  Choosing Interventions for Practice: Designs to Answer Efficacy Questions  103
INTRODUCTION  104
RESEARCH DESIGN NOTATION  104
BETWEEN- AND WITHIN-GROUP COMPARISONS  105
RESEARCH DESIGNS FOR ANSWERING EFFICACY QUESTIONS  107
  Designs Without a Control Group  108
  Randomized Controlled Trials  108
  Crossover Designs  110
  Nonrandomized Controlled Trials  110
  Factorial Designs  114
  Single-Subject Designs  117
  Retrospective Intervention Studies  117
SAMPLE SIZE AND INTERVENTION RESEARCH  120
USING A SCALE TO EVALUATE THE STRENGTH OF A STUDY  120
COST EFFECTIVENESS AS AN OUTCOME  122
CRITICAL THINKING QUESTIONS  122
ANSWERS  123
REFERENCES  125

Chapter 7  Using the Evidence to Evaluate Measurement Studies and Select Appropriate Tests  127
INTRODUCTION  128
TYPES OF SCORING AND MEASURES  128
  Continuous Versus Discrete Data  128
  Norm-Referenced Versus Criterion-Referenced Measures  129
    Norm-Referenced Measures  129
    Criterion-Referenced Measures  130
TEST RELIABILITY  131
  Standardized Tests  131
  Test-Retest Reliability  132
  Inter-Rater Reliability  132
  Internal Consistency  133
TEST VALIDITY  134
  Construct Validity  135
  Sensitivity and Specificity  136
  Relationship Between Reliability and Validity  138
RESPONSIVENESS  138
CRITICAL THINKING QUESTIONS  141
ANSWERS  141
REFERENCES  142

Chapter 8  Descriptive and Predictive Research Designs: Understanding Conditions and Making Clinical Predictions  145
INTRODUCTION  146
DESCRIPTIVE RESEARCH FOR UNDERSTANDING CONDITIONS AND POPULATIONS  146
  Incidence and Prevalence Studies  146
  Group Comparison Studies  147
  Survey Research  149
STUDY DESIGNS TO PREDICT AN OUTCOME  150
  Predictive Studies Using Correlational Methods  150
    Simple Prediction Between Two Variables  150
    Multiple Predictors for a Single Outcome  151
  Predictive Studies Using Group Comparison Methods  155
    Case-Control Studies  155
    Cohort Studies  155
EVALUATING DESCRIPTIVE AND PREDICTIVE STUDIES  157
LEVELS OF EVIDENCE FOR PROGNOSTIC STUDIES  158
CRITICAL THINKING QUESTIONS  159
ANSWERS  160
REFERENCES  161

Chapter 9  Qualitative Designs and Methods: Exploring the Lived Experience  163
INTRODUCTION  164
THE PHILOSOPHY AND PROCESS OF QUALITATIVE RESEARCH  164
  Philosophy  164
  Research Questions  165
  Selection of Participants and Settings  165
  Methods of Data Collection  166
  Data Analysis  167
QUALITATIVE RESEARCH DESIGNS  168
  Phenomenology  168
  Grounded Theory  170
  Ethnography  171
  Narrative  173
  Mixed-Method Research  175
PROPERTIES OF STRONG QUALITATIVE STUDIES  176
  Credibility  177
  Transferability  177
  Dependability  178
  Confirmability  178
CRITICAL THINKING QUESTIONS  179
ANSWERS  180
REFERENCES  180

Chapter 10  Tools for Practitioners That Synthesize the Results of Multiple Studies: Systematic Reviews and Practice Guidelines  183
INTRODUCTION  184
SYSTEMATIC REVIEWS  184
  Finding Systematic Reviews  184
  Reading Systematic Reviews  185
  Evaluating the Strength of Systematic Reviews  186
    Replication  186
    Publication Bias  186
    Heterogeneity  189
DATA ANALYSIS IN SYSTEMATIC REVIEWS  190
  Meta-Analyses  190
  Qualitative Thematic Synthesis  193
PRACTICE GUIDELINES  195
  Finding Practice Guidelines  197
  Evaluating the Strength of Practice Guidelines  198
THE COMPLEXITIES OF APPLYING AND USING SYSTEMATIC REVIEWS AND PRACTICE GUIDELINES  199
CRITICAL THINKING QUESTIONS  199
ANSWERS  200
REFERENCES  201

Chapter 11  Integrating Evidence From Multiple Sources: Involving Clients and Families in Decision-Making  203
INTRODUCTION  204
CHILD-CENTERED PRACTICE  204
SHARED DECISION-MAKING  204
EDUCATION AND COMMUNICATION  206
  Components of the Process  208
  People Involved  208
  Engaging the Client in the Process  208
  Consensus Building  208
  Agreement  210
  Decision Aids  210
    Content  210
    Resources for Shared Decision-Making  210
CRITICAL THINKING QUESTIONS  213
ANSWERS  214
REFERENCES  215

Glossary  217
Index  225
“Facts are stubborn things; and whatever may be our wishes, our inclinations,
or the dictates of our passions, they cannot alter the state of facts and evidence.”
—John Adams, second President of the United States
1
Evidence-Based Practice
Why Do Practitioners Need
to Understand Research?
CHAPTER OUTLINE
INTRODUCTION
WHAT IS EVIDENCE-BASED PRACTICE?
  External Scientific Evidence
  Practitioner Experience
  Client Situation and Values
WHY EVIDENCE-BASED PRACTICE?
THE PROCESS OF EVIDENCE-BASED PRACTICE
  Formulate a Question Based on a Clinical Problem
  Identify the Relevant Evidence
  Evaluate the Evidence
  Implement Useful Findings
  Evaluate the Outcomes
WRITING AN EVIDENCE-BASED QUESTION
  Questions on Efficacy of an Intervention
  Research Designs for Efficacy Questions and Levels of Evidence
  Questions for Usefulness of an Assessment
  Research Designs Used in Assessment Studies
  Questions for Description of a Condition
  Research Designs Used in Descriptive Studies
  Questions for Prediction of an Outcome
  Research Designs Used in Predictive Studies
  Questions About the Client’s Lived Experience
  Research Designs Addressing the Client’s Lived Experience
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES
LEARNING OUTCOMES
1. Identify the three sources of evidence, including what each source contributes to evidence-based decision-making.
2. Apply an evidence-based practice hierarchy to determine the level of evidence of a particular research study.
3. Describe the different types of research questions and the clinical information that each type of question elicits
for therapists.
KEY TERMS
client-centered practice
control
critically appraised paper
cross-sectional research
evidence-based practice
incidence
internal validity
levels of evidence
longitudinal research
PICO
prevalence
random assignment
randomized controlled trial
reflective practitioner
reliability
replication
scientific method
sensitivity
shared decision-making
specificity
systematic review
validity
INTRODUCTION
“How much water should you drink every day?” Most
of us have heard, read, or even adhered to the recommendation that adults should drink at least eight glasses
of 8 ounces of water each day (abbreviated as “8 × 8”), with
caffeinated beverages not counting toward the total. Is this
widely accepted recommendation based on scientific evidence? Heinz Valtin (2002) examined the research, consulted
with specialists in the field, and found no evidence to support the 8 × 8 advice. In fact, studies suggested that such
large amounts of water are not needed for healthy, sedentary
adults and revealed that caffeinated drinks are indeed useful
for hydration.
The 8 × 8 recommendation is an example of practice that is not supported by research, or “evidence.”
Such practices even creep into our professions. No doubt
there are practices that rehabilitation professionals have
adopted and accepted as fact that, although they are not as
well-known as the 8 × 8 adage, are also ingrained in
practice—despite the fact that they are not supported
by evidence.
Let’s look at an example: For decades, the recommended
treatment for acute low back pain was bedrest, typically
for 2 days with no movement other than toileting and eating. A Finnish study examined this recommendation in a
well-designed, randomized controlled trial that compared
2 days of bedrest with back extension exercises and ordinary activity (Malmivaara et al., 1995). The study found the
best results with ordinary activity. Subsequent research
confirmed this finding, or at least found that staying active
was as effective as bedrest for treating low back pain, and
4366_Ch01_001-020.indd 2
had obvious advantages associated with less disruption of
daily life (Dahm, Brurberg, Jamtvedt, & Hagen, 2010).
Without the research evidence, the recommendation
for bedrest may have been difficult to challenge; bedrest
did eventually ameliorate low back pain, so clinical and
client experience suggested a positive outcome. Only
through testing of alternatives was the accepted standard
challenged.
Questioning what we do every day as health-care practitioners, and making clinical decisions grounded in science, is
what evidence-based practice (EBP) is all about. However,
the use of scientific evidence is limited; clinical decisions are
made within the context of a clinician’s experience and an individual client’s situation. Any one profession will never have
a suitable number of relevant studies with adequate reliability and validity to answer all practice questions. However,
the process of science is a powerful self-correcting resource.
With the accumulation of research, clinicians can continually
update their practice knowledge and make better clinical
decisions so that clients are more likely to achieve positive
results.
Evidence-based practitioners are reflective and able to
articulate what is being done and why. In evidence-based
practice, decisions are not based on hunches, “the way it
has always been done,” or what is easiest or most expedient. Rather, in evidence-based practice, the therapist’s
clinical decisions and instructions can be explained, along
with their rationale; evidence-based practice is explicit by
nature.
This chapter provides an introduction to evidence-based practice. Topics such as sources of evidence, the
research process, and levels of evidence are discussed
so that the reader can understand the larger context in
which evidence-based practice takes place. These topics
are then explored in greater detail in subsequent chapters. This chapter focuses on the what, why, and how of
evidence-based practice: What is evidence-based practice? Why is evidence-based practice a “best practice”? How
do practitioners integrate evidence into their practice?
WHAT IS EVIDENCE-BASED PRACTICE?
Evidence-based practice in rehabilitation stems from
evidence-based medicine. David Sackett, a pioneer
of evidence-based medicine, and his colleagues provided
the following widely cited definition: “Evidence based
medicine is the conscientious, explicit and judicious use
of current best evidence in making decisions about the
care of individual patients” (Sackett, Rosenberg, Gray,
Haynes, & Richardson, 1996, p. 71).
Evidence-based practice requires an active exchange
between researchers and clinicians (Thomas, Saroyan,
& Dauphinee, 2011). Researchers produce findings
with clinical relevance and disseminate those findings
through presentations and publications. Clinicians then
use this information in the context of their practice
experience. The researchers’ findings may be consistent or inconsistent with the clinician’s experience and
understanding of a particular practice question. Reflective practitioners capitalize on the tension that exists
between research findings and clinical experience to
expand their knowledge.
In addition, from a client-centered practice perspective,
the values and experience of the clients, caregivers, and
family are essential considerations in the decision-making
process. Thus, evidence-based practice is a multifaceted
endeavor comprising three components (Fig. 1-1):
1. The external scientific evidence
2. The practitioner’s experience
3. Client/family situation and values
Each source of evidence provides information for clinical decision-making. The best decisions occur when all
three sources are considered.
External Scientific Evidence
External scientific evidence is a component of evidencebased practice that arises from research. Typically therapists obtain scientific evidence from research articles
published in scientific journals. The scientific evidence
provides clinicians with objective information that can
be applied to clinical problems. In the development
of this source of evidence, researchers use the scientific method to attempt to remove bias by designing
a well-controlled study, objectively collecting data, and
using sound statistical analysis to answer research questions. The steps of the scientific method include:
1. Asking a question
2. Gathering information about that question
3. Formulating a hypothesis
4. Testing the hypothesis
5. Examining and reporting the evidence
From the Evidence 1-1 provides an example of the
scientific method from a critically appraised paper. A
critically appraised paper selects a published research
study, critiques the study, and interprets the results for
practitioners. In this paper, Dean (2012) summarizes a
study by van de Port, Wevers, Lindeman, and Kwakkel
(2012) to answer a question regarding the efficacy of a
group circuit training intervention compared with individualized physical therapy for improving mobility after
stroke. Of particular interest is the efficiency of using
a group versus an individual approach. As a clinician,
you might have a question about group versus individual treatment, and this external scientific evidence can
help you answer your clinical question. The hypothesis, although not explicitly stated, is implied: The group
circuit training will be as effective as individual physical therapy. The hypothesis is tested by comparing two
groups (circuit training and individual physical therapy)
and measuring the outcomes using several assessments.
Although slight differences exist between the two groups
at some time points, the overall conclusion was that the
group circuit training was as effective as individual physical therapy.
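A quick note on how to read the numbers behind that conclusion (this is the standard interpretation of a confidence interval; Chapter 4 covers confidence intervals and statistical significance in detail). The abstract in From the Evidence 1-1 reports each between-group difference as a mean difference (MD) with a 95% confidence interval (CI). Under the usual large-sample normal approximation,

\[
95\%\ \mathrm{CI} \approx \mathrm{MD} \pm 1.96 \times SE_{\mathrm{MD}}
\]

where \(SE_{\mathrm{MD}}\) is the standard error of the mean difference. For the primary outcome (the mobility domain of the Stroke Impact Scale at 12 weeks), the reported values are

\[
\mathrm{MD} = -0.05\ \text{units}, \qquad 95\%\ \mathrm{CI} = (-1.4,\ 1.3).
\]

Because the interval contains 0, the data are consistent with no real between-group difference on the primary outcome, which is what supports describing group circuit training as “as effective as” individualized physiotherapy.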
Although this example provides evidence from a single
study that uses a strong randomized controlled trial design (described in Chapter 6), all studies have limitations.
The results of a single study should never be accepted as
proof that a particular intervention is effective. Science
is speculative, and findings from research are not final.
This concept speaks to another important characteristic of the scientific method: replication. The scientific
method is a gradual process that is based on the accumulation of results. When multiple studies produce similar
findings, as a practitioner you can have more confidence
that the results are accurate or true. Later in this chapter, the hierarchical levels of scientific evidence and the
limitations of this approach are presented. Subsequent
chapters describe in greater detail the research process
that is followed to create evidence to address different
types of research questions.
Practitioner Experience
FIGURE 1-1 Components of evidence-based practice. Each source of evidence (external evidence, practitioner experience, and the client’s situation and values) provides information for clinical decision-making; the evidence-based decision is made in collaboration with practitioner and client.
Initial models of evidence-based practice were criticized
for ignoring the important contribution of practitioner
experience in the clinical decision-making process. From
early on, Sackett and colleagues (1996) have argued
FROM THE EVIDENCE 1-1
Critically Appraised Paper
Dean, C. (2012). Group task-specific circuit training for patients discharged home after stroke may be as effective as individualized
physiotherapy in improving mobility. Journal of Physiotherapy, 58(4), 269. doi:10.1016/S1836-9553(12)70129-7.
Note A: The research question implies
the hypothesis that is tested: Group circuit
training is as effective as individualized PT
for improving mobility after stroke.
Abstract
QUESTION:
Does task-oriented circuit training improve mobility in patients with stroke compared with individualised physiotherapy?
DESIGN:
Randomised, controlled trial with concealed allocation and blinded outcome assessment.
SETTING:
Nine outpatient rehabilitation centres in the Netherlands.
PARTICIPANTS:
Patients with a stroke who had been discharged home and who could walk 10m without assistance were included.
Cognitive deficits and inability to communicate were key exclusion criteria. Randomisation of 250 participants
allocated 126 to task oriented circuit training and 124 to individualised physiotherapy.
INTERVENTIONS:
The task-oriented circuit training group trained for 90min twice-weekly for 12 weeks supervised by physiotherapists
and sports trainers as they completed 8 mobility-related stations in groups of 2 to 8 participants. Individualised
outpatient physiotherapy was designed to improve balance, physical conditioning, and walking.
OUTCOME MEASURES:
The primary outcome was the mobility domain of the stroke impact scale measured at 12 weeks and 24 weeks. The
domain includes 9 questions about a patient's perceived mobility competence and is scored from 0 to 100 with
higher scores indicating better mobility. Secondary outcome measures included other domains of the stroke impact
scale, the Nottingham extended ADL scale, the falls efficacy scale, the hospital anxiety and depression scale,
comfortable walking speed, 6-minute walk distance, and a stairs test.
RESULTS:
242 participants completed the study. There were no differences in the mobility domain of the stroke impact scale
between the groups at 12 weeks (mean difference (MD) -0.05 units, 95% CI -1.4 to 1.3 units) or 24 weeks (MD -0.6,
95% CI -1.8 to 0.5). Comfortable walking speed (MD 0.09m/s, 95% CI 0.04 to 0.13), 6-minute walk distance (MD
20m, 95% CI 5.3 to 34.7), and stairs test (MD -1.6s, 95% CI -2.9 to -0.3) improved a little more in the circuit training
group than the control group at 12 weeks. The memory and thinking domain of the stroke impact scale (MD -1.6
units, 95% CI -3.0 to -0.2), and the leisure domain of the Nottingham extended ADL scale (MD -0.74, 95% CI -1.47
to -0.01) improved a little more in the control group than the circuit training group at 12 weeks. The groups did not
differ significantly on the remaining secondary outcomes at 12 weeks or 24 weeks.
CONCLUSION:
In patients with mild to moderate stroke who have been discharged home, task-oriented circuit training completed in
small groups was as effective as individual physiotherapy in improving mobility and may be a more efficient way of
delivering therapy.
Note B: Objective measurements provide
data supporting the hypothesis that the
group intervention is as effective as an
individualized approach.
FTE 1-1 Question
How might this external scientific evidence influence your practice?
against “cookbook” approaches and submit that best
practices integrate scientific evidence with clinical expertise. Practice knowledge is essential when the scientific evidence is insufficient for making clinical decisions
and translating research into real-world clinical settings
(Palisano, 2010). Research will never keep up with clinical practice, nor can it answer all of the specific questions
that therapists face every day, given the diversity of clients, the different settings in which therapists practice,
and the pragmatic constraints of the real world. There
may be studies to indicate that a particular approach is
effective, but it is much less common to find evidence
related to frequency and intensity, or how to apply an
intervention to a complicated client with multiple comorbidities, as typically seen in practice. Practitioners
will always need to base many of their decisions on expertise gathered through professional education, interaction with colleagues and mentors, and accumulation
of knowledge from their own experience and practice.
Practitioner expertise is enriched by reflection. The
reflective practitioner takes the time to consider what
he or she has done, how it turned out, and how to make
things even better. Again, it is reflection that makes
knowledge more explicit and easier to communicate to
others. Reflection becomes even more explicit and methodical when therapists use program evaluation methods. Collecting data on your own clients and evaluating
their responses to treatment will provide important information for enhancing the overall effectiveness of your
services.
An example of integrating research evidence and practice experience is illustrated by Fox et al’s work with the
Lee Silverman Voice Treatment (LSVT) LOUD and BIG
approach for individuals with Parkinson’s disease (Fox,
Ebersbach, Ramig, & Sapir, 2012). Preliminary evidence
indicates that individuals receiving LSVT LOUD and
BIG treatment are more likely than individuals in control
conditions to increase their vocal loudness and frequency
variability and improve motor performance, including
walking speed and coordination. Yet, when working with
an individual client who has Parkinson’s disease, the practitioner still faces many questions: At what stage in the
disease process is the intervention most effective? Is the
intervention effective for clients who also experience depression or dementia? Is the intervention more or less
effective for individuals receiving deep brain stimulation?
The intervention is typically provided in 16 60-minute
sessions over a 1-month period. Can a more or less intensive schedule be used for specific clients? What about the
long-term effects of the treatment? Because Parkinson’s
disease is progressive in nature, will there be maintenance
issues?
This is where clinical reasoning comes into play.
You will use your practice experience to make decisions
about whether or not to implement the approach with
a particular client and how the intervention should be
implemented. If you do implement LSVT LOUD and
BIG, you will reflect upon whether or not it is working
and what modifications might be warranted.
Client Situation and Values
Interestingly, client-centered practice has moved into the
forefront at the same time that evidence-based practice is
gaining traction. Client-centered practice emphasizes
client choice and an appreciation for the client’s expertise in his or her life situation. A client’s situation should
always be considered in the treatment planning process.
An intervention is unlikely to result in successful outcomes if a client cannot carry it out due to life circumstances. Some very intensive therapies may be ineffective
if the individual does not have the financial resources,
endurance/motivation, or social support necessary to
carry them out. For example, a therapist should consider the issues involved when providing a single working
mother of three with an intensive home program for her
child with autism.
Client preferences and values also play an important
part in the decision-making process. For example, an
athlete who is eager to return to his or her sport and is
already accustomed to intensive training is more likely
to respond favorably to a home exercise program than a
client who views exercise as painful and tedious.
Rarely is there a single option in the treatment planning process. When shared decision-making occurs between clients and health-care providers, clients increase
their knowledge, are more confident in the intervention they are receiving, and are more likely to adhere to
the recommended therapy (Stacey et al., 2011). Shared
decision-making is a collaborative process in which
the clinician shares information from research and clinical experience, and the client shares information about
personal values and experiences. Different options are
presented, with the goal of arriving at an agreement regarding treatment. From a client-centered practice perspective, the client is the ultimate decision maker, and the
professional is a trusted advisor and facilitator.
The accessibility of Internet resources has increased
clients’ involvement in the treatment planning process.
Today, clients are more likely to come to the practitioner
with their own evidence-based searches (Ben-Sasson,
2011). The practitioner can help the client understand
and interpret the evidence in light of the client’s own
situation. In addition, the practitioner who is well versed
in the research literature may be able to supplement
the client’s search with further evidence on the topic of
interest and help evaluate the sources the client has already
located. Chapter 11 provides additional information
about the process of integrating practitioner experience,
client values, and research evidence.
EVIDENCE IN THE REAL WORLD
Client Inclusion in Decision-Making
The following personal example illustrates the inclusion (or lack of inclusion) of the client in the decision-making
process. When my daughter was in elementary school, she broke her arm in the early part of the summer. Our
typical summer activity of spending time at the pool was disturbed. On our scheduled follow-up visit, when it was
our understanding that the cast would be removed, we showed up at the appointment wearing our bathing suits
underneath our clothes, ready to hit the pool as soon as the appointment was over. However, after the cast was
removed and an x-ray was taken, the orthopedic surgeon explained to us that, although the bone was healing well,
a small line where the break had occurred indicated that the bone was vulnerable to refracturing if my daughter
were to fall again. The orthopedic surgeon was ready to replace the cast and told us that my daughter should wear
this cast for several more weeks to strengthen the bone.
I have every confidence that this decision was based on the research evidence and the orthopedic surgeon’s
practice experience. He was interested in keeping my daughter from reinjuring herself. However, his recommendation was not consistent with our values at that time. I requested that instead of a cast, we would like to have a
splint that my daughter would wear when she was not swimming. I explained that we understood the risk, but
were willing to take said risk after weighing the pros and cons of our personal situation. The orthopedic surgeon
complied with the request, yet made it clear that he thought we were making the wrong decision and included
this in his progress note on our visit.
As a health-care practitioner, it is easy to appreciate the orthopedic surgeon’s dilemma. Furthermore, it is natural to want our expertise to be valued. In this case, the orthopedic surgeon may have felt that his expertise was
being discounted—but the family situation and my opinion as a parent were important as well. If the health-care
professional (in this case the orthopedic surgeon) had approached the situation from a shared decision-making
perspective, the values of the child and parent would have been determined and considered from the beginning.
Best practice occurs when the decision-making process is collaborative from the outset and the perspectives of all
parties are appreciated.
EXERCISE 1-1
Strategizing When Client Values
and Preferences Conflict With the
External Research Evidence and/or the
Practitioner’s Experience (LO1)
The “Evidence in the Real World” example describes an experience in which there was conflict
between the mother’s preference and the research
evidence and the orthopedic surgeon’s experience.
There will likely be situations in your own practice
when similar conflicts emerge.
QUESTION
1. Identify three strategies that you might use to address
a conflict such as this while still honoring the client’s
values and autonomy to make decisions.
A.
B.
C.
WHY EVIDENCE-BASED PRACTICE?
In the past, practitioners were comfortable operating exclusively from experience and expert opinion, but best
practice in today’s health-care environment requires the
implementation of evidence-based practice that incorporates the research evidence and values of the client. It
is expected and in many instances required. The official
documents of professional organizations speak to the importance of evidence-based practice. For example, the
Occupational Therapy Code of Ethics and Ethics Standards
(American Occupational Therapy Association [AOTA],
2010) includes this statement in the section addressing
beneficence: “use to the extent possible, evaluation, planning, and intervention techniques and equipment that
are evidence-based and within the recognized scope of
occupational therapy practice.” The Position Statement
from the American Speech-Language-Hearing Association’s (ASHA’s) Committee on Evidence-Based Practice
includes the following: “It is the position of the American
Speech-Language-Hearing Association that audiologists
and speech-language pathologists incorporate the principles of evidence-based practice in their clinical decision-making to provide high quality care” (ASHA, 2005). The
World Confederation of Physical Therapy’s policy statement on evidence-based practice maintains that “physical
therapists have a responsibility to use evidence to inform
practice and ensure that the management of patients/
clients, carers and communities is based on the best available evidence” (WCPT, 2011).
Clinical decisions carry more weight and influence when they are supported with appropriate evidence. Imagine participating in a team
meeting and being asked to justify your use of mirror
therapy for a client recovering from stroke. You respond
by telling the team that not only is your client responding favorably to the treatment, but a Cochrane review of
14 studies also found that mirror therapy was effective
for reducing pain and improving upper extremity motor
function and activities of daily living (Thieme, Mehrholz, Pohl, Behrens, & Dohle, 2012). Use of evidence can increase the confidence of both your colleagues and your
client that the intervention is valid. Likewise, payers are
more likely to reimburse your services if they are evidence based.
Evidence-based practice also facilitates communication with colleagues, agencies, and clients. As clinical
decision-making becomes more explicit, the practitioner
is able to support choices with the source(s) of evidence
that were used and explain those choices to other practitioners, clients, and family members.
Ultimately, the most important reason to implement
evidence-based practice is that it improves the quality of
the services you provide. An intervention decision that is
justified by scientific evidence, grounded in clinical expertise, and valued by the client will, in the end, be more
likely to result in positive outcomes than a decision based
on habits or expediency.
THE PROCESS OF EVIDENCE-BASED
PRACTICE
The process of evidence-based practice mirrors the steps
of the scientific method (Fig. 1-2). It is a cyclical process
that includes the following steps:
1. Formulate a question based on a clinical problem.
2. Identify the relevant evidence.
3. Evaluate the evidence.
4. Implement useful findings.
5. Evaluate the outcomes.
FIGURE 1-2 The cycle of evidence-based practice.
Formulate a Question Based
on a Clinical Problem
The first step in evidence-based practice is to identify a clinical problem and then formulate a question that narrows the focus. The formulation of
a specific evidence-based question is important because
it provides the parameters for the next step of searching
the literature.
Questions can be formulated to address several areas of
practice. The most common types of questions address the
following clinical concerns: (1) efficacy of an intervention,
(2) usefulness of an assessment, (3) description of a condition, (4) prediction of an outcome, and (5) lived experience
of a client. Each type of question will lead the practitioner
to different types of research. The process of writing a
research question is discussed in more detail later in this
chapter.
Identify the Relevant Evidence
After the question has been formulated, the next step is
to find relevant evidence to help answer it. Evidence can
include information from the research literature, practice
knowledge, and client experience and values. Searching
the literature for evidence takes skill and practice on the
part of practitioners and students. Development of this
skill is the focus of Chapter 2. However, as mentioned
previously, the research evidence is only one component
of evidence-based practice. Therapists should always consider research evidence in light of their previous experience, as well as information gathered about the client and
his or her situation.
Evaluate the Evidence
Once evidence is found, evidence-based practitioners
must critically appraise that evidence. The design of the
study, size of the sample, outcome measures used, and
many other factors all play a role in determining the
strength of a particular study and the validity of its conclusions. In addition, practitioners need to evaluate the
applicability of a particular study to their practice situation and client life circumstances. Much of this textbook
focuses on evaluating research, and additional information is presented in Chapters 5 through 10.
Implement Useful Findings
Clinical decision-making may focus on an intervention or
assessment approach, use evidence to better understand a
diagnosis or an individual’s experience, and/or predict an
outcome. Once the evidence has been collected, screened,
and presented to the client, the practitioner and client use
a collaborative approach and, through shared decision-making, apply the gathered evidence to practice. Chapter 11
provides more information on clinical decision-making and
presenting evidence to clients.
Evaluate the Outcomes
The process of evidence-based practice is recursive; that
is, the process draws upon itself. When a practitioner
evaluates the outcomes of implementing evidence-based
practice, the evaluation process contributes to practice
knowledge. The practitioner determines whether the
evidence-based practice resulted in the intended outcomes. For example, did the intervention help the client
achieve established goals? Did the assessment provide the
therapist with the desired information? Was prediction
from the research evidence consistent with the results of
clients seen by the therapist? Did the client’s lived experience resonate with the research evidence? Evidence-based
practitioners reflect on the experience as well as gather
information directly from their clients to evaluate outcomes. The step of evaluating the outcomes helps the
practitioner to make clinical decisions in the future and
ask new questions to begin the evidence-based process
over again.
EVIDENCE IN THE REAL WORLD
Steps in Evidence-Based Practice
The following example shows all of the steps in the process of evidence-based practice.
You are working with an 8-year-old boy, Sam, who has a diagnosis of autism. During therapy, Sam’s parents
begin discussing issues related to sleep. Sam frequently awakens in the night and then, when encouraged to go
back to sleep, typically becomes very upset, sometimes throwing temper tantrums. You explain to the parents that
you will help them with this concern, but first you would like to examine the evidence.
First, you formulate the question: “Which interventions are most effective for reducing sleep problems (specifically nighttime awakening) in children with autism?” Second, you conduct a search of the literature and identify
relevant evidence in the form of a systematic review by Vriend, Corkum, Moon, and Smith in 2011. (A systematic
review provides a summary of many studies on the same topic.) You talk with the parents about what approaches
they have tried in the past.
The parents explain that, when Sam awakens, one of the parents typically stays in his room until he eventually
falls asleep again. They are unhappy with this tactic, but have not found a technique that works any better. The
Vriend et al. (2011) systematic review is evaluated. Although a systematic review is considered a high level of evidence,
this review finds that the studies addressing sleep issues for children with autism are limited. The review discusses
the following approaches: extinction, scheduled awakening, faded bedtime, stimulus fading for co-sleeping, and
chronotherapy. The approach with the most research support for addressing nighttime awakenings is extinction.
Standard extinction (i.e., ignoring all negative behaviors) was examined in three studies and resulted in a decrease
in night awakening that was maintained over time.
You put the findings into language that the parents can understand and present the evidence
to them. Sam’s parents decide to implement these useful findings and try standard extinction. This technique can be
challenging for parents to implement because, in the short term, it is likely to result in an increase in tantrums and
agitation. However, the parents are desperate and willing to give it a try. You provide them with basic instruction,
and together you develop a plan. The parents decide to start the intervention on a long weekend so that they will
have time to adjust to the new routine before returning to work.
After the initial weekend trial, you talk to the parents about the extinction process and evaluate the outcomes for
Sam. They report that, although it was initially very difficult, after 1 week they are already seeing a significant
reduction in their son’s night awakenings and an improvement in his ability to self-settle and get back to sleep.
WRITING AN EVIDENCE-BASED QUESTION
This section helps you begin to develop the skills of an evidence-based practitioner by learning to write an evidence-based question. As mentioned previously, there are different types of questions; the appropriate type depends on the information you are seeking. The five types of questions relevant to this discussion include:
1. Efficacy of an intervention
2. Usefulness of an assessment
3. Description of a condition
4. Prediction of an outcome
5. Lived experience of a client
Table 1-1 provides examples of questions for each category, as well as the research designs that correspond to the question type. Subsequent chapters describe the research designs in much greater detail.

Questions on Efficacy of an Intervention
Questions related to the efficacy of an intervention are intended to help therapists make clinical decisions about implementing interventions.
TABLE 1-1 Examples of Different Types of Evidence-Based Clinical Questions

Efficacy of an intervention
Examples:
• In individuals with head and neck cancer, what is the efficacy of swallowing exercises versus usual care for preventing swallowing problems during chemotherapy?
• In infants, what is the efficacy of swaddling (versus no swaddling) for reducing crying?
• For wheelchair users, what is the best cushion to prevent pressure sores?
Common Designs/Research Methods:
• Randomized controlled trials
• Nonrandomized controlled trials
• Pretest/posttest without a control group

Usefulness of an assessment
Examples:
• What is the best assessment for measuring improvement in ADL function?
• How reliable is goniometry for individuals with severe burns?
• What methods increase the validity of health-related quality of life assessment?
Common Designs/Research Methods:
• Psychometric methods
• Reliability studies
• Validity studies
• Sensitivity and specificity studies

Description of a condition
Examples:
• What motor problems are associated with cerebral palsy?
• What are the gender differences in sexual satisfaction issues for individuals with spinal cord injury?
Common Designs/Research Methods:
• Incidence and prevalence studies
• Group comparisons (of existing groups)
• Surveys and interviews

Prediction of an outcome
Examples:
• What predictors are associated with successful return to employment for individuals with back injuries?
• What childhood conditions are related to stuttering in children?
Common Designs/Research Methods:
• Correlational and regression studies
• Cohort studies

Lived experience of a client
Examples:
• What is the impact of multiple sclerosis on parenting?
• How do athletes deal with career-ending injuries?
Common Designs/Research Methods:
• Qualitative studies
• Ethnography
• Phenomenology
• Narrative
Efficacy questions are often
structured using the PICO format:
P = population
I = intervention
C = comparison or control condition
O = outcome
The following is an example of a PICO question: “For
individuals with schizophrenia (population), is supported
employment (intervention) more effective than transitional
employment (comparison) for work placement, retention,
and income (outcomes)?” The order of the wording is less
important than inclusion of all four components.
PICO questions are useful when you are familiar with
the available approaches and have specific questions
about a particular approach. However, it may be necessary to start with a more general question that explores
intervention options. For example, you might ask, "Which approaches are most effective for increasing adherence to home exercise programs?" Searching for answers to
this question may involve weeding through a substantial
amount of literature; however, identification of the possible interventions is your best starting place.
In other situations, you may know a great deal about
an intervention and want to ask a more specific or restricted efficacy question, such as, “How does depression
affect outcomes for individuals receiving low vision rehabilitation?” or “What are the costs associated with implementing a high-intensity aphasia clinic on a stroke unit?”
Research Designs for Efficacy Questions
and Levels of Evidence
Evidence-based practitioners need a fundamental understanding of which research designs provide the
strongest evidence. An introduction to designs used to
answer efficacy questions is provided here so that you
can begin to make basic distinctions. Designs used to
answer efficacy questions are discussed in greater detail
in Chapter 6.
The concept of levels of evidence establishes a
hierarchical system used to evaluate the strength of the
evidence for research that is designed to answer efficacy questions. Determining if a particular approach
is effective implies a cause-and-effect relationship:
that is, the intervention resulted in or caused a particular outcome. Certain research designs are better suited
for determining cause and effect. Hence, knowing that
a researcher used an appropriate type of study design
means the practitioner can have more confidence in the
results.
There is no universally accepted hierarchy of levels of
evidence; several exist in the literature. However, all hierarchies are based on principles reflecting strong internal validity. Controlled studies with random assignment
result in the highest level of evidence for a single study.
Table 1-2 gives examples of evidence hierarchies and
their references.
For a study to be deemed a randomized controlled
trial, three conditions must be met:
1. The study must have at least two groups, an experimental and a control or comparison condition.
2. The participants in the study must be randomly assigned to the conditions.
3. An intervention (which serves as the manipulation)
must be applied to the experimental group.
Stronger than a single study is a systematic review, which identifies, appraises, and analyzes (synthesizes) the results of multiple randomized controlled trials on a single topic using a rigorous set of guidelines.
TABLE 1-2 Examples of Evidence Hierarchies and Supporting References

Hierarchy | Reference | Number of Levels
Oxford Centre for Evidence Based Medicine 2011 Levels of Evidence | OCEBM Levels of Evidence Working Group (2011) | 5
Sackett and colleagues | Sackett, Straus, and Richardson (2000) | 10
AOTA hierarchy (adaptation of Sackett) | Arbesman, Scheer, and Lieberman (2008) | 5
Research Pyramid for Experimental, Outcome, and Qualitative Evidence | Tomlin and Borgetto (2011) | 4*
ASHA hierarchy | ASHA (n.d.) | 6
*Includes four levels each for experimental, outcome, and qualitative evidence.
Other factors taken into consideration in some level-of-evidence hierarchies are issues such as sample size,
confidence intervals, and blinding (these topics are addressed in Chapters 4, 5, and 6). As a result, different
hierarchies include varying numbers of levels. Table 1-3
outlines an example of a standard levels-of-evidence
hierarchy that can be used for the purpose of evaluating
studies that examine the efficacy of an intervention. Because different hierarchies exist, it is important to recognize that a Level II as described in this table may differ
from a Level II in another hierarchy.
In the hierarchy shown in Table 1-3, the highest
level of evidence is a systematic review of randomized
controlled trials. Because a systematic review involves
analysis of an accumulation of studies, this level-of-evidence hierarchy supports the value of replication. Although this is the highest level of evidence, it
does not mean that, just because a systematic review
has been conducted, the practice is supported. At all
levels, the research may or may not result in statistically significant findings to support the conclusion that
the intervention caused a positive outcome. In other
words, it is possible that a systematic review may find
strong evidence that the intervention of interest is not
effective.
Also, it is important to consider the studies included in
the review. A systematic review may not provide the highest level of evidence if the studies in the review are not
randomized controlled trials. If randomized controlled
trials have not been conducted in the area under study
and therefore could not be included in the review, the
systematic review would not meet the criteria for Level I
evidence. Systematic reviews are described in more detail
in Chapter 10.
TABLE 1-3 Example of Standard Levels-of-Evidence Hierarchy

Level I: Systematic review of randomized controlled trials
Level II: Randomized controlled trial
Level III: Nonrandomized controlled trial
Level IV: One group trial with pretest and posttest
Level V: Case reports and expert opinion
Level II evidence comes from randomized controlled
trials. The strength of the randomized controlled
trial lies in its ability to indicate that the intervention, rather than another influence or factor, caused
the outcome. This quality of a study is also known as
internal validity. Factors that contribute to internal
validity include the use of a control group and random
assignment. Random assignment to a control group
eliminates many biases that could confound the findings
of a study. This is discussed in much greater detail in
Chapter 4.
A Level III study is similar to a Level II study, with
the exception that the assignment to groups is not
random. This is a fairly common occurrence in studies reported in the research literature. There may be
pragmatic or ethical reasons for avoiding random assignment. A typical example of nonrandomized group
comparisons is a study in which, in one setting, individuals receive the intervention and in the other setting
they do not. In another design, individuals in one setting may receive one intervention, which is compared
with a different intervention provided in an alternate
setting. Similarly, in a school setting, one classroom
may receive an intervention, while another classroom
receives standard instruction. Problems with the validity of conclusions arise in such situations. For example, the teacher in one classroom may be more effective
than the teacher in the other classroom; therefore, the
results of the study are due not to the intervention,
but to the skills of the teacher. In the situation of different hospital settings, one hospital may tend to have
patients with more severe conditions who are thus less
responsive to treatment, which introduces a different
type of bias.
Sometimes there is confusion about the term
control, because a control group is typically considered to be a group that receives no intervention. However, individuals receiving an alternate intervention
or a standard intervention often serve as the control. In
a nonrandomized group comparison, the control group
provides the primary support for the internal validity
of the design.
Studies yielding evidence at Level IV typically
have no control group. These studies are often referred to as pre-experimental. In these types of studies, individuals serve as their own control through the
use of a pretest and a posttest. A pretest is administered, the intervention is applied, and then the posttest
is administered. The conclusion that differences from
pretest to posttest are due to the intervention must
be made with great caution, as is discussed further in
Chapter 4.
In the vast majority of Level II and III studies,
pretests and posttests are also used. Because it lacks
a control group, a Level IV study is much less robust
for drawing conclusions about cause-and-effect relationships. The improvements that occur from pretest to
posttest could be due to general maturation or healing rather than the intervention; in other words, the
client would get better if left alone. Improvements could also be due to a placebo effect,
whereby the individual gets better because he or she
expects to get better and not because of actions specific
to the intervention. Study designs that use pretests and
posttests without a control group are often used as initial pilot studies so that minimal resources are used to
identify whether an approach has potential benefits. If
the results are promising, future research will typically
use stronger designs, such as a randomized controlled
trial. From the Evidence 1-2 provides an example of
a Level IV study.
Level V evidence includes case reports and expert opinion. Level V studies do not use statistical analyses to draw conclusions; the evidence comes from a single case study or expert opinion.
FROM THE EVIDENCE 1-2
Example of a Level IV Study
Case-Smith, J., Holland, T., Lane, A., & White, S. (2012). Effect of a coteaching handwriting program for first graders:
One-group pretest–posttest design. American Journal of Occupational Therapy, 66(4), 396–405. doi:10.5014/ajot.
2012.004333.
Note A: This study used a pretest-posttest design
without a control group. Although children improved,
without a comparison group one cannot know if the
children would have improved without an intervention.
We examined the effects of a co-taught handwriting and writing
program on first-grade students grouped by low, average, and
high baseline legibility. The program's aim was to increase
legibility, handwriting speed, writing fluency, and written
expression in students with diverse learning needs. Thirty-six
first-grade students in two classrooms participated in a 12-wk
handwriting and writing program co-taught by teachers and an
occupational therapist. Students were assessed at pretest,
posttest, and 6-mo follow-up using the Evaluation Tool of
Children's Handwriting-Manuscript (ETCH-M) and the
Woodcock-Johnson Writing Fluency and Writing Samples tests.
Students made large gains in ETCH-M legibility (η² = .74), speed
(η²s = .52-.65), Writing Fluency (η² = .58), and Writing Samples
(η² = .59). Students with initially low legibility improved most in
legibility; progress on the other tests was similar across low-,
average-, and high-performing groups. This program appeared to
benefit first-grade students with diverse learning needs and to
increase handwriting legibility and speed and writing fluency.
FTE 1-2 Question
Which conditions that must be present for Level II evidence are lacking in this study?
EXERCISE 1-2
Using the Levels-of-Evidence Hierarchy to Evaluate Evidence (LO2)

QUESTIONS
For the following examples, identify the level of evidence that best matches the study description for the question, "For individuals in long-term care facilities, what is the efficacy of strength and balance training when compared with usual care for reducing falls?"

1. Forty individuals in a long-term care facility receive strength and balance training to reduce falls. The number of falls for a 1-month period before the intervention is compared with the number of falls for a 1-month period after the intervention.
Level of evidence:

2. Twenty individuals on one wing of a long-term care facility receive strength and balance training, while 20 individuals on another wing of the same long-term care facility receive usual care. The number of falls for the two groups is compared for 1 month before the intervention and 1 month after the intervention.
Level of evidence:

3. The results of three randomized controlled trials examining the efficacy of strength and balance training to reduce falls in long-term care facilities are analyzed and synthesized.
Level of evidence:

4. A resident in a long-term care facility receives individualized strength and balance training. The number of falls for this individual is determined at different time points and presented in a report.
Level of evidence:

5. Individuals in three long-term care facilities are randomly assigned to receive either strength and balance training or usual care. The number of falls for the two groups is compared for 1 month before the intervention and 1 month after the intervention.
Level of evidence:

Questions for Usefulness of an Assessment
When practitioners have questions about assessments and assessment tools, psychometric methods are used. The primary focus of psychometric methods is to examine the reliability and validity of specific assessment instruments. Reliability addresses the consistency of a measure; that is, dependability of scores, agreement of scoring for different testers, and stability of scoring across different forms of a measure. For example, a practitioner may want to know which measure of muscle functioning is the most consistent when scored by different therapists. Validity is the ability of a measure to assess what it is intended to measure. For example, a practitioner may be interested in identifying which measure most accurately assesses speech intelligibility. Questions about the usefulness of assessments guide practitioners in finding evidence to determine if the measures they are currently using have sufficient reliability and validity, and help practitioners identify the best measure for a particular client need. Chapter 7 discusses assessment studies in greater detail.

Research Designs Used in Assessment Studies
In reliability studies, the emphasis is on determining
the consistency of the scores yielded by the assessment.
In these studies, the measure may be administered more than once to the same individual, or different
evaluators may evaluate the same individual. A reliable
measure will produce comparable scores across these
different conditions. In validity studies, the measure of
interest is often compared with other similar measures to
determine if it measures the same construct. Validity
studies may also use the measure with different populations to determine if it distinguishes populations
according to the theoretical basis of the measure. For
example, a measure of depression should result in
higher scores for individuals with a diagnosis of depression when compared with scores of individuals in
the general population.
Another important consideration in the selection of
a measure for use in a clinical setting is sensitivity and
specificity. Sensitivity of a test refers to the proportion of
the individuals who are accurately identified as possessing
the condition of interest. Specificity is the proportion
28/10/16 2:30 pm
14
CHAPTER 1 ● Evidence-Based Practice
of individuals who are correctly identified as not having
the condition. It is possible for a measure to be sensitive
but not specific, and vice versa. For example, a measure
may correctly identify all of the individuals with a balance problem (highly sensitive), but misidentify individuals without balance problems as having a problem (not
very specific). Ideally, measures should be both sensitive
and specific.
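To make these proportions concrete, the short Python sketch below works through the balance-screening example with invented counts; the numbers are illustrative only and are not drawn from any actual study. In this hypothetical case the measure would be judged highly sensitive but not very specific.

```python
# Hypothetical screening results for 100 clients, compared against a
# reference ("gold standard") balance assessment. All numbers are invented.
true_positives = 45   # have a balance problem and screened positive
false_negatives = 5   # have a balance problem but screened negative
true_negatives = 30   # no balance problem and screened negative
false_positives = 20  # no balance problem but screened positive

# Sensitivity: proportion of people WITH the condition correctly identified.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: proportion of people WITHOUT the condition correctly cleared.
specificity = true_negatives / (true_negatives + false_positives)

print(f"Sensitivity: {sensitivity:.2f}")  # 0.90 -- highly sensitive
print(f"Specificity: {specificity:.2f}")  # 0.60 -- not very specific
```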
Another feature of measures that is important in gauging the efficacy of an intervention is sensitivity to change.
If a measure lacks the precision necessary to detect a
change, it will not be useful as an outcome measure.
A validity study provides an example of how practitioners can use evidence to make decisions about what
assessment to use in practice. For example, a study comparing the shortened Fugl-Meyer Assessment with the
streamlined Wolf Motor Function Test for individuals
with stroke found that the Fugl-Meyer Assessment was
more sensitive to changes in rehabilitation and a better
predictor of outcomes (Fu et al., 2012).
The results from a reliability study show how practitioners can use evidence to make decisions about the
method of administration of a measure. Researchers
compared direct observation with an interview of parents to gather information for the WeeFIM (Sperle,
Ottenbacher, Braun, Lane, & Nochajski, 1997). The
results indicated that the scores were highly correlated
for the two methods of administration, suggesting that
parental interviews are comparable to direct observation. The clinical application is that, when the therapist is unable to directly observe the child, this study supports collecting information from the parent to score the WeeFIM.
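As an illustration of what "highly correlated" scores look like, the brief Python sketch below computes a Pearson correlation for a handful of invented item scores gathered by observation and by parent interview. The data are hypothetical and are not the Sperle et al. (1997) results; values close to 1 would support the conclusion that the two administration methods yield comparable scores.

```python
# Invented scores for five children, each rated once by direct observation
# and once from a parent interview (for illustration only).
observation = [6, 5, 7, 4, 6]
interview = [6, 5, 6, 4, 7]

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

print(f"r = {pearson_r(observation, interview):.2f}")  # about 0.81 for these data
```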
Questions for Description of a Condition
As a practitioner, you will often have questions about
the conditions that you commonly see, or you may
encounter a client with an atypical condition that you
know very little about. There is a great deal of research
evidence available to answer questions about different
health-care conditions. Perhaps you notice that many of
your clients with chronic obstructive pulmonary disease
(COPD) also experience a great deal of anxiety, so you
have a question about the comorbidity of COPD and
anxiety disorders. Perhaps you are interested in gender differences as they relate to symptoms in attention
deficit disorder. By gathering evidence from descriptive
questions, you can better understand the people you
treat. Chapter 8 discusses descriptive studies in greater
detail.
Research Designs Used in Descriptive Studies
Research designs intended to assist in answering descriptive questions do not involve the manipulation of
variables, as in efficacy studies. Instead, the phenomena
are depicted as they occur. Hence, these designs use observational methods. One type of descriptive research
that is common in health care is the prevalence and incidence study. Prevalence is the proportion of individuals within a population who have a particular condition,
whereas incidence is the risk of developing a condition
within a period of time. For example, the prevalence of
Alzheimer’s disease in people aged 65 and older living
in the United States is 13% (Alzheimer’s Association,
2012). The same report indicates that the lifetime risk of developing Alzheimer's disease is 17.2%
for women and 9.1% for men. The difference is likely
attributable to the longer life span for women.
Rehabilitation practitioners are often interested in
incidence and prevalence statistics that occur within a
particular diagnostic group. For example, what is the incidence of pressure sores in wheelchair users? What is
the prevalence of swallowing disorders among premature
infants?
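The arithmetic behind these two statistics is straightforward. The Python sketch below uses invented numbers for the wheelchair-user example above; the figures are illustrative only and do not come from any published study.

```python
# Invented numbers for a single long-term care facility (illustration only).
wheelchair_users = 80
users_with_pressure_sores_today = 12    # existing cases at one point in time
users_initially_sore_free = 68
new_pressure_sores_over_one_year = 20   # new cases among those initially sore-free

# Prevalence: proportion of the population that HAS the condition right now.
prevalence = users_with_pressure_sores_today / wheelchair_users

# Incidence: risk of DEVELOPING the condition over a defined period of time.
one_year_incidence = new_pressure_sores_over_one_year / users_initially_sore_free

print(f"Point prevalence: {prevalence:.1%}")            # 15.0%
print(f"One-year incidence: {one_year_incidence:.1%}")  # 29.4%
```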
Observational methods may also be used to compare
existing groups of individuals to describe differences,
rather than assigning individuals to groups. For example,
cognitive assessments may be administered to describe
differences in cognition for individuals with and without
schizophrenia; or the social skills of children with and
without autism may be compared. This type of study allows researchers to describe differences and better understand how individuals with a particular condition differ
from individuals without the condition.
Survey methods are also used to answer descriptive
questions. With surveys, the participating individuals
themselves (respondents) are asked descriptive questions. One of the advantages of survey methods is that
many questions can be asked, and a significant amount
of data can be collected in a single session. From the
Evidence 1-3 is based on a survey of individuals with
multiple sclerosis. It illustrates the rates and levels of
depression among a large sample of individuals with
multiple sclerosis.
Questions for Prediction of an Outcome
With predictive questions, associations are made between
different factors. Some predictive questions are framed in
terms of a prognosis. In other words, what factors contribute to the prognosis, response, or outcome in a particular condition? Chapter 8 discusses predictive studies
in greater detail.
Research Designs Used in Predictive Studies
Predictive questions are similar to descriptive questions
in terms of the methods used. Observational or survey
data are collected to examine relationships or predictors.
As in descriptive studies, individuals are not randomly
assigned to groups, but instead are studied in terms of the groupings or characteristics that naturally occur.
FROM THE EVIDENCE 1-3
Research Using a Survey
Jones, K. H., Ford, D. V., Jones, P. A., John, A., Middleton, R. M., Lockhart-Jones, H., Osborne, L. A., & Noble, J. G. (2012). A large-scale study
of anxiety and depression in people with multiple sclerosis: A survey via the web portal of the UK MS Register. PLoS One, 7(7), Epub.
doi:10.1371/journal.pone.0041910.
This pie chart is from a large descriptive study of 4,178 individuals with multiple sclerosis that used survey research to identify levels of depression. The study's finding of high rates of depression among people with multiple sclerosis is important information for practitioners who work with this population.
[Pie chart, Depression: Normal, 34.6%; Mild depression, 32.7%; Moderate depression, 26.4%; Severe depression, 6.3%]
FTE 1-3 Question Assume that you are a health professional treating someone with multiple sclerosis. Based on
the results of this study, how likely would it be that the individual would be experiencing some level of depression?
Data may be collected at a single point in time (cross-sectional research) or over a period of time (longitudinal research). For example, in a cross-sectional study,
Hayton and colleagues (2013) found that the predictors
of adherence and attendance for pulmonary rehabilitation included smoking status (smokers were less likely
to attend), social support (greater support was associated
with better attendance), and severity of disease (more
severity was associated with lower attendance).
In assessing the quality of predictive studies, sample size is an important consideration. A larger sample
will be more representative of the population of interest, and the findings will be more stable and consistent.
With a small sample, there is a greater likelihood of
bias or that outliers will influence the results. Predictive
studies find associations, but their designs make it more
difficult to infer causation. "Correlation does not equal
causation” is an important axiom of the evidence-based
practitioner that will be discussed in greater detail in
subsequent chapters.
Questions About the Client’s Lived
Experience
The evidence-based questions and research designs described thus far heavily emphasize objective data collection, analysis, and interpretation. However, numerical
data do not tell the full story, because each individual’s
experience is unique and subjective. Questions about the
lived experience provide practitioners with evidence from
the client’s perspective. These questions tend to be more
open-ended and exploratory, such as, “What is the meaning of recovery for people with serious mental illness?”
and “How do caregivers of individuals with dementia
describe their experience of managing difficult behaviors?”
More information about lived experience studies is provided in Chapter 9.
Research Designs Addressing the Client’s
Lived Experience
Questions about lived experience are answered using qualitative methods. Based on a different paradigm than quantitative research, qualitative research is concerned with meaning
and explanation. The individual is appreciated for his or her
understanding of a particular phenomenon. Instead of using
numbers and statistics, qualitative research is typically presented in terms of themes that emerge during the course
of the research; in reports of such studies, these themes are
exemplified with quotations. The purpose of qualitative
research is to uncover new ideas or develop insights into a
phenomenon. In qualitative research, participants are interviewed extensively and/or observed extensively in natural
contexts. Conditions are not manipulated or contrived.
Norman and colleagues (2010) used qualitative methods
to identify questions that individuals with spinal cord injury
have related to experiences of pain. Extensive interviews indicated a general theme of dissatisfaction with the information participants had received about pain from health-care
professionals. The following quote was one of many used to
illustrate the findings: “I should have been told that I could
be in pain. . . . I kind of wished I had known that because
then I would have been better prepared mentally” (p. 119).
EXERCISE 1-3
Writing Evidence-Based Clinical Questions (LO3)

QUESTIONS
Imagine that you are a practitioner at a rehabilitation hospital with a caseload that includes many individuals who have experienced a stroke. Consider the questions you might have and write one research question for each of the types of questions discussed in this chapter. Use the PICO format when writing the efficacy question. Remember that these questions are used for searching the literature, so think about wording and content that would make your question searchable. A question that is too vague or open-ended will be difficult to search, whereas specific questions are typically easier to search. For example, it would be very difficult to answer the following question with a search of the research evidence: "What are the functional impairments associated with neurological conditions?"

1. Efficacy of an intervention question (using PICO format to ask a question related to interventions for stroke):

2. Usefulness of an assessment question (reliability, validity, sensitivity, or specificity of an assessment for people with strokes):

3. Description of a condition question (consider questions that will reveal more about people with strokes):

4. Prediction of an outcome question (these questions generally contain the words relationship, association, or predictors of an outcome):

5. Lived experience of a client question (focus on understanding and explaining from the perspective of the client):
CRITICAL THINKING QUESTIONS
1. Identify at least three reasons why practitioner experience is an important component of evidence-based
practice.
2. How might you facilitate the shared decision-making
process when the client’s preferences conflict with the
external evidence?
3. Why is a randomized controlled trial not an appropriate design for many clinical questions?
4. Identify at least three advantages that the evidence-based
practitioner has over a practitioner who does not use
evidence in practice.
ANSWERS
EXERCISE 1-1
1. There is no single answer to this exercise, but some
strategies you might consider include:
• Be sure to ask clients about their preferences and
values before making any clinical decisions.
• Familiarize yourself with possible intervention options and avoid entering a treatment planning situation without consulting with the client first.
• Clearly present information about the external research evidence and your practitioner experience
to the client, and explain your rationale for recommending a particular intervention approach.
• Honor the client’s opinion regardless of the decision
and support the client in that choice.
• If the client’s choice presents a significant safety
concern, explain why you cannot ethically honor the
client’s choice.
EXERCISE 1-2
1. IV. In this study, there is a single group (no comparison) that is assessed before and after an intervention.
2. III. A comparison of two groups is made, so this study
includes a control condition, but there is no randomization in the assignments to the groups.
3. I. This is an example of a systematic review, in which
the results of three randomized controlled trials are
compared.
4. V. A case study describes an intervention and outcome
for a single individual only; thus, it is considered a lower level of evidence.
5. II. The highest level of evidence for a single study is a
randomized controlled trial.
EXERCISE 1-3
There is no single answer to this exercise, but a poor example and a good example are provided for each of the
types of questions. Try to judge your questions against
these examples.
1. Efficacy of an Intervention Question
Poor example:
“Is constraint-induced motor therapy effective?”
This question does not include all of the components
of the PICO question: The condition, comparison,
and outcome are missing. If you were to search the
evidence on constraint-induced therapy, you would
find many studies, but much of your search would
be irrelevant.
Good example:
“Is constraint-induced motor therapy more effective
than conventional upper-extremity rehabilitation for
improving fine and gross motor skills in people with
strokes?” Now your question includes all of the components of a PICO question and will lead you to more
relevant studies.
2. Usefulness of an Assessment Question
Poor example:
“What assessments should I be using when working
with people with strokes?” Stroke is a multifaceted
condition and the number of available assessments
is considerable. A better question would focus on
an assessment you are already using or considering
using, or identifying an assessment for a particular
purpose.
Good example:
“What is the reliability and validity associated with the
Wolf Motor Function Test?” This question will help
you gather data on the usefulness of a specific test used
in stroke rehabilitation.
3. Description of a Condition Question
Poor example:
“What are the symptoms of stroke?” This is the sort
of question that is answered in a textbook, which
combines the writer’s clinical expertise with the
evidence. However, answering this question with a
search of the evidence will be very difficult. Typically
researchers will focus on a narrower aspect of the
condition.
Good example:
“What is the prevalence of left-sided neglect in individuals with a right-sided cerebrovascular accident
(CVA)?” This question will lead you down a more specific path of research. Once the question is answered,
you will be more prepared to deal with the client who
has had a right-sided CVA.
4. Prediction of an Outcome Question
Poor example:
“How are placement decisions made after acute stroke
rehabilitation?” This question is vague and does not
28/10/16 2:30 pm
18
CHAPTER 1 ● Evidence-Based Practice
include the common terms of this type of question:
associated, related, and predictive. Experienced clinicians
can describe the decision-making process, but this
question does not help you predict an outcome.
Good example:
“What motor, cognitive, and psychological conditions after acute stroke rehabilitation are most associated with inability to return to independent
living?” Research evidence can answer this question
and thereby help you inform your clients and their
families about their prognosis.
5. Question About the Client’s Lived Experience
Poor example:
“What percentage of individuals with stroke are
satisfied with their rehabilitation experience?” Although this may be an answerable question, it is not
a qualitative question because it deals with numbers.
Qualitative questions are answered narratively, typically using themes derived or drawn from quotes and
observations.
Good example:
“How do individuals with stroke describe their rehabilitation experience?” In the case of the qualitative
question, sometimes vague is better. When obtaining information from the individual’s perspective, it
is more effective to let the individual do the talking
and remain open to whatever themes emerge. This
approach also captures and reveals the diversity of experiences as well as commonalities.
FROM THE EVIDENCE 1-1
This study provides you as a practitioner with external evidence that group treatment may be as effective
as individual treatment for improving mobility after
stroke.
FROM THE EVIDENCE 1-2
The study does not have at least two groups, and because there is only one group, the participants cannot be randomly assigned to different conditions. An intervention is applied to the one group, so only the third condition is met.
FROM THE EVIDENCE 1-3
If you combine mild, moderate, and severe depression,
65.4% of individuals in this study experienced some
level of depression. Therefore, this study suggests that
a large proportion of individuals with multiple sclerosis will experience depression. This is an important
consideration for health-care professionals, in that
depression will likely have a significant impact on an
individual’s daily life and ability to engage in the rehabilitation process.
REFERENCES
Alzheimer’s Association. (2012). 2012 Alzheimer’s disease facts and
figures. Alzheimer’s Association, 8(2), 1–72.
American Occupational Therapy Association (AOTA). (2010). The
occupational therapy code of ethics and ethics standards. American
Journal of Occupational Therapy, 64(suppl.), S17–S26.
American Speech-Language-Hearing Association (ASHA). (n.d.). Assessing the evidence. Retrieved from http://www.asha.org/Research/
EBP/Assessing-the-Evidence.
American Speech-Language-Hearing Association (ASHA). (2005).
Evidence-based practice in communication disorders (Position Statement). Retrieved from www.asha.org/policy/PS2005-00221.htm
Arbesman, M., Scheer, J., & Lieberman, D. (2008). Using AOTA's critically appraised topic (CAT) and critically appraised paper (CAP)
series to link evidence to practice. OT Practice, 13(5), 18–22.
Ben-Sasson, A. (2011). Parents’ search for evidence-based practice:
A personal story. Journal of Paediatrics and Child Health, 47,
415–418.
Case-Smith, J., Holland, T., Lane, A., & White, S. (2012). Effect of
a coteaching handwriting program for first graders: One-group
pretest-posttest design. American Journal of Occupational Therapy, 66,
396–405. doi:10.5014/ajot.2012.004333
Dahm, K. T., Brurberg, K. G., Jamtvedt, G., & Hagen, K. B. (2010). Advice to rest in bed versus advice to stay active for acute low-back pain and sciatica. Cochrane Database of Systematic Reviews.
Dean, C. (2012). Group task-specific circuit training for patients discharged home after stroke may be as effective as individualized
physiotherapy in improving mobility: A critically appraised paper. Journal of Physiotherapy, 58, 269. doi:10.1016/S1836-9553(12)70129-7
Fox, C., Ebersbach, G., Ramig, L., & Sapir, S. (2012). LSVT LOUD
and LSVT BIG: Behavioral treatment programs for speech and
body movement in Parkinson disease. Parkinson Disease, Epub,
1-13. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/
PMC3316992/
Fu, T. S., Wu, C. Y., Lin, K. C., Hsieh, C. J., Liu, J. S., Wang, T. N.,
& Ou-Yang, P. (2012). Psychometric comparison of the shortened Fugl-Meyer Assessment and the streamlined Wolf Motor
Function Test in stroke rehabilitation. Clinical Rehabilitation, 26,
1043–1047.
Hayton, C., Clark, A., Olive, S., Brown, P., Galey, P., Knights, E., . . .
Wilson, A. M. (2013). Barriers to pulmonary rehabilitation: Characteristics that predict patient attendance and adherence. Respiratory
Medicine, 107, 401–407.
Jones, K. H., Ford, D. V., Jones, P. A., John, A., Middleton, R. M.,
Lockhart-Jones, H., Osborne, L. A., & Noble, J. G. (2012). A largescale study of anxiety and depression in people with multiple sclerosis: A survey via the web portal of the UK MS Register. PLoS One,
7(7), Epub.
Malmivaara, A., Hakkinen, U., Aro, T., Heinrichs, M. J., Koskenniemi, L., Kuosma, E., . . . Hernberg, S. (1995). The treatment of acute low-back pain: Bed rest, exercises or ordinary activity. New England
Journal of Medicine, 332, 351–355.
Norman, C., Bender, J. L., Macdonald, J., Dunn, M., Dunne, S.,
Siu, B., . . . Hunter, J. (2010). Questions that individuals with spinal
cord injury have regarding their chronic pain: A qualitative study.
Disability Rehabilitation, 32(2), 114–124.
OCEBM Levels of Evidence Working Group. (2011). The Oxford
2011 levels of evidence. Retrieved from http://www.cebm.net/index.aspx?o=5653
Palisano, R. J. (2010). Practice knowledge: The forgotten aspect of
evidence-based practice. Physical & Occupational Therapy in Pediatrics,
30, 261–262.
Sackett, D., Rosenberg, W. M. C., Gray, J. A. M., Haynes, R. B.,
& Richardson, W. S. (1996). Evidence-based medicine: What it is
and what it isn’t. BMJ, 312, 71–72.
Sackett, D., Straus, S. E., & Richardson, W. S. (2000). How to practice and teach evidence-based medicine (2nd ed.). Edinburgh, Scotland:
Churchill Livingstone.
Sperle, P. A., Ottenbacher, K. J., Braun, S. L., Lane, S. J., & Nochajski,
S. (1997). Equivalence reliability of the Functional Independence
Measure for children (WeeFIM) administration methods. American
Journal of Occupational Therapy, 51, 35–41.
Stacey, D., Bennett, C. L., Barry, M. J., Col, N. F., Eden, K. B., Holmes-Rovner, M., . . . Thomson, R. (2011). Decision aids for people facing
health treatment or screening decisions. Cochrane Database of Systematic
Reviews, 10, 3.
Thieme, H., Mehrholz, J., Pohl, M., Behrens, J., & Dohle, C. (2012).
Mirror therapy for improving motor function after stroke. Cochrane
Database of Systematic Reviews, 14, 3.
Thomas, A., Saroyan, A., & Dauphinee, W. D. (2011). Evidence-based
practice: A review of theoretical assumptions and effectiveness of
teaching and assessment interventions in health professions. Advances
in Health Science Education, 16, 253–276.
Tomlin, G., & Borgetto, B. (2011). Research pyramid: A new evidence-based practice model for occupational therapy. American Journal of
Occupational Therapy, 65, 189–196.
van de Port, I. G., Wevers, L. E., Lindeman, E., & Kwakkel, G. (2012).
Effects of circuit training as an alternative to usual physiotherapy
after stroke: Randomized controlled trial. BMJ, 344, e2672.
Valtin, H. (2002). "Drink at least eight glasses of water a day." Really? Is there scientific evidence for "8 × 8"? American Journal of
Physiology: Regulatory, Integrative and Comparative Physiology, 283,
993–1004.
Vriend, J. L., Corkum, P. V., Moon, E. C., & Smith, I. M. (2011). Behavioral interventions for sleep problems in children with autism spectrum disorders: Current findings and future directions. Journal of Pediatric
Psychology, 36, 1017–1029.
World Confederation of Physical Therapy (WCPT). (2011). Policy
statement: Evidence based practice. London, UK: WCPT. Retrieved
from www.wcpt.org/policy/ps-EBP
“The search for truth takes you where the evidence leads you, even if, at first,
you don’t want to go there.”
—Bart D. Ehrman, Biblical scholar and author
2
Finding and Reading
Evidence
The First Steps in Evidence-Based Practice
CHAPTER OUTLINE
INTRODUCTION
IDENTIFYING DATABASES
  PubMed
  Cumulative Index of Nursing and Allied Health Literature
  Cochrane Database of Systematic Reviews
EMPLOYING SEARCH STRATEGIES
  Selecting Key Words and Search Terms
  Combining Terms and Using Advanced Search
  Using Limits and Filters
  Expanding Your Search
ACCESSING THE EVIDENCE
  The Research Librarian
  Professional Organizations
DETERMINING THE CREDIBILITY OF A SOURCE OF EVIDENCE
  Websites
  The Public Press/News Media
  Scholarly Publications
    Impact Factor
    The Peer-Review Process
    Research Funding Bias
    Publication Bias
    Duplicate Publication
READING A RESEARCH ARTICLE
  Title
  Authorship
  Abstract
  Introduction
  Methods
  Results
  Discussion
  References
  Acknowledgments
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES
LEARNING OUTCOMES
1. Identify the relevant databases for conducting a search to answer a particular research question.
2. Use search strategies to find relevant studies based on a research question.
3. Identify evidence-based practice resources that are available through your institution and professional
organizations.
4. Evaluate the credibility of a specific source of evidence.
KEY TERMS
Boolean operator
database
impact factor
institutional animal care and use committee
institutional review board
peer-review process
primary source
publication bias
secondary source

INTRODUCTION
In the past, to keep up with the latest evidence, you would need to go to a library building and look through the card catalog or skim through stacks of journals to find an article of interest. Today, access to scientific literature is simpler than ever, and search engines make the research process faster and easier. Still, the amount of information at your fingertips can be overwhelming. The challenge becomes sorting through all of that information to find what you need, and then being able to identify the most credible sources.
Chapter 1 describes the importance of evidence to practice and how to write a research question. This chapter explains how to use your research question as a framework for locating trustworthy evidence, and provides general guidelines for deciphering research articles. It serves only as an introduction to locating evidence, by providing basic information and tips; finding evidence takes skill and creativity that are honed with practice and experience.

IDENTIFYING DATABASES
An electronic database is an organized collection of digital data—a compilation of the evidence within a searchable structure. Each database uses its own criteria to determine what evidence to include; that is, the database will focus on an area of research, which determines which journals, textbook chapters, newsletters, and so on, will be catalogued. Once your topic of interest is identified by your research question, you use this information to select a database. Exercise 2-1 guides you through some selection considerations.

EXERCISE 2-1
Selecting Databases (LO1)

QUESTIONS
Using Table 2-1, identify which database(s) you would most likely search to find the given information.
1. PEDro ratings on randomized controlled trials examining interventions for people with Parkinson's disease
2. Systematic reviews examining the efficacy of exercise for depression
3. Identifying effective interventions to improve seat time in elementary school-age children with ADHD
4. Early examples of textbooks in occupational therapy
5. Best approaches for rehabilitation of the knee after ACL surgery
Most health sciences databases are composed primarily of peer-reviewed journal articles. There is often overlap of journals across databases, but what may be found in
one database may not be available in another. For this reason it is helpful to know the primary databases that contain content pertinent to rehabilitation research. Three
databases—PubMed, CINAHL, and the Cochrane Database of Systematic Reviews—are described in detail in this
chapter. Table 2-1 provides a more comprehensive list of
databases used by health-care professionals, including information about accessing them. If a database is not free to
the public, it may be available through your school’s library.
TABLE 2-1 Health-Care Databases

Database | Content Focus | Access Free of Charge?
Clinicaltrials.gov | A registry of publicly and privately funded clinical trials. Provides information on studies that are currently underway. May include results if study has been completed. | Yes
Cochrane Library | Database of systematic reviews and registry of clinical trials. | Yes
Educational Resources Information Center (ERIC) | Includes articles and book chapters of particular interest to school-based practice. | Yes
Google Scholar | A wide-ranging database of peer-reviewed literature. Does not include the extensive limits and functions of most scientific databases. | Yes
Health and Psychosocial Instruments (HAPI) | Provides articles about assessment tools. | No
Medline | Comprehensive database of peer-reviewed medical literature. Medline is included in PubMed. | Yes
National Rehabilitation Information Center (NARIC) | Includes publications from NARIC and other articles with a focus on rehabilitation. | Yes
OT Search | American Occupational Therapy Association's comprehensive bibliography of literature relevant to the profession. | No
OTseeker | Abstracts of systematic reviews and randomized controlled trials relevant to occupational therapy. Includes PEDro ratings. | Yes
PEDro | Physical therapy database with abstracts of systematic reviews, randomized controlled trials, and evidence-based clinical practice guidelines with ratings. | Yes
PsycINFO | American Psychological Association's abstracts of journal articles, book chapters, and dissertations. Useful for research on psychiatric and behavioral conditions. | No
PubMed | The National Library of Medicine's comprehensive database of peer-reviewed literature. Includes Medline and other resources. | Yes
PubMed Central | A subset of PubMed's database of all free full-text articles. | Yes
SpeechBite | An evidence-based database of literature related to speech therapy. Includes PEDro ratings. | Yes
SPORTDiscus | Full-text articles relevant to sports medicine, exercise physiology, and recreation. | No
Although you are likely familiar with Google Scholar,
it is not the optimal database for searching the professional literature. Google Scholar does not provide many of the
features important for evidence-based practice, such as
the ability to limit your search to specific research designs. When searching Google Scholar, the “most relevant results” are highly influenced by the number of times
a particular study was cited. Hence, you are less likely to
find newly published studies. Also, the most frequently
cited studies are not necessarily the studies with the strongest evidence; in fact, in an effort to be comprehensive,
Google Scholar even includes journals with questionable
reputations (Beall, 2014).
PubMed
PubMed is the free online database of the U.S. National Library of Medicine. It includes Medline, a database comprising thousands of
health-care journals, textbooks, and other collections of
evidence such as the Cochrane Database of Systematic
Reviews. PubMed is the most comprehensive medical
database. All journals included in PubMed are peer
reviewed. The abstracts of journal articles are available
on PubMed and in some cases the full text of an article
is also free. PubMed Central is a subset of PubMed that includes only articles with free full text.
PubMed uses a system titled MeSH® (Medical Subject Headings), which is a set of terms or descriptors
hierarchically arranged in a tree structure. When you
4366_Ch02_021-038.indd 24
search PubMed using a MeSH term, PubMed will automatically search for synonyms and related terms. To
determine the appropriate MeSH term to use, search
the MeSH browser at http://www.nlm.nih.gov/mesh/
MBrowser.html.
For example, when searching orthotic devices, the
structure is as follows:
Equipment and Supplies
  Surgical Equipment
    Orthopedic Equipment
      Artificial Limbs
      Canes
      Crutches
      Orthotic Devices
        Athletic Tape
        Braces
        Foot Orthoses
If you would like to broaden the search, use the term "orthopedic equipment," which is higher in the tree, or narrow the search by using the term "athletic tape." In rehabilitation research, the specific intervention or term you wish to examine is frequently not a MeSH term. In this case you can still enter the term as a key word that you would expect to find in the article title or abstract. For example,
“kinesio tape” is not a MeSH term; however, entering it
as a key word brings up many relevant studies. Searching
MeSH terms and key words is further discussed later in
this section.
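If you are comfortable with a little scripting, the same MeSH and key-word logic can be sent to PubMed through the National Library of Medicine's E-utilities service. The sketch below is written in Python with the Biopython Entrez module, which is simply one convenient way of calling that service; the e-mail address, the search term, and the retmax value are placeholders to replace with your own, and the counts returned will change as the database grows.

    from Bio import Entrez

    # NCBI asks users of the E-utilities to identify themselves.
    Entrez.email = "your.name@example.edu"  # placeholder address

    # A MeSH term automatically pulls in synonyms and narrower terms;
    # a plain key word such as kinesio tape[Title/Abstract] could be used instead.
    handle = Entrez.esearch(
        db="pubmed",
        term='"Orthotic Devices"[MeSH Terms]',
        retmax=20,
    )
    result = Entrez.read(handle)
    handle.close()

    print(result["Count"])   # total number of matching records
    print(result["IdList"])  # PubMed IDs (PMIDs) of the first 20 matches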
The home page of PubMed is shown in Figure 2-1.
You would enter the selected MeSH term, or a key word
that you would expect to find in the article title or abstract, in the box at the top of the page.
Cumulative Index of Nursing
and Allied Health Literature
As its name indicates, the focus of the Cumulative Index
of Nursing and Allied Health Literature (CINAHL) is on
journals specific to nursing and allied health disciplines. It
is useful to search in CINAHL because some journals in
the occupational therapy, physical therapy, and speech and
language professions that are not referenced in PubMed
are included in CINAHL, such as the British Journal of
Occupational Therapy, the Journal of Communication Disorders, and the European Journal of Physiotherapy. In addition,
CINAHL includes a broader range of publications, such
as newsletters and magazines like OT Practice, the Mayo
Clinic Health Letter, and PT in Motion. Due to the broader
range of publications, it is important to recognize that not
all articles in CINAHL are peer reviewed. CINAHL does
not use the MeSH system, instead using its own system of
subject headings. The home page of CINAHL is shown
in Figure 2-2. You can click “CINAHL Headings” at the
top of the page to find terms used in CINAHL.
Cochrane Database of Systematic Reviews
The Cochrane Database of Systematic Reviews is part of
the Cochrane Library. The systematic reviews contained
in this database are conducted by members of the Cochrane Collaboration using rigorous methodology. As
of 2015, the database included more than 5,000 reviews
that are regularly updated. It includes one database of
interventions and another for diagnostic test accuracy;
therefore, the reviews are designed to answer efficacy
and diagnostic questions. However, the Cochrane
Database is not an optimal source for descriptive,
relationship, or qualitative research. The database is
available free of charge, although access to the actual
reviews requires a subscription. Most medical and university libraries hold subscriptions. Figure 2-3 shows
the home page for the Cochrane Database of Systematic
Reviews. You can go to the “Browse the Handbook” link
to enter the names of interventions or conditions you
would like to search.
EMPLOYING SEARCH STRATEGIES
After selecting a database, the next step is to enter the
terms or keywords into the database; that is, you enter a
MeSH term, a CINAHL subject heading, or a key word
you expect to find in the article title or abstract. Your
initial search may not elicit the relevant studies you are
looking for, in which case you need to apply additional
strategies to locate the studies that are most applicable to
your research question.
Although the Internet has made evidence more available to students and practitioners, locating and sorting
through the evidence found there can be challenging.
For example, a search on PubMed using the MeSH
term “dementia” resulted in 142,151 hits on the day
this sentence was written, and that number will only
increase. Sifting through that much evidence to find the
information most relevant to your question would be
overwhelming, but familiarity with the databases and
search strategies makes the process much more efficient
(although this chapter can only include basic guidelines). Most databases include online tutorials, which
can be extremely useful for improving your search skills.
This section describes search strategies, using the PubMed database in the examples. The search process may vary with other databases, but if you are familiar with the process of searching on PubMed, you will be able to apply the same or similar search strategies to other databases.

FIGURE 2-1 Home page for PubMed. The search box at the top of the page is where you enter the MeSH term or key word. (Copyright National Library of Medicine.)

FIGURE 2-2 Home page for the Cumulative Index of Nursing and Allied Health Literature (CINAHL). This is where you find the CINAHL subject headings (comparable to MeSH terms); CINAHL makes it easy to enter multiple terms from a variety of fields. (Copyright © 2015 EBSCO Industries, Inc. All rights reserved.)

FIGURE 2-3 Home page for the Cochrane Database of Systematic Reviews. This is where you would enter search terms to find systematic reviews at the Cochrane Library. (Copyright © 2015 The Cochrane Collaboration.)
Selecting Key Words and Search Terms
Your research question is the best place to start when beginning your search. Identify the key words used in your
question and enter them into the search box. Think about
which words are most important. Let’s take a question
from Chapter 1 to use as an example: “In infants, what is
the efficacy of swaddling (versus no swaddling) for reducing crying?” The word that stands out is swaddling. Swaddling is not listed in the MeSH system; however, if you
enter swaddling as a common word, you can expect to find
it in a relevant article title or abstract. PubMed provides
you with several common combinations. For example,
after entering swaddling into the search box, you get the
term “crying swaddling” as an option. If you click on this
term, the result is a very manageable 18 articles (Fig. 2-4).
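A quick way to see how a second term narrows a search is to compare the result counts of the two queries. The sketch below again assumes Python and the Biopython Entrez module; the counts you obtain will not match the 18 articles mentioned above exactly, because PubMed grows daily, and the e-mail address is a placeholder.

    from Bio import Entrez

    Entrez.email = "your.name@example.edu"  # placeholder address

    def pubmed_count(query):
        """Return the number of PubMed records that match a query."""
        handle = Entrez.esearch(db="pubmed", term=query, retmax=0)
        result = Entrez.read(handle)
        handle.close()
        return int(result["Count"])

    print(pubmed_count("swaddling"))             # broad search: many records
    print(pubmed_count("swaddling AND crying"))  # narrower search: far fewer records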
A more complex search ensues with a different question:
“What predictors are associated with successful return to
employment for individuals with back injuries?” If you search
back injuries as a MeSH term, you find that back injuries is
indeed a MeSH term. However, if you enter the MeSH term
back injuries into PubMed, the result is more than 25,000
possibilities, and a quick review of the titles reveals that
many articles appear to be irrelevant to the question.
Combining Terms and Using
Advanced Search
A useful skill for expanding or narrowing your search involves combining terms. A simple way to combine terms
is to use the Advanced Search option in PubMed. This
allows you to use the Boolean operators (the words used
in a database search for relating key terms) AND, OR, and
NOT. You can use AND when you want to find articles that use both terms.

FIGURE 2-4 After you enter a term, PubMed will provide you with common combinations, such as “crying swaddling,” thereby narrowing down the number of articles. (Copyright National Library of Medicine.)

Using OR will broaden your search
and identify articles that use either term; this may be useful
when different terms are used to describe a similar concept. For example, if you are looking for studies on kinesio
taping, you might also search for articles that use the term
strapping. The NOT operator eliminates articles that use
the identified term. Perhaps you are interested in kinesio
taping for the knee, but NOT the ankle. The All Fields option in Advanced Search means that the terms you enter can fall anywhere in the article's record, including the title and abstract. However, you can limit the field to options such as the title, author, or journal.
Returning to the original example, enter back injuries
AND employment in the Advanced Search option (Fig. 2-5).
The outcome of this search is a more manageable
330 articles (Fig. 2-6).
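If you prefer to see the Boolean logic spelled out as a single search string, the example below shows one way to write it. This is an illustrative sketch in Python with the Biopython Entrez module; the field tags and the exact terms are examples rather than a recommended search, and the e-mail address is a placeholder.

    from Bio import Entrez

    Entrez.email = "your.name@example.edu"  # placeholder address

    # AND narrows the search, OR broadens it, and field tags such as
    # [Title/Abstract] restrict where a term may appear.
    query = (
        '"Back Injuries"[MeSH Terms] AND '
        '(employment[Title/Abstract] OR "return to work"[Title/Abstract])'
    )
    # NOT works the same way, e.g., knee[Title/Abstract] NOT ankle[Title/Abstract]

    handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
    result = Entrez.read(handle)
    handle.close()

    print(result["Count"], "records found")
    print(result["IdList"])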
Using Limits and Filters
Another search strategy involves using limits or filters. In Figure 2-6, several options for limiting your search with filters are listed on the left side of the screen. You can use filters that will limit your search
by type of study (e.g., clinical trials or review articles),
availability of the full text, publication date, species, language of the article, age of the participants, and more. For
example, you can limit your search to human studies and
exclude all animal studies, limit your search to infants, or
limit your search to studies published during the past year.
The articles listed in a search are provided in order of
publication date. Another useful strategy is to click the
“sort by relevance” feature on the right side of the screen
(Fig. 2-6). This search now provides articles that are likely
to be more closely related to the specific question (Fig. 2-7).
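Filters and sorting can also be expressed when searching programmatically. The sketch below is only an illustration using Python and the Biopython Entrez module; the filter tags shown (a publication type, a species limit, and a date range) are common PubMed options rather than a prescribed combination, and the e-mail address is a placeholder.

    from Bio import Entrez

    Entrez.email = "your.name@example.edu"  # placeholder address

    # Filters can be appended to the query as additional tagged terms.
    query = (
        '"Back Injuries"[MeSH Terms] AND employment[Title/Abstract] '
        'AND "clinical trial"[Publication Type] AND humans[MeSH Terms]'
    )

    # Date limits and relevance sorting are passed as separate parameters.
    handle = Entrez.esearch(
        db="pubmed",
        term=query,
        datetype="pdat",   # filter on the publication date
        mindate="2012",
        maxdate="2016",
        sort="relevance",  # instead of the default newest-first order
        retmax=20,
    )
    result = Entrez.read(handle)
    handle.close()

    print(result["Count"], "records after filtering")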
FIGURE 2-5 Search using the Advanced Search option with the entry back injuries AND employment; terms can be combined in the Advanced Search function of PubMed. (Copyright National Library of Medicine.)

FIGURE 2-6 Results from the search in Figure 2-5 and list of filters (left of screen). In this example the number of articles is limited by using the Boolean operator AND, and more filters can be added. (Copyright National Library of Medicine.)

FIGURE 2-7 The “sort by relevance” feature can be used to locate articles more pertinent to your question; sorting by relevance instead of date may make it easier to find what you are looking for. (Copyright National Library of Medicine.)
A quick scan of the initial results in Figures 2-6 and 2-7
suggests this to be true.
Expanding Your Search
When conducting a search, balance the likelihood of
finding what seems to be too many studies to review
against the possibility of missing something relevant.
When a search results in a small number of articles, consider using additional strategies to locate studies that
are more difficult to find but are specific to the research
question. One strategy is to draw on a major study as a
starting point. Examples of major studies are ones that
include a large number of participants, a review paper
that includes multiple studies, or a study published in a
major journal such as the Journal of the American Medical
Association (JAMA).
When you select a particular study, you will get the
full abstract and a screen on the right side that lists other
studies in which this study was cited and a list of related
citations (Fig. 2-8). Review these citations to determine
if any of the other studies are relevant to your research.
Important studies are cited frequently and can lead you to
similar work.
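The related-citations feature can be reached programmatically as well, through the E-utilities link service. The sketch below uses Python and the Biopython Entrez module and is illustrative only; the PMID is a placeholder for whichever major study you are starting from.

    from Bio import Entrez

    Entrez.email = "your.name@example.edu"  # placeholder address
    start_pmid = "12345678"                 # placeholder PMID for your starting study

    # The "pubmed_pubmed" link returns articles PubMed judges similar to the
    # starting record; "pubmed_pubmed_citedin" lists articles that cite it,
    # where citation data are available.
    handle = Entrez.elink(dbfrom="pubmed", db="pubmed",
                          id=start_pmid, linkname="pubmed_pubmed")
    linkset = Entrez.read(handle)
    handle.close()

    related_ids = [link["Id"] for link in linkset[0]["LinkSetDb"][0]["Link"]]
    print(related_ids[:10])  # the ten most closely related PMIDs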
Another strategy for expanding your search is to use
the reference lists of relevant studies to find other studies
of interest. This approach can be time-consuming, but it
helps ensure that you do not miss any important information when conducting a comprehensive review to locate the best evidence. Box 2-1 supplies a summary of tips for searching the evidence.

ACCESSING THE EVIDENCE
When you look at the results of a search (as in Fig. 2-7), you will see that some articles include a “Free PMC Article” link. Simply click
on the link to obtain access to the article. If this link is
not available, click on the article title to get the abstract
of the study. In Figure 2-8, there are links in the upper
right corner for accessing the full text of the study. The
links take you to the publisher's website or, in some cases, to access through your library. Sometimes it is possible to access
an article for free after clicking on the link. In other
cases, you will need to pay a fee to the publisher. As
a student, you will typically have access to a university
library with substantial access to online subscriptions.
Generally, if the library does not have a subscription to
the title, it can still access the article through interlibrary
loan, which may or may not require payment of a fee.
As a practicing therapist, you may not have access to
a medical library; however, there may be other ways to
access the full text of articles without paying a fee. Some
university alumni organizations offer library access to
former students who are members of the alumni club
or to fieldwork sites that take students. Some articles
are available to the general public and are identified as free full text. There is no easy way to know which articles will be free, as many factors are involved.
FIGURE 2-8 Using a “related citations” function to expand a search. The “similar articles” and “cited by” features can be helpful in finding other relevant studies. (Copyright National Library of Medicine.)
BOX 21 Tips for Searching the Evidence
• Use your research question as a starting point to
identify search terms.
• Use the Boolean operators (AND, OR, NOT) to
limit or expand your search.
• Use MeSH terms or subject headings when
applicable, but remember that not all topics will
be included as MeSH terms.
• Remember to use filters and limits to narrow
your search.
• Recognize that finding one relevant study can
help you identify other relevant studies by using
the related citations or reference lists.
• Become a member of professional organizations
that provide access to many full-text articles.
• Use the tutorials associated with databases to
improve your search skills.
• Take advantage of the expertise of your research
librarian.
Specific journals may provide limited access, such as to articles published in the most recent issue; conversely, journals may provide access to all articles except those published in the previous year.
The National Institutes of Health Public Access
Policy (NIH, n.d.) now requires that all studies that
receive funding from the NIH must be accessible to all
individuals through PubMed no later than 12 months
after the official date of publication. This policy took
effect in April of 2008, so many more studies are now
available.
Research librarians can be helpful in both finding and
accessing articles. In addition, professional organizations
offer another source of articles. More information on
using the research librarian or professional organization
is provided later in this chapter.
EXERCISE 2-2
Using Search Strategies to Locate
the Evidence (LO2)
Write a research question and then follow the
instructions below and answer the associated
questions.
Your research question:
QUESTIONS
Enter the relevant key word(s), MeSH terms, or CINAHL
subject headings into a PubMed search.
1. What words did you enter, and how many results did
you obtain?
Now go to the Advanced Search option and practice using
the Boolean operators AND, OR, and NOT.
2. How did this change your search? Did it result in a
reasonable number of articles to search through? Do
you think you are missing some of the evidence?
Practice using other strategies, such as searching for
a particular author or journal, or limiting the search to
clinical trials or review articles, a particular age group, or
human studies.
3. What strategies were most helpful?
The Research Librarian
If you have access to a medical library, the research
librarian can be an invaluable resource. Research librarians are specifically trained to help people find the
resources they are looking for. You can go to the research librarian with your research question, and he or
she can help you conduct a search and access the material. Medical research librarians typically have a master’s
degree and in-depth training in finding research evidence. Although this chapter gives you the basic tools
to use in searching for evidence, and practice and experience will improve your skills, research librarians are
experts who can help you when you are finding a search
difficult and need to ensure that you have located all of
the available articles.
In most states, there is a medical library supported with
public funds. You can also contact this library to determine if the research librarian can assist you in a search.
The librarian at your public library may also be able to
offer assistance with a search. Some large health-care institutions also have a medical library and librarian.
EXERCISE 2-3
Learning About the Resources at Your Institution (LO3)

1. Access your electronic library resources and, using the following table, identify the databases that are available through your institution.

Database | Place a √ If Available
Clinicaltrials.gov |
Cochrane Library |
Educational Resources Information Center (ERIC) |
Google Scholar |
Health and Psychosocial Instruments (HAPI) |
Medline |
National Rehabilitation Information Center (NARIC) |
OT Search |
OTseeker |
PEDro |
PsycINFO |
PubMed |
PubMed Central |
SpeechBite |
SPORTDiscus |

2. Learn how to use the interlibrary loan system at your institution. When is a cost associated with obtaining full-text articles using interlibrary loan? What is the cost?

3. Use the advanced search option of PubMed or CINAHL to search for the journals listed here. Select a study to determine whether or not the full text is available through your institution. Note by circling yes or no.

Archives of Physical Medicine and Rehabilitation | YES NO
International Journal of Therapy and Rehabilitation | YES NO
Audiology Research | YES NO

Professional Organizations

As mentioned in Chapter 1, the American Occupational Therapy Association, the American Physical Therapy Association, and the American Speech-Language-Hearing Association are associations that are dedicated to increasing evidence-based practice among their membership. Consequently, these organizations provide many resources for evidence-based practice, including access to relevant journals. Members of the American Occupational Therapy Association have access to all full-text articles published in the American Journal of Occupational Therapy, as well as articles in the British Journal of Occupational Therapy and the Canadian Journal of Occupational Therapy. Members of the American Speech-Language-Hearing Association have access to articles in the American Journal of Audiology, the American Journal of Speech-Language Pathology, Contemporary Issues in Communication Sciences and Disorders, and the Journal of Speech, Language, and Hearing Research. Members of the American Physical Therapy Association have access to Physical Therapy Journal and PT Now. Membership in professional associations assists you in functioning more efficiently as an evidence-based practitioner.
DETERMINING THE CREDIBILITY
OF A SOURCE OF EVIDENCE
One important consideration in evaluating evidence is
the credibility of the source. The source of research may
be a journal article, professional report, website, the popular press, and/or other sources. The primary source
of information is the most reliable, as it has not been
interpreted or summarized by others. Primary sources
can include original research studies and professional and
governmental reports that are based on original research
or data collection.
Secondary sources are documents or publications
that interpret or summarize a primary source. They are
one step removed and include someone else’s thinking.
Websites are typically secondary sources, although they
may include original work. The scholarly nature of
websites varies immensely. Stories in the news media
are secondary sources. When you hear about a study
on the radio or television, or read about it in a newspaper, you cannot assume that you are getting full and
completely accurate information. In such cases, it is
wise to access the original study published in a scholarly journal. As an informed and critical evidence-based
practitioner, you can read the original publication and
evaluate the evidence for yourself. In fact, even work
published in a highly respected scholarly journal should
be read with a skeptic’s eye. The following discussion
provides some tips for appraising the credibility of a
source of evidence.
Websites
Websites providing health-care information are very popular with health-care providers and the general public;
however, much of the information on the Internet is inaccurate. Medline, a service of the U.S. National Library
of Medicine, provides a guide for evaluating the quality of
health-care information on the Internet, the MedlinePlus
Guide to Healthy Web Surfing (Medline, 2012). Box 2-2
outlines the key points of this document. Much of the
BOX 22 Highlights of the MedlinePlus Guide
to Healthy Web Surfing
information provided in the Medline guide is relevant to
print sources as well. Key factors to consider when evaluating information on a website are the source of the information, timeliness of the information, and review process
of the website.
Of primary interest is the provider of the information. Respectable sources, such as professional organizations like AOTA, APTA, and ASHA, and federal
government sites such as the Centers for Disease
Control and the National Institutes of Health, are
recognized authorities—as opposed to individuals or
businesses that have websites to promote a product
or service. A credible website will clearly describe its
mission and provide information about the individuals involved in the organization, such as who makes
up the board of directors and the affiliations of those
individuals. If you find that the website is linked to a
corporation with financial ties to the information, and
particularly when the website is selling a product, you
should be concerned about bias.
Sites that rely on testimonials and individual reports
are less reliable than sites that utilize research evidence.
The research should be cited so that the reader can access
the original work. When extraordinary claims are made,
be wary and compare the information with information
from other credible sources. Sites that have an editorial
board and a review policy/process that utilizes experts in
the field are more credible. The approval process for information on a website can often be found in the “About
Us” section of the website. Information posted there
should be up to date.
In summary, when using information from the Internet, look for sources that cite research from peer-reviewed
journals, have up-to-date information, and describe a review process for posting content.
BOX 2-2 Highlights of the MedlinePlus Guide to Healthy Web Surfing
• Consider the source: Use recognized authorities
• Focus on quality: Not all websites are created
equal
• Be a cyberskeptic: Quackery abounds on the
Internet/Web
• Look for the evidence: Rely on medical research,
not opinion
• Check for currency: Look for the latest
information
• Beware of bias: Examine the purpose of the
website
• Protect your privacy: Ensure that health information is kept confidential
Adapted from: Medline. (2012). MedlinePlus guide to healthy web
surfing. Retrieved from http://www.nlm.nih.gov/medlineplus/
healthywebsurfing.html
The Public Press/News Media
Stories about health-care research are very common in
the news media. These stories are often based on press
releases provided by scholarly publications or institutions that alert health-care providers and the lay public
to newly released research. It is possible that your clients
and their families will approach you with information
they obtained from the news media. The news media is
a valuable source for sharing important health-care evidence with the public. However, the coverage of health-care research in the media tends to overemphasize lower levels of evidence and report findings that are more favorable than the actual study results reveal (Yavchitz et al.,
2012). As indicated in Chapter 1, replication of research
findings is essential to establish a finding. However,
much media reporting emphasizes only initial findings,
and gives little follow-up as the evidence develops or matures over time.
EVIDENCE IN THE REAL WORLD
Clients’ Use of Evidence
Just as you have greater access than ever to health-care research, so too does your client. Don’t be surprised when
a client asks about a specific intervention or treatment based on a news story or Internet search. It is important
to listen to your client respectfully and with an open mind. Based on your own familiarity with the topic, you
may need to do your own search before offering an opinion. Once you have gathered information, engage in
a discussion on the topic. Educate your client as needed, taking care not to disregard or discount the client’s
opinions. This process of making collaborative decisions, known as shared decision-making, is explained in greater
detail in Chapter 11.
Scholarly Publications
Scholarly journals provide the highest level of rigor because the publication process includes protections to
enhance the credibility of published research. Scholarly
publications can come from professional organizations,
such as AOTA, which publishes the American Journal of
Occupational Therapy; the APTA, which publishes Physical Therapy; and the ASHA, which publishes the American
Journal of Speech-Language Pathology; interdisciplinary
organizations that focus on a particular aspect of health
care (e.g., the Psychiatric Rehabilitation Association,
which publishes the Psychiatric Rehabilitation Journal);
and publishers that specialize in medical research (e.g.,
Lippincott, Williams & Wilkins, which publishes Spine).
Scholarly journals often adopt guidelines, such as the
guidelines for scholarly medical journals proposed by
the International Committee of Medical Journal Editors
(ICMJE) (ICMJE, 2014) or Enhancing the Quality and
Transparency of Health Research (EQUATOR, n.d.), to
ensure high-quality reporting.
Impact Factor
One measure of the importance of a journal is its impact
factor. The impact factor is based on the number of
times articles in that particular journal are cited in other
articles (Garfield, 2006). For example, in 2012 the impact score for the highly respected Journal of the American
Medical Association (JAMA) was 29.9; the Archives of Physical Medicine and Rehabilitation had an impact score of 2.35;
Physical Therapy’s score was 2.77; the American Journal of
Occupational Therapy had a score of 1.47; and the Journal
of Speech, Language, and Hearing Research was 1.97 (Journal
Citation Reports, 2012). The score is calculated by dividing the number of citations a journal receives in a given year by the number of articles it published during the two previous years. For example, the Physical Therapy score indicates that the articles published during 2010 and 2011 were cited an average of 2.77 times in 2012. These numbers reflect only one aspect of
a journal’s quality and have been criticized for favoring
journals that include many self-citations, long articles,
and review articles, as well as journals that report on hot
topics in research (Seglen, 1997).
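To make the arithmetic concrete, here is a small worked example in Python. The counts are invented purely for illustration and are not the actual figures behind any journal's score.

    # Hypothetical counts for a journal's 2012 impact factor.
    citations_in_2012_to_2010_2011_articles = 500  # invented number
    articles_published_in_2010_and_2011 = 180      # invented number

    impact_factor_2012 = (citations_in_2012_to_2010_2011_articles
                          / articles_published_in_2010_and_2011)

    print(round(impact_factor_2012, 2))  # 2.78: each article was cited about 2.8 times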
The Peer-Review Process
The peer-review process is used by scholarly journals to ensure quality. It involves a critical appraisal of
the quality of the study by experts in the field. Peer-reviewed journals have strict guidelines for submission and review. The peer reviewers and journal editors determine whether the study meets the quality standards of the journal. In a research article, a scientific format of reporting is followed, with detailed methods and results.
References and citations that follow a specific style manual are required.
It is important to recognize that not all peer-reviewed
journals have the same mission and requirements for
rigor. The peer-review process provides a level of quality control, but should not be regarded as a guarantee
of credibility or accuracy. Some journals may include a
peer-review process, but set a low bar for acceptance.
Even journals with rigorous standards cannot guarantee
that the research reported in the article is completely
accurate. Errors may exist in the statistical analysis or
other sections of the study that go undetected. More
problematic is the possibility of fraudulent reporting,
in which the researcher intentionally misrepresents the
data. Failure to identify fraudulent reporting is perhaps
the greatest weakness of peer review. However, when
fraud is detected, it is the responsibility of the journal
to issue a retraction. One study found that the number
of retractions has increased sharply, and the majority of
retractions in medical and life sciences journals are due
to misconduct (Fang, Steen, & Casadevall, 2012). This
study found that more than 67% of retractions were due
to misconduct; fraud was the most common form of misconduct, followed by duplicate publication (described
later in this section) and then plagiarism. The number
of articles that have been retracted due to fraud has increased tenfold since 1975.
Research Funding Bias
The funding of research has the potential to bias the reporting of outcomes. Much of the published research in
health care is funded by either a public or a private organization. Disclosure of any sources of funding, and thus of potential conflicts of interest on the part of the authors, is required by most peer-reviewed journals. The largest public
funding source is the United States government, which
funds health-care research through organizations such
as the National Institutes of Health and the Centers for
Disease Control and Prevention (CDC). Private charitable organizations, such as the Bill and Melinda Gates
Foundation, and professional organizations, such as the
American Occupational Therapy Foundation, can also
serve as funding sources for research. Generally speaking,
these organizations do not have a financial interest in the
outcomes of a particular study and fund research solely to
advance the sciences.
In contrast, private research funding that is linked to
the product under study can present a conflict of interest.
This does not mean that all industry-sponsored research
is biased, but a healthy skepticism is advised. For example, a Cochrane review (Lundh, Sismondo, Lexchin,
Busuioc, & Bero, 2012) found that studies sponsored by
drug and medical device companies were more likely to
report favorable results than studies sponsored by other
sources. Weak study designs and misrepresentation of
the data do not appear to account for this difference, as
industry-sponsored research is often well designed and
reported. However, it has been suggested that industry-supported research is more likely to utilize the desired
product in the most ideal situation with a less favorable
comparison or control condition (e.g., a low dose of the
comparison drug; Bastian, 2006).
Most journals require authors to include a full disclosure of the author’s financial interests and relationship
with sponsors. Look for this information (often in the
acknowledgment section) when reading research studies and, when possible, compare the results of industry-sponsored research with research conducted with a
less-biased sponsor.
Even without funding, a researcher may be biased by
an intervention or assessment that he or she developed.
It is commonplace for initial studies to be carried out by
the developer of the intervention or assessment. Not only
will these researchers have a vested interest in positive
outcomes, but they are also more likely to have expertise in administering the approach that would not be expected of the typical clinician. It will likely be apparent
from the methods section or the reference list whether
the researcher is also the developer of the intervention
or assessment. Evidence for a particular approach is
strengthened when research findings are repeated by individuals who are not the developers.
Publication Bias

Journals are more likely to accept studies that indicate a
positive finding (e.g., in the case of an intervention study,
the intervention was effective). Simply put, positive findings are more interesting than negative findings. However, the result can be a publication bias toward positive
outcomes. Publication bias is the inclination for journals
to publish positive outcomes more frequently than negative outcomes. When collecting evidence on a single
topic, it is possible to conclude that an intervention is
more effective than it truly is because you do not have access to the studies that were not published due to negative
outcomes. This is particularly relevant when considering
compilations of studies such as systematic reviews. The
best reviews will make an effort to obtain unpublished
results by contacting researchers in the field or following
up on grants and study protocols. However, locating unpublished research is a challenging process that may still
not uncover all of the research conducted on a topic. Publication bias is discussed in greater detail in Chapter 10.
Duplicate Publication
A research article should represent original work, unless
it is identified as a republication. Duplicate publication of
the same work in multiple journals is viewed as an unethical practice. It is not unusual for researchers to publish
several articles from a single research study; however, each
article should include unique data. Some researchers will
“stretch” the data to create as many articles as possible. If
you come upon studies with the same or similar authors
and a similar number of participants and procedures, it is
important to examine the results to determine if the same
findings are reported in both articles, or if the authors are
providing unique information.
EXERCISE 2-4
Evaluating the Credibility
of the Evidence (LO4)
There is a great deal of controversy surrounding the
issue of childhood vaccinations and their potential
link to autism. Consider the research question, “Do
childhood vaccinations increase the risk for developing autism?” Evaluate the following websites and
their information about autism risk and vaccines.
Consider the source and purpose of the website.
Think about the evidence that is used to support
the information from each source.
• http://www.cdc.gov/vaccinesafety/Concerns/Autism/Index.html
• http://www.infowars.com/do-vaccines-cause-autism/
• http://autismsciencefoundation.org/what-is-autism/autism-and-vaccines/
• http://www.generationrescue.org/resources/vaccination/
After evaluating the websites, also consider the
following systematic review: Gerber, J. S., & Offit,
P.A. (2009). Vaccines and autism: A tale of shifting
hypotheses. Clinical Infectious Diseases, 48, 456-461.
QUESTIONS
After reading the information, what conclusion do you
draw, and why? What recommendation would you make to
a parent trying to decide whether or not to vaccinate a child?
READING A RESEARCH ARTICLE
Research articles follow a standard format that includes
the title, authorship, an abstract, introduction, methods,
results, discussion, references, and acknowledgment sections. When you are reading a study, it is helpful to know
what each section contains. Each section of the article is
described here, along with question prompts for evaluating
the general readability and meaningfulness of the article,
as well as key aspects of that section. Subsequent chapters
address how to critique the quality of the research itself.
Title
The title should reflect the content of the paper. A well-worded title will include key words that readers are likely
to use in a search.
Consider:
• Does the title accurately reflect the content of the
article?
Authorship
The authors of an article are listed in order of importance
in terms of their contribution to the work. The lead author
is the individual who contributed most to the writing of
the article. All listed authors of an article should make
a significant contribution to the writing of the article;
in fact, some journals require that the authors identify
their particular contribution. The author’s affiliation (i.e.,
where he or she works) will be listed, and typically there
4366_Ch02_021-038.indd 35
The introduction provides background information, explains why the study is important, and includes a literature
review of related articles. A good introduction provides a
strong justification of the need for the study. The introduction should include up-to-date references and seminal
or classic studies that may be older. The introduction typically ends with a purpose statement that explains the aims
of the study. This is followed by the research question or
hypothesis.
Consider:
• Is there enough information to understand why the
study is important?
• Are there adequate references to support the background?
• Will the research question identified in this study answer
your own evidence-based question?
Methods
The methods section describes the process the researcher(s) used when conducting the study. It should
provide enough detail to allow the study to be replicated.
The methods section is typically divided into subsections.
The first section describes the design of the study; for
example, the author may state that the study design was
a randomized controlled trial and go on to describe the
experimental and control conditions. Another subsection
describes the participants; typically this explains how the
participants were selected, the inclusion/exclusion criteria, and the human subjects consent process. Reporting
guidelines require authors to indicate the ethical board
that approved the research, such as the institutional
review board (IRB) for research involving human subjects and the institutional animal care and use
committee (IACUC) for animal research (see Box 2-3).
Some articles describe the demographic characteristics
of the participants in the methods section, whereas others
include this information in the results. The methods section also includes a subsection describing the measures
used in the study and any other instruments or materials.
If it is an intervention study, the intervention itself will be
described in the methods section. The methods section typically ends with an explanation of the data analysis procedures.
BOX 23 IRB and IACUC
Because the focus of this textbook is on evidencebased practice rather than the methods of conducting research, an extensive discussion of research
ethics is not practical. However, practitioners
should be aware of the importance of the institutional review board (IRB) for research involving
human subjects and the institutional animal care
and use committee (IACUC) for animal research.
Institutions such as universities, hospitals, and research centers that conduct health-care research
must have an IRB and IACUC in place to review
the research that takes place within their institution or is conducted by employees of the institution. These boards review the research before it
begins to ensure that the benefits of the research
outweigh the risks; that ethical principles are adhered to; in the case of human research, that participants are fully and adequately informed of the
research process before providing consent; and, in
the case of animal research, that the animals are
cared for and not subjected to unnecessary pain
or suffering. Most journals now require that the
researchers identify that an IRB or IACUC approved the project within the methods section of
the paper.
Although the IRB and IACUC provide safeguards, it is still possible for individual researchers to become involved in unethical practices,
such as coercing participants or misrepresenting the data. Institutions that receive federal
funding can be audited if unethical practices are
suspected; if misconduct is found, research at
that institution may be discontinued. In 2001,
the federal Office for Human Research Protections suspended human research at Johns Hopkins after a healthy volunteer died in an asthma
study: The young woman suffered lung damage
and eventually died after inhaling a drug that was
intended to induce an asthma attack (“Family of
fatality,” 2001).
Consider:
• Can you ascertain the design of the study?
• Are the methods for selecting participants and assigning them to groups specified?
• If groups are compared, is the control or comparison
condition sufficiently described?
• Are the measures described, including information
about their reliability and validity?
• If an intervention study, is the intervention described
in enough detail to allow you to determine its applicability in a typical practice setting?
• Are the specific statistical analyses identified?
Results
The results section of a journal article describes the findings of the study. If the participants are not described in
the methods section, the results section will begin with an
accounting and description of the participants. In an intervention study that involves two or more groups, the demographics and baseline scores of the outcome measures will
be compared across the groups to determine if there are
any systematic differences in the characteristics of subjects
in each of the groups. In an intervention study, the researcher would like for there to be no differences between
the groups in the demographics or baseline measures, because such differences introduce potential confounding factors into the study. Additional information regarding confounding factors is presented in greater detail in Chapter 5.
After the participants are described, the results of the
study as they relate to the research question/hypothesis
are presented. Additional findings that may not have been
addressed by the original question may also be presented.
The results section will describe the findings in statistical
terms and often includes tables and graphs. The results are
presented factually without speculation; the latter is generally reserved for the discussion section. The results section
can be intimidating for individuals who are unfamiliar with
research design and statistics, and this textbook is intended
to decrease the level of intimidation. Although scholarly
work should be presented in an objective manner, the interpretation of the results in the discussion section will
be influenced by the author’s point of view. Sophisticated
evidence-based practitioners can read the results section
and draw their own conclusions about the findings.
Consider:
• Are the results presented in such a way that the research question is answered and, if relevant, the hypothesis is identified as supported or not supported?
• Are tables and figures used to help elucidate the
results?
• Are the results presented in an objective manner without obvious bias?
Discussion
The discussion section summarizes and explains the findings. It should begin by restating the primary aims of the
study and answering the research question or describing
how the results relate to the original hypothesis. In this
section the author includes an interpretation of the results
and may speculate as to why a particular finding was obtained. However, a good discussion section does not go
beyond the data. For example, if the results of an intervention study suggest a trend toward beneficial outcomes
of a particular intervention, yet there is no statistically
significant difference between the groups, it would be
premature and inappropriate to conclude that the intervention is the preferred approach. In addition, the discussion should include citations from the existing literature
to support the interpretation. The discussion should specify whether the results of this study are consistent with
previous research. It should also describe the limitations
of the study and include implications for practice.
Consider:
• Does the discussion help you understand the findings?
• Does the discussion accurately reflect the results (e.g.,
it does not overstate the findings)?
• Is there an accounting of the limitations of the study?
• Are references provided to help substantiate the conclusions of the author(s)?
• Does the author explain how the results are meaningful
and/or applicable to practice?
References

The end of the article will display a reference list including all of the studies cited in the article. Different journals use different styles for references. Journals that use the American Psychological Association's style list the references in alphabetical order, whereas journals that follow the American Medical Association's style list references in the order in which they appear in the article. Although the same basic information is presented, the order and format of reference citations may vary considerably depending on the publication manual followed.

Consider:
• Are the dates of references recent?

Acknowledgments

The acknowledgments section recognizes individuals who contributed to the work of the research but are not authors of the paper. This section also identifies the funding source of the research.

Consider:
• Does the acknowledgment section provide enough information to determine possible conflicts of interest on the part of the researcher?

CRITICAL THINKING QUESTIONS

1. Why is a scientific database search more efficient than a Google search when trying to locate research evidence?
2. In what ways has technology made evidence-based practice more practical? How has it made evidence-based practice more challenging?
3. If an initial database search results in too many articles to manage, what strategies can you use to pare the number down to fewer articles that are more relevant to your topic?
4. How does the peer-review process enhance the credibility of the evidence, and what are its limitations?
5. Many novice readers of the evidence skip over the results section and go directly to the discussion of a research article. Why should evidence-based practitioners read the results section in addition to the discussion?
ANSWERS
EXERCISE 2-1
1. OTseeker, PEDro, Speechbite
2. Cochrane, PubMed, Psychlit
3. ERIC, PubMed
4. OT Search
5. SPORTDiscus, PubMed
EXERCISE 2-2
Every search will result in unique answers. The more
you practice, the more familiar you will become with the
databases and the most effective methods for locating
the evidence.
EXERCISE 2-3
The answers are dependent on your particular institution. Check your answers with those of your classmates
to determine if you agree.
EXERCISE 2-4
The websites supported by better known and more
respected organizations (e.g., the Centers for Disease Control and the Autism Science Foundation) also
provide links to scientific evidence supporting their
position that vaccinations do not cause autism. The
systematic review of multiple studies came to the same
conclusion.
REFERENCES
Bastian, H. (2006). They would say that, wouldn’t they? A reader’s guide
to author and sponsor biases in clinical research. Journal of the Royal
Society of Medicine, 99, 611–614.
Beall, J. (2014). Google Scholar is filled with junk science. Scholarly
Open Access. Retrieved from https://scholarlyoa.com/2014/11/04/
google-scholar-is-filled-with-junk-science/
Enhancing the Quality and Transparency of Health Research (EQUATOR). (n.d.). Home page. Retrieved from http://www.equator-network.org/
Family of fatality in study settles with Johns Hopkins. (2001, October 12). New York Times, p. A14. Retrieved from http://www.nytimes.com/2001/10/12/us/family-of-fatality-in-study-settles-with-johns-hopkins.html
Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for
the majority of retracted scientific publications. Proceedings of the National
Academy of Sciences of the United States of America, 109, 17028–17033.
Garfield, E. (2006). The history and meaning of the journal impact
factor. Journal of the American Medical Association, 295, 90–93.
International Committee of Medical Journal Editors (ICMJE). (2014).
Recommendations for the conduct, reporting, editing and publication of scholarly work in medical journals. Retrieved from http://www.icmje.org/icmje-recommendations.pdf
Journal Citation Reports. (2012). Home page. Retrieved from http://
thomsonreuters.com/journal-citation-reports/
Lundh, A., Sismondo, S., Lexchin, J., Busuioc, O. A., & Bero, L. (2012).
Industry sponsorship and research outcome. Cochrane Database of Systematic Reviews, 2012(12). doi:10.1002/14651858.MR000033.pub2.
Medline. (2012). MedlinePlus guide to healthy web surfing. Retrieved from
http://www.nlm.nih.gov/medlineplus/healthywebsurfing.html
National Institutes of Health. (n.d.). NIH public access policy. Retrieved
from http://publicaccess.nih.gov/policy.htm
Seglen, P. O. (1997). Why the impact factor of a journal should not be
used for evaluating research. British Medical Journal, 314, 497.
Yavchitz, A., Boutron, I., Bafeta, A., Marroun, I., Charles, P., Mantz, J., &
Ravaud, P. (2012). Misrepresentation of randomized controlled trials
in press releases and news coverage: A cohort study. PLoS Medicine,
9(9), 1–10.
3
“The plural of anecdote is not data.”
—Roger Brinner, economist
Research Methods
and Variables
Creating a Foundation
for Evaluating Research
CHAPTER OUTLINE
LEARNING OUTCOMES
HYPOTHESIS TESTING: TYPE I AND TYPE II ERRORS
KEY TERMS
VARIABLES
INTRODUCTION
Independent Variables
TYPES OF RESEARCH
Dependent Variables
Experimental Research
Control Variables
Nonexperimental Research
Extraneous Variables
Quantitative Research
CRITICAL THINKING QUESTIONS
Qualitative Research
ANSWERS
Cross-Sectional and Longitudinal Research
REFERENCES
Basic and Applied Research
LEARNING OUTCOMES
1. Categorize the major types of research.
2. Determine the most appropriate type of research method to answer a given research question.
3. Given a research abstract or study, classify the variables.
KEY TERMS

applied research
basic research
categorical variable
continuous variable
control group
control variable
correlational studies
cross-sectional research
dependent variable
directional hypothesis
efficacy studies
experimental research
extraneous variable
factorial design
hypothesis
independent variable
intervention studies
longitudinal research
nondirectional hypothesis
nonexperimental research
nonrandomized controlled trial
observational studies
pre-experimental research
qualitative research
quantitative research
quasi-experimental study
randomized controlled trial
research design
third variable problem
translational research
true experiment
Type I error
Type II error
variable

INTRODUCTION

The language of research can be intimidating to new evidence-based practitioners. There are many ways in which research can be categorized, and different people often use different terminology. In addition, there are many variations within each type of research. These variations within a research type may be considered a research design. A research design is the more specific plan for how a study is organized. For example, the terms true experiment and randomized controlled trial both refer to the research design, but that may not be evident from the terminology.

A randomized controlled trial is a research design that compares at least two groups, with participants randomly assigned to a group. However, there are numerous variations within this design. In one randomized controlled trial, there may be a comparison of an intervention group with a control group that does not receive treatment; another may compare two different intervention approaches, and still another may include three groups with a comparison of two interventions with a control group that does not receive treatment. Or perhaps it has been determined that it is unethical to withhold treatment, so the control group receives “treatment as usual,” which can mean many different things.

This chapter provides an overview of the major types of research so that you can begin to make these distinctions. It also introduces you to some important research concepts, including hypothesis testing and variables. As you progress through your research course, you will be able to distinguish several research designs commonly used in rehabilitation research.

TYPES OF RESEARCH

In Chapter 1 you learned that there are different types of research questions, and that different types of research methods are used to answer different types of questions. As an evidence-based practitioner, it is important to know what type of research will best address a particular practice question. Different types of research include:

• Experimental
• Nonexperimental
• Quantitative
• Qualitative
• Cross-sectional
• Longitudinal
• Basic
• Applied

Experimental Research
The terms research and experiment are often used interchangeably, yet experimental research is just one type
of research—a very important one for evidence-based
practitioners. Experimental research examines cause-and-effect relationships. In evidence-based practice, clinicians
want to know whether or not an intervention resulted in
a positive outcome for the client. These studies are also
described as efficacy studies or intervention studies; in
other words, they answer the research question, “Was the
intervention effective?”
For a causal relationship to be inferred, a specific
methodology must be followed. In a typical experiment,
participants are assigned to one of two groups, and the
groups are manipulated. One group receives the intervention of interest. The second group is the control group;
participants in the control group may or may not receive
the intervention. Control is the operative word here. By
controlling for alternate explanations, the researcher can
infer that differences between the intervention and control
group are due to (or caused by) the intervention. The
strengths and weaknesses of different control conditions
are discussed in Chapters 4 and 5.
Experimental studies may also be referred to as difference studies or group comparison studies because the
researcher seeks to determine if there is a difference
between two or more groups (e.g., an efficacy study
determines whether there is a difference between the
group that received the intervention of interest and the
group that did not).
From the Evidence 3-1 is an example of experimental research answering an efficacy question. In this study,
an intervention group receiving constraint-induced
movement therapy (CIMT) is compared with a control
group receiving traditional rehabilitation. The study is
designed to examine the efficacy of CIMT for improving
reaching, grasping, and functional movement. The researchers conclude that CIMT caused the improvements
that were discovered. At first glance, this abstract from
PubMed may be intimidating, but with a little practice
you can quickly recognize an experimental study when
you see one.
There are several reasons why the abstract in From the
Evidence 3-1 suggests an experimental design:
• It is an intervention study examining efficacy.
• A comparison is being made between groups.
• The purpose of the study is to determine if an intervention caused an improvement.
It is always important to read research with a critical
eye, which includes being careful about accepting the author’s language. In this case, because the study used an
experimental design, the author is indeed accurate in inferring that the intervention caused the improvement.
The CIMT study is not identified as an experimental
study, but as a randomized controlled trial. A randomized
controlled trial is the same as a true experiment. Like
a randomized controlled trial, with a true experiment at
least two groups are compared, the two groups are manipulated, and participants are randomly assigned to a group.
In experimental research there are true experiments and
quasi-experiments. Quasi-experimental studies are also
designed to answer cause-and-effect questions. The major
difference is that in a true experiment, participants are randomly assigned to groups, whereas in a quasi-experimental
study participants are not randomly assigned. The lack of
random assignment results in a decrease in the researcher’s
ability to draw cause-and-effect conclusions. The limitations of quasi-experimental studies are discussed in greater
detail in Chapters 5 and 6.
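Random assignment is easy to picture in code. The short Python sketch below is purely illustrative (the participant labels, group sizes, and function name are invented for this example): chance alone, rather than the researcher, the clinician, or the participants, decides who ends up in each group.

```python
import random

def randomize(participants, seed=None):
    """Randomly split a list of participants into intervention and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the original list is left untouched
    rng.shuffle(shuffled)           # chance alone determines group membership
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}

# Hypothetical example: 32 participants split into two groups of 16
groups = randomize([f"participant_{i}" for i in range(1, 33)], seed=42)
print(len(groups["intervention"]), len(groups["control"]))   # 16 16
```

In a quasi-experimental study, by contrast, no step like this occurs; group membership is determined by pre-existing circumstances such as classroom or clinic.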
Pragmatic and ethical issues may arise in the decision
to use a quasi-experimental study. For example, studies of
school-aged children frequently compare two classrooms.
Perhaps a therapist is interested in studying the efficacy
of a new handwriting program. Because it is difficult to
randomly assign children in the same classroom to two
different approaches, one classroom receives the new
handwriting program, and the second classroom acts as a
control group and receives traditional instruction.
In health-care research, the term nonrandomized controlled trial is often used; it can be equated with a quasi-experiment. In this research design, two groups are compared, but the participants are not randomly assigned to
groups. Often the groups are pre-existing (e.g., clients in
two different clinics, classrooms, or housing units). In other
nonrandomized controlled trials, groups are comprised of
individuals who volunteer to receive the new intervention;
they are then compared with individuals who receive an
existing intervention. With all other factors being equal, a
randomized controlled trial (as compared with a nonrandomized trial) provides stronger evidence that the intervention caused the outcome. This means that, although
both types of experiments compare groups to determine
cause-and-effect relationships, you can have greater confidence in the results of a randomized controlled trial.
The primary limitation of a quasi-experimental study
is that the lack of randomization may bias the selection
process, such that there is a difference between the two
groups at the outset. Using the two-classroom example,
one teacher may be more effective and/or more experienced than the other teacher, so the outcomes may reflect the teachers’ abilities rather than the intervention.
In other quasi-experimental studies, volunteers for an intervention are identified first, and a control group is identified later. Individuals who volunteer for an intervention
are likely to be different from individuals who volunteer
to participate as controls. For example, the intervention
volunteers may be more motivated to change, have more
self-efficacy or belief that they can change, or generally
show more initiative toward following through with an
extensive intervention process. These personal characteristics may contribute to improvements outside of the
effects of the intervention.
When a single group is compared before and after an
intervention, the study is referred to as pre-experimental
research; in health-care research, this is typically referred
to as a pretest-posttest design. In this case, there is only
one group, nothing is manipulated (i.e., all participants
receive the same intervention), and there is no random
assignment. Therefore, it stands to reason that this design is much less powerful than a design with a control or
comparison group. The different types of experimental
designs are discussed in greater detail in Chapters 5 and 6.
In general, experimental research is intended to answer
practice questions that involve causation. It is one of the
most important types of research in evidence-based practice because it results in evidence-based decisions about
the efficacy of interventions.
FROM THE EVIDENCE 3-1
An Experimental Study That Answers an Efficacy Question
Lin, K.C., Wu, C.Y., Wei, T.H., Gung, C., Lee, C.Y., & Liu, J.S. (2007). Effects of modified constraint-induced movement therapy on reach-to-grasp movements and functional performance after chronic stroke: A randomized controlled study. Clinical Rehabilitation, 21(12), 1075–1086. doi:10.1177/0269215507079843.
Note A: Hint: “Effects of”
suggests an efficacy study.
Note B: There are two groups,
indicating that a comparison is being made.
At first glance, this abstract from PubMed can be intimidating, but with a little practice you can quickly recognize an experimental
study when you see one.
Clin Rehabil. 2007 Dec;21(12):1075-86.
Effects of modified constraint-induced movement therapy on reach-to-grasp movements and functional performance
after chronic stroke: a randomized controlled study.
Lin KC, Wu CY, Wei TH, Lee CY, Liu JS.
Source: School of Occupational Therapy, College of Medicine, National Taiwan University and Department of Physical Medicine
and Rehabilitation, National Taiwan University Hospital, Taipei, Taiwan.
Abstract
OBJECTIVE: To evaluate changes in (1) motor control characteristics of the hemiparetic hand during the performance of a
functional reach-to-grasp task and (2) functional performance of daily activities in patients with stroke treated with modified
constraint-induced movement therapy.
DESIGN: Two-group randomized controlled trial with pretreatment and posttreatment measures.
SETTING: Rehabilitation clinics.
SUBJECTS: Thirty-two chronic stroke patients (21 men, 11 women; mean age = 57.9 years,
range = 43-81 years) 13-26 months (mean 16.3 months) after onset of a first-ever cerebrovascular accident.
INTERVENTION: Thirty-two patients were randomized to receive modified constraint-induced movement therapy (restraint of
the unaffected limb combined with intensive training of the affected limb) or traditional rehabilitation for three weeks.
MAIN MEASURES: Kinematic analysis was used to assess motor control characteristics as patients reached to grasp a
beverage can. Functional outcomes were evaluated using the Motor Activity Log and Functional Independence Measure.
RESULTS: There were moderate and significant effects of modified constraint-induced movement therapy on some aspects of
motor control of reach-to-grasp and on functional ability. The modified constraint-induced movement therapy group preplanned
reaching and grasping (P = 0.018) more efficiently and depended more on the feedforward control of reaching (P = 0.046) than
did the traditional rehabilitation group. The modified constraint-induced movement therapy group also showed significantly
improved functional performance on the Motor Activity Log (P < 0.0001) and the Functional Independence Measure (P = 0.016).
CONCLUSIONS: In addition to improving functional use of the affected arm and daily functioning, modified constraint-induced
movement therapy improved motor control strategy during goal-directed reaching, a possible mechanism for the improved
movement performance of stroke patients undergoing this therapy.
Note C: The authors conclude that the constraint-induced
therapy resulted in (caused) the improvements.
FTE 3-1 Question This study is described as a group comparison between the CIMT group and the traditional rehabilitation group. What specifically about these two groups is being compared?
Nonexperimental Research
In contrast to experimental research, nonexperimental research cannot determine causal relationships; nevertheless, this type of research is essential for
evidence-based practice. Not all research questions are
causal in nature; nonexperimental research can answer
descriptive and relationship questions. Many different
methods can be used to collect data and information
in nonexperimental research. Common approaches
include surveys, observation of behavior, standardized
measures, and existing data from medical records.
Group comparison studies may be conducted to answer
descriptive questions when comparing existing groups.
For example, in health-care research, researchers are
often interested in the differences between people with
and without a particular condition. When making these
comparisons, it is not possible to arbitrarily or randomly
assign individuals to groups. For example, when comparing people with and without schizophrenia, the researcher
will recruit individuals with schizophrenia and compare
the schizophrenia group with a similar group of individuals without schizophrenia. The research may elucidate
how these groups differ in terms of particular cognitive
abilities, employment rates, and quality of life. In addition, there is no manipulation. For this reason, these studies are called observational studies. In an observational
study, the naturally occurring circumstances are studied,
as opposed to assigning individuals to an intervention or
research condition.
From the Evidence 3-2 provides an example of a group
comparison study that is intended to answer a descriptive question. No intervention is included in this study;
instead, the researchers compare people with and without Parkinson’s disease. More specifically, they examine
whether there is a difference between the two groups in
gait when asked to walk while performing a cognitive task.
Nonexperimental studies can describe people with a
particular diagnosis, identify the incidence of a condition,
or predict an outcome. For example, in a nonexperimental study of youth with autism, 55% were employed, and
fewer than 35% had attended college six years after graduating from high school (Shattuck et al, 2012). This study
provides helpful information to the practitioner because
it identifies significant areas of need that can be addressed
by health-care providers. In another example, Metta et al
(2011) found that individuals with Parkinson’s disease
were more likely to experience fatigue in the later stages
of the disorder, or when they were also experiencing
depression, anxiety, or difficulty sleeping. In this study,
the cause of the fatigue is unclear, but health-care practitioners can use the information to predict when fatigue
may need to be addressed.
Dalvand et al (2012) found a strong relationship between gross motor function and intellectual function in
children with cerebral palsy. It is improbable that the
gross motor impairments cause the intellectual impairments, and more likely that the causal factor affecting
both intellect and motor skills is related to brain functioning. Yet knowing that this relationship exists has
implications for understanding a client who has cerebral
palsy and developing appropriate intervention plans.
Some nonexperimental studies examine relationships.
These studies, which are also known as correlational
studies, seek to determine whether a relationship exists
between two constructs and, if so, assess the strength
of that relationship. For example, rehabilitation studies
often examine the relationship between a particular impairment, such as cognition or muscle strength, and activities of daily living. In correlational studies, the existence
of the third variable problem always presents a potential alternative explanation. In other words, the two constructs may indeed be related, but a third variable could account for that
relationship or influence the relationship.
The relationship between cold weather and flu season
is a case in point. It is often stated that getting a chill or
spending too much time outdoors in cold weather will
cause the flu. However, scientifically it is known that viruses are the culprit. Why the connection? The third or
fourth variable probably lies in one’s behavior during the
winter; that is, instead of being outside, it is the extra time
spent inside around and exposed to other sick people that
results in catching the flu.
Figure 3-1 illustrates the third variable problem in the
context of developing the flu. Other well-known relationships that are probably more complex than a simple oneto-one cause and effect include video games and violent
behavior, and lower weight and eating breakfast. Still, in
the media, everyday conversation, and even in the scientific literature, correlations are sometimes presented in
terms of causation. As an evidence-based practitioner, you
know that correlation does not mean causation.
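A small simulation can make the third variable problem concrete. The Python sketch below is an illustration with fabricated numbers (and it assumes Python 3.10 or later for statistics.correlation): a shared "winter" variable drives both cold exposure and flu risk, so the two end up correlated even though neither causes the other in the simulation.

```python
import random
import statistics

random.seed(1)
n = 1000

# The third variable: how wintery conditions are for each person
winter = [random.gauss(0, 1) for _ in range(n)]

# Cold exposure rises with winter conditions
cold_exposure = [w + random.gauss(0, 1) for w in winter]

# Flu risk also rises with winter conditions (more time indoors around sick
# people), but it is generated without any reference to cold_exposure
flu_risk = [w + random.gauss(0, 1) for w in winter]

print(round(statistics.correlation(cold_exposure, flu_risk), 2))
# Prints a clearly positive correlation (about 0.5) produced entirely by the
# shared third variable
```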
In nonexperimental research, one cannot draw conclusions related to causation; however, these studies answer
questions that cannot be answered with experimental research. For example, seeking information about incidence
or prevalence, identifying coexisting conditions, and predicting outcomes are practice issues for which nonexperimental research can provide the answers. Table 3-1
outlines the differences between experimental and nonexperimental research and some of the more specific designs
within these types. (See Chapter 8 for more details about
nonexperimental research.)
Quantitative Research
Research that uses statistics and describes outcomes in
terms of numbers is quantitative research; in fact, most
research is quantitative in nature. Quantitative research is
centered on testing a hypothesis. The researcher develops a hypothesis based on prior knowledge that informs
an idea or question to be answered. Consequently, the hypothesis is related to the research question, and the study is designed to test the hypothesis; the data collected will either support or fail to support the hypothesis.
FROM THE EVIDENCE 3-2
A Nonexperimental Group Comparison Study That Answers a Descriptive Question
Plotnik, M., Giladi, N., & Hausdorff, J. M. (2009). Bilateral coordination of gait and Parkinson's disease: The effects of dual tasking. Journal of Neurology, Neurosurgery & Psychiatry, 80(3), 347–350. doi:10.1136/jnnp.2008.157362.
Note A: The question is not about
the effectiveness of an intervention,
but is descriptive in nature, asking
if gait is affected in people with
Parkinson’s disease when they
are engaged in a cognitive task.
Note B: Individuals with and
without Parkinson’s disease are
compared. Because of the condition,
participants cannot be randomly
assigned to groups.
The etiology of gait disturbances in Parkinson's disease (PD) is not fully understood.
Recently, it was shown that in patients with PD, bilateral coordination of gait is
impaired and that walking while being simultaneously engaged in a cognitive task is
detrimental to their gait. To assess whether cognitive function influences the bilateral
coordination of gait in PD, this study quantified left-right stepping coordination using
a phase coordination index (PCI) that evaluates both the variability and inaccuracy
of the left-right stepping phase (phi) generation (where the ideal phi value between
left and right stepping is 180 degrees).
This report calculated PCI values from data obtained from force sensitive insoles
embedded in subjects' shoes during 2 min of walking in a group of patients with PD
(n = 21) and in an age matched control group (n = 13).
All subjects walked under two walking conditions: usual walking and dual tasking
(DT) (ie, cognitive loading) condition. For patients with PD, PCI values were
significantly higher (ie, poorer coordination) during the DT walking condition
compared with usual walking (p < 0.001). In contrast, DT did not significantly affect
the PCI of the healthy controls (p = 0.29). PCI changes caused by DT were
significantly correlated with changes in gait variability but not with changes in gait
asymmetry that resulted from the DT condition. These changes were also associated with performance on a test of executive function. The present findings suggest
that in patients with PD, cognitive resources are used in order to maintain consistent
and accurate alternations in left-right stepping.
Note C: The findings suggest
that Parkinson’s disease requires
individuals to use cognitive
resources when walking and
that the person is compromised
when these cognitive resources
are overloaded.
FTE 3-2 Question Why must the researchers avoid drawing cause-and-effect conclusions from the results of a
study comparing individuals with and without Parkinson’s disease?
FIGURE 3-1 Cold and flu: An example of why practitioners need to be careful when interpreting relationships. The diagram notes that people are more likely to develop the flu during colder months and contrasts the apparent direct path (exposure to cold → develop the flu) with the intervening variables (exposure to cold → spend more time inside → infected by others inside who have the flu → develop the flu).
TABLE 31 Experimental and Nonexperimental Research
Type of Research
Type of
Question
Designs Within
a Type
Experimental
Efficacy
Nonexperimental
Descriptive
Relationship
4366_Ch03_039-058.indd 45
Features of Design
Other Terms
Randomized
controlled trial
At least two groups
Participants randomly
assigned
Groups are manipulated
True experiment
Nonrandomized
controlled trial
At least two groups
Participants are not randomly
assigned
Groups are manipulated
Quasi-experimental
study
Pretest/posttest
One group
All participants receive
the same intervention or
condition
Pre-experiment
Group
comparison
Two or more existing groups
Groups compared to identify
differences on one or more
characteristics
Observational
Incidence/
prevalence
The occurrence of one or
more characteristics is
calculated
Observational
Correlation
The relationship between
two or more constructs is
calculated
Observational
Predictive
Multiple predictors are considered in terms of their impact
on a particular outcome
Observational
Starting with a hypothesis strengthens the research design
and is much more effective than collecting data on a topic
and hoping to find something interesting in the data. It
is important to emphasize that the results of a study can
only support a hypothesis; the hypothesis is never proven.
With replication and similar results, our confidence that
a hypothesis is correct is bolstered. Similarly, a hypothesis
cannot be disproven, but it may be discarded for lack of
evidence to support it.
Just as with research types, hypotheses are described
using various classifications, such as directionality or
support. A hypothesis can be either directional or nondirectional. A directional hypothesis indicates that the
researcher has an assumption or belief in a particular outcome. A nondirectional hypothesis is exploratory, suggesting that the researcher does not have a prior notion
about what the study results may be, but may assume that
a difference or relationship exists. For example, in a correlational study a researcher may set out to examine the
relationship between being bilingual and particular cognitive abilities such as memory, problem-solving, cognitive
flexibility, and divided attention. The researcher may have
a directional hypothesis that bilingualism is associated
with greater cognitive flexibility and divided attention. In
contrast, the researcher may go into the study expecting
a relationship, but not speculating about the particular
associations (nondirectional hypothesis).
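In statistical terms, a directional hypothesis is usually paired with a one-sided test and a nondirectional hypothesis with a two-sided test. The Python sketch below is a minimal illustration of that pairing using the bilingualism example; the scores are fabricated, and it assumes the SciPy library is available.

```python
from scipy import stats

# Hypothetical cognitive flexibility scores
bilingual = [14, 17, 15, 18, 16, 19, 15, 17]
monolingual = [13, 15, 14, 16, 14, 15, 13, 16]

# Nondirectional hypothesis: the two groups differ (two-sided test)
t_stat, p_two_sided = stats.ttest_ind(bilingual, monolingual)

# Directional hypothesis: bilingual scores are higher (one-sided test)
t_stat, p_one_sided = stats.ttest_ind(bilingual, monolingual, alternative="greater")

print(f"two-sided p = {p_two_sided:.3f}, one-sided p = {p_one_sided:.3f}")
# When the observed difference is in the hypothesized direction, the one-sided
# p value is half of the two-sided value
```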
Generally speaking, it is preferable from a scientific
method perspective to begin a study with a directional hypothesis. A strong hypothesis is one that is clear and testable. When the researcher has an expected outcome, the
study can be better designed to collect data that will test
the hypothesis. A researcher who goes into a study blind
may be described as going on a “fishing expedition”; he or
she is likely to find something, but that something may be
difficult to explain. In addition, findings gathered from a
"shotgun approach," one in which the researcher hunts for interesting findings among the data, are more likely
to represent a chance result that is difficult to replicate.
In a hypothesis-driven study, statistical analysis is used
to determine whether or not the hypothesis is supported.
However, not everything can be quantified, and numbers
tend to obscure the uniqueness and individuality of the
participants. A drawback of quantitative research is the
loss of the individual when data are combined in means,
standard deviations, and other statistics.
Qualitative Research
Qualitative research is an important type of research
for answering questions about meaning and experience.
Qualitative research provides a more personal and in-depth perspective of the person or situation being studied
than quantitative research. In that way, qualitative research
is able to provide information that quantitative research
cannot. Although qualitative research is nonexperimental
research, it is included as a separate type in this text because it is so different. In fact, qualitative research operates
from a unique and different paradigm or way of thinking.
Qualitative research utilizes inductive reasoning. Instead of beginning with a hypothesis and working down
to determine if the evidence supports the hypothesis (deductive reasoning), qualitative research moves from the
specific to the general (inductive reasoning). Very specific
information is collected from interviews, observations,
and the examination of documents or other artifacts, and
the qualitative researcher looks for patterns in the data.
These patterns are identified as themes and, in some qualitative research, a theory is developed out of the patterns
that have been discovered.
In qualitative research, the focus is on the experience of
the individual and, most importantly, from the perspective
of the individual. Qualitative research looks for meaning
in individual stories and experiences. Extensive data are
collected on a few individuals (sometimes only one). In
qualitative research, it is important to gather extensive
information so that the researcher has a thorough understanding of the phenomenon in question. There is an emphasis on discovery as opposed to confirmation. Once the
data are collected, the analysis, which may include photographs or diagrams, identifies recurring themes within
the data. Although the research methods vary, qualitative
research articles utilize the same format as quantitative
research, with introduction, methods, results, and discussion sections. Table 3-2 outlines the differences between
quantitative and qualitative research.
Qualitative research is a broad term that encompasses
several different designs, such as ethnography, grounded
theory, phenomenology, and participatory action research. These topics are described in greater detail in
Chapter 9. An example of a qualitative study is provided
in From the Evidence 3-3. This study looks at the
meaning of multiple sclerosis from the perspective of the
individual with the condition. In-depth information is
collected on a relatively small number of individuals, and
the results are presented as themes instead of being summarized in a numerical format. This abstract of a qualitative study, particularly the results section, looks very
different from most quantitative research. Some of the
unique features of qualitative research are identified in
this study, which delves into the lived experience of people with multiple sclerosis. Although qualitative research
is not intended to be generalizable to the larger population, this form of evidence can inform health-care practice. For example, after reading this study, practitioners
may become more sensitive to how the illness requires a
reformulation of self-concept.
It is important to distinguish qualitative research from
descriptive survey research, which is quantitative and
nonexperimental. A survey could be developed to answer
questions about the emotional, social, and practical impact
of multiple sclerosis.
TABLE 32 Differences Between Quantitative and Qualitative Research
Category
Quantitative Research
Qualitative Research
Purpose
Tests theory and/or hypotheses; focus is
on confirmation
Builds theory and/or explores
phenomenon; focus is on discovery
Point of view
Outsider
Objective
Insider
Subjective
Reasoning
Deductive
Inductive
Data collection
Use of quantifiable, typically standardized measures with many participants
Interviews and observations of a few
individuals in their natural environments
Data analysis
Descriptive and inferential statistics
Identification of themes using text or
pictures
Evaluating the rigor
of the research
Reliability and validity: Are the data
accurate and consistent?
Trustworthiness: Are the data believable?
However, if numbers are used and combined among the participants (e.g., the percentage of
respondents who report feeling depressed), the study becomes quantitative research. In qualitative research, openended questions are used, and participants are typically
interviewed multiple times. Many other methods, such as
focus groups, observations, and reviews of documents and
artifacts, are incorporated in qualitative research to ensure
that a deep understanding is reached. The results of qualitative research are presented narratively and can take the
form of stories and/or themes. Sometimes researchers use
a mixed-methods approach and combine qualitative and
quantitative methods.
Cross-Sectional and Longitudinal
Research
Research can also be categorized by the time period over
which data are collected. In cross-sectional research, data
are collected at a single point in time, whereas longitudinal research requires that data be collected over at least
two time points and typically covers an extended period
of time, such as several years or decades. Cross-sectional
studies use nonexperimental methods; they are observational in nature, meaning that the researcher does not
manipulate a situation (e.g., provide an intervention).
Descriptive and correlational studies frequently use
cross-sectional research. Longitudinal research is intended to examine the effect of time (such as development, aging, or recovery) on some phenomenon (such as
cognition, independent living, or language).
Most longitudinal studies examine naturalistic changes,
making them observational. However, occasionally intervention studies examine the impact of an intervention
over an extended time period. Although there is no time
frame criterion for determining whether a study is longitudinal, a simple pretest-posttest intervention study is
not considered longitudinal. Therefore, most intervention studies are neither cross-sectional nor longitudinal.
Cross-sectional studies often compare different groups of
individuals at the same point in time, whereas longitudinal studies compare the same people over several time
points.
One advantage of cross-sectional studies is that they
are efficient because all of the data are collected at once,
or at least over a short period of time. The results of cross-sectional studies are available quickly. Because longitudinal research examines changes over time, the studies are
more time consuming, and it is more likely that researchers will lose participants, which compromises follow-up.
The results of longitudinal studies come more slowly
than those of cross-sectional studies. Imagine studying
the same group of individuals for 30 years. Although the
results will be extremely valuable, such research requires
a lot of patience from researchers and participants.
FROM THE EVIDENCE 3-3
A Qualitative Study Examining the Lived Experience of Multiple Sclerosis
Mozo-Dutton, L., Simpson, J., & Boot, J. (2012). MS and me: Exploring the impact of multiple sclerosis on perceptions of self. Disability
and Rehabilitation, 34(14), 1208–1217. [Epub December 13, 2011]. doi:10.3109/09638288.2011.638032.
Note A: Twelve participants
are included in the study,
which is actually a fairly large
number for qualitative research.
Note B: The phenomenon
of self-perception is
analyzed thematically
with words.
Note C: The researchers
are “exploring” rather than
determining, confirming,
proving, etc.
Abstract
Purpose: The aim of this qualitative study was to explore the impact of multiple
sclerosis (MS) on perceptions of self as well as the emotional, social, and practical
implications of any self-reported changes.
Method: Twelve participants were interviewed and interpretative phenomenological
analysis used to analyze the data. Participants were recruited from a MS hospital
clinic in the north-west of England.
Results: Four themes were identified although for reasons of space and novelty
three were discussed, (i) 'my body didn't belong to me': the changing relationship to
body, (ii) 'I miss the way I feel about myself': the changing relationship to self, and
(iii) 'let's just try and live with it': incorporating yet separating MS from self.
Conclusions: The onset of MS was seen to impact upon self yet impact did not
necessarily equate with a loss of self but rather a changed self. Self-related
changes did, however, carry the potential to impact negatively upon a person's
mood and psychological functioning and consequently, clinicians are encouraged
to consider issues relating to self as standard.
FTE 3-3 Question Compare the abstract of the qualitative study to the abstracts in FTE 3-1 and FTE 3-2. Beyond the difference in reporting thematically as opposed to numerically, what other differences do you notice in this
qualitative study?
When cross-sectional and longitudinal methods are used to answer the same questions about the impact of time on some phenomenon, longitudinal findings generally have greater credibility. In a cross-sectional study, individuals of different ages, developmental stages, or points in recovery are compared, whereas in a longitudinal study the same individuals are followed as they get older, develop, or recover, making it possible to determine if people actually change over time.
Consider a study that examines cooking practices in the
home. A cross-sectional study may find that older adults
prepare more food at home than younger adults. However,
a longitudinal study that has followed individuals over 30
years finds that older adults actually cook at home less than
they did when they were younger and had children at home.
A longitudinal study is better at providing evidence as to
how people change over time, whereas a cross-sectional
study identifies differences in people at one point in time.
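The cooking example can be mimicked in a few lines of Python; all of the numbers below are fabricated for illustration. The cross-sectional comparison and the longitudinal comparison of the very same people point in opposite directions.

```python
import random
import statistics

random.seed(3)

# Hypothetical weekly home-cooked meals
older_adults_now = [random.gauss(10, 1) for _ in range(50)]    # older adults today
younger_adults_now = [random.gauss(6, 1) for _ in range(50)]   # younger adults today
same_people_30_years_ago = [x + random.uniform(1, 3) for x in older_adults_now]

# Cross-sectional view (one point in time): older adults appear to cook more
print(round(statistics.mean(older_adults_now) - statistics.mean(younger_adults_now), 1))

# Longitudinal view (the same people over time): they actually cook less than
# they did 30 years earlier, when they had children at home
print(round(statistics.mean(older_adults_now) - statistics.mean(same_people_30_years_ago), 1))
```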
EVIDENCE IN THE REAL WORLD
Cross-Sectional and Longitudinal Research in Older Drivers
In one example, a cross-sectional study suggested that older drivers (above age 65) are more likely to make safety errors than younger drivers (ages 40 to 64) (Askan et al, 2013). However, the cause-and-effect relationship was ambiguous. Does age alone affect driving, or do the conditions associated with aging impact driving? Longitudinal studies can provide more information about the cause of the difficulty. Two studies, one that examined driver safety over 2 years (Askan et al, 2012) and one that looked at driving cessation over 10 years (Edwards, Bart, O'Connor, & Cissell, 2010), followed a group of older adults over time. Both studies concluded that diminished cognitive functioning, particularly speed of processing and visual attention, was associated with driving problems. Taken together, these studies suggest that it is important to assess cognitive function in older adults when making decisions about driving, rather than relying on age as a determinant of fitness to drive. Gaining this information was possible only because the same individuals were followed over time.

Basic and Applied Research
The concepts of basic and applied research are better conceptualized as a continuum rather than absolute and discrete categories. What appears to be basic research to
a clinician may be categorized as applied research to a scientist working in a laboratory. Generally speaking, basic
research is used to investigate fundamental questions
that are directed at better understanding individual concepts. Using brain imaging techniques to determine the
purposes of different brain regions, identifying environmental conditions that contribute to a stress reaction, and
examining the role of proteins in antibody responses are
examples of research on the basic end of the continuum.
Applied research, in contrast, has direct application to
health-care practices. Studies that determine the efficacy
of a fall prevention program, describe the prevalence of
foot drop in multiple sclerosis, and ascertain the strength
of the relationship between diet and ADHD symptoms
would be considered examples of research on the applied
end of the continuum.
One might ask, “If applied research has a clear application and basic research does not, then why are clinicians
interested in basic research at all?” One reason is because
we never know what the implications of basic research
might be. Basic research may lead to unintended discoveries. However, even scientists conducting basic research
who are far removed from clinical practice typically have
some real-world application in mind. The information
obtained from a basic scientist who studies cellular differences in aging brains can lead to both an understanding
of and treatment for Alzheimer’s disease. However, additional research will have to take place to make the link.
In contrast, an applied researcher who studies caregiver
training for people with dementia will be able to make
direct recommendations for practice.
Both basic and applied research are important endeavors and contribute to rehabilitation practice. When the
two come together, the result is translational research.
Translational research takes place when findings from the
laboratory are used to generate clinical research. The newest institute of the National Institutes of Health (NIH) is
the National Center for Advancing Translational Sciences,
which has a mission to promote more translational research
(National Institutes of Health, n.d.). Consider the example of neuroplasticity and CIMT, as illustrated in the continuum shown in Figure 3-2. The early researchers, who
used basic cell-based experiments, likely did not predict
that CIMT would be an application, yet the applied research was based on earlier cellular and then animal studies (Gleese & Cole, 1949). Basic animal research indicates
that the brain can adapt to injury, and that injury can lead
to nonuse (Taub, 1980). Early developers of CIMT theorized that, by forcing individuals to use the affected limb,
they would prevent learned non-use from occurring and
promote brain reorganization (Buonomano & Merzenich,
1998). Research on people with stroke provides evidence in
support of these theories. One applied study indicates that
CIMT is an effective intervention for improving function
of the affected limb (Wolf et al, 2006), and another applied
study using brain imaging indicates changes in the brain
after CIMT intervention (Laible et al, 2012).
EXERCISE 3-1
Identifying Types of Studies (LO1)
QUESTIONS
Classify the studies described in this exercise as experimental or
nonexperimental, quantitative or qualitative, cross-sectional
or longitudinal, and basic or applied.
1. Hu, G., Bidel, S., Jousilahti, P., Antikainen, R., &
Tuomilehto, J. (2007). Coffee and tea consumption
and the risk of Parkinson’s disease. Movement Disorders, 15, 2242-2248.
Several prospective studies have assessed the association between coffee consumption and Parkinson’s
disease (PD) risk, but the results are inconsistent.
We examined the association of coffee and tea
consumption with the risk of incident PD among
29,335 Finnish subjects aged 25 to 74 years without a history of PD at baseline. During a mean
follow-up of 12.9 years, 102 men and 98 women developed an incident PD. The multivariate-adjusted
(age, body mass index, systolic blood pressure, total cholesterol, education, leisure-time physical activity, smoking, alcohol and tea consumption, and
history of diabetes) hazard ratios (HRs) of PD associated with the amount of coffee consumed daily (0, 1-4, and ≥ 5 cups) were 1.00, 0.55, and 0.41
(P for trend = 0.063) in men, 1.00, 0.50, and 0.39
(P for trend = 0.073) in women, and 1.00, 0.53, and
0.40 (P for trend = 0.005) in men and women combined (adjusted also for sex), respectively. In both
sexes combined, the multivariate-adjusted HRs
of PD for subjects drinking ≥ 3 cups of tea daily
compared with tea nondrinkers was 0.41 (95%
CI 0.20-0.83). These results suggest that coffee
drinking is associated with a lower risk of PD. More
tea drinking is associated with a lower risk of PD.
2. Troche, M. S., Okun, M. S., Rosenbek, J. C.,
Musson, N., Fernandez, H. H., Rodriguez, R., . . .
Sapienza, C. M. (2010). Aspiration and swallowing
in Parkinson disease and rehabilitation with EMST: A
randomized trial. Neurology, 23, 1912-1919.
Objective
Dysphagia is the main cause of aspiration pneumonia and death in Parkinson disease (PD), with no
established restorative behavioral treatment to date.
Reduced swallow safety may be related to decreased
elevation and excursion of the hyolaryngeal complex. Increased submental muscle force generation
has been associated with expiratory muscle strength
training (EMST) and subsequent increases in hyolaryngeal complex movement provide a strong
rationale for its use as a dysphagia treatment. The
current study objective was to test the treatment
outcome of a 4-week device-driven EMST program on swallow safety and define the physiologic
mechanisms through measures of swallow timing
and hyoid displacement.
Methods
This was a randomized, blinded, sham-controlled
EMST trial performed at an academic center. Sixty
participants with PD completed EMST for 4 weeks,
5 days per week, for 20 minutes per day, using a
calibrated or sham handheld device. Measures of
swallow function, including judgments of swallow
safety (penetration-aspiration [PA] scale scores), swallow timing, and hyoid movement, were made from
videofluoroscopic images.
Results
No pretreatment group differences existed. The active
treatment (EMST) group demonstrated improved
swallow safety compared to the sham group as evidenced by improved PA scores. The EMST group
demonstrated improvement of hyolaryngeal function
during swallowing, findings not evident for the sham
group.
Conclusions
EMST may be a restorative treatment for dysphagia
in those with PD. The mechanism may be explained
by improved hyolaryngeal complex movement.
Classification of Evidence
This intervention study provides Class I evidence
that swallow safety as defined by PA score improved
post EMST.
3. Todd, D., Simpson, J., & Murray, C. (2010). An interpretative phenomenological analysis of delusions in
people with Parkinson’s disease. Disability and Rehabilitation, 32, 1291-1299.
Purpose
The aim of this study was to explore what delusional experiences mean for people with Parkinson’s disease (PD) and to examine how psychosocial factors
contribute to the development and maintenance of
delusional beliefs.
Method
Eight participants were interviewed, and interpretative phenomenological analysis was used to identify themes within their accounts. Participants were
either recruited from a hospital-based outpatient
movement disorder clinic or from a PD support
group in the northwest of England.
Results
Four themes emerged from the analysis: (1) “I got
very frightened”: The emotional experience associated with delusions; (2) “Why the hell’s that
happening?”: Sense of uncertainty and of losing
control; (3) “I feel like I’m disintegrating”: Loss of
identity and sense of self; (4) “I’ve just tried to make
the best of things”: Acceptance and adjustment
to experience of delusions. These interconnected
themes in participants’ accounts of delusional beliefs were reflected in their descriptions of living
with, and adjusting to, PD.
Conclusions
The results of this study add to the evidence base
indicating the urgent examination of psychological
alternatives to conventional, medication-based approaches to alleviating distress caused by delusions
in people with PD.
4. Van der Schuit, M., Peeters, M., Segers, E., van
Balkom, H., & Verhoeven, L. (2009). Home literacy
environment of pre-school children with intellectual
disabilities. Journal of Intellectual Disability Research, 53,
1024-1037.
Background
For preschool children, the home literacy environment (HLE) plays an important role in the development of language and literacy skills. As little is
known about the HLE of children with intellectual disabilities (IDs), the aim of the present study
was to investigate the HLE of children with IDs in
comparison to children without disabilities.
Method
Parent questionnaires concerning aspects of the
HLE were used to investigate differences between
48 children with IDs, 107 children without disabilities of the same chronological age, and 36 children
without disabilities of the same mental age (MA).
Furthermore, for the children with IDs, correlations were computed between aspects of the HLE
and children’s nonverbal intelligence, speech intelligibility, language, and early literacy skills.
Results and Conclusions
From the results of the multivariate analyses of variance, it could be concluded that the
HLE of children with IDs differed from that of
children in the chronological age group on almost all aspects. When compared with children
in the MA group, differences in the HLE remained. However, differences mainly concerned
child-initiated activities and not parent-initiated
activities. Correlation analyses showed that children’s activities with literacy materials were positively related with MA, productive syntax and
vocabulary age, and book orientation skills. Also,
children’s involvement during storybook reading
was related with their MA, receptive language age,
productive syntax and vocabulary age, book orientation, and rapid naming of pictures. The amount
of literacy materials parents provided was related
to a higher productive syntax age and level of book
orientation of the children. Parent play activities
were also positively related to children’s speech
intelligibility. The cognitive disabilities of the children were the main cause of the differences found
in the HLE between children with IDs and children
without disabilities. Parents also adapt their level to
the developmental level of their child, which may
not always be the most stimulating for the children.
FIGURE 3-2 An example of translational research with a continuum of basic to applied research in the topic of stroke intervention. The continuum runs from basic to applied work: rat and primate studies indicate that brain components reorganize to recover function after brain injury (Gleese & Cole, 1949); research with monkeys leads to the concept of learned non-use (Taub, 1980); the somatosensory cortex of the brain is modifiable and can be mapped as markers of recovery (Buonomano & Merzenich, 1998); the EXCITE randomized controlled trial of 222 participants finds constraint-induced movement therapy (CIMT) effective in improving upper extremity function (Wolf et al, 2006); and changes in hand function from CIMT are related to functional MRI changes in activation of the primary sensory cortex in individuals with CVA (Laible et al, 2012).
HYPOTHESIS TESTING: TYPE I
AND TYPE II ERRORS
When conducting a quantitative study, the researcher
typically decides to accept or reject the hypothesis based
on the p value obtained from the statistical analysis. If
p is less than or equal to 0.05, the hypothesis is accepted.
A p value that is less than 0.05 means that there is less than a 5% probability that a result this extreme would occur by chance alone (from a statistical point of view). When p is greater than 0.05,
the hypothesis is rejected. How do you know that the
conclusion that is reached is accurate? Well, you don’t. It
is possible that the results are misleading and in actuality
the statistical conclusion is incorrect. Statistical conclusion validity is discussed in greater detail in Chapter 4.
Mistakes that occur when interpreting the results of a
study can fall into two categories: Type I and Type II
errors.
A Type I error occurs when the hypothesis is accepted, yet the hypothesis is actually false. This error
might occur because of chance. Although by convention we generally accept the research hypothesis when
p ≤ 0.05, there is still up to a 5% chance that the observed result is due to chance alone. A Type II error occurs when the hypothesis
is rejected, yet the hypothesis is true. Perhaps the most
common reason for a Type II error is a sample size that
is too small: The smaller the sample, the more difficult it
is to detect a difference between two groups. Therefore,
a study with too few participants will often draw the incorrect conclusion that an intervention was ineffective.
Table 3-3 illustrates the decision-making process associated with hypothesis testing.
TABLE 3-3 Hypothesis Testing

• Hypothesis accepted and the hypothesis is true: correct decision.
• Hypothesis accepted and the hypothesis is false: Type I error.
• Hypothesis rejected and the hypothesis is true: Type II error.
• Hypothesis rejected and the hypothesis is false: correct decision.
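The logic behind these two errors can also be seen in a small simulation. The Python sketch below is illustrative only (the effect size, group sizes, and number of trials are invented, and it assumes the SciPy library is available): when two groups are truly identical, roughly 5% of comparisons still reach p ≤ 0.05 (Type I errors), and when a real but modest difference exists with small samples, most comparisons miss it (Type II errors).

```python
import random
from scipy import stats

def significant_rate(true_difference, n_per_group, trials=2000, alpha=0.05, seed=7):
    """Return how often p <= alpha when comparing two simulated groups."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(trials):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(true_difference, 1) for _ in range(n_per_group)]
        _, p = stats.ttest_ind(group_a, group_b)
        significant += p <= alpha
    return significant / trials

# No real difference: about 5% of comparisons are still "significant" (Type I errors)
print(significant_rate(true_difference=0.0, n_per_group=20))

# Real but modest difference with small groups: most comparisons miss it,
# so the intervention looks ineffective (Type II errors)
print(significant_rate(true_difference=0.5, n_per_group=10))
```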
EXERCISE 3-2
Matching Research Questions
and Categories of Research (LO2)
QUESTIONS
Using these questions from Chapter 1, indicate whether they
would best be answered using qualitative or quantitative research. Then consider whether the type of research will most
likely be experimental or nonexperimental.
1. For wheelchair users, what is the best cushion to prevent
pressure sores?
2. What are the gender differences in sexual satisfaction
issues for individuals with spinal cord injury?
3. How do athletes deal with career-ending injuries?
4. What childhood conditions are related to stuttering in
children?

VARIABLES
Variables are characteristics of people, activities, situations, or environments that are identified and/or measured in a study and have more than one value. As the name implies, a variable is something that varies. For example, if a research sample is comprised of children with
autism, autism is not a variable. However, if you compare
girls and boys with autism, gender is a variable. As in this
example, some variables are categorical; however, in research categorical variables may be assigned a number
and compared. Other examples of categorical variables
are group assignment, such as control and intervention
groups, race, and geographical region. Continuous variables are ones in which the numbers have meaning in
relation to one another; that is, a higher number means
there is more of something. Using the autism example, severity of autism would be a continuous variable, as would
age and grade level.
In research there are different types of variables. Familiarity with variables is important for understanding
research designs and choice of statistic. Four important
types of variables in research are independent, dependent,
control, and extraneous.
Independent Variables
Independent variables are the variables that are manipulated or compared in a study. In research concerning interventions, the independent variable is often identified
as the intervention being studied. What varies is whether
individuals are assigned to the intervention or control
(no-intervention) group. In this example, the independent
variable has two levels: intervention and control. Of
course, a study can involve more than two groups: A researcher can compare two different types of interventions
and then use a third group that is a no-treatment control.
In this case the independent variable has three levels.
The levels of an independent variable are the number of
categories or groups that make up the variable.
You can also think of the independent variable as the
variable by which a comparison is made between two or
more groups. In some cases you may not manipulate the
independent variable, but simply make a comparison. For
example, if you compare boys and girls or people with and
without autism, these independent variables are not manipulated. Researchers often will consider more than one
independent variable in a study. An intervention study of
children with autism could compare a social skills intervention group, a play therapy group, and a no-treatment
control group, as well as determine if there are differences
in how boys and girls respond to the interventions. This
study would be identified as a 3 × 2 design. There are
two independent variables (intervention type and gender).
One independent variable (intervention type) has three
levels (social skills intervention, play therapy, and no
treatment), and the other independent variable (gender)
has two levels (boys and girls).
When more than one independent variable is included in a study, the study is described as a factorial
design. In a factorial design, the interaction or impact
of both independent variables can be examined simultaneously. Factorial designs are discussed in more detail
in Chapter 6.
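One way to picture a factorial design is to list every combination of the levels of the independent variables. The short Python sketch below is purely illustrative; the group labels come from the hypothetical autism study described above, and it simply enumerates the six cells of that 3 × 2 design.

```python
from itertools import product

# Independent variable 1: intervention type (three levels)
interventions = ["social skills", "play therapy", "no treatment"]
# Independent variable 2: gender (two levels)
genders = ["boys", "girls"]

# A 3 x 2 factorial design crosses every level of one independent variable
# with every level of the other, producing 3 * 2 = 6 cells
cells = list(product(interventions, genders))
for intervention, gender in cells:
    print(f"{intervention} / {gender}")
print(len(cells), "cells in the design")   # 6
```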
Dependent Variables
Dependent variables are observed and, in the case of an
experimental study, are intended to measure the result of
the manipulation. Returning to the preceding example,
suppose that the researcher is interested in the effect of
the intervention on the communication skills of the participants. Communication skills would be the dependent
variable. In an intervention study, the dependent variable
is what is measured and may also be referred to as the
outcome or outcome variable. In a weight loss study, one dependent variable would likely be weight. In a descriptive
group comparison study, the dependent variable is the
characteristic(s) that is measured for differences between
the groups. For example, individuals with and without
obesity may be compared to identify differences in sleeping patterns; in this case, sleeping patterns would be the
dependent variable.
Control Variables
Control variables are those variables that remain constant. These variables could potentially affect the outcome of a study, but they are controlled by the design
of the study or the statistical procedures used. Again
using the preceding example, age may be a variable
that could affect social skills, because you would expect
that older children would have better social skills than
younger children. This variable could be controlled in
a number of ways: all children in the study could be the
same age, the ages of children could be the same across
the three groups, or age could be controlled statistically
through a procedure such as an analysis of covariance.
Analysis of covariance is explained in greater detail in
Chapter 4.
When groups are compared, it is important that they
be as similar as possible in all factors other than the independent variable(s) of interest. The more control that is
in place, the more you can be confident that the independent variable caused the change in the dependent variable.
Methods used to promote similarity across groups include
random assignment and matching. When groups are not
similar and a variable is not controlled—for example, children in one group are older than the children in another
group—the study may be confounded. This means that
the difference in the variable (in this example age) could
account for any differences found between the groups being compared. Again, older children would
be expected to have better social skills than younger
children.
Extraneous Variables
It is impossible to control everything, and researchers
find it difficult, if not impossible, to imagine all of the
potential variables that can influence the outcome of
a study. However, sometimes extraneous variables
are tracked and then later examined to determine their
influence. Using the autism study as an example, the
number of siblings could influence the outcome, because having siblings in the household could support
the development of social skills. Therefore, the researcher may keep track of the number of siblings for
each participant and then determine if the number of
siblings affected the outcome. The researcher would
report this influence if it did indeed exist, and future
research might then take this into account and try to
control for this variable.
From the Evidence 3-4 illustrates the different types
of variables found in an experimental study investigating
the efficacy of a psychoeducation intervention for people
with schizophrenia. This study compares a psychosocial
intervention group with a control group (independent
variable) on insight and willingness to engage in treatment (dependent variables). The researcher suggests that
the short length of treatment may have affected the findings (a possible extraneous variable).
FROM THE EVIDENCE 3-4
Abstract of an Experimental Study Examining an Intervention for People With Schizophrenia
to Illustrate Types of Variables
Medalia, A., Saperstein, A., Choi, K. H., & Choi, J. (2012). The efficacy of a brief psycho-educational intervention to improve awareness
of cognitive dysfunction in schizophrenia. Psychiatry Research, 199(3), 164–168. doi:10.1016/j.psychres.2012.04.042.
People with schizophrenia have neuro-cognitive deficits that are associated with
poor functional outcome, yet their awareness of their cognitive deficiencies is
variable. As new treatments for cognition are developed, it will be important that
patients are receptive to the need for more therapy. Since insight into symptoms
has been associated with treatment compliance, it may be of value to provide
psycho-education to improve understanding about cognition in schizophrenia.
We report a randomized controlled trial that enrolled 80 subjects in either a brief
psycho-education intervention about cognition, or a control condition. Subjects
in the two conditions did not differ at baseline in insight or receptiveness to
treatment, or on demographic, cognitive, or psychiatric variables. Current
cognitive impairment of subjects was evidenced by the indices of working
memory, attention, and executive functioning abilities, (X = 77.45 intervention
group; 82.50 control condition), that was significantly below both the normative
mean and estimated average premorbid IQs (X = 101.3 intervention group;
X = 104.57 control condition). Multivariate repeated measures (ANOVAs)
indicated that subjects who received the psycho-education did not improve
insight into their cognitive deficits or willingness to engage in treatment for
cognitive dysfunction. While the failure to find a significant impact of this
intervention on awareness of cognitive deficit and receptiveness to cognitive
treatment raises questions about the malleability of insight into neuro-cognitive
deficits, the intervention was briefer than most reported psycho-education
programs and multi-session formats may prove to be more effective.

Note A: The two groups (psychoeducational intervention and control condition) are manipulated and therefore would be classified as independent variables.

Note B: Participants are the same at baseline on several variables. Because they are the same, these variables are controlled.

Note C: The outcome variables in this study are insight and willingness to engage in treatment. These are the dependent variables of the study.
FTE 3-4 Question Write a hypothesis that would pertain to this research study. The hypothesis should contain the independent and dependent variables and be testable. Did the findings of the study support the hypothesis?
(the dependent variable). A study describing the social skill
impairments in autism or correlating social skill impairments with school performance would not have dependent
and independent variables; rather, social skill impairments
and school performance would be the variables of interest.
Control and extraneous variables still exist in descriptive
research. For example, controlling for age and considering
the number of siblings as an extraneous variable would be
relevant and important in descriptive studies of autism.
EXERCISE 3-3
Identifying Variables (LO3)

The following study is an experiment that uses only one group. Also called a crossover design, all individuals receive the same treatments, but in a random order. This design is typically considered comparable to a randomized controlled trial with more than one group.

Logemann, J. A., Gensler, G., Robbins, J., Lindblad, A. S., Brandt, D., Hind, J. A., . . . Miller Gardner, P. J. (2008). A randomized study of three interventions for aspiration of thin liquids in patients with dementia or Parkinson's disease. Journal of Speech Language Hearing Research, 51, 173–183.

Purpose
This study was designed to identify which of three treatments for aspiration of thin liquids—chin-down posture, nectar-thickened liquids, or honey-thickened liquids—results in the most successful immediate elimination of aspiration of thin liquids during the videofluorographic swallow study in patients with dementia and/or Parkinson's disease.

Method
This randomized clinical trial included 711 patients ages 50 to 95 years who aspirated thin liquids as assessed videofluorographically. All patients received all three interventions in a randomly assigned order during the videofluorographic swallow study.

Results
Immediate elimination of aspiration of thin liquids occurred most often with honey-thickened liquids for patients in each diagnostic category, followed by nectar-thickened liquids and then chin-down posture. Patient preference was best for chin-down posture, followed closely by nectar-thickened liquids.

Conclusion
To identify the best short-term intervention to prevent aspiration of thin liquids in patients with dementia and/or Parkinson's disease, a videofluorographic swallow assessment is needed. Evidence-based practice requires taking patient preference into account when designing a dysphagic patient's management plan. The longer-term impact of short-term prevention of aspiration requires further study.

Identify the following variables as:
A. Independent
B. Dependent
C. Control
D. Extraneous

1. Dementia and Parkinson's disease
2. Individuals with aspiration of thin liquids
3. Degree of aspiration
4. Three treatments: chin-down posture, nectar-thickened liquids, honey-thickened liquids
5. Patient preference
6. Patient's anxiety level
CRITICAL THINKING QUESTIONS
1. What are the differences between experimental, quasi-experimental, and pre-experimental research?
2. How is an experimental longitudinal study different from a nonexperimental longitudinal study?
3. How does basic research relate to applied research?
4. What is the major limitation of cross-sectional studies when it comes to understanding differences across time?
5. What contributions does qualitative research make to evidence-based practitioners?
6. Why is it important to have a directional hypothesis in quantitative research?
7. Why is it important to operate without a hypothesis in qualitative research?
8. Explain the difference between control variables and extraneous variables.

ANSWERS

EXERCISE 3-1
1. Nonexperimental, quantitative, longitudinal, applied
2. Experimental, quantitative; neither cross-sectional (because it is experimental and variables are manipulated rather than observed) nor longitudinal (follow-up is not long enough); applied
3. Nonexperimental, qualitative; neither cross-sectional nor longitudinal, as these terms are applied only to quantitative research; applied
4. Nonexperimental, quantitative, cross-sectional, applied

EXERCISE 3-2
1. Quantitative and experimental—this is an efficacy study that would likely compare either groups of individuals using different cushions, or compare the same people using different cushions.
2. Quantitative and nonexperimental—because this is a descriptive question, the researcher would likely identify the percentage of individuals with different sexual satisfaction issues and then determine if there were differences between men and women (a nonexperimental group comparison).
3. Qualitative and nonexperimental—this question is more exploratory in nature and does not imply hypothesis testing. Instead, the researcher is searching for the answer(s).
4. Quantitative and nonexperimental—with this question, predictors are examined to determine relationships between particular conditions and stuttering.

EXERCISE 3-3
1. C
2. C
3. B
4. A
5. B
6. D

FROM THE EVIDENCE 3-1
The manipulation in this study is the two groups: the experimental group receiving CIMT and the control group
receiving traditional rehabilitation. These two groups
are compared to determine if there is a difference in the
reach-to-grasp performance of the hemiparetic hand and
if there is a difference in the performance of activities of
daily living. More importantly, the researcher wants to
know if the improvements from pretest to posttest for the
CIMT group are greater than improvements from pretest
to posttest for the traditional rehabilitation group. This
type of comparison is discussed in much greater detail in
Chapter 5.
FROM THE EVIDENCE 3-2
In this nonexperimental group comparison study, individuals cannot be randomly assigned to groups, because the groups are defined by whether or not a person has Parkinson's disease. Although it may be sensible to conclude that the
Parkinson’s disease causes the difference in outcomes,
there could be other factors common to this group that
explain the difference. For example, the medications
taken for Parkinson’s disease could make the difference,
rather than the disease itself.
FROM THE EVIDENCE 3-3
There is no hypothesis; instead, the study is open to what
might be found. One difference you might have noted
is that there are no groups. Even among the 12 participants, the data are not aggregated, but rather presented
individually. In a qualitative study, a subjective approach
with an interview is used to both collect and interpret the
data. The results are presented from the point of view of
the participants. The conclusions are based on inductive
reasoning. Specific quotations are used to reveal a more
general experience of MS that involved a change in self.
FROM THE EVIDENCE 3-4
Hypothesis: A psychoeducational intervention for people
with schizophrenia will increase insight into cognitive
deficits and willingness to engage in treatment.
The hypothesis was not supported, as the intervention group did not have better outcomes than the control
group on the dependent variables of insight and willingness to engage in treatment.
REFERENCES
Askan, N., Anderson, S. W., Dawson, J. D., Johnson, A. M., Uc, E. Y.,
& Rizzo, M. (2012). Cognitive functioning predicts driver safety on
road tests 1 and 2 years later. Journal of the American Geriatric Society,
60, 99–105.
Askan, N., Dawson, J. D., Emerson, J. L., Yu, L., Uc, E. Y., Anderson, S.
W., & Rizzo, M. (2013). Naturalistic distraction and driving safety
in older drivers. Human Factors, 55, 841–853.
Buonomano, D. V., & Merzenich, M. M. (1998). Cortical plasticity:
From synapses to maps. Annual Review of Neuroscience, 21, 149–186.
Dalvand, H., Dehghan, L., Hadian, M. R., Feizy, A., & Hosseini, S. A.
(2012). Relationship between gross motor and intellectual function
in children with cerebral palsy: A cross-sectional study. Archives of
Physical Medicine and Rehabilitation, 93, 480–484.
Edwards, J. D., Bart, E., O’Connor, M. L., & Cissell, G. (2010). Ten
years down the road: Predictors of driving cessation. Gerontologist,
50, 393–399.
Gleese, P., & Cole, J. (1949). The reappearance of coordinated movement of the hand after lesions in the hand area of the motor cortex
of the rhesus monkey. Journal of Physiology, 108, 33.
Hu, G., Bidel, S., Jousilahti, P., Antikainen, R., & Tuomilehto, J. (2007).
Coffee and tea consumption and the risk of Parkinson’s disease.
Movement Disorders, 15, 2242–2248.
Laible, M., Grieshammer, S., Seidel, G., Rijntjes, M., Weiller, C.,
& Hamzei, F. (2012). Association of activity changes in the primary sensory cortex with successful motor rehabilitation of the
hand following stroke. Neurorehabilitation and Neural Repair, 26,
881–888.
Logemann, J. A., Gensler, G., Robbins, J., Lindblad, A. S., Brandt, D.,
Hind, J. A., . . . Miller Gardner, P. J. (2008). A randomized study
of three interventions for aspiration of thin liquids in patients with
dementia or Parkinson’s disease. Journal of Speech Language Hearing
Research, 51, 173–183.
Medalia, A., Saperstein, A., Choi, K. H., & Choi, J. (2012). The efficacy of a brief psycho-educational intervention to improve awareness of cognitive dysfunction in schizophrenia. Psychiatry Research, 199(3), 164–168.
Metta, V., Logishetty, K., Martinez-Martin, P., Gage, H. M., Schartau,
P. E., Kaluarachchi, T., . . . Chaudhuri, K. R. (2011). The possible clinical predictors of fatigue in Parkinson’s disease: A study of
135 patients as part of international nonmotor scale validation project. Parkinson’s Disease, 124271. PMID: 22191065.
Mozo-Dutton, L., Simpson, J., & Boot, J. (2012). MS and me:
Exploring the impact of multiple sclerosis on perceptions of self. Disability and Rehabilitation, 34(14), 1208–1217. [Epub December 13,
2011].
National Institutes of Health. (n.d.). National Center for Advancing
Translational Sciences. Retrieved from http://www.ncats.nih.gov/
Plotnik, M., Giladi, N., & Hausdorff, J. M. (2009). Bilateral coordination of gait and Parkinson's disease: The effects of dual tasking.
Journal of Neurology, Neurosurgery & Psychiatry, 80(3), 347–350.
doi:10.1136/jnnp.2008.157362
Shattuck, P. T., Narendorf, S. C., Cooper, B., Sterzing, P. R., Wagner, M.,
& Taylor, J. L. (2012). Postsecondary education and employment
among youth with an autism spectrum disorder. Pediatrics, 129,
1042–1049.
Taub, E. (1980). Somatosensory deafferentation research with monkeys:
Implications for rehabilitation medicine. In L. P. Ince (Ed.), Behavioral
psychology in rehabilitation medicine: Clinical implication. New York, NY:
Williams & Wilkins.
Todd, D., Simpson, J., & Murray, C. (2010). An interpretative phenomenological analysis of delusions in people with Parkinson’s disease.
Disability and Rehabilitation, 32, 1291–1299.
Troche, M. S., Okun, M. S., Rosenbek, J. C., Musson, N., Fernandez, H. H.,
Rodriguez, R., Romrell, J., . . . Sapienza, C. M. (2010). Aspiration
and swallowing in Parkinson disease and rehabilitation with EMST:
A randomized trial. Neurology, 23, 1912–1919.
Van der Schuit, M., Peeters, M., Segers, E., van Balkom, H., &
Verhoeven, L. (2009). Home literacy environment of pre-school
children with intellectual disabilities. Journal of Intellectual Disability Research, 53, 1024–1037.
Wolf, S. L., Winstein, C. J., Miller, J. P., Taub, E., Uswatte, G.,
Morris, D., . . . EXCITE Investigators. (2006). Effect of constraintinduced movement therapy on upper extremity function 3 to
9 months after stroke: The EXCITE randomized clinical trial.
Journal of the American Medical Association, 296, 2095–2104.
“Life is not just a series of calculations and a sum total of statistics, it’s about
experience, it’s about participation, it is something more complex and more
interesting than what is obvious.”
—Daniel Libeskind, architect, artist, and set designer
4
Understanding Statistics
What They Tell You and How
to Apply Them in Practice
CHAPTER OUTLINE
INTRODUCTION
SYMBOLS USED WITH STATISTICS
DESCRIPTIVE STATISTICS
  Frequencies and Frequency Distributions
  Measure of Central Tendency
  Measures of Variability
INFERENTIAL STATISTICS
  Statistical Significance
  Inferential Statistics to Analyze Differences
    The t-Test
    Analysis of Variance
    Analysis of Covariance
  Inferential Statistics for Analyzing Relationships
    Scatterplots for Graphing Relationships
    Relationships Between Two Variables
    Relationship Analyses With One Outcome and Multiple Predictors
  Logistic Regression and Odds Ratios
EFFECT SIZE AND CONFIDENCE INTERVALS
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Understand descriptive statistics and the accompanying graphs and tables that are used to explain data in a study.
2. Interpret inferential statistics to determine if differences or relationships exist.
3. Interpret statistics in tables and text from the results section of a given research study.
4. Determine which types of statistics apply to specific research designs or questions.

KEY TERMS
analysis of covariance (ANCOVA), Cohen's d, confidence interval (CI), correlation, dependent sample t-test, descriptive statistics, effect size (ES), frequencies, frequency distribution, independent sample t-test, inferential statistics, level of significance, linear regression, logistic regression, mean, measure of central tendency, median, mixed model ANOVA, mode, normal distribution, odds ratio, one-way ANOVA, Pearson product-moment correlation, range, regression equation, repeated measures ANOVA, scatterplot, skewed distribution, Spearman correlation, standard deviation, statistical significance, variability
INTRODUCTION
For many people, statistics are intimidating. All of those
numbers, symbols, and unfamiliar terms can be daunting.
The field of statistical analysis is complex and even though
numbers may seem to be a very concrete subject, there are
many controversies surrounding the use, application, and
interpretation of particular statistical tests. Neither a single
chapter on statistics nor several classes will make you an expert on the subject. However, as an evidence-based practitioner, it is important to be able to read the results section of
an article and make sense of the numbers there. Some basic
knowledge of and practice in reading tables, graphs, and results narratives will provide you with a better understanding
of the evidence and allow you to be a more critical consumer
of research.
This chapter serves as an overview of descriptive and
inferential statistics. Subsequent chapters include a feature
titled “Understanding Statistics,” in which statistical tests are
matched with particular research designs. Within this feature, examples are provided with additional explanation.
Health care is a science and an art. Statistics are a key
component of the science of clinical practice. However,
it is important to avoid reducing clients to numbers and
thus failing to recognize the uniqueness of each individual
and situation. The artful evidence-based practitioner is able
to meld the numbers with personal experience and client
needs.
SYMBOLS USED WITH STATISTICS
One of the reasons statistics can seem like a foreign
language is that the results sections of research articles
often include unfamiliar symbols and abbreviations.
Table 4-1 provides a key to many symbols commonly
used in statistics.
DESCRIPTIVE STATISTICS
As the name implies, descriptive statistics describe the
data in a study. Descriptive statistics provide an analysis of data that helps describe, show, or summarize it
in a meaningful way such that, for example, patterns
can emerge from the data. Descriptive statistics are
distinguished from inferential statistics, which are
techniques that allow us to use study samples to make
generalizations that apply to the population (described
in greater detail later). At this point, it is useful to
understand that descriptive statistics are used in the
calculation of inferential statistics.
When summarizing data, you always lose some of the
important details. Consider a student’s grade point average, which provides a summary of grades received in all
of the classes taken, but excludes some important information, such as which classes were more difficult and how
many credit hours were taken at the time. Although descriptive statistics are useful for condensing large amounts
of data, it is important to remember that details and individual characteristics are lost in the process.
Different types of descriptive statistics are used to summarize data, including frequencies and frequency distributions, measures of central tendency, and measures of
variability.
Frequencies and Frequency
Distributions
Frequencies are used to describe how often something
occurs. Typically, the actual number or count is provided,
along with the percentage. For example, a study may indicate that 70 (35%) men and 130 (65%) women were
enrolled in the study. When a graph is used to depict the
count, the graph is called a frequency distribution. In
this type of graph, the vertical axis (or y axis) identifies
the frequency with which a score occurs. The horizontal
axis (or x axis) presents the values of the scores. From
the Evidence 4-1 provides an example of a frequency
distribution graph; this one represents the pretest and
TABLE 41 Statistical Symbols Used
to Describe the Sample
and Statistics in Research Articles
Symbol
Meaning
x-
mean of a sample
s, sd, σ
standard deviation of a sample
s2
variance of a sample
N, n
number of participants in a study or
number of participants in a group in
a study
α
alpha level; the level of significance
that is used to determine whether or
not an analysis is statistically
significant
p
probability value (p value); the
calculated likelihood of making a
Type I error
df
degrees of freedom; the number of
values that are free to vary in a statistical analysis based on the number of
participants and number of groups
r
correlation coefficient
r2
variance; the amount of variance accounted for in a correlational study
t
the critical value in a t-test
F
the critical value in an ANOVA
ES
effect size
η2
eta squared; an effect-size statistic
ω2
omega squared; an effect-size
statistic
CI
confidence interval
posttest scores for children on the Student Performance
Profile. With this measure, children are rated in terms of
the degree to which they meet their individualized education plan (IEP), according to percent ability, in 10%
increments, with 0% indicating no ability and 100% indicating total ability.
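If you want to see how frequencies are tallied, the following is a minimal sketch (not part of the original text) in Python; the scores are invented SPP-style ratings, and the count and percentage for each value are printed side by side.

    from collections import Counter

    # Hypothetical SPP-style scores (percent of current ability, in 10% increments)
    scores = [40, 50, 50, 60, 60, 60, 70, 70, 80, 90, 90, 100, 100]

    counts = Counter(scores)               # frequency (count) of each score value
    total = len(scores)
    for value in sorted(counts):
        pct = 100 * counts[value] / total  # percentage reported alongside the count
        print(f"{value}%: n = {counts[value]} ({pct:.0f}%)")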
A normal distribution is a type of frequency distribution that represents many data points distributed in a
symmetrical, bell-shape curve. In a normal distribution,
the two halves are mirror images, with the largest frequency occurring at the middle of the distribution, and
the smallest occurring at the ends or “tails.” A skewed
distribution is one in which one tail is longer than the
other. Some distributions may have more than one peak
and are described in relation to the number of peaks; for
example, a distribution with two peaks is classified as a
bimodal distribution. Figure 4-1 illustrates the different
types of distributions.
Distributions of scores differ in terms of their measures
of central tendency and variability. Knowing the central
tendency and variability of a set of data provides you with
important information for understanding a particular distribution of scores. The connection between distributions
and other descriptive statistics is explained in the following sections.
Measure of Central Tendency
A measure of central tendency describes the location
of the center of a distribution. The three most commonly
used measures of central tendency are the mode, median,
and mean.
1. The mode is simply the score value that occurs most
frequently in the distribution. The mode provides information about the distribution, but it is generally of
less use than other measures of central tendency. The
mode is greatly influenced by chance and does not take
into account other scores in the distribution.
2. The median is the score value that divides the distribution into the lower and upper halves of the scores.
The median is most useful when distributions are
skewed, because it is less sensitive to extreme scores.
If there are an odd number of participants in a distribution, the median is the score value of the participant
exactly in the middle of the distribution. When there
is an even number of participants in a distribution, the
median is the score halfway between the scores of the
two middle participants.
3. The mean (x̄) is the same as the average and balances
the scores above and below it. The mean is calculated by
summing the scores and dividing the sum by the number
of participants. The symbol for the sample mean is x̄.
The relationship between the different measures of
central tendency depends on the frequency distribution.
If the scores are normally distributed, the values of the
FROM THE EVIDENCE 4-1
Frequency Distribution
Watson, A. H., Ito, M., Smith, R. O., & Andersen, L. T. (2010). Effect of assistive technology in a public school setting. American Journal
of Occupational Therapy, 64(1), 18–29. doi:10.5014/ajot.64.1.18.
Bar graph: Frequency of averaged SPP scores. Frequency distribution of the scores of the averaged Student Performance Profile (SPP) on the pretest and posttest forms, in percentages, of current ability level of individualized education program goals and objectives (N = 13). Separate distributions are shown for pretest and posttest scores.

Note A: The number of children (frequency) who received a particular score is indicated on the y axis (0 to 6).

Note B: The score received on the SPP, in 10% increments (0% to 100% current ability), is indicated on the x axis.

FTE 4-1 Question How many children score at 60% ability at pretest, and how many children score at 60% ability at posttest?
mode, mean, and median are equal. In a positively skewed
distribution, the mode is a lower score than the mean,
while the median falls between the mode and mean. In a
negatively skewed distribution, the mode is a higher score
than the mean, and once again the median falls between
the mode and mean. Figure 4-2 depicts the measures of
central tendency with the different distributions.
The mean is the measure of central tendency used most
often in research, particularly when calculating inferential
statistics. The fact that the mean balances the distribution is a desirable quality because it takes into account
the values of all participants. However, in some cases, a
few outliers can influence the mean to such a degree that
the median becomes a more accurate descriptor of the
central tendency of the distribution. For example, when
describing the demographics of a sample, the median is
often used for income. If most individuals have incomes
around $70,000, but a few millionaires are included in the
sample, income will be distorted if the mean is reported.
However, if the median is reported, these outliers will not
misrepresent the majority.
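As a quick illustration of the income example (not from the text, and with invented numbers), Python's statistics module shows how a single very large income pulls the mean upward while the median stays near the typical value:

    import statistics

    incomes = [68000, 69000, 70000, 71000, 72000, 2500000]  # one millionaire in the sample
    print(statistics.mean(incomes))    # inflated by the outlier
    print(statistics.median(incomes))  # still close to the typical income
    print(statistics.mode([3, 5, 5, 7, 9]))  # the mode is simply the most frequent value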
Measures of Variability
Variability refers to the spread of scores in a distribution.
Distributions with the same central tendencies can still
be very different because of the variability in the scores.
The range is one measure of variability that simply indicates the lowest and highest scores. For example,
the age range of participants in a study may be expressed
as 18 to 49. The most common measure of variability is
the standard deviation, which is the expression of the
FIGURE 4-1 Types of frequency distributions: A. Normal distribution. B. Negatively skewed distribution. C. Positively skewed distribution. D. Bimodal distribution.
amount of spread in the frequency distribution and the
average amount of deviation by which each individual
score varies from the mean. Standard deviation is abbreviated as s, sd, or σ, and is often shown in parentheses next
to the mean in a table. A large standard deviation means
that there is a high degree of variability, whereas a small
standard deviation means that there is a low degree of
variability. If all of the scores in a distribution are exactly
the same, the standard deviation is zero.
When standard deviations are different, the frequency
distributions are also different, even with the exact same
mean. Figure 4-3 illustrates three frequency distributions
with the same mean, but variability that is dissimilar.
The concept of a perfectly normal distribution is only
theoretical; with actual data it is rare that a distribution will
be exactly symmetrical. That said, many distributions are
FIGURE 4-2 Measure of central tendency in different distributions: A. Normal distribution. B. Positively skewed distribution. C. Negatively skewed distribution.

FIGURE 4-3 Frequency distributions with the same mean but different amounts of variability. Dark blue line indicates large standard deviation; dashed line indicates small standard deviation; light blue line indicates a moderate standard deviation.
EVIDENCE IN THE REAL WORLD
Applying the Statistical Concept of Variability to the Term Spectrum
When individuals with autism are described as being on a “spectrum,” this term is meant to characterize their
behavioral repertoire as being heterogeneous (i.e., as a population, this group of individuals has a lot of variability).
The Autism Speaks organization explains autism spectrum disorder (ASD) as follows:
Each individual with autism is unique. Many of those on the autism spectrum have exceptional abilities in visual skills, music, and
academic skills. About 40 percent have intellectual disability (IQ less than 70), and many have normal to above-average intelligence.
Indeed, many persons on the spectrum take deserved pride in their distinctive abilities and “atypical” ways of viewing the world.
Others with autism have significant disability and are unable to live independently. About 25 percent of individuals with ASD are
nonverbal but can learn to communicate using other means (Autism Speaks, n.d.).
approximately normal, and understanding the characteristics of the normal distribution is helpful in understanding
the role of standard deviations in both descriptive and inferential statistics. With a normal distribution, it is easy
to predict the number of individuals who will fall within
the range of a particular standard deviation: 34% of the
scores will fall within one standard deviation above the mean, and
another 34% will fall within one standard deviation below the
mean, for a total of 68% of the scores (Fig. 4-4).
Therefore, in a normal distribution, most people will
have scores within one standard deviation of the mean,
approximately 14% will have scores between one and two
standard deviations on either side of the mean, and a little
more than 2% will have scores more than two standard
deviations from either side of the mean. For example, in
a distribution in which the mean is 50 and the standard
deviation is 10, 68% of individuals will have scores between
40 and 60, 34% will score between 40 and 50, and the remaining 34% will score between 50 and 60. If an individual
has a score that is 0.5 standard deviations above the mean,
his or her score is 55. From the Evidence 4-2 displays a
bar graph from a descriptive study comparing individuals
with Parkinson’s disease (PD), multiple sclerosis (MS), and
healthy controls. Each bar includes indications of the mean
and standard deviations. In both conditions, the PD and
MS groups had fewer appropriate pauses than the controls.
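A small simulation sketch (not part of the text) can confirm the mean-50, standard-deviation-10 example; NumPy is assumed to be available, and the exact values will vary slightly from run to run.

    import numpy as np

    rng = np.random.default_rng(42)
    scores = rng.normal(loc=50, scale=10, size=10_000)  # simulated normally distributed scores

    print(scores.mean())  # close to 50
    print(scores.std())   # close to 10
    within_one_sd = ((scores >= 40) & (scores <= 60)).mean()
    print(within_one_sd)  # roughly 0.68, matching the normal-distribution rule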
EXERCISE 4-1
Using Statistics to Describe Samples (LO1)
QUESTIONS
1. What different types of information would be provided
by the mean and standard deviation for the following
two groups?
A. A group composed exclusively of basketball players
B. A mixed group of athletes including basketball
players, jockeys, and baseball players
2. What measure of central tendency would be most useful in describing length of stay in a hospital where most
people stay less than two weeks, but a few patients stay
more than three months? Explain why.
3. What type of graph illustrates both the central tendencies and variability of a sample?

FIGURE 4-4 Standard deviations in a normal distribution: approximately 34.1% of scores fall within one standard deviation on each side of the mean, 13.6% between one and two standard deviations, and 2.1% between two and three standard deviations, so that about 68.3% of scores fall within one standard deviation of the mean, 95.5% within two, and 99.73% within three.
FROM THE EVIDENCE 4-2
Bar Graph Comparing Speech Patterns in Three Different Groups
Tjaden, K., & Wilding, G. (2011). Speech and pause characteristics associated with voluntary rate reduction in Parkinson’s disease and
multiple sclerosis. Journal of Communication Disorders, 44(6), 655–665. doi:10.1016/j.jcomdis.2011.06.003.
Bar graph: proportion of grammatically appropriate pauses (y axis, 0.4 to 1.2) for the MS, PD, and control groups in the habitual and slow rate conditions.

Note A: The top of the bar indicates the mean, and the extension represents the standard deviation.
FTE 4-2 Question When looking at the standard deviations in the habitual condition, what useful information is
provided beyond what you learn by just looking at the mean differences?
INFERENTIAL STATISTICS
The important element of inferential statistics is “infer,”
in that this type of statistic is used when the researcher
wants to infer something about a larger population based
on the sample used in the study. Consider researchers
who are interested in a particular social skill training
program for individuals with autism. If those researchers
conduct a study, they will be unable to study all children
with autism, so a sample is selected. The results of the
study are calculated with inferential statistics and then
used to infer or suggest that the same or similar findings
would be expected with another sample of children with
autism. Of course, the representativeness of the sample
is an important consideration when making inferences
(see Chapter 5). Most results sections that report inferential statistics also provide the descriptive statistics
(e.g., means and standard deviations) that were used in
the calculation.
Inferential statistics are often divided into two categories: (1) tests of differences (e.g., t-tests and analysis
of variance) and (2) tests of relationships (e.g., correlations and regressions). With inferential statistics, a test
is conducted to determine if the difference or relationship is statistically significant.
Statistical Significance
Even impeccably designed and conducted research involves chance. For example, there are many reasons
why a sample may not perfectly represent a population
and, as a result, introduce a sampling error. Statistical
significance is a number that expresses the probability
that the result of a given experiment or study could have
occurred purely by chance. In other words, statistical
significance is the amount of risk you are willing to assume when conducting a study. It is based on an agreed-upon level of significance (also called alpha or α).
Typically, 0.05 is the standard level of significance that
most researchers use. This means there is a 5% risk that
the difference between two groups, or the difference in
scores from pretest to posttest, is not a true difference
but instead occurred by chance. Conversely, there is
a 95% chance that the difference is a true difference.
In correlational studies, the level of significance is interpreted as the amount of risk that the relationship
has occurred by chance and is not a true relationship.
Sometimes, when it is important to be more certain, the
level of significance is set at 0.01, or 1%. Occasionally,
researchers conducting exploratory research will accept
a 10% chance, but there is reason to be skeptical of findings when researchers are willing to accept greater levels
of risk.
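One way to get a feel for the 5% risk is a simulation sketch (not from the text): if two samples are repeatedly drawn from the same population, a test at alpha = 0.05 will flag a "significant" difference in roughly 5% of those comparisons purely by chance. SciPy and NumPy are assumed to be available.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha = 0.05
    n_studies = 5_000
    false_positives = 0
    for _ in range(n_studies):
        group_a = rng.normal(loc=50, scale=10, size=30)  # both groups drawn from the
        group_b = rng.normal(loc=50, scale=10, size=30)  # same population (no true difference)
        p = stats.ttest_ind(group_a, group_b).pvalue
        if p < alpha:
            false_positives += 1
    print(false_positives / n_studies)  # close to 0.05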
Inferential Statistics to Analyze
Differences
Many rehabilitation studies are interested in differences.
Intervention studies seek to determine whether there is a
difference between the intervention and control groups,
or whether there are differences before and after treatment. A descriptive study may examine the differences between people with and without a condition. For example,
when determining the cognitive impairments in people
with traumatic brain injury, a group of people with brain
injury and a group of people without brain injury may be
compared to determine if there is a difference in cognition between the two groups.
There are many statistical tests for comparing differences, but the most common difference statistics are
the t-test and analysis of variance (ANOVA). In these
tests, the means of the two groups and/or time periods are compared, while taking into account the standard deviations of the means. If a difference is found
within the sample, an inference is made that a difference
would exist within a larger population. For example, if
a study finds that there is a difference (improvement) in
strength after an exercise intervention, the implication
is that this difference would also occur with similar clients who receive the same intervention. Many of the
same tests are used in both intervention and descriptive
studies. Table 4-2 summarizes the purpose of each difference test and provides examples of research designs
that would use the statistic.
The t-Test
The t-test is the most basic of inferential difference statistics because it examines the difference between two
groups at a single time point, or one group at two time
points. There are two types of t-tests: (1) a t-test for independent samples or between-group analysis (also called
a two-group/two-sample t-test) and (2) a t-test for dependent samples or within-group analysis (also called a
paired-sample t-test).
An independent sample t-test compares the difference in the mean score for two groups—the two groups
being independent of, or unrelated to, one another. It
is called an independent sample t-test because the two
groups make up the independent variable in the study.
In a dependent sample t-test, the comparison is within
the same group and compares a dependent variable. Most
often, this type of t-test is used to compare the pretest and
posttest scores of a single group. However, two different
tests that have the same metric could also be compared.
For example, a t-test for dependent samples could be used
to compare a percentile score on a final exam with that on
the final exam of another course.
When a t-test is reported in the results section of a
research paper, the following three statistics are often
included: t, df, and p. For example, a study may report
that there was a statistically significant difference from
pretest to posttest (t = 2.39, df = 15, p = 0.03). The
t denotes the critical value of the t-test, a number that
cannot be interpreted out of context. The df represents
degrees of freedom, which indicates the number of values in a study that are free to vary, based on the number
of participants in the study and, in some analyses, the
number of groups. The p stands for probability and indicates the likelihood that the difference is not a true
difference or the likelihood of making a Type I error. In
this example, there is a 3% chance that the difference is
not a true difference.
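As a rough illustration (not the caregiver study's data), both kinds of t-test can be run with SciPy; all of the scores below are made up.

    import numpy as np
    from scipy import stats

    # Dependent (paired) samples: one group measured at pretest and posttest
    pretest = np.array([74, 80, 69, 90, 77, 85, 72, 88])
    posttest = np.array([81, 86, 75, 93, 80, 91, 78, 90])
    paired = stats.ttest_rel(pretest, posttest)
    print(paired.statistic, paired.pvalue)  # t and p for the within-group comparison

    # Independent samples: two unrelated groups measured once
    group_1 = np.array([55, 62, 58, 64, 60, 59])
    group_2 = np.array([50, 54, 52, 57, 53, 51])
    independent = stats.ttest_ind(group_1, group_2)
    print(independent.statistic, independent.pvalue)  # t and p for the between-group comparison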
From the Evidence 4-3 presents the results of a t-test
comparing pretest and posttest scores. Three separate
t and p values are provided, indicating that three separate t-tests were conducted, resulting in three different
outcomes of a caregiver training program for individuals
with dementia.
Analysis of Variance
When more than two means are compared, an analysis of
variance (ANOVA) is the appropriate test. With a t-test,
the t statistic is used. With an ANOVA, an F-test is used
TABLE 42 Application of Inferential Statistics That Analyze Differences
Sample Research Designs That
Would Use the Statistic
Statistical Test
Purpose
Independent sample
t-test (also known
as two-sample or
two-group t-test)
• Compare the differences
between two groups
(independent variables) at
one point in time on a single
dependent measure
• Descriptive study comparing two groups
• Efficacy study comparing two groups that
uses a change score (i.e., difference between
the pretest and posttest) as the single
dependent variable
• Efficacy study that uses a posttest only design
Dependent sample
t-test (also known as
paired sample t-test)
• Compare the differences within a group at two time points
on a single dependent measure
• Compare the difference
within a group on two
different dependent measures
• Efficacy study using a single group
pretest-posttest design
• Descriptive study with a single group
comparing the same individuals on two
measures with the same metric
• Developmental study (descriptive) comparing
changes in individuals over two time points
One-way analysis of
variance (ANOVA)
• Compare the differences
between three or more
groups on a single dependent
measure
• Descriptive study comparing three different
groups on a single outcome
Repeated measures
ANOVA
• Compare the differences
within one group at three
or more time points
• Efficacy study using a single-group design
that includes more than two time points
(e.g., pretest, posttest, follow-up)
• Developmental study (descriptive) comparing changes over more than two time points
Mixed model
ANOVA
• Compare both betweengroup and within-group
differences simultaneously,
providing an interaction
effect
• Provide separate results of the
between-group differences
and within-group differences
(i.e., main effects)
• Efficacy study that compares two or more
groups at two or more time points (e.g.,
randomized controlled trial, nonrandomized
controlled trial)
Analysis of covariance
(ANCOVA)
• Compare differences
between and/or within groups
while statistically controlling
(equalizes groups) a variable
(this variable is the covariate)
• Efficacy study that compares two or more
groups and uses the pretest as a covariate
(randomized controlled trial, nonrandomized
controlled trial)
• Descriptive study that compares two
or more groups but controls for an extraneous variable by using that variable as
the covariate
4366_Ch04_059-080.indd 67
FROM THE EVIDENCE 4-3
Comparing Pretest and Posttest Results Using a t-Test
DiZazzo-Miller, R., Samuel, P. S., Barnas, J. M., & Welker, K. M. (2014). Addressing everyday challenges: Feasibility of a family
caregiver training program for people with dementia. American Journal of Occupational Therapy, 68(2), 212–220. doi:10.5014/
ajot.2014.009829.
Note A: You know a t-test was used because a t value is reported.

Activities of Daily Living Knowledge Pre- and Posttest Results

Communication and nutrition
  Pretest: N = 53, M = 73.87, SD = 19.75
  Posttest: N = 53, M = 92.13, SD = 14.49
  t (df) = 7.05 (52), p (2-tailed) = .000**

Transfers and toileting
  Pretest: N = 46, M = 88.02, SD = 11.50
  Posttest: N = 46, M = 94.56, SD = 12.21
  t (df) = 3.10 (45), p (2-tailed) = .003*

Bathing and dressing
  Pretest: N = 45, M = 86.22, SD = 16.42
  Posttest: N = 45, M = 92.89, SD = 11.41
  t (df) = 2.71 (44), p (2-tailed) = .010*

Note. M = mean; SD = standard deviation.
*p < .05. **p < .001.
FTE 4-3 Questions
1. What type of t-test was used in this analysis?
2. What does the N stand for in the table?
to compare means, and the F statistic is used. There are
many variations of an ANOVA, including the one-way
ANOVA, repeated measures ANOVA, and mixed model
ANOVA.
A one-way ANOVA is similar to the independent
sample or between-group t-test; however, instead of
only comparing two groups, three or more groups are
compared at a single point in time. For example, Pegorari, Ruas, and Patrizzi (2013) compared three groups
of elderly individuals—prefrail, frail, and nonfrail—
in terms of respiratory function. Using a one-way
ANOVA, the researchers found differences among the
groups. When more than two comparisons are made,
follow-up tests are necessary to determine where the
differences lie. In this example, three follow-up analyses
would be necessary to compare (1) the prefrail and frail
groups, (2) the prefrail and nonfrail groups, and (3) the
frail and nonfrail groups.
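A minimal sketch (not the Pegorari et al. data) of a one-way ANOVA in SciPy, with invented respiratory-function scores for three groups:

    import numpy as np
    from scipy import stats

    nonfrail = np.array([82, 79, 85, 88, 80, 84])
    prefrail = np.array([75, 72, 78, 74, 76, 73])
    frail = np.array([65, 68, 62, 66, 64, 67])

    f_stat, p_value = stats.f_oneway(nonfrail, prefrail, frail)  # F-test across the three groups
    print(f_stat, p_value)

    # If p < 0.05, follow-up pairwise comparisons (e.g., stats.ttest_ind(nonfrail, prefrail))
    # are needed to determine where the differences lie, usually with a correction for
    # making multiple comparisons.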
The repeated measures ANOVA is similar to a dependent sample or within-group t-test and is used when
the means are compared over more than two time periods
(i.e., the within-group analysis is repeated). For example,
a repeated measures ANOVA may determine if there is
a difference at baseline, at 3 months, and at 6 months.
Follow-up analyses are necessary to examine the difference for each pair: (1) baseline and 3 months, (2) baseline
and 6 months, and (3) 3 months and 6 months. In intervention research, a within- and between-group analysis
is often made simultaneously to determine if there is an
interaction effect (i.e., do the two groups have a pattern
of differences over time?).
A mixed model ANOVA is used when between- and
within-group analyses are conducted simultaneously;
however, in the literature you will often see this referred to
simply as a repeated measures ANOVA, with the betweengroup analysis implied. In these analyses, two or more
groups are compared over two or more time points. In a
mixed model ANOVA, it is possible to examine both main
effects and interaction effects. One main effect would
look at time differences and determine if there is a difference from pretest to posttest with both groups combined.
However, when trying to determine if an intervention is
effective, it is typically most important to determine if
there is an interaction effect. The interaction tells you
if there is a difference in how the two groups perform
over time.
When there is no interaction, this means that the groups
in the study perform similarly. The graph in Figure 4-5
shows that both groups improved from pretest to posttest,
but there was no difference in how much they improved.
When there is no interaction, the mixed model ANOVA
calculated with the means and standard deviations of the
two groups at both time points would indicate an interaction effect of p > 0.05.
When an interaction effect exists (p < 0.05), there will
be a difference in how the groups perform over time.
FIGURE 4-5 Graph showing no interaction (p > 0.05); both groups improved from pretest to posttest, but there was not a difference in how much they improved.
The graph in Figure 4-6 illustrates an interaction effect.
Both groups improved, but the intervention group
improved to a much greater extent. When there is an
interaction, the mixed model ANOVA calculated with
the means and standard deviations of the two groups at
both time points would indicate an interaction effect of
p < 0.05.
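A simple way to eyeball a possible interaction is to plot the group means at each time point; diverging lines suggest an interaction, roughly as in Figure 4-6. The sketch below (not from the text, with made-up means) only draws the picture; it does not perform the ANOVA itself.

    import matplotlib.pyplot as plt

    time_points = ["Pretest", "Posttest"]
    intervention_means = [10, 22]  # hypothetical: improves a great deal
    control_means = [10, 13]       # hypothetical: improves only slightly

    plt.plot(time_points, intervention_means, marker="o", label="Intervention")
    plt.plot(time_points, control_means, marker="o", label="Control")
    plt.ylabel("Outcome score")
    plt.legend()
    plt.title("Diverging lines suggest a group-by-time interaction")
    plt.show()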
From the Evidence 4-4 provides an example of a
mixed model ANOVA. In this study, physical fitness
was compared in children at 9 years of age and again at
12 years of age (Haga, 2009). The children were also
grouped into a low motor competence (LMC) group and
a high motor competence (HMC) group. Several physical
fitness tests were performed. The results were similar for
most of the fitness tests. The following example provides
the results from the standing broad jump.
FIGURE 4-6 Graph showing interaction effect (p < 0.05); both groups improved, but the intervention group improved to a much greater extent.
70
CHAPTER 4 ● Understanding Statistics
FROM THE EVIDENCE 4-4
Main and Interaction Effect
Haga, M. (2009). Physical fitness in children with high motor competence is different from that in children with low motor
competence. Physical Therapy, 89, 1089–1097. doi:10.2522/ptj.20090052.
Note A: Effect size statistics
described at the end of
the chapter provide an
estimate of the magnitude
of the difference.
Note B: In the statistical
analysis, the subscript with
the F (in this case the
numbers 1 and 15) indicates
the degrees of freedom.
Standing Broad Jump
A significant main effect was obtained for time (F1,15 = 7.707,
p < .05), with a moderate effect size (partial η2 = .339). A significant
main effect also was obtained for group (F1,15 = 12.700, p < .05),
with a moderate effect size (partial η2 = .458). There was no
significant interaction effect (F1,15 = 0.135, p > .05; partial η2 = .009).
The article also provides the following descriptive statistics for the
standing broad jump. At 9 years of age, the LMC group had a mean
score of 1.18 (0.20), and the HMC had a mean score of 1.50 (0.21).
At 12 years, the LMC score was 1.30 (0.34), and the HMC score
was 1.69 (1.50).
Note C: The main effect for time
indicates that children in both
groups improved from ages 9 to
12. The main effect for group
indicates that there were
significant differences between
the groups at ages 9 and 12.
FTE 4-4 Question
How would you interpret the interaction effect?
Analysis of Covariance
An analysis of covariance (ANCOVA) is used when the researcher wants to statistically control a variable that may affect the outcome of a study. An ANCOVA is useful when a demographic variable differs between groups and when that demographic variable is related to the outcome variable (dependent variable). For example, a study of cognitive remediation has one group with older participants; in this study, age is related to (i.e., correlated with) the outcome variable of memory. In this case, age may be covaried in the analysis. In doing so, variance or error associated with age is removed, and the researcher is more likely to find a difference between the intervention and control groups.

Some analyses use the baseline scores as the covariate. In this instance, the analysis is similar to a repeated measures ANOVA; however, instead of comparing individuals before and after an intervention, the error or difference at baseline is equalized. The baseline (pretest) is used as the covariate and then only the posttest scores are compared. For example, in a study of kinesiotaping for shoulder impingement, an ANCOVA was used to control for pretesting; researchers found that the change in pain level was greater for the experimental group than for the control group (Shakeri, Keshavarz, Arab, & Ebrahimi, 2013).
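A rough sketch (not the kinesiotaping study's analysis) of an ANCOVA-style model that uses the pretest as the covariate, fit with the statsmodels formula interface; the data frame, column names, and values are hypothetical.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    data = pd.DataFrame({
        "group": ["intervention"] * 5 + ["control"] * 5,
        "pretest": [42, 38, 45, 40, 44, 41, 39, 43, 40, 42],
        "posttest": [55, 50, 58, 52, 57, 46, 43, 48, 45, 47],
    })

    # Posttest scores are compared between groups while adjusting for the pretest (covariate)
    model = smf.ols("posttest ~ C(group) + pretest", data=data).fit()
    print(sm.stats.anova_lm(model, typ=2))  # group effect with pretest variance removed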
Multiple statistical tests are often used in a single study.
For example, three different independent sample t-tests may
be used, each with a different outcome measure. Or, a one-way ANOVA may be used initially to compare three groups,
followed by three separate independent sample t-tests to
compare each pair of groups (1 and 2, 2 and 3, and 1 and 3).
EXERCISE 4-2
Describing the Results From a Study (LO2 and LO3)

The graphs in Figures 4-7 through 4-11, which present data that can be analyzed with difference statistics, come from Hall et al's 1999 study, "Characteristics of the Functional Independence Measure in Traumatic Spinal Cord Injury." In all cases, the dependent variable is motor scores from the Functional Independence Measure. For each graph, write a sentence to describe which differences are compared. Consider whether or not the comparison involves a difference between groups, a difference in time points, or a combination.

QUESTIONS
1. FIGURE 4-7 Independent sample t-test (two groups, one time point): C6 and C8 spinal cord injury groups at admission.
2. FIGURE 4-8 Dependent sample t-test (one group, two time points): C6 spinal cord injury group at admission and discharge.
3. FIGURE 4-9 One-way ANOVA (four groups, one time point): C4, C6, C8, and thoracic spinal cord injury groups at admission.
4. FIGURE 4-10 Repeated measures ANOVA (one group, four time points): C6 spinal cord injury group at admission, discharge, 1 year post, and 2 years post.
5. FIGURE 4-11 Mixed model ANOVA (three groups, four time points): C4, C6, and C8 spinal cord injury groups at admission, discharge, 1 year post, and 2 years post.
Inferential Statistics for Analyzing
Relationships
Statistics for analyzing relationships are based on correlations, or the degree to which two or more variables
fluctuate together. The different types of correlational
statistics are described here. Correlations are described
in terms of strength, direction, and significance. Correlations range in strength from 0 to 1.0 and are reported as
r values (e.g., r = 0.35).
The direction of a correlation can be either positive or
negative. A positive correlation means that two variables
are related in the same direction; as one variable increases,
so does the other variable. For example, speed and distance
are generally positively correlated, meaning that the faster
you are, the more distance you can cover. In contrast,
speed and time are negatively correlated; the faster you
are, the less time it takes to get to a destination. As with
other inferential statistics, a p value will indicate whether
the strength of the relationship is statistically significant.
Returning to the speed and distance example, if all
other variables were held constant, these examples would
be perfectly correlated at r = 1.0 and r = –1.0, respectively.
In other words, you can perfectly predict one variable if
you know the other. For example, if time is held constant
at one half-hour, and someone runs at 6 miles per hour,
the distance covered would be 3 miles. If another person
runs 3.4 miles in the same time period, that person’s speed
is 6.8 miles per hour. Similarly, if distance is held constant
at 3 miles and someone runs 4 miles per hour, it will take
that person 45 minutes to run 3 miles. If another individual can run 6 miles per hour, that individual will cover
3 miles in only 30 minutes.
However, in most rehabilitation research, the relationships being examined are not so simple. Two
variables may be related, but that relationship is less
than perfect, and some of the variance is unaccounted
for. For example, Kim et al (Kim, Kim, & Kim, 2014)
examined the relationship between activities of daily
living and quality of life for individuals after stroke.
When correlating the Functional Independence Measure (FIM) and the Stroke Specific Quality of Life
(SS-QOL) measure, they found a strong positive correlation of r = 0.80, which was statistically significant
(p < 0.01). Total variance accounted for is simple to calculate: the shared variance equals the correlation squared (in this case, 0.80² = 0.64). The variance accounted for is always smaller than the correlation. These findings suggest that functional independence accounts for 64% of the variance in quality of life, and 36% of the variance in quality of life is related to other unknown factors. A simple Venn diagram helps explain the meaning of the correlation (r). The overlapping area is the amount of
variance (64%) that is shared between the two variables.
The remaining variance, 36%, remains unaccounted for
and is due to other factors (see Fig. 4-12).
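A brief sketch (not the Kim et al. data) of computing a Pearson correlation and the variance accounted for with SciPy; the paired scores below are invented.

    import numpy as np
    from scipy import stats

    fim_scores = np.array([45, 60, 72, 80, 95, 110, 118, 124])       # hypothetical FIM totals
    qol_scores = np.array([120, 135, 150, 160, 180, 200, 210, 215])  # hypothetical SS-QOL totals

    r, p = stats.pearsonr(fim_scores, qol_scores)
    print(r, p)    # strength/direction of the relationship and its significance
    print(r ** 2)  # variance accounted for (the correlation squared)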
Scatterplots for Graphing Relationships
Graphically, the relationship between two variables can be
illustrated using a scatterplot, which is a graph of plotted points that shows the relationship between two sets
of data. You can visually examine a scatterplot to determine if a relationship is weak or strong, and whether the
relationship is positive or negative. Using the preceding
example, each person’s score is plotted as a point based
on his or her scores on the two measures. Figure 4-13
illustrates different scatterplots.
FIGURE 4-12 Correlations and variance (relationship statistics): overlapping circles for functional independence and quality of life, with shared variance (64%) and variance unaccounted for (36%).
FIGURE 4-13 Examples of scatterplot diagrams: A. Positive relationship (r = +1.0). B. No relationship (r = 0.0). C. Negative relationship (r = –1.0). D. Positive relationship (r ≈ +0.6). E. Negative relationship (r ≈ –0.6).

Relationships Between Two Variables
The simplest correlations ascertain the relationship between two variables. When two continuous variables are
correlated (e.g., speed and distance, age and grip strength,
memory and attention), the statistic most commonly used
is the Pearson product-moment correlation, which is
an inferential statistic that examines the strength of the
relationship between two continuous variables. A similar statistic is the Spearman correlation, an inferential
statistic that examines the strength of the relationship
between two variables when one or both of the variables
is rank-ordered (e.g., burn degree). The results of correlations are often displayed in a table called a correlation
matrix; From the Evidence 4-5 provides an example. In
FROM THE EVIDENCE 4-5
Correlation Matrix Examining Relationships Between Variables and Fall Efficacy
Jeon, B. J. (2013). The effect of obesity on fall efficacy in elderly people. Journal of Physical Therapy Science, 25(11), 1485–1489.
doi:10.1589/jpts.25.1485.
Note A: The strongest correlation
is between BMI and VFA.
Correlations among the variables (N = 351)

                 BMI       VFA       Pain      Mobility  Balance   Age      Gender
VFA              0.75**
Pain             0.03      0.06
Mobility         –0.01     0.02      0.30***
Balance          –0.08     –0.20**   –0.23***  –0.26***
Age              –0.09     0.14*     0.24***   0.38***   –0.31***
Gender           –0.30     –0.12*    –0.23***  –0.20***  0.16**    0.06
Falls efficacy   –0.16*    –0.12     –0.40***  –0.41***  0.25**    –0.23**  0.18*

*p < 0.05; **p < 0.01; ***p < 0.001
VFA = visceral fat area
Note B: Many correlations
are negative.
FTE 4-5 Question Why is the relationship between BMI and VFA so much stronger than any of the other correlations?
this study by Jeon (2013), the relationship between body
mass index and several other variables was calculated.
Relationship Analyses With One Outcome
and Multiple Predictors
Predicting outcomes is a major area of study in rehabilitation research. Researchers often want to know what factors have the greatest impact on a particular outcome. In
this case, multiple variables may be entered as predictors
(sometimes referred to as independent variables) to determine what factors carry the most weight. The weight of
each variable indicates the extent to which each variable
is useful in predicting the outcome.
Studies to predict outcomes use regression equations.
A regression equation calculates the extent to which two
or more variables predict a particular outcome. In linear regression, several predictors are entered into a regression equation to determine how well they predict an
outcome of interest (sometimes referred to as a criterion).
In linear regression, typically the outcome variable is continuous, such as scores on the Functional Independence
Measure (FIM), degrees of range of motion, and scores on
an assessment of social skills. The results are described in
terms of the total contribution of all of the predictors, as
well as the unique contribution of individual predictors.
For example, a linear regression may be used to predict
the outcome of grades in a course. Predictors might include interest in the topic, number of hours studied, and
previous research courses taken. The linear regression
will indicate the strength of these predictors taken together in determining the outcome (i.e., do they account
for a large proportion of the variance?) and how much
each predictor uniquely contributes to the outcome. If
the number of hours studied is the strongest predictor
of outcome, you would recommend that students study
more to do better in the class. In contrast, if the strongest
predictor is previous research courses taken, perhaps an
additional course should be required as a prerequisite.
An important contribution of linear regression is that it
allows for the removal of shared variance among predictors so that you can determine which predictors are most
important. For example, a researcher may find that upper-body strength and range of motion together contribute
30% of the variance of FIM scores in individuals with
stroke; however, strength only contributes 5% uniquely,
and range of motion only contributes 7% uniquely. This
indicates that strength and range of motion are highly
intercorrelated. Perhaps the researcher adds cognition to
the predictors and finds that this predictor contributes
15% of unique variance to the equation. As a therapist,
you would then appreciate that both combined motor
abilities and cognition are important predictors of function, suggesting that you should work on both motor
skills and cognition to improve FIM scores.
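The logic can be sketched in a few lines of Python. This is a minimal illustration with made-up course data and assumes the pandas and statsmodels libraries; the variable names (grade, interest, hours_studied, prior_courses) are hypothetical. The full model's R² is the combined contribution of the predictors, and the drop in R² when one predictor is removed approximates that predictor's unique contribution.

```python
# A minimal sketch (hypothetical data) of a linear regression with several
# predictors of a continuous outcome. The model R-squared is the combined
# contribution; refitting without one predictor shows its unique contribution
# (the change in R-squared), analogous to a squared semipartial correlation.
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "grade":         [72, 78, 85, 90, 65, 88, 95, 70, 82, 91],
    "interest":      [3, 4, 4, 5, 2, 5, 5, 3, 4, 5],
    "hours_studied": [5, 8, 10, 12, 3, 11, 14, 4, 9, 13],
    "prior_courses": [0, 1, 1, 2, 0, 2, 3, 1, 1, 2],
})

predictors = ["interest", "hours_studied", "prior_courses"]
full_model = sm.OLS(data["grade"], sm.add_constant(data[predictors])).fit()
print(f"Combined R-squared = {full_model.rsquared:.2f}")

# Unique contribution of each predictor = drop in R-squared when it is removed.
for predictor in predictors:
    reduced = [p for p in predictors if p != predictor]
    reduced_model = sm.OLS(data["grade"], sm.add_constant(data[reduced])).fit()
    unique = full_model.rsquared - reduced_model.rsquared
    print(f"{predictor}: unique R-squared = {unique:.2f}")
```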
The challenge of linear regression lies in choosing
predictors. The amount of total variance accounted for
provides a clue as to how effective the researcher was in
the selection of predictors. When much of the variance
is left unaccounted for, you can only speculate as to what
those unknown predictors might be. In the preceding example, perhaps it is not the skills of the individual (motor
and cognition), but something outside of the person that
is a better predictor of FIM outcomes, such as amount of
social support or the type of rehabilitation facility that is
providing services.
Ambrose, Fey, and Eisenberg (2012) studied phonological awareness and print knowledge in children with
cochlear implants. Phonological awareness is the ability
to break words down into component syllables and
sounds and to manipulate these abstract components,
such as by building them back into words. These are
essential skills for learning to read words. Print awareness involves recognition of letters, the sounds they
produce, and their names. Ambrose et al conducted a
linear regression analysis to determine if these skills
were related to speech production and language abilities. A separate regression equation was computed for
the two different criteria of phonological awareness
and print knowledge. In other words, the researchers
wanted to determine how much of the variability in
phonological awareness and print knowledge skills of
preschoolers with cochlear implants could be predicted from their ability to speak intelligibly and in grammatical sentences. Language composite, speech
production, and speech perception were entered as
predictors. The table and the narrative in From the
Evidence 4-6 describe the contribution of the three
predictors taken together, as well as the individual contribution of each predictor.
Logistic Regression and Odds Ratios
Logistic regression is a statistical method for analyzing a
dataset in which the outcome is measured with a dichotomous variable (i.e., there are only two possible outcomes),
such as employed vs. not employed or experienced a fall
vs. did not fall. In this case, the study examines what predicts the dichotomous outcome, and the number that is
reported is an odds ratio (an estimate of the odds when
the presence or absence of one variable is associated with
the presence or absence of another variable).
When a single predictor is used that is also dichotomous, the odds ratio calculation is a simple one that uses a 2 × 2 table and the formula OR = AD/BC. In a hypothetical study that compares supported and transitional employment and has 50 individuals in each group, the 2 × 2 table looks like that shown in Table 4-3.
The odds ratio is then calculated as OR = (40 × 25)/(10 × 25) = 1000/250 = 4.0.
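The same arithmetic can be written as a short sketch; the odds_ratio helper below is only an illustrative name, and the counts come from the hypothetical 2 × 2 table above.

```python
# A minimal sketch of the 2 x 2 odds ratio calculation, using the counts from
# the hypothetical employment example (A = 40, B = 10, C = 25, D = 25).
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = AD / BC for a 2 x 2 table of group membership by outcome."""
    return (a * d) / (b * c)

print(odds_ratio(40, 10, 25, 25))  # 4.0: the odds of employment are 4 times greater
```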
FROM THE EVIDENCE 4-6
Linear Regression Table
Ambrose, S. E., Fey, M. E., & Eisenberg, L. S. (2012). Phonological awareness and print knowledge of preschool children with cochlear
implants. Journal of Speech, Language, and Hearing Research, 55(3), 811–823. doi:10.1044/1092-4388(2011/11-0086).
Note A: The total contribution of all three variables and of each individual predictor is expressed in terms of variance (R²). As expected, the combined variance is greater than the contribution of each individual predictor, which in this case is very small.

Regression 1: Phonological awareness (R² = .34)
Predictor            β       t       p      sr²
Language composite   .324    1.06    .301   .04
Speech production    .066    0.23    .820   <.01
Speech perception    .287    1.33    .197   .06

Regression 2: Print knowledge (R² = .23)
Predictor            β       t       p      sr²
Language composite   .243    0.74    .468   .02
Speech production    .271    0.88    .390   .03
Speech perception    –.006   –0.03   .978   <.01
Note. For both regressions, all predictor variables were entered simultaneously in
one step. sr² is the squared semi-partial correlation coefficient, which represents
the percentage of total variance in the criterion variable accounted for by each
individual predictor variable with the other variables controlled.
For the first regression, TOPEL Phonological Awareness was entered as the
criterion variable and the language composite, speech production, and speech
perception variables were entered simultaneously as predictors. This model was
significant (F(3, 20) = 3.43, p = .037), with the predictor variables accounting for
34% of the variance in the CI group’s phonological awareness abilities. None of the
predictor variables contributed unique variance to phonological awareness after
accounting for the variance that was shared with the other predictors.
A second regression was conducted with TOPEL Print Knowledge entered as the
criterion variable. Again, the language composite, speech production, and speech
perception variables were entered simultaneously as predictors. Although language
and speech variables correlated significantly with TOPEL scores on their own, this
model combining all three variables was not significant, F(3, 20) = 2.00, p = .146.
FTE 4-6 Question In the phonological awareness linear regression, how can the total model accounting for 34% of
the variance be statistically significant if none of the individual predictors contributes a unique amount of variance?
TABLE 43 Example of a Hypothetical Odds Ratio
Type of
Employment
Services
Employed
Not Employed
Supported
A
40
B
10
Transitional
C
25
D
25
Odds ratios are interpreted as the odds that a member of a group will have a particular outcome compared with the odds that a member of another group will have that outcome. In this example, the odds of being employed are four times greater for individuals in supported employment than for individuals in transitional employment. With an odds ratio, a value of 1 indicates that the odds are equal or no different for either condition. Odds greater than 1 indicate an increased chance, and odds less than 1 indicate a decreased chance of the outcome of interest.
In logistic regression, many predictors can be examined
as potentially affecting the outcome. Odds ratios are calculated for each predictor to determine which are most impactful. From the Evidence 4-7 shows a table from a study
examining children with physical disabilities that attempted
to identify predictors associated with whether the child received postsecondary education (Bjornson et al, 2011).
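A minimal sketch of this kind of analysis is shown below, assuming made-up data and the statsmodels library (not the Bjornson et al. dataset). Exponentiating each logistic regression coefficient gives the odds ratio for that predictor, and exponentiating the confidence limits of the coefficient gives the 95% confidence interval around the odds ratio.

```python
# A minimal sketch (hypothetical data) of a logistic regression with a
# dichotomous outcome (employed = 1, not employed = 0). Exponentiating each
# coefficient gives the odds ratio for that predictor; with a sample this tiny
# the estimates are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "employed":        [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "supported_emp":   [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0],   # 1 = supported, 0 = transitional
    "years_education": [12, 13, 14, 12, 11, 16, 10, 13, 14, 15, 12, 11],
})

predictors = sm.add_constant(data[["supported_emp", "years_education"]])
model = sm.Logit(data["employed"], predictors).fit(disp=False)

odds_ratios = np.exp(model.params)       # odds ratio for each predictor
conf_int = np.exp(model.conf_int())      # 95% confidence intervals for the odds ratios
print(odds_ratios)
print(conf_int)
```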
The different relationship statistics described in this
chapter are presented in Table 4-4.
EFFECT SIZE AND CONFIDENCE
INTERVALS
In addition to statistical significance, other statistics
such as effect size and confidence intervals are useful in
interpreting the results of a study and determining the
importance of the findings in clinical practice or decision-making. Effect size (ES) describes the magnitude or
strength of a statistic (i.e., the magnitude of the difference or the relationship). Confidence interval (CI) is a
reliability estimate that suggests the range of outcomes
expected when an analysis is repeated.
It is important to understand that statistical significance is not the same thing as clinical significance. Statistical significance is highly dependent on sample size;
you are likely to find a difference between two groups
and/or before and after an intervention when you have a
large sample size, even if that difference is very small. If
a study found a statistically significant improvement on
the Functional Independence Measure (FIM), but that
improvement was only a mean of 1.5 points, a therapist
familiar with the measure would recognize that 1.5 points
is not clinically relevant.
Another way to evaluate the magnitude of a difference
is to look at its effect size. Not all researchers report the
effect size, but it is becoming more valued as an important
statistic, with some researchers asserting that it is more
important than significance testing. A common effect
size statistic that is easy to understand is Cohen’s d. This
effect size measures the difference between two group
means reported in standard deviation units. The overall
standard deviation is calculated by pooling the standard
deviations from each group. For example, a comparison
of pretest to posttest difference with an effect size of d =
0.5 means the group changed one-half of a standard deviation. Similarly, if an effect size for an intervention and
control group was reported as 1.0, this difference could
be interpreted as the intervention group improving one
standard deviation more than the control group.
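The calculation itself is straightforward; the sketch below uses hypothetical posttest scores and the standard pooled standard deviation formula.

```python
# A minimal sketch (hypothetical data) of Cohen's d: the difference between two
# group means expressed in pooled standard deviation units.
import numpy as np

def cohens_d(group1, group2) -> float:
    g1, g2 = np.asarray(group1, dtype=float), np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    pooled_sd = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2))
    return (g1.mean() - g2.mean()) / pooled_sd

intervention = [68, 72, 75, 80, 77, 74, 82, 79]  # hypothetical posttest scores
control = [65, 70, 69, 73, 71, 68, 74, 72]       # hypothetical posttest scores
print(f"d = {cohens_d(intervention, control):.2f}")
```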
Cohen (1988), who developed this statistic, indicates
that 0.2 to 0.5 is a small effect, 0.5 to 0.8 is a medium
effect, and anything greater than 0.8 is a large effect. Because Cohen’s d is expressed in terms of standard deviations, the d value can be greater than 1. Other effect sizes
that you are likely to encounter are eta squared (η2) and
omega squared (ω2). Usually, the author will provide an
interpretation of the magnitude of the effect. In relationship studies, the r value is the effect size, as it expresses
the strength of the relationship. If r = 0.87, this would indicate a large effect and a strong relationship, whereas r =
0.20 is a small effect and a weak relationship.
Another statistic that is useful in interpreting the
meaningfulness of inferential statistics is a confidence
interval. Whenever a statistic is calculated, such as a t- or
F-statistic, an odds ratio, or an effect size, the result is
an imperfect estimate that contains error. The potential
impact of this error can be expressed by calculating the
confidence intervals of the statistic. Remember that with
inferential statistics, the calculated mean from the sample
is an estimate of the true value of the population, whereas
a confidence interval is the range of values estimated for
the population. A 95% confidence interval, which is most
commonly reported in research studies, suggests that you
can be 95% confident that the true mean of the population exists between those two values. When reported with an odds ratio, the 95% confidence interval tells you the range within which you would expect the true population odds to fall.
As such, a smaller confidence interval is better than a large
confidence interval because clinicians can more reliably
predict an outcome when the range of scores is smaller.
The confidence interval may also be interpreted as the
range of scores you would expect to find if you were to
conduct the study again.
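As a simple illustration, the sketch below computes a 95% confidence interval for a mean from hypothetical scores, using the t distribution from the scipy library.

```python
# A minimal sketch (hypothetical data) of a 95% confidence interval for a mean:
# the range within which you can be 95% confident the true population mean falls.
import numpy as np
from scipy import stats

scores = np.array([52, 47, 60, 55, 49, 58, 62, 50, 57, 53])  # hypothetical scores
mean = scores.mean()
sem = stats.sem(scores)                                       # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, len(scores) - 1, loc=mean, scale=sem)

print(f"Mean = {mean:.1f}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```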
FROM THE EVIDENCE 4-7
Predictors of Postsecondary Education for Children With Physical Disabilities
Bjornson, K., Kobayashi, A., Zhou, C., & Walker, W. (2011). Relationship of therapy to postsecondary education and employment in
young adults with physical disabilities. Pediatric Physical Therapy, 23(2), 179–186. doi:10.1097/PEP.0b013e318218f110.
Note A: The ability to use both hands is
the strongest predictor. Children able to use
both hands were 4.8 times more likely to obtain
postsecondary education. Receiving OT or PT
services was another strong predictor, whereas
the use of expressive language was not a
strong predictor.
Table 2. A Logistic Regression Model Describing the Relationship Between Therapy Services During Secondary Education Years and Report of Postsecondary Education Among Participants in the National Longitudinal Transition Study 2 Who Have Physical Disabilities(a)

Predictor of Interest                                           OR (95% CI)(b)            P
Intercept                                                       0.021 (3.1E–6–144.1)      .386
Has received any OT or PT                                       3.2 (1.133–9.151)         .029
Gender                                                          0.5 (0.198–1.286)         .149
Age                                                             0.8 (0.542–1.21)          .297
Is able to use both arms and hands normally for holding
  things like a pencil or spoon                                 4.8 (1.863–12.174)        .001
Has increased self-care skills                                  1.3 (1.065–1.705)         .014
Higher parental education                                       2.1 (1.494–3.029)         <.001
Has increased expressive language                               1.1 (0.608–1.93)          .783
Uses a wheelchair for mobility                                  1.4 (0.599–3.14)          .449
Has any health insurance                                        1.4 (0.309–6.178)         .667
Child has a high school diploma or GED                          3.1 (1.157–8.328)         .025
Has increased overall social skills                             1.2 (1.075–1.329)         .001

Abbreviations: CI, confidence interval; GED, General Educational Development test; OR, odds ratio; OT, occupational therapy; PT, physical therapy.
(a) Values in bold represent factors that are significantly associated with the outcomes of interest.
(b) Odds ratios are the odds of exposure (eg, to PT or OT) to each predictor among people who did develop the outcome (eg, who did receive postsecondary education) divided by the odds of exposure among people who did not develop the outcome. The odds ratio expresses the likelihood that an exposure of interest is related to the outcome. The 95% confidence interval (CI) expresses the precision of this estimate.
FTE 4-7 Question How would you interpret the predictor “child has a high school diploma or GED”?
TABLE 44 Application of Relationship Statistics
Statistic
Purpose
Sample Research Designs
That Would Use the Statistic
Pearson productmoment correlation
Describe the relationship between two
continuous variables.
Nonexperimental relationship
studies
Spearman correlation
Describe the relationship between two variables
when at least one variable is rank-ordered.
Nonexperimental relationship
studies
Linear regression
Identify predictors of a continuous outcome
measure.
Nonexperimental predictive
studies
Logistic regression
Identify predictors of a categorical outcome measure.
Nonexperimental predictive
studies
EXERCISE 4-3
Interpreting Confidence Intervals (LO3)
Although the odds ratio for use of both hands and
postsecondary education is very strong at 4.8, the
95% confidence interval is also large, ranging from
1.86 to 12.17. How might the confidence interval
affect your interpretation of the findings?
EXERCISE 4-4
Matching the Statistics with the Research Question (LO4)
QUESTIONS
What statistic would likely be used in studies designed to answer the following research questions?
1. What is the difference in memory impairment for individuals with early Alzheimer's disease and same-aged adult peers?
2. Which is a better predictor of earnings for individuals with schizophrenia: cognition, symptoms, or educational background?
3. Is mirror therapy or acoustic feedback more effective in reducing neglect in individuals with stroke?

CRITICAL THINKING QUESTIONS
1. Why is it more difficult to find a significant difference between groups when the standard deviations are large?
2. Explain the difference between an independent sample and a dependent sample t-test.
3. What types of research results would be analyzed using the different types of ANOVAs?
4. What are some examples from rehabilitation research in which you would expect to find negative correlations?
5. What information can a regression equation provide (both linear and logistic) that goes beyond the information you get from a simple two-variable correlation?
6. Why are effect sizes and confidence intervals sometimes preferred over statistical significance when it comes to making clinical interpretations of a study?

ANSWERS
EXERCISE 4-1
1. The mean will show the difference in height for the two groups (the basketball players being a taller group). However, the standard deviation will be important to illustrate the variability of the samples, with the basketball players as a group having less variability because they will tend to all be tall, while the mixed group of basketball players, jockeys, and baseball players will include shorter individuals, thus increasing variability.
2. The median will be a more useful statistic to describe length of stay if there are a few outliers that would skew the mean.
3. A frequency distribution provides a visual display that illustrates the variability of the sample and also provides a general perspective as to the central tendencies of the distribution.

EXERCISE 4-2
There may be some differences in the interpretations of the graphs, but in general the conclusions should be similar.
1. At admission, individuals with C8 SCI have higher FIM scores than individuals with C6 SCI.
2. Individuals with C6 SCI more than double their FIM scores from admission to discharge.
3. The lower the SCI, the greater the FIM score at admission. There is a particularly marked difference between C6 and C8 and between C8 and Thoracic, with admission scores for C4 and C6 being more similar.
4. The greatest improvement in FIM scores for C6 SCI occurs from admission to discharge, but there is continuous improvement through 2 years after discharge.
5. Individuals with C4, C6, and C8 SCI improve over time, but there are some differences in the patterns of improvement. For example, those with C8 SCI make the greatest amount of improvement, and most of their improvement occurs between admission and discharge and then levels off. Patients with C4 SCI have a slight decrease in FIM scores from 1 year to 2 years after discharge.

EXERCISE 4-3
The large confidence interval suggests that the results could be unstable or imprecise and that, if a second study were conducted, you might get very different results. Although the odds ratio may be even larger, the low end at 1.86 suggests that this predictor may not actually be so important. These results speak to the importance of replication and why another study would be useful to determine if the results from the first study hold up.

EXERCISE 4-4
1. Independent sample t-test, because you are comparing two groups of individuals at one point in time.
2. Multiple linear regression, because you are looking at several predictors for a continuous outcome measure. If the outcome was a dichotomous variable of employment, you could analyze the results using logistic regression.
3. Mixed model ANOVA, because you would want to know the interaction effect of the between-group comparison of groups across time from baseline to posttesting.
FROM THE EVIDENCE 4-1
At pretest, one child had 60% ability; at posttest, 5 children had 60% ability.

FROM THE EVIDENCE 4-2
The standard deviations for the Parkinson's disease and multiple sclerosis groups extend to almost the same point as the control group's. Although the means differ between the groups, this indicates that many individuals in the PD and MS groups will perform similarly to the control group.

FROM THE EVIDENCE 4-3
1. It must be a dependent sample t-test because the pretest and posttest scores are compared. This study utilized a one-group pretest-posttest design.
2. The N stands for the number of participants in the study. The N varies because there were different numbers of individuals for whom data were available for the different tests.

FROM THE EVIDENCE 4-4
Although the high-competence group had better scores than the low-competence group at both time points, there was a similar pattern of scores, in that both groups improved at a comparable rate from 9 years of age to age 12.

FROM THE EVIDENCE 4-5
Logically, you would expect a person's body mass index (BMI) and the amount of fat on that individual to be closely related; the other variables, such as BMI and balance, may be related, but you would expect there to be many other variables that remain unaccounted for, such as level of physical activity, medications taken, or comorbid conditions associated with balance problems.

FROM THE EVIDENCE 4-6
You could interpret this finding to mean that language composite, speech production, and speech perception are all important for phonological awareness, but that these abilities are highly intercorrelated and are not distinct in terms of their ability to explain phonological awareness.

FROM THE EVIDENCE 4-7
If the child has a high school education or GED, he or she is more than three times as likely to receive postsecondary education. This is understandable, as one typically must finish high school before going on to postsecondary education. However, sometimes we want the research to verify what we assume is apparent.
REFERENCES
Ambrose, S. E., Fey, M. E., & Eisenberg, L. S. (2012). Phonological
awareness and print knowledge of preschool children with cochlear
implants. Journal of Speech Language and Hearing Research, 55, 811–823.
Autism Speaks. (n.d.). Frequently asked questions. Retrieved from http://
www.autismspeaks.org/what-autism/faq
Bjornson, K., Kobayashi, A., Zhou, C., & Walker, W. (2011). Relationship of therapy to postsecondary education and employment in
young adults with physical disabilities. Pediatric Physical Therapy, 23,
179–186.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.).
Hillsdale, NJ: Lawrence Erlbaum Associates.
DiZazzo-Miller, R., Samuel, P. S., Barnas, J. M., & Welker, K. M.
(2014). Addressing everyday challenges: Feasibility of a family caregiver training program for people with dementia. American Journal of
Occupational Therapy, 68(2), 212–220. doi:10.5014/ajot.2014.009829
Haga, M. (2009). Physical fitness in children with high motor competence is different from that in children with low motor competence.
Physical Therapy, 89, 1089–1097.
Hall, K. M., Cohen, M. E., Wright, J., Call, M., & Werner, P. (1999).
Characteristics of the Functional Independence Measure in traumatic spinal cord injury. Archives of Physical Medicine and Rehabilitation, 80, 1471–1476.
Jeon, B. J. (2013). The effect of obesity on fall efficacy in elderly people.
Journal of Physical Therapy Science, 25(11), 1485–1489.
Kim, K., Kim, Y. M., & Kim, E. K. (2014). Correlation between the
activities of daily living of stroke patients in a community setting and
their quality of life. Journal of Physical Therapy Science, 26, 417–419.
Pegorari, M. S., Ruas, G., & Patrizzi, L. J. (2013). Relationship between
frailty and respiratory function in the community dwelling elderly.
Brazilian Journal of Physical Therapy, 17, 9–16.
Shakeri, H., Keshavarz, R., Arab, A. M., & Ebrahimi, I. (2013). Clinical
effectiveness of kinesiological taping on pain and pain-free shoulder
range of motion in patients with shoulder impingement syndrome:
A randomized, double-blinded, placebo-controlled trial. International Journal of Sports and Physical Therapy, 3, 800–810.
Tjaden, K., & Wilding, G. (2011). Speech and pause characteristics
associated with voluntary rate reduction in Parkinson’s disease and
multiple sclerosis. Journal of Communication Disorders, 44, 655–665.
Watson, A. H., Ito, M., Smith, R. O., & Andersen, L. T. (2010). Effect
of assistive technology in a public school setting. American Journal of
Occupational Therapy, 64, 18–29.
5
Validity
What Makes a Study Strong?
“The world will not stop and think—it never does, it is not its way; its way is to generalize from a single sample.”
—Mark Twain
CHAPTER OUTLINE
LEARNING OUTCOMES
KEY TERMS
INTRODUCTION
VALIDITY
STATISTICAL CONCLUSION VALIDITY
  Threats to Statistical Conclusion Validity
    Fishing
    Low Power
INTERNAL VALIDITY
  Threats to Internal Validity
    Assignment and Selection Threats
    Maturation Threats
    History Threats
    Regression to the Mean Threats
    Testing Threats
    Instrumentation Threats
    Experimenter and Participant Bias Threats
    Attrition/Mortality Threats
EXTERNAL VALIDITY
  Threats to External Validity
    Sampling Error
    Ecological Validity Threats
INTERNAL VERSUS EXTERNAL VALIDITY
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES
LEARNING OUTCOMES
1. Detect potential threats to statistical conclusion validity in published research.
2. In a given study, determine if the researcher adequately managed potential threats to statistical conclusion
validity.
3. Detect potential threats to internal validity in published research.
4. In a given study, determine if the researcher adequately managed potential threats to internal validity.
5. Detect potential threats to external validity in published research.
6. In a given study, determine if the researcher adequately managed potential threats to external validity.
81
KEY TERMS
alternative treatment threat
assignment threat
attrition
Bonferroni correction
compensatory demoralization
compensatory equalization of treatments
convenience sampling
covary
ecological validity
effectiveness study
efficacy study
experimenter bias
external validity
fishing
Hawthorne effect
history threat
instrumentation threat
internal validity
matching
maturation threat
mortality
order effect
participant bias
power
practice effect
Pygmalion effect
random assignment
random sampling
regression to the mean
replication
response rate
Rosenthal effect
sampling error
selection threat
statistical conclusion validity
testing effect
validity
INTRODUCTION
The purpose of this chapter is to make sure that you don’t do what Mark Twain suggests in the opening quote and
generalize from single research samples. You will learn how
to stop and think about data both from a single sample and
multiple samples, with the goal of using the information in
evidence-based practice.
When evaluating the strength of evidence, there are
certain axioms that practitioners tend to rely on, such as
“Randomized controlled trials are the strongest design”
and “Large sample sizes provide more reliable results.”
Although true in many cases, there can be exceptions. To be
a critical consumer of research, it is essential to understand
the whys behind these assertions. Why is a large sample
size desirable? Why are protections inherent in randomized
controlled trials? Even with a large-sample, randomized, controlled trial, other factors may compromise the validity of the
study.
This chapter explains the concept of validity, describes
different threats to validity, and identifies possible solutions to these threats. This information will increase your
ability to critically appraise research. If you have a good
grasp of the possible threats to validity and the ways in
which these threats can be managed, you will be able
to evaluate the strength of evidence and become an
evidence-based professional.
VALIDITY
When thinking about the validity of a study, consider the
terms truthfulness, soundness, and accuracy. Validity is an
ideal in research that guides the design, implementation,
and interpretation of a study. The validity of a study is
enhanced when sound methods allow the consumer to
feel confident in the findings. The validity of a study
is supported when the conclusions drawn are based on
accurate interpretations of the statistics and not confounded with alternative explanations. The inferences
that are drawn from a study will have greater validity if
they are believable and reflect the truth. This chapter
describes three types of research validity: (1) statistical
conclusion validity, (2) internal validity, and (3) external
validity. Chapter 6 addresses a different type of validity
concerned with assessments used by researchers.
STATISTICAL CONCLUSION VALIDITY
Statistical conclusion validity refers to the accuracy of
the conclusions drawn from the statistical analysis of a
study. Recall that with most inferential statistics, a p value
is calculated; conventionally, if the p value is < 0.05, the
conclusion is one of statistical significance (i.e., there is a
statistically significant difference or there is a statistically
significant relationship). As an evidence-based practitioner, there may be reasons why you should question the
researchers’ conclusions that are presented in a research
article.
Threats to Statistical Conclusion Validity
In Chapter 3, mistaken statistical conclusions were described in terms of Type I and Type II errors. As an
evidence-based practitioner, you can identify potential
errors by increasing your awareness of research practices
that lead to error. Specific threats to statistical conclusion
validity, their relationship to error type, and methods researchers use to protect research from those threats are
described in this chapter. Table 5-1 outlines the threats
to statistical conclusion validity, confounding factors that
interfere with statistical conclusion, and methods for
protecting against these threats.
TABLE 5-1 Threats to Statistical Conclusion Validity and Their Protections

Threat: Fishing
Type of error: Type I
Confounding factor that interferes with statistical conclusion:
• Researcher searches data for interesting findings that go beyond the initial hypotheses.
• Conclusions may be due to chance.
Protection:
• Use statistical methods that adjust for multiple analyses.
• Conduct a second study to test the new hypothesis with different participants.

Threat: Low power
Type of error: Type II
Confounding factor that interferes with statistical conclusion:
• A difference or relationship exists, but there is not enough statistical power to detect it.
Protection:
• Increase alpha level.
• Ensure that intervention is adequately administered to obtain optimal effect size.
• Increase sample size.
Fishing
Fishing is a euphemism that refers to looking for findings that the researcher did not originally plan to explore.
Ideally, when a researcher conducts a study, a hypothesis is developed before collecting data. Once the data
are collected, a statistical analysis is applied to test the
hypothesis. However, not infrequently, researchers will
explore existing data in what is sometimes called a “fishing expedition” or “mining for data.” In other words, the
researcher is letting the data lead the way toward interesting findings. Although there are legitimate reasons
for delving into the data, the risk in fishing is that the
researcher will see interesting differences or relationships
that may not be true and instead are due only to chance.
In other words, the researcher has committed a Type I
error by finding a difference that does not exist.
Typically many analyses are conducted in a researcher’s search for findings. Previously, you learned that
when alpha is set at 0.05, the researcher is willing to
take a 5% risk that the difference or relationship is not
true but is due to chance. However, this applies only
to a single analysis. Each time another analysis is performed, there is a greater risk that the finding is due to
chance. Researchers often explore their data for unexpected findings, which can lead to important discoveries.
However, protections should be in place so that chance
findings are not misleading.
Protection Against Fishing Threats
As an evidence-based practitioner, you may suspect that
a fishing expedition has occurred when the results of the
study are not presented in terms of answers to a research
hypothesis. A straightforward researcher may acknowledge
the exploration and, if it is a robust study, will describe how
threats to Type I error were addressed.
One way that researchers can protect against fishing
threats is to use statistical procedures that take into account multiple analyses. There are many such procedures,
but the simplest one conceptually is the Bonferroni
correction. With the Bonferroni correction, the alpha
level of 0.05 is adjusted by dividing it by the number of
comparisons. For example, if six comparisons were made,
0.05/6 = 0.0083, meaning that the acceptable alpha rate
is much lower and much more conservative than the
initial 0.05.
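The adjustment itself is a one-line calculation; the sketch below applies it to hypothetical p values from six comparisons.

```python
# A minimal sketch of the Bonferroni correction: divide the overall alpha by the
# number of comparisons and judge each p value against the adjusted threshold.
alpha = 0.05
p_values = [0.004, 0.030, 0.150, 0.008, 0.049, 0.200]  # hypothetical results of six analyses

adjusted_alpha = alpha / len(p_values)                  # 0.05 / 6 = 0.0083
for i, p in enumerate(p_values, start=1):
    decision = "significant" if p < adjusted_alpha else "not significant"
    print(f"Comparison {i}: p = {p:.3f} -> {decision} at adjusted alpha = {adjusted_alpha:.4f}")
```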
Another method that protects against fishing threats
involves conducting another study to test the new hypothesis discovered when the data were explored. For example, consider a researcher who tested a new intervention
and found that the initial analyses did not show the intervention to be more effective than the control. However,
upon deeper analysis, the researcher discovered that men
experienced a significant benefit, whereas women stayed
the same. A new study could be conducted to test this
hypothesis. If the second study resulted in the same findings, there would be stronger evidence to conclude that
only men benefit from the intervention.
Low Power
Power is the ability of a study to detect a difference or
relationship. Power is based on three things: sample size,
effect size, and alpha level. The larger the sample is, the
more powerful the study is. It is easier to detect a difference when you have many participants. Likewise, if
you have a large effect, you will have greater power. If
an intervention makes a major difference in the outcome,
it will be easier to detect that difference than if an intervention makes only a minor difference.
Recall from Chapter 3 that a Type II error occurs when
no difference is found, but in actuality a difference is present. This occurs because of low power and is most often
the result of small sample size. When you review a study
with a small sample size that does not find a difference or
a relationship, low power is a potential threat to statistical
conclusion validity. However, it is also possible that, even
with a large sample, the researcher does not find a difference or relationship.
Protection Against Low Power Threats
Power can be increased by changes in the alpha level,
effect size, or sample size. In exploratory analyses, the
researcher may utilize a higher alpha level, such as 0.10
instead of 0.05; however, in doing so, the researcher takes
a greater chance of making a Type I error. It is more difficult to change the effect size, but the researcher needs
to ensure that everything is in place to test whether the
intervention is effective (e.g., trained individuals administer the intervention, strategies that foster adherence are
used, etc.).
The simplest way to increase the power of a test is to
increase sample size. However, it can be costly in terms
of both time and resources to conduct a study with a large
sample. Researchers often conduct a power analysis to determine the smallest sample possible to detect an effect
given a set alpha level and estimated effect size.
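A power analysis of this kind can be run in a line or two. The sketch below assumes the statsmodels library and a two-group t-test design, with an estimated effect size of d = 0.5, an alpha of 0.05, and a desired power of 80%.

```python
# A minimal sketch of an a priori power analysis: the smallest per-group sample
# size needed to detect an assumed effect size with the chosen alpha and power.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Approximately {n_per_group:.0f} participants are needed per group.")
```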
The potential for Type II errors provides a strong rationale for using large samples in studies. With a large
sample, a researcher is unlikely to make a Type II error.
However, there are additional benefits to having a large
sample size. With a large sample, outliers are less likely
to skew the results of a study. For example, consider the
average of the following six scores on an assessment:
5 + 4 + 5 + 3 + 26 + 5 = 48; 48/6 = 8
The score of 26 is an outlier, when considering the
other scores. The mean score for this sample of 6 is 8.
When you look at each individual participant, 8 is a much
higher score than the majority of the participants received.
A single outlier misrepresents the group as a whole.
Now consider a sample of 40 participants:
5 + 4 + 5 + 3 + 26 + 4 + 3 + 4 + 4 + 5 + 4 + 5 + 5 + 4 + 5 + 3 + 3 + 4 + 3 + 4 + 4 + 5 + 4 + 5 + 5 + 4 + 5 + 3 + 4 + 3 + 3 + 4 + 5 + 4 + 3 + 5 + 4 + 3 + 3 + 4 = 183; 183/40 = 4.58
The outlier has a weaker effect on the group as a whole, and the mean for this sample is more in line with the typical scores.
Another benefit of a large sample is that, the larger the sample, the more likely it is that the sample will represent the population. This fact is particularly relevant for survey research. Not only will a large sample represent the population, but more individuals are likely to respond when invited to complete the survey. The response rate is the number of individuals who respond to a request to participate in a survey or other research endeavor. The larger the response, the more accurate the results. In the case of survey research, the response rate is determined by dividing the number of surveys that were completed by the number of surveys that were administered. For example, if 200 surveys were sent out, and 150 people completed and returned them, the response rate would be 150/200 = 75%.
Individuals who choose not to participate in a study may opt out of the study for a particular reason and, in doing so, bias the results. For example, if you are conducting a satisfaction survey for your therapy program and only 25% of your clients respond, it is possible that the individuals who responded are either highly dissatisfied or highly satisfied and therefore more motivated to voice their opinions.

EXERCISE 5-1
Identifying Threats to Statistical Conclusion Validity (LO1 and LO2)
Read the following scenario and identify which practices present potential threats to statistical conclusion validity. Suggest methods for controlling these threats.
A new researcher who is a therapist wants to collect data to examine the efficacy of three different orthoses for a specific hand condition. The researcher plans to recruit clients from her clinic and expects that approximately 10 individuals will have the hand condition of interest. The following outcomes will be measured: pain, range of motion, fine motor control, and function. The researcher has no expectation as to which orthosis will provide the better outcome.
QUESTIONS
1. Why are there fishing threats?
2. How could the researcher address the fishing threats?
3. Why are there threats due to low power?
4. How could the researcher address the low power threats?
INTERNAL VALIDITY
When evaluating a study for evidence, it is necessary to consider internal validity and how it may affect the study outcomes. A study has internal validity when the conclusions drawn from the results are accurate and true. Validity is not an either/or situation, but rather a matter of degree. For example, a study that examines the effectiveness of social stories in children with autism concludes that children in the intervention group had greater improvement in their social skills than children in the control group. If the study were internally valid, this would mean that it was truly the social stories intervention that improved the social skills. However, there is always the possibility that there is an alternative explanation for the study results. Perhaps the difference was due to chance. Or it could be that the attention the children received is what made the difference, and not the intervention itself. Perhaps the individuals who administered the outcome assessments were biased and tended to give higher scores to the individuals in the intervention group. Although you can never be certain that the results of a study are entirely accurate, certain features of the study can greatly increase your confidence in its accuracy and validity.
When examining internal validity, ask yourself, “Is there an alternative explanation for these study results?” Alternative explanations are often referred to as “threats” to internal validity. This section of the chapter characterizes common threats to internal validity, describes protections or solutions to avoid or minimize those threats, and identifies types of research situations in which these threats are most likely to occur. Table 5-2 summarizes the threats to internal validity and their protections.
Threats to Internal Validity
Assignment and Selection Threats
Threats to internal validity can occur when a bias is present during the process of assigning or selecting
TABLE 5-2 Threats to Internal Validity and Their Protections

Threat: Maturation
Confounding factor affecting outcome/alternative explanation:
• Changes occur over time in participants as a result of development or healing.
Protection:
• Use control groups.
• Ensure baseline equivalence through random assignment or participant matching.

Threat: Assignment/Selection
Confounding factor affecting outcome/alternative explanation:
• Groups are not equal on some important characteristics.
Protection:
• Random assignment
• Participant matching
• Statistical procedures such as covariance

Threat: History
Confounding factor affecting outcome/alternative explanation:
• Events occur between the pretest and posttest.
Protection:
• Use control groups.
• Ensure short time between pretest and posttest.
• Ensure protection against exposure to alternative therapies.

Threat: Regression to the mean
Confounding factor affecting outcome/alternative explanation:
• Extreme scores change and move toward the mean with repeated testing.
Protection:
• Use control groups.
• Exclude outliers.
• Take the average of multiple measurements.

Threat: Testing/practice/order effects
Confounding factor affecting outcome/alternative explanation:
• Performance on measures changes due to exposure or some other feature of the testing experience.
Protection:
• Use control groups.
• Use measures with good test/retest reliability.
• Use alternate forms of measures.
• Counterbalance the order of measures.
• Take breaks if fatigue is anticipated.

Threat: Instrumentation
Confounding factor affecting outcome/alternative explanation:
• Invalid or unreliable measures, tester error, or poor condition of the instrument result in inaccurate outcomes.
Protection:
• Use measures with good reliability and validity.
• Use measures that are sensitive to change.
• Train the testers.
• Maintain the instruments.
• Blind the tester.

Participant and Experimenter Bias Threats

Threat: Rosenthal/Pygmalion effect
Confounding factor affecting outcome/alternative explanation:
• Intervention leaders expect participants to improve, or participants expect to improve.
• When a control group is involved, the expectation is that the experimental group will perform better than the control group.
Protection:
• Blind the intervention leaders and/or participants.
• Limit contact between intervention and control leaders and participants.
• Establish and follow protocols for interventions.

Threat: Compensatory equalization of treatment
Confounding factor affecting outcome/alternative explanation:
• Intervention leader’s behavior encourages participants in the control group to improve to equal the intervention group, or control group participants are motivated to compete with the intervention group.
Protection:
• Blind intervention leaders and/or participants.
• Limit contact between intervention and control leaders and participants.

Threat: Compensatory demoralization
Confounding factor affecting outcome/alternative explanation:
• Intervention leader’s behavior discourages the control group, or participants are discouraged because of being in the control group.
Protection:
• Blind intervention leaders and/or participants.
• Limit contact between intervention and control leaders and participants.

Threat: Hawthorne effect
Confounding factor affecting outcome/alternative explanation:
• Participants improve because of attention received from being in a study.
Protection:
• Blind intervention leaders and/or participants.
• Ensure equal attention for intervention and control groups.

Threat: Attrition/Mortality
Confounding factor affecting outcome/alternative explanation:
• Participants who drop out affect the equalization of the group or other characteristics of participants.
Protection:
• Employ strategies to encourage attendance or participation.
• Use statistical estimation for missing data.
• Perform intent to treat analysis.
participants for groups. If there are differences between
the groups on a baseline characteristic, this difference
might affect the outcomes of the study.
Demographic characteristics, illness/condition issues,
medication, and baseline scores on the outcome measures
are examples of variables that should be considered
when examining assignment and selection threats, because these characteristics can account for differences in
outcome. An assignment threat indicates that the bias
occurred when groups were assigned, whereas a selection threat indicates the bias occurred during selection—
either the selection of participants or the selection of
sites. For example, Killen, Fortmann, Newman, and
Varady (1990) found that men responded better than
women to a particular smoking cessation program. If at
baseline there were more men in the intervention group
and more women in the control group, the study would
be biased toward finding more positive results than
would exist with equal distributions of gender across the
two groups. In another example, a splinting study for
cerebral palsy may find that one group is receiving more
antispasticity medication than another. The medication
could then account for some or all of the differences in
the outcomes.
Comparisons of the intervention and control groups
at baseline should be provided in the initial section of
the results section of a study so the reader can determine
if there are differences between the groups. A table is
typically provided that outlines the comparison of the
groups on important demographic characteristics as well
as the baseline scores of the outcome variables. An example of this type of comparison is shown in From the
Evidence 5-1. The table comes from an intervention
study examining the efficacy of an education and exercise
program to reduce the chronicity of low back pain. The
table compares the intervention and control groups at
baseline (del Pozo-Cruz et al, 2012).
Protection Against Assignment Threats
Random assignment is the primary protection method
used by researchers against assignment threats. In random assignment to groups, each research participant has
an equal chance of being assigned to the available groups
for an intervention or control. There are times when researchers are reluctant to use random assignment to a
no-intervention control group, because it may be considered unethical to withhold treatment from a group.
Sometimes this concern is managed by using a wait-list
control group. The control group eventually receives the
intervention, but not until the intervention group has
completed the treatment.
Random assignment does not ensure equal distribution, which is particularly true with small samples in
which extremes in a few individuals can greatly influence
the group results. However, it works particularly well
with larger samples because you are more likely to form
equivalent groups. Therefore, when evaluating evidence,
examine the group comparisons presented in the results
section of the study.
Sometimes researchers use strategies such as matching study participants to ensure equal distribution on a
particularly important characteristic. For example, if the
researcher knows that the outcomes are likely to be influenced by a characteristic such as level of education, symptom severity, or medication, potential participants are
identified and matched on the variable of interest, with
one randomly assigned to the intervention group and one
randomly assigned to the control group.
Another procedure that can be used to minimize assignment threats is statistical equalization of groups. If
FROM THE EVIDENCE 5-1
Table Comparing Study Groups
del Pozo-Cruz, B., Parraca, J. A., del Pozo-Cruz, J., Adsuar, J. C., Hill, J., & Gusi, N. (2012). An occupational, Internet-based intervention
to prevent chronicity in subacute lower back pain: A randomized controlled trial. Journal of Rehabilitation Medicine, 44(7), 581–587.
doi:10.2340/16501977-0988.
Table I. Baseline Characteristics of Participants in the Study (n = 90)

Group                                    Control group (n = 44)   Intervention group (n = 46)   p
                                         Mean (SD)                Mean (SD)
Age (years)                              45.50 (7.02)             46.83 (9.13)                  0.44
Sex (%)
  Male                                   11.4                     15.2
  Female                                 88.6                     84.8                          0.59
Smokers, yes/no, %                       50/50                    56.5/43.5                     0.53
Roland Morris Questionnaire score,
  points                                 11.65 (2.14)             12.28 (2.63)                  0.22
TTO, points                              0.78 (0.08)              0.75 (0.11)                   0.23
SBST total score, points                 4.38 (1.67)              4.36 (1.28)                   0.95
SBST psychological score, points         2.36 (1.03)              2.28 (0.98)                   0.70

p-values from t-test for independent measures or χ² test.
TTO: Time Trade Off; SBST: STarT Back Tool; SD: standard deviation.
Note A: The SBST
is an outcome
measure for the study.
Note B: The p value is above 0.05 for all
comparisons, indicating that the two groups
are comparable (i.e., there are no statistically
significant differences) at baseline. This is
particularly true of the SBST total score.
FTE 5-1 Question 1 Are the two groups equivalent on all key characteristics—both demographic variables and
outcome variables? How do you know?
one group is older than another group, age can be covaried in the statistical analysis so that it does not influence
the outcomes. If, when reading the initial section of the
results section, you find that the groups are not equal on
one or more important characteristics (which sometimes
occurs, even with random assignment), check to see if the
researcher handled this by covarying that variable, or at
least acknowledging the difference in the limitations section of the discussion.
Maturation Threats
Maturation is a potential threat in intervention research
involving health-care practitioners. Maturation refers to
changes that occur over time in research participants.
Two major types of maturation threats are particularly
common in health-care research: (1) changes that occur
as part of the natural growth process, which is particularly
relevant for research with children; and (2) changes that
occur as a result of the natural healing process, which is
particularly relevant for research related to diseases and
conditions in which recovery is expected. In other words,
is it possible that if left alone the research participants
would have changed on their own? Maturation is of greatest concern when the time period between the pretest and
posttest is prolonged, such as during longitudinal studies
or studies with long-term follow-up.
To illustrate the maturation threat, consider a study
that examines an intervention for children with language
delays. The study finds an improvement in language from
the pretest to the posttest; however, without adequate protection from other influences, it is difficult to determine
whether the intervention caused the improvement or the
change occurred as a result of developmental changes in
language. Maturation would be an even greater concern
if the study extended over a significant period of time,
such as throughout a school year. Similarly, an intervention study examining changes in mobility for individuals
after hip replacement would need to take into account
the maturation threat, because individuals can experience
improved mobility without therapy.
Maturation is in play whether the natural changes are
positive or negative. When conditions result in a natural
decline, the goal of therapy is often to reduce the speed
with which that decline occurs. For example, if a therapist is using a cognitive intervention for individuals with
Alzheimer’s disease, it would be challenging to determine
if a decline were less severe than would have occurred
naturally over the course of the illness. However, studies
can be designed with the proper protections to determine
whether a particular intervention reduces the natural
course of a decline in functioning.
Protections Against Maturation Threats
The primary protection against maturation threats is use
of a control group. If the intervention group improves
more than the control group, the difference between the
two groups is more likely to be due to the intervention,
even if both groups improve over time. The degree of improvement that the intervention group makes above and
beyond the control group is likely due to the intervention.
Another protection against maturation threats is outcome scores that are similar at baseline for the control and
intervention groups. This allows you to be more confident that the groups start out at a similar place, and makes
interpretations of changes from pretest to posttest more
straightforward.
Random assignment and matching of participants are
additional strategies that increase the likelihood that the
groups will be equal at baseline. (Random assignment
and matching are described in detail in the section on
assignment and selection threats.) In the results section
of a research study, typically the first report of results is
the comparison of the intervention and control groups;
this includes pretest scores on the outcomes of interest
and demographic variables that could affect the findings.
Finally, you can be more certain that maturation is not a
factor when the time between the pretest and posttest is
short and when it is unlikely that changes would occur
without an intervention.
History Threats
A history threat involves changes in the outcome or dependent variable due to events that occur between the
pretest and posttest, such as a participant receiving an
unanticipated treatment or exposure to an activity that
affects the study outcome. In this case, the threat may also
be referred to as an alternative treatment threat. For
example, participants in a fall prevention program may
start attending a new senior center that provides exercise
classes with an emphasis on strength and balance.
In fact, any external event that can affect the dependent
variable is a potential threat. A new teacher in a classroom
who uses an innovative approach, participation in a health
survey that draws attention to particular health practices,
or a new fitness center opening in the participants’ neighborhood could pose a threat to internal validity.
History can also have a negative effect on outcomes.
A snowstorm might affect attendance, or scheduling
a weight-loss program around the Thanksgiving and
Christmas holidays could interfere with desired outcomes
and act as a threat to internal validity.
Protections Against History Threats
History threats are avoided by many of the same strategies
that are used to protect against maturation effects. The
use of a control group provides protection, as long as both
groups have the same potential exposure to the historical
FROM THE EVIDENCE 5-1 (CONT.)
FTE 5-1 Question 2 Using this example, why is it important that participants in the intervention and control groups
have similar scores at baseline on the STarT Back Tool? How does equivalence at baseline protect against maturation
threats?
event. Likewise, history is reduced as a threat when there
is a shorter time between pretest and posttest. Researchers can also put protections in place to reduce exposure
to alternative treatments, such as requiring participants
to avoid alternative exercise programs or drug therapies
while involved in the study. The researcher can include
questionnaires or observations to help determine if events
occurred that might affect the outcome.
Regression to the Mean Threats
Regression to the mean refers to a phenomenon in
which extreme scores are likely to move toward the average when a second measurement is taken; extremely high
scores will become lower, and extremely low scores will
become higher. When taking a test for a second time, it
is always possible—even likely—that you will not receive
the exact same score. This phenomenon is especially predictable in individuals who initially score at the extremes
of the distribution. At the ends of the distribution, it is
less likely that a second test score will become even more
extreme; instead, extreme scores tend to regress toward
the mean. The “Sports Illustrated curse” serves as a case in
point. It is often observed that after someone is featured
in Sports Illustrated, that individual has a decline in performance. Regression to the mean would explain this observation, because the individual athlete is likely featured
when he or she is at a peak of performance and superior to
most if not all other athletes in that sport. Consequently,
subsequent performance is likely to move toward the average, rather than improve.
In health-care research, study participants often start
with extreme scores because of their condition. Therefore, when extreme scores are involved, regression to the
mean should be considered a potential threat. Figure 5-1
depicts the normal curve and illustrates the propensity for
extreme scores to regress toward the mean; the extreme
scores toward both ends of the continuum move toward
the middle.
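For readers who like to see the arithmetic behind this phenomenon, the short Python simulation below uses invented numbers (not data from any study) to show regression to the mean: when a score is partly stable ability and partly random luck, the people who scored highest on the first test tend, as a group, to score closer to the average on the second test even though nothing about them has changed.

```python
# Illustrative simulation of regression to the mean (invented parameters).
import random

random.seed(1)
n = 10_000
true_ability = [random.gauss(100, 10) for _ in range(n)]   # stable trait
test1 = [a + random.gauss(0, 10) for a in true_ability]    # trait + luck
test2 = [a + random.gauss(0, 10) for a in true_ability]    # same trait, new luck

# Look at the people with the most extreme first scores (top 5%).
cutoff = sorted(test1)[int(0.95 * n)]
extreme = [i for i in range(n) if test1[i] >= cutoff]

mean_t1 = sum(test1[i] for i in extreme) / len(extreme)
mean_t2 = sum(test2[i] for i in extreme) / len(extreme)
print(f"Top scorers, test 1 mean: {mean_t1:.1f}")  # well above 100
print(f"Same people, test 2 mean: {mean_t2:.1f}")  # closer to 100
```

Nothing in the simulation "treats" the high scorers; their second scores drift back toward the mean simply because the lucky component of the first score is unlikely to repeat.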
Protection Against Regression to the Mean Threats
Similar to history and maturation threats, regression to
the mean is protected against by use of a control group.
Once again, if the treatment group outperforms the control group, the difference between the groups is most
likely due to the intervention. The importance of a control group should be more and more apparent; control
groups are valuable because they address multiple threats
to validity.
One other option is for researchers to exclude outliers
from a study, although this tactic is not feasible when
large numbers of participants could be classified as outliers. When small samples are necessary, the threat posed
by an extreme score at baseline may be reduced by taking
multiple pretest measures and using the average. For example, waist circumference can be challenging to measure
accurately, so during testing, three measures may be taken
and then averaged.
Testing Threats
Testing as an internal validity threat occurs when changes
in test performance are a result of the testing experience
itself. A testing effect is present when an earlier experience somehow affects a later testing experience. There are
many different ways in which this can occur.
The testing experience often sensitizes participants to
a desirable outcome. For example, the pretest may ask
questions about following a home program, so the individual becomes sensitized to this behavior and begins
following the program (as a result of the test, not the intervention). In another example, pedometers and other
devices are often used as a measure of physical activity.
The simple act of wearing the pedometer can influence
how far an individual walks because the presence of the
pedometer motivates the person to walk more, especially
when the participant can see the readings. In this case, it
is the wearing of the pedometer and not the intervention
that causes the change. The tester can also influence the outcomes of the testing, for example by providing cues to enhance performance (“Try harder, you can do a few more”).
FIGURE 5-1 Normal curve, showing standard deviations and propensity for extreme scores to regress toward the mean. The shape of the curve suggests that individuals who score toward the ends are more likely to move toward the middle on a second testing. The curve marks standard deviations from −3 to +3, with approximately 34% of scores falling within 1 standard deviation of the mean on each side, 13.5% between 1 and 2 standard deviations, and 2.5% beyond 2 standard deviations.

Practice effects are a type of testing threat that occurs when exposure to the pretest allows the individual to perform better on the posttest. Prior exposure can mean the test is more familiar, the participant is less anxious, and the participant can adopt a strategy for improved performance at posttest. For example, students who receive a handwriting test before and after a handwriting intervention may do better on the second test, simply because of exposure and practice from the first test.
In the case of order effects, there is a change in performance based on the order in which the tests are presented. For example, in a long testing session there may
be a decline in performance due to fatigue.
Protection Against Testing Threats
A measure with strong test-retest reliability is a good
start for protecting against testing threats. Standardization, scripts, and training for the tester also can reduce
biases that the tester may introduce. Instead of logs or diaries that can act as prompts to engage in a behavior, time
sampling can be used to protect against testing threats.
With time sampling, individuals receive a prompt, but
the prompt is unexpected and random, and the individual or an instrument records what the person is doing at
that point in time. Control groups are also beneficial with
testing threats. If both groups receive some benefit from
exposure to the testing situation, the difference between
the control and intervention groups still represents the
intervention effect.
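The time sampling approach described above can be made concrete with a short sketch. The Python example below (an invented illustration, not a published protocol) draws a handful of random prompt times across a waking day; because participants cannot anticipate the prompts, the act of recording is less likely to change the behavior being measured.

```python
# Illustrative sketch of time sampling: random, unpredictable prompt times.
# The window and number of prompts are arbitrary choices for the example.
import random
from datetime import datetime, timedelta

random.seed(7)
day_start = datetime(2017, 3, 1, 8, 0)    # 8:00 a.m.
waking_minutes = 14 * 60                  # prompts between 8 a.m. and 10 p.m.

prompt_offsets = sorted(random.sample(range(waking_minutes), k=6))
prompts = [day_start + timedelta(minutes=m) for m in prompt_offsets]

for p in prompts:
    print(p.strftime("%I:%M %p"), "- record what the participant is doing now")
```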
Some measures are more vulnerable to practice effects
than others. For example, a list learning test that assesses
memory is less reliable the second time it is administered
because the participant may remember words from the
first time. In contrast, range-of-motion testing is less
amenable to practice effects. Alternate forms are often
used when a measure can be learned, such as a list of
words. In this case alternate lists are made available, so
that the individual is tested with a different set of words.
Instrumentation Threats
Instrumentation threats occur when a measure itself or the individual administering the measure is unreliable or invalid. Instrumentation threats are a common problem in research. A well-designed study is rendered useless if instrumentation poses a threat to its validity. When using mechanical or electronic measures, the quality, condition, and calibration of the instruments can affect outcomes. When an instrument is in poor condition, the measurements may be inaccurate. For example, it is recommended that the Jamar dynamometer be professionally calibrated on a yearly basis (Sammons Preston, n.d.).
Human error can also play a role in instrumentation threats. For example, a tester may provide incorrect instructions to a participant or make poor judgments when scoring an observational measure. The test itself may be a poor choice for the study, such as when it does not accurately measure the intended outcome; this represents an issue with the validity of the test, which is a very important feature for an intervention study. For example, many therapy studies use self-report measures to assess an individual’s functional performance. Although these measures are easy to administer, they may not provide accurate results. Sabbag et al (2012) found that performance measures were more accurate in assessing daily living skills in individuals with schizophrenia than was self-report.
Another aspect of instrumentation threat is the “ceiling effect” or “floor effect” of a measure. If ceiling effects are in play, the participants may have such a high score at the beginning that there is no room for improvement. Alternatively, the test may be so difficult (floor effects) that it is unlikely the researcher will be able to detect a significant change.

Protection Against Instrumentation Threats
Proper selection of measures, made before the study begins, is an essential protection against instrumentation threats. It is essential to know the reliability and validity of measures used in a study and their sensitivity to change. Godi et al (2013) compared two balance measures—the Mini-BESTest and the Berg Balance Scale—in terms of their sensitivity in detecting change. They found that the Berg Balance Scale had greater ceiling effects, suggesting that the Mini-BESTest may be the better instrument to use in a study examining the efficacy of an intervention to improve balance.
Training of testers also provides protection against instrumentation threats. If multiple testers are used, inter-rater reliability among the testers should be established. Electronic and mechanical measures should receive the necessary maintenance and calibration. For example, audiologists are particularly cognizant of the importance of calibration, as testing of hearing impairment would be significantly compromised with a poorly calibrated instrument.
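One simple way to get a feel for ceiling effects in pilot data is to look at how many people already score at or near the top of the scale before treatment begins. The Python sketch below is illustrative only; the scores and the 0-to-56-point scale are invented for the example rather than taken from any specific published measure.

```python
# Illustrative check for a ceiling effect in a set of baseline scores.
# Invented scores on a hypothetical 0-56 point balance measure.
baseline_scores = [54, 56, 55, 52, 56, 53, 56, 50, 55, 56, 54, 51]
max_score = 56

at_or_near_ceiling = [s for s in baseline_scores if s >= max_score - 2]
proportion = len(at_or_near_ceiling) / len(baseline_scores)

# If most participants start near the top, the measure has little room
# to register improvement, whatever the intervention does.
print(f"{proportion:.0%} of participants are within 2 points of the maximum")
```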
Experimenter and Participant Bias Threats
Experimenter bias is introduced when the research process itself affects the outcomes of the study, whether intentionally or unintentionally. Experimenter bias can be
introduced by the person(s) providing the intervention.
A classic experimenter bias, known as the Rosenthal
effect or the Pygmalion effect, occurs when the researcher sets up different expectations for the intervention
and control groups. The term Rosenthal effect comes from
an experiment by Rosenthal and Jacobson (1968) that involved teachers. Rosenthal communicated to some teachers
that they should expect a strong growth spurt in intellectual
ability from the students, whereas other teachers were not
given this information. Students performed better when
the teachers expected them to perform better. This study
was set up to study this effect, but the same phenomenon
can occur unintentionally when an intervention leader
communicates an expectation of better outcomes from
the intervention group. The higher expectations become a
self-fulfilling prophecy. Perhaps the leader provides more
attention or enthusiasm, or works harder at providing the
intervention, or the participants pick up on the leader’s expectations and respond in kind.
Just being assigned to a particular group can introduce
an experimenter bias. For example, without the leader’s
prompting, the control participants may want to compensate for not being picked for the intervention. In many
rehabilitation studies, the control group receives standard
treatment or “treatment as usual.” If the control group is
aware that the intervention group is receiving something
new, they may try to compensate for this difference.
Another bias that can be introduced by the experimenter
is compensatory equalization of treatments. In this case,
the intervention leaders for the control group may feel compelled to work harder to compensate for the fact that the
control group is not receiving the intervention. This type of
bias is similar to the Rosenthal effect, but directed toward
the control group. The control group may also respond
in the other direction and feel discouraged because they
are not receiving the intervention. In response, they may
not try as hard or give up. This threat to validity is called
compensatory demoralization. The threats of compensatory equalization and demoralization are more likely to
occur when the leaders and/or participants of the control
and treatment groups interact with one another.
Participant bias threats come into play when the participant’s involvement in the study affects the outcomes.
The Hawthorne effect occurs when participants respond
to the fact that they are participating in a study and not
the actual intervention (Mayo, 1949). The term comes
from research conducted at the Hawthorne electric plant.
Many variables were studied to determine what factors
might affect productivity, such as lighting or changes in
workstations. No matter what was studied and how insignificant the change, there was a change in productivity.
It was concluded that the change and not the actual condition was resulting in greater productivity. The Hawthorne effect may occur because participants behave as
expected or want to please the researcher.
In an interesting study examining the efficacy of ginkgo
biloba in Alzheimer’s disease, McCarney and colleagues
(2007) examined follow-up as a confounding variable influenced by the Hawthorne effect. Some participants
had minimal follow-up, whereas others had intensive
follow-up. In other words, the intensive follow-up group
received more attention from the researchers. The results
indicated that participants receiving intensive follow-up had
greater cognitive improvement than participants receiving
minimal follow-up. This result is a particularly remarkable
example of the Hawthorne effect, given that cognition
was measured using an objective, standardized assessment
(ADAS-cog) that included 11 cognitive tasks, such as word
recall, orientation, and the ability to follow commands.
Protection Against Experimenter
and Participant Bias Threats
Unlike many of the previous threats to validity, random
assignment to an intervention and control group does not
protect against experimenter and participant bias. However, blinding of the intervention leaders and participants
provides a strong protection against these threats. If the
leaders and participants do not know whether they are providing or receiving the intervention, it is more difficult to
introduce a bias. This is a common approach in drug trials
in which a placebo is used in place of the actual medication.
EVIDENCE IN THE REAL WORLD
How Lack of Blinding Participants Can Lead to Compensatory Equalization
In rehabilitation and therapy practices, it is difficult and in many cases impossible to blind the intervention leaders and participants to which group is receiving the experimental intervention. If you are the intervention leader, you have to know what you are leading (as opposed to offering a placebo pill); if you are a participant, you will likely know what intervention you are participating in.
In a real-life example, I was administering a weight-loss program for individuals with psychiatric disabilities. Participants were randomly assigned to either an intervention or a no-treatment control group. After the informed consent process, participants were told which groups they were assigned to. Some control participants voiced a desire to show the researchers that they could lose weight on their own, apparently compensating for not being assigned to the intervention. In some cases it worked, and indeed several control participants were successful in losing weight during the time they participated in the study.
To control for this confound, it would have been helpful for both groups to receive some form of intervention, so that neither group felt compelled to prove something based on their group assignment. As a reader of evidence, it is often hard to discern when individuals are responding to experimenter or participant bias; however, it is useful to know that the protections discussed in this section are in place to protect against potential problems.

In rehabilitation research, it is often difficult to blind intervention leaders and participants; therapists will know
they are providing an intervention and will typically know
when they are providing the experimental intervention. A
form of a placebo is provided in some rehabilitation studies when the control group receives an intervention that
equalizes attention. When a new intervention is compared
with standard therapy and when both groups receive the
same amount of intervention time, experimenter and participant bias is less of a concern. Therefore, equal attention to groups is generally preferable to a no-treatment
control. However, participants may know through the
informed consent process that they are receiving the
experimental intervention. Typically when individuals
volunteer to be in a study, they want to receive the new
intervention, so this can lead to disappointment if they are
not assigned to the experimental condition.
Other methods can be used to minimize experimenter
and participant bias. Clear protocols for the administration of the interventions can reduce bias. In some cases
the therapists may be expected to follow the protocol and
ideally do not know which intervention is expected to
yield superior results. It is also helpful to limit interactions between intervention leaders to further prevent the
development of bias. Similarly, keeping the participants
in the intervention and control groups separate diminishes bias on the part of those receiving the treatment. In
addition, it is often possible to blind the individuals who
administer the outcome assessments in order to reduce or
eliminate bias in scoring the assessments.
Attrition/Mortality Threats
Threats due to attrition, also called mortality, involve the
withdrawal or loss of participants during the course of the
study. The process of informed consent acknowledges that
participants can withdraw from a study at any point in time.
Individuals withdraw from studies for many reasons, which
may or may not relate to the study itself. Individuals may
move or experience other personal issues that require withdrawal. Others may withdraw because they are no longer
interested in the study, find the time commitment too great,
or feel disappointed in the intervention. When people withdraw from a study, it can affect the equalization of groups
that was achieved at the outset. When substantial numbers
of participants withdraw from a study, group differences can
emerge that confound the results of the study.
When attrition occurs, it is important to identify any
characteristics of the individuals who dropped out of the
study that might make them different from the individuals
who remained in the study. Perhaps the individuals who
dropped out were experiencing a more severe condition,
in which case you would not know if the intervention was
effective for that group of individuals. Attrition may also
result in an uneven number of participants in the groups.
Protections Against Attrition/Mortality Threats
Depending on the length of the study and access to participants, a researcher may be able to recruit additional
participants to replace individuals who drop out and thus
maintain the overall power of the study. In addition,
strategies such as reminder phone calls and e-mails can
be used to promote good attendance for an intervention
or follow-through with therapy.
Characteristics of the individuals who withdraw should
be compared with those of the individuals who remain.
If differences exist, this factor should be identified as a
limitation of the study. Statistical procedures can be used
to account for attrition/mortality threats, such as using
estimates for missing data, but this approach is less desirable than having actual participant scores. An “intent
to treat” analysis can be used in which the data of individuals who did not receive the intervention are still included in the analysis. This analysis is similar to real-life
practice, in which some individuals do not complete or
follow through with all aspects of their treatment. It also
maintains the integrity of the randomization process and
baseline equality of groups.
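To illustrate the logic of an intent-to-treat analysis, the Python sketch below (all numbers and group labels are invented, not drawn from any study) analyzes every randomized participant in the group to which he or she was assigned, including the participants who never completed the intervention, and contrasts that with an analysis restricted to completers.

```python
# Illustrative intent-to-treat (ITT) vs. completers-only comparison.
# All values are invented for the example.
participants = [
    # (assigned group, completed intervention?, change score)
    ("intervention", True,  6), ("intervention", True,  5),
    ("intervention", True,  7), ("intervention", False, 1),
    ("intervention", False, 0),
    ("control",      True,  2), ("control",      True,  1),
    ("control",      True,  3), ("control",      True,  2),
    ("control",      True,  1),
]

def mean_change(rows):
    return sum(r[2] for r in rows) / len(rows)

itt_intervention = [r for r in participants if r[0] == "intervention"]
completers_only  = [r for r in itt_intervention if r[1]]

# ITT keeps everyone as randomized, which preserves the benefit of random
# assignment; the completers-only estimate looks more favorable.
print(f"ITT mean change:             {mean_change(itt_intervention):.1f}")
print(f"Completers-only mean change: {mean_change(completers_only):.1f}")
```

In this invented example the completers-only analysis overstates the benefit, which is exactly the distortion an intent-to-treat analysis is designed to avoid.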
EXERCISE 5-2
Detecting Potential Threats to Internal
Validity in a Research Study (LO3 and LO4)
Analyze the following two study abstracts and determine which threats to internal validity (among the
options provided) are likely to be present. Before
looking at the answers, write down a rationale for
why you do or do not think a particular threat may
confound the interpretation of the results. In other
words, would the threat suggest that something other
than the intervention resulted in the improvement?
You can find the answers at the end of the chapter.
STUDY #1
Hohler, A. D., Tsao, J. M., Katz, D. I., Dipiero, T. J., Hehl, C. L., Leonard,
A., . . . Ellis, T. (2012). Effectiveness of an inpatient movement disorders program for patients with atypical parkinsonism. Parkinson’s Disease, 2012, 871974. doi:10.1155/2012/871974 (Epub 2011 Nov 10).
Abstract
This paper investigated the effectiveness of an inpatient movement disorders program for patients
with atypical parkinsonism, who typically respond
poorly to pharmacologic intervention and are challenging to rehabilitate as outpatients. Ninety-one
patients with atypical parkinsonism participated in
an inpatient movement disorders program. Patients
received physical, occupational, and speech therapy
for 3 hours/day, 5 to 7 days/week, and pharmacologic
adjustments based on daily observation and data.
Differences between admission and discharge scores
were analyzed for the functional independence measure (FIM), timed up and go test (TUG), two-minute
walk test (TMW), Berg balance scale (BBS) and
finger tapping test (FT), and all showed significant
improvement on discharge (P < .001). Clinically significant improvements in total FIM score were evident in 74% of the patients. Results were similar for
ten patients whose medications were not adjusted.
Patients with atypical parkinsonism benefit from an
inpatient interdisciplinary movement disorders program to improve functional status.
Consider this:
• Not included in this abstract is the length of treatment. Participants’ length of stay varied from 1 to 6 weeks, with an average of 2.5 weeks. Also, the intervention leaders administered the assessments.
• A working knowledge of atypical parkinsonism symptoms, course, and treatment will be useful in identifying threats to validity. You can obtain more information at http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0001762/
• If you would like more information about the study, you may want to use your library resources to obtain the full text of the article.

QUESTIONS
1. Based on your reading of the abstract and any additional resources, which of the following would you consider noteworthy threats to internal validity?
A. Maturation
B. History
C. Testing
Explain your answer.

STUDY #2
Frankel, F., Myatt, R., Sugar, C., Whitham, C., Gorospe, C. M., & Laugeson, E. (2010, July). A randomized controlled study of parent-assisted Children’s Friendship Training with children having autism spectrum disorders. Journal of Autism and Developmental Disorders, 40(7), 827-842. doi:10.1007/s10803-009-0932-z

Abstract
This study evaluated Children’s Friendship Training (CFT), a manualized parent-assisted intervention to improve social skills among second to fifth grade children with autism spectrum disorders. Comparison was made with a delayed treatment control group (DTC). Targeted skills included conversational skills, peer entry skills, developing friendship networks, good sportsmanship, good host behavior during play dates, and handling teasing. At posttesting, the CFT group was superior to the DTC group on parent measures of social skill and play date behavior, and child measures of popularity and loneliness. At 3-month follow-up, parent measures showed significant improvement from baseline. Post-hoc analysis indicated more than 87% of children receiving CFT showed reliable change on at least one measure at posttest and 66.7% after 3 months follow-up.

Consider this:
• Ten participants did not complete the intervention and therefore were not included in the follow-up data.
• The following table was included in the study, comparing the two groups at baseline.

Sample Characteristics for Children’s Friendship Training (CFT) and Delayed Treatment Control (DTC) Conditions

Variable              CFT, M (SD), n = 35   DTC, M (SD), n = 33   p
Age (months)          103.2 (15.2)          101.5 (15.0)          ns
Grade                 3.2 (1.0)             3.4 (1.2)             ns
SES^a                 44.6 (10.6)           50.6 (11.8)           ns
Percent male          85.7                  84.8                  ns
Percent Caucasian     77.1                  54.5                  ns
WISC-III Verbal IQ    106.9 (19.1)          100.5 (15.7)          ns
ASSQ                  22.4 (7.3)            22.0 (9.3)            ns
VABS^b
  Communication       84.3 (20.5)           79.8 (15.3)           ns
  Daily living        62.4 (15.7)           67.0 (18.2)           ns
  Socialization       66.3 (10.8)           66.1 (10.8)           ns
  Composite           68.1 (16.4)           64.4 (11.0)           ns
# sessions attended   11.3 (0.8)            10.7 (1.9)            ns
Note. For # sessions attended, CFT n = 34 and DTC n = 32.

• If you would like more information about the study, you may want to use your library resources to obtain the full-text article.

QUESTIONS
2. Based on your reading of the abstract and any additional resources, which of the following would you consider noteworthy threats to internal validity?
A. Maturation
B. Selection
C. Instrumentation
D. Attrition
Explain your answer:

EXTERNAL VALIDITY
External validity is the extent to which the results of a study can be applied to other people and other situations. External validity speaks to the generalizability of a study. A study has more external validity when it reflects real-world practice. As an evidence-based practitioner, you can apply the results of a study with greater confidence when its conditions are similar to those of your own practice, because such studies have more external validity for your setting.

Threats to External Validity
Threats to external validity occur when the situation or
study participants are different from the real world or
the clinical setting. As with internal validity, external
validity is a continuum; a study may have good or bad
external validity, but it will never be perfectly valid. As
an evidence-based practitioner, it is important that you
evaluate the characteristics of the people and situations
in a study to determine how similar those characteristics are to your own practice. Table 5-3 summarizes
threats to external validity and protections against those
threats.
TABLE 5-3 Threats to External Validity and Their Protections

Threat: Sampling error
Confounding factor affecting generalizability: Sample does not represent the population.
Protections:
• Use random selection (random sampling).
• Use large samples.
• Select participants from multiple sites.
• Replicate the study with new samples.

Threat: Poor ecological validity
Confounding factor affecting generalizability: Conditions of the study are very different from real-world practice (when the research is administered in a manner that closely mirrors real-life practice, the results will be more generalizable).
Protections:
• Ensure the researcher is sensitive to issues of real-world practice.
• Replicate with effectiveness studies.
Sampling Error
A primary principle of quantitative research involves
generalizing the results from a sample to the population.
Sampling error is any difference that exists between the
population and the sample. The exact nature of sampling
error is not always known, because many characteristics
of the population are unknown. However, among known
characteristics it is possible to compare the sample with
the population to identify similarities and differences. For
example, boys are approximately five times more likely
than girls to be diagnosed with autism (CDC, 2012), although even this is an estimate from a sample. In a study
that intends to represent children with autism, a more
representative sample would be one with a similar gender
distribution.
Protections Against Sampling Error
Sampling methods influence external validity. Although
the best method for obtaining a representative sample is
to randomly select a large sample from the target population, this is not always possible. In random sampling
every individual in the population has an equal chance
of being selected. With a large random sample, you are
likely to select a group of participants that is representative of the population. Unfortunately, true random sampling rarely happens in health-care research, because it
is usually impractical to sample from an entire population. For example, in an intervention study of people with
multiple sclerosis, the population would be all individuals
with multiple sclerosis. A worldwide sampling and then
administration of the intervention would be next to impossible.
True random sampling does occur in some research
when the sample is smaller and accessible. For example, a
study of members of the American Occupational Therapy
Association could be obtained through random sampling
of the membership list.
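As a concrete illustration of random sampling from an accessible list, the Python sketch below draws a simple random sample from a hypothetical membership list; the list size, member identifiers, and sample size are all invented for the example. Because every member has an equal chance of being selected, a sufficiently large sample drawn this way tends to resemble the membership as a whole.

```python
# Illustrative simple random sample from a hypothetical membership list.
import random

membership_list = [f"member_{i:05d}" for i in range(1, 20_001)]  # 20,000 members

random.seed(42)
sample = random.sample(membership_list, k=500)  # each member equally likely

print(len(sample), "members selected, e.g.:", sample[:3])
```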
The most common method of sampling in health-care
research is convenience sampling. In this method, participants are selected because they are easily available to
the researcher. When conducting a study of individuals
with a particular condition or disability, it is likely that
the researcher will go to a treatment facility that provides
services to those individuals. Then the researcher might
ask for volunteers, or a clinician might approach each
person who meets the study criteria when that person is
admitted. The lack of randomness in the process presents
a high potential for introducing bias or sampling error.
When samples are selected from one school, one neighborhood, or one clinic, for example, they are more likely
to have characteristics that are different from the population as a whole; depending on the setting, they may be
poorer, older, or more symptomatic.
One method for reducing sampling error is by selecting a large sample from multiple settings. A larger sample
is more likely to approximate the population. In addition,
multiple settings can be selected to represent the heterogeneity of a population. For example, in considering the
generalizability of a study of children with attention deficit hyperactivity disorder (ADHD), it is more likely that
a sample would represent the population if children from
both urban and suburban locations in different areas of
the country were recruited, to better represent the racial
and socioeconomic characteristics of the population.
In the results section of a study, it is important for the
researcher to provide a detailed description of the study
participants. Many journals require that gender, age, and
race at a minimum be included. As an evidence-based
practitioner, you can review this information to determine if the sample is representative of the population.
However, more important to you is whether the sample
in the study is similar to the clients you work with. When
a study sample is similar to your clientele, you are more
justified in generalizing the findings.
Ecological Validity Threats
Ecological validity refers to the environment in which
a study takes place and how closely it represents the real
world. The treatment or method by which a study is administered, the time during which a study takes place,
and where a study takes place are all important considerations affecting the external validity or generalizability of
a study. Sometimes the administrators of the intervention
in a study are highly trained, more so than the typical
practitioner. The study time period may last longer than
the length of stay covered by most insurance companies,
or the intervention may be more intense than standard
practice. The study may take place in an inpatient setting,
although most individuals with the condition are actually
treated on an outpatient basis. Any differences from the
study conditions and real-world practice represent threats
to external validity. The generalizability of a particular
study will be good in situations that are similar to those in
the study and poor in those that are different.
Protections Against Ecological Validity Threats
Practitioners are more likely to apply research that is
relevant and practical to real-world practice, and studies
have greater ecological validity when they are sensitive
to typical practice situations. For example, a researcher
may ensure that the intervention takes place in the typical
time frame during which clients receive therapy, or that
the therapists providing the intervention are those who
already work in a particular type of hospital.
As a practitioner, it is important to apply the results
of a study cautiously and consider the similarity to your
own situation. A study is more generalizable to your practice and clients when the characteristics of the study are
similar. For example, a study by Sutherland et al (2012)
that examined exposure therapy for posttraumatic stress
disorder (PTSD) in veterans will be more applicable to
practice situations that involve treating veterans with
PTSD. The results of the study will be less applicable and
have less external validity for treating PTSD in women
who have experienced sexual abuse. In another example,
a well-designed study that involved 24 weeks of Tai Chi
showed that it was effective in improving balance for individuals with Parkinson’s disease (Tsang, 2013). However,
if you are unable to see clients for a 24-week time period,
this study is less relevant for your practice setting. Nevertheless, you may be able to use the results of this study to
justify to your administrators and/or insurance companies
why a longer length of stay is warranted.
Replication to Promote Generalizability
Replication, or reproducibility, is essential to the generalization of research and a primary principle of the
scientific method. It is important in the generalization
of both samples and situations. Study findings must be
capable of being repeated to ensure generalizability and
applicability of the results. If several studies yield similar
findings about the efficacy of an intervention or a predictor of an outcome, clinicians can be more confident
in those results.
Another consideration in replication is the researchers
themselves. Even if there are several studies that support a
particular approach, if all of those studies were conducted
by the same researcher, there should be some concern that
the results will not generalize to other situations. There
may be reasons why one researcher is able to garner more
positive findings than another. Perhaps that researcher
and his or her team are exceptional therapists and it is
their general competence as opposed to the actual intervention that makes the difference.
In addition, when findings are particularly surprising or remarkable, replication is important. These kinds
of findings are interesting, and thus will have a higher
likelihood of being published. However, replication will
reveal whether the findings were a result of chance (i.e., a
Type I error).
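The value of replication for catching chance findings can be shown with a little arithmetic: if an intervention truly has no effect and each study uses an alpha of .05, a single study produces a false-positive result about 5% of the time, but two independent studies both do so only about 0.25% of the time (.05 × .05). The Python simulation below makes the same point with invented parameters; both "groups" in each simulated study are drawn from the same population, so any significant difference is a false positive.

```python
# Illustrative simulation: chance (Type I error) findings rarely replicate.
# All parameters are invented for the example.
import random
from scipy import stats

random.seed(3)

def fake_study(n=30):
    """Return True if a study of two identical populations is 'significant'."""
    group_a = [random.gauss(50, 10) for _ in range(n)]
    group_b = [random.gauss(50, 10) for _ in range(n)]
    return stats.ttest_ind(group_a, group_b).pvalue < 0.05

trials = 2_000
single_positive = 0
both_positive = 0
for _ in range(trials):
    study1, study2 = fake_study(), fake_study()
    single_positive += int(study1)
    both_positive += int(study1 and study2)

print(f"Single study false-positive rate: {single_positive / trials:.3f}")  # about .05
print(f"Both studies false positive:      {both_positive / trials:.4f}")    # about .0025
```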
Replication is often a matter of degree. Studies rarely
follow the exact procedures of a previous study to determine whether the same results are obtained. Typically
variables are manipulated to extend the findings of previous research. A replication study may shorten an intervention period, utilize a different outcome measure,
apply the approach to a different sample, or administer
the intervention in a new setting. The ability of research
to build upon previous work is part of the power of the
scientific method.
EVIDENCE IN THE REAL WORLD
How Replication Changed the Perception of Facilitated Communication
In the early 1990s, there was great interest in facilitated communication for individuals with autism. Much of this interest came from the work of Biklen and colleagues (1992), who got surprising and amazing results with this technique. In facilitated communication, the facilitator provides physical assistance to help a person with autism type out a message on a keyboard. The assumption is that facilitated communication overcomes neuromotor difficulties that interfere with the abilities of a person with autism to communicate.
In an uncontrolled study of 43 individuals with autism, Biklen and colleagues reported startling outcomes. Previously nonverbal individuals were writing grammatically correct sentences and paragraphs, and even poetry. Skepticism about these findings led other researchers to conduct controlled studies. A review by Green (1994) found that when the facilitator’s influence was controlled, the technique was no longer useful. The review suggested that the facilitator’s belief in the potential of facilitated communication and the client’s untapped capabilities led him or her to unconsciously or unintentionally guide the communication process. Now organizations such as the American Speech and Hearing Association and the American Psychological Association assert that there is no evidence to support facilitated communication for individuals with autism.

INTERNAL VERSUS EXTERNAL VALIDITY
When designing a study, the researcher must find a balance between internal and external validity. Studies that are tightly controlled to maximize internal validity will have less external validity. For example, inclusion criteria that produce a homogeneous sample, a strict protocol for administering the intervention, expert intervention leaders, and limited exposure to alternative treatments will yield results that can be interpreted in the context of a cause-and-effect relationship, yet do not reflect everyday practice. In contrast, studies that are conducted under real-world conditions are “messier”; that is, there are not as many controls in place, and real-world conditions introduce more alternative explanations of the outcome—greater external validity at the expense of internal validity.
When accumulating research evidence regarding a
particular intervention, you may use a process that begins with studies with high internal validity and moves
to studies with greater external validity. First, it is important to know if an intervention is effective under ideal
conditions; that is, a highly controlled study with strong
internal validity.
Once the efficacy of the intervention is established, future studies can examine the same intervention in more
typical practice conditions. The difference between these
studies can be referred to as efficacy versus effectiveness.
An efficacy study is one that emphasizes internal validity
and examines whether an intervention is effective under
ideal conditions. With efficacy studies, you can be more
confident that the intervention is what made the difference; however, the conditions of the study are likely to
differ from real-world conditions.
In an effectiveness study, the study conditions are
more reflective of real-world practice; however, the untidy nature of practice means that there could be more
threats to internal validity in play. Studies about therapy
practices will always have threats to validity. Researchers face significant challenges in designing a study and
must find a balance that involves minimizing threats,
being pragmatic, and operating ethically. From the
Evidence 5-2 provides an example of an effectiveness
study that carries out a strength training intervention in
existing community fitness facilities.
EXERCISE 5-3
Managing Threats to Validity
in a Particular Research Study
(LO3, LO4, LO5, and LO6)
Childhood obesity is a major public health risk, and many efforts have been made to address the problem. A researcher is interested in studying a new intervention designed to increase the amount and intensity of physical activity for children in primary grades 1 through 3. Two schools have agreed to participate in the study. One school is located in an urban setting with children from mostly low socioeconomic and racially diverse backgrounds. Another school is located in a suburban setting with children from mostly high socioeconomic and Caucasian backgrounds.

QUESTIONS
Consider the following issues and describe how the researcher might reduce threats to validity. The following situations address both internal and external validity issues.
1. The schools in which the researcher plans to implement the study will not allow the researcher to randomly assign children to groups. What is the threat to validity, and how can the researcher manage this threat?
2. The researcher plans to increase interest in physical activity by including a climbing wall and other new but expensive equipment as part of the activity program. What is the threat to validity, and how can the researcher manage this threat?
3. To determine if the activity at school carries over to home, parents are asked to keep a log for one week of their child’s participation in activity, which includes type of activity, time engaged, and level of intensity. What is the threat to validity, and how can the researcher manage this threat?
FROM THE EVIDENCE 5-2
An Example of an Effectiveness Study
Minges, K. E., Cormick, G., Unglik, E., & Dunstan, D. W. (2011). Evaluation of a resistance training program for adults with or at risk
of developing diabetes: An effectiveness study in a community setting. International Journal of Behavioral Nutrition and Physical
Activity, 8, 50. doi:10.1186/1479-5868-8-50.
Note A: The researchers were interested
in taking an intervention with efficacy in a
controlled condition and assessing its
effectiveness in existing fitness facilities.
Note B: The large number of dropouts
(there were 86 participants at 2
months, but only 32 at 6 months) is not
unexpected in a fitness center. People
often discontinue their fitness program.
BACKGROUND:
To examine the effects of a community-based resistance training program (Lift for Life®)
on waist circumference and functional measures in adults with or at risk of developing
type 2 diabetes.
METHODS:
Lift for Life is a research-to-practice initiative designed to disseminate an
evidence-based resistance training program for adults with or at risk of developing type
2 diabetes to existing health and fitness facilities in the Australian community. A
retrospective assessment was undertaken on 86 participants who had accessed the
program within 4 active providers in Melbourne, Australia. The primary goal of this
longitudinal study was to assess the effectiveness of a community-based resistance
training program, thereby precluding a randomized, controlled study design. Waist
circumference, lower body (chair sit-to-stand) and upper body (arm curl test) strength,
and agility (timed up-and-go) measures were collected at baseline and repeated at 2
months (n = 86) and again at 6 months (n = 32).
RESULTS:
Relative to baseline, there was a significant decrease in mean waist circumference
(-1.9 cm, 95% CI: -2.8 to -1.0) and the timed agility test (-0.8 sec, 95% CI: -1.0 to -0.6);
and significant increases in lower body (number of repetitions: 2.2, 95% CI: 1.4-3.0)
and upper body (number of repetitions: 3.8, 95% CI: 3.0-4.6) strength at the completion
of 8 weeks. Significant differences remained at the 16-week assessment. Pooled time
series regression analyses adjusted for age and sex in the 32 participants who had
complete measures at baseline and 24-week follow-up revealed significant time effects
for waist circumference and functional measures, with the greatest change from
baseline observed at the 24-week assessment.
CONCLUSIONS:
These findings indicate that an evidence-based resistance training program
administered in the community setting for those with or at risk of developing type 2
diabetes can lead to favorable health benefits, including reductions in central obesity
and improved physical function.
Note C: Without a control group, you could be less certain that Lift for Life
made the difference. Perhaps individuals attending the fitness center took
advantage of other programs or were more likely to exercise outside the
program. Because the assessors are not blind to group assignment, they
may consciously or unconsciously show a bias in scoring individuals whom
they hoped were improving. This example demonstrates how improving
external validity can sometimes compromise internal validity.
FTE 5-2 Question Using the Lift for Life study example, how did improving external validity compromise internal validity?
CRITICAL THINKING QUESTIONS
1. Why is a large sample generally more desirable than a small sample in research (give at least three reasons)?
2. Why is a randomized controlled trial considered the strongest single study design?
3. Why might random assignment to groups result in ethical concerns?
4. Although pretests are generally desirable, how can they potentially pose a threat to validity?
5. For each of the following three situations, how can the researcher manage threats to validity to determine whether the new intervention is effective?
• Comparing a new intervention with a no-treatment control group
• Comparing a new intervention with a treatment-as-usual control group
• Comparing a new intervention with another evidence-based intervention
6. Explain the differences between random selection and random assignment. What aspects of validity are addressed by these research practices, and how?
7. Why is it difficult to design a study that is strong in both internal and external validity? How can you balance the two types of validity?

ANSWERS
EXERCISE 5-1
1. There are two reasons why fishing threats exist: The
researcher does not have a research hypothesis, and
four outcomes are being studied.
2. The study would be stronger if the researcher had a prior
hypothesis about which orthoses would be best for which
outcomes. This could be based on existing research or
the researcher’s clinical experience. To address the fact
that multiple outcomes are studied, the researcher should
adjust the alpha level of the statistical analysis or use a
statistic to control for multiple comparisons.
3. Ten people divided into three groups will result in a
study with very low power.
4. The researcher will want to recruit additional participants and may need to use another clinic or conduct
the study over a longer period of time. Power can also
be increased by reducing the number of groups, so the
researcher could compare two orthoses (although it
would still be best to have more than 5 participants
per group) or use a crossover design in which all of the
participants try all of the orthotics.
EXERCISE 5-2
1. Study #1
A. Maturation—No. Although there is no control
group and the study goes on for several weeks,
consider the normal course of the disorder in determining whether or not maturation is a threat to
validity. The normal course of Parkinson’s disease
is progressively deteriorating, so you would not expect improvement without treatment.
B. History—Yes. The conclusion of the study is that
an interdisciplinary movement disorders program
was effective in improving movement problems for
people with Parkinson’s disease. The medication
adjustments could be a history threat: 81 of the
91 participants received a medication adjustment
during the intervention. Although participants
without medication adjustments had similar improvements, which provides some support for the
therapy, the design of this study makes it difficult to
determine whether it was the therapy or the medications (or both) that made the difference.
C. Testing—Yes. There are a few issues with testing.
The Timed Up and Go Test uses time as an outcome, and the two-minute walk test is measured
in terms of distance covered. With these objective
outcomes, you would not be as concerned about
biased assessments. However, the FIM and Berg
Balance Scale do involve judgment on the part of
the therapist, and a therapist who has been involved
in the intervention and wants to see improvement
may tend to rate the participants, albeit unintentionally, higher than an unbiased rater would. In
addition, medication effectiveness in Parkinson’s
disease varies across the day, so the time of day at
which assessments were administered could affect
the outcome.
2. Study #2
A. Maturation—No; in this case there is a no-treatment control group with random assignment. Even if the control group improved, the intervention group’s improvement was greater than the control group’s, which suggests that the intervention group improved over and above any typical development.
B. Selection—No, random assignment helps to promote equal group assignment. The table provides
additional support that the two groups were comparable at the outset.
C. Instrumentation—Yes. The self, parent, and teacher
reports are problematic. The children, parents,
and teachers knew about the intervention and may
have been biased toward providing a more positive report. The use of observational methods (i.e.,
observing the child in social situations) would enhance this aspect of the study.
D. Attrition—Yes. Ten children in the intervention
group were unavailable at the follow-up testing period: eight of these children dropped out, and two
were removed for behavioral reasons. It is possible that these 10 children were not responding as
well to the intervention and that could be why they
dropped out. The two who were removed were not
benefiting. If these children were included in the
findings, it is possible, even likely, that the results
would be less positive. This should be taken into
account when evaluating how effective the intervention is and for how many children.
EXERCISE 5-3
1. The major concern with lack of randomization is that
there will be selection threats to validity. To address
this concern, it is important to use strategies that will
reduce any differences that might occur between the
groups. In school-based research, it is common for one
classroom to receive an intervention while the other
classroom does not. You would not want to make one
school an intervention setting and one school a control
setting, because the distinct differences in the schools
might account for differences you find in the intervention. Instead, you could randomly assign classrooms at
each school to receive or not receive the intervention.
You might address ethical concerns for the control
group not receiving the intervention by using a waitlist control design (i.e., you will eventually provide the
intervention to the control group). A drawback to this
approach is that there is the potential for greater experimenter and participant bias. Blinding of the testers
and reducing exposure of the students and teachers,
particularly during physical activity, would help address these concerns. It would also be useful to provide
the control group with equal attention to something
new without introducing additional physical activity.
For example, you might have the control group participate in board games.
2. The inclusion of expensive equipment makes it less
likely that other schools will be able to implement this
intervention, thereby making it less generalizable. The
researcher should consider redesigning the intervention to use equipment that is typically available in most
school settings; however, in doing so the researcher
may lose the novelty or excitement that would be created by the new equipment.
3. Asking parents to keep a log introduces instrumentation threats. Maintaining a log for a week is asking a
great deal of the parents, and it is unlikely that you
will receive complete data. The researcher could use
more objective means, such as an accelerometer that
the children wear to record the time spent engaged
in activity. Another method that is less burdensome to
the parent is time sampling. In time sampling, usually
at random intervals, a timer indicates that a log entry
should be made. Using this method the parent only
has to respond to the timer and not keep records at all
times. Both of these methods still present concerns.
For example, the parents may forget to put the accelerometer on, the child may lose the accelerometer, or
the parent still may not respond to a time-sampling
approach.
All of these examples speak to the challenges of designing a study. It is virtually impossible to design a perfect
study with no threats to validity. Researchers typically
weigh their options and make choices given the particular research question, the ethical concerns presented, and
pragmatic issues.
FROM THE EVIDENCE 5-1
1. Yes, the p value for all of the comparisons is > 0.05,
indicating there is no statistically significant difference
between the groups at baseline.
2. If the groups start out at different levels of back pain,
this could affect/confound the results of the study.
For example, if the control group had less pain and
the intervention group had more pain at baseline, even
without the intervention, maturation may result in the
intervention group having a greater recovery, because
there may be more room for improvement in the intervention group. The control group may not be able
to improve much because they already are not experiencing a great deal of pain. In this example from the
evidence, it is a good thing that the groups are equivalent on the outcome measure of back pain as well as
other demographic variables.
FROM THE EVIDENCE 5-2
Without a control group, you could be less certain that
Lift for Life made the difference. Perhaps individuals attending the fitness center took advantage of other
programs or were more likely to exercise outside of the
program. Also, you would expect less precision among the
assessors, who would not be blind and may vary from site
to site.
REFERENCES
Biklen, D., Morton, M. W., Gold, D., Berrigan, C., & Swaminathins, S.
(1992). Facilitated communication: Implications for individuals with
autism. Topics in Language Disorders, 12(4), 1–28.
Centers for Disease Control and Prevention (CDC). (2012). Prevalence of autism spectrum disorders: Autism and developmental
disability monitoring network, 14 sites, US, 2008. Retrieved from
http://www.cdc.gov/mmwr/preview/mmwrhtml/ss6103a1.htm?s_
cid=ss6103a1_w
del Pozo-Cruz, B., Parraca, J. A., del Pozo-Cruz, J., Adsuar, J. C., Hill, J.,
& Gusi, N. (2012). An occupational, internet-based intervention to
prevent chronicity in subacute lower back pain: A randomized controlled trial. Journal of Rehabilitation Medicine, 44, 581–587.
Frankel, F., Myatt, R., Sugar, C., Whitham, C., Gorospe, C. M., &
Laugeson, E. (2010, July). A randomized controlled study of parent-assisted Children’s Friendship Training with children having autism
spectrum disorders. Journal of Autism and Developmental Disorders,
40(7), 827–842. doi:10.1007/s10803-009-0932-z
Godi, M., Franchignoni, F., Caligari, M., Giordano, A., Turcato, A. M.,
& Nardone, A. (2013). Comparison of reliability, validity, and
responsiveness of the Mini-BESTest and Berg Balance Scale in
patients with balance disorders. Physical Therapy, 93, 158–167.
Green, G. (1994). The facilitator’s influence: The quality of the evidence. In H. C. Shane (Ed.), Facilitated communication: The clinical
and social phenomenon (pp. 157–226). San Diego, CA: Singular.
Killen, J. D., Fortmann, S. P., Newman, B., & Varady, A. (1990). Evaluation of a treatment approach combining nicotine gum with
self-guided behavioral treatments for smoking relapse prevention.
Journal of Consulting and Clinical Psychology, 58, 85–92.
Mayo, E. (1949). Hawthorne and the Western Electric Company:
The social problems of an industrial civilisation. London, UK:
Routledge.
McCarney, R., Warner, J., Iliffe, S., van Haselen, R., Griffin, M., &
Fisher, P. (2007, July 3). The Hawthorne effect: A randomised, controlled trial. BMC Medical Research Methodology, 7, 30.
Minges, K. E., Cormick, G., Unglik, E., & Dunstan, D. W. (2011,
May 25). Evaluation of a resistance training program for adults with
or at risk of developing diabetes: An effectiveness study in a community setting. International Journal of Behavioral Nutrition and Physical
Activity, 8, 50. doi:10.1186/1479-5868-8-50
Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom. New York,
NY: Holt, Reinhart & Winston.
Sabbag, S., Twamley, E. W., Vella, L., Heaton, R. K., Patterson, T. L.,
& Harvey, P. D. (2012). Predictors of the accuracy of self assessment
of everyday functioning in people with schizophrenia. Schizophrenia
Research, 137, 190–195.
Sammons Preston. (n.d.). Jamar hand dynamometer owner’s manual.
Retrieved from https://content.pattersonmedical.com/PDF/spr/
Product/288115.pdf
Sutherland, R. J., Mott, J. M., Lanier, S. H., Williams, W., Ready, D. J.,
& Teng, E. J. (2012). A pilot study of a 12-week model of group-based exposure therapy for veterans with PTSD. Journal of Traumatic Stress, 25(2), 150–156.
Tsang, W. W. (2013). Tai Chi training is effective in reducing balance
impairments and falls in patients with Parkinson’s disease. Journal of
Physiotherapy, 59, 55.
“Research is formalized curiosity. It is poking and prying with a purpose.”
—Zora Neale Hurston, writer
6
Choosing Interventions for Practice
Designs to Answer Efficacy Questions
CHAPTER OUTLINE
LEARNING OUTCOMES
KEY TERMS
INTRODUCTION
RESEARCH DESIGN NOTATION
BETWEEN- AND WITHIN-GROUP COMPARISONS
RESEARCH DESIGNS FOR ANSWERING EFFICACY QUESTIONS
  Designs Without a Control Group
  Randomized Controlled Trials
  Crossover Designs
  Nonrandomized Controlled Trials
  Factorial Designs
  Single-Subject Designs
  Retrospective Intervention Studies
SAMPLE SIZE AND INTERVENTION RESEARCH
USING A SCALE TO EVALUATE THE STRENGTH OF A STUDY
COST EFFECTIVENESS AS AN OUTCOME
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES
LEARNING OUTCOMES
1. Use scientific notation to explicate the design of a given study.
2. Differentiate between group comparisons, within-group comparisons, and interaction effects.
3. Identify the design of a given intervention study.
4. Use levels of evidence, threats to validity, and the PEDro Scale to evaluate the strength of the evidence
for a given study.
KEY TERMS
between-group comparison
cluster randomized controlled trial
control group
crossover study design
factorial design
interaction effect
nonequivalent control group design
nonrandomized controlled trial
pre-experimental design
prospective
quality-adjusted life year (QALY)
quasi-experimental study
randomized controlled trial (RCT)
research design notation
retrospective cohort study
retrospective intervention study
single-subject design
within-group comparison
INTRODUCTION
It is important for evidence-based practitioners to evaluate the strength of the evidence. This point has been well
established. That said, if a randomized controlled trial (RCT)
provides the strongest evidence for intervention studies,
why don’t all researchers use this design? One reason other
designs are employed relates to the use of resources: A randomized controlled trial can be expensive, in terms of actual
cost, time, and people power. When evaluating a new intervention, it may be unwise to invest a lot of time, energy, and
money in a study when the outcomes are as yet unknown.
Researchers frequently decide to first test an intervention using a simpler design (e.g., a pretest-posttest without a control
group) as a pilot study. This approach informs the researcher
if it is prudent to proceed with a more complex study design.
In addition, funding agencies such as the National Institutes
of Health often expect preliminary results that show promise
for an intervention before they will fund a larger-scale RCT. If
a number of smaller studies demonstrate that a specific intervention appears to have merit, an RCT is often the next
logical step.
Additional reasons for using other research designs include ethical and pragmatic considerations. In the case of
intervention research, the researcher is motivated to design a study that best answers questions about the efficacy
of the intervention. In doing so, individual needs are not
considered when participants are randomly assigned to
groups. This is particularly true when one group receives
an intervention and the other group does not. Knowledge
may be acquired from the study that helps future clients,
but the individuals participating in the study may not receive the best treatment. Therefore, an ethical decision
may require use of a study design in which all participants
receive the preferred intervention. In other situations, there
may not be enough individuals with a particular condition
to conduct a two-group comparison, or the intervention
provided is so individualized that it is unreasonable to
combine participant results.
This chapter introduces various research designs that can
be used to answer questions about the efficacy of interventions. The randomized controlled trial is described, as are several other design options. Familiarity with these designs will
provide information to use when selecting interventions for
practice.
RESEARCH DESIGN NOTATION
Research design notation is a system that uses characters to diagram the design of intervention studies.
Although research articles rarely display the notation,
familiarity with research design notation is useful
because it can help a reader break down and analyze a
particular study.
The primary characters used in the notation system
include:
R = randomization
N = nonrandomization
X = treatment
O = dependent variable or outcome
Consider a simple RCT of two groups in which one
group receives the experimental treatment and the other
group acts as a control group, receiving no treatment.
There is a pretest and posttest. The research design notation for this study would look like this, with the first O
representing the pretest and the last O representing the
posttest:
R   O   X   O
R   O       O
In studies that compare treatments, the X can be
further specified with a subscript, such as X1 and X2, or
with letters, as in Xa and Xb. In fact, designating the
treatments with letters that represent the intervention
can be even more effective. For example, Keser et al
(2013) compared Bobath trunk exercises with routine
neurorehabilitation for individuals with multiple sclerosis; they used the Trunk Impairment Scale (TIS), Berg
Balance Scale (BBS), International Cooperative Ataxia
Rating Scale (ICARS), and Multiple Sclerosis Functional Composite (MSFC) as outcome measures before
and after the intervention. In this case, the treatments
could be notated as Xb (Bobath exercises) and Xr (routine neurorehabilitation). In addition, this study utilized
multiple outcome measures, which can be identified in
the notation. A comprehensive notation of this study
might look like this:
R   O(TIS, BBS, ICARS, MSFC)   Xb   O(TIS, BBS, ICARS, MSFC)
R   O(TIS, BBS, ICARS, MSFC)   Xr   O(TIS, BBS, ICARS, MSFC)
The specificity with which a study is notated depends
on your need to describe the study design. If you wish to
better understand when different measures are administered during a study, you may choose to abbreviate and
specify the particular measures, as done in the preceding
example. However, if you simply want to know at what
points the measures were administered, you may simply
identify the testing time with an O, but use no further
abbreviation.
This chapter describes several different designs that
are commonly used in intervention research, each of
which includes the research design notation with the
description. In addition, you have the opportunity to
practice creating notations with examples from the
research.
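Although no software is needed to use the notation, it can help to think of a design as nothing more than rows of symbols. The following minimal Python sketch (illustrative only; the variable name and layout are not from the text) stores a two-group pretest-posttest RCT as rows and prints the diagram:

    # Minimal illustration: research design notation stored as rows of symbols.
    # R = randomization, O = observation/outcome, X = treatment, blank = no treatment.
    rct_with_control = [
        ["R", "O", "X", "O"],  # intervention group: pretest, treatment, posttest
        ["R", "O", " ", "O"],  # control group: pretest, no treatment, posttest
    ]
    for row in rct_with_control:
        print("   ".join(row))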
BETWEEN- AND WITHIN-GROUP
COMPARISONS
In intervention research, most designs include comparisons both “between” and “within” groups. A
between-group comparison is one in which a comparison is made to identify the differences between
two or more groups. In most efficacy studies, the
between-group comparison examines the differences
between the intervention group and control group. The
control group may receive no treatment, standard treatment, or some form of alternate treatment. As the name
implies, a within-group comparison makes a comparison inside the same group, most often a comparison of
scores before and after an intervention. An interaction
effect combines the between- and within-group comparisons, so an interaction is said to occur when there is
a difference in how a group performs over time. These
distinctions are important when reading and analyzing
the results of research studies.
The previous example from the multiple sclerosis
study (Keser et al, 2013) can be used to demonstrate
between- and within-group comparisons. In this study,
the two groups included the Bobath treatment group
and the routine treatment group. Consider one outcome measure, the Trunk Impairment Scale (TIS). A
between-group comparison might compare the two
groups before the intervention takes place, as shown
here:
R   O(TIS)   Xb   O(TIS)
R   O(TIS)   Xr   O(TIS)
(the two groups' pretest O(TIS) scores are compared)
The study could also compare the two groups after the
intervention takes place:
R   O(TIS)   Xb   O(TIS)
R   O(TIS)   Xr   O(TIS)
(the two groups' posttest O(TIS) scores are compared)
In contrast, a within-group comparison examines differences in the pretest and posttest scores for each group
separately:
R   O(TIS)   Xb   O(TIS)
and
R   O(TIS)   Xr   O(TIS)
(within each group, the pretest and posttest O(TIS) scores are compared)
However, the most important comparison in an intervention study is the combination of the between and
within comparisons, or the interaction effect. It is important to determine whether there was a difference in the
way that one group performed from pretest to posttest,
compared with the other group’s performance from pretest to posttest. When there is a difference in how two
groups respond, an interaction has occurred; if that difference is large enough, the interaction effect is statistically
significant. When the intervention group improves more
than the control group, the intervention was more effective than the control group treatment in causing a positive
change in the outcome. Often, a study concludes that the
intervention group improved (a within-group comparison), but so did the control or comparison group. If the
degree of improvement is comparable for both groups, no
interaction effect is present.
Researchers use different types of analyses (many still
based on the t-test and ANOVA statistics) to determine
if two groups differ over time. One common approach
involves treating the pretest as a covariate and analyzing
the difference in the posttest scores of the two groups
using an analysis of covariance (ANCOVA). In other
studies, the researcher might first compute the difference in pretest and posttest scores and then compare the
two groups in terms of their difference scores. If there
are only two groups, the comparison can be made using
an independent sample t-test; if there are three or more
groups, the comparison can be made with a one-way
ANOVA.
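As a rough sketch of the covariate approach, the following Python example (hypothetical data; pandas and statsmodels are assumed to be available) fits an ANCOVA-style model in which group membership predicts the posttest score while the pretest score is entered as a covariate:

    # Illustrative sketch only: ANCOVA-style analysis with the pretest as a covariate.
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical scores for a two-group pretest-posttest study.
    data = pd.DataFrame({
        "group":    ["intervention"] * 5 + ["control"] * 5,
        "pretest":  [12, 14, 11, 13, 15, 12, 13, 14, 11, 15],
        "posttest": [19, 21, 18, 20, 22, 14, 15, 16, 13, 17],
    })

    # Posttest modeled from group membership, adjusting for the pretest covariate.
    model = smf.ols("posttest ~ C(group) + pretest", data=data).fit()
    print(model.summary())  # the C(group) term tests the adjusted between-group difference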
It is helpful to graph the pretest and posttest scores
of two groups to illustrate the interaction. Figures 6-1
through 6-4 show several different examples of potential
outcomes and identify whether or not an interaction effect occurred. Each graph provides a visual to help determine whether an interaction occurred and the degree to
which the two groups differed. The between and within
comparisons are also described as main effects; the combination of main effects becomes the interaction.
Understanding Statistics 6-1
Difference statistics are used to analyze the within, between, and interaction effects described in this section. In Chapter 4, t-tests and ANOVAs were identified as common statistics for analyzing differences. Several approaches can be used, but a basic and common method of analyzing a two-group comparison with a pretest and posttest utilizes t-tests and ANOVAs.

The between-group comparison at pretest or posttest can be made using an independent sample t-test; a separate t-test would be conducted for the pretest and for the posttest. If the t-test results in a p value of less than 0.05, the difference between the two groups is statistically significant. In a table with Intervention and Control as rows and Pretest and Posttest as columns, the between-group comparisons are made down each column:

Intervention versus Control at pretest
Intervention versus Control at posttest

The within-group comparison can be made using a dependent sample t-test. Again, two separate t-tests would be done, one for the intervention group and one for the control group. A p value of less than 0.05 would mean that there is a difference between the pretest and posttest scores. Here the comparisons are made across each row:

Intervention: pretest versus posttest
Control: pretest versus posttest

A mixed-model ANOVA is used to combine the within and between comparisons to determine if an interaction effect has occurred. If p < 0.05, an interaction effect has occurred. In the literature, a mixed-model ANOVA is sometimes described as a repeated measures ANOVA, even when it includes both between and within comparisons. It is important for evidence-based practitioners to know that the interaction effect is the way to determine whether the intervention group improved more than the control group; this analysis compares both the intervention and control groups at pretest and posttest.
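To make these comparisons concrete, here is a brief Python sketch (hypothetical scores; scipy is assumed to be available) of the independent and dependent t-tests described above, plus a difference-score comparison that addresses the interaction:

    # Illustrative sketch only: t-test comparisons for a two-group pretest-posttest study.
    from scipy import stats

    int_pre, int_post = [12, 14, 11, 13, 15], [19, 21, 18, 20, 22]  # hypothetical intervention group
    ctl_pre, ctl_post = [12, 13, 14, 11, 15], [14, 15, 16, 13, 17]  # hypothetical control group

    # Between-group comparisons (independent sample t-tests) at pretest and at posttest.
    print(stats.ttest_ind(int_pre, ctl_pre))
    print(stats.ttest_ind(int_post, ctl_post))

    # Within-group comparisons (dependent sample t-tests) for each group separately.
    print(stats.ttest_rel(int_pre, int_post))
    print(stats.ttest_rel(ctl_pre, ctl_post))

    # One simple way to examine the interaction: compare the two groups' change scores.
    int_change = [post - pre for pre, post in zip(int_pre, int_post)]
    ctl_change = [post - pre for pre, post in zip(ctl_pre, ctl_post)]
    print(stats.ttest_ind(int_change, ctl_change))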
Figure 6-1 shows no effects. In this case, there are
neither main effects nor interaction effects. There is no
difference between or within groups, which makes it impossible for an interaction effect to occur. In this example,
neither group improved after the intervention.
FIGURE 6-1 Example of a graph showing no effects.

Figure 6-2 shows main effects within groups, but no interaction effect. There is a main effect within a group (pre- to post-improvement), but there is only a small difference between groups (no main effect between groups), and the parallel lines indicate that there is no interaction effect. In this case, both groups improved in a similar fashion.

FIGURE 6-2 Example of a graph showing main effects within groups, but no interaction effect.
Figure 6-3 is a graph showing main and interaction
effects. In this case, there is a clear interaction effect (i.e.,
the lines cross). The control group actually gets worse,
whereas the intervention group improves. There are also
main effects both between and within groups. There is a
difference between the groups at pretest and at posttest,
and there is a difference within the groups from pretest
to posttest, although the change runs in opposite directions for the two groups.
Figure 6-4 is a graph showing some main effects
and an interaction effect. In this example, the groups
start out at the same place, so there is no between-group
main effect at pretest. However, the groups separate at
posttest, indicating a between-group main effect. Both
groups improved to some degree, resulting in a withingroup main effect. In addition, there is an interaction
effect because the pattern of improvement is different:
The intervention group improves more than the control
group.
A well-written research article will include a graph of
the interaction effect when relevant; however, when it is
not included, you can graph the interaction yourself.
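One quick way to do so is with a line graph of the group means. The sketch below (hypothetical means, not taken from any study discussed here; matplotlib is assumed) plots pretest and posttest scores for an intervention and a control group, where non-parallel lines suggest an interaction:

    # Illustrative sketch only: graphing a pretest-posttest interaction.
    import matplotlib.pyplot as plt

    timepoints = ["Pretest", "Posttest"]
    intervention_means = [2.0, 4.5]  # hypothetical group means
    control_means = [2.1, 2.6]

    plt.plot(timepoints, intervention_means, marker="o", label="Intervention")
    plt.plot(timepoints, control_means, marker="o", label="Control")
    plt.ylabel("Outcome score")
    plt.legend()
    plt.title("Non-parallel lines suggest an interaction effect")
    plt.show()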
FIGURE 6-3 Example of a graph showing main and interaction effects.

FIGURE 6-4 Example of a graph showing some main effects and an interaction effect.

EXERCISE 6-1
Graphing and Describing Between-Group Comparisons, Within-Group Comparisons, and Interaction Effects From a Study (LO2)

This exercise returns to the multiple sclerosis study (Keser et al, 2013). The following are actual data from the study measuring the pretest and posttest scores for both groups on the Trunk Impairment Scale:

Group      Pretest Score      Posttest Score
Bobath     13.90              19.20
Routine    12.60              18.00

QUESTIONS
1. Graph these results using a line graph similar to those shown in Figures 6-1 through 6-4. Next, identify whether there are main effects for the between-group and within-group comparisons and whether there is an interaction effect. Finally, write a sentence summarizing the results of this study for the Trunk Impairment Scale.

RESEARCH DESIGNS FOR ANSWERING EFFICACY QUESTIONS
This section provides more detailed information about
the specific designs typically used to answer efficacy questions about an intervention. Each section includes the
scientific notation for that design and an example of an
actual study from the research literature.
Designs Without a Control Group

Technically speaking, a study without a control group is not considered an experimental design, because one of the requirements of experimental design is that there be a comparison between two groups. However, it is important to understand the pretest-posttest design without a control group, because that design is frequently encountered in the literature. Researchers may use this design initially to determine whether an intervention has the potential to make a difference, before investing the time and money in a more extensive RCT. This design is sometimes referred to as a pre-experimental design because the intent is to examine a cause-and-effect relationship. However, the lack of a control group significantly limits the researcher's ability to conclude that such a relationship exists. Recall that, according to the levels-of-evidence hierarchy, this design type yields a lower level of evidence: Level IV, meaning it is weaker than either an RCT or a nonrandomized controlled trial.

The notation for a pre-experimental design looks like this:

O   X   O

The absence of a control group is a significant limitation of this study type. As described in Chapter 5, many threats to validity, such as maturation and regression to the mean, are managed with the use of a control group. In an example, Seo and Park (2016) examined the efficacy of gyrokinesis exercises to improve gait for women with low back pain and found improvements on a number of gait outcomes; however, this study provides only preliminary evidence, and future studies are needed to compare the intervention with a control group to eliminate threats to validity such as maturation. From the Evidence 6-1 presents a table from the study showing the differences in pretest and posttest scores.

Randomized Controlled Trials

Randomized controlled trials (RCTs) are extremely valuable in intervention research. The RCT is the highest level of evidence for a single study, rated a Level II in the level-of-evidence hierarchy used in this text. A well-designed RCT with a large sample size that finds an advantage for the target intervention provides strong support for the conclusion that the intervention caused the positive outcome.

The use of the term control merits some additional discussion. Strictly speaking, a control group is one in which the participants do not receive an intervention. A drug study in which one group receives a drug and the other group receives a placebo is a true control condition. The notation for a "true" randomized controlled trial is as follows:

R   O   X   O
R   O       O

However, no-treatment control groups are often avoided for ethical reasons, because denying treatment can have adverse consequences. In addition, in rehabilitation research it is naturally more difficult to provide a true placebo. An exception is a study by Chang and colleagues (2010) that examined the efficacy of Kinesio taping to increase grip strength in college athletes. In this study, three conditions were compared: (1) Kinesio taping, (2) no treatment, and (3) placebo Kinesio taping. The results indicated no difference in grip strength across the three conditions.

Because a no-treatment control condition is often avoided in therapy studies, an alternative design that is frequently used compares a new intervention with treatment as usual, or standard therapy. The term randomized controlled trial typically is still used to describe these studies. In other studies, two or more specific interventions may be compared to identify the preferred approach. The notation for the basic design of these studies looks like this:

R   O   Xa   O
R   O   Xb   O

One limitation of a study design without a true control is the existence of a maturation threat to validity. Without a no-intervention group, you cannot know what would have happened as a result of natural healing or development. An example of standard versus new treatment is provided by a study that compared a dynamic exercise program with conventional joint rehabilitation for individuals with arthritis (Baillet et al, 2009). The results found short-term benefits for the dynamic exercise program; however, these benefits were not sustained in the long term. A protocol for another study describes a head-to-head comparison of Rapid Syllable Transition Treatment and the Nuffield Dyspraxia Programme to address childhood apraxia (Murray, McCabe, & Ballard, 2012). These studies have the potential to assist therapists in making clinical decisions about the preferred approach.

Yet another research approach is to compare a standard intervention with standard intervention plus a new intervention. The notation includes the combined treatment:

R   O   Xs        O
R   O   Xs + Xa   O

For example, in a study of individuals with eating disorders, both groups received standard outpatient treatment, but the intervention group also received Basic Body Awareness (Catalan-Matamoros et al, 2011). The Basic Body Awareness intervention involved exercises with focused attention on the experience of the movement.
FROM THE EVIDENCE 6-1
Pretest-Posttest Without a Control Group
Whitson, H. E., Whitaker, D., Potter, G., McConnell, E., Tripp, F., Sanders, L. L., Muir, K. W., Cohen, H. J., & Cousins, S. W. (2013). A low-vision rehabilitation program for patients with mild cognitive deficits. JAMA Ophthalmology, 131, 912–919.
Table 4. Comparison of Functional and Cognitive Measures Before and After Participation in MORE-LVR

Measure | Before MORE-LVR, Mean (SD) | After MORE-LVR, Mean (SD) | p Value (a)

Vision-Related Function: Self-Reported by Patient
VFQ-25 Composite score | 47.2 (16.3) | 54.8 (13.8) | .01 (b)
VFQ-25 Near-activities score | 21.5 (14.0) | 41.0 (23.1) | .02 (b)
VFQ-25 Social functioning score | 56.3 (37.6) | 80.3 (21.7) | .06
VFQ-25 Distance-activities score | 27.8 (17.8) | 31.8 (15.9) | .75
VFQ-25 Dependency score | 45.1 (22.6) | 53.5 (28.8) | .53
VFQ-25 Role difficulties score | 39.5 (22.6) | 33.4 (24.7) | .57
Individual goal attainment score (range, 0 to 9) | 0.4 (0.5) | 5.7 (2.8) | .001 (b)
Patient-reported satisfaction with IADL ability (range, -24 to 24) | -1.4 (12.0) | 6.1 (9.9) | .05 (b)

Timed Performance Measures (No. of Seconds to Complete Each Task)
Filling in a crossword puzzle answer | 205 (103) | 123 (92) | .003 (b)
Making a 4-item grocery list | 155 (116) | 99 (62) | .03 (b)
Looking up a telephone number in a telephone book | 221 (99) | 228 (79) | .30
Answering questions about a recipe in a cookbook | 230 (99) | 240 (80) | .99

Neurocognitive Scores
Logical memory: immediate recall | 19.7 (9.7) | 22.9 (9.9) | .07
Logical memory: delayed recall | 13.0 (8.5) | 18.7 (12.4) | .02 (b)

Abbreviations: IADL, instrumental activities of daily living; MORE-LVR, Memory or Reasoning Enhanced Low Vision Rehabilitation; VFQ-25, Vision Function Questionnaire.
(a) Comparison based on Wilcoxon signed rank test.
(b) Values that are significant at an error level of P < .05.
Note A: This table illustrates a within-group comparison of the pretest
and posttest scores on several measures. There is no between-group comparison, as this study is made up of only one group.
FTE 6-1 Question
On which outcome was the improvement the greatest?
Participants receiving Basic Body Awareness had greater decreases in symptoms of eating disorders, such as drive for thinness and body dissatisfaction, than the group that received only standard treatment. This study design provides the benefit of showing whether the additional treatment offered an additional advantage. However, the Hawthorne effect can threaten the validity of such a design, because those individuals receiving the additional treatment are also receiving additional attention. (Recall from Chapter 5 that the Hawthorne effect occurs when participants respond to the fact that they are participating in a study rather than to the actual intervention.) From the Evidence 6-2 is a flowchart of the procedures used in the study.
When the efficacy of a particular intervention is established, subsequent studies may use RCTs to further
specify the optimal conditions of administration of the
intervention, such as intensity, training of the providers,
and setting of administration. A follow-up study may also
compare variations of the intervention. A notation is not
provided here, because the designs can vary considerably.
However, to meet the criteria for an RCT, the design
must include randomization to at least two conditions.
A study of constraint-induced therapy provides an example. In this study, both groups received the same type
and amount of practice, but participants were randomly
assigned to wear a sling during the practice or voluntarily
constrain (not use) their nonaffected arm during practice
(Krawczyk et al, 2012). The results indicated that both
groups improved, but there was no difference between
the two conditions.
In some randomized controlled trials, a pretest is not
used. This can present a threat to validity in that, without
a pretest, one does not know if the groups were equivalent
on the outcome of interest at the start. In addition, this
design does not provide information regarding the extent
of change that occurred. With a large sample, the randomization to group will likely lead to equivalence, but
that is not a certainty. Posttest-only studies are typically
those in which a pretest would influence the outcome of
the posttest (a testing threat to validity) or those in which
it is expected that all participants will start out at a similar
point. For example, if the outcome of interest is rate of
rehospitalization, a pretest is not possible. As another example, a fall prevention program may enroll participants
who have not experienced a fall, but as an outcome assess
the number of falls after the intervention period. The notation for a posttest-only RCT is:
R   X   O
R       O
Crossover Designs
Crossover study designs, in which participants are randomly assigned to groups, are considered to be at the
same level of evidence as randomized controlled trials
(Level II). In a crossover study design, all participants receive the same treatment, but in a different order. In some
crossover studies, a no-treatment condition is compared
with a treatment condition. In this case, one group starts
with the treatment and then receives the control condition; the other group begins with the control and moves
to the intervention. The notation provides a useful illustration of this design:
R   O   X   O       O
R   O       O   X   O
In other crossover studies, two different interventions
are compared in different orders:
R   O   Xa   O   Xb   O
R   O   Xb   O   Xa   O
Crossover designs are most useful for interventions in
which a permanent change is not expected. Otherwise,
the second condition will be affected, which would present a history threat to validity. Crossover designs are often
used in studies of assistive devices/technologies. For example, in one study of individuals with brain injury, two
reminder systems were compared: typical reminder systems such as calendars, lists, and reminders from service
providers, versus Television Assisted Prompting (TAP),
in which reminders were programmed and then provided
audiovisually through the client’s at-home television
(Lemoncello, Sohlberg, Fickas, & Prideaux, 2011). One
group received TAP prompting first, followed by typical
prompting; the other group received the interventions
in the reverse order. From the Evidence 6-3 shows the
order in which the interventions were presented for each
group. The study found that prompting using TAP was
effective in increasing memory.
Nonrandomized Controlled Trials
As the name implies, the only difference between a randomized controlled trial and a nonrandomized controlled
trial relates to the allocation of subjects to groups. In a
nonrandomized controlled trial, participants do not
have an equal chance of being assigned to a condition.
The lack of randomization can lead to bias or differences
between the two groups. For this reason a nonrandomized controlled trial yields a lower level of evidence than
a randomized controlled trial. The nonrandomized controlled trial is a Level III in this text’s evidence hierarchy.
Instead of random assignment, allocation to groups can
occur by asking for volunteers for the intervention first
and then matching individuals to a control group. Individuals who volunteer for an intervention may be more
amenable to treatment and thereby differ from individuals who participate in the control group that does not
receive the intervention. The nonrandomized approach
often is used for pragmatic or ethical reasons, with one
setting receiving the intervention and the other serving
FROM THE EVIDENCE 6-2
Randomized Controlled Trial
Catalan-Matamoros, D., Helvik-Skjaerven, L., Labajos-Manzanares, M. T., Martínez-de-Salazar-Arbolea, A, & Sánchez-Guerrero, E.
(2011). A pilot study on the effect of Basic Body Awareness Therapy in patients with eating disorders: A randomized controlled trial.
Clinical Rehabilitation, 25(7), 617–626.
N = 102 met the inclusion criteria
N = 74 refused to participate
N = 28 were pretested and randomly allocated into two groups

Experimental group (N = 14): received the BBAT intervention; losses: N = 0; N = 14 followed the posttest after 10 weeks of the pretest.

Control group (N = 14): losses: N = 6 (n = 2, lack of time; n = 2, lack of transportation; n = 2, other reasons); N = 8 followed the posttest after 10 weeks of the pretest.

Note A: This flowchart illustrates the selection process and random assignment to groups.
Note B: The control group lost six participants,
whereas all participants in the intervention group
were available at follow-up. Mortality is a threat to
validity; the fact that all of the dropouts were in the
control group suggests that control participants may
have experienced compensatory demoralization
as a threat to internal validity.
FTE 6-2 Question
Would the threat to mortality in this study be even greater because of the small sample size?
Understanding Statistics 6-2
Even with randomization to groups, it is still possible for groups to differ on important demographic
characteristics or outcome variables. This is particularly true when sample sizes are small. If the researcher identifies differences between groups, the
variable in which the difference lies can be covaried so that the groups are made equivalent statistically. In this case an ANCOVA is typically used.
For example, in a study of children, the subjects in
the intervention group are older than the children
in the control group; hence, differences in development and education that may be related to age could
affect the outcomes of the study. In this case, the
researcher may choose to covary age; the statistic
removes the variability associated with age from the
analysis.
as the control or standard treatment group. In this case,
differences in the settings can bias the results and present
a selection threat to validity.
When the settings are randomly assigned to group,
the study design may be called a cluster randomized
controlled trial; however, this is not a true RCT, as individuals do not have an equal chance of being assigned
to the study conditions. Although there are advantages
associated with randomly assigning the settings to a condition, the potential exists for setting bias, making these
designs nonrandomized. A nonrandomized controlled
trial is strengthened by efforts to ensure that there are as
few differences as possible between the two groups. Testing for any systematic differences in the groups that could
threaten the validity of findings should be done prior to
the start of the study, to avoid conducting a study that is
seriously flawed from the outset.
All of the designs described in the randomized controlled trial section could apply to the nonrandomized
controlled trial, with the difference occurring at the initial
FROM THE EVIDENCE 6-3
Crossover Design
Lemoncello, R., Sohlberg, M. M., Fickas, S., & Prideaux, J. (2011). A randomised controlled crossover trial evaluating Television Assisted
Prompting (TAP) for adults with acquired brain injury. Neuropsychological Rehabilitation, 21, 825–826.
Assessment and goal-setting sessions → Random assignment

Group A: TAP x 2 weeks → TYP x 2 weeks → TAP x 2 weeks → TYP x 2 weeks → Final data collection and interview

Group B: TYP x 2 weeks → TAP x 2 weeks → TYP x 2 weeks → TAP x 2 weeks → Final data collection and interview

Note A: In a crossover design, both groups receive both treatments, but in a different order.
TAP = television-assisted prompting
TYP = typical reminders
FTE 6-3 Question
What is the benefit of providing TAP and typical reminders twice to each group?
group assignment, such that nonrandomized designs can
include a true control:
N   O   X   O
N   O       O

or a comparison treatment:

N   O   Xa   O
N   O   Xb   O
Nonrandomized controlled trials may also be referred
to as quasi-experimental studies or nonequivalent
control group designs, indicating that the two groups
may be different due to the lack of randomization. For
example, Ferguson and colleagues (Ferguson, Jelsma,
Jelsma, & Smits-Engelsman, 2013) compared neuromotor training with a Wii fitness program to improve motor
skills in children with developmental coordination disorder. The children’s allocation to the treatment group
depended on the school they attended, so that children
at two schools received neuromotor training, and children at the other school received the Wii fitness program.
Therefore, children did not have an equal chance of being
assigned to either intervention. From the Evidence 6-4
provides a more detailed description of the rationale for
FROM THE EVIDENCE 6-4
Nonrandomized Controlled Trial
Ferguson, G. D., Jelsma, D., Jelsma, J., & Smits-Engelsman, B. C. (2013). The efficacy of two task-orientated interventions for children
with developmental coordination disorder: Neuromotor task training and Nintendo Wii Fit training. Research in Developmental
Disabilities, 34(9), 2449–2461. ISSN 0891-4222, http://dx.doi.org/10.1016/j.ridd.2013.05.007.
2.1. Research design and setting
A pragmatic, single blinded, quasi-experimental design was used
to compare the effect of two intervention programmes. Cluster
sampling was used to select three mainstream primary schools
(i.e., A, B and C) located within a low-income community in Cape
Town, South Africa.
Allocation to treatment group was determined by school of
attendance. Children attending schools A and B received NTT
while children attending school C received Nintendo Wii training. A
non-randomized approach was used as it was not possible to
provide Nintendo Wii training at either school A or B over the study
period due to damage to the power supply at both schools. Apart
from the functioning power supply, there were no significant
differences between schools in terms of playground facilities,
socioeconomic backgrounds of the learners, school fees, staff
ratios or curriculum.
NTT = neuromotor task training
FTE 6-4 Question The study is described as single-blinded. Based on the description, who would be blind, and how
does this blinding strengthen the validity of the study?
the design, with an excerpt from the methods section of
the journal article. The study found that both interventions were effective, but neuromotor training resulted
in greater improvements in more areas of motor performance. The authors explain why randomization was not
possible and provide some evidence to support the contention that the classrooms are equivalent. However, not
all differences can be accounted for, so there is still the
potential for bias or differences in the classroom receiving
the Wii training.
Factorial Designs
Factorial designs use either a randomized or nonrandomized approach to group assignment. However, factorial designs are distinguished from other designs by including
more than one independent variable. In a factorial design
for an intervention study, one independent variable is the
intervention condition. The additional independent variable is typically included to determine if the intervention
had differential effects on that additional variable. For example, a study may include gender as a second independent variable and then determine if the intervention was
more effective in males or females. In the following notation, two interventions are designated as a and b, and gender is designated as m and f. The notation looks like this:
R   O   Xam   O
R   O   Xaf   O
R   O   Xbm   O
R   O   Xbf   O
The notation illustrates the additional complexity of
a factorial design. In this instance, there are four conditions: intervention A with males, intervention A with females, intervention B with males, and intervention B with
females. For the researcher, this complexity also translates into additional challenges for recruitment, because
a larger sample size is necessary to allow for additional
comparisons. Each condition requires enough participants to detect a difference and represent the population.
Factorial designs are also described in terms of the
number of levels within each independent variable. The
preceding example is a 2 ⫻ 2 design because there are
two levels of the intervention, A and B, and two levels of
gender, male and female. A factorial design that compares
two different interventions (first independent variable)
and three different settings (second independent variable)
would be described as a 2 ⫻ 3 design. Factorial designs
can include a third independent variable, although doing
so greatly increases the complexity of the analysis and interpretation. The two preceding examples could be combined, such that two interventions are compared, along
with gender and setting, creating a 2 ⫻ 2 ⫻ 3 design.
Factorial designs can also be used to compare intervention conditions (the first independent variable) and
health conditions (the second independent variable). For
example, a memory intervention can be compared with
a control condition for healthy older adults, older adults
with Alzheimer’s disease, and older adults with Parkinson’s disease (2 ⫻ 3 factorial design).
In still another variation of the factorial design, two interventions can be examined together. As an example of
this design, a study compared whole body vibration therapy
(WBV) with a control condition (the first independent variable) and two doses of vitamin D, a conventional dosage and
a higher dosage, for women over age 70 (Verschueren et al,
2011). This 2 ⫻ 2 factorial design found improvements in all
groups; however, the group with the conventional dosage
of vitamin D and without WBV had musculoskeletal outcomes comparable to those of the other three groups, suggesting that there was no benefit to higher doses of vitamin
D or WBV therapy. From the Evidence 6-5 is a table of
the findings comparing the four groups.
EXERCISE 6-2
Identifying the Research Design Using
Scientific Notation of a Study (LO1)
Use your knowledge of scientific notation to diagram
the study described here, recognizing that many
real-world studies will include variations and/or combinations of the designs described in this chapter.
Kreisman, B. M., Mazevski, A. G., Schum, D. J., & Sockalingam, R.
(2010, March). Improvements in speech understanding with wireless binaural broadband digital hearing instruments in adults with sensorineural
hearing loss. Trends in Amplification, 14(1), 3-11. [Epub 2010 May 10].
doi:10.1177/1084713810364396
“This investigation examined whether speech intelligibility in noise can
be improved using a new, binaural broadband hearing instrument system.
Participants were 36 adults with symmetrical, sensorineural hearing loss
(18 experienced hearing instrument users and 18 without prior experience).
Participants were fit binaurally in a planned comparison, randomized crossover design study with binaural broadband hearing instruments and advanced digital hearing instruments. Following an adjustment period with
each device, participants underwent two speech-in-noise tests: the QuickSIN
and the Hearing in Noise Test (HINT). Results suggested significantly better
performance on the QuickSIN and the HINT measures with the binaural
broadband hearing instruments, when compared with the advanced digital
hearing instruments and unaided, across and within all noise conditions.”
Understanding Statistics 6-3
The statistical analysis of a factorial design can get
very complicated. In a factorial design, there are also
interaction effects and main effects. In the immediately preceding example, the interaction effect would
examine the pattern of differences for a memory intervention and control condition for healthy older
adults, older adults with Alzheimer’s disease, and
older adults with Parkinson’s disease. This 2 ⫻ 3 factorial design would have six groups. An ANOVA can
reveal if there is an interaction between the independent variables of intervention condition and health
condition. In other words, does one health group
benefit more than another health group AND is
there a difference between the control and intervention conditions? Figure 6-5 is a hypothetical graph
that illustrates the number of names remembered as
the outcome of the intervention. The interaction effect would show if there is a difference in the health
conditions and intervention conditions, and the graph
does in fact appear to indicate a difference (i.e., the
healthy condition benefits more than Alzheimer’s and
Parkinson’s conditions).
However, the interaction effect is only part of
the story. You would still want to know if there is
a difference between the intervention and control
groups (a main effect), and if there are differences
between the three health conditions (another main
effect). These follow-up tests are typically referred
to as post hoc analyses. There are many different ways
in which these analyses can be done. In this case it
would be possible to compare the intervention and
control group using a between-group t-test. In this
main effect, all of the health conditions are combined
into one group.
Main Effect of Treatment Condition
(Independent Sample t-test)
Intervention
Control
You could compare the three groups (healthy,
Alzheimer’s, and Parkinson’s) with their intervention and control groups combined, using a oneway ANOVA to examine the main effect of health
condition.
Main Effect of Health Condition
(One-Way ANOVA)
Healthy
Alzheimer’s
Parkinson’s
FIGURE 6-5 Interaction effect of health condition and treatment condition (ANOVA analysis). (Line graph: number of names remembered for the healthy, Alzheimer's, and Parkinson's groups, with separate lines for the intervention and control conditions.)
This analysis will reveal if there is a difference
among the three groups.
There are still other post hoc tests that could be
done. For example, you may want to know if there is
a difference between the Alzheimer’s and Parkinson’s
groups for individuals who received the intervention,
which could be determined with a between-group
t-test. There are many ways that post hoc analyses can
be carried out and in many cases researchers use statistics that help control for Type I error due to multiple
analyses. The particular statistics are beyond the scope
of this chapter, but evidence-based practitioners who
understand all of the different comparisons that can
be made should be better able to interpret the results
sections of research articles.
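As a rough illustration of how such a factorial analysis might be run, the following Python sketch (entirely hypothetical data and labels; pandas and statsmodels are assumed) fits a two-way ANOVA with treatment condition and health condition as factors, where the interaction term corresponds to the effect graphed in Figure 6-5:

    # Illustrative sketch only: two-way ANOVA for a 2 x 3 factorial design.
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Hypothetical "names remembered" scores for each of the six cells.
    cells = {
        ("intervention", "healthy"):    [9, 8, 9, 10],
        ("intervention", "alzheimers"): [4, 5, 4, 5],
        ("intervention", "parkinsons"): [6, 6, 5, 7],
        ("control", "healthy"):         [5, 6, 5, 6],
        ("control", "alzheimers"):      [3, 4, 3, 4],
        ("control", "parkinsons"):      [5, 5, 4, 6],
    }
    rows = [{"treatment": t, "health": h, "names": v}
            for (t, h), values in cells.items() for v in values]
    data = pd.DataFrame(rows)

    # Main effects for treatment and health plus the treatment-by-health interaction.
    model = smf.ols("names ~ C(treatment) * C(health)", data=data).fit()
    print(anova_lm(model, typ=2))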
FROM THE EVIDENCE 6-5
Factorial Design
Verschueren, S. M., Bogaerts, A., Delecluse, C., Claessens, A. L., Haentjens, P., Vanderschueren, D., & Boonen, S. (2011). The effects of
whole-body vibration training and vitamin D supplementation on muscle strength, muscle mass, and bone density in institutionalized
elderly women: A 6-month randomized, controlled trial. Journal of Bone and Mineral Research, 26, 42–49. doi:10.1002/jbmr.181.
Note A: The 2 X 2 design of this study
uses two different levels of two different
interventions (WBV and vitamin D).
Table 5. Separate Group Results and Interaction Effects

Percentage change from baseline (SE) in each group and p values for an interaction effect between the two interventions.

Groups (left to right): whole body vibration (WBV) training with high-dose vitamin D (1600 IU daily; n = 26); WBV training with conventional-dose vitamin D (880 IU daily; n = 28); no WBV training with high-dose vitamin D (1600 IU daily; n = 29); no WBV training with conventional-dose vitamin D (880 IU daily; n = 28).

Isometric muscle strength (Nm): +6.07% (2.14) | +3.01% (2.67) | +1.10% (2.44) | +0.11% (3.18) | interaction p = .330
Dynamic muscle strength (Nm): +11.41% (4.42) | +4.71% (2.13) | +4.94% (2.66) | +8.07% (3.17) | interaction p = .600
Muscle mass (cm3): −0.36% (0.72) | −0.16% (0.57) | +0.02% (0.72) | −0.25% (0.38) | interaction p = .350
Hip bone mineral density (g/cm2): +0.78% (0.39) | +0.71% (0.42) | +0.78% (0.39) | +0.99% (0.51) | interaction p = .179
Serum vitamin D level (nmol/L): +200.01% (46.89) | +146.80% (35.78) | +172.25% (37.91) | +183.02% (38.34) | interaction p = .668
Note B: Because none of the p values is
< .05, there is no difference between the four
groups on any of the outcome measures.
FTE 6-5 Question Although the table does not indicate any differences among the groups, which of the following
questions can be answered by the study?
1. Is high-dose vitamin D more effective than conventional-dose vitamin D?
2. Is whole body vibration training plus vitamin D treatment more effective than vitamin D treatment alone?
3. Is whole body vibration training more effective than no treatment?
Single-Subject Designs
Unlike the study designs described earlier, single-subject
designs do not aggregate the scores of participants.
Instead, the results of each individual are examined separately to determine if the intervention was effective.
Because average scores are not calculated and groups
are not compared, a different methodology is employed
to infer cause-and-effect relationships. The basis of
the single-subject design is to compare an individual’s
response under different conditions. Cause-and-effect
relationships can be inferred from a strong single-subject design study when there is a clear difference
between behavior that occurs when an intervention is
present and that which occurs when the intervention
is absent. The methodology of the single-subject study
differs from group designs, but still answers questions of
causality.
Single-subject research is a within-participant design;
that is, each participant is his or her own control. Repeated measures must be taken over time, while certain
conditions are held constant. Typically the conditions involve a baseline in which no treatment is applied, which
is compared with a treatment phase.
A different notation system is used for a single-subject design. A simple example is an ABA design. The first
A represents the initial phase of the study and comprises
a baseline observation and collection of data. Then an
intervention is provided, which is represented by the
B phase. The same observation and collection of data
continue. Finally, the intervention is removed, and observation and data collection continues, indicated by
the second A. If a cause-and-effect relationship exists,
a change in behavior occurs during the intervention.
Just as importantly, when the intervention is removed,
the improvement should either disappear or decrease to
some degree. Without the last phase and a change from
the intervention phase, it is more difficult to attribute the
change to the intervention.
The expectation that behaviors will return to the previous level once the intervention is discontinued, or that the
improvement will wane, suggests that this design is most
useful for interventions in which a permanent change in
behavior is not expected. The single-subject design is useful for conditions in which there are few individuals to
study or it is undesirable to aggregate the results because
there is great variability in the outcomes measured or the
intervention provided.
For example, Collins and Dworkin (2011) studied
the efficacy of a weighted vest for increasing time on
task for typically developing second graders. From the
Evidence 6-6 shows a graph from the study depicting
the response of each child to the weighted vest. The
participant response was variable, with some children
increasing their time on task during the wearing of the
vest and other children decreasing their time on task.
Therefore, the results did not support the use of the
weighted vest in this situation.
A major limitation of single-subject research is the
problem of generalizability from a small sample. Using
the previous example, it is possible that if only two students were included and those students happened to be
the ones with dramatic positive responses, the conclusions
of the study would have been different.
Replication is an important concept in single-subject research. Replication occurs in the form of participants; each
participant replicates the design. With multiple participants,
if the results are similar for all participants, you can have
greater confidence that the intervention caused the outcome. Furthermore, the use of multiple baseline designs
can strengthen the cause-and-effect conclusion. It is not
unusual to see an ABAB design or even an ABABAB design
in single-subject studies. If improvement during the intervention and a decline at baseline are consistently shown,
evidence-based practitioners can be more assured that the
intervention was effective and caused the change in behavior.
Frequently a second intervention period is added to
single-subject designs, in which case the study would be
described as ABAB. Multiple baselines help support the
intervention as the change agent. Any number of baseline
and intervention periods may be included. For example,
some single-subject designs use more than one type of
intervention, resulting in a design notated as ABACA.
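Single-subject data are usually examined visually, phase by phase. The short Python sketch below (hypothetical session data; matplotlib is assumed to be available) plots one participant's ABA data and prints each phase mean, one simple way to look for a change that appears with the intervention and fades when it is withdrawn:

    # Illustrative sketch only: inspecting hypothetical ABA single-subject data.
    import matplotlib.pyplot as plt

    phases = {
        "A1 (baseline)":    [35, 40, 38, 36, 41],  # % of time on task, hypothetical
        "B (intervention)": [62, 65, 70, 68, 72],
        "A2 (withdrawal)":  [45, 42, 40, 44, 43],
    }

    session = 1
    for label, values in phases.items():
        sessions = list(range(session, session + len(values)))
        plt.plot(sessions, values, marker="o", label=label)
        print(label, "mean:", sum(values) / len(values))
        session += len(values)

    plt.xlabel("Session")
    plt.ylabel("% of time on task")
    plt.legend()
    plt.show()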
Retrospective Intervention Studies
The study designs presented up to this point have been
prospective in nature; that is, the researcher designs the
study and then administers the intervention and collects
the data. In a retrospective intervention study, the
researcher looks back at something that has already occurred and uses existing records to collect the data.
Retrospective studies are not experimental because the
independent variable is not manipulated. Instead, these
studies are described as observational, because existing
conditions are observed. Sometimes these studies are
called retrospective cohort studies because they utilize
and compare existing groups (cohorts) of individuals.
The primary disadvantage of a retrospective study
is that conditions cannot be controlled, and numerous
threats to internal validity are likely present in the existing
conditions. However, because a retrospective intervention
study typically examines practices that have taken place in
real-world clinical situations, the study can have greater
external validity. From the Evidence 6-7 is the abstract
of a retrospective study that compared individuals with
stroke receiving less than 3 hours and more than 3 hours
of rehabilitation therapies (Wang et al, 2013). The study,
which was conducted after individuals were discharged
from the rehabilitation center, found benefits for individuals who received more therapy.
FROM THE EVIDENCE 6-6
Single-Subject Design
Collins, A., & Dworkin, R. J. (2011). Pilot study of the effectiveness of weighted vests. American Journal of Occupational Therapy,
65(6), 688–694. doi:10.5014/ajot.2011.000596.
(Line graph: percentage of time on task for Participants 1, 3, 4, 7, 8, 9, and 11 across the baseline, intervention, and withdrawal phases.)
Note A: The graph illustrates the performance
of each participant in the intervention group.
This study differs from many single-subject design
studies in that it includes a control group. As depicted
in the graph, the childrenʼs data were analyzed
individually rather than using a group average.
FTE 6-6 Question
What conclusion would you draw as to the efficacy of the weighted vest for improving time
on task?
EXERCISE 6-3
Identify the Study Design (LO3)
Locate the abstracts on PubMed for the following
studies examining the efficacy of fall prevention
programs. After reading each abstract, determine
the study design and identify the independent variable(s) in the study. Is the intervention compared
with a control group, usual care, or another intervention? Is it a factorial study with more than one
independent variable?
1. Li, F., Harmer, P., Stock, R., Fitzgerald, K., Stevens, J.,
Gladieux, M., . . . Voit, J. (2013). Implementing an
evidence-based fall prevention program in an outpatient clinical setting. Journal of the American Geriatrics
Society, 61(12), 2142-2149.
2. Bhatt, T., & Pai, Y. C. (2009). Prevention of slip-related backward balance loss: The effect of session
FROM THE EVIDENCE 6-7
Retrospective Cohort Study
Wang, H., Camiciam, N. M., Terdiman, J., Mannava, M. K., Sidney, S., & Sandel, M. E. (2013). Daily treatment time and functional gains
of stroke patients during inpatient rehabilitation. Physical Medicine and Rehabilitation, 5, 122–128.
Note A: The study began after the patients had
received treatment and were discharged.
OBJECTIVE: To study the effects of daily treatment time on functional gain of patients who have had a stroke.
DESIGN: A retrospective cohort study.
SETTING: An inpatient rehabilitation hospital (IRH) in northern California.
PARTICIPANTS: Three hundred sixty patients who had a stroke and were discharged from the IRH in 2007.
INTERVENTIONS: Average minutes of rehabilitation therapy per day, including physical therapy, occupational therapy, speech
and language therapy, and total treatment.
MAIN OUTCOME MEASURES: Functional gain measured by the Functional Independence Measure, including activities of
daily living, mobility, cognition, and the total of the Functional Independence Measure (FIM) scores.
RESULTS: The study sample had a mean age of 64.8 years; 57.4% were men and 61.4% were white. The mean total daily
therapy time was 190.3 minutes, and the mean total functional gain was 26.0. A longer daily therapeutic duration was
significantly associated with total functional gain (r = .23, P = .0094). Patients who received a total therapy time of <3.0 hours
per day had significantly lower total functional gain than did those treated ≥3.0 hours. No significant difference in total
functional gain was found between patients treated ≥3.0 but <3.5 hours and ≥3.5 hours per day. The daily treatment time of
physical therapy, occupational therapy, and speech and language therapy also was significantly associated with corresponding
subscale functional gains. In addition, hemorrhagic stroke, left brain injury, earlier IRH admission, and a longer IRH stay were
associated with total functional improvement.
CONCLUSIONS: The study demonstrated a significant relationship between daily therapeutic duration and functional gain
during IRH stay and showed treatment time thresholds for optimal functional outcomes for patients in inpatient rehabilitation
who had a stroke.
Note B: Retrospectively, the
researchers divided the patients
into groups of those who received
< 3 hours of therapy per day and
those who received > 3 hours
of therapy per day.
Note C: Although the researchers imply that more treatment
resulted in greater functional gain, the retrospective design
means it is likely that individuals in the two groups are
different on other factors. For example, those receiving less
therapy may have greater impairments and therefore be less
responsive to intervention.
FTE 6-7 Question This abstract indicates one possible reason why the groups may not be equivalent (i.e., individuals
with more severe conditions may not receive as much therapy). What are other possible differences between the groups
that could have influenced the outcomes?
SAMPLE SIZE AND INTERVENTION
RESEARCH
As explained in previous chapters, sample size is an important consideration in evaluating the strength of a study.
A larger sample reduces the likelihood of making a Type II
error (i.e., when the researcher finds no difference between
groups, but actually a difference exists) and in general reduces sampling error so that the results of the study are
more likely to reflect the true population. However, it is
expensive to use large samples, and researchers must always
balance sample size and pragmatics. Chapter 4 provides
more information about determining sample size based on
statistical formulas termed “power estimates.” Nevertheless, when evaluating the strength of the evidence, with all
other things being equal, a study with a larger number of
participants provides more reliable data than a study with a
smaller number of participants. When considering studies
with similar designs (e.g., comparing two well-designed
RCTs), a study with 100 participants provides stronger
evidence than a study with 20 participants. Furthermore,
in group comparison studies it is important that each
group be adequately represented. Therefore, every time
a researcher uses an additional group or, in the case of a
factorial study, a new independent variable, the researcher
must increase the number of participants.
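To see how a power estimate translates into an actual number of participants, consider the short sketch below, written in Python with the statsmodels library. The effect size, alpha, and power values are illustrative assumptions rather than figures from any study discussed in this chapter; a real calculation would draw the expected effect size from pilot data or prior research.

# Minimal sketch of an a priori power estimate for a two-group comparison.
# All input values are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # assumed standardized difference between groups (Cohen's d)
    alpha=0.05,       # accepted risk of a Type I error
    power=0.80,       # desired probability of avoiding a Type II error
)
print(f"Participants needed per group: {n_per_group:.0f}")  # about 64 per group

Note that halving the assumed effect size to 0.25 roughly quadruples the required sample, which is one reason large trials are so expensive.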
USING A SCALE TO EVALUATE
THE STRENGTH OF A STUDY
The randomized controlled trial is the highest level of evidence of a single study, yet the quality of RCTs can vary.
EVIDENCE IN THE REAL WORLD
When Is a Practice Really Evidence-Based?
Medical reports in the media often highlight findings based on a newly published research study and can imply
that the intervention under study is the next best thing. Hopefully, by now you are well aware that a single study,
regardless of how well it is designed, does not provide sufficient evidence to warrant certainty that an intervention is effective. Because evidence-based practice is highly valued in health care, it is common to hear the term
evidence-based used as an adjective to describe a particular intervention. Critical consumers of research know that for
an intervention to truly be evidence-based, there must be an accumulation of strong evidence across multiple studies.
The best evidence for the efficacy of an intervention requires that several conditions be in place, including:
(1) multiple, well-designed randomized controlled trials, (2) using large numbers of participants, (3) finding the intervention to be more effective than the control and other comparison conditions, (4) on outcomes that matter to the client, family, and society, and (5) with an intervention that is generalizable to real-world practice. When sufficient
evidence is available, it is now commonplace for professional organizations and other relevant groups to develop
sets of practice guidelines that summarize and make recommendations for practice based on the evidence. The
use of practice guidelines is covered in greater detail in Chapter 10. However, particularly with new approaches,
the evidence is often sparse. The practitioner’s ability to evaluate the quality and strength of the individual studies
that do exist is extremely useful when making clinical decisions about an intervention of interest.
As explained in Chapter 5, several threats to validity can exist, even with a randomized controlled trial. Even more threats can arise when assignment to groups is not randomized. For example, when assessors are not blind to group assignment, they may exhibit bias when scoring participants who are known to be in the intervention group.
The PEDro Scale (Maher et al, 2003) was developed so
that a numerical rating could be applied to a study to objectively assess the methodological quality of an individual
study. The term PEDro was used because it was initially developed to rate the quality of studies on the Physiotherapy
Evidence Database. Box 6-1 lists the 11 items of the PEDro
Scale. Each item is rated as present (1) or not present (0).
The 11-item scale includes one item that assesses external validity (Item #1), eight items that assess internal
validity (Items #2–9), and two items that assess the reporting of outcomes (Items #10 & 11). Points are only
awarded when a criterion is clearly satisfied. For criteria 4
and 7-11, key outcomes are those outcomes that provide
the primary measure of the effectiveness (or lack of effectiveness) of the therapy. In most studies, more than
one variable is used as an outcome measure.
The PEDro Scale is often used in rehabilitation to assess
the quality of a study design. The PEDro database for physical therapy and the OTseeker database for occupational
therapy include abstracts of relevant studies and their corresponding PEDro ratings. Some systematic reviews also
use the PEDro Scale to assess the quality of the existing
evidence.
BOX 61 PEDro Scale Items
1. Eligibility criteria were specified.
2. Subjects were randomly allocated to groups (in a
crossover study, subjects were randomly allocated
an order in which treatments were received).
3. Allocation was concealed.
4. The groups were similar at baseline regarding
the most important prognostic indicators.
5. There was blinding of all subjects.
6. There was blinding of all therapists who administered the therapy.
7. There was blinding of all assessors who measured at least one key outcome.
8. Measurements of at least one key outcome
were obtained from more than 85% of the subjects initially allocated to groups.
9. All subjects for whom outcome measurements
were available received the treatment or control condition as allocated or, where this was
not the case, data for at least one key outcome
was analyzed by “intent to treat.”
10. The results of between-group statistical comparisons are reported for at least one key outcome.
11. The study provided both point measurements
and measurements of variability for at least one
key outcome.
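To make the scoring mechanics concrete, the sketch below (in Python) tallies a PEDro total from a set of item ratings. The ratings are hypothetical and simply mirror the pattern reported for the walking trial in Exercise 6-4 that follows; by PEDro convention, the eligibility criteria item (Item 1) is recorded but does not count toward the total, so scores range from 0 to 10.

# Minimal sketch of totaling a PEDro rating. Item 1 (eligibility criteria)
# addresses external validity and is not counted, so the total is out of 10.
ratings = {
    1: 1,   # eligibility criteria specified (recorded, not counted)
    2: 1,   # random allocation
    3: 0,   # concealed allocation
    4: 1,   # baseline comparability
    5: 0,   # blinding of subjects
    6: 0,   # blinding of therapists
    7: 1,   # blinding of assessors
    8: 1,   # adequate follow-up for at least one key outcome
    9: 1,   # intention-to-treat analysis
    10: 1,  # between-group comparisons reported
    11: 1,  # point measures and variability reported
}
pedro_total = sum(score for item, score in ratings.items() if item != 1)
print(f"PEDro score: {pedro_total}/10")  # 7/10 for this hypothetical rating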
EXERCISE 6-4
Considering Design, PEDro Ratings,
and Threats to Validity in Evaluating
the Strength of a Study (LO4)
The abstract and PEDro rating for a randomized
controlled trial are presented here. Based on this
information, identify which of the threats to validity
in the accompanying table were likely controlled
and which were a potential problem. Provide a brief
rationale for your responses.
Walking the line: a randomised trial on the effects of a
short term walking programme on cognition in dementia
Eggermont LH, Swaab DF, Hol EM, Scherder EJ
Journal of Neurology, Neurosurgery, and Psychiatry
2009 Jul;80(7):802-804
clinical trial
7/10 [Eligibility criteria: Yes; Random allocation: Yes;
Concealed allocation: No; Baseline comparability: Yes;
Blind subjects: No; Blind therapists: No; Blind assessors:
Yes; Adequate follow-up: Yes; Intention-to-treat analysis:
Yes; Between-group comparisons: Yes; Point estimates and
variability: Yes. Note: Eligibility criteria item does not
contribute to total score] *This score has been confirmed*
(These are the PEDro ratings).
Background
Walking has proven to be beneficial for cognition in healthy sedentary older people. The aim
of this study was to examine the effects of a walking intervention on cognition in older people
with dementia.
Methods
Ninety seven older nursing home residents
with moderate dementia (mean age 85.4 years;
79 female participants; mean Mini-Mental
State Examination 17.7) were randomly allocated to the experimental or control condition.
Participants assigned to the experimental condition walked for 30 min, 5 days a week, for
6 weeks. To control for personal communication, another group received social visits in the
same frequency. Neuropsychological tests were
assessed at baseline, directly after the 6 week
intervention and again 6 weeks later. Apolipoprotein E (ApoE) genotype was determined.
Results
Differences in cognition between both groups
at the three assessments were calculated using
a linear mixed model. Outcome measures included performance on tests that formed three
domains: a memory domain, an executive function domain and a total cognition domain. Results indicate that there were no significant time
× group interaction effects or any time × group × ApoE4 interaction effects.
Conclusion
Possible explanations for the lack of a beneficial effect of the walking programme on cognition could
be the level of physical activation of the intervention or the high frequency of comorbid cardiovascular disease in the present population of older
people with dementia.
Threat to Validity | Yes | No | Rationale
Maturation | | |
Assignment | | |
Rosenthal effect | | |
Hawthorne effect | | |

COST EFFECTIVENESS AS AN OUTCOME
An important consideration in the evaluation of an intervention is cost effectiveness. Policy makers want to apply resources sensibly by spending money on things that influence health the most; likewise, clients want to use their health-care dollars wisely. In cost-effectiveness studies, different interventions can be compared as to their cost and efficacy, or the cost of a single intervention can be calculated so that interested consumers can assess its value. For example, a systematic review of treatments for aphasia found a cost of $9.54 for each percentage point of improvement on the targeted outcome (Ellis, Lindrooth, & Horner, 2013). In addition, this study found that initial sessions yielded greater benefits than later sessions. The first three sessions had a cost of $7 for a percentage point of improvement, whereas later sessions cost more than $20 to achieve the same improvement.

In a different type of analysis, Jutkowitz et al (2012) studied the cost effectiveness of a particular intervention called Advancing Better Living for Elders. This intervention, aimed at reducing functional difficulties and mortality using occupational and physical therapy approaches in the home, had a cost of $13,179 for each year of extended life.

Cost-effectiveness studies often use the quality-adjusted life year (QALY) to assess the impact of an intervention. A QALY combines an assessment of quality of life and the number of years of life added by an intervention. Quality of life is measured on a scale of 1 to 0, with 1 being the best possible health and 0 representing death. A negative score is possible, which suggests a quality of life worse than death. The QALY is calculated by multiplying the number of years of extra life by the quality-of-life indicator. For example, if an intervention extended life by 4 years with a 0.6 quality of life, the QALY value would be 4 × 0.6 = 2.4. A systematic review examining numerous approaches to fall prevention used QALY to determine the approach with the highest economic benefit (Frick et al, 2010). Although vitamin D was the least expensive approach, home modifications resulted in the greatest increase in QALY points per dollar spent.
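The sketch below (in Python) works through the same arithmetic for two interventions. The 4 years of extra life at a 0.6 quality-of-life value comes from the example above; the costs and the second intervention are hypothetical numbers included only to show how cost per QALY allows approaches to be compared.

# Minimal sketch of a QALY calculation and a cost-per-QALY comparison.
# Intervention A reproduces the 4-year, 0.6 quality-of-life example from the
# text; all costs and Intervention B are hypothetical.
def qalys_gained(extra_years, quality_of_life):
    # QALY = years of extra life multiplied by the quality-of-life indicator (0 to 1)
    return extra_years * quality_of_life

interventions = {
    "Intervention A": {"extra_years": 4.0, "quality_of_life": 0.6, "cost": 12000},
    "Intervention B": {"extra_years": 2.0, "quality_of_life": 0.9, "cost": 4500},
}

for name, d in interventions.items():
    qaly = qalys_gained(d["extra_years"], d["quality_of_life"])
    print(f"{name}: {qaly:.1f} QALYs gained, ${d['cost'] / qaly:,.0f} per QALY")
# Intervention A: 2.4 QALYs gained, $5,000 per QALY
# Intervention B: 1.8 QALYs gained, $2,500 per QALY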
CRITICAL THINKING QUESTIONS
1. What factors should you take into account when determining if a study's findings are strong enough to warrant your use of a particular intervention?
2. What is the difference between a randomized controlled trial and a cluster randomized controlled trial? What threats to internal validity are more likely to be present in a cluster randomized controlled trial?
3. Under what circumstances might a single-subject design be preferable to a group comparison?
4. Why is the interaction effect most important in determining if an intervention is more effective than a control or comparison condition?
5. Describe a hypothetical study that uses a 3 × 2 design.
6. Why are some intervention approaches not amenable to study with a crossover design?
7. What does the PEDro Scale contribute beyond the levels-of-evidence hierarchy?
ANSWERS
EXERCISE 6-1
1. The graph from the data should look similar to the
graph in Figure 6-2. There are no main effects between groups. Although there are slight differences
between groups both before and after treatment,
the differences are small (less than 1.5 points) on a
scale of scores that ranges up to 19.2. Therefore, it
is unlikely that these differences are statistically significant. In contrast, the differences within groups
from pretest to posttest are quite large: 5.3 points
for the intervention group and 5.4 points for the
control group. Thus, there is a main effect within
each group. No interaction effect is present. Both
groups improved, and they improved almost the
same amount (5.3 points vs. 5.4 points). Therefore,
you might conclude that both interventions were effective in improving trunk balance and coordination,
but neither intervention was more effective than the
other.
EXERCISE 6-2
Oqh   Xeb   Oqh   Xea   Oqh
Oqh   Xea   Oqh   Xeb   Oqh
Oqh   Xib   Oqh   Xia   Oqh
Oqh   Xia   Oqh   Xib   Oqh
e = experienced users; i = inexperienced users; b = broadband; a = advanced; q = QuickSIN; h = HINT
You may have chosen different abbreviations, but the scientific notation should combine crossover and factorial designs. The two factors include the two types of hearing devices (broadband and advanced) and the level of experience (experienced and inexperienced users).

EXERCISE 6-3
Study | Design Type | Independent Variables
Li et al (2013) | Pretest-posttest without a control | No groups are compared; this is a within-group comparison
Bhatt & Pai (2009) | Factorial design with a randomized controlled trial | Intensity of the intervention and number of sessions
Donat & Ozcan (2007) | Randomized controlled trial | Setting of intervention (unsupervised home vs. supervised group)
Kerse et al (2004) | Nonrandomized controlled trial (the homes are randomly assigned but the individuals are not) | Treatment (intervention vs. true control)
Beling & Roller (2009) | Randomized controlled trial | Treatment (intervention vs. true control)
EXERCISE 6-4
Threat to Validity | Yes | No | Rationale
Maturation | | X | Randomization to groups should protect against this threat because a difference between groups would indicate that something has occurred other than a passage of time.
Assignment | | X | Random assignment and the fact that the groups were equal at baseline suggest that there are no important differences between the groups at the outset of the study.
Rosenthal effect | X | | The therapists were not blind to group assignment, and their expectations from the participants could influence the outcomes.
Hawthorne effect | | X | This is not a likely threat because the researchers controlled for this by making sure both groups received equal amounts of attention.

Sometimes a study finds no significant difference between the groups. The strength of the study design is still important when no difference is found, as a stronger study suggests that the finding is accurate. In the case of insignificant results, sample size is an important consideration. When the sample is small, there is a possibility of a Type II error. In this study the sample size was relatively large, so Type II error is unlikely. The researchers suggest that the insignificant results may have been due to inadequate intervention intensity or the participants' disease condition.

FROM THE EVIDENCE 6-1
Although not typically considered an effect size measure, the percentage of improvement provides an estimate of the magnitude of improvement. Step width had the greatest percentage of improvement in this study.

FROM THE EVIDENCE 6-2
Yes, the threat to mortality in this study would be even greater because the proportion of participants lost is greater. In this study, almost 50% of the participants dropped out of the control condition. If 6 individuals dropped out of a control group of 100, you would be less concerned about mortality as a threat.
FROM THE EVIDENCE 6-3
This design is particularly strong in showing cause-andeffect relationships. If there is a marked difference with
TAP compared with TYP (typical reminders) within each
group, and this occurs both times and occurs similarly
for both groups, you can be relatively confident that the
difference is due to the intervention.
FROM THE EVIDENCE 6-4
It is unreasonable to assume that the participants are
blind, because they would know if they are receiving the
Wii treatment, and the intervention leaders would know
what treatment they are providing. However, the testers
could be blind to group assignment. Therefore, this eliminates bias that the testers might have when administering
the outcome measures.
FROM THE EVIDENCE 6-5
Question 1 could be answered by examining the results of
the comparison of the two groups that received vitamin D
treatment without whole body vibration. Question 2 could
be answered by comparing each of the whole body vibration
plus vitamin D groups with the vitamin D alone groups.
Question 3 cannot be answered because there is no control/
comparison group that does not receive treatment; however, you can examine the within-group results to determine
if any group improves from pretest to posttest; if not, you
know that the treatments were not effective. However, if the
within-group improvement is significant, you do not know
if this would be better than no treatment.
FROM THE EVIDENCE 6-6
Only 2 of the 7 participants improved their time on task
during the intervention, and many actually had poorer
performance during the intervention, so overall the intervention does not appear to be effective.
FROM THE EVIDENCE 6-7
There are many possible answers, but some alternative
explanations include: Individuals who were more motivated to participate in therapy may have received more
therapy minutes. It could be their motivation instead of
the therapy minutes that influenced the outcomes. The
therapists who evaluated the clients on the FIM may have
provided higher ratings to those individuals who spent
more time in therapy. The therapists could be biased toward wanting therapy to result in better outcomes.
REFERENCES
Baillet, A., Payraud, E., Niderprim, V. A., Nissen, M. J., Allenet, B.,
Francois, P., . . . Gaudin, P. (2009). A dynamic exercise programme
to improve patients’ disability in rheumatoid arthritis: A prospective
randomized controlled trial. Rheumatology, 48, 410–415.
Beling, J., & Roller, M. (2009). Multifactorial intervention with balance
training as a core component among fall-prone older adults. Journal
of Geriatric Physical Therapy, 32, 125–133.
Bhatt, T., & Pai, Y. C. (2009). Prevention of slip-related backward balance loss: The effect of session intensity and frequency on long-term
retention. Archives of Physical Medicine and Rehabilitation, 90, 34–42.
Catalan-Matamoros, D., Helvik-Skjaerven, L., Labajos-Manzanares,
M. T., Martínez-de-Salazar-Arbolea, A., & Sánchez-Guerrero, E.
(2011). A pilot study on the effect of Basic Body Awareness Therapy
in patients with eating disorders: A randomized controlled trial. Clinical Rehabilitation, 25(7), 617-626. doi:10.1177/0269215510394223
(Epub 2011 Mar 14).
Chang, H. Y., Chou, K. Y., Lin, J. J., Lin, C. F., & Wang, C. H. (2010).
Immediate effect of forearm Kinesio taping on maximal grip
strength and force sense in healthy collegiate athletes. Physical Therapy in Sport, 11, 122–127.
Collins, A., & Dworkin, R. J. (2011). Pilot study of the effectiveness
of weighted vests. American Journal of Occupational Therapy, 65,
688–694.
Donat, H., & Ozcan, A. (2007). Comparison of the effectiveness of two
programmes on older adults at risk of falling: Unsupervised home
exercise and supervised group exercise. Clinical Rehabilitation, 21,
273–283.
Ellis, C., Lindrooth, R. C., & Horner, J. (2013). Retrospective cost-effectiveness analysis of treatments for aphasia: An approach using
experimental data. American Journal of Speech and Language Pathology, 23(2), 186–195. doi:10.1044/2013_AJSLP-13-0037
Ferguson, G. D., Jelsma, D., Jelsma, J., & Smits-Engelsman, B. C.
(2013). The efficacy of two task-orientated interventions for children with developmental coordination disorder: Neuromotor task
training and Nintendo Wii Fit training. Research in Developmental
Disabilities, 34, 2449–2461.
Frick, K. D., Kung, J. Y., Parrish, J. M., & Narrett, M. J. (2010). Evaluating the cost-effectiveness of fall prevention programs that reduce
fall-related hip fractures in older adults. Journal of the American Geriatric Society, 58, 136–141.
Jutkowitz, E., Gitlin, L. N., Pizzi, L. T., Lee, E., & Dennis, M. P.
(2012). Cost effectiveness of a home-based intervention that helps
functionally vulnerable older adults age in place at home. Journal of
Aging Research, 2012, 680265. doi:10.1155/2012/680265
Kerse, N., Butler, M., Robinson, E., & Todd, M. (2004). Fall prevention
in residential care: A cluster, randomized controlled trial. Journal of
the American Geriatric Society, 52, 524–531.
Keser, I., Kirdi, N., Meric, A., Kurne, A. T., & Karabudak, R. (2013).
Comparing routine neurorehabilitation program with trunk exercises
based on Bobath concept in multiple sclerosis: Pilot study. Journal of
Rehabilitation and Research Developments, 50, 133–140.
Krawczyk, M., Sidaway, M., Radwanska, A., Zaborska, J., Ujma, R., &
Czlonkowska, A. (2012). Effects of sling and voluntary constraint
during constraint-induced movement therapy for the arm after
stroke: A randomized, prospective, single-centre, blinded observer
rated study. Clinical Rehabilitation, 26, 990–998.
Kreisman, B. M., Mazevski, A. G., Schum, D. J., & Sockalingam, R.
(2010, March). Improvements in speech understanding with wireless binaural broadband digital hearing instruments in adults
with sensorineural hearing loss. Trends in Amplification, 14(1), 3–11.
[Epub 2010 May 10]. doi:10.1177/1084713810364396
Lemoncello, R., Sohlberg, M. M., Fickas, S., & Prideaux, J. (2011). A
randomised controlled crossover trial evaluating Television Assisted
Prompting (TAP) for adults with acquired brain injury. Neuropsychological Rehabilitation, 21, 825–826.
Li, F., Harmer, P., Stock, R., Fitzgerald, K., Stevens, J., Gladieux, M., . . .
Voit, J. (2013). Implementing an evidence-based fall prevention
program in an outpatient clinical setting. Journal of the American
Geriatrics Society, 61(12), 2142–2149.
Maher, C. G., Sherrington, C., Herbert, R. D., Moseley, A. M., &
Elkins, M. (2003). Reliability of the PEDro Scale for rating quality
of randomized controlled trials. Physical Therapy, 83, 713–721.
Murray, E., McCabe, P., & Ballard, K. J. (2012). A comparison of two
treatments for childhood apraxia of speech: Methods and treatment
protocol for a parallel group randomised control trial. BMC Pediatrics, 12, 112.
Seo, K.E., & Park, T.J. (2016). Effects of gyrokinesis exercise on the
gait pattern of female patients with chronic low back pain. Journal of
Physical Therapy Science, 28, 511–514.
Verschueren, S. M., Bogaerts, A., Delecluse, C., Claessens, A. L.,
Haentjens, P., Vanderschueren, D., & Boonen, S. (2011). The effects
of whole-body vibration training and vitamin D supplementation on
muscle strength, muscle mass, and bone density in institutionalized
elderly women: A 6-month randomized, controlled trial. Journal of
Bone and Mineral Research, 26, 42–49.
Wang, H., Camiciam, N. M., Terdiman, J., Mannava, M. K., Sidney, S.,
& Sandel, M. E. (2013). Daily treatment time and functional gains
of stroke patients during inpatient rehabilitation. Physical Medicine
and Rehabilitation, 5, 122–128.
“In school, you’re taught a lesson and then given a test. In life, you’re given a
test that teaches you a lesson.”
—Tom Bodett, author and radio host
7
Using the Evidence to Evaluate
Measurement Studies and
Select Appropriate Tests
CHAPTER OUTLINE
LEARNING OUTCOMES
KEY TERMS
INTRODUCTION
TYPES OF SCORING AND MEASURES
  Continuous Versus Discrete Data
  Norm-Referenced Versus Criterion-Referenced Measures
    Norm-Referenced Measures
    Criterion-Referenced Measures
TEST RELIABILITY
  Standardized Tests
  Test-Retest Reliability
  Inter-Rater Reliability
  Internal Consistency
TEST VALIDITY
  Construct Validity
  Sensitivity and Specificity
  Relationship Between Reliability and Validity
RESPONSIVENESS
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES
LEARNING OUTCOMES
1. Distinguish between continuous and discrete data.
2. Distinguish between norm-referenced and criterion-referenced measures.
3. Evaluate sensitivity and specificity for a given measure.
4. Identify the types of reliability, validity, and/or responsiveness examined in a particular study.
5. Match the psychometric properties of a measure with the necessary qualities of a measure, given a specific
clinical situation.
KEY TERMS
ceiling effect
clinically significant difference
concurrent validity
construct validity
continuous data
convergent validity
criterion-referenced
Cronbach's alpha
dichotomous data
discrete data
discriminant validity
divergent validity
external responsiveness
floor effect
internal consistency
internal responsiveness
inter-rater reliability
intra-class correlation coefficient (ICC)
Likert scale
measurement error
method error
minimally clinically important difference (MCID)
norm-referenced
predictive validity
provocative test
psychometric properties
reliability
responsive measure
sensitivity
specificity
standardized test
statistically significant difference
test-retest reliability
trait error
validity
INTRODUCTION
Occupational, physical, and speech therapists administer assessments to determine the needs and problems
of clients and determine the outcomes of interventions.
However, what if the score a therapist obtains when administering a test is inaccurate? Or perhaps the therapist administers a measure of motor performance, but the client’s
cognition influences the results more than his or her motor
skills? In these cases, the therapist could fail to identify an
impairment and/or incorrectly assess a client’s abilities. In
addition, the therapist may not learn the precise outcomes
of an intervention.
The ability to develop an effective intervention plan and
evaluate an intervention depends on the accuracy of the results
obtained from assessment measures. In selecting assessments,
therapists need to consider many psychometric properties,
such as reliability and validity of the assessment, sensitivity/
specificity, and the ability of an assessment to detect change.
Psychometric properties are the quantifiable characteristics
of a test that reflect its consistency and accuracy. Studies examining the psychometric properties of an assessment provide therapists with the information they need to select the
best available test for a specific client. This chapter describes
how to evaluate the evidence associated with assessments
that therapists use every day in practice and then use this
information to select the right tool.
TYPES OF SCORING AND MEASURES
During the assessment process, therapists typically obtain
a score and then interpret that score to give it meaning.
The process of interpreting scores varies with the type of
scoring (continuous or discrete) and the type of measure
(norm-referenced or criterion-referenced).
Continuous Versus Discrete Data
Continuous data result from a test in which the score
can be any value within a particular continuum. For example, range of motion is continuous data expressed in
terms of the number of degrees of movement within a
360-degree range (although each joint has its own restriction of range).
Discrete data, also referred to as categorical data, are
obtained when classifying individuals or their performance into groups, such as gender or diagnosis. Discrete
data can also be given numerical values, although the
numbers assigned reflect a category more than a quantity; that is, the numbers typically indicate a construct.
For example, in manual muscle testing, a client’s strength
is categorized as fair and assigned a grade of 3 when that
client can complete the range of motion against gravity.
In this case, the numbers do indicate a range of less to
greater muscle strength, but they do not reflect a quantified value (i.e., a manual muscle test score of 4 is not twice
a manual muscle score of 2).
Dichotomous data are a type of discrete data with
only two categories: Typically something exists or
does not exist. This type of data may describe a condition
(e.g., a child does or does not have autism, or an athlete
does or does not have a ligament tear) or the status of an
individual (e.g., a client is or is not hospitalized, or a client
does or does not recover).
With continuous data, the numbers reflect a real
value, such as the quantity of correct answers, the time
taken to complete a task, or the decibels of hearing loss.
With discrete data, categories are assigned and possibly
given a numerical value. In part, it is important to know
the difference between continuous and discrete data because different statistical methods are used to analyze the
different types of data. These methods are described in
more detail later in this chapter.
Some scoring methods, such as Likert scales, can be
difficult to classify into discrete or continuous categories.
For example, in a Likert scale, individuals respond to a
range of responses, most typically on a continuum, such
as “strongly disagree” to “strongly agree,” or “never” to
“always” (Fig. 7-1). These responses are then assigned
numerical ratings. Multiple items using this type of response rating are added together to obtain a score for
the scale. Although the numbers indicate increasingly
greater amounts of some construct, such as agreement,
the difference between a 1 and 2 may not be equal to
the difference between a 2 and 3. For example, a healthy
lifestyle measure may include an item about eating
breakfast (to which the respondent indicates “rarely”)
and eating a serving of fruit (to which the respondent
indicates “sometimes”)—yet the respondent could have
eaten breakfast one day of the week and fruit four days
of the week. Although “rarely” and “sometimes” seem
reasonable responses, in this example, “rarely” occurred
1 day more than “never,” and “sometimes” occurred
3 days more than “rarely.”
Nevertheless, in practice Likert scales are commonly
considered to provide continuous data. Although the
practice of treating Likert scales as continuous data is
somewhat controversial, there is logical and statistical
support indicating that Likert scales perform similarly
to scales with equal intervals (Carifio & Perla, 2007).
This is particularly true when the scale comprises more
items.
When creating an assessment measure with a Likert
scale, test developers often include items that are written to suggest both the presence and the absence of
the construct being measured. For example, on a scale
addressing social play, the following two items may be
included:
• “My child approaches unfamiliar children in play
situations.”
• “My child avoids unfamiliar children in play situations.”
These items would then be reverse-scored, so that on
a five-point scale, if both items were marked as “always,”
the first item would receive a score of five, and the second
item a score of one. Including opposing items can prevent
the respondent from ignoring the content of the question
and marking all items with the same response. When all
of the items are marked at the same end of the scale, it
suggests that the respondent was not paying attention to the items.

FIGURE 7-1 Example of a Likert scale (response options numbered 1 to 5, such as "strongly disagree" to "strongly agree" or "never" to "always").
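The reverse-scoring step described above follows a simple rule on a five-point scale: the reversed value is 6 minus the original response. The sketch below (in Python) applies that rule; the two social play items echo the example in the text, and the responses are invented.

# Minimal sketch of reverse-scoring Likert items on a 1-5 scale.
# Responses are invented for illustration.
SCALE_MAX = 5

responses = {
    "My child approaches unfamiliar children in play situations.": 5,  # positively worded
    "My child avoids unfamiliar children in play situations.": 5,      # negatively worded
}
reverse_scored = {"My child avoids unfamiliar children in play situations."}

def score(item, response):
    # Reverse-scored items are flipped: 5 becomes 1, 4 becomes 2, and so on.
    return (SCALE_MAX + 1 - response) if item in reverse_scored else response

total = sum(score(item, r) for item, r in responses.items())
print(total)  # 5 + 1 = 6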
Norm-Referenced Versus
Criterion-Referenced Measures
It is important for evidence-based practitioners to understand the differences between norm- and criterion-referenced measures, so that they can properly evaluate
the evidence when determining how to use and interpret
a particular measure. In general, the differences hinge on
whether you want to know how an individual performs
in comparison to others (norm-referenced) or how well
an individual has mastered a particular skill (criterion-referenced).
Norm-Referenced Measures
With a norm-referenced measure, a client’s scores are
compared with those of other individuals. The purpose of
a norm-referenced test is to discriminate between individuals so as to determine if an individual’s abilities fall within
or outside of a typical range of abilities. Each client’s scores
are compared with those of a larger group. Generally a raw
score is obtained; it is then converted to a standard score
or percentile rank, which can better be interpreted. For
example, an individual at the 90th percentile earns scores
higher than 90 percent of the population.
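As a sketch of how that conversion works (in Python, using SciPy; the normative mean of 50, standard deviation of 10, and raw score of 63 are invented values), a raw score is first expressed as a standard (z) score relative to the normative sample and then as a percentile rank, assuming the trait is roughly normally distributed.

# Minimal sketch of converting a raw score to a standard score and a
# percentile rank, using invented normative values.
from scipy.stats import norm

norm_mean, norm_sd = 50.0, 10.0
raw_score = 63.0

z = (raw_score - norm_mean) / norm_sd     # standard (z) score
percentile = norm.cdf(z) * 100            # percentile rank
print(f"z = {z:.1f}, percentile = {percentile:.0f}")  # z = 1.3, percentile = 90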
Norms have been established for many physical measures, such as grip strength and developmental milestones.
Norm-referenced tests are often used to identify a particular
impairment and in some cases to determine if a client is
eligible for services. For example, the Patterned Elicitation
Syntax Test is used to identify speech impairments.
Norm-referenced tests are less likely to provide information
on whether an individual has achieved a particular skill, and
they may be limited in terms of sensitivity to change. For
example, IQ tests are norm-referenced. IQ is a relatively
stable construct and would not be a good measure to use
to assess improvement or change in cognitive functioning.
The most accurate norms are those derived from a large
sample. Norms are always sample-dependent; for example,
norms derived from a sample of adults should not be applied
to children, and norms from males should not be applied
to females. When selecting a measure, it is important to
consider the similarity between the normative sample and
the client on important factors that could affect the outcomes of the test, such as age, gender, education, and culture. Other aspects of norms include the population from
which the norms arose and how closely a client matches
the sample. For example, if the norms are based on data
gathered in forensic settings and your client has no history
of legal trouble, it would be inaccurate to apply these norms
to your client.
Criterion-Referenced Measures
When therapists are interested in an individual’s specific
ability or response to an intervention (versus how that person compares with others), a criterion-referenced measure
is appropriate. A criterion-referenced test is based on a
standard or fixed point, which is established by experts.
The individual is then tested to determine how his or her
performance compares with the established benchmark.
The widely used Functional Independence Measure
(FIM) is an example of a criterion-referenced test. The individual is rated on items such as eating, locomotion, and
social interaction on a scale of 0 (activity does not occur)
to 7 (complete independence), and the criterion is complete independence. The measure is typically administered
in rehabilitation settings at admission and discharge so
that improvement can be measured. Another example of
a criterion-referenced test is the School Function Assessment, which examines the performance of children with
disabilities in nonacademic activities, such as eating in the
lunchroom and being transported to and from school.
EXERCISE 7-1
Distinguishing Between Discrete and Continuous Data and Norm-Referenced and Criterion-Referenced Measures (LO1 and LO2)

QUESTIONS
For each of the following descriptions, identify whether the measure uses discrete or continuous data and whether it is a norm- or criterion-referenced measure.

1. Berg Balance Scale: Individuals are assessed using 15 functional tasks. For each task they can receive a score of 0 to 4, with a total score possible ranging from 4 to 56. Community-dwelling females aged 60 to 69 without balance problems are expected to score between 54 and 56.

2. Glasgow Coma Scale: Individuals are rated on a six-point scale in the areas of eye opening and verbal and motor responses. The total score is the cumulative score of the three responses and ranges from 3 (deep coma) to 15 (fully conscious).
Modified Glasgow Coma Scale
Eye: 1 = Does not open eyes; 2 = Opens eyes in response to painful stimuli; 3 = Opens eyes in response to voice; 4 = Opens eyes spontaneously; 5 = N/A; 6 = N/A
Verbal: 1 = Makes no sounds; 2 = Incomprehensible sounds; 3 = Utters inappropriate words; 4 = Confused, disoriented; 5 = Oriented, converses normally; 6 = N/A
Motor: 1 = Makes no movements; 2 = Extension to painful stimuli (decerebrate response); 3 = Abnormal flexion to painful stimuli (decorticate response); 4 = Flexion/withdrawal to painful stimuli; 5 = Localizes painful stimuli; 6 = Obeys commands
Data from: Teasdale, G. M., & Jennett, B. (1976). Assessment and prognosis of coma after head injury. Acta Neurochirurgica, 34, 45–55.
3. The Phonological Awareness Test 2 assesses a student’s awareness of oral language segments that comprise words (i.e., syllables and phonemes). Students
receive a score of 1 for a correct response and 0 for
an incorrect response. There are eight subscales (e.g.,
rhyming discrimination and production), which are
scored by adding the number of correct responses
for each item on a subscale. The raw score for each
subscale can be compared with age equivalents and
percentile ranks.
4. Phalen’s Test: A provocative test for diagnosing
carpal tunnel syndrome. The individual is placed in
90 degrees of wrist flexion under the influence of
gravity and holds the position for 1 minute. If the
symptoms of carpal tunnel syndrome are elicited, this
test is positive.
TEST RELIABILITY
Reliability describes the stability of a test score. A reliable assessment measure is one for which the scores are expected to be trustworthy and consistent. From a measurement theory perspective, a reliable test is one in which measurement error (i.e., the difference between a true score and an individual's actual score) is reduced. Theoretically, a true score exists in which some construct is perfectly measured for an individual. However, in the real world, measurement is flawed; some degree of error will occur, so that an individual's actual score is a reflection of his or her true score plus error (which can either lower or raise the true score). Some error is to be expected in the context of administering an assessment in the real world.

One type of measurement error is method error. Method error occurs when there is a discrepancy between an individual's true potential test score and his or her actual test score due to an aspect of the testing situation or the test itself. For example, distractions in the room or a biased or inexperienced test administrator would result in method errors. Trait error occurs when aspects of the test taker, such as fatigue or poor test-taking skills, interfere with his or her true score.

As with validity, reliability is measured on a continuum. No assessment is perfect, and therefore no assessment is perfectly reliable; however, reliability can be enhanced by reducing systematic error.

One method for increasing reliability is to increase the number of test items or the number of skills or behaviors to be evaluated. With more items, it is less likely that a single item will unduly influence the outcome of the test. As a student with test-taking experience, you understand that missing one item on a five-point test has a much greater impact on your overall score than missing one item on a 50-point test. A test with five well-constructed items is still better than a test with 50 poorly constructed items, but with all other characteristics being equal, a test with more items is more reliable than a test with fewer items. In practice, of course, there are often time restraints, and a long assessment may be impractical, but this aspect of reliability should always be considered. In cases in which a brief measure is used, it is even more important that the components of the test have good reliability.
Understanding Statistics 7-1
Reliability is measured with a reliability coefficient.
There are different types of reliability coefficients
(see Understanding Statistics 7-2), but these coefficients can be interpreted similarly. Reliability
estimates range from 0 to 1.0. A reliability coefficient of 1.0 means that all variability is due to true
differences, and a reliability coefficient of 0 indicates
that all variability is due to error. The reliability
coefficient represents the amount of variability that
is due to the true score. For example, a reliability
coefficient of 0.85 means that 85 percent of variability is due to the true score, and 15 percent is due to
error. Unlike the relationship statistics explained in
Chapter 4, the correlation coefficient for reliability
is not squared to determine variance.
Adequate reliability is always a judgment call
(Lance, Butts, & Michels, 2006). Although a standard of .80 is sometimes used (Nunnally, 1978) as
a minimal requirement, in some instances a much
higher standard may be necessary (e.g., using a test
to determine if someone should be given a risky
treatment). In another situation, a lower standard
may be logical if meeting the reliability standard
would require selection of an invalid test.
Standardized Tests
A standardized test is one in which the administration of the test is the same for all clients. In a standardized test there are specific procedures to be followed, such as specifications
regarding the environment in which the test is administered,
the tools that are used, the instructions that are provided to
the client, and the manner in which the test is scored. For
example, the Executive Function Performance Test (EFPT)
includes four tasks (cooking, telephone use, medication
management, and bill paying) that are administered using
specific instruction with graded cues provided by the therapist (Baum, Morrison, Hahn, & Edwards, 2003). A scoring
rubric is applied to rate executive function components, capacity for independent functioning, and type of assistance
needed. A standardized test can be either norm-referenced
or criterion-referenced.
The process of administering a standardized test enhances reliability because it reduces variability in the
testing situation of clients. When a test is standardized, it is important to follow the specific instructions
for administration. If a therapist modifies the instructions, the scoring (normative or criterion) is no longer
valid. Standardization is a desirable characteristic of a
test; however, in some cases the standardized environment can limit the therapist’s ability to assess real-world
performance. For example, assessing driving skills in a
standardized driving simulator provides a more consistent method of testing, but the driving simulator creates
an unnatural environment that may not represent some
aspects of on-the-road driving. In many clinical situations, it is desirable to also include nonstandardized
methods of observation in the assessment process. For
example, observing a child at play or an adult performing an activity of daily living at home will capture information that may not be obtained in a more structured
testing situation.
Test-Retest Reliability
If there are no changes in the client, the re-administration
of a test should result in the same score as the original
administration. If so, this suggests that the test provides
consistent results. Test-retest reliability provides an estimate of the stability of a test over time. In a study examining test-retest reliability, the same test is given to a
group of individuals at two points in time. The analysis
examines the similarity in scores for each individual at the
two time points.
Although there are no definitive rules for the length
of time between test administrations, the amount of time
between tests is a critical issue in test-retest reliability. It
is important for enough time to elapse so that the benefits
from memory or practice do not influence the results, but
not enough time so that history or maturation affects the
scores. The type of measure and the specific individual
being tested will affect this decision. For example, the
time lapse for retesting of a developmental measure such
as the Denver Developmental Screening should be much
briefer for infants than for older children, as developmental changes occur more quickly in infants.
Inter-Rater Reliability
The instructions and methods for administering and
scoring a measure should be clear enough that multiple
testers can perform the procedures consistently and obtain similar results. When a test has strong inter-rater
reliability, you can assume that different raters will arrive at comparable conclusions. In studies that examine
inter-rater reliability, it is useful for the raters to conduct
the assessment with the same client at the same point in
time, so that other factors that could influence measurement error are reduced. This is sometimes accomplished
by several raters watching an assessment or even the videotape of an assessment. A measure with poor inter-rater
reliability is less likely to result in dependable scores. For
example, if two raters are evaluating driving performance,
and one rater gives much higher scores than another rater,
the more accurate score is unknown.
When using a categorical measure, Cohen’s kappa is
typically used to assess inter-rater reliability. Kappa depicts the amount of agreement between raters and, like
the other reliability coefficients, ranges from 0 to 1.0. For
example, Phalen’s test is a measure that determines categorically if someone does or does not have carpal tunnel
syndrome. One review of inter-rater reliability studies
of the Phalen’s test found kappa ratings of 0.52 to 0.79
(Walker-Bone, Palmer, Reading, & Cooper, 2003).
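The sketch below (in Python, using the scikit-learn library) shows how kappa is computed from two raters' paired categorical judgments, such as positive or negative findings on a provocative test; the ten ratings are invented. Unlike simple percent agreement, kappa corrects for the agreement expected by chance alone.

# Minimal sketch of inter-rater reliability for a dichotomous result using
# Cohen's kappa. The ratings are invented.
from sklearn.metrics import cohen_kappa_score

rater_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
rater_2 = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "pos", "neg", "neg"]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.2f}")  # 0.80: the raters agree on 9 of 10 clients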
Understanding Statistics 7-2
Pearson correlations may be used for test-retest or
inter-rater reliability, but more frequently researchers use the intra-class correlation coefficient
(ICC). When calculating reliability coefficients,
there must be multiple administrations of the test.
With test-retest reliability, the test is given at least
two times to the same person. With inter-rater reliability, at least two testers administer the same
test. Similar to other correlation coefficients, the
value of the ICC ranges from 0 to 1.0, with a higher
number indicating greater stability. An ICC of 1.0 does not necessarily mean perfect agreement; rather, it means the difference between raters or testing times is perfectly predictable (e.g., rater 2's scores are consistently 2 points higher than rater 1's). Remember, when interpreting the
amount of variability accounted for by the true
score, the coefficient is not squared.
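For readers who want to see where an ICC comes from, the sketch below computes one common form by hand: the two-way random-effects, absolute-agreement ICC(2,1) of Shrout and Fleiss, written in Python with NumPy. The scores for five clients rated by two raters (or tested on two occasions) are invented.

# Minimal sketch of ICC(2,1): rows are clients, columns are raters or
# testing occasions. All scores are invented.
import numpy as np

scores = np.array([
    [42.0, 44.0],
    [30.0, 29.0],
    [55.0, 57.0],
    [48.0, 47.0],
    [36.0, 38.0],
])
n, k = scores.shape

grand_mean = scores.mean()
row_means = scores.mean(axis=1)   # per-client means
col_means = scores.mean(axis=0)   # per-rater means

# Mean squares from the two-way ANOVA decomposition
ms_rows = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
ms_cols = n * np.sum((col_means - grand_mean) ** 2) / (k - 1)
residuals = scores - row_means[:, None] - col_means[None, :] + grand_mean
ms_error = np.sum(residuals ** 2) / ((n - 1) * (k - 1))

icc_2_1 = (ms_rows - ms_error) / (
    ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
)
print(f"ICC(2,1) = {icc_2_1:.2f}")  # about 0.99 for these invented scores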
From the Evidence 7-1 provides an example of a study
that examines both test-retest reliability and inter-rater
reliability.
Internal Consistency
Internal consistency refers to the unity or similarity of
items on a multi-item measure. When examining internal
consistency, each item is correlated with the total score or
with the other items on the measure. An internally consistent measure will find consistency in how the items perform, suggesting that each item measures the same thing.
This type of reliability is relevant when it is expected that
all of the items of a measure assess the same construct.
For example, a measure of tactile defensiveness should
only include items measuring that specific construct; it
should exclude items that are more indicative of anxiety
and irritability.
Internal consistency is also pertinent with measures
comprised of multiple subscales. In this case, internal
consistency is examined within each subscale. With subscales, it is desirable for items within a subscale to be more
highly correlated to the total score of that subscale and
less correlated to other subscales.
FROM THE EVIDENCE 7-1
Test-Retest and Inter-Rater Reliability
Gailey, R. S., Gaunaurd, I. A., Raya, M. A., Roach, K. E., Linberg, A. A., Campbell, S. M., Jayne, D. M., & Scoville, C. (2013). Development
and reliability testing of the Comprehensive High-Level Activity Mobility Predictor (CHAMP) in male servicemembers with traumatic
lower-limb loss. Journal of Rehabilitation Research & Development, 50(7), 905–918. doi:10.1682/JRRD.2012.05.0099.
The opportunity for wounded servicemembers (SMs) to return to high-level
activity and return to duty has improved with advances in surgery, rehabilitation,
and prosthetic technology. As a result, there is now a need for a high-level
mobility outcome measure to assess progress toward high-level mobility during
and after rehabilitation. The purpose of this study was to develop and determine
the reliability of a new outcome measure called the Comprehensive High-Level
Activity Mobility Predictor (CHAMP). The CHAMP consists of the Single Limb
Stance, Edgren Side Step Test, T-Test, and Illinois Agility Test. CHAMP reliability
was determined for SMs with lower-limb loss (LLL) (inter-rater: n = 118;
test-retest: n = 111) and without LLL (n = 97). A linear system was developed to
combine the CHAMP items and produce a composite score that ranges from 0
to 40, with higher scores indicating better performance. Inter-rater and
test-retest intraclass correlation coefficient values for the CHAMP were 1.0 and
0.97, respectively. A CHAMP score equal to or greater than 33 points is within
the range for SMs without LLL. The CHAMP was found to be a safe and reliable
measure of high-level mobility in SMs with traumatic LLL.
Note A: ICC values were
calculated for both inter-rater
reliability and test-retest reliability.
Note B: Reliability was strong, with perfect agreement for inter-rater reliability.
FTE 7-1 Question Provide an interpretation of the ICC for both test-retest reliability and inter-rater reliability.
Understanding Statistics 7-3
Internal consistency is often measured using Cronbach’s alpha; like the ICC, Cronbach’s alpha is
measured on a scale of 0 to 1.0, with higher numbers indicating greater internal consistency. Unlike
the ICC, a Cronbach’s alpha of 1.0 may be considered too high. When items are performing too
similarly, this can indicate that items are redundant
and may be unnecessary. With internal consistency,
the measure need only be given one time; within
this single administration, the items are correlated
to one another. When using Cronbach’s alpha, it
is helpful to have a large sample size in the study,
which will result in more stable findings.
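The sketch below (in Python with NumPy) applies the standard formula for Cronbach's alpha, alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score), to an invented set of item scores from a single administration of a five-item scale.

# Minimal sketch of Cronbach's alpha: rows are respondents, columns are items
# (e.g., 1-5 Likert ratings). All scores are invented.
import numpy as np

items = np.array([
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 4, 5, 5, 4],
    [3, 3, 3, 2, 3],
    [4, 4, 5, 4, 4],
    [1, 2, 2, 1, 2],
])
k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)       # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)   # variance of the total score

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")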
In an example, Boyle (2013) developed the Self-Stigma
of Stuttering Scale. The measure is made up of three subscales: stigma awareness, stereotype agreement, and self-stigma concurrence. The three subscales and the overall
measure were assessed for internal consistency. The values obtained can be found in From the Evidence 7-2.
TEST VALIDITY
Validity refers to the ability of a test to measure what
the test is intended to measure. A test of strength should
measure strength and not coordination; a test of capacity to perform activities of daily living (ADLs) should
measure the actual ability to carry out ADLs rather
than someone’s perception of that ability. Validity, as
discussed in this chapter, refers to measurement validity
FROM THE EVIDENCE 7-2
Internal Consistency
Boyle, M. P. (2013). Assessment of stigma associated with stuttering: Development and evaluation of the Self-Stigma of Stuttering
Scale (4S). Journal of Speech, Language, and Hearing Research, 56, 1517–1529. doi:10.1044/1092-4388(2013/12-0280).
Table 2. Reliability statistics for the 4S and subscales.
Variable | Cronbach's α | Test-retest correlation
Overall 4S | .87 | .80
Stigma awareness | .84 | .62
Stereotype agreement | .70 | .55
Stigma self-concurrence | .89 | .82
Note: The time between test and retest was approximately 2 weeks for 41 individuals who stutter.
Note A: The internal consistency
values, as measured by Cronbach’s
alpha, are relatively high for the three
subscales and the overall measure.
FTE 7-2 Question Which of the Self-Stigma of Stuttering Scale subscales has the weakest internal consistency? For
which subscale is internal consistency the highest?
EVIDENCE IN THE REAL WORLD
Improving the Internal Consistency of the Adolescent/Adult Sensory Profile
I examined internal consistency when developing items for the Adolescent/Adult Sensory Profile (Brown & Dunn,
2002), which includes four subscales: low registration, sensation seeking, sensory sensitivity, and sensation avoiding.
An early item included on the sensation-avoiding subscale was, “I like to wear sunglasses,” with the rationale that
individuals who tended to avoid sensation would wear sunglasses more frequently than individuals who did not avoid
sensation. The wearing of sunglasses would serve the purpose of reducing the intensity of visual input.
However, the internal consistency analysis revealed that the item, “I like to wear sunglasses,” was more highly
correlated with the sensation-seeking total score than it was with the sensation-avoiding total score. The reason
for this mismatch is not known, but people likely wear sunglasses for reasons other than blocking the sun or other
visual sensations. Sensation seekers may wear sunglasses more frequently for fashion or because it is more common
for sensation-seeking activities. Whatever the reason, the internal consistency analysis indicated that the item did
not perform as expected, so it was removed from the scale.
and thus is different from research validity, as described
in Chapter 5.
Historically, test validity has been divided into different types, including concurrent, convergent, divergent, discriminant, construct, and predictive. However,
more modern views of test theory suggest that all validity is construct validity. There are different methods of
providing evidence to support the validity of a measure
(Messick, 1995); therefore, the types of validity identified
previously are now considered different approaches that
come together to support the construct validity of a test.
In short, concurrent, convergent, divergent, discriminant,
and predictive validity together provide evidence for the
construct validity of a test. The greater the cumulative evidence, the more confident one can be that an assessment
measures the intended construct.
Construct Validity
As with reliability, determining the validity of a test is a
process. A single study does not prove the validity of a
test; rather, evidence is accumulated to support its validity. A classic method for examining construct validity
is to correlate the scores of the index measure with an
established gold standard. For example, the validity of
the dynamometer for assessing grip strength has been
compared to the gold standard of isokinetic testing (Stark
et al, 2011). In isokinetic testing, sophisticated equipment
is used to apply constant resistance over range of motion
and speed.
Recall that validity is a continuum, and some constructs are more difficult to measure than others. A true
“gold standard” may not exist in some areas of testing
(e.g., diagnosing autism or quality-of-life assessment).
When a gold standard does not exist, the new measure
is judged against the best measure available or one that
is widely accepted in practice. For example, the validity
of the Stroke Rehabilitation Assessment of Movement
(STREAM) was evaluated by correlating the results of
the STREAM with two established measures in stroke
rehabilitation: the Functional Independence Measure
and the Stroke Impact Scale (Ward, Pivko, Brooks, &
Parkin, 2011).
The terms concurrent validity and convergent validity
are sometimes used to describe the process of correlating measures that are expected to yield similar results.
Concurrent validity and convergent validity use the
same process of finding relationships between the index
measure and other measures of the same construct to
support construct validity. The difference between the
two types of validity lies in their purpose. Concurrent
validity is used to predict scores on another measure,
whereas convergent validity is used to find evidence
that the new measure is similar to a measure of the same
construct. In the research literature, it can be difficult
to distinguish between the two, and in the end doing so
is not crucial. For example, in a study described as testing concurrent validity, a new version of the Pediatric
Evaluation of Disability Inventory (PEDI-CAT), which
was administered by computer, was compared with the
original PEDI. There was strong support for concurrent validity, with r = 0.82 (Dumas & Fragala-Pinkham,
2012). It would not be a mistake to also describe this
study as a convergent validity study.
A measure should correlate with similar measures;
likewise, a measure should not correlate with measures
of constructs that are purported to be unrelated. In other
words, a test with strong validity should show only weak relationships with measures of dissimilar constructs. For example, a test of motor function should not require visual skills to earn a good score.
This process of examining validity is sometimes described
as divergent validity. Giesinger, Kuster, Behrend, and
Giesinger (2013) found problems with the divergent validity of the Western Ontario and McMaster University
Osteoarthritis Index (WOMAC). This widely used measure of pain, stiffness, and function for individuals with
hip and knee osteoarthritis was highly related to measures
of psychological functioning. This relationship indicates that the WOMAC not only measures the physical functions it professes to measure but also captures some aspects of the emotion experienced by individuals with osteoarthritis, which suggests some limits to the construct validity of the measure.
Another way to assess the construct validity of a measure involves comparing scores for two groups of individuals (those who would be expected to possess the
construct that is measured and those who would not). A
valid measure should be able to discriminate between the
two groups; hence, this process is often described as discriminant validity. For example, a discriminant validity
study of the Sensory Profile found that children with autism had lower sensory processing scores compared with
typically developing children on all subscales (Brown,
Leo, & Austin, 2008).
Predictive validity is a very important type of validity
for clinical practice. As a component of construct validity, it reflects the accuracy of a measure in determining future performance.
Pankratz (2007) examined the usefulness of the Renfrew
Bus Story, which assesses language skills in children
through the retelling of a story. This study found that the
measure was useful for predicting language impairments
three years later.
The accumulation of all types of validity evidence provides support for or against the collective construct validity of a measure. Figure 7-2 illustrates how the different
types of validity come together to make up construct
validity.
FIGURE 7-2 Different types of validity contribute to the overall construct validity of a measure: concurrent (the measure can predict performance on another measure with the same construct), convergent (the measure performs similarly to another measure with the same construct), divergent (the measure performs differently than another measure of a different construct), discriminant (the measure discriminates between different groups of people, one that is expected to possess the construct and another that is not), and predictive (the measure predicts future performance in a direction that is consistent with the construct).
Understanding Statistics 7-4
When examining the relationship between two measures, a correlation coefficient is calculated (typically using a Pearson product moment correlation or a Spearman correlation). The resulting r value indicates the strength of the relationship between the two measures, with values ranging from -1.0 to +1.0; the closer the value is to ±1.0, the stronger the relationship.
Sensitivity and Specificity
Sensitivity and specificity are considered to be aspects of
validity because they are related to the predictive validity
of an assessment. Sensitivity is the ability of a test to detect a condition when it is present, which is also known
as a true positive. A sensitive test will accurately identify
individuals who have a specific condition or diagnosis.
However, in doing so, overdiagnosis may occur, such that
people are diagnosed with a condition they do not have.
This is known as a false positive.
Specificity is the ability of a test to avoid detecting a condition when it does not exist, otherwise known as a true negative. Likewise, mistakes can occur when specificity is emphasized: some individuals who do have the condition may be missed, resulting in a false negative. Figure 7-3 illustrates the concepts of sensitivity and specificity.
There is often a trade-off between sensitivity and
specificity. When a test is made more sensitive, the
changes will likely be detrimental to specificity, and vice
versa. The choice of greater sensitivity versus greater
specificity often depends on the situation. For example,
in a study addressing fall prevention, researchers may
strive for greater sensitivity because they are willing to
identify more people at risk than is actually the case, given that the intervention is relatively noninvasive and they do not want to miss people who might fall. In contrast, in a study related to autism diagnosis, the researcher might desire greater specificity, because there is the potential for greater negative consequences by misdiagnosing individuals.
Sensitivity and specificity data are useful when examining the validity of provocative tests used to diagnose physiological abnormalities. With a provocative test, an abnormality is induced through a manipulation that provokes the condition. For example, O'Brien's Test, which involves comparing the pain experienced with resistance to shoulder flexion with the thumb up and down, is used to diagnose a labral tear of the shoulder. If pain is present when the thumb is down, but not when the thumb is up, the results are considered positive for a tear. Sensitivity and specificity were examined in a study of the O'Brien's Test (McFarland, Kim, & Savino, 2012).

FIGURE 7-3 Sensitivity and specificity. With high sensitivity there are few false negatives but, when specificity is low, many false positives; with low sensitivity there are many false negatives but, when specificity is high, few false positives.

TABLE 7-1 Sensitivity and Specificity

Test Result               True Status: Condition present    True Status: Condition not present
Positive (failed test)    True positive (a)                 False positive (b)
Negative (passed test)    False negative (c)                True negative (d)

Understanding Statistics 7-5
Sensitivity and specificity are relevant only for tests that result in dichotomous decisions (i.e., the condition does or does not exist). There are formulas for sensitivity and specificity based on a 2 × 2 table (Table 7-1).
Sensitivity = a/(a + c)
Specificity = d/(b + d)

EXERCISE 7-2
Determining Sensitivity and Specificity (LO3)
McFarland, Kim, and Savino (2012) calculated sensitivity and specificity by comparing results of O'Brien's Test to labral tears identified with diagnostic arthroscopy. There were 371 controls (individuals without a tear) and 38 individuals with a tear. The 2 × 2 table from the study would look like this:

Sensitivity and Specificity for O'Brien's Test

O'Brien's Test Result     Diagnostic Arthroscopy: Tear    Diagnostic Arthroscopy: No Tear
Positive                  18                              168
Negative                  20                              203

Data from: McFarland, E. G., Kim, T. K., & Savino, R. M. (2012). Clinical assessment of three common tests for superior labral anterior-posterior lesions. Archives of Clinical Neuropsychology, 27, 781–789.
QUESTIONS
1. Use the formula to calculate:
Sensitivity =
Specificity =
2. Based on the results of the McFarland et al (2012)
study, what conclusions would you draw about the sensitivity and specificity of O’Brien’s Test?
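As a minimal sketch of the arithmetic behind the Table 7-1 formulas, the following Python function takes the four cells of a 2 × 2 table and returns sensitivity and specificity; the counts used in the example call are those reported for O'Brien's Test above.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Compute sensitivity and specificity from a 2 x 2 table.

    tp = true positives (a), fp = false positives (b),
    fn = false negatives (c), tn = true negatives (d).
    """
    sensitivity = tp / (tp + fn)   # a / (a + c)
    specificity = tn / (fp + tn)   # d / (b + d)
    return sensitivity, specificity

# Counts from the McFarland, Kim, and Savino (2012) table above
sens, spec = sensitivity_specificity(tp=18, fp=168, fn=20, tn=203)
print(f"Sensitivity = {sens:.0%}, Specificity = {spec:.0%}")  # about 47% and 55%
```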
Tests with continuous data can still provide dichotomous results by establishing a cut-off score at which the
condition is classified as present. For example, Wang and
colleagues (2012) developed the Route Map Recall Test
(RMRT) to predict “getting lost” behavior in individuals
with dementia. The test involves using a pen to designate the route on a paper map. The total possible score is
104, and the developers established a cut-off score of 93.5,
meaning that individuals who scored below the “cut”
score were at risk of getting lost. Scores on the RMRT
were then compared to actual getting-lost behavior, as
reported by caregivers of individuals with mild dementia.
The results indicated a sensitivity of 100% and specificity
of 67%; all of the individuals who did get lost were accurately identified, but many individuals who did not get
lost were misclassified. From the Evidence 7-3 provides
another example of sensitivity/specificity analysis with a
swallow test (Crary et al, 2013).
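The following sketch (Python, with hypothetical scores and caregiver reports rather than the actual RMRT data) illustrates how a cut-off score converts a continuous measure into a dichotomous classification that can then be checked for sensitivity and specificity.

```python
# Hypothetical data: map-test scores and whether each person actually got lost
scores   = [88, 91, 95, 79, 99, 90, 102, 85, 97, 93]
got_lost = [True, True, False, True, False, False, False, True, False, True]
CUT_OFF  = 93.5   # scoring below the cut-off = classified "at risk"

predicted_at_risk = [s < CUT_OFF for s in scores]

tp = sum(p and a for p, a in zip(predicted_at_risk, got_lost))
fn = sum((not p) and a for p, a in zip(predicted_at_risk, got_lost))
tn = sum((not p) and (not a) for p, a in zip(predicted_at_risk, got_lost))
fp = sum(p and (not a) for p, a in zip(predicted_at_risk, got_lost))

print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
```

Raising or lowering the cut-off moves cases between the false-positive and false-negative cells, which is the trade-off between sensitivity and specificity discussed above.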
Relationship Between Reliability
and Validity
It is possible for a test to have excellent reliability but poor
validity. The test may provide consistent results (i.e., have
good inter-rater and test-retest reliability), but still not
test what it is intended to test. However, a test with poor
reliability can never have good validity, because reliability
affects validity. If a test lacks consistency and stability, it
cannot assess what it is intended to assess.
RESPONSIVENESS
Therapists frequently use measures that determine
whether or not an individual has changed or improved
during an intervention period or as a part of the natural healing or development process. In this case, it is important to use a measure that is responsive: A responsive
measure is one that can detect change.
Some characteristics of measures, known as floor and
ceiling effects, can interfere with responsiveness. A floor
effect means that a test is so difficult or the construct is
so rare that almost all individuals receive the very lowest
score. In this case, even when change occurs, the test may
not be capable of identifying that a difference exists. For
example, if an assessment measures clients’ driving ability
by the number of car collisions they had during the previous year, a researcher may identify many individuals with
no accidents; however, this does not necessarily mean that
these individuals have no driving problems.
With a ceiling effect, the test is too easy; everyone gets
a very high score to begin with, and there is no room for
improvement. For example, if a measure of independent
living only includes the simpler tasks, such as dressing and
feeding, and excludes more complex tasks, such as cooking
and money management, depending on the sample, many
individuals might get a perfect score, even though they
have impairments in performing more complex activities.
When children with disabilities are compared to typically developing children on a norm-referenced test, it
may be difficult to detect change, because the child’s improvement will likely not keep pace with typical development. Even with consistent progress, children with a
disability may still fall behind their typically developing
peers. The inability to detect change in the children with
disabilities on certain norm-referenced tests is due to the
floor effect of the instrument. In this situation and others
like it, criterion-referenced tests are more useful for detecting improvement over time.
Another reason why criterion-referenced tests are
more sensitive to change is that therapy is often focused
on the criterion as opposed to a more general skill area.
For example, a speech-language therapist may work on
specific vocabulary words (the criterion) and then test on
the words that were taught. In another example, an occupational therapist may work on cooking skills (independence in cooking is the criterion) and then specifically test
improvement in cooking ability.
There is no general consensus as to what constitutes a
responsive measure. Husted, Cook, Farewell, and Gladman
(2000) studied the difference between internal responsiveness and external responsiveness. Internal responsiveness
means that a measure is able to detect change when a known
effective treatment is assessed. For example, a systematic
review of the Fugl-Meyer Assessment of Motor Recovery After Stroke consistently found differences in clients receiving constraint-induced movement therapy (Shi, Tian, Yang, & Zhao, 2011). In this review, there was a statistically significant difference in the Fugl-Meyer scores before and after the
intervention. A statistically significant difference reflects
internal responsiveness and means that, when scores were
compared before and after treatment, there was a statistical
difference. However, the magnitude of the difference may
not be great enough for the clinician or client to find the
difference meaningful. A clinically significant difference
is a change that would be regarded by clinicians and the
client as meaningful and important.
FROM THE EVIDENCE 7-3
Example of Sensitivity/Specificity Analysis
Crary, M. A., Carnaby, G. D., Sia, I., Khanna, A., & Waters, M. F. (2013). Spontaneous swallowing frequency has potential to identify
dysphagia in acute stroke. Stroke, 44(12), 3452–3457. doi:10.1161/STROKEAHA.113.003048.
Note A: The established cut-off value
of 0.40 is based on an average
number of swallows/minute during a
30-minute time period.
BACKGROUND AND PURPOSE: Spontaneous swallowing frequency has been
described as an index of dysphagia in various health conditions. This study
evaluated the potential of spontaneous swallow frequency analysis as a screening
protocol for dysphagia in acute stroke.
METHODS: In a cohort of 63 acute stroke cases, swallow frequency rates
(swallows per minute [SPM]) were compared with stroke and swallow severity
indices, age, time from stroke to assessment, and consciousness level. Mean
differences in SPM were compared between patients with versus without clinically
significant dysphagia. Receiver operating characteristic curve analysis was used to
identify the optimal threshold in SPM, which was compared with a validated clinical
dysphagia examination for identification of dysphagia cases. Time series analysis
was used to identify the minimally adequate time period to complete spontaneous
swallow frequency analysis.
RESULTS: SPM correlated significantly with stroke and swallow severity indices
but not with age, time from stroke onset, or consciousness level. Patients with
dysphagia demonstrated significantly lower SPM rates. SPM differed by dysphagia
severity. Receiver operating characteristic curve analysis yielded a threshold of
SPM 0.40 that identified dysphagia (per the criterion referent) with 0.96
sensitivity, 0.68 specificity, and 0.96 negative predictive value. Time series
analysis indicated that a 5- to 10-minute sampling window was sufficient to
calculate spontaneous swallow frequency to identify dysphagia cases in acute
stroke.
CONCLUSIONS: Spontaneous swallowing frequency presents high potential to
screen for dysphagia in acute stroke without the need for trained, available
personnel.
Note B: Sensitivity is higher than
specificity; thus, there are more
likely to be false positives than
false negatives.
FTE 7-3 Question In this study, sensitivity is better than specificity, with a cut-off score of less than 0.40. Would you
raise or lower the cut-off score to improve specificity? Why?
External responsiveness describes a measure that is
able to detect a clinically significant difference. One term
used to describe external responsiveness is minimally
clinically important difference (MCID), or the amount
of change on a particular measure that is deemed clinically important to the client. There are different methods for computing MCID, but typically the MCID index
includes ratings from the client as to what constitutes
meaningful change. For example, MCID was computed
for the Six-Minute Walk test for individuals with chronic
obstructive pulmonary disease (COPD) by utilizing the
test before and after a rehabilitation program, and asking
participants to rate the amount of change in their own
walking after participating in the program (Holland et al,
2010). Individuals were then identified as having expressed
that they made no change, some change, or substantial
change. Using several methods, the researchers estimated
an MCID of 25 meters; that is, someone would need to
improve at least 25 meters on the Six-Minute Walk test
for the change to be deemed clinically important.
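As a rough illustration of one common anchor-based approach (a sketch only; the Holland et al study combined several methods), the change scores of participants who rated their own improvement as small but meaningful can be averaged to estimate an MCID. The numbers below are hypothetical.

```python
# Hypothetical change in Six-Minute Walk distance (meters) after rehabilitation,
# paired with each participant's global rating of change
changes = [5, 40, 18, 60, 22, -3, 35, 28, 10, 55]
ratings = ["none", "substantial", "some", "substantial", "some",
           "none", "some", "some", "none", "substantial"]

# Anchor-based estimate: mean change among people reporting "some" (small) change
anchored = [c for c, r in zip(changes, ratings) if r == "some"]
mcid_estimate = sum(anchored) / len(anchored)
print(f"Estimated MCID (anchor-based): about {mcid_estimate:.0f} meters")
```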
EXERCISE 7-3
Identifying Reliability, Validity, and Responsiveness Data From Studies of Measures (LO4)

QUESTIONS
Based on your knowledge of how studies of assessments are designed and the statistical analyses that are done, identify the type of psychometric properties (test-retest reliability, inter-rater reliability, internal consistency, validity, sensitivity and specificity, responsiveness) evaluated in the following studies:
1. In a study of the Bruininks-Oseretsky Test of Motor Proficiency-Second Edition, Wuang and Su (2009) tested 100 children with intellectual disability at three points in time. They found an ICC of 0.99 and an alpha of 0.92.
2. A study of the Tinnitus Questionnaire (Adamchic et al, 2012) found that an improvement of -5 was enough improvement to be clinically meaningful.
3. Sears and Chung (2010) found that, when using the Michigan Hand Outcomes questionnaire (MHQ) as the gold standard, the Jebsen-Taylor Hand Function Test (JTT) tended to miss problems with hand function. Many patients with high scores on the JTT did not have high scores on the MHQ.

EXERCISE 7-4
Matching the Psychometric Properties of a Measure With the Necessary Qualities of the Measure (LO5)

QUESTIONS
For the following situations, identify which psychometric property is most important when selecting the measure, and explain why.
1. Different therapists will be using the same measure to make judgments about client progress.
2. It is most important that the practitioner not incorrectly diagnose someone as having a cognitive impairment.
3. Identify small changes that occur from before to after an intervention program.
4. Accurately assess cognition without involvement of motor abilities.
CRITICAL THINKING QUESTIONS
1. In what situations would a therapist select a norm-referenced test, and when is a criterion-referenced test more appropriate?
2. How does standardization increase the reliability of a test?
3. What types of measures are most relevant for studies of internal consistency?
4. What are the features of the three primary psychometric properties of reliability, validity, and responsiveness?
5. Explain the statement: "All validity is construct validity."
6. Can a test be equally sensitive and specific? Why, or why not?
7. Why is sensitivity/specificity a type of validity?
8. Is it important for all measures to have good responsiveness? Why, or why not?

ANSWERS
EXERCISE 7-1
1. Continuous data: Individuals are scored in terms
of how well they perform on a number of tasks.
Norm-referenced measure: The scores are compared
against a standard, such as community-dwelling adults
presented in the example.
2. Discrete data: Although different numbers are used to
label each rating, the numbers represent a particular
type of response and not a continuous score. The rating is more of a rank order than a total score. Criterion-referenced measure: The level of coma is described as
criteria met, and the individual is not compared with a
normative sample.
3. Continuous data: Although each item is rated dichotomously, a total subscale score is calculated based
on the number correct. Norm-referenced measure:
Whenever age equivalents are used, the raw score is
compared to a normative sample.
4. Discrete data: This is a dichotomous ranking of
whether the diagnosis is present or not present.
Criterion-referenced measure: A testee is not compared to a normative sample; rather, the individual
meets the criteria for the diagnosis or does not meet
the criteria for the diagnosis.
EXERCISE 7-2
1. Sensitivity: 18/(18 + 20) = 47%; Specificity: 203/(203 + 168) = 55%
2. These numbers indicate that the O’Brien’s test was
neither very sensitive nor very specific. Many people were misclassified as false positives and false
negatives.
EXERCISE 7-3
1. Test-retest reliability and internal consistency
2. Responsiveness
3. Validity
EXERCISE 7-4
1. Inter-rater reliability: Getting consistent scores from
different therapists requires a measure with good
inter-rater reliability.
2. Specificity: The practitioner wants to avoid false positives, and a measure with good specificity will limit the
false positives.
3. Responsiveness: To detect changes or improvement, one
needs a measure that is responsive to change.
4. Validity (specifically convergent and divergent validity):
Practitioners want the measure to be related to
Practitioners want the measure to be related to
other cognitive measures, but not measures of motor
ability.
FROM THE EVIDENCE 7-1
There was perfect consistency between the raters for
every individual who was assessed; that is, the raters
scored the servicemembers in the same way. This does not
necessarily mean that they provided the exact same score,
but if rater 1 gave a high score, then rater 2 gave the same
individual a high score in relation to the other individuals
that tester rated. For test-retest reliability, 97 percent of
the variability is accounted for by the true score and only
3 percent is due to error variance.
FROM THE EVIDENCE 7-2
The subscale for stereotype agreement has the weakest internal consistency, at 0.70; the subscale for stigma
self-concurrence has the strongest, at 0.89. All levels are
considered adequate if using the criteria of Nunnally
(1978), where the minimum standard for Cronbach’s
alpha is 0.70.
FROM THE EVIDENCE 7-3
You would raise the swallow rate because fewer swallows suggest more dysphagia. If you raise the number of
swallows, you will reduce sensitivity (i.e., identify fewer
people with dysphagia) but increase the specificity (i.e.,
avoid identifying people with dysphagia who do not really
have dysphagia).
REFERENCES
Adamchic, I., Tass, P. A., Langguth, B., Hauptmann, C., Koller, M.,
Schecklmann, M., Zeman, F., & Landgrebe, M. (2012). Linking
the Tinnitus Questionnaire and the subjective Clinical Global
Impression: Which differences are clinically important. Health and
Quality of Life Outcomes, 10, 79.
Baum, C. M., Morrison, T., Hahn, M., & Edwards, D. (2003). Executive
Function Performance Test: Test protocol booklet. St. Louis, MO: Washington University School of Medicine, Program in Occupational Therapy.
Boyle, M. P. (2013). Assessment of stigma associated with stuttering: Development and evaluation of the Self-Stigma of Stuttering
Scale (4S). Journal of Speech, Language, and Hearing Research, 56,
1517–1529.
Brown, C., & Dunn, W. (2002). Adolescent/Adult Sensory Profile. San
Antonio, TX: Psychological Corp.
Brown, T., Leo, M., & Austin, D. W. (2008). Discriminant validity of
the Sensory Profile in Australian children with autism spectrum
disorder. Physical and Occupational Therapy in Pediatrics, 28, 253–266.
Carifio, J., & Perla, R. J. (2007). Ten common misunderstandings,
misconceptions, persistent myths and urban legends about Likert
scales and Likert response formats and their antidotes. Journal of
Social Sciences, 3, 106–116.
Crary, M. A., Carnaby, G. D., Sia, I., Khanna, A., & Waters, M. F.
(2013, December). Spontaneous swallowing frequency has potential to identify dysphagia in acute stroke. Stroke, 44(12), 3452–3457
[Epub 2013 October 22]. doi:10.1161/STROKEAHA.113.003048
Dumas, H. M., & Fragala-Pinkham, M. A. (2012). Concurrent validity
and reliability of the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test mobility domain. Pediatric Physical Therapy,
24, 171–176.
Gailey, R. S., Gaunaurd, I. A., Raya, M. A., Roach, K. E., Linberg, A. A.,
Campbell, S. M., Jayne, D. M., & Scoville, C. (2013). Development
and reliability testing of the Comprehensive High-Level Activity
Mobility Predictor (CHAMP) in male servicemembers with traumatic lower-limb loss. Journal of Rehabilitation Research & Development, 50(7), 905–18. doi:10.1682/JRRD.2012.05.0099
Giesinger, J. M., Kuster, M. S., Behrend, H., & Giesinger, K. (2013).
Association of psychological status and patient-reported physical
outcome measures in joint arthroplasty: A lack of divergent validity.
Health and Quality of Life Outcomes, 11, 64.
Holland, A. E., Hill, C. J., Rasekaba, T., Lee, A., Naughton, M. T., &
McDonald, C. F. (2010). Updating the minimal importance difference for Six-Minute Walk distance in patients with cardiopulmonary
disorder. Archives of Physical Medicine and Rehabilitation, 91, 221–225.
Husted, J. A., Cook, R. J., Farewell, V. T., & Gladman, D. D. (2000).
Methods for assessing responsiveness: A critical review and recommendation. Journal of Clinical Epidemiology, 53, 459–468.
Lance, C.E., Butts, M.M., & Michels, L.C. (2006). The sources of
four commonly reported cutoff criteria: What did they really say?
Organizational Research Methods, 9, 202–220.
McFarland, E. G., Kim, T. K., & Savino, R. M. (2012). Clinical assessment of three common tests for superior labral anterior-posterior
lesions. Archives of Clinical Neuropsychology, 27, 781–789.
Messick, S. (1995). Standards of validity and the validity of standards
in performance assessment. Educational Measurement: Issues and
Practice, 14(4), 5–8.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY:
McGraw-Hill.
Pankratz, M. E. (2007). The diagnostic and predictive validity of the
Renfrew bus story. Language, Speech & Hearing Services in Schools,
38(4), 390.
Sears, E. D., & Chung, K. C. (2010). Validity and responsiveness of the
Jebsen-Taylor Hand Function Test. Journal of Hand Surgery, 35, 30–37.
Shi, Y. X., Tian, J. H., Yang, K. H., & Zhao, Y. (2011). Modified
constraint-induced movement therapy versus traditional rehabilitation in patients with upper-extremity dysfunction after stroke:
A systematic review and meta-analysis. Archives of Physical Medicine
and Rehabilitation, 92, 972–982.
Stark, T., Walker, B., Phillips, J. K., Feier, R., & Beck, R. (2011). Handheld dynamometry correlation with the gold standard isokinetic
dynamometry: A systematic review. Physical Medicine and Rehabilitation, 3, 472–479.
Teasdale, G. M., & Jennett, B. (1976). Assessment and prognosis of
coma after head injury. Acta Neurochirurgica, 34, 45–55.
Walker-Bone, K. E., Palmer, K. T., Reading, I., & Cooper, C. (2003).
Criteria for assessing pain and nonarticular soft-tissue rheumatic
disorders of the neck and upper limb. Seminars in Arthritis and
Rheumatism, 33, 168–184.
Wang, T. Y., Kuo, Y. C., Ma, H. I., Lee, C. C., & Pai, M. C. (2012).
Validation of the Route Map Recall Test for getting lost behavior in
Alzheimer’s disease patients. Archives of Clinical Neuropsychology, 27,
781–789.
Ward, J., Pivko, S., Brooks, G., & Parkin, K. (2011). Validity of the
Stroke Rehabilitation Assessment of Movement Scale in acute
rehabilitation: A comparison with the Functional Independence
Measure and Stroke Impact Scale-16. Physical Medicine and Rehabilitation, 3, 1013–1021.
Wuang, Y. P., & Su, C. Y. (2009). Reliability and responsiveness of the
Bruininks-Oseretsky Test of Motor Proficiency-Second Edition
in children with intellectual disability. Research in Developmental
Disabilities, 30, 847–855.
“The fact that an opinion has been widely held is no evidence whatever that it is
not utterly absurd; indeed, in view of the silliness of the majority of mankind, a
widespread belief is more likely to be foolish than sensible.”
—Bertrand Russell (1872–1970), a Nobel Prize winner for literature (Russell was a philosopher, mathematician, and social activist)
8
Descriptive and Predictive
Research Designs
Understanding Conditions
and Making Clinical Predictions
CHAPTER OUTLINE
INTRODUCTION
DESCRIPTIVE RESEARCH FOR UNDERSTANDING CONDITIONS AND POPULATIONS
  Incidence and Prevalence Studies
  Group Comparison Studies
  Survey Research
STUDY DESIGNS TO PREDICT AN OUTCOME
  Predictive Studies Using Correlational Methods
    Simple Prediction Between Two Variables
    Multiple Predictors for a Single Outcome
  Predictive Studies Using Group Comparison Methods
    Case-Control Studies
    Cohort Studies
EVALUATING DESCRIPTIVE AND PREDICTIVE STUDIES
LEVELS OF EVIDENCE FOR PROGNOSTIC STUDIES
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES

LEARNING OUTCOMES
1. Identify research designs that answer clinical descriptive and predictive questions.
2. Identify the appropriate statistical analysis for specific clinical descriptive and predictive research questions.
3. Evaluate the strength of the evidence of a given published study.
KEY TERMS
case-control design
correlation
correlational studies
descriptive studies
epidemiology
ex post facto comparison
hazard ratio
incidence
multicollinearity
multiple linear regression
multiple logistic regression
odds ratio
predictive studies
prevalence
prospective cohort study
response bias
response rate
retrospective cohort study
risk ratio (RR)
survey research
systematic review

INTRODUCTION
Suppose you have been working for several years in an orthopedic practice, primarily with clients who have experienced fractures, traumatic and repetitive use injuries, and joint replacements. You have decided to take a new job in an inpatient neurological rehabilitation unit and wish to prepare yourself by learning about this new population. Although it will be important to examine the evidence regarding interventions, first it would be helpful to learn more about the population itself.
Descriptive studies explain health conditions and provide information about the incidence and prevalence of certain conditions within a diagnostic group, such as the prevalence of left-sided neglect in stroke and the incidence of skin breakdown in hospitalized patients with spinal cord injury. Descriptive studies can also provide practitioners with information about comorbidities that are common with a particular condition.
Equally important are predictive studies, which provide information about factors that are related to a particular outcome. For example, a predictive study might identify what activities of daily living are most important for a successful discharge to home and which motor abilities are most likely to result in independence in mobility.
This chapter describes specific nonexperimental designs and statistical analyses used in descriptive and predictive studies. In addition, it identifies characteristics that indicate a strong design and provides a levels-of-evidence hierarchy for predictive studies.

DESCRIPTIVE RESEARCH FOR UNDERSTANDING CONDITIONS AND POPULATIONS
Studies that are undertaken to understand conditions and populations are descriptive in nature. They are intended to observe the naturally occurring characteristics of individuals. Because no variables are manipulated in these studies, they are considered nonexperimental.

Incidence and Prevalence Studies
Epidemiology, which is the study of health conditions in populations, includes descriptive research methods aimed at identifying the incidence and prevalence of specific conditions. As a body of research intended to describe a population, these studies often use large numbers of research participants. To obtain large samples of data for epidemiological studies, researchers may access major health-care databases such as the National Health and Nutrition Examination Surveys (NHANES) (Centers for Disease Control and Prevention [CDC], n.d.) and the Centers for Medicare & Medicaid Services (CMS, n.d.). NHANES includes a large national sample of individuals who have participated in health interviews and physical examinations, and CMS maintains data on the health services and outcomes of Medicare and Medicaid recipients. Rehabilitation professionals are particularly interested in epidemiological studies that consider the functional aspects of health conditions, such as the incidence and prevalence of language impairments, ambulation problems, and activity limitations.
Incidence is the frequency of new occurrences of a condition during a specific time period. Incidence is calculated as the number of new cases during a time period, divided by the total population at risk. The focus on "new" cases is a distinguishing characteristic of incidence studies. For example, Huang et al (2014) studied the incidence of dysphagia after surgery in children with traumatic brain injury. They collected data for 8 years, from 2000 to 2008, and found an incidence of 12.3 percent. This means that, of all of the children in the study sample who experienced surgery for traumatic brain injury, 12.3 percent developed dysphagia. This information is useful to therapists for anticipating the number of children with traumatic brain injury who may need swallowing assessment and treatment.
Prevalence refers to the number of individuals in a population who have a specific condition at a given point in time, regardless of onset. Prevalence is a measure of how widespread a condition is, whereas incidence provides an estimation of the risk of developing the condition. Prevalence is calculated as the number of cases at a given time point, divided by the total population at risk. For example, Lingam et al (2009) used data from a larger
Understanding Statistics 8-1
Incidence is measured using frequency statistics, typically expressed in terms of a percentage or proportion. It is calculated using the following formula:
Incidence = (Number of new cases during a time period) / (Total population at risk)
For example, from the study of dysphagia (Huang et al, 2014), the denominator in the equation was based on the 6,290 children who constituted the total study sample. The study found that 775 had severe dysphagia, which became the numerator in the equation. The incidence of 12.3% was determined with this simple calculation:
Incidence of dysphagia = 775/6,290
Understanding Statistics 8-2
Like incidence, prevalence uses frequency statistics
such as percentages and proportions. The formula
is expressed as:
Number of cases at
a given time point
Prevalence =
Total population
at risk
Using the Lingam et al (2009) study as an
example, the researchers had a total sample of
6,990 children and found that 119 met the criteria
for developmental coordination disorder. Prevalence =
119/6,990, or 1.7% (17 out of 1,000 children).
project called the Avon Longitudinal Study of Parents
and Children to identify children with developmental
coordination disorder. The study collected data on manual dexterity, ball skills, and balance, and found that 1.7
percent, or 17 out of 1,000 children, met the criteria for
developmental coordination disorder. Prevalence is often
expressed as a whole number of individuals with the condition, such as 17 out of 1,000 children.
Incidence and prevalence studies help practitioners
know how widespread a particular condition is and how
likely someone is to develop it. This information can then
be used in intervention planning and implementation. For
example, knowing that the incidence of developing pressure sores is high in individuals with spinal cord injuries,
practitioners can provide education and seating to help
prevent the occurrence of this problem.
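As a minimal sketch of the incidence and prevalence calculations described in Understanding Statistics 8-1 and 8-2, the following Python snippet uses the counts reported in the two example studies.

```python
# Incidence: new cases during a time period / total population at risk
new_dysphagia_cases = 775           # Huang et al (2014)
children_at_risk = 6290
incidence = new_dysphagia_cases / children_at_risk
print(f"Incidence of dysphagia: {incidence:.1%}")        # 12.3%

# Prevalence: existing cases at a time point / total population at risk
dcd_cases = 119                     # Lingam et al (2009)
children_sampled = 6990
prevalence = dcd_cases / children_sampled
print(f"Prevalence of DCD: {prevalence:.1%} "
      f"({prevalence * 1000:.0f} per 1,000 children)")   # 1.7%, 17 per 1,000
```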
Subgroups within a population, such as gender and
age, are often analyzed in prevalence and incidence studies. For example, the CDC (2012) reported that in 2010
the prevalence for stroke was 2.6 percent; however, the
rate varied by race and was highest among American Indians at 5.9 percent and lowest among Asians at 1.5 percent.
Prevalence is influenced by both incidence and the duration of the disease; prevalence is much higher than incidence with chronic conditions because of the extended
duration of the disease. Consider the difference between
incidence and prevalence rates when comparing hip fractures with spinal cord injury. If you have a hip fracture in
2015, it will heal, and you will no longer have a hip fracture in 2017. However, if you have a spinal cord injury in
2015, in most cases you will still have a spinal cord injury
in 2017. Incidence and prevalence will be more similar for
hip fractures, because of the acute nature of the condition,
and less similar for spinal cord injury.
With incidence and prevalence studies it is important
to recognize that, as with all nonexperimental studies, it is
not possible to make conclusions regarding causation. For
example, even though one study found that the highest
prevalence for obesity in terms of occupational categories
included health care (49.2 percent) and transportation
workers (46.6 percent), it would not be logical or correct
to assume that these jobs cause obesity (Gu et al, 2014).
The large samples needed in epidemiological studies
often result in the use of efficient data collection methods
such as surveys. The methods by which data are collected
should be carefully considered in incidence and prevalence
studies; for example, physical and/or objective measures
are more likely to provide reliable data than self-reported
data that come from a survey. The table in From the
Evidence 8-1 comes from a study that compared different reporting methods to collect data on the prevalence of
lymphedema following breast cancer (Bulley et al, 2013).
Perometry, an objective measure of swelling, was compared with two self-reported measures. The study found
different prevalence rates depending on the data collection
method used.
Group Comparison Studies
In health-care research, descriptive studies often compare
a disability group to a group of individuals without a disability. These cross-sectional designs that compare two or
more groups at one point in time, sometimes referred to as
ex post facto comparisons, can answer important questions about the differences between groups. These designs
are not experimental because the independent variable of
group assignment is not manipulated; instead, the difference in groups is based on pre-existing conditions.
Therapists are particularly interested in descriptive studies that characterize the functional aspects of a
FROM THE EVIDENCE 8-1
Prevalence Rates by Reporting Method
Bulley, C., Gaal, S., Coutts, F., Blyth, C., Jack, W., Chetty, U., Barber, M., & Tan, C. W. (2013). Comparison of breast cancer-related
lymphedema (upper limb swelling) prevalence estimated using objective and subjective criteria and relationship with quality of life.
Biomedical Research International, 807569. doi:10.1155/2013/807569.
Table 2. Prevalence of Lymphedema According to Each Measurement Tool

Method of Measurement    Prevalence (all available data for each tool), % frequency
Perometry                26.2
LBCQ (a)                 23.9
MST (b)                  20.5

(a) LBCQ: Lymphedema and breast cancer questionnaire.
(b) MST: Morbidity screening tool.

Note A: Difference in prevalence rates based on measurement method.

FTE 8-1 Question How might the different findings from the Bulley et al study influence your clinical decision-making about assessment in lymphedema?
disability. Studies that compare individuals with and
without a disability on measures of, for example, mobility, instrumental activities of daily living, and language
provide information on what to expect with a given
diagnosis.
The table in From the Evidence 8-2 compares three
age groups of healthy controls and individuals with stroke
using a one-way ANOVA.
Developmental research with a longitudinal design
may also use a group comparison approach. As an example,
Understanding Statistics 8-3
When means are compared between groups, a t-test
for independent samples is used for two groups,
whereas a one-way ANOVA is used to compare three
or more groups. In the results section of a study, the
t-test may be described in terms of the t-value, degrees of freedom, and p value (statistical significance).
The ANOVA provides similar results, but instead of
a t-value, an F-value is used. A strong study results
section will also provide the mean scores and standard
deviations for each group.
A Chi-Square analysis is used to compare frequencies between groups; in this case, the frequency (percentage) with which something happens is compared between groups. The results are described using a Chi-Square value (χ²), degrees of freedom (df), and p value. For example, McKinnon (1991) looked at assistance needed for activities of daily living in elderly Canadians. Individuals were categorized as needing or not needing assistance. In an age and gender comparison, elderly women (61.5%) were much more likely than elderly men (41.8%) to require assistance with yardwork (χ² = 550.7, df = 5, p < 0.00001), whereas elderly men (81.7%) were more likely than elderly women (40.4%) to require assistance with housework (χ² = 739.3, df = 3, p < 0.00001).
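A brief sketch of how these group comparisons are run in practice, using standard SciPy calls with hypothetical data (the numbers are made up and not drawn from the studies cited above):

```python
import numpy as np
from scipy import stats

# One-way ANOVA: compare mean shopping times (minutes) across three or more groups
young  = [4.8, 5.3, 4.9, 5.6, 5.1]
older  = [5.7, 6.2, 5.9, 6.4, 5.8]
stroke = [10.9, 11.8, 10.5, 12.1, 11.4]
f_value, p_value = stats.f_oneway(young, older, stroke)
print(f"ANOVA: F = {f_value:.2f}, p = {p_value:.4f}")

# Chi-square: compare frequencies (needs assistance yes/no) between two groups
#                     needs help   no help
observed = np.array([[123,  77],            # group 1
                     [ 84, 116]])           # group 2
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"Chi-square = {chi2:.1f}, df = {dof}, p = {p:.4f}")
```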
FROM THE EVIDENCE 8-2
Comparison of Groups Using One-Way ANOVA
Rand, D., Katz, N., & Weiss, P. L. (2007). Evaluation of virtual shopping in the VMall: Comparison of post-stroke participants to healthy
control groups. Disability Rehabilitation, 29, 1719–1720.
Table III. Time to complete the four-item shopping task comparison between groups.

Group              Time to shop for four items (min): Mean ± SD (range)
Children           4.6 ± 1.9 (2.7–10.6)
Young adults       5.1 ± 2.4 (2.3–15.3)
Older adults       5.9 ± 2.7 (3.24–15.3)
Stroke patients    11.2 ± 3.3 (5.6–16.20)

One-way ANOVA – F(3,97) = 23.28, P < 0.000.

Note A: Individuals with stroke took much longer to shop than healthy controls.
Note B: The numbers in parentheses are the degrees of freedom, which is based on the number of groups – 1 and the number of participants.
FTE 8-2 Question
What does the table tell you about individuals with stroke and their ability to shop?
therapists working with children need to know how diseases and disabilities can affect patterns of growth and
development. Studies that compare typical children with
a disability group over time can answer these descriptive
questions. In a 2013 study, Martin et al compared different areas of language production in boys with fragile X,
boys with Down syndrome, and typically developing
boys. Data collected over three years suggested that the
disability groups had different types of language issues.
Boys with Down syndrome had more problems related
to syntax, whereas boys with fragile X experienced more
challenges related to pragmatic language.
In ex post facto group comparisons, the lack of random
assignment and manipulation of the independent variable
present potential threats to validity. Differences between
the groups may exist outside of the dependent variable of
interest and confound the comparison. For example, when
comparing individuals with and without schizophrenia on
a fine motor measure, factors other than the health condition, such as medication side effects, could explain the
differences. In group comparison studies with preassigned
groups, it is important to match the groups on as many
potentially confounding variables as possible. However, in
some cases that may be difficult to do, such as in the preceding example regarding medication issues. Nevertheless,
group comparison studies of existing groups help therapists
better understand the characteristics of disability groups.
Survey Research
Survey research is a common approach to gathering
descriptive information about health conditions. In survey research a questionnaire is administered via mail,
electronic media, telephone, or face-to-face contact.
A major advantage of survey research is the ease with
which large amounts of data can be collected, particularly when surveys are administered electronically. Surveys may be used to gather incidence and prevalence
data, but they can be extended beyond these areas of
research to describe virtually any phenomenon that can
be investigated using a questionnaire. Another advantage of survey research is the opportunity to use random
sampling methods, as it is possible to reach individuals
in different geographic locations.
A major consideration in survey research is response
rate, or the percentage of individuals who return a survey
based on the total numbers of surveys administered. If
large numbers of individuals choose not to respond to a
survey, it is likely that a response bias exists. Response
bias is a general term for a measurement error that creates inaccuracy in the survey results. In the case of few
responders to a survey, the responders may be different
from nonresponders. Perhaps the nonresponders do not
want to share sensitive information, or maybe they are less
motivated to participate. For example, a survey to identify
binge drinking in college students may have a poor response rate among the binge drinkers, which would make
the findings unreliable. Surveys related to alcohol use have
notoriously high nonresponse rates (Meiklejohn, Connor,
& Kypri, 2012). Survey researchers typically make multiple contacts to try to reach nonresponders, but even these
efforts can be unsuccessful.
Although there is no universally accepted response rate,
some journals now require a certain level of response before they will publish a study. For example, the American
Journal of Pharmaceutical Education requires a response rate of 60 percent for most research, and 80 percent representation of all colleges for educational research (Fincham,
2008). However, when response rates are low, researchers
may be able to provide data to demonstrate that the sample is similar to the population by comparing important
demographic characteristics.
Another factor affecting the reliability of survey research involves self-reporting issues. People generally
have a desire to present themselves in a favorable light,
and thus may underreport certain conditions. For example, in a study of self-reported body mass index, researchers found that many individuals underreported or refused
to report their body mass index (Chau et al, 2013).
Some methods to reduce nonresponse and underreporting are face-to-face interviews, training of the
interviewers in establishing rapport, well-designed questionnaires, and the use of objective measures to verify
self-reports.
Most surveys used in research are cross-sectional in nature, meaning they gather data at a single point in time.
Other surveys are longitudinal and collect data over periods of years or decades. The NHANES (CDC, n.d.) is a
large health-care database that uses multiple strategies to
increase the reliability and validity of the data. Thousands
of individuals are included in the study, and sampling is
carefully done to represent the U.S. population. Face-toface interviews are conducted in participants’ homes by
trained interviewers, and the survey data are supported
by physical examinations conducted in mobile stations.
More recently, data collection has included physical activity monitoring with an accelerometer. A portion of the
participants are followed for years.
The NHANES data have contributed to many important research findings. One example is the first significant measurement of physical activity data in the
United States, which found that actual physical activity
was much less than self-reported physical activity, and
less than 5 percent of Americans get the recommended
30 minutes of physical activity per day. Of course, the
NHANES project requires large resources in time,
money, and people to gather the data.
STUDY DESIGNS TO PREDICT
AN OUTCOME
Several different research designs can be used to answer
questions about predicting an outcome. The two major
categories of such studies are (1) studies that use correlational methods, and (2) studies using group comparison
methods. In all cases, the purpose is to identify factors
that are most predictive of an outcome. For example, if
therapists work with young adults with autism who are
trying to find a job, they would be interested in factors
that predict employment. They may wonder if social skills
or cognitive skills are more important in obtaining and
maintaining employment. Both correlational studies and
group comparison studies can answer such questions.
Predictive Studies Using
Correlational Methods
One way of predicting outcomes is through the use of
correlational studies that examine the relationships
between variables. These designs include predictor and
outcome variables within a target sample and offer one
approach for making a prognosis or predicting an outcome. Group comparison studies are another approach,
as discussed in the following text.
Simple Prediction Between Two Variables
In a simple correlation, the association between two
variables is determined. Correlational studies are cross-sectional, with data collected at a single point in time. At least
two measures are administered and related; however, multiple measures can be administered, with the results presented
in a correlation matrix. For example, a study examining factors associated with quality of life in clients with multiple
sclerosis could correlate quality of life with several other
variables, such as fatigue, depression, and strength. Each
relationship between two variables results in a correlation;
that is, correlations for quality of life and fatigue, quality of
life and depression, and quality of life and strength.
Often researchers examine the relationships among predictors as well. Using the same example, the study could
result in correlations for fatigue and depression, fatigue and
strength, and depression and strength. Figure 8-1 illustrates the relationship between fatigue and quality of life.
As mentioned previously, correlational studies may administer multiple measures and explore the relationships
among all of these measures. In that case, the results should
be presented in a correlation matrix. A hypothetical correlation matrix based on the example of predictors of quality of life in multiple sclerosis is illustrated in Table 8-1.
Multiple Predictors for a Single Outcome
Complex correlational designs look at multiple predictors
for a single outcome. The same design is used to collect the
data as with a simple correlation, but the analysis is conducted differently. When sufficient cases and a robust design are in place, a regression analysis is used. Larger sample
sizes (often in the hundreds) are needed with multiple predictors so that the results are stable. With small samples,
the predictors found in one study are not likely to be the
same in another study. Using the previous example, multiple measures (i.e., quality of life, fatigue, depression, and
strength) are administered to the target population. Rather
than looking at two variables at a time, multiple variables
are entered into the regression equation. These studies have
the advantage of examining the total amount of variance
accounted for by multiple predictors and the relative importance of each individual variable as a predictor.
Take quality of life as the outcome, for example. One
would expect that many factors contribute to quality of
life, and that many factors taken together (e.g., fatigue,
depression, and strength) would be a better predictor than
151
Understanding Statistics 8-4
As discussed in Chapter 4, correlation statistics describe the relationship between two variables. The
Pearson Product Moment Correlation is the most
common correlation statistic used to examine the
strength of the relationship between two continuous
variables (e.g., speed and distance, or age and grip
strength). If one or both of the variables is measured
on an ordinal or nominal scale, and is rank-ordered,
such as manual muscle testing grades, a Spearman
Correlation should be used.
In the results section of a research article, the
correlation is presented as an r value. These r values can be considered effect sizes. Cohen (1992) provided a
rule of thumb for interpreting the strength of r values within the applied sciences: 0.10 = weak effect/
relationship, 0.30 = moderate effect/relationship,
and 0.50 = large effect/relationship. Correlation statistics provide three types of information: strength
of the relationship, direction of the relationship, and
statistical significance. From the preceding example, consider the relationship of fatigue and quality
of life. If the correlation of these two variables is
r = -0.30, the correlation is moderately strong, the
direction is negative (i.e., higher fatigue means
lower quality of life), and p < 0.05, indicating that
the relationship is statistically significant.
Another way to evaluate the strength of the relationship is to consider the amount of variance
accounted for by the relationship. Variance is calculated by squaring the correlation coefficient. Using
the preceding example, the square of the correlation
results in a variance of r2 = 0.09; 9% of the variance
is accounted for, and 91% is left unaccounted for;
in other words, 91 percent must be attributed to a
factor other than fatigue.
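A short sketch of the calculation described above, using hypothetical fatigue and quality-of-life scores (SciPy's pearsonr and spearmanr are the standard calls; the values are illustrative only):

```python
from scipy import stats

# Hypothetical scores for ten clients: higher fatigue, lower quality of life
fatigue         = [2, 5, 3, 8, 6, 4, 9, 1, 7, 5]
quality_of_life = [78, 60, 74, 52, 55, 70, 48, 80, 58, 66]

r, p = stats.pearsonr(fatigue, quality_of_life)
print(f"r = {r:.2f}, p = {p:.3f}")        # negative: more fatigue, lower QoL
print(f"variance accounted for (r^2) = {r**2:.2f}")

# If either variable were ordinal (e.g., manual muscle testing grades),
# a Spearman correlation would be used instead:
rho, p_rho = stats.spearmanr(fatigue, quality_of_life)
```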
FIGURE 8-1 Variance accounted for in the relationship between fatigue and quality of life in multiple sclerosis. The overlap between the fatigue and quality of life circles represents the amount of variance accounted for by the relationship between fatigue and quality of life.

TABLE 8-1 Correlation Matrix

                   Fatigue    Depression    Strength
Quality of life    -0.40*     -0.60*        0.12
Fatigue                       0.30*         -0.25*
Depression                                  0.02

* Statistically significant relationships, p < 0.05.
just one factor (e.g., strength). Perhaps the greatest challenge in regression involves the selection of predictors. It
is critical to include important predictors.
Another important consideration in selecting predictors
is the phenomenon of multicollinearity, a term that refers
to the circumstance in which variables (or, in the case of
regression, predictors) are correlated with one another. In
regression, it is preferable that multicollinearity be kept
to a minimum and that each predictor contribute as much
unique variance as possible. An overlap in predictors suggests that they are measuring the same or similar thing. In
the example, only a moderate amount of multicollinearity appears, as fatigue is likely associated with strength,
because individuals who lack strength will fatigue more
quickly. Similarly, depression and fatigue often co-occur.
To determine the unique variance of a predictor, it is
entered last into the regression equation. Returning to the
example of predicting quality of life, if fatigue were entered
last into the equation and depression and strength were entered previously, one could determine the unique variance
of fatigue. In other words, the unique variance of fatigue is
the amount of variance by which fatigue predicts quality of
life when all of the other predictors are taken into account.
Due to multicollinearity, the unique variance will always be less than the variance accounted for in a simple correlation. In the example (Table 8-1), the variance shared between fatigue and quality of life was (-0.40)² = 0.16. In the multiple regression, the unique variance of fatigue as a predictor would be less than 0.16, because fatigue overlaps with depression and strength when predicting quality of life. Figure 8-2 illustrates this more complex relationship.
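A simple way to gauge multicollinearity before running a regression is to inspect the correlations among the candidate predictors, much like the matrix in Table 8-1. The brief sketch below uses the pandas library and invented scores; it is an illustration only.

```python
# A minimal sketch (hypothetical data): inspect predictor intercorrelations
# before fitting a regression, to gauge multicollinearity.
import pandas as pd

data = pd.DataFrame({
    "fatigue":    [2, 5, 3, 8, 6, 9, 4, 7, 1, 6],
    "depression": [1, 4, 2, 6, 5, 7, 3, 6, 2, 4],
    "strength":   [9, 6, 8, 4, 5, 3, 7, 4, 9, 6],
})

# Pairwise Pearson correlations among the candidate predictors;
# large absolute values indicate overlapping (shared) variance.
print(data.corr().round(2))
```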
There are two primary types of regression analyses that
examine multiple predictors: multiple linear regression
and multiple logistic regression.
FIGURE 8-2 Multiple predictors of quality of life. (The figure shows fatigue, depression, and strength overlapping with quality of life, including the unique variance of fatigue and quality of life, the unique variance of depression and quality of life, and the shared variance of fatigue, depression, and quality of life.)
Understanding Statistics 8-5
Multiple linear regression is an extension of bivariate correlation (i.e., the relationship between two
variables). When the results are presented for multiple linear regression, often the bivariate correlations are presented as well. In the case of multiple
linear regression, consider a set of predictors and
an outcome or criterion. The analysis reveals the
degree to which the set of predictors accounts for
the outcome, as well as the relative importance of
the individual predictors. The results are provided
as a multiple correlation, or R, and a squared multiple correlation, R2 or variance. The R2 change is
the difference in variance accounted for by a second
set of predictors or a single predictor that takes into
account the previous predictors. R2 change is the
unique variance of the last predictor entered into
the equation.
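The idea of R² change can be illustrated with a short sketch: fit the regression with the other predictors first, add the predictor of interest last, and subtract the two R² values. The example below uses the scikit-learn library and simulated data whose variable names mirror the quality-of-life example; it is a sketch of the procedure, not a reanalysis of any study.

```python
# A minimal sketch (simulated data) of R-squared change: the unique
# variance contributed by fatigue after depression and strength are entered.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
depression = rng.normal(size=n)
strength = rng.normal(size=n)
fatigue = 0.5 * depression - 0.3 * strength + rng.normal(size=n)
quality_of_life = -0.6 * fatigue - 0.4 * depression + rng.normal(size=n)

X_reduced = np.column_stack([depression, strength])          # predictors entered first
X_full = np.column_stack([depression, strength, fatigue])    # fatigue entered last

r2_reduced = LinearRegression().fit(X_reduced, quality_of_life).score(X_reduced, quality_of_life)
r2_full = LinearRegression().fit(X_full, quality_of_life).score(X_full, quality_of_life)

# R-squared change = unique variance of fatigue, over and above the other predictors
print(f"R2 (depression + strength) = {r2_reduced:.2f}")
print(f"R2 (all predictors)        = {r2_full:.2f}")
print(f"R2 change (unique variance of fatigue) = {r2_full - r2_reduced:.2f}")
```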
Multiple Linear Regression
In multiple linear regression, the outcome is a continuous variable. Quality of life as measured by a questionnaire is an example of a continuous variable, which
would be analyzed with linear regression. The multiple
refers to multiple predictors. The multiple linear regression reveals the total amount of variance accounted for
by all of the predictors, as well as which predictors are
most important.
From the Evidence 8-3 provides an example of a
multiple linear regression. This study examined predictors of speech recognition in older adults and included
both a correlation matrix and results from a linear regression analysis (Krull, Humes, & Kidd, 2013).
Multiple Logistic Regression
When the outcome is categorical (e.g., a client either experiences a fall or does not), a multiple logistic regression analysis is used; the predictors may be categorical or continuous. Results are reported in terms of an odds ratio, which compares the odds of experiencing the outcome in one group with the odds of experiencing it in another.
An odds ratio of 1.0 means there is no difference between
the groups. When the odds ratio is greater than 1.0, there
is a greater chance of experiencing the outcome; when
the odds ratio is less than 1.0, there is a lower chance of
experiencing the outcome.
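To see where the odds ratios reported for a logistic regression come from, the brief sketch below (simulated data, using the statsmodels library) fits a logistic model to a binary fall/no-fall outcome and exponentiates the coefficients; each exponentiated coefficient is the odds ratio for its predictor, and exponentiating the confidence limits gives the 95 percent confidence interval that typically accompanies it.

```python
# A minimal sketch (simulated data): a logistic regression whose
# exponentiated coefficients are odds ratios for a binary outcome (fall vs. no fall).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
drinks_per_day = rng.integers(0, 4, size=n)          # hypothetical predictor
age = rng.normal(75, 6, size=n)                      # a second hypothetical predictor
log_odds = -8 + 0.8 * drinks_per_day + 0.08 * age
falls = rng.random(n) < 1 / (1 + np.exp(-log_odds))  # simulated binary outcome

X = sm.add_constant(np.column_stack([drinks_per_day, age]))
result = sm.Logit(falls.astype(int), X).fit(disp=False)

odds_ratios = np.exp(result.params)        # exponentiated coefficients = odds ratios
ci = np.exp(result.conf_int())             # 95% confidence intervals for the odds ratios
print(odds_ratios)
print(ci)
```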
Table 8-2 shows an odds ratio based on a 2 × 2 table in which OR = AD/BC. This example is drawn from a study of elderly individuals that considered falls as the outcome and alcohol consumption as the predictor.
To calculate the odds ratio:
OR = (50 × 40)/(10 × 20) = 2,000/200 = 10.0
FROM THE EVIDENCE 8-3
Predictors of Speech Recognition
Krull, V., Humes, L. E., & Kidd, G. R. (2013). Reconstructing wholes from parts: Effects of modality, age and hearing loss on word
recognition. Ear and Hearing, 34(2), e14– e23. doi:10.1097/AUD.0b013e31826d0c27.
Relationships (Pearson correlation coefficients) between age, high-frequency pure tone average
(HFPTA), auditory (a_SSN: Speech-shaped noise; a_INT: Interrupted speech; a_FILT: Filtered
speech), visual (v_SSN: Text in Gaussian noise; v_INT: Text masked by bar pattern), and
cognitive measures (MU: Memory updating; SS: Sentence span; SSTM: Spatial short-term
memory) in elderly subjects (pooled data). Statistically significant relationships are indicated by
asterisks.
[The full correlation matrix is not reproduced here; the variables are age, HFPTA, the auditory measures (a_SSN, a_INT, a_FILT), the visual measures (v_SSN, v_INT), and the cognitive measures (MU, SS, SSTM).]

Note A: Age is negatively correlated with the speech recognition measures, and the speech measures are intercorrelated; however, the cognitive measures are not related to the speech measures.

**P < 0.01 (2-tailed); *P < 0.05 (2-tailed).
A standard forward stepwise linear regression analysis (SPSS) was used to analyze each of the
three dependent auditory measures (a_SSN: Speech-shaped noise; a_INT: Interrupted speech;
a_FILT: Filtered speech) in elderly adults (pooled data). Age, high-frequency pure tone average
(HFPTA), visual (v_INT: Text masked by bar pattern; v_SSN: Text in Gaussian noise) and
cognitive measures (SS: Sentence span; SSTM: Spatial short-term memory; MU: Memory
updating) were included as independent variables in each analysis. The independent variables
entered the model according to their statistical contribution (F-to-enter criterion) in explaining the
variance in the dependent variable (speech recognition); only significant independent variables
are included in the table.
Dependent Variable    Independent Variable(s)    β         R Square    P
a_SSN                 Age                        -0.411    0.169       0.009**
a_INT                 Age                        -0.388    0.151       0.015*
a_FILT                Age                        -0.333    0.111       0.038*

**P < 0.01 (2-tailed); *P < 0.05 (2-tailed).
Note B: The
dependent variables
would be considered
the outcome or
criterion, and the
independent variables
would be considered
the predictors.
Note C: Only age is a
significant predictor of
speech recognition.
FTE 8-3 Question How would you interpret the findings from the correlation matrix and linear regression? In other
words, what predicts speech recognition?
This example indicates that elderly individuals who consume more than one drink per day have ten times the odds of falling compared with those who consume less than one drink per day.
Ninety-five percent confidence intervals are typically
presented along with the odds ratio. The confidence interval
is the range in which you would expect the odds ratio to
fall if the study were conducted again. If the 95 percent
confidence interval includes the value of 1.0, the odds ratio
will not be statistically significant. From the Evidence 8-4
provides an example of predicting outcomes for back pain.
TABLE 82 Sample Odds Ratio
Number of Drinks
Per Day
No Falls
Falls
< 1 drink
A 50
B 10
> 1 drink
C 20
D 40
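The arithmetic for the 2 × 2 table can also be carried out in a few lines of Python, as in the sketch below. The cell counts are those in Table 8-2; the confidence interval uses the common log-odds (Woolf) approximation, which is an assumption here rather than a method described in the text.

```python
# A minimal sketch: odds ratio and approximate 95% CI from the 2 x 2 table
# in Table 8-2 (A = 50, B = 10, C = 20, D = 40).
import math

a, b, c, d = 50, 10, 20, 40     # no falls/<1 drink, falls/<1 drink, no falls/>1 drink, falls/>1 drink

odds_ratio = (a * d) / (b * c)  # OR = AD/BC = 10.0

# Approximate 95% CI on the log-odds scale (Woolf method; an assumption for illustration)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
# Because this interval does not include 1.0, the odds ratio is statistically significant.
```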
FROM THE EVIDENCE 8-4
Predictors of Improvement in Back Pain
Hicks, G. E., Benvenuti, F., Fiaschi, V., Lombardi, B., Segenni, L., Stuart, M., . . . Macchi, C. (2012). Adherence to a community-based
exercise program is a strong predictor of improved back pain status in older adults: An observational study. Clinical Journal of Pain,
28(3), 195–203. http://doi.org/10.1097/AJP.0b013e318226c411.
Multivariate logistic regression analysis of factors associated with improved back pain status

                                         Odds Ratio    95% CI        p-value
Demographic/Social Factors
  Lives alone                            0.60          0.31–1.18     .140
  Currently working                      2.14          0.81–5.63     .125
Health Status Factors
  High Depressive Symptoms (GDS > 5)     0.47          0.25–0.89     .019
  SPPB score > 8                         1.71          0.88–3.34     .114
  Poor Self-Rated Health                 0.20          0.08–0.51     .001
Back Pain Factors
  Numeric Pain Rating Scale              0.94          0.85–1.05     .273
  Roland Morris Scale                    1.01          0.95–1.07     .786
Accessibility/Attendance Factors*
  Adherent to APA program                13.88         8.17–23.59    < .001
APA Satisfaction Factors*
  Positive rating of trainer             1.13          0.63–2.02     .675
  Satisfied with hours of operation      1.36          0.67–2.75     .391

GDS = Geriatric Depression Scale; SPPB = Short Physical Performance Battery
*These factors are from the follow-up telephone interview, but all other factors are from the baseline assessment.

Note A: Only depression, self-rated health, and adherence to a back pain program were predictive of improvements in back pain.
FTE 8-4 Question Which is the strongest predictor in determining outcomes for back pain?
Predictive Studies Using Group
Comparison Methods
Studies using correlational methods typically involve a
single group of participants. However, predictive studies
may also use group comparison designs. In health-care research, case-control designs and cohort studies are commonly used for answering questions about prognosis and
predicting outcomes.
Case-Control Studies
A case-control design is an observational, retrospective,
cross-sectional study that can be used to answer prognostic research questions concerning which risk factors
predict a condition. This design is commonly used in epidemiological research that is conducted after a condition
has developed. In this type of research design, individuals
who already have a condition constitute one group; they
are matched and compared with individuals without the
condition. There is no random assignment.
The landmark example of a case-control design is research that compared individuals with lung cancer with individuals without lung cancer and identified smoking as a
predictor (Wynder & Graham, 1950; Doll & Hill, 1950).
Because randomization to group was not possible, cigarette companies used the argument that the research design
could not confirm causation. However, the use of very large
samples, replication of the findings, and cohort studies (described later) eventually led to confidence in the link.
In a rehabilitation example, Brennan and colleagues
(2014) examined rheumatoid arthritis (RA) as a risk factor
for fracture in women. In a large study of several thousand
women aged 35 and older, 1.9 percent of individuals with
RA had suffered a fracture, compared with 1.2 percent
of women without RA. This slight increase in risk could
warrant interventions to increase bone strength and/or
reduce falls in women with RA.
The case-control research design uses the same group
comparison described previously for descriptive research.
The difference lies in the research question. Case-control
designs answer prognostic and predictive questions, and
may use analyses that are more suitable for answering predictive questions (which differs from simply describing a
condition). In the preceding RA example, the groups were
compared in terms of differences in frequencies. However, case-control designs are often reported in terms of
hazard ratios, or the likelihood of a particular event occurring in one group compared with the likelihood of that
event occurring in another group.
Cohort Studies
A cohort study is also observational, but differs from a
case-control design in that participants are followed over
time, making this design longitudinal. In cohort studies,
a hypothesized risk factor is identified, and then the study
Understanding Statistics 8-6
Hazard ratios (HRs) are estimates of risk over
time and are conceptually similar to odds ratios.
A hazard ratio greater than 1.0 indicates a greater risk, whereas a hazard ratio less than 1.0 indicates a lesser risk. Hazard ratios (as opposed to
risk ratios, which are described later) are used in
case-control designs in which the risk factor is analyzed retrospectively.
For example, a large study that retrospectively
examined the records of 66,583 individuals regarding the risk of ischemic stroke after total hip replacement (Lalmohamed et al, 2012) reported the
following: HR = 4.69; 95% CI = 3.12-7.06, indicating there is a 4.7-fold increased risk for experiencing an ischemic stroke in individuals who undergo
total hip replacement.
follows individuals with and without the risk factor. At a
certain time point in the future, the risk factor is analyzed
to determine its impact on the outcome. For example,
Xiao et al (2013) examined lack of sleep as a risk factor
for developing obesity. For 7.5 years, the researchers followed a large sample of individuals who were not obese at
the onset of the study. This example is considered a prospective cohort study because the research question was
identified before the study began, and individuals were
followed over time to determine who did and who did not
develop the condition. The individuals who consistently
got less than 5 hours of sleep were 40 percent more likely
to become obese than individuals who got more sleep.
These results suggest that sleep interventions may be useful in preventing obesity. As a prospective cohort study in
which problems with sleep occur before obesity, there is
some evidence to support a cause-and-effect relationship.
However, caution should be exercised when drawing conclusions from prospective data, because cohort studies do
not use experimental designs and are therefore weaker in
terms of supporting causative conclusions.
Another type of cohort design is the retrospective
cohort study in which existing records or the client’s
report on past behavior is used to determine if changes
occurred over time. One cohort study retrospectively examined an existing database of individuals with moderate
to severe traumatic brain injury and found that educational attainment was predictive of “disability-free recovery” (Schneider et al, 2014). More specifically, more of
the individuals with disability-free recovery had greater
than 12 years of education prior to the injury. In cohort
studies (both retrospective and prospective), risk ratios
(RRs), also referred to as relative risks, are used to express
the degree of risk for an outcome over time.
EVIDENCE IN THE REAL WORLD
Predicting Outcomes Is Not an Exact Science
Practitioners must be cautious when interpreting predictive study results for their clients. Quantitative studies consolidate the findings of many individuals and, in the process, can lose the distinctiveness of the individual’s situation. When predicting an outcome, the evidence provides important information about the
relationship between variables, but a study can never include all of the factors that contribute to a particular individual’s outcome. Factors such as financial resources, the client’s level of engagement in therapy,
degree of social support, other psychiatric or medical conditions, and the client’s physical condition may not
be factored into a particular study. Therefore, practitioners should avoid making absolute predictions for
clients.
A case in point: an individual was involved in a serious cycling accident that resulted in a fractured
scapula, clavicle, and two ribs. She is an avid road cyclist who enjoys participating in hill-climbing races. She asked
her physician if she would be able to participate in an upcoming hill-climb event scheduled for 9 weeks after her
accident. Based on clinical experience and research evidence, the physician said it was unrealistic for her to expect
to even be back on the bike in 9 weeks. Not only did she end up participating in the event, but she improved her
time by 6 minutes over the previous year!
It is likely that you know similar stories of individuals who have defied the odds. In other circumstances, individuals with relatively minor conditions may have outcomes that are more severe than would be predicted based
on the research. Predictive studies can provide direction, but remember that each client is an individual with a
unique life situation.
Understanding Statistics 8-7
The calculation of risk ratios differs from that of
odds ratios (ORs) and hazard ratios (HRs) and is
expressed as:
RR = [A/(A + B)]/[C/(C + D)]
Risk ratios are generally lower than odds ratios,
particularly when the events that are measured occur
frequently. When the occurrence of an event is rare or
low, the RR and OR will be similar.
For example, returning to the previous research
example involving falls and alcohol consumption, the
odds ratio was very large, at 10.0. However, if a risk
ratio were calculated from the same data (assuming
the study used a longitudinal design), the RR would
be much smaller, as shown in Table 8-3.
RR = [50/(50 + 10)]/[20/(20 + 40)] = 0.83/0.33 = 2.5

TABLE 8-3 Sample Risk Ratio

Number of Drinks Per Day    No Falls    Falls
< 1 drink                   A = 50      B = 10
> 1 drink                   C = 20      D = 40

To explain the differences in interpretation of odds ratios, hazard ratios, and risk ratios, it is important to remember the designs in which they are used. In cross-sectional research such as case-control designs, the odds ratio or hazard ratio describes the odds of having one condition if the person has another; it does not take time into account and is similar to a correlation coefficient. In cohort studies that use a longitudinal design, the risk ratio describes the risk of developing one condition if exposed to another, and it does take time into account.
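The same cell counts can be used to check the risk ratio calculation, as in the brief sketch below; it also illustrates the point made earlier that with a common outcome the risk ratio (2.5) is much smaller than the odds ratio (10), whereas the two would be similar for a rare outcome.

```python
# A minimal sketch: risk ratio from the same 2 x 2 table (Table 8-3),
# using the formula RR = [A/(A + B)] / [C/(C + D)].
a, b, c, d = 50, 10, 20, 40

risk_ratio = (a / (a + b)) / (c / (c + d))   # (50/60) / (20/60) = 2.5
odds_ratio = (a * d) / (b * c)               # 10.0, for comparison

# With a common outcome, the risk ratio (2.5) is much smaller than the
# odds ratio (10.0); with a rare outcome the two values would be similar.
print(f"RR = {risk_ratio:.2f}, OR = {odds_ratio:.2f}")
```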
Table 8-4 summarizes the differences among odds
ratios, hazard ratios, and risk ratios. Because one event
occurs before the other, risk ratio analysis provides
stronger support for causation; that is, for conclusions that the earlier event caused the later event. For
example, Strand, Lechuga, Zachariah, and Beaulieu
(2013) studied the risk of concussion for young female
soccer players and found RR = 2.09. As a risk ratio, this statistic suggests that playing soccer is not just associated with
a concussion, but plays a role in causing the concussion.
EXERCISE 8-1
Identifying Study Designs That Answer Clinical Descriptive and Predictive Questions (LO1)

QUESTIONS
Consider the following list of research questions from a hypothetical clinician working in a Veterans Administration hospital with individuals who have traumatic brain injury. For each question, identify the type of study in which the therapist should look for the answer.
1. What cognitive impairments are most closely associated with unemployment after traumatic brain injury?
2. How many individuals with traumatic brain injury have sleep disturbances?
3. Is depression a risk factor for homelessness in individuals with traumatic brain injury?

EXERCISE 8-2
Matching the Research Design With the Typical Statistical Analysis (LO2)

QUESTIONS
Match the four research designs with the corresponding typical statistical analysis.
Research designs:
1. Prevalence study
2. Predictive study with a continuous outcome
3. Case-control study
4. Prospective cohort study
Statistical analyses:
A. odds ratio
B. risk ratio
C. multiple linear regression
D. frequencies/percentages
EVALUATING DESCRIPTIVE
AND PREDICTIVE STUDIES
The quality of descriptive and predictive studies cannot be
analyzed using the levels-of-evidence hierarchy applied to
efficacy studies, because the lack of random assignment to
groups and absence of manipulation of the independent
variable create the opportunity for alternative explanations.
For example, in the cohort design study described earlier
TABLE 84 Differences Among Odds Ratios, Hazard Ratios, and Risk Ratios
Statistical Test
Research Design
Interpretation
Example
Odds ratio
Case-control study
Degree to which the
presence of one condition
is associated with another
condition
The degree to which having
diabetes is associated with
carpal tunnel syndrome
Hazard ratio
Retrospective
cohort study
Risk of developing a
condition in one group
over another group (at one
point in time)
Risk of readmission for individuals with stroke who received
high-intensity therapy versus
those who received low-intensity
therapy
Risk ratio
Prospective cohort
study
Risk of developing a
condition in one group
over another group
(cumulatively over time)
Risk of having a concussion
for individuals playing
soccer and versus those who
do not play soccer over a
season
concerning sleep and obesity, alternative explanations for the
results are possible. Perhaps it is not lack of sleep per se that
causes increased risk for obesity, but the fact that people who
sleep less may also exercise less or have more time to eat.
An important consideration in evaluating predictive
studies is controlling for alternative explanations. One
way to put controls in place is to gather data on other
predictors and include those alternatives in a regression
analysis or odds ratio. These additional variables can be
compared as predictors or used as control variables and
removed from the variance.
Matching is another strategy to improve the strength
of a nonrandomized design. When two or more groups
are compared, as in a descriptive group comparison study,
case-control, or cohort design, matching the two groups
on important variables at the outset will reduce the possibility that differences other than the grouping variable
are affecting the outcome. For example, when comparing runners with and without stress fractures, matching
individuals in the two groups on characteristics such as
running distance, intensity, frequency, and terrain would
be useful. The challenge in statistically controlling (e.g.,
through linear regression, odds ratio) or matching is identifying the relevant controlling variables ahead of time.
Researchers will never be able to account for every factor
and can miss important variables; however, knowledge of
the research literature and experience with the condition
are helpful in selecting the variables to include.
Generally speaking, prospective studies are stronger
than retrospective analyses. When the researcher identifies the question ahead of time, there is less opportunity
to capitalize on chance findings, and the researcher can
put more controls in place. For example, with a prospective study the researcher can match groups when selecting participants; however, with archival data from existing
records, the researcher must rely on the participants for
whom existing records are available.
Sample size is another important consideration in evaluating descriptive and predictive studies. It is particularly
important to include large samples in epidemiological
research, because the intention of description and prediction is to represent the naturally occurring phenomenon
of a population. Epidemiological research that is intended
to describe incidence and prevalence or to identify risk
factors of a population is more trustworthy if it uses large
samples of hundreds or thousands of participants. Findings from small samples are more difficult to replicate and
less likely to be representative of the population.
As with all research, sampling bias should be considered. The description of the sample should match the
known characteristics of the population. For example, a
sample of married older adults may present a bias and
misrepresent the general population of older adults.
The measurement methods used in descriptive and
predictive studies should also be evaluated. As mentioned previously, response rate is a critical consideration
in survey research. Studies with low response rates are
more likely to be biased and less representative of the
response that could be expected if a higher response rate
occurred. In addition, survey methods and self-reported
measures are frequently used in descriptive and predictive studies. When participants must rely on their memory or describe undesirable behaviors, problems with the
reliability of the data should be expected. Research that
includes more objective measurement methods produces
more reliable results. For example, accelerometers present
a more objective measure of physical activity than does
self-report of physical activity. Nevertheless, many constructs of interest to rehabilitation practitioners, such as
fatigue and anxiety, rely heavily on self-report. In these
instances, it is most useful when established measures with
good validity and reliability estimates are used.
LEVELS OF EVIDENCE
FOR PROGNOSTIC STUDIES
The levels-of-evidence hierarchy for efficacy studies
should not be applied to prognostic studies, which observe
existing phenomena. Variables are not manipulated;
therefore, a randomized controlled trial is not appropriate for answering prognostic questions. For example, a
researcher interested in predicting concussions among
soccer players could not practically or ethically assign individuals to receive or not receive a concussion. Instead, in
a predictive study, the relationship between certain potential predictors (e.g., history of prior concussion, position
on the team, gender, number of games played) and concussions can be studied (correlational design); people with
and without concussions can be compared (case-control
design) to identify potential predictors; or individuals can
be followed over time to identify differences (predictors)
between those who do and do not experience a concussion
(retrospective and prospective cohort design).
The criteria discussed in this chapter—replication,
prospective designs, and longitudinal approaches—are
applied to the levels-of-evidence hierarchy for prognostic
studies. Table 8-5 provides a hierarchy adapted from the
Oxford Centre for Evidence-Based Medicine (2009).
A systematic review that includes two or more prospective cohort studies provides the highest level of evidence
for studies that focus on prediction. A systematic review
combines the results from multiple studies, and replication of results strengthens the findings. The prospective
cohort study provides Level II evidence. This design
follows individuals over time and, with a prospective approach, is able to put desirable controls into place. The
retrospective cohort also uses a longitudinal approach,
which is preferable when attempting to identify risks or
make a prognosis. However, looking back at existing data
makes it more difficult to control potentially confounding
variables. The case-control design, at Level IV, represents
a lower level of evidence because the cross-sectional nature of the study does not allow for the identification of temporal relationships; that is, the researcher does not know if the condition or risk factor occurred before the outcome. Finally, expert opinion and case studies are considered to be the lowest level of evidence.

TABLE 8-5 Levels of Evidence for Prognostic Studies

Level of Evidence    Research Design
I                    Systematic reviews of prospective cohort studies
II                   Individual prospective cohort study
III                  Retrospective cohort study
IV                   Case-control design
V                    Expert opinion, case study

EXERCISE 8-3
Evaluating the Strength of the Evidence (LO3)

QUESTION
Read the following research abstract and evaluate the study based on the criteria described in this chapter. What are your general conclusions about the strength of the evidence?

Buchman, A. S., Boyle, P. A., Yu, L., Shah, R. C., Wilson, R. S., & Bennett, D. A. (2012). Total daily physical activity and the risk of AD and cognitive decline in older adults. Neurology, 78(17), 1323-1329. [Epub 2012, April 18]. doi:10.1212/WNL.0b013e3182535d35

Objective
Studies examining the link between objective measures of total daily physical activity and incident Alzheimer disease (AD) are lacking. We tested the hypothesis that an objective measure of total daily physical activity predicts incident AD and cognitive decline.

Methods
Total daily exercise and nonexercise physical activity was measured continuously for up to 10 days with actigraphy (Actical®; Philips Healthcare, Bend, OR) from 716 older individuals without dementia participating in the Rush Memory and Aging Project, a prospective, observational cohort study. All participants underwent structured annual clinical examination including a battery of 19 cognitive tests.

Results
During an average follow-up of about 4 years, 71 subjects developed clinical AD. In a Cox proportional hazards model adjusting for age, sex, and education, total daily physical activity was associated with incident AD (hazard ratio = 0.477; 95% confidence interval 0.273-0.832). The association remained after adjusting for self-report physical, social, and cognitive activities, as well as current level of motor function, depressive symptoms, chronic health conditions, and APOE allele status. In a linear mixed-effect model, the level of total daily physical activity was associated with the rate of global cognitive decline (estimate 0.033, SE 0.012, p = 0.007).

Conclusions
A higher level of total daily physical activity is associated with a reduced risk of AD.

CRITICAL THINKING QUESTIONS
1. What are differences and similarities between descriptive and predictive studies?
2. What is the primary distinction between incidence and prevalence?
3. Would the following factors increase or decrease prevalence rates for a particular condition? (a) an increase in the survival rates; (b) an increase in the cure rates; (c) an increase in mortality rates.
4. A comparison between two or more groups with and without a condition can answer both descriptive and predictive questions. What is the difference in how the two types of group comparison studies are analyzed?
5. What strategies may be useful for increasing the response rate of a survey?
6. How does a low response rate affect both the internal and external validity of a study?
7. What are the primary characteristics of strong descriptive and predictive studies?
8. What are the differences between correlational designs and group comparison designs for answering questions about prognosis?
9. Cohort and case-control studies may also be used to study the efficacy of an intervention. What types of groups would be compared in these nonexperimental designs?
10. What are the levels of evidence for predictive studies?

ANSWERS
EXERCISE 8-1
1. Correlational study that uses logistic regression, because
the outcome is categorical in nature: employed versus
unemployed. Odds ratios can then reveal what cognitive
impairments are most predictive of unemployment.
2. Prevalence, because the study concerns how many
individuals have a specific condition, rather than the
number of new cases, which would be incidence.
3. Case-control design, because the question is written
in a way that suggests a comparison of individuals with
TBI, some with depression and some without, and
then examination to determine if there is a difference
in homelessness rates among those two groups. If you
were to follow people over time (e.g., those who are
depressed and those who are not) and identify who becomes homeless, it would be a cohort study.
EXERCISE 8-2
1. D. frequencies/percentages
2. C. multiple linear regression
3. A. odds ratio
4. B. risk ratio
EXERCISE 8-3
This cohort study illustrates many of the desirable
qualities of a nonexperimental study: Many potential
confounds or predictors were taken into account when
assessing the impact of physical activity, such as involvement in activities, motor functioning, and depression.
The study was prospective, had a relatively large sample
of more than 700 participants, and utilized many objective measures.
FROM THE EVIDENCE 8-1
Although the reported prevalence is not dramatically different, use of the objective measure of perometry seems
preferable, if possible, because some individuals who
could benefit from treatment may be missed with self-report. If self-report is the only option, the data from this
study suggest that the more complete Lymphedema and
Breast Cancer Questionnaire provides estimates closer to
perometry.
FROM THE EVIDENCE 8-2
This study indicates that individuals with stroke take
considerably longer to shop than individuals without
stroke; however, little is known about why or what takes
them longer to shop. That is, since stroke is a complex condition with different impairments, the cause of
the difference is unknown. Cognitive concerns, motor
problems, or another factor could be interfering with
performance. Still, knowing that shopping is challenging is useful information for therapists. The therapist
would have to use other sources to determine the specific limitations that make it challenging for an individual with stroke to do shopping.
FROM THE EVIDENCE 8-3
Cognition was not related to speech recognition in this
study of elderly adults, but age was negatively correlated
with the speech outcomes. Not surprisingly, these results
indicate that the older you get, the more problems you
have with speech recognition. However, it is important to
note that the R2 or variance accounted for by age remains
relatively small, at 0.11-0.17, suggesting that factors/
predictors not included in this study account for most of
the variance.
FROM THE EVIDENCE 8-4
By far the strongest predictor in determining outcomes
for back pain is adherence to the adaptive physical activity program, with an odds ratio of 13.88. Depression and
self-reported health are also identified as significant predictors, with depression and poor health associated with
less recovery. However, the magnitude of these predictors,
at 0.47 and 0.20 respectively, is much less than adherence.
None of the other predictors are statistically significant,
as all have confidence intervals that include 1.0.
REFERENCES
Brennan, S. L., Toomey, L., Kotowicz, M. A., Henry, M. J., Griffiths, H.,
& Pasco, J. A. (2014). Rheumatoid arthritis and incident fracture in
women: A case-control study. BMC Musculoskeletal Disorders, 15, 13.
Buchman, A. S., Boyle, P. A., Yu, L., Shah, R. C., Wilson, R. S., & Bennett,
D. A. (2012). Total daily physical activity and the risk of AD and cognitive decline in older adults. Neurology, 78(17), 1323–1329. [Epub 2012,
April 18]. doi:10.1212/WNL.0b013e3182535d35
Bulley, C., Gaal, S., Coutts, F., Blyth, C., Jack, W., Chetty, U.,
Barber, M., & Tan, C. W. (2013). Comparison of breast cancer-related lymphedema (upper limb swelling) prevalence estimated
using objective and subjective criteria and relationship with quality
of life. Biomedical Research International, 2013, 807569. [Epub 2013
June 18]. doi:10.1155/2013/807569
Centers for Disease Control and Prevention (CDC). (n.d.). National
Health and Nutrition Examination surveys. Retrieved from http://
www.cdc.gov/nchs/nhanes.htm
Centers for Disease Control and Prevention (CDC). (2012). Prevalence
of stroke in the US 2006-2010. Morbidity and Mortality Weekly Report,
61(20), 379–382.
Centers for Medicare & Medicaid Services (CMS). (n.d.). Research,
statistics, data & systems [home page]. Retrieved from http://www
.cms.gov/Research-Statistics-Data-and-Systems/Research-StatisticsData-and-Systems.html
Chau, N., Chau, K., Mayet, A., Baumann, M., Legleve, S., & Falissard, B.
(2013). Self-reporting and measurement of body mass index in adolescents: Refusals and validity, and the possible role of socioeconomic and health-related factors. BMC Public Health, 13, 815.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Doll, R., & Hill, A. B. (1950). Smoking and carcinoma of the lung.
British Medical Journal, 2, 740–748.
Fincham, J. E. (2008). Response rates and responsiveness for surveys,
standards and the Journal. American Journal of Pharmacy Education,
72, 42–46.
Gu, J. K., Charles, L. E., Bang, K. M., Ma, C. C., Andrew, M. E., Biolanti,
J. M., & Burchfiel, C. M. (2014). Prevalence of obesity by occupation
among US workers: The National Health Interview Survey 2004-2011.
Journal of Occupational and Environmental Medicine, 56, 516–528.
Hicks, G. E., Benvenuti, F., Fiaschi, V., Lombardi, B., Segenni, L.,
Stuart, M., . . . Macchi, C. (2012). Adherence to a community-based
exercise program is a strong predictor of improved back pain status
in older adults: An observational study. Clinical Journal of Pain, 28,
195–203.
Huang, C. T., Lin, W. C., Ho, C. H., Tung, L. C., Chu, C. C., Chou,
W., & Wang, C. H. (2014). Incidence of severe dysphagia after brain
surgery in pediatric traumatic brain injury: A nationwide populationbased retrospective study. Journal of Head Trauma Rehabilitation,
28, 1–6.
Krull, V., Humes, L. E., & Kidd, G. R. (2013). Reconstructing wholes
from parts: Effects of modality, age and hearing loss on word recognition. Ear and Hearing, 34(2), e14–e23.
Lalmohamed, A., Vestergaard, P., Cooper, C., deBoer, A., Leufkens, H. G.,
van Staa, T. P., & de Vries, F. (2012). Timing of stroke in patients
undergoing total hip replacement and matched controls: A nationwide cohort study. Stroke, 43, 3225–3229.
Lingam, R., Hunt, L., Golding, J., Jongmans, M., & Emond, A. (2009).
Prevalence of developmental coordination disorder using the
DSM-IV at 7 years of age: A UK population-based study. Pediatrics,
123, 693–700.
McKinnon, J. (1991). Occupational performance of activities of daily
living among elderly Canadians in the community. Canadian Journal
of Occupational Therapy, 58, 60–66.
Martin, G. E., Losh, M., Estigarribia, B., Sideris, J., & Roberts, J. (2013).
Longitudinal profiles of expressive vocabulary, syntax and pragmatic
language in boys with fragile X syndrome or Down syndrome. International Journal of Language and Communication Disorders, 48, 432–443.
Meiklejohn, J., Connor, J., & Kypri, K. (2012). The effect of low survey response rates on estimates of alcohol consumption in a general
population survey. PLoS One, 7, e35527.
Oxford Centre for Evidence-Based Medicine. (2009). Levels of evidence. Retrieved from http://www.cebm.net/index.aspx?o=1025
Rand, D., Katz, N., & Weiss, P. L. (2007). Evaluation of virtual shopping in the VMall: Comparison of post-stroke participants to
healthy control groups. Disability Rehabilitation, 29, 1719–1720.
Schneider, E. B., Sur, S., Raymont, V., Duckworth, J., Kowalski, R. G.,
Efron, D. T., . . . Stevens, R. D. (2014). Functional recovery after
moderate/severe traumatic brain injury: A role for cognitive reserve.
Neurology, 82(18), 1636–1642. [Epub 2014 April 23]. doi:10.1212/
WNL.0000000000000379
Strand, S., Lechuga, D., Zachariah, T., & Beaulieu, K. (2013). Relative risk for concussion in young female soccer players. Applied
Neuropsychology—Child, 4(1), 58–64. [Epub 2013 December 2]. doi:
10.1080/21622965.2013.802650
Wynder, E. L., & Graham, E. A. (1950). Tobacco smoking as a possible
etiological factor in bronchogenic carcinoma: A study of six hundred
and eighty-four proved cases. JAMA, 143, 329–336.
Xiao, Q., Arem, H., Moore, S. C., Hollenbeck, A. R., & Matthews, C. E.
(2013). A large prospective investigation of sleep duration, weight
change, and obesity in the NIH-AARP Diet and Health Study cohort.
American Journal of Epidemiology, 178, 1600–1610.
“No question is so difficult to answer as that to which the answer is obvious.”
—George Bernard Shaw (1856-1950), an Irish playwright who explored social problems through his art, especially issues of social class.
9
Qualitative Designs
and Methods
Exploring the Lived Experience
CHAPTER OUTLINE
LEARNING OUTCOMES
KEY TERMS
INTRODUCTION
THE PHILOSOPHY AND PROCESS OF QUALITATIVE RESEARCH
  Philosophy
  Research Questions
  Selection of Participants and Settings
  Methods of Data Collection
  Data Analysis
QUALITATIVE RESEARCH DESIGNS
  Phenomenology
  Grounded Theory
  Ethnography
  Narrative
  Mixed-Methods Research
PROPERTIES OF STRONG QUALITATIVE STUDIES
  Credibility
  Transferability
  Dependability
  Confirmability
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES
LEARNING OUTCOMES
1. Describe the philosophy and primary characteristics of qualitative research.
2. Distinguish the types of qualitative designs and their purposes.
3. Determine the characteristics of trustworthiness in a given qualitative study.
KEY TERMS
artifacts
audit trail
axial coding
bracketing
code-recode procedure
confirmability
connecting data
constant comparative method
constructivism
credibility
dependability
embedding data
ethnography
field notes
focus group
grounded theory
inductive reasoning
informant
life history
member checking
merging data
mixed-methods research
narrative research
naturalistic inquiry
naturalistic observation
open coding
open-ended interview
participant observation
phenomenology
prolonged engagement
purposive sampling
qualitative research
reflexive journal
reflexivity
saturation
selective coding
snowball sampling
themes
thick description
transferability
triangulation
trustworthiness
INTRODUCTION
Although well-designed, randomized controlled trials are highly valued in evidence-based practice, the
research framework and scientific methods employed in
these studies are not the only way of thinking about or conducting research. Quantitative research comes from a positivist worldview that is based on deductive reasoning. From
this perspective, by reducing phenomena to their smallest
parts and objectively measuring those parts, we can better
understand the world. Nevertheless, there are limitations
to this type of research design. In a randomized controlled
trial, the average or mean of a group is used for the purposes
of comparison. However, practitioners do not provide
services to an average; rather, they work with individuals
whose stories and circumstances may be similar to or very
different from those of the individuals who participated in
the study. Consider the “average” American family: In the
2010 census, there were 2.58 members per household (U.S.
Census Bureau, 2012), a characterization that does not accurately describe any actual household and could be very
different from your own.
In contrast, qualitative research acknowledges that
there are multiple realities among individuals, and that the
human experience is complex and diverse. Qualitative
research uses a naturalistic tradition with an emphasis on
understanding phenomena in the real world. This chapter provides information that evidence-based practitioners
can use to better understand qualitative research and better understand their clients. The chapter describes the philosophy and processes of qualitative research and outlines
common qualitative research designs. In qualitative research,
the criteria for determining the strength of the evidence are
based on the concept of trustworthiness. After completing
this chapter, you will understand the concept of trustworthiness in this context and be able to evaluate qualitative
studies.
THE PHILOSOPHY AND PROCESS
OF QUALITATIVE RESEARCH
Like quantitative research, qualitative research studies
are framed around a research question(s), and the resulting reports, articles, or papers include the same major
sections of introduction, methods, results, and discussion. Narrative research articles can be an exception to
this rule, as they usually follow a format that resembles
a story. In qualitative research, the steps of writing and
answering the research question, designing the study, and
collecting and analyzing the data differ substantially from
those steps in quantitative research. Learning the basic
nature of and information about the process of qualitative
research is also useful for reading, interpreting, and evaluating qualitative research.
Philosophy
The philosophical underpinnings of qualitative research
come from naturalistic inquiry, which suggests that a
phenomenon is only understood in context and that multiple perspectives can and do exist and differ among individuals. Imagine your own experience as a student. If
someone interviewed one of your classmates about his or
her motivations regarding a career choice and reflections
on being a student, you would not expect the answers
to mirror your own. Similarly, one student might think
a particular test was difficult, whereas another student
found it easy. Although the conclusions about the test are
very different, they can both still be true.
Qualitative research uses inductive reasoning, in
which data are collected and, based on that data, an
understanding is reached. Health-care professionals
frequently use inductive reasoning. Therapists interview clients and administer assessments and, based on
this information, begin to understand their clients and
develop intervention plans. In another example, therapists
use inductive reasoning when they evaluate the progress
a client makes to determine if an intervention is working.
Qualitative researchers use data to drive the research
process. In contrast, quantitative research begins with a
hypothesis to be tested (a form of deductive reasoning).
Another important belief in qualitative research is
related to constructivism, which is the philosophy that
our understanding of the world is “constructed.” From
this perspective, an objective reality either does not
exist or cannot be known. Instead, reality is filtered by
our experiences and our past knowledge, so it is always
subjective and interpreted. The constructions that exist
to explain our world are inseparable from the people
who give those constructions meaning. Therefore, from
a qualitative perspective, both study participants and researchers are influenced by their worldview and cannot
be totally unbiased. Thus, the philosophy that forms
the foundation of qualitative research also influences its
processes and practices.
Research Questions
Qualitative research is often used to explore aspects of
practice that are not well understood and for which theory has not been developed. For this reason, qualitative
research questions are very different from quantitative
questions. Qualitative research is aimed at discovering
new information, so hypotheses are avoided; it is essential that there be no preconceived notions about the
phenomenon to be researched. Instead of specific, quantifiable questions, qualitative research questions are broad
and general. In addition, qualitative questions are open to
revision and often change during the course of a study,
exemplifying the discovery perspective that is central to
qualitative research.
Qualitative questions steer clear of terminology such
as cause or relate, because these words suggest an expected
outcome. Instead, they are more likely to use words such
as discover, inquire, describe, and explore. Also, consistent
with an exploratory perspective, qualitative questions
often begin with “what” or “how,” rather than “why.” For
example, in a study of driving perceptions of veterans with
traumatic brain injury (TBI) and posttraumatic stress disorder (PTSD), Hannold et al (2013) asked the following
qualitative questions:
1. How do Veterans describe their current driving habits,
behaviors, and experiences?
2. What do Veterans identify as influences on their driving
habits and behaviors?
3. How insightful are Veterans regarding their driving
behavior?
4. What, if any, driving strategies do Veterans report that
are related to Battlemind driving, mild TBI, or PTSD
issues (p. 1316)?
Selection of Participants and Settings
Each individual’s lived experience is unique and highly
influenced by the real-world environments in which he
or she lives. For this reason, qualitative research takes a
naturalistic approach to the selection of participants and
settings for study. The description of sample and setting
selection is included in the methods section of a qualitative research article.
Instead of random sampling or convenience sampling,
qualitative research uses purposive sampling, in which
the participants included in the study and the settings
in which qualitative research takes place are selected for
a purpose or a specific reason. The selection of participants will vary depending on the specific purpose of
the research. In some cases, the research question may
necessitate including participants who represent what is
typical. In other instances, the researcher may decide to
include individuals with extreme experiences, or may select participants who represent a variety of experiences. It
depends on the purpose of the study.
Generally speaking, qualitative studies are not designed
for generalizability and involve an in-depth and intense
data collection process; consequently, small numbers of
participants are the norm. The veterans’ driving study
described earlier included only five individuals, who were
selected based on specific criteria related to combat experience and TBI/PTSD diagnosis (Hannold et al, 2013).
Sometimes qualitative researchers use a method known
as snowball sampling, in which the initial participants
are asked to recruit additional participants from their own
social networks. This method is useful because carefully
selected initial participants may be able to reach and recruit individuals whom the researcher does not have a way
to identify and/or contact.
If naturalistic inquiry is the philosophy underlying qualitative research, naturalistic observation is one of its methods. Instead of manipulating variables or environments, as is
done in quantitative research, qualitative research takes place
in a naturalistic setting, or real-world environment, and events
are observed as they naturally occur. For example, if a study
is examining work, at least some of the research would be
expected to take place in the workplace. Spending extended
time in the natural environment allows the researcher to
gain access to the daily life of the participant and experience
reality as that individual experiences it. The description in
From the Evidence 9-1 comes from the methods section
of a study (Arntzen & Elstad, 2013) and describes a naturalistic
setting. In this study, which examined the bodily experience
of apraxia, participants were observed for extended periods
of time during typical therapy sessions and at home. Videotape was used to capture the data from these sessions.
FROM THE EVIDENCE 9-1
Example of a Naturalistic Setting
Arntzen, C., & Elstad, I. (2013). The bodily experience of apraxia in everyday activities: A phenomenological study. Disability
Rehabilitation, 35, 63–72.
Observation and video recording: Apractic difficulties, as experiences of bodily
movement, are silent, fluctuating, and frequently puzzling. The use of video
observations of rehabilitation sessions provided access to the phenomena and
context for the dialogue with the participants about their apraxia experiences. In
phenomenological studies of body movements, video observations do not aim to
establish an objective basis for validating subjective material, but a close and
detailed description of ongoing processes, subjected to a triangulation that
integrates interview and observational data [25,26]. The observational data were
particularly valuable for capturing the person’s immediate response to the
disturbed actions.
The ADL-activities observed in the rehabilitation units were ordinary morning
routines, breakfast preparation, making coffee or meals. We did not influence the
choice of activity, or when and where it should be carried out; it was as close to
ordinary practice as possible. The activities were performed in the patients’ rooms
and bathrooms, in the therapy kitchen, or other therapy rooms. Four participants
were followed up in their own homes and one in a nursing home. If possible, the
same activities were video filmed at the hospital and at home. Some participants
offered to demonstrate other activities, such as needlework and dinner
preparations. These were also videotaped. The video camera was handheld for
greater flexibility of position, adjustment of angle, and zooming to capture details in
the interaction. It was considered important to capture the therapist and the tools in
use as well as the patient. The video sequences varied from fifteen minutes to one
hour, in total twelve hours of edited video tape.
Note A: Observations took place
during typical practice and in
some cases at home.
FTE 9-1 Question 1 From the standpoint of a qualitative research philosophy, is it a problem that different participants were observed in different settings? Explain your answer.
Methods of Data Collection
Qualitative data collection uses methods that capture
the unique experience of the participant. These methods are described in the methods section of qualitative
research articles. Rather than depending on questionnaires and surveys, a fundamental method of data
collection in qualitative research is the open-ended
interview, in which there are no set questions, so the
process directs the questioning. Interviews are typically
done face-to-face, so the interviewer is free to probe,
ask follow-up questions, and take the lead of the interviewee to pursue new and possibly unexpected areas of
inquiry.
Another method, focus groups, allows multiple individuals to be interviewed at once. In focus groups,
an interview is conducted with a group of individuals
to target a particular topic. Although it may be more
difficult to get in-depth information from each individual, the dynamic nature of a focus group provides
participants with the opportunity to bounce ideas and
thoughts off one another. The group members may
confirm an individual’s experience or provide a different point of view.
The collection of artifacts is another methodology
used in qualitative research. Artifacts are objects that
provide information about the subject of interest. For
example, in a study exploring factors that facilitate return to work for individuals with spinal cord injury,
Wilbanks and Ivankova (2014/2015) asked to see assistive technology that was used in the workplace. Actual
photographs of these artifacts are included in the study
report.
As noted earlier, naturalistic observation is a common
component of qualitative data collection. This type of
observation can take different forms. Qualitative studies should specifically describe the observational process
so that the reader clearly understands how data collection occurred. In some studies the researcher is removed
from the experience and observes from afar, typically
recording observations in the form of field notes, the
most unobtrusive way of recording and observation. In
this instance, the focus is on watching and listening.
Field notes often describe both what is seen and the researcher’s impressions. For example, a researcher who is
interested in the ways in which peers provide support to
one another in a peer-operated drop-in center for individuals with serious mental illness might spend time at
the drop-in center recording observations; however, the
researcher does not personally engage in the activities at
the drop-in center.
Photographs, videotape, and audiotape may also be
used to capture the experience, as these methods have
the potential to provide greater detail and depth of data,
and they may capture information that the researcher
missed. When a study uses multiple methods of data collection, it is more likely to capture the complexity of an
experience.
FTE 9-1 Question 2 Why is video a particularly useful method of data collection in qualitative research?
A more immersive form of data collection is participant observation. With participant observation, the researcher engages with the participants in their naturally
occurring activities in order to gain a more in-depth appreciation of the situation. For example, Mynard, Howie,
and Collister (2009) described the process of participant
observation in a study examining the benefits of participating in an Australian Rules Football team for individuals with disadvantages such as mental illness, addiction,
unemployment, and homelessness:
The first author joined the team for an entire season, attending 11 training sessions, 10 of 11 games and the Presentation
Dinner. As a woman, she did not train or play, but assisted as
required with babysitting, preparing the barbeque and first aid.
During games her role was to fill water bottles and run drinks
to players on the field. This enabled her to experience games
from both on-field and off-field perspectives (pp. 268–269).
Qualitative researchers may also collect data by involving participants in creative means of expression, such as
photography, poetry, and music. For example, Tomar and
Stoffel (2014) used photovoice to explore the lived experience of returning veterans who had gone back to college.
Photovoice is a specific methodology in which participants
take photographs and write accompanying narratives to answer particular questions. Figure 9-1 displays an example
of a photovoice created by an individual with mental illness.

FIGURE 9-1 Example of a photovoice. This photovoice describes the lived experience of an individual in recovery from mental illness: "When I became homeless, I went to C.A.S.S. I told them I have issues with behavioral and medical health. They suggested I go to a different shelter. I was sad, mad, and confused. So I called four different shelters, and they all turned me away. I believe that this is the norm here in Phoenix. There needs to be more outreach to our community so that this doesn't happen to 'one more person.'"
Data Analysis
Large amounts of data are generated from all of the methods described in this chapter. A lengthy analysis process
involves identifying patterns within the data that can be
categorized for easy retrieval. These key passages in text,
photographs, or artifacts are coded; in other words, excerpts of the data are given a name. For example, in a study
of physical activity, the category of “barriers” may be used
to identify data related to obstacles to exercise. The
patterns within the codes are then further analyzed to identify the underlying meaning among the categories. These
patterns are labeled or described in terms of themes. The
themes are what the reader of qualitative research sees.
The results section of a qualitative study reports the
themes, describes what the themes mean, and illustrates
the themes through actual quotations from the study
participants. For example, the exercise barriers may be
further analyzed with a theme such as “exercise is difficult when life feels out of control.” The results section
would go on to explain the theme and include quotes that
support or illustrate the theme. From the Evidence 9-2
provides an example of one theme identified in the results
section of a qualitative study.
The process of analyzing qualitative data is often
accomplished through specialized software. Different
computer programs are available that code and identify
patterns among the data captured in transcripts of interviews and field notes. The software also sets up the data
in a format that is easy to use, by numbering each line and
providing a system for annotating the data.
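To make the mechanics of coding concrete, the minimal sketch below (written in Python and not modeled on any particular commercial package; the transcript lines and code names are invented) shows the two basic operations such software supports: tagging numbered excerpts with code names and retrieving every excerpt that carries a given code.

```python
# Minimal illustration of qualitative coding: tag transcript lines with codes,
# then retrieve all excerpts that share a code. All data here are hypothetical.

transcript = [
    "I want to exercise, but the gym is too far away.",
    "When everything feels out of control, I just stop going.",
    "Walking with my neighbor makes it easier to keep going.",
]

codes = {}  # code name -> list of (line number, excerpt)

def apply_code(code, line_number):
    """Attach a code to a numbered transcript line for later retrieval."""
    codes.setdefault(code, []).append((line_number, transcript[line_number - 1]))

apply_code("barriers", 1)
apply_code("barriers", 2)
apply_code("supports", 3)

# Pull back everything coded as a barrier to exercise.
for line_number, excerpt in codes["barriers"]:
    print(f"line {line_number}: {excerpt}")
```

Dedicated programs typically layer conveniences on top of this basic pattern, such as annotation, searching, and displays of how codes co-occur, but the underlying idea of naming and retrieving excerpts is the same.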
EXERCISE 9-1
Distinguishing Characteristics of
Qualitative and Quantitative Data (LO1)
QUESTIONS
Using the letter choices below, identify each of the following characteristics as most likely:
A. Qualitative
B. Quantitative
C. Both qualitative and quantitative
1. A positivistic philosophy
2. In-depth open-ended interviews
3. Articles organized around introduction, methods, results,
and discussion
4. Hypothesis-driven
5. May use computers for data analysis
6. Sampling by identifying individuals who will serve a
particular purpose
7. Results reported as themes
8. Often includes manipulation of the environment
9. Utilizes inductive reasoning
QUALITATIVE RESEARCH DESIGNS
Qualitative research is a broad term that encompasses
many different research designs. This section describes
the qualitative designs that are most commonly employed in health-care research, including phenomenology, grounded theory, ethnography, and narrative.
Table 9-1 summarizes the characteristics of the different qualitative designs. In addition, the mixed-methods
approach, which combines qualitative and quantitative
methods, is described.
Phenomenology
The purpose of phenomenology is to understand and
describe the lived experience from the point of view of the
research participant. As an approach, phenomenology is
particularly useful for situations that are poorly defined or
FROM THE EVIDENCE 9-2
Example of Themes, Description, and Quotations
Leroy, K., Boyd, K., De Asis, K., Lee, R. W., Martin, R., Teachman, G., & Gibson, B. E. (2015). Balancing hope and realism in family-centered care: Physical therapists' dilemmas in negotiating walking goals with parents of children with cerebral palsy. Physical and Occupational Therapy in Pediatrics, 35(3), 253–264.
In this study, which examined the negotiation of goals between physical therapists and parents of children with cerebral
palsy, the following theme and supporting information were provided:
Note A: The theme.
Balancing Hope and Realism
A consistent theme evident across participants’ accounts was the notion that physical therapists actively worked to balance their
beliefs about the value of walking and their expert knowledge about children's walking prognoses with families’ goals and hopes.
Finding a balance meant considering families’ goals and weighing these against their own beliefs and professional knowledge in
order to negotiate a treatment plan that participants felt was in the best interests of the children.
Participants underscored their beliefs regarding how walking facilitated accessibility in the environment and enabled social
participation in daily activities. Generally, all participants were consistent in believing that walking held, as Olivia stated, a “very
high value” in society. They agreed that walking has a number of benefits including making it easier to function in society and
various physiological benefits.
“ … saying that everybody has to be able to walk and we have to find some way for everyone to be able to walk, I don't agree
with that. I don't think that everyone has to be able to. We know that there are benefits though, to walking in terms of weight
bearing and … another way to explore your environment.” –Emily
Participants defined walking primarily by its functionality and they were less concerned about the appearance of gait or whether
gait aids or assistance were required. They expressed that the benefits to walking could be achieved through different
techniques including walking with a gait aid or push toy. When setting goals with the family, these beliefs served as a backdrop
to their clinical knowledge and experience, as they considered whether walking was realistically achievable for each child.
Physical therapists used available predictors of prognosis, such as the Gross Motor Function Classification System (GMFCS)
(Palisano et al., 1997), and their clinical experience to predict outcomes. Participants were confident that the GMFCS was a
good predictor of walking ability, but not perfect as they had witnessed some children surpassing their initial prognosis. Hence,
they were hesitant to definitively predict a child's walking outcomes or convey these to parents.
“The GMFCS classification system is great, you have a pretty good idea of whether the child is going to walk … but I'd never
say never, cause there are kids that you really thought would never walk and they do walk.” –Riley
Note B: Descriptions of the theme.
Note C: Quotes to illustrate themes.
FTE 9-2 Question The quotes used in qualitative research could be equated with what type of data often reported in quantitative research?
TABLE 91 Qualitative Designs*
Design
Purpose
Methods
Results
Phenomenology
Describe the lived
experience
Interviews, focus groups, observation, Description of the
bracketing
phenomenon
Grounded theory
Develop a theory that is
derived from the data
Interviews, focus groups, observation, Theory to explain
constant comparative methods
the data
Ethnography
Describe a group of
people or a culture
Immersion in the field, participation
in the culture, examination of
artifacts
Description of a
culture and/or theory
about that culture
Narrative
Tell a story
In-depth interview, collection of
artifacts
Construction of a
coherent story
* Note that these methods are not mutually exclusive; there is often overlap of the designs.
potentially misunderstood, such as the process of coming
to terms with a particular diagnosis or the client’s perspective of the therapy experience. The emphasis is on
description rather than explanation. In phenomenology,
the insider’s perspective is of primary importance. Therefore, researchers must not impose their own beliefs on
the situation.
Because it is impossible to remain totally unbiased,
in phenomenological research, the researcher’s assumptions are identified and “bracketed.” When bracketing,
the researcher uses methods such as keeping a diary or
concept mapping to identify preconceived ideas about
a phenomenon. Then the researcher uses strategies to
keep these biases in abeyance while collecting and interpreting qualitative data. For example, a researcher
who is also a therapist may have personal experiences
with providing the specific therapy experience that is
being studied. The therapist would identify pre-existing
assumptions about the therapy, such as “This therapy
is well tolerated,” or “Most clients respond well to this
therapy.” The therapist would acknowledge the positive
bias and remain open to alternative views from the research participants.
In phenomenology, typically one person or a small,
selected group of participants is included in the study.
The primary methods of data collection include in-depth
interviews, discussion, and observation. The transcripts
and observations are then analyzed for themes. The reporting of the results includes the identified themes
supported by individual statements or observations. For
example, Gramstad, Storli, and Hamran (2014) used a
phenomenological design to examine the experience of
older adults during the process of receiving an assistive
technology device. From the Evidence 9-3 provides an
illustration of one theme identified in the study: “Taking
charge or putting up.”
Grounded Theory
The purpose of the grounded theory qualitative research
design is to develop new theory from the data collected.
This type of research was developed partially as a reaction to the positivist perspective of developing theory first
and then collecting data to support or refute that theory.
Grounded theory takes the opposite approach, beginning
without a hypothesis or assumption. Instead, data are collected concerning a general question or topic. The theory
comes out of the data or, in other words, is “grounded”
in the data.
Grounded theory research uses the constant comparative method. Instead of waiting until all of the data
have been collected to do an analysis, some data are collected, and an analysis is performed to determine how
more data should be collected. For example, should additional people be interviewed? What questions should
be asked next? Multiple iterations of data collection and
analysis are conducted until saturation occurs. Saturation means that no new ideas or information are emerging from the data.
Data analysis in grounded theory research is a multistep process that starts with open coding, or identifying
simple categories within the data. Next, the categories are
brought together in a process called axial coding, which
identifies relationships between categories. Finally, selective coding involves the articulation of a theory based on
the categories and their relationships.
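As a rough illustration of how the three stages build on one another, the hypothetical sketch below represents each stage as a simple data structure; the excerpts, category names, and relationships are invented and only loosely echo the family occupations example discussed below.

```python
# Hypothetical sketch of grounded theory coding stages as plain data structures.
# Categories, excerpts, and relationships are invented for illustration only.

# Open coding: excerpts are labeled with simple categories.
open_codes = [
    ("We always talk about what the outing will mean for us.", "shared meaning"),
    ("We plan ahead for anything that might be overwhelming.", "preparation"),
    ("Loud, crowded places are really hard for him.", "sensory experience"),
    ("Eating out together is something we try to protect.", "family occupations"),
]

# Axial coding: relationships between the categories are identified.
axial_relationships = [
    ("shared meaning", "shapes", "family occupations"),
    ("preparation", "shapes", "family occupations"),
    ("sensory experience", "shapes", "family occupations"),
]

# Selective coding: a core category and a tentative theory are articulated.
core_category = "family occupations"
tentative_theory = ("Family occupations are shaped by shared meaning, "
                    "preparation, and the child's sensory experiences.")

for source, relation, target in axial_relationships:
    print(f"{source} {relation} {target}")
print("Core category:", core_category)
print("Tentative theory:", tentative_theory)
```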
FROM THE EVIDENCE 9-3
Example of a Theme
Gramstad, A., Storli, S. L., & Hamran, T. (2014). Older individuals’ experience during assistive device service delivery process.
Scandinavian Journal of Occupational Therapy, 21, 305–312.
Knowing who to call or where to go did not guarantee that the participant contacted
someone to correct difficulties. Putting up with a difficult situation included
hesitating to contact the occupational therapist when the ATD did not work the way
the participant expected it would. For some participants, contacting the
occupational therapist because they needed additional help was considered to
mean that they would be perceived as rude, ungrateful, and subject to negative
consequences. These concerns could make the participants delay or entirely omit
contacting the professional to correct the situation. One of the participants
explained her reluctance to contact the occupational therapist by saying, “I am not
one of those to complain. I never was. Complaining is just not my style.” [H]. To
explain why she delayed contacting the occupational therapist with her ATD that
did not work to her satisfaction another woman said, “I am not like that…I am not
pushy and aggressive. I can’t…nag to get what I want. That is the worst thing I
know.” [E]
Note A: Quotes are used to
illustrate the theme of taking
charge or putting up.
FTE 9-3 Question What important information from the perspective of individuals receiving assistive devices does
this study provide to a therapist?
In a study of the sensory experiences of children with and without autism and their effect on family occupations, Bagby, Dickie, and Baranek (2012) interviewed parents of 12 children, 6 with autism and 6 without autism. In the study, researchers identified the following simple categories through open coding: (1) shared meaning within the family, (2) preparation for what the family was going to do, (3) the child's sensory experiences, and (4) family occupations. Axial coding indicated that family occupations were affected by the shared meaning, preparation for the experience, and the child's sensory experiences. This led to a theory that explained how family occupations are impacted differently by children with and without autism.
Another grounded theory study examined the experience of living with a rotator cuff injury (Minns Lowe, Moser, & Barker, 2014). The figure in From the Evidence 9-4 illustrates the theory derived from the
research and demonstrates that the pain associated with this
injury has a wide-ranging impact on all aspects of daily life.
Ethnography
Perhaps the oldest of the qualitative research designs, ethnography has its roots in anthropology. The purpose of
ethnography is to describe a group of people, their behaviors, and/or their culture. In ethnography, the insider’s
viewpoint is sought in the context of understanding the
larger culture or social structure, with the “insider” being
the individuals who are being studied.
Ethnographic researchers become immersed within
the culture being studied. The ethnographer observes
and, once invited, participates in the routines, rituals,
and activities of the group. This process is referred to as
FROM THE EVIDENCE 9-4
Example of Grounded Theory
Minns Lowe, C. J., Moser, J., & Barker, K. (2014). Living with a symptomatic rotator cuff tear “bad days, bad nights”: A qualitative
study. BMC Musculoskeletal Disorders, 15, 228. http://doi.org/10.1186/1471-2474-15-228.
[Figure: Diagrammatic summary of living with a rotator cuff tear, drawn in the original as concentric ripples. The intense, shocking, surprising pain sits at the center; the surrounding rings carry labels for limited movement, reduced strength, audible sounds; broken sleep, waking, night pain, and daytime tiredness and irritability; impact on ADL, leisure, occupation, emotional impact, finance, and social support; and coping strategies, getting on with it, acceptance, the other shoulder, analgesia, and aids and adaptations. Original caption: "Figure 1. Diagrammatic summary of living with a rotator cuff tear. This diagram shows how, like ripples spreading out from a stone thrown into a pool, pain from a symptomatic rotator cuff tear can impact on, and change, all areas of a participant's life."]
FTE 9-4 Question Using the examples from From the Evidence 9-3 and From the Evidence 9-4, explain how grounded
theory differs from phenomenology in its purpose.
participant observation. Data are collected with field notes
and recorded via audiotape and/or videotape. Participants are generally referred to as informants because
they provide the researcher with an insider perspective.
Ethnography typically involves spending extended time
in the field until the researcher has collected enough
data to adequately understand the culture or situation.
Through participation and an extended presence, the
researcher can learn the language, behaviors, activities,
and beliefs of the people being studied.
Unlike phenomenology, which describes, ethnography
explains. After collecting data, the ethnographer spends
time outside of the field to reflect on the experience and
analyze the data. Explanations and in some cases theory
arise from the data. Although theory developed from ethnography may be specific to the culture studied, some
ethnographic theory is developed to explain the more
general human experience. The researcher may return to
the field to verify the interpretations or theory by asking
informants to review and provide feedback on the analyses.
Using ethnographic methods, Hackman (2011) explored
the meaning of rehabilitation for people with life-threatening
central nervous system tumors, specifically examining their
perspective on the purpose of rehabilitation. As a physical
therapist, Hackman was already embedded within the culture, but asked participants to write narratives about their
experience with rehabilitation and obtained additional
data through interviews and observations. From the
Evidence 9-5 presents a diagram from the study, which
used the analogy of an umbrella to represent the themes.
The primary theme involved equating rehabilitation to a
return to normalcy, with the physical therapist providing
supports to different dimensions of that return.
Narrative

Narrative research can be characterized as storytelling. People are natural storytellers, and this method takes advantage of that propensity. Storytelling involves remembrances, retrospectives, and constructions that may focus on the recounting of an event or series of events, often in chronological order. Many narratives feature epiphanies, or turning points, in an individual's life. Narrative research often appears more literary than other forms of qualitative research, and it is not unusual for a narrative research article to contain a story line or plot.

Another feature of narrative research is the complexity and depth of the data collection. Narratives typically provide many details about the individual and present the information like a short story. Although narrative research can be biographical or autobiographical, with a focus on the story of a single individual, some narrative research collects stories from several individuals who have had a common experience.
FROM THE EVIDENCE 9-5
Themes Represented With an Analogy
Hackman, D. (2011). “What’s the point?” Exploring rehabilitation for people with 10 CNS tumours using ethnography: Patient’s
perspectives. Physiotherapy Research International, 16, 201–217.
Note A: The overarching theme.
Note B: Subthemes involved in "getting back to normal" and the central role of the physical therapist.

[Figure: The themes are represented as an umbrella. The overarching theme is quality of life; rehabilitation is equated with "getting back to normal," with physical therapy at the center. Subtheme labels in the figure include hope, confidence, independence, self-efficacy, trust in the professional, touch, chat, role, equipment, structure, holism, emotion and function, emotional support, and environment.]
FTE 9-5 Question Why might analogies be a useful technique for presenting themes in qualitative research?
Similar to other forms of qualitative research, unstructured or semi-structured interviews are a major source of
data in narrative research. The collection of artifacts, in
the form of journals and letters, is also common. Some
extensive narratives take the form of a life history, in
which an individual’s life over an extended period of time
is examined. With this method, the researcher works to
uncover significant events and explore their meaning to
the individual. From the Evidence 9-6 provides an example of a narrative that uses a life history approach to tell
the story of an individual with a disability who becomes a
Paralympian (Kavanagh, 2012).
FROM THE EVIDENCE 9-6
Example of a Narrative Study
Kavanagh, E. (2012). Affirmation through disability: One athlete’s personal journey to the London Paralympic Games. Perspectives in
Public Health, 132, 68–74.
Note A: Identification of
life history approach.
AIMS:
This article explores the personal narrative of a British Paralympic wheelchair
tennis player who experienced a spinal cord injury (SCI) following a motorcycle
accident in 2001 that left her paralyzed from the waist down. The study responds
to the call by Swain and French, among others, for alternative accounts of
disability that demonstrate how life following impairment need not be empty and
meaningless, but can actually reflect a positive, if different, social identity.
METHODS:
This study draws on life history data to investigate the journey of one athlete who
has managed to achieve international sporting success following a life-changing
accident. A pseudonym has not been used for this study as the athlete wanted to
be named in the research account and for her story to be shared.
RESULTS:
A chronological approach was adopted to map the pre- and post-accident recovery
process. The account examines life before the trauma, the impact of the accident,
the process of rehabilitation, and the journey to athletic accomplishment.
CONCLUSIONS:
Negative views of disability can be challenged if disability is viewed in the context
of positive life narratives. The story of one Paralympian demonstrates how an
“ordinary” person has made the most of an extraordinary situation and become a
world-class athlete. This paper demonstrates that in contrast to typical discourse in
disability studies, becoming disabled or living with a disability need not be a
tragedy but may on the contrary enhance life and lead to positive affirmation.
Note B: Described
chronologically, highlighting
major life events.
FTE 9-6 Question Describe the potential epiphany presented in the research abstract.
EVIDENCE IN THE REAL WORLD
Storytelling Can Be Useful for Clients
In his important book, The Wounded Storyteller: Body, Illness and Ethics, Frank (2013) discusses the importance of
storytelling as a way for the individual to understand his or her own suffering. So, not only is storytelling important to the researcher or clinician, but it is also useful for the individual who tells the story. Frank describes three
narrative genres: In the restitution narrative, the individual is ill but will be cured and return to a pre-illness state.
In the chaos narrative, there is no linear story, but a sense that illness leads to lack of control and a life of suffering.
In the quest narrative, the individual learns from the experience and develops a sense of purpose.
When individuals experience chronic illness or disability, the condition will not be resolved, so a restitution
narrative is not possible. The quest narrative allows for a story that gives meaning as opposed to despair. Frank
describes three subtypes of quest narratives. In the quest memoir, the story is characterized by acceptance, and
the illness or condition is incorporated into the individual’s daily life. In the quest manifesto, the individual uses
the insights gained from the experience to make things better for others through social reform. The quest automythology is a story of rebirth and reinvention of the self.
As therapists, we can facilitate storytelling by asking open-ended questions and being good listeners. In addition, we can help individuals find meaning from their experiences and identify their own personal quests.
Mixed-Methods Research
In mixed-methods research, both quantitative and
qualitative methods are used within a single study or a
series of studies to increase the breadth and depth of understanding of a research problem. Given that the two
methods arise from seemingly opposing philosophies,
you may question how positivism and constructivism can
be combined. Some mixed-methods researchers adopt
a pragmatic philosophy that takes a “whatever works”
stance and values both subjective and objective information (Morgan, 2007).
With mixed-methods research, there is not simply a
collection of qualitative and quantitative data, but a mixing of the data. Creswell and Plano Clark (2011) suggest
that there are three ways in which data can be combined:
1. The first approach, merging data, involves reporting
quantitative and qualitative data together. For example, the themes and quotes are supported by quantitative statistics.
2. With connecting data, one set of data is used to inform a second set of data, often chronologically. For
example, qualitative data may be used to develop items
for a quantitative measure, which is then examined for
reliability and validity.
3. The third approach, embedding data, involves one
dataset as the primary source of information and a second dataset that serves as a secondary supplement. For
example, a quantitative efficacy study may be supplemented with qualitative data on the experience of the
participants.
EXERCISE 9-2
Matching Purpose Statements
With Qualitative Designs (LO2)
QUESTIONS
The following purpose statements derive from actual qualitative studies. Identify the design suggested by each of the
following statements.
1. This study used the telling of life stories to examine
how engagement in creative occupations informed six
older retired people’s occupational identities (Howie,
Coulter, & Feldman, 2004).
2. This study aimed to describe and explain the responses of young people to their first episode of psychosis
(Henderson & Cock, 2014).
3. This study answered the question, “What kinds of
rituals, contextual circumstances, and personal health
beliefs are operating in the use of music as self-care?”
(Ruud, 2013).
4. The aim of this study was to explore parental learning experiences to gain a better understanding of the process parents use in learning to feed their preterm infant (Stevens, Gazza, & Pickler, 2014).

From the Evidence 9-7 provides an example of a mixed-methods study that examines a telerehabilitation dysphagia assessment.
FROM THE EVIDENCE 9-7
Example of a Mixed-Methods Study
Ward, E. C., Burns, C. L., Theodoros, D. G., & Russell, T. G. (2013). Evaluation of a clinical service model for dysphagia assessment via
telerehabilitation. International Journal of Telemedicine and Applications, 918526. http://doi.org/10.1155/2013/918526.
Emerging research supports the feasibility and viability of conducting clinical swallow examinations
(CSE) for patients with dysphagia via telerehabilitation. However, minimal data have been reported to
date regarding the implementation of such services within the clinical setting or the user perceptions of
this type of clinical service. A mixed methods study design was employed to examine the outcomes of a
weekly dysphagia assessment clinic conducted via telerehabilitation and examine issues relating to
service delivery and user perceptions. Data were collected across a total of 100 patient assessments.
Information relating to primary patient outcomes, session statistics, patient perceptions, and clinician
perceptions was examined. Results revealed that session durations averaged 45 minutes, there was
minimal technical difficulty experienced, and clinical decisions made regarding primary patient outcomes
were comparable between the online and face to face clinicians. Patient satisfaction was high and
clinicians felt that they developed good rapport, found the system easy to use, and were satisfied with the
service in over 90% of the assessments conducted. Key factors relating to screening patient suitability,
having good general organization, and skilled staff were identified as facilitators for the service. This trial
has highlighted important issues for consideration when planning or implementing a telerehabilitation
service for dysphagia management.
Note A: Quantitative
data describe the
results from assessments and surveys.
Note B: Qualitative data from
clinicians are used to identify
factors that contributed to
patient satisfaction and
positive outcomes.
FTE 9-7 Question Which approach to mixing the data is used in this study: merging, connecting, or embedding
data? Explain why.
PROPERTIES OF STRONG
QUALITATIVE STUDIES
Because qualitative and quantitative research are based
in different philosophies and paradigms, different criteria are used to evaluate their assets. Whereas quantitative
research is judged by its internal and external validity (see
Chapter 4), qualitative research is more concerned with
trustworthiness, that is, the accurate representation of a
phenomenon. Because qualitative research is a reflection
of the unique experiences and meaning attributed to those
experiences of the participants, it is critical for qualitative
research to provide an accurate representation.
This section describes the major attributes associated
with trustworthiness. As identified in the classic book by
Lincoln and Guba (1985), there are four characteristics
of qualitative research that reflect its trustworthiness:
credibility, transferability, dependability, and confirmability. Evidence-based practitioners should consider
these four characteristics when reading a qualitative
study and determining whether to trust in the findings.
Table 9-2 summarizes information about these criteria for evaluating qualitative studies. Although each of the methods listed in the table is primarily associated with a single criterion, many methods support more than one aspect of trustworthiness.
TABLE 92 Criteria Used to Establish Trustworthiness of Qualitative Research
Quantitative
Counterpart
Criteria
Description
Credibility
Accurate representation of
the phenomenon from the
perspective of the participants
Internal validity
Prolonged engagement, extensive
open-ended interviews, triangulation,
member checking
Transferability
Application of information
from a study to other situations
External validity
Thick descriptions
Dependability
Consistency in the data
across time, participants, and
researchers
Reliability
Code-recode, independent coding
by multiple researchers, collection
of data over several time points
Confirmability
Corroboration of the data
Objectivity
Reflexivity
Credibility
A qualitative study is considered to have credibility when
it is authentic; that is, when it accurately reflects the reality of the research participants. One measure of credibility
lies in the sample selection. In purposive sampling, the
participants are selected for a purpose. Thus, one evaluation of credibility involves questioning whether the
participants who were selected were consistent with the
study’s purpose.
In addition, credibility requires the researcher to use
methods to ensure that participants respond honestly
and openly. One such method, referred to as prolonged
engagement, involves the researcher spending enough
time getting to know individuals that a sense of trust and
familiarity is established.
Recall that qualitative research relies on open-ended
interviews that allow participants to direct the questioning. Credibility is enhanced when the interviews are extensive and conducted over multiple time periods.
Triangulation is another strategy that can enhance
the credibility of a study. With triangulation, multiple resources and methods are employed to verify and corroborate data; that is, use of several methods leads to the same
results in each case. The inclusion of multiple participants
and/or multiple observers is one way to accomplish triangulation. Triangulation can also be achieved by collecting
data using different methods, such as interviews, focus
groups, and participant observation.
One of the most useful methods for producing credibility is member checking, in which participants are
regularly and repeatedly queried to ensure that the researcher’s impressions are accurate. For example, during
initial interviews, the researcher asks follow-up questions
and/or repeats statements back to the participant. During
data analysis, the researcher shares the themes with the
participant to ensure that the interpretations of the data
are accurate. Qualitative research is a recursive process
that involves back-and-forth dialogue between the researcher’s interpretation of the data and the participant’s
input to promote authenticity. Through member checking, research participants can expand upon the data and/
or correct errors.
Transferability
With qualitative research there is less emphasis on generalizability, particularly in the statistical manner of quantitative research. Its emphasis on the uniqueness of each
participant and context has led some to take the strict position that
the information in qualitative research is most important
in its own right (Myers, 2000). Others take a less strict
position and suggest that, when research situations are
similar, information acquired from qualitative research
may illuminate related situations. In the case of qualitative research, the term transferability is more often used.
Transferability is the extent to which the information
from a qualitative study may be extended, or applied, to
other situations. Regarding transferability, the burden lies
primarily on the practitioner who is interested in applying
the results. The practitioner must examine the research to
determine how similar the research conditions are to the
practitioner’s situation. The researcher may facilitate this
process by providing a thick description, meaning that
enough details are provided about the people, situations,
and settings that readers can determine the transferability
to their own situations.
For example, Bertilsson, Von Koch, Tham, and Johansson (2015) studied the experience of the spouses of stroke
survivors. There is a fair amount of detail about the individuals. The study took place in Sweden over a one-year
time period, both during and after client-centered activities
of daily living (ADLs) rehabilitation was provided. The researchers studied six women and one man; three lived in
a city, and four lived in small villages. However, there is
little detail about the time frame during which the individuals received rehabilitation services. Therapists interested
in applying the results could expect the study findings to
be more transferable if the spouses they were interested in
supporting were also primarily female and if the intervention applied client-centered principles. It would be more
difficult to know how the Swedish experience transfers to
the country in which the therapists are providing services.
Dependability
Dependability is the extent to which qualitative data are
consistent. In qualitative research, it is recognized that perspectives change over time and different individuals have
different perspectives. Although these changes and differences are described and acknowledged, efforts can be made
to show that findings are supported and steady across individuals and time. One way in which consistency can be
examined is in terms of time. When a researcher collects
data over multiple time points, patterns across time can be
identified. In terms of dependability, multiple time points
are preferable to data collected at only one time point.
Another way to examine consistency is across multiple
coders. When coding transcripts, two or more researchers
can code independently and then compare their results.
For example, when reading a transcript, one rater may
code a set of comments from a research participant as
“resignation,” whereas another rater may code the same
comments as “acceptance.” Although related, these two
terms have different connotations. The raters could then
have a discussion and arrive at a conclusion as to which
code is more accurate, or if a different code would be
more effective in characterizing the statements. If two
coders see the same things, there is more consistency in
the identification of themes. A code-recode procedure
may be used with a single researcher. With this process,
transcripts are coded and then set aside for a time. When
the researcher returns, he or she then recodes and evaluates the findings.
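When agreement between coders needs to be summarized, one simple option is percent agreement across the excerpts both coders rated; the hedged sketch below uses invented codes, including the "resignation" versus "acceptance" disagreement described above. Researchers may also report a chance-corrected statistic such as Cohen's kappa, although in qualitative work disagreements are more often resolved through discussion than reduced to a single number.

```python
# Hypothetical sketch: percent agreement between two independent coders who
# coded the same five excerpts. Codes and data are invented for illustration.

coder_a = ["resignation", "acceptance", "barriers", "supports", "acceptance"]
coder_b = ["acceptance",  "acceptance", "barriers", "supports", "acceptance"]

matches = sum(a == b for a, b in zip(coder_a, coder_b))
agreement = matches / len(coder_a)

print(f"Agreement on {matches} of {len(coder_a)} excerpts ({agreement:.0%})")

# The one disagreement ("resignation" vs. "acceptance") would then be discussed
# until the coders settle on the code that best characterizes the statement.
```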
Confirmability
Confirmability is the extent to which the findings of a
qualitative study can be corroborated by others. Because
there is inherent bias in any human interpretation, efforts
must be taken to ensure that the researcher’s perspectives
do not distort the voice of the participants. Reflexivity,
or the process of identifying a researcher’s personal biases and perspectives so that they can be set aside, is necessary to make the views of the researcher transparent.
One strategy to make biases known involves keeping a
reflexive journal, in which entries are made throughout the research process to describe decisions made by
the researcher to expose the researcher’s own values and
beliefs. These journals are a type of diary that typically
remains private, but reminds the researcher of potential
biases. The researcher may then report these perspectives
in the manuscript so that readers are made aware of the
researcher’s positions. By taking a reflexive approach, it
is more likely that the results reflect the experiences and
beliefs of the participants and not the desires and biases
of the researcher.
An audit trail—the collection of documents from a
qualitative study that can be used to confirm the data analysis of the researcher—promotes confirmability by making the data available to outside sources. Although it is not
feasible to include all data in a manuscript, the researcher
keeps the transcripts, field notes, coding documents, diaries, and so on, and describes the process of data collection and analysis in the manuscript. These materials are
then made available if someone is interested in auditing
the study. Triangulation also supports confirmability, as
multiple sources with similar findings enhance verification of the results.
EXERCISE 9-3
Identifying Strategies to Promote
Trustworthiness (LO3)
QUESTIONS
Read the following excerpt from the methods section of a
study of 275 parents examining their hopes for their children receiving sensory integration therapy (Cohn, Kramer,
Schub, & May-Benson, 2014).
Data Analysis
Content and comparative analyses were used to
analyze parents’ responses to three questions on
developmental-sensory histories intake forms. . . .
First, parents’ responses were entered word for
word into a database. On the basis of a preliminary review of the entire data set, an occupational
therapy student identified initial codes for each
question to preserve the parents’ exact words
for their primary concerns. Two occupational
therapists with pediatric and qualitative research
experience and familiarity with social participation
and sensory integration research literature, along
with the occupational therapy student, reviewed
the initial codes and compared the codes across
the data set for common themes of meaning.
The research team identified four condensed,
encompassing categorical codes that described
parents’ concerns and hopes and developed specific definitions for each code. Consolidating initial
codes to develop categorical codes involved comparing categorical codes with exemplar quotes for
each category and ensuring that all team members
agreed on categorization of the data.
Three parental explanatory models (EMs),
based on combinations of the four categorical
codes, were developed to describe how parents
conceptualized and linked concerns and hopes
about their children’s occupational performance.
These models were then reviewed and modified
for conceptual congruence by the entire research
team. Further, to check for theoretical relevancy,
the first author (Cohn) conducted a member check
with a mother whose child had sensory processing and learning challenges, participated in this
study, and was receiving occupational therapy
at the member check time. The EMs were presented to this mother, who was particularly insightful, provided the authors with additional data,
and confirmed that in her experience as a parent
and through her interactions with other parents
of children with sensory processing disorder, the
EMs captured what parents care about.
1. What approaches are used to address trustworthiness in this study? What are its strengths and limitations?

CRITICAL THINKING QUESTIONS

1. Why are quotes used most often to illustrate themes in qualitative research articles?
2. What is the difference in how theory is used in quantitative versus qualitative research?
3. Which sampling methods are commonly used in qualitative research?
4. What characteristics most distinguish the different designs in qualitative research?
5. What are the four criteria for judging the strength of qualitative research, and how does each one make a study more trustworthy?
6. Although the four criteria for qualitative research are associated with four criteria for quantitative research, they are not the same thing. How do they differ?
ANSWERS
EXERCISE 9-1
1. B
2. A
3. C
4. B
5. C
6. A
7. A
8. B
9. A
EXERCISE 9-2
1. Narrative (clue: life stories)
2. Grounded theory (clue: explain)
3. Ethnography (clue: beliefs, rituals)
4. Phenomenology (clue: understand)
EXERCISE 9-3
This study included strategies to address dependability
and confirmability, but was less strong in terms of transferability and credibility. Unlike most qualitative research,
this study involved large numbers of participants (275),
which enhances its dependability and confirmability;
the reports can be corroborated, and consistencies can
be examined. The use of multiple coders also enhanced
the dependability of the data, and member checking
improved the study's credibility. However, the large number of participants also means that the data were not "thick"; it may
be more difficult to determine the transferability of the
results. Although member checking improves the study’s
credibility, the use of only three open-ended questions
using a written format, with no follow-up, makes the data
less credible.
FROM THE EVIDENCE 9-1
1. No, this is not a problem. In qualitative research, controlling variables is not a goal when recruiting individuals or settings. Instead, participants are selected
who meet the characteristics of a study. Sampling is
not random, but deliberate or intentional.
2. Video recordings provide a qualitative researcher with
a permanent record of the person-environment interaction. The researcher can capture the person’s experience through his or her words and actions within a
real-world context.
FROM THE EVIDENCE 9-2
The quotes in qualitative research could be equated with
the reporting of means and standard deviations in quantitative research. Both would be considered more basic
units of data that make up a larger whole. Both also provide readers with a level of transparency so that they may
draw some of their own conclusions.
FROM THE EVIDENCE 9-3
The individual’s perspective provides you with knowledge
that some clients are reluctant to contact you when they are
having trouble with a device. This suggests that follow-up
appointments or phone calls could be useful to identify potential problems.
FROM THE EVIDENCE 9-4
Phenomenology is focused on simply describing the lived
experience or phenomenon without interpretation,
whereas grounded theory becomes much more involved
in explaining the phenomenon or situation through identifying a theory.
FROM THE EVIDENCE 9-5
Analogies are consistent with a constructivist (qualitative)
point of view because the analogy provides a description
that is based on the researcher’s perspective and allows
readers to make their own meaning from the analogy.
In the case of the umbrella diagram, the picture might
be comparable to a chart in quantitative research, which
pulls multiple findings together in an illustration.
FROM THE EVIDENCE 9-6
The epiphany might be described as moving from seeing
the injury as tragedy to seeing the experience as one that
enhanced the individual’s life.
FROM THE EVIDENCE 9-7
This is an example of embedding data. The qualitative
data are used to supplement and help explain the quantitative data, which present the outcomes and level of satisfaction of clients.
REFERENCES
Arntzen, C., & Elstad, I. (2013). The bodily experience of apraxia in everyday activities: A phenomenological study. Disability Rehabilitation,
35, 63–72.
Bagby, M. S., Dickie, V. A., & Baranek, G. T. (2012). How sensory experiences of children with and without autism affect family occupations. American Journal of Occupational Therapy, 66, 78–86.
Bertilsson, A. S., Von Koch, L., Tham, K., & Johansson, U. (2015). Client-centered ADL intervention after stroke: Significant others’ experience. Scandinavian Journal of Occupational Therapy, 22(5), 377–386.
Cohn, E. S., Kramer, J., Schub, J. A., & May-Benson, T. (2014). Parents’ explanatory models and hopes for outcomes of occupational
therapy using a sensory integration approach. American Journal of
Occupational Therapy, 68, 454–462.
Creswell, J. W., & Plano Clark, V. L. (2011). Designing and conducting
mixed methods research (2nd ed.). Thousand Oaks, CA: Sage.
Frank, A. W. (2013). The wounded storyteller: Body, illness and ethics (2nd ed.).
Chicago, IL: University of Chicago Press.
Gramstad, A., Storli, S. L., & Hamran, T. (2014). Older individuals’ experience during assistive device service delivery process. Scandinavian
Journal of Occupational Therapy, 21, 305–312.
Hackman, D. (2011). “What’s the point?” Exploring rehabilitation for
people with 10 CNS tumours using ethnography: Patient’s perspectives. Physiotherapy Research International, 16, 201–217.
Hannold, E. M., Classen, S., Winter, S., Lanford, D. N., & Levy, C. E.
(2013). Exploratory pilot study of driving perceptions among OIF/
OEF veterans with mTBI and PTSD. Journal of Rehabilitation Research and Development, 50, 1315–1330.
Henderson, A. R., & Cock, A. (2014). The responses of young
people to their experiences of first-episode psychosis: Harnessing resilience. Community Mental Health Journal, 51(3), 322–
328. [Epub 2014 July 27 ahead of print]. doi:10.1007/s10597-014-9769-9
Howie, L., Coulter, M., & Feldman, S. (2004). Crafting the self: Older persons’ narratives of occupational identity. American Journal of
Occupational Therapy, 58, 446–454.
Kavanagh, E. (2012). Affirmation through disability: One athlete’s personal journey to the London Paralympic Games. Perspectives in Public
Health, 132, 68–74.
Leroy, K., Boyd, K., De Asis, K., Lee, R. W., Martin, R., Teachman, G., &
Gibson, B. E. (2015). Balancing hope and realism in family-centered
care: Physical therapists’ dilemmas in negotiating walking goals with
parents of children with cerebral palsy. Physical and Occupational Therapy
in Pediatrics, 35(3), 253–264.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Newbury
Park, CA: Sage.
Minns Lowe, C. J., Moser, J., & Barker, K. (2014). Living with a symptomatic rotator cuff tear “bad days, bad nights”: A qualitative study.
BMC Musculoskeletal Disorders, 15, 228.
Morgan, D. L. (2007). Paradigms lost and paradigms regained: Methodological implications of combining qualitative and quantitative
methods. Journal of Mixed Methods Research, 1(1), 48–76.
Myers, M. (2000). Qualitative research and the generalizability question: Standing firm with Proteus. The Qualitative Report, 4(3/4).
Mynard, L., Howie, L., & Collister, L. (2009). Belonging to a community-based football team: An ethnographic study. Australian Occupational
Therapy Journal, 56, 266–274.
Ruud, E. (2013). Can music serve as a “cultural immunogen”? An explorative study. International Journal of Qualitative Studies of Health
and Well-Being, 8. [Epub. 2013 Aug 7]. doi:10.3402/qhw.v8i0.20597
Stevens, E. E., Gazza, E., & Pickler, R. (2014). Parental experience learning to feed their preterm infants. Advances in Neonatal Care, 14(5),
354–361. doi:10.1097/ANC.0000000000000105
Tomar, N., & Stoffel, V. (2014). Examining the lived experience and factors influencing education of two student veterans using photovoice
methodology. American Journal of Occupational Therapy, 68, 430–438.
U.S. Census Bureau. (2012). Households and families 2010. Retrieved
from http://www.census.gov/prod/cen2010/briefs/c2010br-14.pdf
Ward, E. C., Burns, C. L., Theodoros, D. G., & Russell, T. G. (2013).
Evaluation of a clinical service model for dysphagia assessment via
telerehabilitation. International Journal of Telemedicine and Applications, Article ID 918526. http://dx.doi.org/10.1155/2013/918526
Wilbanks, S. R., & Ivankova, N. V. (2014/2015). Exploring factors
facilitating adults with spinal cord injury rejoining the workforce:
A pilot study. Disability Rehabilitation, 37(9), 739–749. [Epub 2014
Jul 8, ahead of print]. doi:10.3109/09638288.2014.938177.
10
“Research is creating new knowledge.”
—Neil Armstrong, an American astronaut and first person to walk on the moon
Tools for Practitioners That
Synthesize the Results
of Multiple Studies
Systematic Reviews and Practice Guidelines
CHAPTER OUTLINE

LEARNING OUTCOMES
KEY TERMS
INTRODUCTION
SYSTEMATIC REVIEWS
   Finding Systematic Reviews
   Reading Systematic Reviews
   Evaluating the Strength of Systematic Reviews
   Replication
   Publication Bias
   Heterogeneity
DATA ANALYSIS IN SYSTEMATIC REVIEWS
   Meta-Analyses
   Qualitative Thematic Synthesis
PRACTICE GUIDELINES
   Finding Practice Guidelines
   Evaluating the Strength of Practice Guidelines
THE COMPLEXITIES OF APPLYING AND USING SYSTEMATIC REVIEWS AND PRACTICE GUIDELINES
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES
LEARNING OUTCOMES
1. Locate and interpret the findings of systematic reviews and practice guidelines.
2. Interpret effect-size statistics and forest plots.
3. Evaluate the strengths and weaknesses of systematic reviews and practice guidelines.
KEY TERMS
effect size
forest plot
grey literature
meta-analysis
narrative review
practice guidelines
primary research
publication bias
replication
secondary research
study heterogeneity
systematic review
thematic synthesis
INTRODUCTION
The amount of information available on a given topic
can make clinicians feel overwhelmed. If you are looking for the answer to a clinical question, how do you know
how many studies exist, and whether you have found the
best unbiased studies with which to answer the question?
Evidence-based practice involves locating, reading, evaluating, and interpreting those studies.
Fortunately, for many clinical questions some of that
work has already been done through reports known as
systematic reviews and practice guidelines. You will remember that systematic reviews provide the highest level of
evidence when they include strong individual studies.
Practice guidelines are specifically designed with the practitioner in mind and include clinical recommendations.
However, as with individual studies, it is important to be
able to evaluate the quality of systematic reviews and
practice guidelines. This chapter introduces you to different types of documents that synthesize a body of research
and help clinicians evaluate and apply the information to
their practice.
SYSTEMATIC REVIEWS
Before evidence-based practice was widely adopted, it was
common for experts in the field of research to publish
narrative reviews that summarized the literature and in
some instances included clinical recommendations. The
limitation of the narrative review format is that the reader
must have a great deal of confidence in the author, trusting that the author has done a thorough and unbiased
report on the subject. Narrative reviews can still be found
in professional literature, but today it is more common to
use a systematic review as the mechanism for synthesizing
a body of research.
A systematic review uses a scientific approach to
answer a research question by synthesizing existing research rather than collecting new data. For this reason,
systematic reviews are sometimes referred to as secondary research; the primary research is comprised of
the individual studies included in the review. Systematic
reviews are typically conducted after a body of research
has developed around a topic and by authors who are
not the primary authors of the primary research studies. Systematic reviews are not limited to a single type of
study, but can be used to assimilate the research for any
type of research question. Reviews of assessment tools,
intervention approaches, and descriptive and predictive
studies are common. Systematic reviews are also conducted on qualitative research.
The word systematic describes the ordered process that
is followed to conduct a systematic review, which mirrors the steps of primary research: A research question is
written, the methodology is defined, data are collected,
results are analyzed, and the findings are reported. The
reporting of systematic reviews has become more standardized with Preferred Reporting Items for Systematic
Reviews and Meta-Analyses (PRISMA, 2009). The purpose of PRISMA is to provide guidelines for authors to
increase the transparency and complete reporting of systematic reviews. Many journals now require that authors
use the PRISMA standards when writing their systematic
review articles.
Finding Systematic Reviews
Familiarity with certain databases and search strategies
makes it easier to find systematic reviews. Conducting
a search of PubMed or CINAHL using the limitation
of “systematic review” will constrain your search to
relevant articles. The physiotherapy evidence database
known as PEDro (http://www.pedro.org.au/), OTseeker
(OTseeker.com), and the Evidence Map from the American
Speech-Language-Hearing Association (ASHA) (http://
www.asha.org/Evidence-Maps/) provide searchable databases specific to the physical therapy, occupational
therapy, and speech-language pathology disciplines,
respectively.
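For readers comfortable with a little scripting, the same kind of limited search can be run programmatically. The hedged sketch below uses Biopython's Entrez utilities to query PubMed; the topic term, the publication-type filter syntax, and the result limit are assumptions to adapt (and to check against PubMed's current documentation), and the PubMed and CINAHL web interfaces accomplish the same thing without any code.

```python
# Hedged sketch: search PubMed for systematic reviews on a topic using
# Biopython's Entrez utilities. The filter syntax and topic are assumptions;
# verify the current PubMed field tags before relying on them.

from Bio import Entrez

Entrez.email = "your.name@example.org"  # NCBI asks for a contact e-mail address

query = '"cognitive remediation" AND "systematic review"[Publication Type]'
handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
record = Entrez.read(handle)
handle.close()

print("Systematic reviews found:", record["Count"])
print("PubMed IDs:", record["IdList"])
```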
In PEDro and OTseeker, once a topic is entered into
the database (e.g., a diagnosis, treatment approach, or
assessment), the relevant articles are listed in order of the
evidence hierarchy, with systematic reviews identified
first. The database provides the reference and abstract, but
the user must use another source to obtain the full text of
the article. For example, at the time of publication, when the
term cognitive remediation was entered into the search
for OTseeker, 14 systematic reviews were identified. In
this example, there are several reasons why there were
so many systematic reviews on the same topic: (1) they
addressed different diagnoses (e.g., brain injury, schizophrenia, bipolar disorder); (2) some were specific to
computer-assisted cognitive remediation; and (3) others
were specific to certain cognitive functions (e.g., executive function). In addition, since all systematic reviews
are included, searchers will find many that are updates
of earlier reviews.
The Cochrane Collaboration is a key source of systematic reviews in all areas of medical practice. As of 2015,
there were close to 6,000 Cochrane systematic reviews
available at the Cochrane library (www.cochrane.org). The
organization conducts rigorous, high-quality reviews that
are made available to the public free of charge. Reviews
included in the Cochrane Collaboration must follow published guidelines for performing the review, such as those
established by PRISMA. Cochrane reviews typically limit
inclusion of studies to randomized controlled trials (RCTs).
Cochrane reviews are regularly updated, and outdated
reviews are removed when newer reviews become available. For rehabilitation practitioners, the limitations in
the available research often result in Cochrane reviews
indicating that few conclusions can be drawn and more
research is needed. For example, a Cochrane review by
Dal Bello-Haas and Florence (2013) examining exercise in amyotrophic lateral sclerosis (ALS) found only
two randomized controlled trials. Although both studies
found greater improvement in function for exercise when
compared with stretching, the review indicates that, because the studies were small (a total of 43 participants),
more research is needed to determine the extent to which
exercise is beneficial for ALS patients. In this case, rehabilitation practitioners may decide to try using exercise
for ALS but continue to look for new research while they
monitor their clients’ progress.
EXERCISE 10-1
Locating Systematic Reviews (LO1)
Conduct a search for systematic reviews that examine the efficacy of vestibular rehabilitation for
vestibular dysfunction at the Cochrane Library,
CINAHL, OTseeker, and PEDro.
QUESTIONS
How many systematic reviews did you find from each source,
and generally what were the conclusions?
1. Cochrane Library
2. CINAHL
3. OTseeker
4. PEDro
Reading Systematic Reviews
Systematic reviews contain the same major sections as individual research studies: abstract, introduction, methods,
results, and discussion.
The abstract provides a summary of the review. The
Cochrane Library and some other databases sometimes
provide a more extensive summary than is typical of an
abstract. This summary is useful for practitioners because
it often includes clinical recommendations based on the
results of the review.
The introduction is similar to that for an individual
study, presenting background information on the topic,
the need for the review, and typically a purpose statement
or research question.
The methods section differs significantly from that of an individual study, primarily because the data collected in a systematic review are gleaned from other studies, not from individuals/clients. The sample described in the methods section consists of the number of studies included and the characteristics of those studies. Inclusion and exclusion criteria are stated to clarify which studies were included in the review.
For example, many reviews of efficacy studies only include
randomized controlled trials. The way that the data were
collected is specified, including the databases searched
and key words used in the search. Other methods of finding studies are included, such as contacting prominent
researchers in a field and reviewing the reference lists of
relevant studies. The methods section also describes who
collected the data and what information was abstracted.
Ideally, the methods section is written in such a way
that the review is reproducible; if you were to utilize the
author’s methods, you would identify the same studies.
The results section begins by describing how many
studies were identified that met the criteria. A table that
summarizes each individual study is an important part
of this section and provides a measure of transparency.
By identifying the specific studies included in the review,
the review author allows the reader to review the primary
sources to verify the interpretation.
The results section often describes the individual studies in a narrative form, but most importantly it synthesizes
the information to explain what the evidence as a whole
shows with regard to a research question. For example, the
majority of the studies may support an intervention, or they
may not yield evidence for its efficacy. It is not uncommon
to read a synthesis that includes both positive and negative
findings. An effective synthesis gives the reader an overall
impression of the research (e.g., the intervention is or is not
supported by the evidence). This synthesis is based on several factors, including the number of studies available, the
number of participants in the study, the quality of the individual studies, and the outcomes of each study.
Many results sections of systematic reviews include
data analysis in the form of a meta-analysis. This specific
form of systematic review is explained later in this section.
From the Evidence 10-1 comes from a review of descriptive studies examining speech impairments in Down
syndrome (Kent & Vorperian, 2013). The table lists each
of the studies in the review and summarizes the results.
The discussion section of a systematic review summarizes the results and, most importantly for evidence-based
practitioners, often provides clinical recommendations.
The limitations of the review are also acknowledged.
The abstract presented in From the Evidence 10-2
provides a summary of a systematic review comparing
unilateral with bilateral cochlear implants. It concludes
that, although the evidence has limitations, bilateral
cochlear implants appear to have superior outcomes to
unilateral implants.
Evaluating the Strength of Systematic
Reviews
Although systematic reviews have many characteristics of
a single study, they possess unique characteristics that are
important to understand and consider. When evaluating
systematic reviews, take into account their replication,
publication bias, and heterogeneity.
Replication
A systematic review provides the highest level of evidence
because it synthesizes the results from multiple studies.
Recall that replication (the same or similar study conducted more than once) is a principal element of the
scientific process. A single study provides one piece of
evidence from one researcher or a group of researchers.
Multiple studies enhance confidence in the findings because they provide evidence from several perspectives.
For example, the developer of an intervention often serves
as the lead researcher in an efficacy study. This is desirable
because new interventions should be studied. However,
the developer of the intervention will likely have a level
of commitment to the intervention that would not be expected from other researchers. In addition, the developer
will have extensive and intimate knowledge of the intervention and its implementation; although controls may
be in place, the developer is bound to have a strong bias
toward seeing the intervention succeed. Therefore, it is
important that other researchers who are not associated
with development of the intervention replicate efficacy
studies.
A good example of replication of an intervention study that was originally conducted by a researcher invested in its outcome is dialectical behavioral therapy (DBT).
Marsha Linehan (1993) developed DBT as an intervention for individuals with borderline personality disorder.
She conducted the initial studies of the approach and
continues to be involved in researching it. One of her earlier studies, a well-designed randomized controlled trial
with 101 participants, found that DBT was more effective
than community treatment for reducing suicide attempts
and hospitalization and lowering medical risk (Linehan
et al, 2006). A more recent study by Pasieczny and Connor
(2011) examined DBT for borderline personality disorder
in routine clinical settings in Australia. These researchers
found similar outcomes when comparing DBT with treatment as usual, with DBT resulting in better outcomes,
including decreased suicidal and self-injurious behaviors
and fewer hospitalizations. The results from these two
studies show a consistent pattern, but even more compelling is a Cochrane systematic review of psychological
therapies for borderline personality disorder, which found
the strongest evidence for DBT. The review included
eight studies of DBT: three from the Linehan group, and
five from other researchers (Stoffers et al, 2012). Again,
the review supported positive outcomes, particularly related to a reduction in suicidal and self-injurious behavior.
When multiple studies from a variety of settings replicate
the findings, consumers of research can have greater
confidence in the outcome.
However, the inclusion of multiple studies is not the
only factor to consider when determining the strength
of evidence provided by a systematic review. As a source
of evidence, a systematic review is only as strong as the
studies it contains. If a systematic review of efficacy studies contains no randomized controlled trials, it does not
offer Level I evidence. Likewise, predictive studies must
also be of high quality if the systematic review is to provide strong evidence. Remember that the highest level
of evidence for predictive studies is a prospective cohort
study. Therefore, a systematic review that is designed to
answer a question about prediction must contain multiple prospective cohort studies to be considered Level I
evidence.
Publication Bias
Studies are more likely to be published when they yield
positive results, and this is particularly true with efficacy
studies. Publication bias suggests that researchers are
more likely to submit research, and journals are more
likely to publish research, when the findings are positive. A Cochrane systematic review of clinical trials found
support for publication bias with an odds ratio = 3.90,
95% CI = 2.68-5.68 (Hopewell et al, 2009), meaning
FROM THE EVIDENCE 10-1
Example of Table of Studies Included in a Systematic Review
Kent, R. D., & Vorperian, H. K. (2013). Speech impairment in Down Syndrome: A review. Journal of Speech, Language, and Hearing
Research, 56(1), 178–210. http://doi.org/10.1044/1092-4388(2012/12-0148).
Note A: The source
information allows you to
consult the original study.
Table 4: Summary of studies of speech intelligibility in individuals with DS.

Source: Barnes et al. (2009)
Participants: See Table 2
Method: Phonological assessment: perceptual and acoustic measures of phonological accuracy and processes
Summary of Results: DS scored lower in accuracy and processes and used fewer intelligible words.

Source: van Bysterveldt (2009)
Participants: See Table 2
Method: Transcription: determination of percentage of intelligible utterances in narratives and connected speech
Summary of Results: DS had mean intelligibility scores of 83.1% for narratives and 80% for connected speech.

Source: Yoder, Hooshyar, Klee, & Schaffer (1996)
Participants: N = 8 DS (mean age 83 mos); N = 8 ATD* (mean age 44 mos), matched to DS group on MLU (no DS, but language delay)
Method: Perceptual assessment: intelligibility and length determined with the SALT transcription program (Miller & Chapman, 1990)
Summary of Results: DS had over 3 times as many multi-word partially intelligible utterances. However, overall there were no significant differences in intelligibility.

Source: Bunton, Leddy, & Miller (2007)
Participants: N = 5 DS (5M) (26–39 yrs)
Method: Perceptual assessment: intelligibility test and perceptual scoring by listeners and transcribers
Summary of Results: DS had a wide range of intelligibility scores (41%–75%); errors that were ranked more highly than others: cluster-singleton production word initial and word final, vowel errors, and place of production for stops and fricatives.
Note B: The table also provides basic
information about the participants, measures,
and results so you do not have to go to the
original source for those details.
FTE 10-1 Question
What information do you learn about each individual study by looking at this table?
FROM THE EVIDENCE 10-2
Components of a Systematic Review
Van Schoonhoven, J., Sparreboom, M., van Zanten, B. G., Scholten, R. J., Mylanus, E. A., Dreschler, W.A., Grolman, W., & Matt, B.
(2013). The effectiveness of bilateral cochlear implants for severe-to-profound deafness in adults: A systematic review. Otology and
Neurotology, 32, 190–198.
Note A: Identifies databases
searched. Includes only
published literature.
OBJECTIVE: Assessment of the clinical effectiveness of bilateral cochlear
implantation compared with unilateral cochlear implantation or bimodal stimulation
in adults with severe-to-profound hearing loss. In 2007, the National Institute for
Health and Clinical Excellence (NICE) in the U.K. conducted a systematic review
on cochlear implantation. This study forms an update of the adult part of the NICE
review.
DATA SOURCES: The electronic databases MEDLINE and Embase were
searched for English language studies published between October 2006 and
March 2011.
STUDY SELECTION: Studies were included that compared bilateral cochlear
implantation with unilateral cochlear implantation and/or with bimodal stimulation in
adults with severe-to-profound sensorineural hearing loss. Speech perception in
quiet and in noise, sound localization and lateralization, speech production,
health-related quality of life, and functional outcomes were analyzed.
DATA EXTRACTION: Data extraction forms were used to describe study characteristics and the level of evidence.
DATA SYNTHESIS: The effect size was calculated to compare different outcome measures.
CONCLUSION: Pooling of data was not possible because of the heterogeneity of
the studies. As in the NICE review, the level of evidence of the included studies
was low, although some of the additional studies showed less risk of bias. All
studies showed a significant bilateral benefit in localization over unilateral cochlear
implantation. Bilateral cochlear implants were beneficial for speech perception in
noise under certain conditions and several self-reported measures. Most speech
perception in quiet outcomes did not show a bilateral benefit. The current review
provides additional evidence in favor of bilateral cochlear implantation, even in
complex listening situations.
Note B: The full article includes specific
information about each study, including the
level of evidence. Fourteen studies were
included; most were nonexperimental, using
existing individuals with unilateral or bilateral
implants. One was identified as an RCT but
did not include between-group analyses.
FTE 10-2 Question 1 The headings for the abstract do not match the sections described earlier (i.e., introduction,
methods, results, and discussion). Connect the headings in the abstract with each of these sections.
that studies with positive findings (i.e., the intervention
was effective) were almost four times more likely to be
published than studies with negative findings (i.e., the
intervention was not effective).
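To see what an odds ratio of roughly 4 means in concrete terms, the short Python sketch below works through a hypothetical 2 x 2 table; the counts are invented for illustration and are not taken from the Hopewell et al review.

# Hypothetical counts of studies, invented for illustration only.
published_positive, unpublished_positive = 80, 20   # studies with positive findings
published_negative, unpublished_negative = 50, 50   # studies with negative findings

odds_positive = published_positive / unpublished_positive   # 80/20 = 4.0
odds_negative = published_negative / unpublished_negative   # 50/50 = 1.0
odds_ratio = odds_positive / odds_negative                  # 4.0

print(f"Odds of publication with positive findings: {odds_positive:.1f}")
print(f"Odds of publication with negative findings: {odds_negative:.1f}")
print(f"Odds ratio: {odds_ratio:.1f} (positive studies about four times as likely to be published)")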
One of the challenges of determining if publication
bias exists involves finding unpublished research. Five
studies were included in the aforementioned review,
each of which used a slightly different methodology to
identify research that had and had not been published.
For example, in one study, the researchers contacted
an institutional review board; from all of the studies approved by that board, researchers identified which studies
had been published and which had not. Using another
approach, the researchers identified a group of studies
funded through the National Institutes of Health and
again determined which of these funded studies resulted
in publication. In the review, some studies suggested that
authors did not submit studies with negative findings because they thought their study was not interesting enough
or the journal was not likely to publish the findings.
The effect of publication bias on evidence-based practice is noteworthy. When a systematic review is conducted,
the intention of the review is to include all research that
meets the inclusion criteria. Although published research
will be easily accessible to the reviewer, limiting a systematic review to published studies will likely mean that
the findings will be skewed toward the positive. Consequently, evidence-based practitioners may expect an intervention to have more positive outcomes than would
reasonably be expected if all studies had been included in
the review. Unfortunately for the systematic reviewer and
evidence-based practitioners, finding the results of studies
that have not been published can be challenging.
When reading a systematic review, consider whether
the reviewers made efforts to collect unpublished studies. If so, the reviewer will report this information in the
methods section of the article. Finding unpublished research could necessitate contacting researchers who are
known to do work concerning the topic of interest or
conducting searches of grey literature. Grey literature
refers to print and electronic works that are not published
commercially or are difficult to find (Grey Literature
Network Service, 1999). Grey literature includes theses,
dissertations, conference proceedings, and government
reports. Inclusion of grey literature will decrease the likelihood or degree to which publication bias is a factor in a
systematic review.
Concerns for publication bias have led to efforts to reduce its effect on the outcomes of systematic reviews. One
important initiative is the registration of clinical trials. On
the website ClinicalTrials.gov, which is sponsored by the
National Institutes of Health, researchers register their
clinical trials. A clinical trial is defined as any prospective
study in which people are assigned to one or more interventions and health outcomes are collected. In 2005, the
International Committee of Medical Journal Editors
made it a condition of publication that researchers
register their trials when beginning a study (ClinicalTrials.
gov, 2014). This requirement makes it more likely that researchers will register their studies. Although at the time
of this writing most rehabilitation journals do not require
registration, it is considered a best practice in the research
community. In 2008, ClinicalTrials.gov began to include a
results database of registered studies, so individuals conducting systematic reviews can use the registry to locate
studies that may not have been published in a journal. Even
if the results are not included in the registry, the reviewer
will have access to contact information for the researcher
and can follow up on the study findings.
Heterogeneity
A common criticism of systematic reviews concerns their
tendency to synthesize the results of studies that had
substantial differences. Study heterogeneity refers to
differences that often exist among studies in terms of
the samples, interventions, settings, outcome measures,
and other important variables. One reason for study
heterogeneity is that researchers in different settings
rarely collaborate to make their study designs similar.
In addition, researchers often have slightly different
questions or interest in studying different conditions. If
one researcher is building upon previous research, the
newer study will often expand on the previous work,
including making changes. For example, one study may
examine an intervention in a school setting and another
study in a private clinic—but the systematic review
combines the results of both studies to make a general
recommendation.
When evidence-based practitioners read the findings
of a systematic review, the challenge is interpreting and applying the results to a specific clinical site. For example, if
the school setting and private clinic results are combined,
the results do not replicate school practice; it becomes a
comparison of apples with oranges. Systematic reviews may
address study heterogeneity with strict inclusion criteria
that only allow very similar studies; however, with this approach the number of potential studies to include can be
severely limited. Another approach is to provide subanalyses of the results. Using the previous example, the results
may provide a separate analysis for all studies in a school
setting and another analysis for all studies in clinic settings.
As a reader of a systematic review, the evidence-based practitioner can have more confidence in the conclusions if the
studies in the review are very similar or subanalyses were
conducted to describe findings within different conditions.
Study heterogeneity can be particularly problematic
when different outcome measures or statistical analyses
are used or reported. In some cases, the same construct
may be measured (e.g., pain), but different measures are
used. Differences in the reliability and validity of measures can affect the results. In other instances, studies may
examine very different outcomes (e.g., one study looks at
pain and another at strength), making the combination of
outcomes across studies invalid. Again, the reviewers may
include subanalyses based on the different outcomes. In
still other instances, the reporting of the data in statistical
terms can vary appreciably, making it difficult or impossible to combine or compare the results.
Evidence-based practitioners take into account the study
conditions within the systematic review, bearing in mind
the fact that those conditions that most resemble their own
practices will be the most applicable. Box 10-1 outlines
the factors to consider when evaluating the strength of a
particular systematic review.
BOX 10-1 Considerations in Evaluating Systematic Reviews
Replication:
• Is there an adequate number of studies available from which to draw a conclusion?
• Do the available studies include researchers other than the original developers of the intervention/assessment?
• Are the studies included in the review of the highest level of evidence for a single study?
Publication bias: Do the reviewers make efforts to obtain unpublished research (e.g., theses, dissertations, research reports from clinical trial registries)?
Heterogeneity of the included studies: Are the studies so different that it is difficult to draw conclusions, or does the reviewer include subanalyses to address differences in the studies?

FTE 10-2 Question 2 From the information provided in the abstract and additional comments, how would you evaluate the strength of the evidence presented in From the Evidence 10-2?

DATA ANALYSIS IN SYSTEMATIC REVIEWS
A challenge in reporting the results of a systematic review involves integrating the results from multiple studies. Many times the results are summarized narratively and take into account both the findings and the strength of the evidence. For example, a review may report that the majority of the findings indicate that the intervention was effective, but because the strength of the evidence is weak (e.g., studies lacked a control group), it is premature to conclude that the intervention caused the positive outcomes. Systematic reviews may use a quality assessment such as the PEDro scale to evaluate the quality of each individual study. Systematic reviews may also combine the findings of each study to provide a result that reflects the combination of the results. When quantitative results are combined, the systematic review becomes a meta-analysis. Qualitative results are also combined in systematic reviews through a process of thematic synthesis. Meta-analysis and thematic synthesis are discussed in more detail in the following sections.

Meta-Analyses
FIGURE 10-1 All meta-analyses are systematic reviews, but not all systematic reviews are meta-analyses.
A meta-analysis is a specific type of systematic review in which the results of similar quantitative studies (i.e., using the same theoretical constructs and measures) are pooled using statistical methods (Fig. 10-1). The first step in a meta-analysis involves calculating an effect size for each individual study. This statistic may already be reported
in the results section; if not, and if descriptive statistics
are available, typically the reviewer can calculate the
effect size. The second step is to pool the effect size from
all of the studies to generate an overall effect. From the
Evidence 10-3 provides an example of a meta-analysis of
studies examining yoga for low back pain that includes the
pooled effect sizes.
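As a rough illustration of that first step, the Python sketch below computes a standardized mean difference (Cohen's d) from the group means, standard deviations, and sample sizes a primary study might report. The numbers are invented, and the pooled-standard-deviation formula shown is one common variant rather than the only option.

import math

def cohens_d(mean_tx, sd_tx, n_tx, mean_ctl, sd_ctl, n_ctl):
    # Cohen's d using the pooled standard deviation of the two groups.
    pooled_sd = math.sqrt(((n_tx - 1) * sd_tx ** 2 + (n_ctl - 1) * sd_ctl ** 2)
                          / (n_tx + n_ctl - 2))
    return (mean_tx - mean_ctl) / pooled_sd

# Hypothetical pain scores (lower = better); a negative d therefore favors the treatment group.
print(cohens_d(3.1, 1.8, 40, 4.2, 2.0, 38))   # study 1
print(cohens_d(2.8, 1.5, 25, 3.4, 1.6, 27))   # study 2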
There are several advantages to meta-analysis, including increased statistical power, the ability to estimate an
overall effect size, and the ability to systematically compare
differences between studies.
A meta-analysis increases statistical power simply
by increasing the number of participants when compared with a single study. Therefore, a meta-analysis can
FROM THE EVIDENCE 10-3
Effect Sizes From a Meta-Analysis
Holtzman, S., & Beggs, R. T. (2013). Yoga for chronic low back pain: A meta-analysis of randomized controlled trials. Pain Research
and Management: The Journal of the Canadian Pain Society, 18(5), 267–272.
OBJECTIVES: To evaluate the efficacy of yoga as an intervention for chronic low
back pain (CLBP) using a meta-analytical approach. Randomized controlled trials
(RCTs) that examined pain and⁄or functional disability as treatment outcomes were
included. Post-treatment and follow-up outcomes were assessed.
METHODS: A comprehensive search of relevant electronic databases, from the
time of their inception until November 2011, was conducted. Cohen's d effect sizes
were calculated and entered in a random-effects model.
RESULTS: Eight RCTs met the criteria for inclusion (eight assessing functional
disability and five assessing pain) and involved a total of 743 patients. At
post-treatment, yoga had a medium to large effect on functional disability (d =
0.645) and pain (d = 0.623). Despite a wide range of yoga styles and treatment
durations, heterogeneity in post-treatment effect sizes was low. Follow-up effect
sizes for functional disability and pain were smaller but remained significant (d =
0.397 and d = 0.486, respectively); however, there was a moderate to high level of
variability in these effect sizes.
DISCUSSION: The results of the present study indicate that yoga may be an
efficacious adjunctive treatment for CLBP. The strongest and most consistent
evidence emerged for the short-term benefits of yoga on functional disability.
However, before any definitive conclusions can be drawn, there are a number of
methodological concerns that need to be addressed. In particular, it is
recommended that future RCTs include an active control group to determine
whether yoga has specific treatment effects and whether yoga offers any
advantages over traditional exercise programs and other alternative therapies for
CLBP.
Note A: These are pooled effect
sizes for the combined studies.
FTE 10-3 Question The effect sizes were similar for functional disability and pain in both the short and long term.
Why might the authors have concluded: “The strongest and most consistent evidence emerged for the short-term
benefits of yoga on functional disability”?
potentially detect smaller differences among groups.
Effect sizes may be more relevant than statistical significance, because they provide evidence-based practitioners
with a measure of the impact of the intervention or condition on the client. A pooled effect size may provide a
more accurate representation of that impact, because it
takes into account multiple studies. Finally, as mentioned
earlier regarding heterogeneity, a meta-analysis may be
used to try to explain some of the differences between
studies. For instance, a meta-analysis could compare the
results of studies conducted in school settings with studies
in clinic settings to determine whether there is a difference
in the effect. If indeed there was a difference, and the effect
size was greater in school settings, the practitioner could
conclude that the intervention was more effective when
provided in a school setting than in a clinical setting.
Often the reviewers conducting a meta-analysis will
weight the studies, so that some studies exert more influence in the statistical analysis. In most cases, weighting is based on sample size; those studies with more
participants have greater influence on the calculated effect
size of the meta-analysis. This is done intentionally, because larger studies should provide more reliable findings.
Nevertheless, the issue of heterogeneity of studies is still
a potential limitation of meta-analyses. The evidence-based
practitioner examines the differences of included studies to
determine if combining the results was logical.
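The Python sketch below is a minimal illustration of that weighted pooling step. Weighting by sample size is shown because it is the simplest to follow; many published meta-analyses weight by the inverse of each study's variance instead, and the effect sizes and sample sizes here are invented for illustration only.

# Hypothetical per-study effect sizes (Cohen's d) and sample sizes.
studies = [
    {"d": 0.62, "n": 101},   # largest study, greatest influence
    {"d": 0.48, "n": 43},
    {"d": 0.15, "n": 24},    # smallest study, least influence
]

total_n = sum(s["n"] for s in studies)
pooled_d = sum(s["d"] * s["n"] for s in studies) / total_n

print(f"Pooled effect size (sample-size weighted): {pooled_d:.2f}")

The same logic extends to subanalyses: pooling the school-setting studies and the clinic-setting studies separately would produce the kind of subgroup comparison described earlier.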
Several commonly used effect-size statistics are described in Table 10-1.
EXERCISE 10-2
Interpreting Effect Sizes
for Your Clients (LO2)
QUESTIONS
Take the following effect-size conclusions from a meta-analysis and translate them into plain language to
interpret/describe this information for a client.
1. A meta-analysis examining the efficacy of eccentric
viewing training for age-related macular degeneration.
This training involves developing peripheral vision to
substitute for loss of central vision:
“Overall effect size was equal to 0.660 (95% CI,
0.232-1.088, p < 0.05). All five studies had positive results after eccentric viewing training with individuals
with AMD” (Hong, Park, Kwon, & Yoo, 2014, p. 593).
2. A meta-analysis comparing very early mobilization
(VEM; i.e., patients are encouraged to get out of bed
and/or start moving within 36 hours of stroke) with
standard care for individuals with stroke reached the
following conclusion:
“VEM patients had significantly greater odds of independence compared with standard care patients (adjusted
odds ratio, 3.11; 95% confidence interval, 1.03–9.33)”
(Craig, Bernhardt, Langhorne, & Wu, 2010, p. 2632).
3. A meta-analysis examining the relationship between aging and balance found that up to age 70, healthy older adults
have scores on the Berg Balance Scale that are close to
the maximum. However, after age 70:
“The analysis (meta-regression) shows the deterioration of the Berg Balance Scale score with increasing
age, (R2 = 0.81, p < 0.001)” (Downs, 2014, p. 88).
The results of a meta-analysis are often displayed
graphically in the form of a forest plot, which graphs
the effect size of each individual study against a reference point. The graph allows you to step back from the
pooled data and visualize the pattern of results from the
meta-analysis.
Understanding Statistics 10-1
The choice of effect size in a meta-analysis will
differ depending on the type of study and statistics used in the primary research. There are many
different effect-size statistics, but some of the most
common in health-care research include:
• Cohen’s d for difference studies with continuous
variables
• Odds ratios or hazards ratios for difference studies with dichotomous variables
• r values and r2 values for studies that use correlations to examine the strength of the relationship
between variables or predictive studies
TABLE 101 Common Effect-Size Statistics Used in Meta-Analysis
Effect Size
Description/Explanation
Types of Studies
d-Index
• Represents the magnitude of the difference
between and/or within
groups based on means
and standard deviations
• Intervention studies comparing groups and/or
before and after treatment
differences
• Descriptive studies comparing groups of people
with existing conditions
Odds ratio/
Hazards ratio
• Based on 2 ⫻ 2 contin• Intervention studies
gency tables in which
comparing groups on a
two groups are compared
dichotomous outcome
on two dichotomous
(success/failure) or progoutcomes
nostic studies predicting an
• Explained in terms of the
outcome for two different
odds of success in one
groups
group compared with the
other group
r
• Correlation coefficient that describes the
strength of the relationship measured between
0.0 and 1.0
• Correlation coefficient
squared indicates the
amount of variance
accounted for in the
relationship
• Number will always be
smaller than the simple
r value
r2
• Correlational and predictive studies examining
relationships
Suggested Interpretations
of Size of the Effect*
Small = 0.20*
Medium = 0.50
Large = 0.80
Interpreted as probabilities
= no difference
= twice as likely
= three times as likely (etc.)
Small = 0.10*
Medium = 0.30
Large = 0.50
* Strength of the effect size is based on Cohen (1992).
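To make these benchmarks easier to apply, the small Python helper below (an illustration, not part of the published table) labels a d-index using Cohen's (1992) cutoffs.

def label_d(d):
    # Apply Cohen's (1992) benchmarks for the d-index from Table 10-1.
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "smaller than a conventionally 'small' effect"

print(label_d(0.645))   # "medium"; the yoga review above described d = 0.645 as a medium to large effect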
The vertical line in the forest plot indicates the point at which there is no effect. For intervention
studies comparing an experimental with a control group,
0 indicates no effect. For odds ratios, hazard ratios, and
risk ratios, 1.0 indicates no effect.
In a forest plot, the effect of each study is plotted with a
square. The sizes of the squares typically vary, with those
studies having greater weight represented with a larger
square. The horizontal lines emanating from the square
indicate the confidence intervals. Remember that a large
confidence interval suggests that the calculated effect size
is a less precise estimate. The overall effect with all studies
pooled is indicated at the bottom of the forest plot. From
the Evidence 10-4 is an example of a forest plot from
a meta-analysis of exercise studies focused on increasing
balance confidence in older adults. Consider the number
of horizontal lines, the placement of the squares in relation to the vertical line, and the sizes of the squares when
answering the From the Evidence questions.
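For readers who want to see how such a figure is put together, the Python sketch below draws a bare-bones forest plot with matplotlib. The study labels, effect sizes, confidence intervals, and weights are invented and greatly simplified; they are not the values from From the Evidence 10-4.

import matplotlib.pyplot as plt

# (label, effect size, CI lower, CI upper, weight %) -- illustrative values only.
rows = [
    ("Study A", 0.00, -0.34, 0.34, 18.5),
    ("Study B", 0.35,  0.09, 0.61, 31.3),
    ("Study C", 0.26, -0.19, 0.71, 10.5),
    ("Pooled",  0.22,  0.07, 0.36, 100.0),
]

fig, ax = plt.subplots()
for position, (label, effect, lower, upper, weight) in enumerate(rows):
    y = len(rows) - position
    ax.plot([lower, upper], [y, y], color="black")          # horizontal line = 95% CI
    marker = "D" if label == "Pooled" else "s"              # diamond marks the pooled estimate
    ax.plot(effect, y, marker, color="black",
            markersize=4 + weight / 5)                      # bigger marker = more weight
    ax.text(-1.05, y, label, ha="right", va="center")

ax.axvline(0, linestyle="--", color="gray")                 # 0 = no effect for mean differences
ax.set_xlim(-1, 1)
ax.set_yticks([])
ax.set_xlabel("Standardized mean difference (left favors control, right favors exercise)")
plt.tight_layout()
plt.show()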
Qualitative Thematic Synthesis
The results of multiple qualitative studies may also be synthesized using a systematic review process, although this
process can be controversial. Some researchers contend
that a review of qualitative research is inappropriate, because qualitative research is specific to context, time, and
FROM THE EVIDENCE 10-4
Forest Plot
Rand, D., Miller, W. C., Yiu, J., & Eng, J. J. (2011). Interventions for addressing low balance confidence in older adults: A systematic
review and meta-analysis. Age and Ageing, 40(3), 297–306. http://doi.org/10.1093/ageing/afr037.
Study or Subgroup | Exercise: Mean (SD), Total | Control: Mean (SD), Total | Weight | Std. Mean Difference, IV, Random, 95% CI
Aral 2007 | -0.1 (2.2), 71 | -0.1 (2.5), 65 | 18.5% | 0.00 [-0.34, 0.34]
Brouwer 2003 | 7.8 (18.7), 71 | 4 (12.6), 17 | 7.4% | 0.21 [-0.32, 0.74]
Campbell 1997 | -2.5 (10.6), 116 | -6.1 (10), 117 | 31.3% | 0.35 [0.09, 0.61]
Devereux 2005 | 0.3 (1.2), 25 | 0.03 (0.5), 25 | 6.7% | 0.29 [-0.27, 0.85]
Lui-Ambrose 2004 | 4.9 (14.5), 34 | 0.7 (23.7), 32 | 8.9% | 0.21 [-0.27, 0.70]
Schoenfelder 2004 | 1.5 (20.3), 30 | -3.5 (22.4), 28 | 7.8% | 0.23 [-0.29, 0.75]
Southard 2006 | 12.01 (21.76), 18 | 11.95 (16.32), 17 | 4.8% | 0.00 [-0.66, 0.67]
Weerdesteyn 2006 | 3.5 (18.6), 75 | -1.48 (20.3), 26 | 10.5% | 0.26 [-0.19, 0.71]
Williams 2002 | 9.3 (19.1), 13 | 5.5 (19.3), 18 | 4.1% | 0.19 [-0.52, 0.91]
Total (95% CI) | Exercise n = 453 | Control n = 345 | 100.0% | 0.22 [0.07, 0.36]
Heterogeneity: Tau² = 0.00; Chi² = 3.09, df = 8 (P = 0.93); I² = 0%
Test for overall effect: Z = 2.93 (P = 0.003)
[Forest plot: each study's standardized mean difference (IV, Random, 95% CI) is plotted on an axis running from -1 (favors control) to 1 (favors exercise), with the pooled estimate shown at the bottom.]
Note A: Descriptive statistics are
provided along with the plot of the
effect size using the standard mean
difference between the groups.
FTE 10-4 Questions
1. Using this visual analysis of the results, how many studies were included in the meta-analysis? Of those studies,
how many suggest that the intervention condition was more effective than the control?
2. Which study in the meta-analysis was the most heavily weighted?
individual. However, as the importance of qualitative research as a source of evidence is increasingly recognized,
the synthesis of such research is also gaining acceptance
(Tong et al, 2012).
Different methods are emerging for the systematic review of qualitative research, including meta-ethnography,
critical interpretive synthesis, and meta-synthesis; however, detailed descriptions of these approaches are beyond
the scope of this chapter. Generally speaking, a thematic
synthesis is at the core of the process of these approaches.
A thematic synthesis uses a systematic process to identify themes from each individual study and then find
similar themes in other studies. Each individual study may
use different words to describe analogous ideas. The reviewer looks for corroborating concepts among the studies, which become the synthesized themes.
It is important to note that a strong synthesis of qualitative research does more than merge the findings of several
studies; rather, it finds novel interpretations in the synthesis
of the data (Thomas & Harden, 2008). For example, Lee
et al (2015) examined factors that presented barriers to independent active free play in children. This large systematic review included 46 studies from a variety of disciplines.
Numerous barriers were identified, but the meta-synthesis
found that parents’ safety concerns about strangers, bullies,
and traffic presented the greatest impediment to free play.
From the Evidence 10-5 is an example of a qualitative review that synthesizes studies examining the impact
of the environment on youth with disabilities.
PRACTICE GUIDELINES
Practice guidelines, also referred to as clinical guidelines
or clinical practice guidelines, provide recommendations for
practitioners to address specific clinical situations. For
example, practice guidelines may provide intervention
recommendations for specific diagnoses or describe the
optimal assessment process for a particular condition.
Unlike single studies and most systematic reviews,
practice guidelines are, as a rule, developed by organizations, such as the American Occupational Therapy
Association (AOTA), the American Physical Therapy
Association (APTA), and ASHA. As a member of a professional organization, you may have free access to practice
guidelines, or they may be available at a cost. For example, AOTA publishes numerous practice guidelines on
topics such as adults with low vision, adults with serious
mental illness, and children and adolescents with autism
(AOTA, n.d.). The Orthopaedic Section of the American
Physical Therapy Association (APTA, n.d.) publishes practice guidelines in the Journal of Orthopaedic and Sports Physical
Therapy, including guidelines on neck pain and impairments
FROM THE EVIDENCE 10-5
A Qualitative Systematic Review
Kramer, J. M., Olsen, S., Mermelstein, M., Balcells, A., & Lijenquist, K. (2012). Youth with disabilities perspectives of the environment
and participation: A qualitative meta-synthesis. Child Care and Health Development, 38, 763–777.
Note A: The process of reviewing
the studies was meta-synthesis.
Meta-syntheses can enhance our knowledge regarding the impact of the environment on the participation
of youth with disabilities and generate theoretical frameworks to inform policy and best practices. The
purpose of this study was to describe school-aged youth with disabilities' perspectives regarding the
impact of the environment and modifications on their participation. A meta-synthesis systematically
integrates qualitative evidence from multiple studies. Six databases were searched and 1287 citations
reviewed for inclusion by two independent raters; 15 qualitative articles were selected for inclusion. Two
independent reviewers evaluated the quality of each study and coded the results section. Patterns
between codes within and across articles were examined using a constant comparative approach.
Environments may be more or less inclusive for youth with disabilities depending on others'
understanding of individual abilities and needs, youth involvement in decisions about accommodations,
and quality of services and policies. Youth implemented strategies to negotiate environmental barriers
and appraised the quality of their participation based on the extent to which they engaged alongside
peers. This meta-synthesis generated a framework illustrating the relationship between the environment,
modifications and participation, and provided a conceptualization of participation grounded in the lived
experiences of youth with disabilities. Findings reveal gaps in current knowledge and highlight the
importance of involving youth with disabilities in decision making.
Note B: A new framework was developed
from the synthesis, indicating that this review
generated something new as opposed to a
summary of the available evidence.
FTE 10-5 Question The abstract indicates that two independent reviewers evaluated the quality of the evidence
and coded the results. Why might it be useful to have more than one reviewer of a systematic review?
in knee movement coordination. The recommendations for
AOTA’s practice guidelines on adults with low vision are
contained in From the Evidence 10-6.
Practice guidelines may also be developed by nonprofit organizations that focus on a specific condition,
such as the Alzheimer’s Association, the National Stroke
Foundation, and the Brain Trauma Foundation. Practice
guidelines may or may not have a strong evidence base.
The best practice guidelines include the highest level of
evidence available on the topic of interest. The evidence
is then combined with expert opinion, and in some
cases the client’s perspective, to help translate research
to practice. Practice guidelines are often more general
than systematic reviews and may provide guidance for
practitioners in a discipline to treat a particular condition (e.g., stroke, autism, shoulder pain), rather than
examining a specific intervention. In addition, practice
guidelines are focused on providing recommendations
to practitioners (as opposed to simply reporting the evidence); thus, they often incorporate expert opinion and
client perspectives into the process of developing the
guidelines (Fig. 10-2).
FROM THE EVIDENCE 10-6
Practice Guidelines Recommendations From the AOTA
Regarding Occupational Therapy for Low Vision
Kaldenberg, J., & Smallfield, S. (2013). Occupational therapy practice guidelines for older adults with low vision. Bethesda, MD:
AOTA Press.
Recommended
• Use of problem-solving strategies to increase participation in activities of daily living
(ADLs) and instrumental activities of daily living (IADLs) tasks and leisure and social
participation (A)
• Multicomponent patient education and training to improve occupational performance
(A)
• Increased illumination to improve reading performance (B)
• Increased illumination to improve social participation (B)
• Stand-based magnification systems to increase reading speed and
duration (B)
• Patient education programs to improve self-regulation in driving and
community mobility (B)
• Use of bioptics to improve simulated and on-road driving skills as well as outdoor
mobility skill (B)
• Contrast and typeface (sans serif), type size (14–16 points), and even spacing to
improve legibility and readability of print (B)
• Use of contrast, for example, yellow marking tape, colored filters, using a
white plate on a black placemat, to improve participation in occupations (C)
• Use of low-vision devices (e.g., high-add spectacles, nonilluminated and illuminated
handheld magnifiers, nonilluminated and illuminated stand magnifiers, high plus
lenses, telescopes, electronic magnifiers [such as closed-circuit TVs]) to improve
reading speed and reduce level of disability when performing ADL tasks (C)
• Eccentric viewing training to improve reading performance (C)
• Eccentric viewing in combination with instruction in magnification to improve reading
(C)
• Eccentric viewing completed with specific software programs for near
vision and ADLs (C)
• Use of optical magnifiers versus head-mounted magnification systems to improve
reading speed (C)
• Use of sensory substitution strategies (e.g., talking books) to maintain
engagement in desired occupations (C)
• Use of contrast to improve reading performance: colored overlays (I)
• Use of spectacle reading glasses to improve reading performance (I)
• Use of organizational strategies to compensate for vision loss (I)
No Recommendation / Not Recommended
• Colored overlays do not improve reading performance (B)
• Preferential use of either binocular or monocular viewing for reading performance (I)
• Use of a specific light source (I)
Note A: This practice
guideline is based on a
systematic review. Each
recommendation is followed
by the level of evidence
supporting it.
*Note: Criteria for levels of evidence are based on the standard language from the Agency for Healthcare Research and Quality
(2009). Suggested recommendations are based on the available evidence and content experts' clinical expertise regarding the
value of using the intervention in practice.
Definitions:
Strength of Recommendations
A–There is strong evidence that occupational therapy practitioners should routinely provide the intervention to eligible clients. Good
evidence was found that the intervention improves important outcomes and concludes that benefits substantially outweigh harm.
B–There is moderate evidence that occupational therapy practitioners should routinely provide the intervention to eligible clients.
At least fair evidence was found that the intervention improves important outcomes and concludes that benefits outweigh harm.
C–There is weak evidence that the intervention can improve outcomes, and the balance of the benefits and harms may result
either in a recommendation that occupational therapy practitioners routinely provide the intervention to eligible clients or in no
recommendation as the balance of the benefits and harm is too close to justify a general recommendation.
D–Recommend that occupational therapy practitioners do not provide the intervention to eligible clients. At least fair evidence
was found that the intervention is ineffective or that harm outweighs benefits.
I–Insufficient evidence to determine whether or not occupational therapy practitioners should be routinely providing the
intervention. Evidence that the intervention is effective is lacking, of poor quality, or conflicting and the balance of benefits and
harm cannot be determined.
FTE 10-6 Question
Which guidelines are based on the strongest research evidence?
EVIDENCE IN THE REAL WORLD
Clinical Uses for EBP Guidelines
Evidence-based practice (EBP) guidelines can serve many clinical purposes. They can be used in quality improvement
activities to ensure that best practices are implemented. Insurance companies often use practice guidelines to make
decisions about reimbursement for therapy services. In some cases, denials from insurance companies can be addressed
and overturned by use of an evidence-based practice guideline. In addition, practice guidelines can be used to educate
clients and families about best practice. Familiarity with available practice guidelines allows the evidence-based practitioner to make better decisions about intervention options that are both effective and reimbursable.
Finding Practice Guidelines
Like systematic reviews, databases such as PubMed and
CINAHL can be searched with limitations to find practice guideline documents. In PubMed, you can limit
your search to “practice guidelines” under article type; in
CINAHL the limit of “practice guidelines” can be found
under publication type.
A major source of practice guidelines is the National
Guideline Clearinghouse (NGC) at www.guidelines.gov.
This website, developed by the Agency for Health Care
Research and Quality of the U.S. Department of Health
and Human Services, is an easy-to-search database of
practice guidelines for all areas of health care. The guidelines included on the NGC site must meet the following
criteria:
• Includes recommendations to optimize patient care
• Developed by a relevant health-care organization and
not an individual
• Based on a systematic review of the evidence
• Includes an assessment of benefits and harms of recommended care and alternative approaches
• Developed in the previous 5 years (U.S. Department of Health and Human Services, 2014)

FIGURE 10-2 Practice guidelines use research evidence, opinions from experts in the field, and the perspectives of clients to make recommendations for clinical practice.
Evaluating the Strength of Practice
Guidelines
The development of practice guidelines is a complex and
lengthy process that requires extensive resources. In part
for this reason, practice guidelines are characteristically
developed by organizations instead of individuals. In addition, when developed by an organization, the practice
guidelines then carry the endorsement of that organization. One criterion to use when evaluating the quality of
a practice guideline is to consider the reputation and resources of the organization that created it. In addition, the
author should be free of possible conflicts of interest. For
example, a practice guideline is suspect if the organization
that created it can benefit financially from adoption of an
assessment or intervention recommended therein.
The time frame within which the practice guideline
was developed and published is another important consideration. Practice changes quickly, so practice guidelines can quickly become outdated. In addition, because
it takes a long time to develop a good practice guideline,
the evidence used to develop the guidelines may become
outdated. Review the reference list or table of studies in
the practice guideline to evaluate the timeliness of the research. You might also want to conduct your own search
on the topic to determine whether any important studies
have been completed since the most recent research cited
in the guidelines.
The specific population and setting for which a practice guideline is developed are also important. For example, practice guidelines for community-based practice
would not be relevant to practice in hospital settings. In
addition, practice guidelines for shoulder rehabilitation
will differ for athletic injuries and stroke.
Practice guidelines should be transparent about the
process used to review and evaluate the evidence, and the
best practice guidelines follow the process of a systematic review. The recommendations provided in a practice guideline will carry more weight than the report of
a single RCT because the recommendations in a practice guideline are based on multiple high-level studies
with similar results. With replication, the strength of
the evidence is greatly increased. It is easier to arrive at a
consensus for a practice recommendation when there is
consistency across numerous, well-designed studies.
Practice guidelines should undergo a rigorous review
process that involves multiple components and many
individuals. Experts in the field, often individuals who
have published on the topic, typically constitute the initial
review board that develops the methodology, gathers and
evaluates the evidence, and drafts the initial recommendations. From there, a group of external reviewers who are
independent of the initial process should review the draft
results and provide critical feedback. The external reviewers should represent a diverse constituency of researchers,
practitioners, and clients.
The recommendations provided within practice guidelines should be stated in such a way that practitioners are
given information that is helpful in determining whether
a recommendation should be adopted in their practices.
This includes not only the level of evidence associated
with each recommendation, but also the potential impact
of the recommendation (e.g., effect size), and the applicability and generalizability of the recommendation.
The Appraisal of Guidelines for Research and Evaluation (AGREE) Instrument II was developed by an
international team as a standardized tool for evaluating
practice guidelines (Brouwers et al, 2010). This tool examines the scientific rigor of the guidelines, involvement
of relevant stakeholders, and transparency and clarity in
the report. Scores can range from 12 to 84. Guidelines are
not expected to earn perfect scores, but practice guidelines
that have been reviewed with the AGREE II instrument
and receive a high score can be considered to be strong
guidelines. Box 10-2 reviews the basic considerations in
evaluating the strength of practice guidelines.
EXERCISE 10-3
Evaluating a Practice Guideline (LO3)
Access the following practice guideline on alternative approaches to lowering blood pressure. It is
available as a free full-text document.
Brook et al and the American Heart Association Professional Education
Committee of the Council for High Blood Pressure Research, Council
on Cardiovascular and Stroke Nursing, Council on Epidemiology and
Prevention, and Council on Nutrition, Physical Activity. (2013). Beyond
medications and diet: Alternative approaches to lowering blood pressure: A
scientific statement from the American Heart Association. Hypertension, 61,
1360-1383.
QUESTIONS
Examining each of the considerations, how would you evaluate the practice guidelines on the following points?
1. Reputation and resources of the organization
2. Timeliness
3. Strength of the research evidence
4. Review process
5. Recommendations include supporting evidence
BOX 102 Considerations in Evaluating
Practice Guidelines
• The reputation and resources of the developing
organization
• Timeliness in terms of publication of the practice
guidelines and the evidence included within the
guidelines
• Strength of the research evidence
• Rigorous review process that includes experts
and clients, when appropriate
• Each recommendation includes information on
level of supporting evidence, potential impact,
and generalizability
THE COMPLEXITIES OF APPLYING
AND USING SYSTEMATIC REVIEWS
AND PRACTICE GUIDELINES
Systematic reviews and practice guidelines are useful
tools for health-care practitioners, because they condense
large amounts of research evidence and provide direction for practitioners. However, the other components
of evidence-based practice—practitioner experience
and client preference—are still essential to good clinical
decision-making. Even with strong research evidence,
practitioners need to determine whether the recommendations of systematic reviews and practice guidelines
are relevant and applicable to their particular situation.
For example, a systematic review of health promotion
programs for people with serious mental illness recommends that weight-loss interventions be provided for at
least 3 months (Bartels & Desilets, 2012). If the practitioner works in a setting with an average length of stay of
4 weeks, the evidence-based interventions described in
the review will need to be adapted to the situation.
Although clients should be involved in the decision-making process and informed of systematic reviews and
practice guidelines, they should also be offered alternative
approaches, when available. Guidelines and reviews may
simplify the decision-making process for the practitioner,
but it is important to remember that providing the best
care is both an art and a difficult process. The best practices
involve decisions that are made with the practitioner and
client working together (Montori, Brito, & Murad, 2013).
For example, within the deaf community there is controversy over the use of cochlear implants. Although cochlear
implants may provide profound improvements in hearing, some individuals elect not to receive the implants because they find sign language to be a more effective means
of communication that is respectful of their deaf culture.
When cochlear implants are used in children, decisions
still need to be made regarding the use of sign language
versus oral language in parent-child interactions (Bruin &
Nevoy, 2014).
The integration of research evidence, practitioner experience, and client preferences makes for the best clinical
decisions. This process is explained in Chapter 11.
CRITICAL THINKING QUESTIONS
1. When is a systematic review not Level I evidence?
2. Why might different systematic reviews come to different conclusions?
3. Why does publication bias exist?
4. Why is heterogeneity a potential problem in systematic reviews?
5. What effect-size statistic would you expect to find in a meta-analysis of an efficacy study? A predictive study?
6. How are data synthesized from qualitative studies?
7. How do practice guidelines differ from systematic reviews?
8. What are the characteristics of a strong practice guideline?
9. Why should practitioners augment information from practice guidelines with practitioner experience and client preferences?

ANSWERS

EXERCISE 10-1
These answers are based on a search conducted in September 2015. You will likely find more reviews.
1. Cochrane—Two reviews were found, but only one focused on vestibular rehabilitation for vestibular dysfunction. The review indicated growing evidence to support its use.
2. CINAHL—Two reviews were found, including the Cochrane review and one other review, which also found positive outcomes for vestibular rehabilitation.
3. OTseeker—There were no reviews specific to vestibular rehabilitation for vestibular dysfunction.
4. PEDro—There were seven reviews in English, one in German, and one in Portuguese, and the reviews included the Cochrane review. The German and Portuguese reviews included English abstracts. Overall, the results indicated positive outcomes. One review compared the Epley maneuver and vestibular rehabilitation and found the Epley maneuver more effective at 1 week but not at a 1-month follow-up.

EXERCISE 10-2
Your language may differ, but the following examples provide an accurate interpretation of the corresponding results, stated in plain language for clients.
1. Five studies of eccentric viewing training all found that
it was effective. When the results of the five studies
were combined, the intervention was found to have
made a moderate difference.
2. People who had very early interventions that involved moving within 36 hours of their stroke were three times more likely to be independent at the end of their treatment than individuals who received standard care.
3. Healthy adults are likely to have normal balance until
the age of 70, but after that you are more likely to have
balance problems. The link between aging and balance
problems is very strong.
EXERCISE 10-3
1. Reputation and resources of the organization—The
American Heart Association is a well-established
and respected organization that is certified by the
National Health Council for excellence as an advocacy
organization.
2. Timeliness—The practice guidelines were published
in 2013, and the primary research included in the review is through 2011. This speaks to one of the issues
with practice guidelines: By the time practice guidelines are published, more recent evidence will probably have been published as well.
3. Strength of the research evidence—Only one intervention provided Level A evidence. As defined in
Table 1, this means multiple RCTs with different
populations. The intervention with Level A evidence
was dynamic aerobic exercise.
4. Review process—Reviewers included clinicians and
researchers. A systematic review of the evidence
was conducted. There was transparency in terms of potential conflicts of interest reported, with no major
concerns identified. No clients were included in the
review process.
5. Recommendations include supporting evidence—
Extensive supporting evidence was included with each
recommendation. Effect sizes were included when
available.
Overall, this practice guideline meets most of the criteria for a strong practice guideline.
FROM THE EVIDENCE 10-1
The authors and date of publication; for most studies, the
number of participants and in some cases gender and age;
the measures/assessments used in the study; and a very
brief summary of the findings.
FROM THE EVIDENCE 10-2
1. The objective would be found in the introduction of
the full article, whereas data sources, selection, and
extraction would fit into the category of methods.
The conclusion of the abstract includes information
that would be found in both the results and discussion
sections of a full article.
2. There are many limitations to this review. Many studies were included in the review (14); however, most
were at a low level of evidence, with the exception of
one weak randomized controlled trial. Therefore, the
review could not be considered Level 1 evidence. The
review is also limited in terms of study heterogeneity
and the inclusion of only published literature.
FROM THE EVIDENCE 10-3
Eight of the studies in the review assessed functional disability, but only five examined pain. Given the importance
of replication in systematic reviews/meta-analyses, this is
an important consideration bearing on the strength of the
evidence for a particular outcome.
FROM THE EVIDENCE 10-4
1. Seven out of nine studies had a positive effect. Two of
the nine found little to no difference between groups.
2. Campbell (1997) was weighted the strongest in the
meta-analysis and assigned a weight of 31.3%.
FROM THE EVIDENCE 10-5
The existence of two independent reviewers helps to
protect the review from bias. For example, each reviewer
would rate the quality of each individual study using a
rating scale. Then the two reviewers could compare
their results; if there are any discrepancies, they would
discuss those differences and come to a more informed
conclusion.
FROM THE EVIDENCE 10-6
The recommendations with Level A evidence include the
use of problem-solving strategies for activities of daily
living (ADLs) and instrumental activities of daily living
(IADLs), and multicomponent patient education and
training to improve occupational performance.
REFERENCES
American Occupational Therapy Association (AOTA). (n.d.). AOTA Press
catalogue. Retrieved from http://www.aota.org/en/Publications-News/
AOTAPress.aspx
American Physical Therapy Association. (APTA). (n.d.). Clinical practice guidelines. Retrieved from http://www.apta.org/InfoforPayers/
Resources/OrthopaedicClinicalGuidelines/
Bartels, S., & Desilets, R. (2012). Health promotion programs for people
with serious mental illness (Prepared by the Dartmouth Health Promotion Research Team). Washington, DC: SAMHSA-HRSA Center
for Integrated Health Solutions.
Brook et al; American Heart Association Professional Education Committee of the Council for High Blood Pressure Research, Council
on Cardiovascular and Stroke Nursing, Council on Epidemiology
and Prevention, and Council on Nutrition, Physical Activity. (2013).
Beyond medications and diet: Alternative approaches to lowering
blood pressure. A scientific statement from the American Heart
Association. Hypertension, 61, 1360–1383.
Brouwers, M., Kho, M. E., Browman, G. P., Burgers, J. S., Cluzeau,
F., Feder, G., . . . Zitzelsberger, L. for the AGREE Next Steps
Consortium. (2010). AGREE II: Advancing guideline development,
reporting and evaluation in healthcare. Canadian Medical Association
Journal, 182, E839–E842.
Bruin, M., & Nevoy, A. (2014). Exploring the discourse on communication modality after cochlear implantation: A Foucauldian analysis
of parents’ narratives. Journal of Deaf Studies and Deaf Education, 19,
385–399.
ClinicalTrials.gov. (2014). History, policy and laws. Retrieved from
http://clinicaltrials.gov/ct2/about-site/history
Craig, L. E., Bernhardt, J., Langhorne, P., & Wu, O. (2010). Early
mobilization after stroke: An example of an individual patient data
meta-analysis of a complex intervention. Stroke, 41, 2632–2636.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Dal Bello-Haas, V., & Florence, J. M. (2013). Therapeutic exercise for
people with amyotrophic lateral sclerosis or motor neuron disease.
Cochrane Database of Systematic Reviews, 5, Art. No. CD005229.
doi:10.1002/14651858.CD005229.pub3
Downs, S. (2014). Normative scores on the Berg Balance Scale decline
after age 70 years in healthy community-dwelling people: A systematic review. Journal of Physiotherapy, 60(2), 85–89.
Grey Literature Network Service. (1999). 4th International conference on
grey literature: New frontiers in grey literature. Washington, DC: Author.
Holtzman, S., & Beggs, R. T. (2013). Yoga for chronic low back pain:
A meta-analysis of randomized controlled trials. Pain Research and
Management, 18, 267–272.
Hong, S. P., Park, H., Kwon, J. S., & Yoo, E. (2014). Effectiveness of
eccentric viewing training for daily visual activities for individuals
with age-related macular degeneration: A systematic review and
meta-analysis. NeuroRehabilitation, 34, 587–595.
Hopewell, S., Loudon, K., Clarke, M. J., Oxman, A. D., & Dickersin, K.
(2009). Publication bias in clinical trials due to statistical significance
or direction of trial results. Cochrane Database of Systematic Reviews,
21(1), MR000006. doi:10.1002/14651858.MR000006.pub3
Kaldenberg, J., & Smallfield, S. (2013). Occupational therapy practice
guidelines for older adults with low vision. Bethesda, MD: AOTA Press.
Kent, R. D., & Vorperian, H. K. (2013). Speech impairment in Down
syndrome: A review. Journal of Speech, Language and Hearing
Research, 56, 127–210.
Kramer, J. M., Olsen, S., Mermelstein, M., Balcells, A., & Lijenquist, K.
(2012). Youth with disabilities perspectives of the environment and
participation: A qualitative meta-synthesis. Child Care and Health
Development, 38, 763–777.
Lee, H., Tamminen, K. A., Clark, A. M., Slater, L., Spence, J. C., &
Holt, N. L. (2015). A meta-study of qualitative research examining
determinants of children's independent active free play. International Journal of Behavioral Nutrition and Physical Activity, 12, 5.
Linehan, M. M. (1993). Cognitive behavioral treatment of borderline
personality disorder. New York, NY: Guilford Press.
Linehan, M. M., Comtois, K. A., Murray, A. M., Brown, M. Z.,
Gallop, R. J., Heard, H. L., . . . Lindenboim, N. (2006). Two-year randomized controlled trial and follow-up of dialectical
behavior therapy vs therapy by experts for suicidal behaviors and
borderline personality disorder. Archives of General Psychiatry, 63,
757–766.
Montori, V. M., Brito, J. P., & Murad, M. H. (2013). The optimal practice of evidence-based medicine: Incorporating patient preferences
in practice guidelines. JAMA, 310, 2503–2504.
Pasieczny, N., & Connor, J. (2011). The effectiveness of dialectical
behaviour therapy in routine public mental health settings: An
Australian controlled trial. Behaviour Research and Therapy, 49,
4–10.
PRISMA. (2009). PRISMA statement. Retrieved from http://prismastatement.org/PRISMAStatement/Checklist.aspx
Rand, D., Miller, W. C., Yiu, J., & Eng, J. J. (2011). Interventions for
addressing low balance confidence in older adults: A systematic
review and meta-analysis. Age and Ageing, 40, 297–306.
Stoffers, J. M., Vollm, B. A., Rucker, G., Timmer, A., Huband, N., &
Lieb, K. (2012). Psychological therapies for borderline personality
disorder. Cochrane Database of Systematic Reviews, 8:CD005652.
doi:10.1002/14651858.CD005652.pub2
Thomas, J., & Harden, A. (2008). Method for the thematic synthesis
of qualitative research in systematic reviews. BMC Medical Research
Methodology, 8, 45.
Tong, A., Flemming, K., McInnes, E., Oliver, S., & Craig, J. (2012).
Enhancing transparency in reporting the synthesis of qualitative research: ENTREQ. BMC Medical Research Methodology,12, 181.
U.S. Department of Health and Human Services. (2014). Inclusion criteria for National Guideline Clearinghouse. Retrieved from http://
www.guideline.gov/about/inclusion-criteria.aspx
Van Schoonhoven, J., Sparreboom, M., van Zanten, B. G.,
Scholten, R. J., Mylanus, E. A., Dreschler, W. A., Grolman, W., &
Matt, B. (2013). The effectiveness of bilateral cochlear implants for
severe-to-profound deafness in adults: A systematic review. Otology
and Neurotology, 32, 190–198.
“Social innovation thrives on collaboration; on doing things with others rather
than just to them or for them.”
—Geoff Mulgan, Chief Executive of the National Endowment for Science, Technology and the Arts
11
Integrating Evidence
From Multiple Sources
Involving Clients and Families
in Decision-Making
CHAPTER OUTLINE
INTRODUCTION
CLIENT-CENTERED PRACTICE
SHARED DECISION-MAKING
EDUCATION AND COMMUNICATION
Components of the Process
People Involved
Engaging the Client in the Process
Consensus Building
Agreement
Decision Aids
Content
Resources for Shared Decision-Making
CRITICAL THINKING QUESTIONS
ANSWERS
REFERENCES
LEARNING OUTCOMES
1. Describe how therapists can engage clients in the process of making treatment decisions, including sharing
intervention evidence.
2. Locate or create client-friendly resources to facilitate shared decision-making regarding a particular condition.
KEY TERMS
client-centered practice
decision aids
shared decision-making
INTRODUCTION
In the first chapter of this text, evidence-based practice
was described as the integration of information from the
research evidence, practitioner experience, and client values and preferences. However, the bulk of the remaining
chapters focused on how to consume research effectively.
Much of your current educational experience is informed
by practitioner experience, and you will cultivate your own
experiences in clinical education and eventually your own
practice.
However, the inclusion of client values and preferences is
perhaps the least understood component of evidence-based
practice and potentially the most difficult to implement
(Hoffmann, Montori, & Del Mar, 2014). Including the client’s
values and preferences is consistent with a broader approach
known as client-centered practice. This chapter looks closely at
the components of client-centered practice that are specific
to evidence-based practice, and does so by emphasizing a
process known as shared decision-making.
CLIENT-CENTERED PRACTICE
Client-centered practice is an approach to practice that
attaches importance to the individual’s values and preferences, and respects the expertise that the person brings
to the situation in the form of lived experience. Hammell
(2013) claims that the central construct of client-centered
practice is respect, which includes respect for clients and
their experiences, knowledge, and right to make choices
about their own lives. Clients want to be listened to,
cared for, and valued. Client-centered practice can enhance therapy outcomes because clients must engage in
the process in order to benefit from therapy. Engagement
includes attending appointments, participating in therapy, and following up with recommendations. When the
practitioner engages clients at the outset of the decision-making process, they may be more likely to fully participate in assessment, intervention, and follow-up.
Partly because it is a relatively new concept, and partly
because it can be difficult to measure, only a limited number of studies are available that examine client-centered
practice in therapy. Furthermore, the exact meaning of
client-centered therapy can differ greatly in application.
However, the two studies described here illustrate the
importance of incorporating client preferences into one’s
practice: In both of the studies, the client preference was
inconsistent with the research evidence.
Cox, Schwartz, Noe, and Alexander (2011) conducted
a study to determine the best candidates for bilateral hearing aids. They found that audiometric hearing loss and
auditory lifestyle were poor predictors, as many individuals who were good candidates (based on the research evidence) for bilateral hearing aids were found at follow-up
to have chosen to wear only one. The study conclusions
emphasized the importance of collaboration with the client for determining preferences and making decisions on
unilateral versus bilateral hearing aids.
Sillem, Backman, Miller, and Li (2011) compared two
different splints (custom made and prefabricated) for
thumb osteoarthritis, using a crossover study design in
which all participants had the opportunity to wear both
splints. There was no difference in hand-function outcomes between the two splints, but the custom-made
splint had better outcomes for pain reduction. However,
a larger number of individuals expressed a preference
for the prefabricated splint. The researchers concluded
that ultimately the client should make the decision about
which splint to use.
The therapy process involves many decisions: which
intervention approach to pursue, which assistive device
to select, what discharge plan to follow, and so on. When
therapists and clients approach clinical decisions jointly,
better outcomes are possible, because both parties have
the opportunity to bring their respective expertise to
the situation to make an informed choice. This process,
known as shared decision-making, has been explicated
such that there are models and methods for sharing evidence with clients.
SHARED DECISION-MAKING
The therapist-client relationship is one of unequal positions, because the therapist is potentially in a position to
assume a paternalistic or authoritarian role and determine
the course of the client’s therapy. Therapists can be unaware of their position of power and the behaviors they
engage in that may exploit that power (Sumsion & Law,
2006). For example, developing an intervention plan without input from the client, or administering an intervention without explaining it to the client, are behaviors that
ignore the client’s role in the therapy process. In contrast,
the practice of shared decision-making promotes equality and mutuality between therapist and client. Because
therapists are in a position of power, it is incumbent upon
them to intentionally create an environment in which collaboration can occur.
Shared decision-making creates a common ground
for the confluence of evidence-based practice and
client-centered practice (Fig. 11-1). At the core of shared
decision-making is a process of information exchange.
FIGURE 11-1 Shared decision-making occurs when evidence-based practice and client-centered practice come together: clinical expertise and research evidence, plus the client's values, preferences, lifestyle, and knowledge of the unique situation, equal shared decision-making. (ThinkStock/iStock/Katarzyna Bialasiewicz.)
The professional brings information about the clinical
condition and options for intervention, including the
risks and benefits of each option. Clients bring their
own information, which includes values, preferences,
lifestyle, and knowledge of their situation. The therapist
and client then work collaboratively to arrive at clinical
decisions.
Some clinical situations are more amenable to shared
decision-making than others. For example, in some cases
only one treatment is available or the research evidence
strongly supports only one approach. However, even if only
one choice exists, choosing no treatment is still an alternative. Many conditions may be addressed with more than
one approach and, when options exist, it is important to
share those choices with the client. In other situations, the
therapist may believe it is unethical to provide a particular
intervention; perhaps there is no clear benefit, and accepting payment would be unjustified, or there may be potential
adverse consequences. In this case, a principled approach
involves discussing concerns with the client. If the issues
cannot be resolved, the therapist may opt out of providing
therapy for that particular client.
Some clients do not wish to participate in the decision-making process, whereas others highly value the opportunity. Edwards and Elwyn (2006) described shared decision-making as involving the individual in decision-making to the extent that he or she so desires. In fact,
though, it can be challenging for the therapist to determine the extent to which a client wants to be involved in
the process.
Figure 11-2 illustrates the continuum of the client’s
participation in the decision-making process. At one end
of the continuum is the client who cannot or does not
want to be involved in the decision-making process. Infants, individuals in a coma, and individuals in the later
stages of Alzheimer’s disease are examples of clients who
are unable to make their own decisions; in these instances,
caregivers may assume that position. Still, there will likely
be situations in which a caregiver is not involved or available to participate in the decision-making process.
Other clients may be capable of making decisions, but
choose for the therapist to be in control of the process.
In that case, the preference to not be involved is a choice
that should be respected (Gafni & Charles, 2009). At the
other end of the continuum is the client who is well informed and empowered to make decisions about his or
her own health care. Most clients will fall somewhere in
between these two extremes, with some level of interest
in obtaining information about options and a desire to
participate in the discussion. There is a place for shared
decision-making at all points in the continuum. Even if
a client does not want to be involved in the process, the
therapist should consider that client’s preferences and
unique life experiences and situation.
FIGURE 11-2 Continuum of client involvement in the decision-making process and role of the therapist. At one end, the client/family has no involvement in decision-making, and the therapist gathers information about the client and family and incorporates this into decision-making. Next, the client wants information but does not want to make decisions; the therapist gathers and shares information and utilizes knowledge about the client and family when making decisions. Next, the client wants information and wants to make some decisions, relying on the therapist for other decisions; the therapist gathers and shares information and collaborates with the client in the decision-making process. At the other end, the client is fully involved and makes all decisions; the therapist gathers and shares information and respects the client's decisions.
The collaboration between client and therapist differs
depending on the client’s desired level of participation.
One study suggested that what the client valued most in
the shared decision-making process was not the opportunity to make a collaborative decision, but the fact that
he or she was listened to and that the health-care professional provided information that was easy to understand
(Longo et al, 2006).
Consider the example of supervised physical therapy
versus a standardized home exercise program for individuals with total knee arthroplasty. There is evidence
suggesting that comparable outcomes for range of
motion and functional performance are achieved with
both interventions (Büker et al, 2014). For clients who
would prefer for the therapist to make the decision, the
practitioner might make different recommendations
for an individual with a history of regular involvement
in physical activity than for one who finds exercise boring and unpleasant. The therapist would also take into
consideration other unique considerations applicable to
each particular client, such as availability of transportation to therapy, insurance reimbursement, and family
support.
For clients who want to make the decision independently, the therapist could present the options of outpatient therapy versus a home program, explaining the
research evidence and the pragmatic issues of cost and
inconvenience. The client in the middle would likely desire all of this information in addition to input from the
therapist as to how best to proceed.
EDUCATION AND COMMUNICATION
Most clients are unfamiliar with the concepts surrounding shared decision-making and will simply expect the
therapist to prescribe a course of treatment. For this reason, the process of shared decision-making should begin
with education. The therapist may begin by describing
some of the decisions that will need to be made during
the course of therapy, and emphasize that the decision-making process involves information exchange and discussion concerning these options. The first decision
might focus on how involved the client would like to be
in the process.
Shared decision-making necessitates a progression
of communication that follows a specific course: listen, speak, and then listen again. The therapist listens
to the client regarding the client’s values, preferences,
and particular life circumstances. In the context of the
information provided by the client, the therapist presents options and explains the pros and cons of each. The
therapist then listens to the client as he or she weighs the
options. The therapist and client continue the discussion, which is focused on ultimately making a treatment
decision.
In the Evidence in the Real World box below, Judy is
managing at work, although it is difficult. She just works
through the pain, but finds that after work she has few
resources left to take care of her home, enjoy her grandchildren, or engage in previously valued leisure activities
such as swimming and gardening. She spends more time
than she would like to in bed or watching television.
During the speaking phase, you provide Judy with
some options for intervention, along with the evidence.
You discuss mind/body interventions, such as relaxation
and mindfulness, resistance training, aerobic training,
and aquatic exercise, and then you provide Judy with the
handout shown in Figure 11-3, which outlines the evidence in a simplified form.
Overall, the results indicate that the exercise interventions have better outcomes than the mind/body interventions, but in all cases the improvements are relatively
modest.
You then ask Judy if she has any additional questions
about these options and to express what she is thinking.
After some discussion, the two of you decide to start with
an aquatic exercise program, based on Judy’s interest in
swimming and the evidence suggesting that she may experience some pain relief from this activity.
EVIDENCE IN THE REAL WORLD
Shared Decision-Making
The following scenario illustrates the process of shared decision-making, including the types of questions one
might ask to facilitate communication. Judy (59 years old) has fibromyalgia and experiences significant pain. You
know that she is married and currently working. To gather some information on Judy’s personal preferences and
values, you prepare some questions; this is the first listen phase. Some questions you ask include:
1. How does fibromyalgia affect your work performance?
2. What would you like to do that you are currently not doing?
3. How are you currently managing the pain?
4. When you are feeling well, what sort of activities do you like to engage in?
Interventions for Fibromyalgia
The following table summarizes the effectiveness of four different types of
interventions for fibromyalgia based on the results of four Cochrane summaries.
These include: mind/body interventions such as relaxation, mindfulness, and
biofeedback (Theadom et al., 2015); resistance training, such as weight lifting
(Busch et al., 2013); aerobic exercise (most of the studies used walking) (Busch
et al., 2007); and aquatic exercise (Bidonde et al., 2014).
[Handout table: Improvements on a 100-point scale (interventions were compared with usual care) for the outcomes of physical function, well-being, and pain, plus the quality of the research and the frequency of each intervention, across mind/body interventions, resistance training, aerobic exercise, and aquatic exercise.]
FIGURE 11-3 Sample handout outlining potential interventions for fibromyalgia.
EXERCISE 11-1
Crafting Questions to Elicit Information (LO1)
QUESTIONS
Select one of the three following scenarios and write four questions that you could ask to engage clients in the decision-making process and elicit information from them and/or their families regarding values and preferences. Remember that it is useful to ask open-ended questions that are respectful of the clients' knowledge and expertise about their own personal experience.
a. A grade-school child with autism who has difficulty making friends at school
b. A young adult with traumatic brain injury who is returning home and needs help with maintaining a schedule
c. An athlete with chronic joint pain
1.
2.
3.
4.
Components of the Process
In their seminal article on shared decision-making, Charles,
Gafni, and Whelan (1997) identified four components:
1. The interaction involves at least two people, including
the therapist and client.
2. Both individuals take steps to participate in decision-making.
3. Both individuals participate in consensus building.
4. An agreement is reached on the treatment to implement.
Each of these components is described in the following
subsections and applied to an example of an individual
with multiple sclerosis who is receiving physical therapy.
People Involved
The inclusion of two people, including the therapist and
client, is an obvious component, but it is important to
recognize that often more than two people will be involved. From the clinical side, therapists typically work
on a team, and other team members may be invested in
the treatment decision. On the client’s side, there may be
family members or other individuals who are concerned
about the treatment decision. It is especially important
to involve other individuals who are close to the client
when they assume caregiving responsibilities and will be
directly involved in carrying out the intervention, and
when they are impacted by the intervention choice.
Jim, who has multiple sclerosis, recently experienced a flare-up
and was referred to physical therapy to address issues related to
mobility and balance. He was also referred to occupational therapy
for energy conservation and speech-language therapy for problems with swallowing. Jim lives at home with his wife, who is
very involved and supportive in his treatment. Optimally, shared
decision-making would include the physical therapist, occupational
therapist, speech-language therapist, Jim, and his wife.
Engaging the Client in the Process
The second component involves both individuals taking
steps to participate in the decision-making process. However, some clients are not prepared to make decisions about
their therapy options. Many will come to the situation expecting or expressly wanting the health-care professional
to make all the decisions. In actuality, the health-care professional will likely be the individual to establish the norms
of the interaction. Because most clients do not expect to be
involved in decision-making, the therapist will need to set
the stage. Therefore, it becomes even more important for
the therapist to create an environment in which the client
is both comfortable and empowered to take an active role.
The therapist can use many different strategies to promote collaboration. The physical environment in which
discussions take place is important. A space that is free
from distraction and allows for private discussion will
facilitate greater connection between therapist and client.
The manner in which the therapist communicates can go a
long way toward promoting inclusion of the client in shared
decision-making. The therapist's nonverbal cues, such as an
open and forward-leaning posture, eye contact, and head
nods, can indicate warmth and attentiveness. Open-ended
questions that promote discussion are more useful than
questions that can be answered with single-word responses,
and active listening with paraphrasing will likely encourage
more openness on the part of the client. Once the therapist
educates the client about shared decision-making, he or she
can determine the client’s desired level of involvement.
The physical therapist schedules a meeting with the other
therapists, Jim, and his wife. They meet in a quiet conference
room and spend some time getting to know each other. Shared
decision-making is explained, and Jim indicates that he is on
board with the idea of participating in the process. After the
recent flare-up of his disease, he expresses a desire to take a more
active role in his rehabilitation and overall wellness.
Consensus Building
During consensus building, therapist and client share relevant information. The therapist can begin the process
by asking the client to talk about his or her values and
preferences, including identifying desired outcomes. Outcomes should be discussed broadly and include impairment issues, such as strength, fatigue, pain, and cognition,
as well as activity and participation limitations. Particularly important is identifying any activities that the client
wants or needs to participate in, but which are currently
hampered by the condition. In addition, it is important
for the therapist to learn about the client’s situation, including financial resources, social supports, and living arrangements. Once the client shares this information, the
therapist can provide a summary to ensure that the client
was understood correctly. Then the client and therapist
can work on establishing priorities for therapy.
The therapist shares information about intervention
options, pragmatic issues associated with these options
(e.g., costs and time commitment), and the evidence
to support the interventions. The information should
be presented in language that is accessible to the client
and in an unbiased manner. It is easy for the therapist
to just present his or her preference—and this should be
included if requested by the client—but for true shared
decision-making to occur, the client must have a real
choice in the matter. This requires therapists to avoid
presenting their preferences as the only real option.
Jim explains that he is concerned about falling, and that this
fear makes him less likely to leave his home. He is particularly
reluctant to be in places where the ground is unstable and where
crowds are present. This fear has had a very negative impact
on his quality of life, because Jim enjoys the outdoors and bird
watching—both activities he shares with his wife.
The physical therapist describes numerous approaches to address balance and mobility, including strengthening exercises,
aquatic therapy, Tai Chi, and vestibular rehabilitation. These approaches are discussed in the context of the other therapies that Jim
will be receiving and with the occupational and speech-language
therapists.
EXERCISE 11-2
Sharing Evidence With Clients
and Families (LO2)
You are working with a child who has ADHD and
motor delays, and the family asks you about Interactive Metronome therapy. You find three studies
(see the following abstracts).
STUDY #1
Shaffer, R. J., Jacokes, L. E., Cassily, J. F., Greenspan, S. I., Tuchman, R. F.,
& Stemmer, P. J., Jr. (2001, March-April). Effect of Interactive Metronome
Training on children with ADHD. American Journal of Occupational Therapy, 55(2),155-162.
Objective
The purpose of this study was to determine the
effects of a specific intervention, the Interactive
Metronome, on selected aspects of motor and cognitive skills in a group of children with attention
deficit hyperactivity disorder (ADHD).
Method
The study included 56 boys who were 6 years to
12 years of age and diagnosed before they entered
the study as having ADHD. The participants were
pretested and randomly assigned to one of three
matched groups. A group of 19 participants receiving
15 hr of Interactive Metronome training exercises
were compared with a group receiving no intervention and a group receiving training on selected
computer video games.
Results
A significant pattern of improvement across 53 of
58 variables favoring the Interactive Metronome
treatment was found. Additionally, several significant
differences were found among the treatment groups
and between pretreatment and posttreatment factors
on performance in areas of attention, motor control,
language processing, reading, and parental reports of
improvements in regulation of aggressive behavior.
Conclusion
The Interactive Metronome training appears to
facilitate a number of capacities, including attention,
motor control, and selected academic skills, in boys
with ADHD.
STUDY #2
Bartscherer, M. L., & Dole, R. L. (2005). Interactive Metronome Training
for a 9-year-old boy with attention and motor coordination difficulties. Physiotherapy Theory & Practice, 21(4), 257-269.
The purpose of this case report is to describe a
new intervention, the Interactive Metronome, for
improving timing and coordination. A nine-year-old boy, with difficulties in attention and developmental delay of unspecified origin underwent a
seven-week training program with the Interactive
Metronome. Before, during, and after training timing accuracy was assessed with testing procedures
consistent with the Interactive Metronome training
protocol. Before and after training, his gross and
fine motor skills were examined with the Bruininks-Oseretsky Test of Motor Proficiency (BOTMP).
The child exhibited marked change in scores on
both timing accuracy and several BOTMP subtests.
Additionally his mother relayed anecdotal reports
of changes in behavior at home. This child’s participation in a new intervention for improving timing
and coordination was associated with changes in
timing accuracy, gross and fine motor abilities, and
parent reported behaviors. These findings warrant
further study.
STUDY #3
Cosper, S. M., Lee, G. P., Peters, S. B., & Bishop, E. (2009). Interactive Metronome Training in children with attention deficit and developmental coordination disorders. International Journal of Rehabilitation Research, 32(4),
331-336. doi:10.1097/MRR.0b013e328325a8cf
The objective of this study was to examine the
efficacy of Interactive Metronome (Interactive
Metronome, Sunrise, Florida, USA) training in a
group of children with mixed attentional and motor coordination disorders to further explore which
subcomponents of attentional control and motor
functioning the training influences. Twelve children who had been diagnosed with attention deficit
hyperactivity disorder, in conjunction with either
developmental coordination disorder (n=10) or
pervasive developmental disorder (n=2), underwent
15 1-h sessions of Interactive Metronome training
over a 15-week period. Each child was assessed
before and after the treatment using measures of
attention, coordination, and motor control to determine the efficacy of training on these cognitive
and behavioral realms. As a group, the children
made significant improvements in complex visual
choice reaction time and visuomotor control after
the training. There were, however, no significant
changes in sustained attention or inhibitory control over inappropriate motor responses after treatment. These results suggest Interactive Metronome
training may address deficits in visuomotor control
and speed, but appears to have little effect on sustained attention or motor inhibition.
QUESTIONS
How might you present these findings accurately and accessibly to the parents? Consider the levels of evidence when
assessing the studies.
Write a few statements that you could use when discussing
the evidence with the family and client.
Agreement
Shared decision-making results in an agreement about the
direction that intervention will take. This does not necessarily mean that both parties conclude that the intervention chosen is the best one or that both parties are equally
involved in the decision. For example, after the process of
sharing information, the client may still ask the therapist to
make the final decision. Or, the client may choose an approach that is not the therapist’s preferred choice, although
the intervention fits well with the client’s situation and is
one in which the client will more likely engage. In a less
desirable scenario, the client may choose an option that
the therapist is not equipped to provide or one that the
therapist feels will result in negative outcomes, in which
case the therapist may decline to provide the intervention.
Shared decision-making is a fluid process that differs with
each client encounter.
Jim was initially very interested in aquatic therapy, but
decided against this option when he learned that the evidence
was only fair for its efficacy in improving balance for individuals with multiple sclerosis (Marinho-Buzelli, Bonnyman, &
Verrier, 2014). In addition, the clinic that provided aquatic
therapy was far from his home. Using information from practice guidelines (Latimer-Cheung et al, 2013), Jim and his
physical therapist decide on an intervention that incorporates
aerobic and strength training two times a week that Jim can
eventually do on his own at home with the support of his wife.
He is also interested in using a cane for everyday mobility and
is considering a wheelchair for those situations in which he feels
the most insecure.
Decision Aids
Decision aids are materials that provide information to
the client to support the decision-making process. They
may take the form of written materials, videos, interactive Internet presentations, or other resources. A Cochrane review (Stacey et al, 2011) of the use of decision
aids found that they increased client involvement, improved knowledge, created a more realistic perception
of outcomes, and reduced discretionary surgery. In addition, the review found no adverse effects from use of
decision aids.
At the time of this writing, decision aids are just
beginning to be developed for health-care decisions,
and there are few decision aids available in the field of
rehabilitation; however, existing aids in other areas of
health care can serve as a model. Therapists can develop their own decision aids, in the form of written
materials to provide to clients when engaging in shared
decision-making.
The abstract in From the Evidence 11-1 provides an example of a research team developing and evaluating decision aids for individuals with multiple sclerosis. In addition to increasing clients' involvement in the decision-making process, the authors indicate that their ongoing research will examine both client involvement and the quality of the decisions made.
Content
Decision aids generally provide several types of information, including:
1. Explanation of the condition, to help the client understand his or her condition and how interventions can
work to target specific aspects of it.
2. Identification of the decision that needs to be made,
including the different interventions that are available
and the option of no intervention.
3. Options and potential outcomes based on scientific evidence. This will likely be the most lengthy section of
the decision aid. It includes a summary of the evidence
for each option and often identifies the number of individuals who are expected to improve given a particular
situation or the percent of improvement that can be
expected.
4. Questions to help clients clarify their values. This is
often written in a workbook format, with space for
clients to write their responses. In rehabilitation, the
questions might focus on activities the individual client wants to return to, desired outcomes, the most
troublesome aspects of the condition, living arrangements, social support, financial resources, insurance
coverage, and so on.
Resources for Shared Decision-Making
An excellent source of information about shared decision-making is the Ottawa Hospital Research Institute, which
provides an online inventory of decision aids that can
be searched by health topics (http://decisionaid.ohri.ca/
decaids.html). It also includes information on the importance of shared decision-making and offers online tutorials for developing a decision aid based on the Ottawa
development process.
At the time of this writing, the Mayo Clinic is in the process of developing decision aids. One example focuses on bone health: http://osteoporosisdecisionaid.mayoclinic.org/index.php/site/index. This aid helps the individual assess the risk of having a fracture and evaluate different treatment options for preventing fractures. As with many decision aids, this one presents the benefits and risks associated with deciding not to intervene.
Common Ground is a comprehensive web application program that prepares individuals with psychiatric conditions to meet with their psychiatrist and treatment team and arrive at decisions (https://www.patdeegan.com/commonground/about). It includes applications for shared decision-making and decision aids with a focus on decisions about psychiatric medications. It provides an excellent example of a process for gathering information about client preferences and values and engaging in collaborative decision-making.
These sites are useful to explore to review examples of decision aids. Existing decision aids can provide a useful template for developing one's own.

FROM THE EVIDENCE 11-1
Article Discussing Development of Decision Aids
Heesen, C., Solari, A., Giordano, A., Kasper, J., & Kopke, S. (2011). Decisions on multiple sclerosis immunotherapy: New treatment complexities urge patient engagement. Journal of Neurological Sciences, 306, 2192–2197.
For patients with multiple sclerosis (MS) involvement in treatment decisions becomes ever more imperative. Recently new therapeutic options have become available for the treatment of MS, and more will be licensed in the near future. Although more efficacious and easier to administer, the new drugs pose increased risks of severe side effects. Also, new diagnostic criteria lead to more and earlier MS diagnoses. Facing increasingly complex decisions, patients need up-to-date evidence-based information and decision support systems in order to make informed decisions together with physicians based on their autonomy preferences. This article summarizes recently terminated and ongoing trials on MS patient education and decision aids conducted by the authors' study groups. Programs on relapse management, immunotherapy, and for patients with suspected and early MS have been developed and evaluated in randomized controlled clinical trials. It could be shown that the programs successfully increase knowledge and allow patients to make informed decisions based on their preferences. For the near future, we aim to develop a modular program for all relevant decisions in MS to increase patients' self-management and empower patients to develop their individual approach with the disease. Faced by a disease with many uncertainties, this should enhance patients' sense of control. Still, it remains a challenge to adequately assess decision quality. Therefore, a study in six European and one Australian centers will start soon aiming to establish adequate tools to assess decision-making quality.
Note A: This abstract indicates that decision aids are important but also notes that the process of developing them is complex and in its early stages.
FTE 11-1 Question Identify a particular area of practice in which you think a decision aid would be helpful. Explain why you chose this area.
EXERCISE 11-3
Creating Decision Aids for Clients
and Families (LO2)
Read the conclusions from the following Cochrane
review regarding interventions to reduce falls in
older adults.
Gillespie, L. D., Robertson, M. C., Gillespie, W. J., Sherrington, C., Gates, S.,
Clemson, L. M., & Lamb, S. E. (2012). Interventions for preventing falls in
older people living in the community. Cochrane Database System Reviews 9:
CD007146. doi:10.1002/14651858.CD007146.pub3
Background
Approximately 30% of people over 65 years of
age living in the community fall each year. This
is an update of a Cochrane review first published
in 2009.
Objectives
To assess the effects of interventions designed to reduce the incidence of falls in older people living in
the community.
Search Methods
We searched the Cochrane Bone, Joint and Muscle
Trauma Group Specialised Register (February
2012), CENTRAL (The Cochrane Library 2012,
Issue 3), MEDLINE (1946 to March 2012),
EMBASE (1947 to March 2012), CINAHL (1982
to February 2012), and online trial registers.
Selection Criteria
Randomised trials of interventions to reduce falls in
community-dwelling older people.
Data Collection and Analysis
Two review authors independently assessed risk of
bias and extracted data. We used a rate ratio (RaR)
and 95% confidence interval (CI) to compare the
rate of falls (e.g. falls per person year) between intervention and control groups. For risk of falling,
we used a risk ratio (RR) and 95% CI based on the
number of people falling (fallers) in each group. We
pooled data where appropriate.
Main Results
We included 159 trials with 79,193 participants.
Most trials compared a fall prevention intervention with no intervention or an intervention not
expected to reduce falls. The most common interventions tested were exercise as a single intervention (59 trials) and multifactorial programmes
(40 trials). Sixty-two per cent (99/159) of trials
were at low risk of bias for sequence generation,
60% for attrition bias for falls (66/110), 73% for
attrition bias for fallers (96/131), and only 38%
(60/159) for allocation concealment. Multiple-component group exercise significantly reduced
rate of falls (RaR 0.71, 95% CI 0.63 to 0.82; 16
trials; 3622 participants) and risk of falling (RR
0.85, 95% CI 0.76 to 0.96; 22 trials; 5333 participants), as did multiple-component home-based exercise (RaR 0.68, 95% CI 0.58 to 0.80;
seven trials; 951 participants and RR 0.78, 95%
CI 0.64 to 0.94; six trials; 714 participants). For
Tai Chi, the reduction in rate of falls bordered
on statistical significance (RaR 0.72, 95% CI 0.52
to 1.00; five trials; 1563 participants) but Tai Chi
did significantly reduce risk of falling (RR 0.71,
95% CI 0.57 to 0.87; six trials; 1625 participants).
Multifactorial interventions, which include individual risk assessment, reduced rate of falls (RaR
0.76, 95% CI 0.67 to 0.86; 19 trials; 9503 participants), but not risk of falling (RR 0.93, 95% CI
0.86 to 1.02; 34 trials; 13,617 participants). Overall, vitamin D did not reduce rate of falls (RaR
1.00, 95% CI 0.90 to 1.11; seven trials; 9324
participants) or risk of falling (RR 0.96, 95%
CI 0.89 to 1.03; 13 trials; 26,747 participants),
but may do so in people with lower vitamin D
levels before treatment. Home safety assessment
and modification interventions were effective
in reducing rate of falls (RaR 0.81, 95% CI 0.68
to 0.97; six trials; 4208 participants) and risk of
falling (RR 0.88, 95% CI 0.80 to 0.96; seven
trials; 4051 participants). These interventions
were more effective in people at higher risk of
falling, including those with severe visual impairment. Home safety interventions appear to be
more effective when delivered by an occupational
therapist. An intervention to treat vision problems (616 participants) resulted in a significant
increase in the rate of falls (RaR 1.57, 95% CI
1.19 to 2.06) and risk of falling (RR 1.54, 95%
CI 1.24 to 1.91). When regular wearers of multifocal glasses (597 participants) were given single
lens glasses, all falls and outside falls were significantly reduced in the subgroup that regularly
took part in outside activities. Conversely, there
was a significant increase in outside falls in intervention group participants who took part in
little outside activity. Pacemakers reduced rate of
falls in people with carotid sinus hypersensitivity (RaR 0.73, 95% CI 0.57 to 0.93; three trials;
349 participants) but not risk of falling. First eye
cataract surgery in women reduced rate of falls
(RaR 0.66, 95% CI 0.45 to 0.95; one trial; 306
participants), but second eye cataract surgery did
not. Gradual withdrawal of psychotropic medication reduced rate of falls (RaR 0.34, 95% CI 0.16
to 0.73; one trial; 93 participants), but not risk
of falling. A prescribing modification programme
for primary care physicians significantly reduced
risk of falling (RR 0.61, 95% CI 0.41 to 0.91;
one trial; 659 participants). An anti-slip shoe device reduced rate of falls in icy conditions (RaR
0.42, 95% CI 0.22 to 0.78; one trial; 109 participants). One trial (305 participants) comparing
multifaceted podiatry including foot and ankle
exercises with standard podiatry in people with
disabling foot pain significantly reduced the rate
of falls (RaR 0.64, 95% CI 0.45 to 0.91) but not
the risk of falling. There is no evidence of effect
for cognitive behavioural interventions on rate of
falls (RaR 1.00, 95% CI 0.37 to 2.72; one trial;
120 participants) or risk of falling (RR 1.11, 95%
CI 0.80 to 1.54; two trials; 350 participants).
Trials testing interventions to increase knowledge/
educate about fall prevention alone did not significantly reduce the rate of falls (RaR 0.33, 95% CI
0.09 to 1.20; one trial; 45 participants) or risk of
falling (RR 0.88, 95% CI 0.75 to 1.03; four trials;
2555 participants). No conclusions can be drawn
from the 47 trials reporting fall-related fractures.
Thirteen trials provided a comprehensive economic evaluation. Three of these indicated cost
savings for their interventions during the trial period: home-based exercise in over 80-year-olds,
home safety assessment and modification in those
with a previous fall, and one multifactorial programme targeting eight specific risk factors.
Authors’ Conclusions
Group and home-based exercise programmes,
and home safety interventions reduce rate of falls
and risk of falling. Multifactorial assessment and
intervention programmes reduce rate of falls but
not risk of falling; Tai Chi reduces risk of falling. Overall, vitamin D supplementation does
not appear to reduce falls but may be effective in
people who have lower vitamin D levels before
treatment.
QUESTIONS
Complete the following table, with the goal of including it in a decision aid that interprets the rate ratio and risk ratios for clients and families for a multicomponent group exercise program, a multicomponent home-based exercise program, Tai Chi, and home safety assessment and modifications provided by an occupational therapist.
Identify the percentage of fall reduction for each category. (You might want to refer to Chapter 8 to review the information on risk ratios. Risk ratios and rate ratios can be especially challenging to interpret when they are less than 1.0. You must subtract the risk ratio from 1.0 to get the percentage. For example, if the risk ratio = 0.85, there is a 15% decrease in the risk of falling.)

Intervention | Reduced the Rate of Falling (the Number of Times Individuals Fell Within a Year) | Reduced the Risk of Falling (the Risk That an Individual Would Have a Fall/Be a Faller)
Multicomponent group exercise program | |
Multicomponent home-based exercise program | |
Tai Chi | |
Home safety assessment and modification by an OT | |

CRITICAL THINKING QUESTIONS
1. What is the relationship between evidence-based practice, client-centered practice, and shared decision-making?
2. How can therapists modify the shared decision-making
process for clients who do not want to be involved in
decision-making?
3. What steps might therapists take to ensure that they do
not introduce their own biases into the decision-making
process? When is it acceptable to inform a client of
one’s own preferences?
4. Why are decision aids useful for facilitating shared
decision-making?
5. What barriers do therapists face when using shared
decision-making and decision aids?
ANSWERS
EXERCISE 11-1
Here are samples of potential answers. Remember that they should be open-ended.
a) A grade-school child with autism who has difficulty making friends at school
To ask of the child:
• What kinds of things do you like to do for fun?
• Who do you like to play with?
To ask of the parent/caregiver:
• What does a typical day for you and your child look like?
• How much time do you have for providing therapy at home?
• What do you know about your child's experiences with other children at school?
b) A young adult with traumatic brain injury who is returning home and needs help with maintaining a schedule
• What kind of technology do you currently use?
• What do you like and not like about using technology?
• How important is being on time and having a regular schedule to you?
• When is maintaining a schedule most difficult for you?
• What other people are involved in your life and schedule?
c) An athlete with chronic joint pain
• How important is it to you to be able to return to your sport?
• What other athletic endeavors are you interested in?
• When you start to feel pain, how do you respond?
• How easy is it for you to follow a prescribed treatment program?
• How do you like to exercise (for example, alone or with others)?
• How does your injury affect other aspects of your daily life?
EXERCISE 11-2
Examples of statements that could be made to present evidence to the client's parents include:
There are very few studies that look at the effectiveness of Interactive Metronome training. Only one study compared Interactive Metronome training to other treatments. There is very limited evidence that it may improve some areas of attention and motor skills, but the studies were very small and at this time there is not enough evidence to recommend or not recommend it as an option. Are you interested in hearing about other approaches with more research evidence?
EXERCISE 11-3
The percentages are derived from the rate ratios and risk ratios provided in the study summary. Because you consider the intervention to be effective if it reduces the rate of falling or risk of falling, a ratio of less than 1.0 (where the confidence interval does not include 1.0) indicates an effective intervention. To determine the amount of reduction, you would subtract the ratio from 1.0. For example, the rate ratio for group exercise was 0.71, so 1.0 – 0.71 = 0.29, or 29%.
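For readers who want to check their own table, here is a minimal sketch in Python (not part of the original exercise) that applies the "subtract the ratio from 1.0" rule. The rate and risk ratios used below are assumptions taken from the completed answer table that follows, not additional data from the review.

```python
# Minimal sketch: convert rate ratios (RaR) and risk ratios (RR) into percentage
# reductions using (1 - ratio) * 100. The ratio values are taken from the
# completed answer table below; they are illustrative, not newly reported data.
ratios = {
    "Multicomponent group exercise program":            {"RaR": 0.71, "RR": 0.85},
    "Multicomponent home-based exercise program":       {"RaR": 0.68, "RR": 0.78},
    "Tai Chi":                                          {"RaR": 0.72, "RR": 0.71},
    "Home safety assessment and modification by an OT": {"RaR": 0.81, "RR": 0.88},
}

for intervention, r in ratios.items():
    rate_reduction = (1 - r["RaR"]) * 100  # reduction in the rate of falling
    risk_reduction = (1 - r["RR"]) * 100   # reduction in the risk of falling
    print(f"{intervention}: rate -{rate_reduction:.0f}%, risk -{risk_reduction:.0f}%")
```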
Intervention | Reduced the Rate of Falling (the Number of Times Individuals Fell Within a Year) | Reduced the Risk of Falling (the Risk That an Individual Would Have a Fall/Be a Faller)
Multicomponent group exercise program | 29% | 15%
Multicomponent home-based exercise program | 32% | 22%
Tai Chi | 28% | 29%
Home safety assessment and modification by an OT | 19% | 12%
FROM THE EVIDENCE 11-1
There is no single correct answer to this question. Decision aids are particularly useful when decisions are complex and more than one option is available.
REFERENCES
Bartscherer, M. L., & Dole, R. L. (2005). Interactive Metronome Training
for a 9-year-old boy with attention and motor coordination difficulties.
Physiotherapy Theory & Practice, 21(4), 257–269.
Bidonde, J., Busch, A. J., Webber, S. C., Schacter, C. L., Danyliw, A.,
Overend, T. J., Richards, R. S., & Rader, T. (2014). Aquatic exercise
training for fibromyalgia. Cochrane Database of Systematic Reviews, 10,
CD011336. doi:10.1002/14651858
Büker, N., Akkaya, S., Akkaya, N., Gökalp, O., Kavlak, E., Ok, N.,
Kiter, A. E., & Kitis, A. (2014). Comparison of effects of supervised
physiotherapy and a standardized home program on functional
status in patients with total knee arthroplasty: A prospective study.
Journal of Physical Therapy Science, 26, 1531–1536.
Busch, A. J., Barber, K. A. R., Overend, T. J., Peloso, P. M. J., &
Schachter, C. L. (2007). Exercise for treating fibromyalgia syndrome. Cochrane Database of Systematic Reviews, 4, CD003786.pub2.
Busch, A. J., Webber, S. C., Richards, R. S., Bidonde, J., Schachter, C.
L., Schafer, L. A., . . . Rader, T. (2013). Resistance exercise training for fibromyalgia. Cochrane Database of Systematic Reviews, 12,
CD010884. doi:10.1002/14651858
Charles, C., Gafni, A., & Whelan, T. (1997). Shared decision-making in
the medical encounter: What does it mean? (or it takes at least two
to tango). Social Science and Medicine, 44, 681–692.
Cosper, S. M., Lee, G. P., Peters, S. B., & Bishop, E. (2009). Interactive
Metronome Training in children with attention deficit and developmental coordination disorders. International Journal of Rehabilitation
Research, 32(4), 331–336. doi:10.1097/MRR.0b013e328325a8cf
Cox, R. M., Schwartz, K. S., Noe, C. M., & Alexander, G. C. (2011).
Preference for one or two hearing aids among adult patients. Ear
and Hearing, 32, 181–197.
Edwards, A., & Elwyn, G. (2006). Inside the black box of shared decision
making: Distinguishing between the process of involvement and who
makes the decision. Health Expectations, 9, 307–320.
Gafni, A., & Charles, C. (2009). The physician-patient encounter:
An agency relationship? In A. Edwards & G. Elwyn (Eds.), Shared
decision-making in health care: Achieving evidence-based patient choice
(2nd ed., pp. 73–78). Oxford, UK: Oxford University Press.
Gillespie, L. D., Robertson, M. C., Gillespie, W. J., Sherrington, C.,
Gates, S., Clemson, L. M., & Lamb, S. E. (2012). Interventions for
preventing falls in older people living in the community. Cochrane
Database System Reviews, 9, CD007146. doi:10.1002/14651858.
CD007146.pub3
Hammell, K. R. W. (2013). Client-centered occupational therapy in
Canada: Refocusing our core values. Canadian Journal of Occupational
Therapy, 80, 141–149.
Heesen, C., Solari, A., Giordano, A., Kasper, J., & Kopke, S. (2011). Decisions on multiple sclerosis immunotherapy: New treatment complexities urge patient engagement. Journal of Neurological Sciences, 306,
192–197.
Hoffmann, T. C., Montori, V. M., & Del Mar, C. (2014). The connection between evidence-based medicine and shared decision making. Journal of the American Medical Association, 312(13), 1295–1296.
doi:10.1001/jama.2014.10186
Latimer-Cheung, A. E., Martin Ginis, K. A., Hicks, A. L., Motl, R. W.,
Pilutti, L. A., Duggan, M., . . . Smith, K. M. (2013). Development of
evidence-informed physical activity guidelines for adults with multiple
sclerosis. Archives of Physical Medicine and Rehabilitation, 94, 1829–1836.
Longo, M. F., Cohen, D. R., Hood, K., Edwards, A., Robling, M., Elwyn,
G., & Russell, I. T. (2006). Involving patients in primary care consultations: Assessing preferences using discrete choice experiments.
British Journal of General Practice, 56, 35–42.
Marinho-Buzelli, A. R., Bonnyman, A. M., & Verrier, M. C. (2014).
The effects of aquatic therapy on mobility of individuals with neurological diseases: A systematic review. Clinical Rehabilitation. PubMed
PMID: 26987621 [Epub ahead of print].
Shaffer, R. J., Jacokes, L. E., Cassily, J. F., Greenspan, S. I., Tuchman,
R. F., & Stemmer, P. J., Jr. (2001, March-April). Effect of Interactive
Metronome Training on children with ADHD. American Journal of
Occupational Therapy, 55(2),155–162.
Sillem, H., Backman, C. L., Miller, W. C., & Li, L. C. (2011). Comparison of two carpometacarpal stabilizing splints for individuals with
thumb osteoarthritis. Journal of Hand Therapy, 24, 216–225.
Stacey, D., Bennett, C. L., Barry, M. J., Col, N. F., Eden, K. B., Holmes-Rovner, M., . . . Thomson, R. (2011). Decision aids for people facing
health treatment or screening decisions. Cochrane Database of Systematic Reviews, 10, CD001431. doi:10.1002/14651858
Sumsion, T., & Law, M. (2006). A review of evidence on the conceptual elements informing client-centred practice. Canadian Journal of
Occupational Therapy, 73(3), 153–162.
Theadom, A., Cropley, M., Smith, H. E., Feigin, V. L., & McPherson, K.
(2015). Mind and body therapy for fibromyalgia. Cochrane Database
of Systematic Reviews, 4, CD001980.pub3. doi:10.1002/14651858.
CD001980.pub3
Glossary
alternative treatment threat—a type of history threat in
which an unintended treatment is provided to participants in a study that accounts for differences over time.
analysis of covariance (ANCOVA)—a statistic used when
a researcher wants to examine differences and statistically control a variable that may affect the outcome of
a study.
applied research—research that has direct application to
health-care practices.
artifact—an object collected during qualitative research
to be used as data (e.g., pictures, documents, journals).
assignment threat—a problem created when differences
in groups due to assignment account for the differences
between groups in a study.
attrition—loss of participants who have enrolled in a
study. Attrition can occur for numerous reasons, including voluntary drop-out, relocation, or death. Also
called mortality.
audit trail—the collection of documents from a qualitative study that can be used to confirm a researcher’s
data analysis.
axial coding—the process of identifying relationships between categories during analysis of the data in qualitative research.
basic research—investigation of fundamental questions
that is directed toward better understanding of individual concepts.
between-group comparison—comparison of the results
of two or more groups.
Bonferroni correction—adjustment of an alpha level to prevent a Type I error; performed by dividing
the alpha level (typically 0.05) by the number of comparisons.
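As a hedged illustration of the division just described (the alpha level and the number of comparisons below are example values, not taken from the text):

```python
# Illustrative sketch of a Bonferroni correction: divide the alpha level by the
# number of planned comparisons. The values are example assumptions.
alpha = 0.05
number_of_comparisons = 3

adjusted_alpha = alpha / number_of_comparisons
print(round(adjusted_alpha, 4))  # 0.0167; each comparison must reach p < .0167
```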
Boolean operators—the words used in a database search
for relating key terms. Common Boolean operators
include AND, OR, and NOT.
bracketing—a process of qualitative research that involves
suspending judgment of others.
case-control design—an observational, retrospective,
cross-sectional study that can be used to answer prognostic research questions concerning which risk factors predict a condition.
categorical variable—variables that describe attributes
but do not have a quantitative value (e.g., gender,
political affiliation).
ceiling effect—a condition in which many individuals
achieve the highest possible score, leaving little to no
room to detect improvement.
client-centered practice—practice in which the client is
considered the expert regarding his or her situation
and is the primary decision maker for health-care
choices.
clinically significant difference—a change that would be
regarded by clinicians and the client as meaningful and
important.
cluster randomized controlled trial—a study design in
which the groups are randomly assigned (e.g., one
setting receives an intervention and another does
not), but the individuals within the groups are not
randomized.
code-recode procedure—the process of coding data and
then taking a break, returning and recoding the same
data, and then comparing the results of the two efforts.
Cohen’s d—a statistic that measures the strength of the
difference between two group means reported in standard deviation units.
compensatory demoralization—group leaders of the control condition or participants in the control group give
up due to disappointment at not receiving the experimental treatment.
compensatory equalization of treatment—group leaders
of the control condition or participants in the control
group work harder to compensate for not receiving
the experimental treatment.
concurrent validity—the process of supporting construct
validity by finding relationships between the index measure and other measures of the same construct.
confidence interval (CI)—a reliability estimate that suggests the range of outcomes expected when an analysis
is repeated.
confirmability—the extent to which qualitative data can
be corroborated.
connecting data—in mixed-methods research, when one
method (either quantitative or qualitative) informs
another. Typically one method is used first in one
study and the results are used in designing the second
study.
constant comparative method—the back-and-forth process of collecting data and then determining methods
in qualitative research.
constructivism—a philosophical perspective which suggests that reality is subjective and is determined by the
individual based on experience and context.
construct validity—the ability of a test to measure the
construct it is intended to measure.
continuous data—data with values that fall on a continuum
from less to more.
continuous variable—variable in which the numbers have
meaning in relation to one another.
control—a condition that remains constant or the same
between groups or situations.
control group—a group in which participants receive an
alternate intervention, a standard intervention, or no
intervention.
control variable—a variable that remains constant or the
same between groups or situations.
convenience sampling—a type of sampling in which potential participants are selected based on the ease with
which they can be included in a study.
convergent validity—the process of supporting construct
validity by finding relationships between the index
measure and other measures of the same construct.
correlation—the degree to which two or more variables
fluctuate together.
correlational study—a study designed to determine whether
a relationship exists between two constructs and, if so,
to assess the strength of that relationship.
covary—a statistical process of controlling for a variable
that may be different between groups.
credibility—the extent to which qualitative data are authentic.
criterion-referenced—a measure in which an individual’s
scores are compared against some established standard.
critically appraised paper—a type of analysis that critiques a published research study and interprets the
results for practitioners.
Cronbach’s alpha—a measure of internal consistency
that ranges from 0.0 to 1.0 and indicates the degree to
which the items of a measure are unidimensional.
crossover study design—a design in which participants
are randomly assigned to groups and receive the same
treatments, but in a different order.
cross-sectional research—a type of study that collects
data at a single point in time.
database—an organized collection of data. In the case
of evidence-based practice, a bibliographic citation
database is a compilation of articles, books, and book
chapters.
decision aids—material (often in the form of pamphlets,
workbooks, or computer programs) that clients can
use in the process of making health-care decisions.
dependability—the extent to which qualitative data are
consistent.
dependent sample t-test—a statistic that compares a dependent variable within the same group, either a pretest and
posttest or two different measures.
dependent variable—the variable that is intended to measure the outcome of a study.
descriptive statistics—statistics that provide an analysis of
data to help describe, show, or summarize the data in
a meaningful way such that, for example, patterns can
emerge from the data.
descriptive study—research that explains health conditions and provides information about the incidence
and prevalence of certain conditions within a diagnostic group.
dichotomous data—data that can take on only two values;
often used to distinguish the presence or nonpresence
of some construct.
directional hypothesis—a hypothesis in which the researcher
makes an assumption or expresses belief in a particular
outcome.
discrete data—data that can have only certain values.
Discrete data are often used to classify categories into
numerical data.
discriminant validity—the process of supporting construct
validity by finding that a test is able to differentiate between groups of individuals.
divergent validity—the process of supporting construct
validity by finding that the index measure is not related
to measures of irrelevant constructs.
ecological validity—the degree to which the environment
in which a study takes place represents the real world.
effectiveness study—research that examines the usefulness of an intervention under real-world conditions.
Effectiveness research has strong external validity.
effect size (ES)—a statistic that describes the magnitude
or strength of a statistic. It can refer to the magnitude
of the difference, relationship, or outcome.
efficacy study—a study that examines whether or not an
intervention has a positive outcome. Efficacy studies
are typically conducted under ideal conditions in an
effort to improve internal validity.
embedding data—type of data used in mixed-methods
research; one method (qualitative or quantitative) assumes a primary role, and the other is used to support
the results.
epidemiology—the study of the frequency and distribution of health conditions and factors that contribute to
incidence and prevalence.
ethnography—a qualitative research design that focuses
on understanding a culture.
evidence-based practice—the process of using the research evidence, practitioner experience, and the
client’s values and desires to make the best clinical
decisions.
experimental research—studies that examine cause-and-effect relationships.
experimenter bias—a threat to the validity of a study that
is caused by the involvement of the researcher in some
aspect of the study.
ex post facto comparisons—group comparisons that utilize existing groups rather than assigning groups, as in
an experimental study.
external responsiveness—the ability of a test to detect
change that is clinically meaningful.
external validity—the extent to which the results of a
study can be generalized to a particular population, setting, or situation.
extraneous variable—a variable that is tracked and then
later examined to determine its influence.
factorial design—a study with more than one independent
variable, in which the interaction or impact of both independent variables can be examined simultaneously.
field notes—remarks and descriptions made and collected
by researchers while observing and/or interviewing
participants.
fishing—searching for findings that the researcher did
not originally plan to explore.
floor effect—a situation in which many individuals receive the lowest possible score on a test due to its difficulty or the rarity of the condition.
focus group—an interview with a group of individuals
that gathers information through a targeted discussion
regarding a specific topic.
forest plot—a graph used to illustrate the effect sizes of individual studies in a meta-analysis and the pooled effect.
frequencies—statistics that describe how often something
occurs.
frequency distribution—a graph used to depict a count.
grey literature—literature that is unpublished or difficult
to obtain.
grounded theory—a qualitative design that has the purpose of developing theory from the data collected; that
is, the theory is grounded in the data.
Hawthorne effect—name for the occurrence when research
participants respond favorably to attention regardless of
the intervention approach.
hazard ratio—the chance of a particular event occurring
in one group compared with the chance of the event
occurring in another group.
history threat—the possibility that changes occurring
over time, which influence the outcome of a study, are
due to external events.
hypothesis—a proposed explanation for some phenomenon.
incidence—the risk of developing a condition within
a period of time or the frequency of new cases of a
health condition within a specified time period.
independent sample t-test—a statistic that compares the
difference in the mean score for two groups that are
independent of, or unrelated to, each other.
independent variable—a variable that is manipulated or
compared in a study.
inductive reasoning—the process of drawing conclusions
from the data; sometimes referred to as moving from
the specific to the general.
inferential statistics—statistical techniques that use study
samples to make generalizations that apply to an entire
population.
informant—the individual providing data in a study.
institutional animal care and use committee—the organization that reviews and approves the ethical practices
associated with animal research.
institutional review board—the organization that reviews
and approves the ethical practices of research studies
that use human participants.
instrumentation threat—arises when problems with the
instrument itself (e.g., reliability, calibration) affect the
outcome of a study.
interaction effect—the pattern of differences of the dependent variable for at least two independent variables.
In intervention research, often one of the variables is a
within-group variable of time and the other variable is
the between-group variable of intervention vs. control.
internal consistency—unity or similarity of items on a
multi-item measure.
internal responsiveness—the ability of a test to detect
change.
internal validity (of a study)—the ability to draw conclusions about causal relationships; in the case of intervention research, the ability to draw conclusions as to
whether or not the intervention was effective.
inter-rater reliability—consistency in scores among two
or more testers or raters.
intervention study—a study that examines whether or not
an intervention has a positive outcome.
intra-class correlation coefficient (ICC)—a measure of
relationship, typically used in reliability studies, that
ranges from 0.0 to 1.0 and indicates the degree to
which two administrations of a measure are related.
level of significance—also called alpha or α, the level at which a statistic is identified as statistically significant; most typically set at less than 0.05.
levels of evidence—a hierarchical approach that rates research evidence from strongest to weakest.
life history—the story of an individual’s life or a specific
aspect of the individual’s life.
Likert scale—a response scale, commonly used in questionnaires, that provides several options on a continuum, such as “strongly disagree” to “strongly agree.”
linear regression—a statistical method in which several
values are entered into a regression equation to determine how well they predict a continuous outcome.
logistic regression—a statistical method in which several
values are entered into a regression equation to determine how well they predict a categorical outcome.
longitudinal research—a study in which data are collected
over at least two time points and typically cover an extended period of time, such as several years or decades,
with the purpose of examining changes over time.
matching—a process of assigning participants to groups
in which individuals are matched on some characteristic (e.g., gender, diagnosis) to ensure that equal numbers are assigned to each group.
maturation threat—a validity problem which arises when
changes that occur over time are due to natural changes
in the individual, such as developmental changes or the
healing process, rather than to the variable of interest.
mean—same as average; a descriptive statistic that balances
the scores above and below it.
measurement error—the difference between a true score
and an individual’s actual score.
measure of central tendency—the location of the center
of a distribution.
median—a descriptive statistic indicating the score value
that divides the distribution into equal lower and upper halves of the scores.
member checking—returning results to the research participants so that they can check and verify or correct
the data and analysis.
merging data—name used in mixed-methods research
when qualitative and quantitative research results are
reported together.
meta-analysis—a quantitative synthesis of multiple studies.
method error—measurement error that is due to some
inaccuracy inherent in the assessment itself or the
manner in which it was administered or scored.
minimally clinically important difference (MCID)—the
amount of change on a particular measure that is deemed
clinically important to the client.
mixed-methods research—a research design that combines qualitative and quantitative methods.
mixed model ANOVA—a difference statistic used when
between-group and within-group analyses are conducted
simultaneously; two or more groups are compared over
two or more time points.
mode—the score value that occurs most frequently in the
distribution.
mortality—also called attrition; loss of participants who
have enrolled in a study. Mortality can occur for numerous reasons, including voluntary drop-out, relocation, or death.
multicollinearity—the circumstance in which variables
(or, in the case of regression, predictors) are correlated
with one another.
multiple linear regression—a statistical analysis that examines the relationship of two or more predictors and
a continuous outcome.
multiple logistic regression—a statistical analysis that examines the relationship of two or more predictors and
a categorical outcome.
narrative research—a qualitative design that uses a storytelling approach.
narrative review—a descriptive review of the literature on
a particular topic.
naturalistic inquiry—research done from the point of
view that multiple perspectives exist that can only be
understood within their natural context.
naturalistic observation—data collection that involves
observing individuals in real-world circumstances.
nondirectional hypothesis—an exploratory method of
study, suggesting that the researcher does not have
preconceptions about what the study results may be,
but may assume that a difference or relationship exists.
nonequivalent control group design—also known as a
nonrandomized controlled trial; a comparison of two or
more groups without randomization to group.
nonexperimental research—research that does not manipulate the conditions but rather observes a condition
as it exists; used to answer descriptive and relationship
questions.
nonrandomized controlled trial—a study in which at least
two groups are compared, but participants are not randomly assigned to groups; also called a quasi-experiment.
normal distribution—a type of frequency distribution
that represents many data points distributed in a symmetrical, bell-shaped curve.
norm-referenced—a measure in which an individual’s score
is compared against the scores of other individuals.
observational study—research in which only the naturally occurring circumstances are studied, as opposed
to assigning individuals to an intervention or research
condition.
odds ratio—an estimate of the odds when the presence or
absence of one variable is associated with the presence
or absence of another variable.
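For illustration only (the counts below are invented), an odds ratio can be calculated from a 2 x 2 table of exposure and outcome:

```python
# Illustrative sketch: odds ratio from a 2 x 2 table (invented counts).
#                      condition present   condition absent
# exposed                    a = 20              b = 80
# not exposed                c = 10              d = 90
a, b, c, d = 20, 80, 10, 90

odds_exposed = a / b            # odds of the condition among exposed individuals
odds_unexposed = c / d          # odds of the condition among unexposed individuals
odds_ratio = odds_exposed / odds_unexposed
print(round(odds_ratio, 2))     # 2.25: the odds are about 2.25 times higher with exposure
```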
one-way ANOVA—a difference statistic that compares
three or more groups at a single point in time.
open coding—the process of identifying simple categories
within qualitative data.
open-ended interview—an interview without set questions
that allows the process to direct the questioning.
order effect—a type of testing effect in which the order in
which a test is administered affects the outcome.
participant bias—a threat to internal validity caused by
the research participant consciously or unconsciously
influencing the outcome.
participant observation—a method of data collection in
qualitative research in which the researcher becomes
a participant and engages in activity with the research
participant(s).
Pearson product-moment correlation—an inferential
statistic that examines the strength of the relationship
between two continuous variables.
peer-reviewed—an appraisal process that uses experts in
the field to determine whether or not an article should
be published in a scientific journal.
phenomenology—a design in qualitative research that
seeks to understand the experience of others from
their perspective.
PICO—a type of research question used to answer efficacy questions; includes the population or problem,
intervention, comparison, and outcome.
power—the ability of a study to detect a difference or
relationship.
practice effect—a type of testing effect in which exposure to the pretest affects the outcomes of subsequent
testing.
practice guidelines—also called clinical guidelines or clinical
practice guidelines; recommendations to practitioners for
addressing specific clinical situations, based on research
evidence and opinions of experts.
predictive study—a type of research that provides information about factors related to a particular outcome.
predictive validity—the process of supporting construct
validity by finding that a test is capable of predicting
an expected outcome.
pre-experimental design—a research design that examines the outcomes of an effect (such as an intervention) on a group of individuals without comparing that
effect to a control or comparison group.
pre-experimental research—a research design in which
a single group is compared before and after an
intervention.
prevalence—the proportion of individuals within a population who have a particular condition.
primary research—the original studies that are contained
in a systematic review.
primary source—research studies and professional and governmental reports that are based on original research or
data collection.
prolonged engagement—spending extended time in the
field to collect data to establish familiarity and trust.
prospective—a study that is designed before data collection takes place.
prospective cohort study—a research design that follows
two groups of individuals with different conditions
over time.
provocative test—a diagnostic test in which an abnormality is induced through a manipulation that provokes
the condition.
psychometric properties—quantifiable characteristics of a test that speak to an instrument’s consistency and accuracy; include reliability, validity, and
responsiveness.
publication bias—the tendency of scientific journals to
publish positive findings and reject negative findings.
purposive sampling—identifying participants for a study
because they will serve a particular purpose.
Pygmalion effect—also called Rosenthal effect; occurs
when the intervention leader’s positive expectations
for an outcome lead the participants to respond more
favorably.
qualitative research—a type of research that studies questions about meaning and experience.
quality-adjusted life year (QALY)—combines an assessment of quality of life and the number of years of life
added by an intervention.
quantitative research—a type of research that uses statistics and describes outcomes in terms of numbers.
quasi-experimental study—a research design that compares at least two groups, but does not randomly assign participants to groups; also called a nonrandomized
controlled trial.
random assignment—when research participants are assigned to groups in such a way that each participant
has an equal chance of being assigned to the available
groups.
randomized controlled trial (RCT)—a type of research
design that includes at least two groups (typically an
experimental group and a control group), and participants are randomly assigned to the groups.
random sampling—a type of sampling in which each
potential participant has an equal chance of being
selected for participation in a study.
range—a descriptive statistic that indicates the lowest and
highest scores.
reflective practitioner—a practitioner who intentionally
approaches clinical situations with an inquisitive
mind and considers past experiences when making
decisions.
reflexive journal—a diary maintained by a qualitative researcher that identifies personal biases and perspectives.
reflexivity—the process in which a researcher identifies
personal biases and perspectives so that they can be
set aside.
regression equation—a calculation that determines the extent to which two or more variables predict a particular
outcome.
regression to the mean—the tendency of individuals who
have extreme scores on a measure to regress toward
the mean when the measure is re-administered.
reliability—the consistency and accuracy of a measure.
repeated measures ANOVA—a difference statistic, similar to a dependent sample or within-group t-test, that
is used when the means are compared over more than
two time periods or more than two different tests.
replication—conducting a study that duplicates most or
all of the features of a previous study.
research design—the specific plan for how a study is to be
organized and carried out.
research design notation—a symbolic representation of
the design of a research study.
response bias—a measurement error that creates inaccuracy in survey results.
response rate—the number of individuals who respond
to a survey divided by the number of individuals who
initially received the survey.
responsive measure—a measure that is able to detect
change, typically before and after an intervention.
retrospective cohort study—a research design that looks
back at existing data to compare different groups of
individuals over a specified time period.
retrospective intervention study—a study that looks at
the efficacy of an intervention, but does so after data
collection has already taken place and relies on the use
of existing data.
risk ratio (RR)—also called relative risk; the probability of
an event happening to one group of exposed individuals as compared with another group that is not exposed
to a particular condition.
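A minimal, invented example of the calculation: the risk in each group is the number of events divided by the group size, and the risk ratio compares the two risks.

```python
# Illustrative sketch: risk ratio (relative risk) from invented counts.
events_exposed, n_exposed = 20, 100       # e.g., fallers in the exposed group
events_unexposed, n_unexposed = 10, 100   # fallers in the comparison group

risk_exposed = events_exposed / n_exposed          # 0.20
risk_unexposed = events_unexposed / n_unexposed    # 0.10
risk_ratio = risk_exposed / risk_unexposed
print(risk_ratio)  # 2.0: the exposed group is twice as likely to experience the event
```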
Rosenthal effect—also called Pygmalion effect; occurs
when the intervention leader’s positive expectations
for an outcome lead the participants to respond more
favorably.
sampling error—the degree to which a sample does not
represent the intended population.
saturation—the point in qualitative data collection at
which no new data or insights are emerging.
scatterplot—a graph of plotted points that shows the
relationship between two sets of data.
scientific method—an approach to inquiry that requires
the measurement of observable phenomena.
secondary research—research that combines the results
of previously conducted studies, such as in a systematic
review.
secondary source—documents or publications that interpret or summarize a primary source.
selection threat—differences in groups that occur due to
the selection process.
selective coding—the process of articulating a theory
based on the categories and relationships between categories identified during data analysis.
sensitivity—the accurate identification of individuals who
possess the condition of interest.
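A small invented example showing how sensitivity, along with its companion specificity (defined later in this glossary), is calculated from a diagnostic test's true and false positives and negatives:

```python
# Illustrative sketch with invented counts from a hypothetical diagnostic test.
true_positives = 45    # test positive, condition present
false_negatives = 5    # test negative, condition present
true_negatives = 90    # test negative, condition absent
false_positives = 10   # test positive, condition absent

sensitivity = true_positives / (true_positives + false_negatives)  # 0.90
specificity = true_negatives / (true_negatives + false_positives)  # 0.90
print(sensitivity, specificity)
```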
shared decision-making—process of making health-care
decisions that includes the clinician and the client as
equal partners.
single-subject design—a research design in which each
participant’s response to an intervention is analyzed
individually.
skewed distribution—a frequency distribution in which
one tail is longer than the other.
snowball sampling—the selection of participants by having already-identified participants nominate people
they know.
Spearman correlation—an inferential statistic that examines the strength of the relationship between
two variables when one or both of the variables is
rank-ordered.
specificity—the correct identification of individuals who
do not have a condition.
standard deviation—a descriptive statistic that expresses
the amount of spread in the frequency distribution and
the average amount of deviation by which each individual score varies from the mean.
standardized test—a test that has established methods of
administration and scoring.
statistical conclusion validity—the accuracy of the conclusions drawn from the statistical analysis of a study.
statistically significant difference—when the difference between two or more groups is not likely due to
chance.
statistical significance—expresses the probability that
the result of a given experiment or study could have
occurred purely by chance.
study heterogeneity—condition that exists when studies
within a systematic review differ on one or more important characteristics, such as interventions used,
outcomes studied, or settings.
survey research—descriptive study that uses a questionnaire to collect data.
systematic review—a methodical synthesis of the research
evidence regarding a single topic that can be replicated
by others.
testing effects—changes that occur as a result of the test
administration process.
test-retest reliability—consistency in scores across two or
more test administrations.
thematic synthesis—an approach to synthesizing the
results of multiple qualitative studies into a systematic review.
themes—patterns identified within qualitative data.
thick description—detailed accounts of qualitative data.
third variable problem—arises when two constructs may
be related, but a third variable could account for the
relationship or influence the relationship.
trait error—measurement error that is due to some characteristic of the individual taking the test.
transferability—the extent to which information from a
qualitative study may apply to another situation.
translational research—when findings from the laboratory
are used to generate clinical research.
triangulation—the use of multiple resources and methods
in qualitative research to verify and corroborate data.
true experiment—a randomized controlled trial.
trustworthiness—describes how accurately qualitative
research represents a phenomenon.
Type I error—a statistical conclusion error that occurs when the research hypothesis is accepted (a difference or relationship is reported), yet the hypothesis is actually false.
Type II error—a statistical conclusion error that occurs when the research hypothesis is rejected (no difference or relationship is reported), yet the hypothesis is actually true.
validity—in the case of a measure, the ability of that measure to assess what it is intended to measure. In the
case of a study, when the conclusions drawn are based
on accurate interpretations of the study findings and
not confounded by alternative explanations.
variability—the spread of scores in a distribution.
variable—characteristic of people, activities, situations,
or environments that is identified and/or measured in
a study and has more than one value.
within-group comparison—a comparison of changes or
differences within the same groups, such as a pretest
and posttest comparison.
Index
Page numbers followed by b denote boxes; f denote figures; t denote tables.
Abstract
in evidence evaluation, 35
of systematic review, 185, 186, 188b
Acknowledgments, in evidence evaluation, 34, 37
Activities of daily living (ADLs), 134, 166b
Adams, John, 1
ADHD. See Attention deficit hyperactivity disorder
ADLs. See Activities of daily living
Adolescent/Adult Sensory Profile, 135b
Advanced search terms, 26–29, 27f, 28f, 30, 30b, 31
AGREE. See Appraisal of Guidelines for Research and Evaluation
Agreement, in shared decision-making, 210
ALS. See Amyotrophic lateral sclerosis
Alternative treatment threat, 89
Alzheimer’s Association, 196
Alzheimer’s disease, 14, 49, 78, 115b, 115t, 205
ApoE allele in, 159
Ginkgo biloba in, 92
American Journal of Audiology, 31
American Journal of Occupational Therapy, 31
American Journal of Pharmacy Education, 150
American Journal of Speech-Language Pathology, 31
American Occupational Therapy Association (AOTA), 6–7, 31, 34,
195, 196b–197b
American Physical Therapy Association (APTA), 31, 195
American Psychological Association, 24t, 37, 97b
American Speech-Language-Hearing Association (ASHA), 7, 31, 97b,
184, 195
Amyotrophic lateral sclerosis (ALS), 185
Analogy, in representing themes, 173b, 180
Analysis of covariance (ANCOVA), 69–70, 105
Analysis of variance (ANOVA)
critical thinking question of, 79
in design efficacy, 105, 106b, 112b, 115b, 115f
in inferential statistics, 66–69, 67t, 69f, 70b
ANCOVA. See Analysis of covariance
ANOVA. See Analysis of variance
Anterior cruciate ligament surgery, 22
Anxiety study, 14, 15b
AOTA. See American Occupational Therapy Association
Apolipoprotein E (ApoE), 121–122
in Alzheimer disease, 159
Applied research, 48–51, 51f, 56
Appraisal of Guidelines for Research and Evaluation (AGREE),
198, 199b
Apraxia, 108, 165, 166b
APTA. See American Physical Therapy Association
Archives of Physical Medicine and Rehabilitation, 31
Artifacts, 167
ASD. See Autism spectrum disorder
ASHA. See American Speech-Language-Hearing Association
Assessment studies, 13–14, 16
Assignment threat, 85t, 87, 122, 124
Attention deficit hyperactivity disorder (ADHD), 22
applied research in, 49
sampling error of, 96
in shared decision-making, 209
Attrition, 87t, 93
Audiology Research, 31
Audiotape, in data collection, 167
Audit trail, 178
Authorship, in evidence evaluation, 35
Autism, 8b, 34–35, 136
facilitated communication in, 97b
in shared decision-making, 207
Autism Speaks, 64b
Autism spectrum disorder (ASD), 64b. See also Autism
Avon Longitudinal Study of Parents and Children, 147
Axial coding, 170
Back pain, 2, 88b, 109b, 154b
as MeSH search term, 26
Bar graph, 65b
Basic Body Awareness intervention, 11b, 108–110
Basic research, 48–51, 51f, 56
BBS. See Berg Balance Scale
Bedrest, for back pain, 2
Berg Balance Scale (BBS), 91, 93–94, 104, 130, 192
Between-group comparison, 105–107, 106b, 106f, 107f
Bilateral hearing aids, 204
Bill and Melinda Gates Foundation, 34
Bimodal distribution, 61, 63f
Binge drinking, 150
Bodett, Tom, 127
Bonferroni correction, 83
Boolean operators, 26–27, 28f, 30, 30b
BOTMP. See Bruininks-Oseretsky Test of Motor Proficiency
Bracketing, 170
Brain injury, 51f, 112b, 207
Brain Trauma Foundation, 196
Breast cancer, 147, 148b
British Journal of Occupational Therapy, 25, 31
Bruininks-Oseretsky Test of Motor Proficiency (BOTMP), 140, 209
Caffeinated beverages, 2
Canadian Journal of Occupational Therapy, 31
Carotid sinus hypersensitivity, 212
Carpal tunnel syndrome, 131
Case-control design, 155, 157–159, 159t
Case reports, in evidence hierarchy, 11t, 12
Categorical data, 128
Categorical variables, 52
CDC. See Centers for Disease Control and Prevention
Ceiling effect, 138
Centers for Disease Control and Prevention (CDC), 32, 34, 38, 146
Centers for Medicare & Medicaid Services (CMS), 146
CENTRAL database, 212
Central tendency measure, 61–62, 63f
Cerebral palsy, 43, 169b
Cerebrovascular accident (CVA), 17
CFT. See Children’s Friendship Training
CHAMP. See Comprehensive High-Level Activity Mobility Predictor
Childhood obesity, 98
Children’s Friendship Training (CFT), 94–95
Chi-squared analysis, 148b
Chronic joint pain, 207
Chronic low back pain (CLBP), 191b. See also Back pain
Chronic obstructive pulmonary disease (COPD), 14, 140
CI. See Confidence interval
CIMT. See Constraint-induced movement therapy
CINAHL. See Cumulative Index of Nursing and Allied Health
Literature
CLBP. See Chronic low back pain
Client-centered practice, 5, 6b, 204–206, 205f
Client’s lived experience, 15–16, 18
Client’s use of evidence, 33b
Clinical decision-making, 3, 3f. See also Shared decision-making
Clinical guidelines. See Practice guidelines
Clinically significant difference, 138
ClinicalTrials.gov, 23t, 189
Cluster randomized controlled trial, 112. See also Randomized
controlled trial
CMS. See Centers for Medicare & Medicaid Services
Cochlear implants, 75b, 188b, 199
Cochrane Bone, Joint and Muscle Trauma Group Specialised
Register, 212
Cochrane Collaboration, 185
Cochrane Database of Systematic Reviews, 7, 24, 25, 26f
Cochrane Library, 23t, 185
Code-recode procedure, 178
Cognitive remediation, 184
Cohen’s d, 76, 192b
Cohen’s kappa, 132
Cohort study, 117, 119b, 155, 157–158, 159t
Cold and flu, 43, 45f
Common Ground website, 211
Communication, in shared decision-making, 206–211, 206b,
207f, 211b
Compensatory demoralization, 86t, 92
Compensatory equalization, 86t, 92, 92b
Comprehensive High-Level Activity Mobility Predictor
(CHAMP), 133b
Concurrent validity, 135, 136f
Condition question, 16–17
Confidence interval (CI), 76–79, 154, 154b, 212–213
Confirmability, 177t, 178
Conflict of interest, 34
Connecting data, 175, 176b
Consensus building, 208
Constant comparative method, 170
Constraint-induced movement therapy (CIMT), 41, 42b, 49, 51f
Constructivism, 165
Construct validity, 135–136, 136f
Contemporary Issues in Communication Sciences and Disorders, 31
Continuous data, 128–131
Continuous variables, 52
Continuum of client involvement, 205, 205f
Control group, 11, 12b, 40, 41, 94–95, 108, 109b. See also
Nonequivalent control group
Control variables, 53–55
Convenience sampling, 96
Convergent validity, 135, 136f
Coordination disorder, 113b
COPD. See Chronic obstructive pulmonary disease
Correlational studies. See Nonexperimental research
Correlation coefficient, 136b
Correlation matrix, 73, 73b, 151, 151t, 153b
Cost effectiveness, 122
Covariation, of age, 87–88
Crafting questions, in shared decision-making, 207
Credibility. See also Reliability; Validity
determination of, 31–34
duplicate publication in, 34
exercise in, 34–35
funding bias in, 34
impact factor in, 33
participant bias in, 86t, 91–93
peer-review process in, 33
threats to
experimenter bias in, 86t, 91–93
publication bias in, 34, 186–189, 190b, 200
in qualitative studies, 177, 177t
response bias in, 150
Criterion-referenced measure, 129–131
Critically appraised paper, 3
Critical thinking questions
of ANOVA, 79
of cross-sectional research, 56
of decision aids, 214
of EBP, 16–17, 37
of hypothesis, 56
of internal validity, 122
of publication bias, 200
of qualitative research, 56, 179
of RCT, 17
of reliability, 141
of research design efficacy, 122–123
of shared decision-making, 16, 213–214
of statistics, 78–79
of systematic review, 199–200
of validity, 100, 141
of variables, 56
Cronbach’s alpha, 134b, 142
Crossover study design, 55, 110, 112b, 123
Cross-sectional research, 47–48
critical thinking question of, 56
data collection in, 15
exercise in, 49–51
Cumulative Index of Nursing and Allied Health Literature
(CINAHL), 25, 26f, 31, 212
CVA. See Cerebrovascular accident
Data analysis, 167–168, 169b
Databases. See Health-care databases
Data collection
of cross-sectional research, 15
of longitudinal research, 15
methods of, 166–167, 167f, 168f
saturation in, 170
Decision aids, 210–213, 211b
critical thinking question of, 214
Decision-making, client inclusion in, 5, 6b
Degrees of freedom (df), 148b
Delayed treatment control group (DTC), 94–95
Dementia, 121–122, 138
Demographic variables, 87, 88b, 89, 102
Denver Developmental Screening, 132
Department of Health and Human Services, U.S., 197–198
Dependability, 177t, 178
Dependent sample t-test, 66, 67t, 71f
Dependent variables, 53–55, 54b
Depression, 14, 15b, 154b
Descriptive research
evaluation of, 157–158
group comparison in, 147–149, 148b, 149b
incidence and prevalence in, 146–147, 147b, 148b
survey research in, 149–150
Descriptive statistics, 60–64
Descriptive study design, 14
Design efficacy, 105, 106b, 112b, 115b, 115f, 122–123
Developmental coordination disorder, 147
df. See Degrees of freedom
Diabetes, 99b
Diagnostic arthroscopy, 137
Dichotomous data, 128–129
Difference studies. See Experimental research
d-Index, 193t
Directional hypothesis, 46, 56
Discrete data, 128–131
Discriminant validity, 136, 136f
Discussion section
in evidence evaluation, 37
of systematic review, 186
Divergent validity, 135–136, 136f
Down syndrome, 149, 187b
DTC. See Delayed treatment control group
Duplicate publication, 34
Dysphagia, 147b, 176b
Eating disorders, 111b
EBP. See Evidence-based practice
Ecological validity, 95t, 96–97. See also Validity
Educational Resources Information Center (ERIC), 23t
Effectiveness study, 98, 99b
Effect size (ES), 70b, 76, 79, 190–192, 191b, 192b, 193t, 200
Efficacy questions, 16
construction of, 10–11
experimental study of, 42b
levels of evidence in, 10–12, 10t, 11t, 12b
research designs for, 10–12
Efficacy studies. See Experimental research
EFPT. See Executive Function Performance Test
Ehrman, Bart, 21
Electronic databases. See Health-care databases
Eliciting information, in shared decision-making, 207
EMBASE database, 212
Embedding data, 175, 176b
EMs. See Explanatory models
EMST. See Expiratory muscle strength training
Epidemiology, 146–147
ERIC. See Educational Resources Information Center
ES. See Effect size
Eta squared (η2), 61t, 76
ETCH-M. See Evaluation Tool of Children’s Handwriting-Manuscript
Ethical issues, 36b, 41
Ethnography, 170t, 171–173, 173b
European Journal of Physiotherapy, 25
Evaluation Tool of Children’s Handwriting-Manuscript
(ETCH-M), 12b
Evidence-based practice (EBP)
components of, 3, 3f
critical thinking question of, 16–17, 37
evaluating evidence in, 7–8, 8b
evaluating outcomes in, 8, 8b
guidelines of, 197b
implement findings in, 8, 8b
introduction to, 2–6
process of, 7–8, 7f
professional organizations in, 31
question formulation in, 7, 8b, 9–16, 9t
reasons for, 6–7
shared decision-making in, 204–206, 205f
Evidence evaluation, 31–37, 36b
Evidence hierarchy
case reports in, 11t, 12
control group in, 11, 12b
expert opinion in, 11t, 12
pretest-posttest in, 11, 11t, 12b
of research design, 10–13, 10t, 11t, 12b
Executive Function Performance Test (EFPT), 132
Exercises
in confidence intervals, 78
in continuous data, 130–131
in credibility, 34–35
in cross-sectional research, 49–51
in decision aids, 211–213
in discrete data, 130–131
in EBP question formulation, 16
in effect size, 192
in eliciting information, 207
in group comparison studies, 107
in health-care database, 22, 31, 118–120, 185
in internal validity, 93–95
in level-of-evidence, 13
in practice guidelines, 198–199
in psychometric properties, 140
in PubMed, 118–120
in qualitative research, 49–51, 168, 175
in quantitative research, 49–51
in research design, 49–51, 114, 118–120, 121–122, 157
in search strategies, 30
in sensitivity and specificity, 137–138
in shared decision-making, 209–210, 211–213
in statistical conclusion validity, 84–85
in statistics, 64, 71–72
in systematic review, 185
in trustworthiness, 178–179
in validity, 84–85, 93–95, 98
in variables, 55
Experimental designs. See Research designs
Experimental research
comparison to nonexperimental, 43, 45t
control group in, 40
efficacy question of, 42b
nonrandomized controlled trial in, 41
Experimenter bias, 86t, 91–93
Expert opinion
in evidence hierarchy, 11t, 12
as Level V evidence, 159, 159t
Expiratory muscle strength training (EMST), 50
Explanatory models (EMs), 179
Ex post facto comparisons, 147–149
External responsiveness, 140
External scientific evidence, 3, 3f
External validity
compared to internal, 97–98, 99b
ecological validity in, 95t, 96–97
sampling error in, 95t, 96
threats to, 95–98, 95t, 97b, 99b
Extraneous variables, 53–55
Facilitated communication, 97b
Factorial design, 53, 114, 115b, 116b, 118, 123
Falling risk, in older adults, 154, 154t, 211–213
False negative, 136, 137f, 137t, 139b
False positive, 136, 137f, 137t, 139b
Fibromyalgia, 206b, 207f
Field notes, 167
Filters, in database searches, 27–29, 28f, 30b
FIM. See Functional Independence Measure
Finger tapping test (FT), 93–94
Fishing, 83, 83t
Floor effect, 138
Focus groups, 167
Forest plot, 192–193, 194b
Fragile X syndrome, 149
Frank, Arthur, 175b
Frequency distribution, 60–61, 62b
FT. See Finger tapping test
F-test. See Analysis of variance
Fugl-Meyer Assessment of Motor Recovery After Stroke, 14, 138
Functional Independence Measure (FIM), 72, 74, 93–94, 119b, 130, 135
Funding bias, 34. See also Publication bias
Gates Foundation, 34
General Educational Development (GED), 77b
Ginkgo biloba, 92
Glasgow Coma Scale, 130, 130b
Google Scholar, 23t, 24, 37
Grade point average, 60
Grey literature, 189
Grounded theory, 170–171, 170t, 172b
Group comparison studies, 147–149, 148b, 149b. See also Experimental
research
of between-group, 105–107, 106b, 106f, 107f
exercise in, 107
in predictive research, 155–156
of within-group, 105–107, 106b, 106f, 107f, 109b
Guide to Healthy Web Surfing, 32b
Gyrokinesis, 109b
HAPI. See Health and Psychosocial Instruments
Hawthorne effect, 87t, 92, 110, 122, 124
Hazard ratio (HR), 50, 156, 156b, 157t, 192b, 193t
Health and Psychosocial Instruments (HAPI), 23t
Health-care databases
accessing evidence in, 29–30, 29f
exercise in, 22, 31, 118–120, 185
expanding searches for, 29, 30b
introduction to, 22
search strategies of, 25–29, 26f, 27f, 28f, 29f, 30b
selection of, 22–25, 23t–24t
Hearing in Noise Test (HINT), 114
Heterogeneity, 189–190, 190b, 200
Hierarchy of evidence, 158–159, 159t
of research design, 10–13, 10t, 11t, 12b
High motor competence (HMC), 69, 70b
HINT. See Hearing in Noise Test
History threats, 85t, 89–90
HLE. See Home literacy environment
HMC. See High motor competence
Home literacy environment (HLE), 51
HR. See Hazard ratio
Hurston, Zora, 103
Hypothesis
critical thinking question of, 56
in quantitative research, 43–46
variables in, 54b
Hypothesis testing, 52, 52t
IACUC. See Institutional animal care and use committee
IADL. See Instrumental activities of daily living
ICARS. See International Cooperative Ataxia Rating Scale
ICC. See Intra-class correlation coefficient
ID. See Intellectual disability
IEP. See Individualized education plan
Impact factor, 33
Incidence, 146–147, 147b, 148b
Independent sample t-test, 66, 67t, 71f, 115b
Independent variables, 52–55, 54b
Individualized education plan (IEP), 61
Inductive reasoning, 164–165
Inferential statistics, 60. See also Statistics
for analyzing relationships, 72–76, 72f
ANCOVA in, 67t, 69–70
ANOVA in, 66–69, 67t, 69f, 70b
correlation matrix in, 73, 73b
introduction to, 65
one outcome with multiple predictors in, 74, 75b
scatterplots in, 72, 73f
significance in, 66
t-test in, 66, 67t, 68b, 71f, 78, 106b, 115b, 148b
Informants, 171–172
Institutional animal care and use committee (IACUC), 35–36, 36b
Institutional review board (IRB), 35–36, 36b
Instrumental activities of daily living (IADLs), 196b, 201
Instrumentation threats, 86t, 91
Intellectual disability (ID), 51
Interaction effect, 69, 69f, 70b, 115f
in group comparisons, 105–107, 106b, 106f, 107f
Interactive Metronome, 209, 214
Interlibrary loan system, 31
Internal consistency, 133–134, 134b, 135b, 141
Internal responsiveness, 138
Internal validity
assignment in, 85t, 87, 122, 124
attrition in, 87t, 93
compared to external validity, 97–98, 99b
critical thinking question of, 122
exercise in, 93–95
experimenter bias in, 86t, 91–93
history in, 85t, 89–90
instrumentation in, 86t, 91
maturation in, 85t, 88–89, 89b, 122, 124
mortality in, 87t, 93
participant bias in, 86t, 91–93
in RCT, 11
regression to the mean in, 86t, 90, 90f
selection in, 85t, 87
testing in, 86t, 90–91
threats to, 85–98, 85t–87t, 88b
International Committee of Medical Journal Editors, 189
International Cooperative Ataxia Rating Scale (ICARS), 104
International Journal of Therapy and Rehabilitation, 31
Inter-rater reliability, 132–133, 133b
Intervention question, 9, 16–17
Intervention studies. See Experimental research
Intra-class correlation coefficient (ICC), 132b, 133b, 134b
Introduction section
in evidence evaluation, 35
of systematic review, 185
IRB. See Institutional review board
JAMA. See Journal of the American Medical Association
Jebson Taylor Hand Function Test (JTT), 140
Johns Hopkins University, 36b
Journal of Communication Disorders, 25
Journal of Intellectual Disability Research, 51
Journal of Orthopaedic and Sports Physical Therapy, 195–196
Journal of Speech, Language, and Hearing Research, 31
Journal of the American Medical Association (JAMA), 29
JTT. See Jebson Taylor Hand Function Test
Key word search, 25–26, 25f, 26f, 30, 30b
Kinesio taping, 24, 27, 108
Knee injury, 22
Labral tear, of the shoulder, 137
Lee Silverman Voice Treatment (LSVT), 5
Level I evidence, 11, 11t, 158, 159t, 186, 199
Level II evidence, 11, 11t, 108, 158, 159t
Level III evidence, 11, 11t, 110, 158, 159t
Level IV evidence, 11–12, 11t, 12b, 108, 158–159, 159t
Level of significance, 66
Levels of evidence, 10–13, 10t, 11t, 12b, 158–159, 159t
Level V evidence, 12, 159, 159t
Librarian, 30
Lift for Life, 99b
Likert scale, 129, 129f
Limits, in database searches, 27–29, 28f, 30b, 37
Linear regression, 74, 75b, 78t, 152, 153b
Literature searches
accessing evidence in, 29–30, 29f
expanding of, 29, 30b
introduction to, 22
selection of databases in, 22–25, 23t–24t
strategies of, 25–29, 26f, 27f, 28f, 29f, 30b
Lived experience, 15–16, 18
LLL. See Lower-limb loss
LMC. See Low motor competence
Logistic regression, 74–76, 75t, 77b, 78t. See also Multiple linear
regression
Longitudinal research, 15, 47–48, 48b, 49–51
Lower back pain, 88b, 109b. See also Back pain
Lower-limb loss (LLL), 133b
Low motor competence (LMC), 69, 70b
Low power, 83–84, 83t
Low vision, 196b–197b
LSVT. See Lee Silverman Voice Treatment
MA. See Mental age
Matching study, 87
Maturation threats, 85t, 88–89, 89b, 122, 124
Mayo Clinic, decision aids by, 211–212
Mayo Clinic Health Letter, 25
MCID. See Minimally clinically important difference
Mean, 61, 63f, 65b
Measurement error, 131. See also Reliability
Measure of central tendency, 61–62, 63f
Median, 61, 63f
Medical research librarian, 30
Medical Subject Headings (MeSH®), 24–26, 25f, 30, 30b
Medline database, 23t, 32, 32b, 212
Member checking, 177
Memory or Reasoning Enhanced Low Vision Rehabilitation (MORE-LVR), 109b
Mental age (MA), 51
Merging data, 175, 176b
MeSH®. See Medical Subject Headings
Meta-analysis, 190–192, 190f, 191b, 200
Method error, 131. See also Reliability
Methods section, 35–36, 36b, 185
Michigan Hand Outcomes questionnaire (MQ), 140
Mind/body interventions, 206, 207f
Mini-BESTest, 91
Minimally clinically important difference (MCID), 140
Mirror therapy, 7, 78
Misconduct, 36b
Mixed-method research, 175, 176b
Mixed model analysis of variance, 67t, 69
Mode, 61, 63f
MORE-LVR. See Memory or Reasoning Enhanced Low Vision Rehabilitation
Mortality threats, 11b, 87t, 93
Motor coordination disorder, 209
MQ. See Michigan Hand Outcomes questionnaire
MSFC. See Multiple Sclerosis Functional Composite
Mulgan, Geoff, 203
Multicollinearity, 152, 152f
Multiple linear regression, 152, 152b, 153b
Multiple logistic regression, 152–154, 154b, 154t
Multiple sclerosis, 14, 15b, 18, 65b
decision aids in, 211b
qualitative research of, 46, 48b
shared decision-making in, 208
Multiple Sclerosis Functional Composite (MSFC), 104
NARIC. See National Rehabilitation Information Center
Narrative research, 170t, 173–174, 174b
Narrative reviews, 184
National Center for Advancing Translational Sciences, 49
National Guideline Clearinghouse (NGC), 197–198
National Health and Nutrition Examination Surveys (NHANES), 146, 150
National Institute for Health and Clinical Excellence (NICE), U.K., 188b
National Institutes of Health (NIH), 34, 49
Public Access Policy of, 30
National Library of Medicine, 24, 24t, 32
National Rehabilitation Information Center (NARIC), 23t
National Stroke Foundation, 196
Naturalistic inquiry, 164
Naturalistic observation, 165, 166b
Negatively skewed distribution, 61–64, 63f
Neuromotor task training (NTT), 113b
Neuroplasticity, 49, 51f
News media, in evaluating evidence, 32
NGC. See National Guideline Clearinghouse
NHANES. See National Health and Nutrition Examination Surveys
NICE. See National Institute for Health and Clinical Excellence
NIH. See National Institutes of Health
Nintendo Wii Fit training, 113b
Nondirectional hypothesis, 46
Nonequivalent control group, 113–114, 113b
Nonexperimental research, 41–43, 44b, 45f, 45t. See also Descriptive research; Predictive research
Nonrandomized controlled trial, 11, 11t, 41, 110–114, 113b
Normal distribution, 61–64, 63f, 64f
Norm-referenced measure, 129–131, 141
NTT. See Neuromotor task training
Nuffield Dyspraxia Programme, 108
Obesity, 147
O’Brien’s Test, 137–138
Observational study. See Nonexperimental research
Occupational therapy (OT), 22, 77b
Occupational Therapy Code of Ethics and Ethics Standards, 6–7
Odds ratio (OR)
in effect size, 192b, 193t
hazard ratio compared to, 156, 156b, 157t
in logistic regression, 74–76, 75t, 77b
in multiple logistic regression, 152–154, 154b, 154t
risk ratio compared to, 156, 156b, 157t
Omega squared (ω²), 61t, 76
One-way analysis of variance, 67t, 68, 71f, 115b, 148–149, 148b, 149b
Open coding, 170
Open-ended interview, 166, 175b
OR. See Odds ratio
Order effects, 86t, 91
Orthopedic surgeon, 6b
Orthotic devices, 24
OT. See Occupational therapy
OT Practice, 25
OT Search, 23t. See also American Occupational Therapy Association
OTseeker database, 23t, 184–185
Ottawa Hospital Research Institute, 210
Outcome question, 16–18
Oxford Centre for Evidence-Based Medicine, 158–159, 159t
Pacemakers, 212
Parkinson’s disease, 5, 49–50, 65b, 97, 115b
nonexperimental research in, 43, 44b
Participants
bias of, 86t, 91–93
observation by, 171–172
selection of, 165
Patterned Elicitation Syntax Test, 129
PCI. See Phase coordination index
Pearson correlation coefficients, 153b
Pearson product-moment correlation, 73, 78t, 132b, 136b, 151b
PEDI. See Pediatric Evaluation of Disability Inventory
Pediatric Evaluation of Disability Inventory (PEDI), 135
PEDro
database of, 23t, 184–185
ratings of, 22b, 24t
scale of, 121–122, 121b, 123
Peer-review process, 33, 37
Phalen’s Test, 131, 132
Phase coordination index (PCI), 44b
Phenomenology, 168–170, 170t, 172b
Phonological Awareness Test, 74, 75b, 131
Photovoice, 167, 168f
Physical Therapy Journal, 31
Physiotherapy Evidence Database, 121
PICO question format, 10, 16
Positively skewed distribution, 61–64, 63f
Post hoc analysis, 115b
Posttest, in evidence hierarchy, 11, 11t, 12b
Posttraumatic stress disorder (PTSD), 97, 165
Power. See Low power
Practice effects, 86t, 90–91
Practice guidelines, 195–196, 196b–197b, 198f
applying and using, 199
evaluating strength of, 198–199, 199b
exercise in, 198–199
finding of, 184–185, 197–198
as Level I evidence, 158, 159t
reading of, 185–186, 187b, 188b
Practitioner experience, 3–5
Predictive research
case-control design in, 155, 157–159, 159t
cohort studies in, 155, 157–158, 159t
correlation matrix in, 73, 73b, 151, 151t, 153b
evaluation of, 157–158
group comparison studies in, 155–156
hazard ratios in, 50, 156, 156b, 157t, 192b, 193t
multiple linear regression in, 152, 152b, 153b
multiple logistic regression in, 152–154, 154b, 154t
odds ratios in, 152–154, 154b, 154t, 156, 156b, 157t
research design in, 14–15
risk ratios in, 155–156, 156b, 156t, 157t
using correlational methods, 150–154, 151b
Predictive validity, 136, 136f
Pre-experimental research, 41
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), 184
Pretest-posttest
ANOVA in, 69, 69f
control group in, 109b
in group comparisons, 105–107, 106b, 106f, 107f
research design of, 11, 11t, 12b, 41
t-test comparison in, 68b
Prevalence, 146–147, 147b, 148b, 157
Primary source
in evidence evaluation, 31–32
in systematic review, 184
Print awareness, 74, 75b
PRISMA. See Preferred Reporting Items for Systematic Reviews and Meta-Analyses
Professional organizations, 31
Prolonged engagement, 177
Prospective cohort study, 155, 157, 158, 159t
Provocative test, 137–138
Psychometric properties, 128
exercise in, 140
internal consistency of, 133–134, 134b, 135b, 141
inter-rater reliability of, 132–133, 133b
responsiveness of, 138–140, 141
sensitivity and specificity of, 136–138, 137b, 137f, 137t, 139b
test-retest reliability of, 132, 133b
validity in, 134–138, 136f, 139b
Psychosis, 175
PsycINFO database, 24t
PT in Motion, 25
PT Now, 31
PTSD. See Posttraumatic stress disorder
Public Access Policy, 30
Publication bias, 34, 186–189, 190b
critical thinking question of, 200
Public Press, 32
PubMed, 24–25, 24t, 25f
exercise in, 118–120
search terms for, 27–29, 27f, 28f, 29f, 30, 30b
PubMed Central, 24t
Purposive sampling, 165
p value, 52, 148b
Pygmalion effect, 86t, 91–92
QALY. See Quality-adjusted life year
Qualitative research
analysis in, 167–168
critical thinking questions of, 56, 179
data collection in, 166–167, 167b, 168f
designs of, 168–176, 170t
differences to quantitative research, 46–47, 47t
ethnography in, 170t, 171–173, 173b
exercise in, 49–51, 168, 175
grounded theory in, 170–171, 170t, 172b
mixed-methods research in, 175, 176b
of multiple sclerosis, 46, 48b
narrative in, 170t, 173–174, 174b
phenomenology in, 168–170, 170t, 172b
philosophy of, 164–165, 166b
properties of, 176–178, 177t
questions in, 165
selection in, 165, 166b
themes of, 46, 167–168
example of, 169b, 171b
synthesis of, 193–195, 195b
Quality-adjusted life year (QALY), 122
Quantitative research, 43–46
differences to qualitative research, 47t
exercise in, 49–51
Quasi-experimental study, 41, 113–114, 113b
QuickSIN, 114
Quotation, in research, 167, 169b, 171b, 175
RA. See Rheumatoid arthritis
Random assignment, 11, 85t, 87–88, 95t
Randomized controlled trial (RCT)
critical thinking question of, 17
in evidence-based questions, 10, 11, 11t, 13b, 22, 40
in research design, 104–105, 108–110, 111b
Random sampling, 95t, 96
Range, 62–64, 63f
Rapid Syllable Transition Treatment, 108
Rate ratio (RaR), 212–213
RCT. See Randomized controlled trial
Reading articles for evidence, 35–37
References, in evidence evaluation, 37
Reflective practitioner, 3–5
Reflexive journal, 178
Regression analysis, 74, 151–154, 152b, 153b, 154b, 154t. See also Linear regression; Logistic regression
Regression to the mean, 86t, 90, 90f
Related citations function, 29f
Reliability. See also Credibility; Validity
critical thinking question of, 141
internal consistency in, 133–134, 134b, 135b, 141
inter-rater conduct in, 132–133, 133b
relationship to validity of, 138
of standardized tests, 131–132
test-retest score in, 132, 133b
understanding statistics in, 131b, 132b
Renfrew Bus Story, 136
Repeated measures analysis of variance, 67t, 69, 71f, 72f
Replication
to promote generalizability, 97
in scientific method, 3
in systematic review, 186, 190b
Research designs
ANOVA in, 105, 106b, 112b, 115b, 115f
in assessment studies, 13–14, 16
case-control design in, 155, 157–159, 159t
for client’s lived experience, 16
control group in, 108, 109b
crossover study in, 55, 110, 112b, 123
for efficacy questions, 10–12
evaluation by scale of, 120–122, 121b
evidence hierarchy of, 10–13, 10t, 11t, 12b
exercise in, 49–51, 114, 118–120, 121–122, 157
factorial-type of, 53, 114, 115b, 116b, 118, 123
nonequivalent control group in, 113–114, 113b
in predictive studies, 14–15
pretest-posttest in, 11, 11t, 12b, 41
in qualitative research, 168–176, 170t
RCT in, 104–105, 108–110, 111b
of single-subject, 117, 118b, 122
Researcher bias. See Experimenter bias
Research librarian, 30
Response bias, 150. See also Participants
Response rate, 84, 150. See also Sample size
Responsiveness measure, 138–140, 141
Results section
in evidence evaluation, 36, 37
of systematic review, 185–186, 187b
Retrospective cohort study, 117, 119b, 155
as Level III evidence, 158, 159t
Retrospective intervention study, 117
Rheumatoid arthritis (RA), 155
Risk ratio (RR), 155–156, 156b, 156t, 157t, 212–213
RMRT. See Route Map Recall Test
Rosenthal effect, 86t, 91–92, 122, 124
Rotator cuff injury, 171, 172b
Route Map Recall Test (RMRT), 138
RR. See Risk ratio
Rush Memory and Aging Project, 159
Russell, Bertrand, 145
r value, 192b, 193t
Sackett, David, 2, 3–5
Sample size, 120, 158. See also Effect size
Sampling error, 95t, 96
Saturation, in data collection, 170
Scatterplots, 72, 73f
Schizophrenia, 14, 54b, 78
Scholarly publication credibility, 33–34
Scientific method, 3
Scoring and measures, 128–129, 129–130
Search strategies, 25–30, 26f, 27f, 28f, 29f, 30b
Search terms, 26–29, 26f, 27f, 28f
Secondary research. See Systematic review
Secondary source evaluation, 32
Selection threat, 85t, 87
Selective coding, 170
Self-reporting issues, 150
Self-Stigma of Stuttering Scale, 134, 134b
Sensitivity and specificity, 13–14, 136–138, 137b, 137f, 137t, 139b, 141
Shared decision-making, 5, 33b
ADHD in, 209
agreement in, 210
autism in, 207
Common Ground website in, 211
communication in, 206–211, 206b, 207f, 211b
components of, 208–211, 211b
crafting questions in, 207
critical thinking question of, 16, 213–214
decision aids in, 210–213, 211b
in EBP, 204–206, 205f
eliciting information in, 207
exercise in, 209–210, 211–213
multiple sclerosis in, 208
Shaw, George, 163
Simple correlation, 150–151, 151f
Single-blinded study, 113b
Single-subject designs, 117, 118b, 122
Six-Minute Walk test, 140
Skewed distribution, 61, 63f
Snowball sampling, 165
Sort by relevance search strategy, 27–29, 28f
Spearman correlation, 73, 78t, 136b, 151b
Specificity. See Sensitivity and specificity
SpeechBite, 24t
Speech impairment, 187b
Speech recognition, predictors of, 153b
Spinal cord injury, 16
SPORTDiscus, 24t
SPP. See Student Performance Profile
SS-QOL. See Stroke Specific Quality of Life
Standard deviation, 62–64, 63f, 64f, 65b
Standardized test, 131–132
Statistical conclusion validity, 82–83, 83t
exercise in, 84–85
type I and II errors of, 52, 52t, 97, 115b
Statistical significance (or Statistically significant), 66, 138
Statistics. See also Inferential statistics; Understanding statistics
central tendency in, 61–62, 63f
confidence intervals in, 76–79
critical thinking questions of, 78–79
effect size in, 70b, 76, 79, 190–192, 191b, 200
exercise in, 64, 71–72
frequency distribution in, 60–61, 62b
standard deviation in, 62–64, 63f, 64f, 65b
symbols of, 60, 61t
variability in, 62–64, 63f, 64f
Storytelling, in data collection, 175b
STREAM. See Stroke Rehabilitation Assessment of Movement
Strength and balance training, 13b
Stroke, 3, 16–18, 72, 135, 149b, 192
mirror therapy for, 7, 78
retrospective cohort study of, 119b
Stroke Impact Scale, 135
Stroke Rehabilitation Assessment of Movement (STREAM), 135
Stroke Specific Quality of Life (SS-QOL), 72
Student Performance Profile (SPP), 62b
Study heterogeneity, 189–190
Study setting selection, 165, 166b
Survey research, 149–150
Swaddling, as database search term, 26, 27f
Symbols, of statistics, 60, 61t
Systematic review, 10–11
abstract in, 185, 186, 188b
applying and using, 199
components of, 188b, 190b, 201
critical thinking questions of, 199–200
data analysis in, 190–195, 190f, 191b, 193t, 194b
discussion section in, 186
evaluating strength of, 186–190, 190b
exercise in, 185
introduction section in, 185
primary source in, 184
replication in, 186, 190b
results section in, 185–186, 187b
Tai Chi, 97, 208, 212–213, 215
TAP. See Television assisted prompting
TBI. See Traumatic brain injury
Television assisted prompting (TAP), 122b
Test designs. See Research designs
Testing threats, 86t, 90–91
Test reliability
critical thinking question of, 141
internal consistency in, 133–134, 134b, 135b, 141
of inter-rater conduct, 132–133, 133b
relationship to validity of, 138
of standardized tests, 131–132
of test-retest score, 132, 133b
understanding statistics in, 131b, 132b
Themes, of qualitative research, 46, 167–168
example of, 169b, 171b
synthesis of, 193–195, 195b
The Wounded Storyteller: Body, Illness and Ethics (Frank), 175b
Thick description, 177–178
Third variable problem, 43, 45f
Threats, to credibility of research
of alternative treatment, 89
of assignment, 85t, 87, 122, 124
of attrition, 87t, 93
of ecological validity, 95t, 96–97
of experimenter bias, 86t, 91–93
of external validity, 95–98, 95t, 97b, 99b
of funding bias, 34
of history, 85t, 89–90
of instrumentation, 86t, 91
of internal validity, 85–98, 85t–87t, 88b
of maturation, 85t, 88–89, 89b, 122, 124
of mortality, 11b, 87t, 93
of participant bias, 86t, 91–93
of publication bias, 34, 186–189, 190b, 200
of response bias, 150
of selection, 85t, 87
of testing, 86t, 90–91
Thumb osteoarthritis, 204
Timed up and go test (TUG), 93–94
TIS. See Trunk Impairment Scale
Title, in evidence evaluation, 35
TMW. See Two-minute walk test
Trait error, 131
Transferability, 177–178, 177t
Translational research, 49, 51f
Traumatic brain injury (TBI), 165
Triangulation, 177
True experiment. See Randomized controlled trial
Trunk Impairment Scale (TIS), 104, 105, 107
Trustworthiness, 176–179, 177t
t-test, 66, 67t, 68b, 71f, 78, 106b, 115b, 148b
TUG. See Timed up and go test
Twain, Mark, 81
Two-minute walk test (TMW), 93–94
Type I error, 52, 52t, 97, 115b
Type II error, 52, 52t
Understanding statistics
of ANOVA, 115b
of correlation, 136b, 151b
of factorial design, 115b
of group comparisons, 106b, 148b
incidence and prevalence in, 147b
of meta-analysis, 192b
of reliability, 131b, 132b
in research design, 112b
of sensitivity and specificity, 137b
Unethical practices, 36b
Vaccination, 34–35
Validity, 82. See also Credibility; Ecological validity; Reliability
of assessment tools, 13
critical thinking questions of, 100, 141
exercise in, 84–85, 93–95, 98
external threats to, 95–98, 95t, 97b, 99b
internal threats to, 85–98, 85t–87t, 88b
of statistical conclusions, 82–85, 83t
types of, 134–138, 136f, 139b
understanding statistics in, 136b, 137b
Valtin, Heinz, 2
Variability, measures of, 62–64, 63f, 64f
Variable problem, 43, 45f
Variables, 52–55, 54b
critical thinking question of, 56
Variance, 75b, 151f. See also Analysis of covariance; Analysis of variance
Very early mobilization (VEM), 192
Video, 166b, 167
Virtual shopping, 149b
Vitamin D supplementation, 114, 116b, 212–213
Water, daily intake of, 2
WBV. See Whole body vibration therapy
Websites, in evidence credibility, 32, 32b
WeeFIM, 14
Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), 136
Whole body vibration therapy (WBV), 114, 116b
Wii Fit training, 113b
Within-group comparison, 105–107, 106b, 106f, 107f, 109b
Wolf Motor Function Test, 14, 17
WOMAC. See Western Ontario and McMaster Universities Osteoarthritis Index
Woodcock-Johnson Writing Fluency and Writing Samples test, 12b
World Confederation of Physical Therapy, 7
Yoga, 191, 191b