What you need to know to perform biofeedback effectively
Article 3. Why you need to know about research design and the placebo effect
By Richard A. Sherman, Ph.D.

Biofeedback practitioners benefit by knowing how to evaluate clinical research to assess whether a technique is efficacious. Common use of a technique is not a reliable indication of its efficacy. Balancing all the factors helps you, the professional, make your own decision about whether a technique is efficacious or even worthy of a trial.

Keeping an Open Mind

The Weight of Evidence

Many techniques currently in use will likely change or be modified in the future as scientific discovery contributes to our knowledge base. More studies supporting or detracting from the use of myriad techniques will be published. New techniques for assessment and control will be promulgated. As a biofeedback practitioner, you would benefit by having sufficient skills in searching the literature and assessing its quality to determine whether it is useful and clinically applicable.

Do you know how effective you are as a clinician? A problem with the acceptance of biofeedback is a concern with unsubstantiated interventions incorporating a variety of techniques that are applied without a reasonable body of clinical evidence supporting their efficacy. The combination of 1) not enough properly designed studies with adequate numbers of subjects and sufficiently long follow-ups to be convincing and 2) the use of unsubstantiated techniques is a common concern raised by those wary of biofeedback. Credible clinical studies are necessary to support the use of specific biofeedback modalities and protocols in the treatment of any disorder.

There are nine key elements of a credible clinical study/publication:
1) adequate diagnosis of the subjects.
2) adequate pretreatment baseline to establish symptom variability.
3) objective outcome measures relevant to the disorder.
4) intensity of the intervention sufficient to produce an effect.
5) a way to ascertain whether the intervention was successful (was the drug taken properly, or was the behavioral technique successfully learned and then used?).
6) sufficient patients/subjects to make the result credible.
7) appropriate design for the question (e.g., single group, controls, believable placebo, etc.).
8) sufficient descriptive statistics so the results are clear.
9) long enough follow-up so the duration of results can be established.

There are five criteria for accepting a new technique as efficacious. Many organizations, such as the American Psychological Association (APA) and AAPB, have adopted requirements such as the following for determining that a treatment has been shown to be efficacious:
1) two studies with appropriate design and controls (group design or a series of single case design studies).
2) studies conducted by different researchers.
3) studies demonstrate clinical efficacy. The new treatment must be shown to be efficacious in comparison with medication, placebo, or another treatment, or to be as effective as an established treatment for the same problem.
4) waiting list controls are not sufficient for demonstrating efficacy (no expectation effect).
5) the diagnostic criteria must be clearly specified.

Fatal flaws occur when the investigators are not sufficiently knowledgeable about:
1) the disorder they are working with.
2) their own recording techniques.
3) the outcome measures needed to ask the right question.
4) the basic elements of research design.

Look for these flaws when you read a study or hear about an idea so you can assess whether to use a technique.

One common design flaw is a failure to anchor the study's outcome measures to the population having the problem being investigated. A good example is an early study on behavioral treatment of cancer. The design compared a group therapy intervention to the records of similar patients. The outcome measure was years of survival, with significance determined by the difference in the average number of years of survival of each group. The investigators reported that the patients receiving group therapy survived significantly longer than the control group and concluded that group therapy was probably the reason for the longer survival. The investigators did not compare the survival rates of their tiny groups to the huge database of similar cancer patients starting at the same stages of the same type of cancer. In fact, 1) their groups were so small that known variability in survival made it very likely that one group would have an average survival time far longer than the other just by chance, and 2) their failure to review well-known life table data on survival times caused them to miss the crucial point that the apparently longer survival of several participants was not out of line with the population. Sadly, the controls died earlier than would be expected, and numerous studies have now shown behavioral interventions to be ineffective for cancer survival.
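To make the small-group problem concrete, here is a minimal simulation sketch in Python. The survival figures in it (group size, median survival, the 1.5-year threshold) are hypothetical illustrations, not data from that study; the point is only that two small groups drawn from the same skewed survival distribution will frequently differ by what looks like a clinically meaningful margin purely by chance.

# Minimal sketch: how often do two small groups drawn from the SAME survival
# distribution differ by an apparently impressive margin? All figures below
# are hypothetical illustrations, not data from the cancer study.
import random
import statistics

random.seed(1)

GROUP_SIZE = 10          # tiny groups, as in the flawed study design
TRIALS = 10_000          # number of simulated "studies"
MEDIAN_SURVIVAL = 3.0    # hypothetical median survival in years

big_gaps = 0
for _ in range(TRIALS):
    # Skewed (exponential) survival times; both groups come from the same population.
    rate = 0.693 / MEDIAN_SURVIVAL   # ln(2) / median gives the exponential rate
    group_a = [random.expovariate(rate) for _ in range(GROUP_SIZE)]
    group_b = [random.expovariate(rate) for _ in range(GROUP_SIZE)]
    if abs(statistics.mean(group_a) - statistics.mean(group_b)) >= 1.5:
        big_gaps += 1

print(f"Mean survival differed by 1.5+ years in {100 * big_gaps / TRIALS:.0f}% "
      "of simulated studies, even though no treatment effect exists.")

Without anchoring the comparison to population life-table data, a gap of that size between two tiny groups says very little.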
Another commonly encountered problem is a failure to analyze the data correctly because the investigators did not understand the outcome measures or human variability. For example, typical ESP studies have participants try to guess which one of five possible shapes a "sender" is thinking about. There are usually 25 cards in a deck, so a participant has a one in five chance of guessing each card correctly. If there is no ESP, the participant would be expected to guess about five of the 25 cards correctly by chance alone, but if he or she guesses substantially more or fewer than five, that person may be declared to have ESP. If enough people are tested, a few will guess a greater or lesser number correctly purely by chance. Regression to the mean tells us that the person who guesses unusually more or fewer correct answers than chance is likely to guess at about chance level on the next test. Unfortunately for proponents of ESP, this is just what happens nearly all the time when such studies are correctly analyzed.
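As a hedged illustration of that arithmetic, the short Python sketch below (simulated guesses only, not data from any actual ESP study) has a large number of participants guess 25 five-symbol cards purely at random, flags the high scorers who would look "psychic," and then retests only those people.

# Minimal sketch: chance guessing on 25 five-symbol cards, and regression
# to the mean on retest. Purely illustrative; no real ESP data are used.
import random
import statistics

random.seed(2)

CARDS, SYMBOLS = 25, 5      # one-in-five chance per card; about 5 expected hits
PARTICIPANTS = 1_000

def run_session():
    """Correct guesses in one 25-card run when every guess is pure chance."""
    return sum(random.randrange(SYMBOLS) == 0 for _ in range(CARDS))

first_round = [run_session() for _ in range(PARTICIPANTS)]
high_scorers = [score for score in first_round if score >= 9]   # "apparent ESP"
retests = [run_session() for _ in high_scorers]                  # same people, new deck

print(f"Expected correct guesses by chance: {CARDS // SYMBOLS}")
print(f"Participants scoring 9 or more on the first round: {len(high_scorers)}")
print(f"Average retest score for those same high scorers: {statistics.mean(retests):.1f}")

On essentially any run, a handful of people score well above chance the first time, and those same people average about five correct on the retest, which is the regression to the mean described above.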
It's important to know the tools, equipment, and the recording methodology utilized in a study. An example of a past flawed methodology is the use of alpha EEG biofeedback for anxiety. Alpha frequency waves look just like muscle tension signals from the eyes: filtered versions of a wave could be either alpha EEG or electrooculogram (EOG) activity.

[Figure: filtered waveforms illustrating that alpha EEG and EOG signals look identical.]

The early biofeedback devices couldn't tell the difference, and neither could some early clinicians, so all they did was teach people to increase eye muscle tension by rolling their eyes upward. The methodology was flawed, but the treatments worked because people learned to sit quietly in a dim room and effective relaxation exercises were given as homework.

It's also important to know the physiology and the disorder being treated. One of my early hypertension studies showed that clinically significant changes occurred as a result of habituation to the "biofeedback" recording environment. The following illustration shows changes in blood pressure over eight weeks when recordings were made for groups of hypertensives seen twice (blue circles) or eight times with either no treatment (dotted line) or actual biofeedback treatment (solid line).

[Figure: blood pressure (BP) across weekly sessions, weeks 1 - 8.]

Look at the chart below of finger temperature baseline instability. The subjects were technicians who had never been exposed to biofeedback devices. In this study, they are gradually relaxing, so blood flow, and thus skin temperature, is increasing. It looks like a learning curve, but the subjects couldn't see the display.

[Figure: finger temperature versus time in minutes.]

Here is an illustration of the difference between learning a technique and changes in the intensity of a clinical problem. The table below shows the results of a clinical study that used biofeedback to treat a disorder. The authors sent only the left column of the table in with their paper for review and concluded biofeedback did not work for the disorder. However, there was sufficient data in the paper to construct the middle and right columns. The information in the middle column, subjects who learned the skill, and the right column, subjects who did not learn the skill, puts the original improvement data in an entirely different perspective. With this additional data, we are able to conclude that while most subjects did not learn the skill, most of those who did showed clinical improvement, and most of those who did not showed no clinical improvement.

                                      Original results   Learned the skill   Did not learn the skill
Subjects who showed no improvement          30                   6                     24
Subjects who improved                       21                  18                      3

Nonspecific/placebo effects are the bane of uncontrolled clinical studies. They also help to explain why all the "new" treatments work. New treatments are initially tested using the pretreatment baseline - intervention - posttreatment baseline (A-B-A) design with no control group. With this design, changes resulting from natural fluctuations in the disorder's intensity (as with acute lower back pain) and from nonspecific/placebo effects cannot be separated from treatment effects. Consider the typical 30% placebo cure rate for headache. Nonspecific effects include patient-therapist bonding, placebo effects, changes with time, expectations, etc. Follow-ups are usually too short to observe the placebo effects wearing off.

A good placebo control includes the following:
1) treatment expectation effects.
2) a placebo effect from the belief that the treatment can work or is working.
3) habituation to the treatment environment and sufficient duration to elucidate changes with time.
4) good therapist-patient bonding, with the therapist giving general support and the expectation that the treatment will work.

The following figure contrasts the results from a realistic placebo control (blue) and an actual treatment (red). The hatch marks indicate variability of responses.

[Figure: outcome over time for a realistic placebo control versus an actual treatment, with hatch marks indicating response variability.]

Controls, especially realistic placebo controls, are critical because of the placebo effect and changes in problems with time. As an example, as long ago as 1976, Dohrman and Laskin (Journal of Dental Research 55: 249) conducted a study of 24 patients diagnosed as having jaw-area pain related to muscle tension. Eight patients were in a placebo group, and 16 were given biofeedback. Three-quarters of the patients treated with biofeedback showed "significant improvement of clinical symptoms and required no further treatment." This sounds good, but unfortunately for people who believe that controls aren't needed, half of the controls had comparable results. There was no long-term follow-up, so there was no opportunity to know whether the placebo effect wore off (it usually lasts six months or so) or whether the pain simply returned on its own.
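To see why a result like this is unconvincing on its own, here is a minimal sketch using a standard Fisher's exact test from SciPy. The 12-of-16 and 4-of-8 counts are reconstructed from the percentages quoted above purely for illustration; with groups this small, a 75% versus 50% improvement rate is well within what chance alone can produce.

# Minimal sketch: is 12/16 improved (biofeedback) versus 4/8 improved (placebo)
# statistically distinguishable from chance? Counts are reconstructed from the
# percentages quoted above, purely for illustration.
from scipy.stats import fisher_exact

table = [[12, 4],   # biofeedback group: improved, not improved
         [4, 4]]    # placebo group:     improved, not improved

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"Odds ratio: {odds_ratio:.1f}, two-sided p-value: {p_value:.2f}")
# The p-value comes out far above 0.05, so the apparent advantage of
# biofeedback over placebo in this small sample could easily be chance.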
Elton offers an excellent review of the placebo effect on pain (Elton et al.: Psychological Control of Pain. Grune & Stratton, New York, 1983). An especially good article is by Finniss and Benedetti (Pain 114: 3-6, 2005), which discusses mechanisms of the placebo response and their impact on clinical trials and clinical practice. A good article demonstrating the power of the placebo response is one by Price et al. (Pain 127: 63-72, 2007), which shows that placebo analgesia results in a large reduction in pain-related brain activity in patients with irritable bowel syndrome. This is one of many studies beginning to come out showing that changing how the brain processes pain changes pain perception, regardless of the source of the pain or the method used to change the brain's processing. Discover and Scientific American have published several articles on brain scans showing that placebos stop pain by changing processing of the signals (e.g., Epstein in Discover, page 26, Jan. 2006; Choi in Scientific American, page 36, Nov. 2005; and Ruvin in Discover, April 2006). For more on the brain and pain, you may want to look at Nicoll and Alger's article on the brain making its own pain relievers (Scientific American, Dec. 2004).

Open studies don't show that a technique is actually effective. Single-subject and single-group designs are important for demonstrating that a change in significant outcome measures, such as pain intensity or ability to walk further, takes place between the beginning and end of the study period, and therefore whether it is worth progressing to a much more extensive, complex design. Evaluation of a technique's efficacy can't stop at open studies because the change could just as easily be the result of time alone or placebo effects. This is why single-group studies indicating effectiveness of interventions for low back pain, such as (a) chiropractic manipulation and (b) low back surgery, have little to no credibility in the medical community. As soon as a control for change with time is introduced, it turns out that subjects receiving chiropractic do no better than those not receiving any treatment. (Most of the comparison studies are "population change" based, in which changes during a chiropractic study are compared with changes expected in the population.) As soon as a comparison control with other treatments is included, it turns out that surgery for low back pain is no better than well designed, intense behavioral and strengthening programs. When both are compared with no intervention, their results are less than impressive.

Tiny controlled studies of behavioral interventions frequently have fatal flaws. First, waiting list control groups do not have expectation/nonspecific/placebo effects. Also, the people on the waiting lists are frequently very different from those in the study: they have sometimes turned down participation in the behavioral study, are too poor to make the required trips, or are involved in other situations. Another problem is that the typical 10-subject behavioral study is too small to adequately represent true patterns in the general population of people with the disorder. Such studies are also too small to avoid the pitfall of looking good or bad because of unusual reactions by a few people who are either very sensitive or insensitive to the treatment. The final issue is that follow-ups are generally, but not always, too short to observe placebo effects wearing off.
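As a rough illustration of the small-sample problem, the Python sketch below uses entirely made-up numbers (a modest true benefit, normal person-to-person variability, and an occasional extreme responder) to show how widely the apparent result of a 10-subject study can swing from one sample to the next.

# Minimal sketch: how much can a typical 10-subject study's result swing purely
# because of sampling and a few unusually strong or weak responders?
# All numbers are hypothetical illustrations.
import random
import statistics

random.seed(3)

N_SUBJECTS = 10
N_STUDIES = 5_000
TRUE_MEAN_BENEFIT = 10   # assumed average symptom reduction in the population
TYPICAL_SPREAD = 15      # person-to-person variability around that average

def one_small_study():
    """Average improvement observed in a single simulated 10-subject study."""
    improvements = []
    for _ in range(N_SUBJECTS):
        benefit = random.gauss(TRUE_MEAN_BENEFIT, TYPICAL_SPREAD)
        if random.random() < 0.10:                 # occasional extreme responder
            benefit += random.choice([-40, 40])
        improvements.append(benefit)
    return statistics.mean(improvements)

results = [one_small_study() for _ in range(N_STUDIES)]
print(f"Assumed true average benefit: {TRUE_MEAN_BENEFIT}")
print(f"Smallest study mean observed: {min(results):.1f}")
print(f"Largest study mean observed:  {max(results):.1f}")
print("Studies showing no benefit or apparent harm: "
      f"{100 * sum(r <= 0 for r in results) / N_STUDIES:.0f}%")

With these assumed numbers, individual 10-subject samples range from apparent harm to an exaggerated benefit around the same modest true effect.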
Look beyond the "weight of clinical experience" or thinking along the lines of "if the technique is in use now, it must be efficacious." History is full of examples illustrating this. When the idea of pre-surgery handwashing was first extolled, it was considered a waste of time, certainly not something that would make a difference in health care. It was only accepted after a comparative controlled study examined surgical survival. Virtually all of the techniques that were in use around the time of World War I are no longer used. At the time, everybody knew they worked and laughed at the new ideas. Nearly all of the drugs and techniques that everyone supported during the World War II era are gone as well. Many of today's accepted techniques have never been subjected to controlled study, so it's virtually impossible to evaluate them. At least some of the techniques which are labeled alternative and complementary today will survive to be the standard techniques of tomorrow, and we will look back and laugh at techniques that we once swore by. With better education of clinicians, more audits by record keeping agencies, and enforcement of the law, more techniques will need to be proven effective before they reach wide use.

Keeping credibility in mind, here are five positive signs to look for as you listen to discussions of alternative techniques:
1) numerous articles by different authors supporting use of a technique.
2) many articles with good, realistic placebo controls.
3) double-blind articles, not single-blind, with evaluations done by a neutral team.
4) patients who are randomly assigned to the alternative technique (not people who show up wanting it).
5) articles published in mainstream journals with high reputations (determined by citation scores, etc.) rather than only in a journal supported by practitioners of that technique.

This article has offered some helpful hints for your current practice. One note on typical clinical research courses: most are designed for biologists or people performing psychological studies. They don't concentrate on how to recognize or perform good studies in the biofeedback clinical environment. Be sure to take a course relevant to your interests and applicable to clinical biofeedback practice. This article has provided a rationale in support of biofeedback practitioners having a working understanding of research methodology, experimental design, and the placebo effect.

The author, Richard Sherman, Ph.D., is a Past President of AAPB and teaches basic science and biofeedback training courses. He can be reached at rsherman@nwinet.com.