The Limitations and Problems of Adopting Evidence-Based Medicine as an Ideal.
Zain A. Hakeem, DO, PGY-4
11/09/10
Evidence-based medicine has become quite an entity in our education. It is so prevalent that it
sometimes seems as if it has always been present, and yet we are aware that the concept is of relatively
recent origin. The phrase “evidence-based” now has such cachet that it can be used as a descriptive
qualifier for physician actions, thought processes or even general quality of care. We are even wont to
use it as a personal description or denigration on occasion, as in “Dr. so-and-so? Well, he's nice enough,
but he's not very evidence-based.” The importance of the concept of EBM and the size of the role it
plays in both our education and in the practice of physicians throughout the country and the world are growing constantly.
But, as with most models, the evidence-based practice model has a number of limitations that require addressing, for several reasons: First, to encourage doubt, since a skeptic's doubt is fundamental to clear reasoning and is a necessary condition for questioning, especially in the face of the authority that the evidence-based model has assumed. Second, to question that authority, and to show that evidence-based practice, in the form that is commonly promoted, actually undermines the scientific method, or at least discourages physicians from engaging in that method. Third, to promote a return to the spirit of openness that characterizes good science and has led to so many medical advances.
To fulfill these reasons, after discussing the history and evolution of EBM as policy, this paper
will begin by addressing some practical limitations of evidence-based medicine, that is, those
limitations or problems that can (at least theoretically) be overcome with time, effort and/or education.
These include problems of access and application, both at the individual and the socio-political level.
After clarifying these problems by way of contradistinction, the paper will go on to discuss the main
philosophical problems with the adoption of EBM as policy, including problems of epistemology,
regression to the mean, and progress from excellence. Finally, the paper will briefly sketch the outline
of a model that could encompass the good aspects of EBM while avoiding some of the pitfalls.
To explore evidence-based medicine, it is helpful to begin with an understanding of its origins and evolution. As far back as Avicenna or longeri, physicians have sought evidence of efficacy, if only to compete with each other. This is expected, as in any human endeavor; as one author put it, the need for evidence that our decisions are valid or beneficial is a cognitive itch that we seek to scratchii. Calls for an evidence matrix on which to base decisions came as long ago as
the 18th century. But the founding push for that development came in 1972 from Archie Cochrane, a
Scottish epidemiologist who expressed his dismay most aptly: “It is surely a great criticism of our
profession that we have not organized a critical summary, by specialty or subspecialty, updated
periodically, of all relevant randomized controlled trials.iii” A contemporary who estimated that 15-20% of physicians' decisions were based on evidence received this rejoinder: “you're a damned liar, you know it isn't more than 10%.iv”
Due to the efforts of Cochrane and others, the concept of evidence-based practice became increasingly prevalent in medical practice, though the form of that practice remained quite different – the statistical methods were less sophisticated, and although outcomes research has been conducted since 1834 (Louis' “Numerical Method”, later commended by Osler)v, placebo control has been present since 1863, and controlled trials have been done since at least 1747 (scurvy)1, true randomization was first introduced only in 1928vi. The process of generating and analyzing medical evidence led to the creation of a hierarchy of evidence, exemplified by the evidence pyramid, which denigrates expert opinion as the “lowest” form of evidence. A telling statement in the Wikipedia article
on EBM highlights this clearly - “EBM aims for the ideal that healthcare professionals should make
'conscientious, explicit, and judicious use of current best evidence' in their everyday practice. Ex
cathedra statements by the 'medical expert' are considered to be the least valid form of evidence. All
1 The term “limey” evolved from British sailors' requirement to consume citrus juice while aboard a vessel, based on this study, which used 6 different concoctions in 6 pairs of sailors with scurvy and noted profound improvement only in the citrus juice group. An n=12 would hardly be considered profound today, but it changed history.
'experts' are now expected to reference their pronouncements to scientific studies.”vii
Ultimately, however, an accessible body of evidence began to evolve, which physicians could use to guide their decision-making, and which was further refined by computerized access techniques. The so-called information explosion (one might say the “noise” explosion) is upon us. Yet a 2007 review of the Cochrane database found that while 44% of the reviews found the intervention in question “likely to be beneficial”, 49% concluded that the evidence “did not support either benefit or harm”, and only 7% concluded that the intervention was “likely to be harmful”. Interestingly, 96% recommended further researchviii (a 2004 review of alternative medicine Cochrane reviews found 38.4% positive, 56.6% insufficient evidence, 4.8% no effect, and 0.69% harmfulix)2. The presence of this body of evidence,
however, became a double-edged sword with the progressive increase in health care costs. As costs
rose, and companies sought ways to control those costs, the question arose whether payment should be
offered for therapies that were known to be harmful or less effective.
With globalization has come awareness of other medical traditions, which are untested and
unknown, and with rapid advances in science and technology has come the “moving target” problem of
generating up-to-date evidence, which highlights the essential challenge to an evolving body of
evidence: the need to make decisions now, when clear evidence is unavailable. The atom bomb was not used until decades after Einstein's theory was published and multiple experiments had given evidence that the theory had better predictive capability than those that came before. Similar processes
occurred in other technological disciplines. A theory is proposed, experiments are designed to test it,
observations are made, and if the theory holds, then attempts are made to extrapolate technological
advances from the theory. Medicine faces a peculiar challenge as a science, because we must make
decisions in real time on patients who present to our care now.3
In general, however, the goals of evidence-based medicine seem reasonable and practical;
indeed, bordering on mundane – to look closely and conscientiously at the outcomes of our
interventions, to have guidance from others who have gone before us on how to make decisions that
will benefit our patients, and to avoid choices that will harm them. In some ways, this is the essential
survival characteristic of humanity, the main function of language – the dissemination of experience
(Helen Keller describes her first understanding of language as the moment she became human)x. That
being so, what problems could such a reasonable goal entail?
***
The first (and arguably most minor) problem with evidence-based medicine is that the
pot doesn't hold itself. If evidence based medicine is a clearly superior approach to practice, then those
persons, groups and institutions that practice evidence based medicine most accurately and most
conscientiously should have the best outcomes – and if there is in fact a large difference, a small n
value should be sufficient to give a study enough power to detect that difference. But in fact, this has
been difficult to show. One may pass over this issue as minor, but it deserves mention. If evidence-based practice is your touchstone, there should be evidence for it.
There is one particular problem with EBP that this article will highlight: that despite all the
good inherent to EBM as a concept, adopting EBM as a touchstone, an ideal, or a policy is fraught with
the danger of destroying our ability to achieve excellence, and thereby to provide better care for our
patients over time. This problem is analogous to the question raised by teaching residents – are we
prepared to offer substandard care once, twice, three, ten or a hundred times in order to guarantee that
we provide better care overall and better care in the future4? After highlighting this question, the author will attempt to provide an alternative to EBM that captures the benefits of rigorous retrospective observation without preventing progress or excellence.
2 If I were a pharma rep for alternative medicine, I'd probably say that this shows that standard medical practice is 1000% more likely to be harmful.
3 One might argue that climate science faces a similar problem.
4 I make two assumptions, both of which are fairly well substantiated – that inexperience leads to lower quality care, and that teaching hospitals have better outcomes overall. Furthermore, if no one ever lets the inexperienced physician practice, what happens when all the experienced physicians finally die?
Before starting, however, one is forced to consider several other problems with EBM, to set aside the objections which are not of interest, so that one may isolate the primary problem. These other
objections constitute what one may consider the “practical” problems of the evidence-based approach,
since however troubling they may be, they are nonetheless merely technical difficulties, much the way
programming a computer is trivial compared to inventing the concept of digital computing. This
presentation will only cover these practical problems briefly, both because they are well discussed
elsewhere, and because their solutions are comparatively easy (practical problems have practical
solutions).
After mentioning these problems, the presentation will turn to the deeper problem of the philosophical underpinnings of evidence-based practice and whether it has firm footing in the scientific method. The
question one must ultimately consider is this: if all practical problems were swept away by the forces
of time, money, and human will, would implementation of EBM as policy stand as a goal worthy of
such effort on its behalf?
***
The first serious practical problem, and perhaps the most obvious, is that of obtaining evidence.
Most of our research methods were designed to address chemical treatments and, when applied to non-chemical interventions (osteopathy, psychotherapy, surgery, etc.), may not give an accurate picture of
the importance of the intervention. Newer research methods are being developed to address these
interventions.
Another commonly cited problem with obtaining evidence is the so-called “parachute
problem”5. Just because there isn't evidence that one needs a parachute when jumping out of an
airplane, it's not reasonable or ethical to conduct a blinded trial to test that perceived need. Many gaps in our knowledge cannot be filled without a rather Nazi-ish lack of ethical regard6.
Still more troubling is the interference of special interest groups in creating “data” that promote their commercial benefit. The science of medicine has been, and continues to be, corrupted by the
business of medicine. A recent review indicated that 8% or more of articles in a major peer-reviewed
journal were in fact ghostwritten by pharmaceutical company representativesxi,xii,xiii,xiv. There have
been studies that were later discredited when their authors admitted to falsifying dataxv.
Finally, even when appropriately chosen, appropriately performed, and published studies are done, there arise the problems of access and interpretation. Evidence-based practice requires
physicians to have a knowledge and skill set regarding the evaluation and use of sources that purport to
be evidence. The residency education process is being updated constantly to incorporate teaching these
skills, with ongoing EBM-focused lecture series. This practical aspect also includes problems of
money and access, such as the budget cuts that are forcing university libraries to decrease their
holdings of both print and online journal access, and problems of time, such as the time and education
needed to perform a complete decision analysis.
In the aforementioned skill set also lies the problem of evaluating subgroup and post-hoc
analyses. Many studies that are cited as evidence of particular benefits were actually not positive for
their main test, or were not powered sufficiently for even their main test7. Citing even an adequately powered study's subgroup analysis is fraught with peril, as elegantly shown by Peter Sleight's illustrative analysis of the ISIS-2 trial results (aspirin vs. placebo in AMI) by subgroups of astrological sign. Two of these subgroups, Gemini and Libra, did not show a benefit of aspirin over placeboxvi.
5 Thanks, Dr. Matt Mischler, I had not heard this term before, but it summarizes the problem quite expressively.
6 Thanks, Dr. Atif Shanawaz, for the following citation, contained in an e-mail he titled, 'When Men Were Men'. In this 1880 paper, three aneurism(sic) patients were placed on enforced bedrest and “Their diet was reduced till it was found that their health was suffering,” then increased until health was maintained, in order to determine minimal nitrogen excretion. Other similar starvation studies are cited in the paper. "On the amount of Nitrogen excreted by Man at rest," by Samuel West, M.B. Oxon, and W. J. Russell, Ph.D., F.R.S.
7 I'm lookin' at you, Pediatrics
This is the nature of statistics. Even at p<0.05, there remains as much as a 1 in 20 chance that the results reflect chance rather than a true association. This seems a reasonable risk until you consider grouping all studies into sets of 20 and realize that if 1 study in each set were randomly declared invalid, we could have a large problem. This effect seems even more problematic with high NNTs – if a treatment only benefits 1 in 100 patients, and there is a 1 in 20 chance that even that benefit is not real but a statistical happenstance, the importance of that treatment becomes a bit difficult to argue8.
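To make that arithmetic concrete, here is a minimal simulation sketch (the two-arm design, trial sizes, and zero true effect are all assumed purely for illustration) of how often a study of a treatment with no real effect will nonetheless cross the p<0.05 threshold:

```python
import random
import statistics

# Illustrative sketch only: simulate many two-arm trials of a treatment
# with NO true effect, and count how often p < 0.05 arises by chance.
# (A simple z-test on the difference of means; all numbers are assumed.)

def fake_trial(n_per_arm=100):
    control = [random.gauss(0, 1) for _ in range(n_per_arm)]
    treated = [random.gauss(0, 1) for _ in range(n_per_arm)]  # same distribution: no effect
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.pvariance(control) / n_per_arm
          + statistics.pvariance(treated) / n_per_arm) ** 0.5
    return abs(diff / se) > 1.96  # roughly p < 0.05, two-sided

random.seed(0)
n_trials = 10_000
false_positives = sum(fake_trial() for _ in range(n_trials))
print(f"{false_positives / n_trials:.1%} of null trials were 'significant'")
# Expect roughly 5% – about one 'positive' study per set of twenty,
# with no real effect anywhere.
```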
Several studies have shown that physicians lack the basic statistical skills necessary to perform
an accurate decision tree analysis of possible optionsxvii,xviii,xix. At the very least, physicians are
frequently unable to apply Bayesian reasoning in the proper interpretation of the meaning of the tests
performed, and that misunderstanding leads to increased risk to patients. For instance, a patient with a
McIsaac score of 0 and a positive rapid strep antigen test has a post-test probability of ~15%; a patient
with a McIsaac score of 5 and a negative RSA has a post-test probability of ~17%xx. The odd situation
resulting is that the patient with the lower probability of disease is more likely to receive treatment
because the meaning of the test's result is not understood clearly. The risk of chronic complications
from non-treatment in both examples is around 2 in 10,000xxi; the risk of chronic or serious
complications from antibiotic treatment is not well defined, but is likewise rare. Consequently,
although the relative benefit of treatment in these examples is unclear, many physicians favor treatment
despite a poor understanding of the risk/benefit ratio9.
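The strep example above is, at bottom, a likelihood-ratio calculation, and a short sketch may make it concrete. The sensitivity, specificity, and pre-test probabilities below are assumed round numbers chosen to land near the cited figures, not values taken from the referenced studies:

```python
# Minimal Bayes sketch of the strep example above. All inputs are
# assumed round numbers for illustration, not the cited studies' data.

def post_test_probability(pretest, sensitivity, specificity, positive):
    """Convert a pre-test probability to a post-test one via likelihood ratios."""
    lr = sensitivity / (1 - specificity) if positive else (1 - sensitivity) / specificity
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)

sens, spec = 0.85, 0.95  # assumed rapid strep antigen test performance

low_risk = post_test_probability(0.01, sens, spec, positive=True)    # McIsaac 0, positive RSA
high_risk = post_test_probability(0.52, sens, spec, positive=False)  # McIsaac 5, negative RSA

print(f"McIsaac 0, positive RSA: {low_risk:.0%}")   # ~15%, as in the text
print(f"McIsaac 5, negative RSA: {high_risk:.0%}")  # ~15%, near the ~17% cited
```

The two probabilities converge: the "positive" low-risk patient and the "negative" high-risk patient carry nearly the same disease probability, which is exactly the oddity described above.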
A more troubling practical problem is our social milieu regarding liability. As Levitt and Dubner
show in Freakonomics, incentives and disincentives are large forces in altering the behavior of
populationsxxii. The incentives for proper analysis and treatment in the above case are minimal, and the
disincentives are large. If a patient develops C. diff colitis, even requiring colectomy, partially as a result of the antibiotics ordered to treat the positive rapid strep test, how liable is the ordering physician? Not very, since C. diff is a known (supposedly random) complication of antibiotic therapy,
and it is impossible to prove a causative role in the individual case. On the other hand, if the patient
has group A strep, which is missed, or the risk of rheumatic heart disease is deliberately ignored in
accordance with the analysis above, how liable is the physician for not ordering the rapid strep test?
One suspects a lawyer could make a pretty good case, since once the heart condition is “found”, the
retrospective analysis (and blame) becomes much easier10.
In short, our social structure has not come to terms with the fact that we should be missing
certain cases, certain diagnoses, if we are trying to follow the evidence and maximize the overall
efficacy of our diagnostic and therapeutic methods. An evidence-based, statistical approach forces us
to realize that we must miss things, which is not in accordance with a justice system that penalizes
missed diagnoses, but not over-testing, nor over-diagnosis, nor over-treatment (often, “informed
consent” protects us even if there are immediate complications)11.
Moreover, the concept of playing the odds accurately is not in accordance with a medical
education system that promotes the concepts and virtues of vigilance and diligence with inappropriate
phrases like “be complete” or equates “thoroughness” with “ordering all possible related tests”xxiii,xxiv.
Medical education inspires an inappropriate fear of “missing something”, and a similarly inappropriate
lack of emphasis on a cognitive process that embraces missed diagnoses if the cognitive process itself was correct.
8 See the WOSCOPS trial.
9 I make this statement on the basis of both the previously cited articles and an unofficial sampling of physician self-reported anticipated practice patterns during a presentation of this material at OSF St. Francis.
10 One of the hopes of promoting Bayesian reasoning is that by making these probabilities and uncertainties explicit, physicians may gain some better footing in such cases.
11 I have seen several coronary angiograms performed under the “just to be safe” rubric, despite the known rate of complications.
Consider the rise of HA-MRSA and C. diff, or the recent estimate that 0.6-3% of all cancer risk
to age 75 is attributable to medical diagnostic radiationxxv,xxvi,xxvii. This is a direct result of our inability
to follow the statistics we know, and our perpetuation of incorrect diagnostic and treatment patterns in
order to avoid liability. Even if we give a kid 8 courses of Omnicef for ear infections that we know are
probably viral or self-limited, it's still just an accident when his brother is hospitalized for MRSA
infection.
In the end, we each make a personal choice about how inappropriate our practice is going to be,
based on our emotional perception of liability risk and potential modification of that risk12. But a full
discussion of decision making and cognitive diagnostic processes is beyond the scope of this paper.
Suffice it to say that one limitation of EBP is our society's invocation (justified or otherwise) of a fear
or worry that often trumps even our best evidence-based sensibilities.
The same problem in our social system manifests in another way, briefly mentioned above – the
rationing of resources will inevitably occur along lines of political power, which is merely
reinforcement of the status quo. The application of the status quo (in the form of practice guidelines) to
an individual patient, rather than being purely medical, becomes politicized by the question of
payment. Indeed, the question posed by managed care and insurance is not a truly medical question,
but one of personal freedom versus social contract. The essence of any insurance system is risk and
cost distribution, with weighting and estimation of risks, probable outcomes and helpful interventions
being key to that essence. The concept of evidence based medicine as policy is actively being twisted
into a stick-and-carrot routine for enforcing physician compliance with guidelines13. As health
economist Dr. Reinhardt states, “EBM is the sine qua non of managed care, the whole foundation of
it.”xxviii,14. Similarly, Singh and Ernst suggestxxix that the main appeal of EBM is “to health economists,
policymakers and managers, to whom it appears useful for measuring performance and rationing
resources.”xxx
Fundamentally we are faced with the problem of deciding how to allocate finite resources in
providing the care and services for patients. In discussing the subject with peers, this goal is generally
phrased as some version of “people are free to be wacky, but not on my dime.”15 More gently, one may
ask whether the communal resource of insurance money (whether obtained by premiums or by
taxation) should be used to pay for treatments of lesser or questionable efficacy.
Although Sackett's oft-cited definition of EBM states “The practice of evidence-based medicine
means integrating individual clinical expertise with the best available external clinical evidence from
systematic research”xxxi, the reality of implementing evidence-based medicine as a policy, particularly
as a policy of payment and incentives, belies this soothing rhetoric. As Dr. Reinhardt continues, “My
fear is that medicine will slide into the same intellectual morass in which economists now wallow,
often with politics practiced in the guise of science. In medicine, it might be profit maximizing in the
guise of science.”xxxii
But even that is a merely pragmatic issue – a large one, that will require a tremendous cultural
shift to correct, but still merely pragmatic. A deeper philosophical problem is epistemology – What
constitutes evidence? What constitutes proof? What determines the accuracy of a theory? Specifically
in medicine, the question arises: “Is fecundity enough?” Is the fact that lovastatin has more evidence than garlic powder sufficient? Is the truth merely dependent on who can publish the most? Particularly since our concept of evidence or knowledge is statistical, if a company can afford to run a trial 1000 times, it is naturally likely that they could publish a few trials that have a p<0.01, much the way one could “prove” that a coin is twice (or thrice or four times) as likely to fall heads as tails, if allowed to publish only the series of trials that suit that premise. Proof by fecundity rapidly devolves into proof per dollar funded.
12 Actual analysis gives a somewhat different picture, since likelihood of being sued is most closely related to rapport and voice tone, whereas likelihood of jury conviction is most closely tied to degree of injury or sympathy – neither of which has anything to do with medical decision making.
13 Pay for performance sounds better than pay for compliance.
14 By the way, if anyone happens to look up this reference, it's a completely ridiculous propaganda article against political EBM. Unfortunately, despite the low quality of the article, the gentleman found an eloquent phrasing of the idea I was trying to express, and once I had read it, I could not in good conscience fail to cite the quotation.
15 Thanks to Dr. Deepak Nair for this quote.
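A toy simulation makes the "proof by fecundity" point; the coin, the trial sizes, and the cherry-picked publication rule are all invented for illustration:

```python
import random

# Illustrative sketch: flip a perfectly fair coin in many small 'trials',
# then 'publish' only the runs that favor heads most strongly.

random.seed(1)
flips_per_trial, n_trials = 100, 1000
results = [sum(random.random() < 0.5 for _ in range(flips_per_trial))
           for _ in range(n_trials)]

published = sorted(results, reverse=True)[:5]  # cherry-pick the top 5 runs
print("published heads counts per 100 flips:", published)
# Typically well above 60 heads per 100 – each published run looks
# 'significant', even though the coin is exactly fair.
```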
Along that line of thought is the inductive error and the nature of statistics – they can only show
correlation, not causation. We know this, but our current evidence-based model denies it. Just because
two things often occur together does not mean that there exists a causative relation between them. We
are told to follow the statistics, or even to evaluate the statistics; but not taught the deeper process of
recreating the models we were taught on the basis of certain correlations, or to reject the correlation on
the basis of its incompatibility with our model. There exist valid, reproducible statistical analyses that consistently correlate the incidence of rape and ice cream production, or global warming and the decreasing number of pirates, or other variables that do in fact covary but do not have a causal
relationship. Though he condemned a major Muslim prophet, one finds a certain sympathy for the
medical relevance of Pontius Pilate, apparently a fan of both epistemological questions and handwashing.
While the purpose of this article is not to reproduce the full extent of discourses in philosophy
of science and philosophy of knowledge or even causality, some key points will be pertinent to
communicating doubt in our standard medical process of “evidence-based practice”.
First, “causality is in the mind”. This concept, best known from David Hume, is simply that
humans infer causes from observations16. One sees a coin move toward another, stop nearby, and the
other coin continue moving, and this is seen consistently and often, and one extrapolates theories of
physics, momentum, kinetic energy and deformability of solids. What was actually seen in any
particular case had no causation – your mind supplied the concept and assumption of causation. What
was observed is completely distinct from the explanation given for those observations. Observation
alone, no matter how sophisticated, is only useful for prediction through the use of an explanation, a
model, a theory, which does not come from the world, but rather is something one's mind extends
toward the world.
To illustrate, when witnessing a magic trick, no matter how excellently or convincingly
performed, are you often inclined to rewrite or reexamine the laws of physics? Of course not, because
the strength of your faith in the theories of physical causation outweighs your faith in your current
observations, that is, your faith that your current observations are complete and accurate
representations of actual events. The key point here is that the theory of causation in our minds is often
more important to us than even direct, convincing evidence17.
The concept of probabilistic causation is more applicable to medicine (though similar problems
arise from quantum mechanics), but has the same basic flaw. Probabilistic causation of B by A is
implied if the probability of B is increased given A. Determining relationships such as this can be
accomplished via “causal calculus”, wherein interventional probabilities (e.g., P(cancer|do(smoking))) may be inferred from conditional probabilities (e.g., P(cancer|smoking)) through analysis of missing arrows in Bayesian networksxxxiii. Without getting into the depths of such analyses, which I don't fully
understand, the point is that observations help make associations, but the inference of causality is
always in the mind, no matter how sophisticated the method of determination used.
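A toy model may clarify the gap between observing and intervening; the numbers and the hidden trait below are invented for illustration, and this is a sketch of the idea rather than of the causal-calculus machinery itself:

```python
import random

# Illustrative sketch: a hidden trait raises both the chance of an exposure
# and the chance of a disease; the exposure itself does nothing causally.

random.seed(2)
people = []
for _ in range(100_000):
    trait = random.random() < 0.5
    exposed = random.random() < (0.8 if trait else 0.2)   # trait drives exposure
    disease = random.random() < (0.3 if trait else 0.05)  # trait drives disease
    people.append((exposed, disease))

def p_disease(group):
    group = list(group)
    return sum(d for _, d in group) / len(group)

# Conditional probabilities: observing exposure changes our belief...
print("P(disease | exposed)   =", round(p_disease(p for p in people if p[0]), 3))      # ~0.25
print("P(disease | unexposed) =", round(p_disease(p for p in people if not p[0]), 3))  # ~0.10
# ...but an intervention, P(disease | do(exposed)), would leave disease at
# the population baseline, since there is no causal arrow from exposure.
print("P(disease) overall     =", round(p_disease(people), 3))                         # ~0.175
```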
The formation of theories from observation is essentially the entire function of the scientific
method, and when we attempt to use evidence directly in patient care, we short-circuit that method.
Observe, theorize, test, repeat. From Kepler to Pasteur to Einstein, this process remains the same.
16 That is, causation cannot be observed, only events can be observed, and causation is something we infer therefrom.
17 As I will argue later, this is as it should be.
Kepler's intensive observations of the motions of heavenly bodies would have been useless if published as sheaves of statistical correlations; it is the laws derived from those observations that have proven
useful. Indeed, without those laws, there was nothing to test – no statement to prove or disprove.
Medical statistical literature often merely asserts covariance, and allows the reader to infer
causation without an explicit theory for support, or assumes that the readers are all familiar with the
same theory. This defeats the entire purpose of science, since theorizing is the essential function of the
scientific method, specifically, theorizing the particulars of causation. One cannot apply evidence to
patient care without a theory18, but by failing to make our theories explicit, we end up with poor-quality
theories and inconsistent application. Even more troubling, we cannot revise our theories on the basis
of new evidence nor can we reject new evidence in favor of our theory, because our theory was never
made clear.
Take for example the Cochrane review of medication for treatment of anxiety disorders in
children and adolescents. It found 22 short-term (<16 weeks) double-blind, randomised, placebo-controlled trials, which found that treatment response was greater in the treatment group (58.1%) than in
the placebo group (31.5%), but that the “medication was less well tolerated than placebo, as indicated
by the significant proportion of children and adolescents who dropped out due to adverse effects during
the short term trials.” Reading further, the majority of trials were for OCDxxxiv.
So what's the theory? That SSRIs are relaxing? That SSRIs effectively reduce anxiety
symptoms in children and adolescents? That SSRIs are effective treatments for children/adolescents
who meet DSMIV criteria for anxiety disorders? That SSRIs have a little better than 50% chance of
reducing anxiety symptoms in a child/adolescent with OCD? Or that while SSRIs have no direct
chemical effect on anxiety, the side effects of SSRIs are prominent enough to undermine double-blinding and reinforce the placebo effect sufficiently to raise the efficacy of placebo treatment from 31
to 58 percent? Whichever theory one chooses, one cannot effectively prove that one's chosen theory
has much greater support than another.
To my knowledge, no study has been done comparing SSRIs with a chemical that reproduced
the same side effects but did not have the same chemical action in the brain. To do such a study, one
would have to have a very clear, accurate theory of the action of SSRIs. While several theories do exist, there is (to my knowledge) no clear consensus on the exact mechanism of action (dopamine, norepinephrine, sure ... so? Explain how that affects thinking ... precisely), largely because
there exists no clear consensus on the nature, function and methods of the mind19.
This example is pertinent because on first glance the data seems “self-explanatory” – it seems
so simple, so straightforward, until one considers an alternate explanation that also fits the data, but has
completely different implications. It is not difficult to imagine that two different interpretations would
lead to rather different practice patterns.
One could cite any number of topics – cholesterol/statin studies or ppar-gamma studies, or
aspirin, or many other topics – and have a similar discussion. The main point, however, remains the
same – no matter how excellent the statistical correlation, without a theory or explanation that is clearly
expressed, no science has been done and no extrapolation, prediction or application can follow. This
process of story formation is the entire purpose of the scientific method – not merely to gather data, but
to weave that data into a model that has testable implications, then to test those implications to refine or
refute the model. Application of science in medical practice can only be managed on the basis of a
theory – data is not enough. Whatever concept of the scientific method we adopt, whether Popper's
falsifiability or Bayesian networks' elucidation of probabilistic causal relationships, we must adopt
some concept and use that concept as the basis of our practice.
18 Since evidence is merely a collection of data, unless one has some theory, one could never apply that evidence to a set of patients, any more than one could apply the number one, or the observation “that apple fell”.
19 Again, I will not delve into the realm of philosophy of the mind – there are arguments for and against several theories.
Just as with the magic trick above, we may find that the logic of a particular theory is so
appealing that we would rather doubt the validity of some evidence than to reject that theory. When we
argue about the methodological quality of various papers, we are engaging in this process – if you believe
lifestyle changes can make a difference in obesity, you will interpret papers that way, and critique the
methods of papers promoting genetic explanations of weight gain; the reverse is true if you think that
genetics is the controlling factor. This common process, when two opposing sides cite the literature
that supports their view and critique the methodology of papers with contrary findings, is the scientific
process at work. This debate of theories on the basis of seemingly conflicting evidence represents the
successful implementation of the scientific method on incompletely understood and incompletely
observed phenomena.
***
Of the Cochrane SSRI analysis above, one is led to ask (of the meta-analysis or of each study
within), “What did this study really test?” The obvious answer is that it tested the efficacy of SSRIs in
treating anxiety in children and adolescents; but just as we asked, “efficacy according to whom?”, we
may now ask, “anxiety according to whom?” How well did each researcher tease out the symptoms of
anxiety? Did each one equally avoid asking leading questions? How precisely did the researchers
interpret the DSM criteria? The researchers were not interested in the ability of SSRIs to treat nausea
or abdominal pain, but how well did they differentiate these symptoms from anxiety?
Ultimately, the function of applying the DSM in this study was to separate or “sort” the patients
into a group that was amenable to study. One can postulate that some researchers were more effective
than others at accomplishing that sorting; that is, some researchers more accurately identified those
patients who would respond to treatments directed at reducing their anxiety. One could postulate that
other researchers were less accurate, but tended in particular to confuse depression symptoms with
anxiety symptoms in children. Others may have tended to mistakenly include children with normal
degrees or types of anxiety in the treatment group. Each of those researchers might get different results
from a trial of SSRIs. And that difference is not trivial. Ultimately, medicine is a process of sorting
patients into treatment (and prognosis) groups; the accuracy of this sorting process directly affects the
outcomes seen. Naturally, the excellence or mediocrity of the diagnostic (sorting) process profoundly
determines the efficacy of any intervention over a population.
Interestingly, this “sorting” can be considered as fundamentally procedural. It's a slow
procedure, performed over a population rather than over a single person, but it is a procedure, a series
of decisions and actions, nonetheless. Being given a mixed set of colored blocks and told to sort by
color, shape, and size, is fundamentally no different than being told to sort each patient who comes
through the door into diagnostic groups for treatment. Some doctors will be more or less accurate than
others, some more or less creative, or insightful, or aggressive, or convincing, and those differences
will determine their individual outcomes just as much as the efficacy of any particular treatment
modality. Like most procedures, from hernia repairs to colon cancer resection efficacy, to CABG,
efficacy will fall on a Bell curve20.
The usual response to this idea of variability is that these differences will average out among
multiple centers, or in meta-analysis of multiple trials, and that is true. As a result, this kind of
averaging is standard practice in randomized controlled trials and this is why the meta-analysis is held
in such high regard. But this same process deletes vital information; specifically, information about
what happens at the top of the Bell curve. We sacrifice that information in order to gain information
about the average. And while it is interesting and perhaps more directly pertinent to know whether
SSRIs are effective in the hands of “average” general pediatricians, it might be more interesting, or at
least interesting in a different way, to know whether they are effective in the hands of pediatric
psychiatrists. Similarly, some pediatric psychiatrists are simply better than others; a study done by those psychiatrists may well have different results than one done by “average” psychiatrists.
20 Support for this idea is provided later in the paper, so I have left the statement uncited here.
Troublesomely, the data from the top of the bell curve is the data that most effectively informs
our model – What if SSRIs don't work for OCD that is accurately diagnosed? What if truly expert
psychiatrists can treat anxiety so effectively with counseling that the SSRI effect is negligible? What if
they sort so accurately that the SSRI effect is halved, or doubled or tripled? What would that say about
the nature of anxiety/OCD treatment and about the nature of SSRIs? Wouldn't that be more pertinent
data than information about the “average”? This is the sacrifice demanded by the apotheosis of the
meta-analysis. One receives information about the average, but loses information about
excellence.
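The averaging argument itself can be sketched numerically; the effect size, the number of centers, and the spread of diagnostic accuracy below are all assumed for illustration:

```python
import random
import statistics

# Illustrative sketch: a treatment that helps only correctly sorted patients.
# Centers vary in diagnostic (sorting) accuracy on a bell curve; pooling
# reports the average effect and erases the top of that curve.

random.seed(3)
true_effect = 0.30  # assumed response gain in correctly diagnosed patients
accuracies = [min(max(random.gauss(0.7, 0.1), 0), 1) for _ in range(40)]

center_effects = [true_effect * a for a in accuracies]
print(f"pooled (meta-analytic) effect: {statistics.mean(center_effects):.2f}")
print(f"best center's effect:          {max(center_effects):.2f}")
# The pooled number tests the average sorting process as much as the drug;
# the best center's number is the information the average discards.
```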
What did the meta-analysis test? The efficacy of the SSRIs, or the mediocrity of the “average”
diagnostic process? Meta-analyses certainly convey some useful information, but not all the
information. Indeed, data that is most pertinent to our understanding of disease processes and optimal
treatment modalities is lost in our perpetual search for a higher n-value. The sheer quantity of the n
does not compensate for the loss in quality. V.S. Ramachandran (inventor of mirror box therapy) is
described in The Brain that Changes Itself by Norman Doidge:
“He is a sleuth, solving mysteries one case at a time, as though utterly unaware that modern
science is now occupied with large statistical studies. He believes that individual cases have
everything to contribute to science. As he puts it, 'Imagine I were to present a pig to a skeptical
scientist, insisting it could speak English, then waved my hand, and the pig spoke English. Would
it really make sense for the skeptic to argue, 'But that is just one pig, Ramachandran. Show me
another, and I might believe you!' ' ”xxxv
Consider these two concepts – that the practice of medicine is a procedure, very slowly
performed over the population of patients each physician treats, and that that procedure's efficacy will
vary by physician on a Bell curve. Consider “being a patient of Dr. Zain Hakeem” as a risk factor for
mortality and morbidity. Where would you fall on that Bell curve? What factors determine one's
placement on the Bell Curve?
To answer this, start by examining other procedures. Hernia surgeries have an average
recurrence rate of 5-10%. At the best centers in the world, it is less than 1% (about one in 500). At
those centers, everything about the hernia repair is different. The surgeons do nothing else – six to
eight hundred a year, and the entire hospital is set up for hernia patients. “Their rooms have no phones
or televisions, and their meals are served in a downstairs dining hall; as a result, the patients have no
choice but to get up and walk around, thereby preventing problems associated with inactivity”xxxvi.
Carotid endarterectomies are recommended to be performed only at high-volume centers for similar reasonsxxxvii. After surgical colon cancer resection, ten-year survival rates ranged from 20 to 63%,
depending on the surgeonxxxviii. A more conventionally “medical” illness exemplifies this process:
treatment of CF21.
CF, as an area of study, has an unusually long record of following outcomes data from particular
doctors. The reason is, as Atul Gawande says, “because, in the nineteen-sixties, a pediatrician from
Cleveland named LeRoy Matthews was driving people in the field crazy.” He was claiming an annual
mortality rate of less than two percent at a time when the national mortality rates were in excess of 20% per year; his patients had an average life expectancy of 21 years at a time when the rest of the country's CF patients died at 3. The CF Foundation reviewed these claims, found them true, and began a registry of
outcomes by center to track the results of nationally adopting Matthews' treatment guidelines. Two years later, average life expectancy nationally had reached 10 years of age. In the early nineteen-seventies, the national average was 18 years, though Matthews' center's was higher. In 2003 the average was 33 years; at the best center, it was more than 47. As Gawande puts it, “There was a bell
curve, and the spread had narrowed a little. Yet every time the average moved up Matthews and a few
others somehow managed to stay ahead of the pack.”
21 I copy the content of this and the following section to nearly the point of plagiarism from Atul Gawande's article The Bell Curve.
Gawande's article chronicles his visit to an average CF center – Cincinnati Children's – concluding that, “This was, it seemed to me, real medicine: untidy, human, but practiced carefully and conscientiously—as well as anyone could ask for. Then I went to Minneapolis.” At Fairview, the best CF center in the country consistently over almost 40 years, he found “Patients with CF at Fairview got the same things that patients everywhere did ... Yet, somehow, everything he did was different.”
What was different? Gawande finds it difficult to characterize. Warwick, the director who took
over from Matthews, is a remarkable character. He invented “The Vest” in his garage. He invented and uses a stereoscopic stethoscope, and he has invented a new cough that he teaches to his patients. Ten percent
of his patients receive G-tube feeds solely because “by his standards, they aren't gaining enough
weight.” Though there is now evidence that “The Vest” is at least as effective as manual percussion,
Warwick has little evidence to justify his other individual interventions:
“There’s no published research showing that you need to do this. But not a single child or teenager at the center has died in years. Its oldest patient is now sixty-four.
The buzzword for clinicians these days is 'evidence-based practice'—good doctors are
supposed to follow research findings rather than their own intuition or ad-hoc experimentation. Yet
Warwick is almost contemptuous of established findings. National clinical guidelines for care are,
he says, 'a record of the past, and little more—they should have an expiration date.'”xxxix
One theme that persists among these examples is volume, and certainly we all recognize the
value of experience; another theme is consistency, and medicine has begun to recognize the importance
of a consistent approach, as demonstrated by the literature on cognitive biases, diagnostic aids,
cognitive forcing strategies and checklists. But there are surgeons with years of experience who do not
achieve 1/500 hernia repair failure rates. CF centers all over the country see similar numbers of
patients, but most do not achieve >100% predicted average lung function. The true difference is
highlighted in this comment from The Bell Curve:
“Unlike pediatricians elsewhere, Matthews viewed [emphasis added] CF as a cumulative
disease and provided aggressive treatment long before his patients became sick.” Similarly, his
protege's success is attributed to his belief that “excellence came from seeing [emphasis added], on
a daily basis, the difference between being 99.5-per-cent successful and being 99.95-per-cent
successful.”xl
The main difference at the top of the bell curve is a difference in vision. The physicians who
live and work at the top of the curve see the illness or the treatment process differently, in a way that
causes or forces them to approach each patient, and indeed each intervention, differently. The story
they tell themselves about the disease is different, and consequently, their story of optimal treatment is
different, and their results are different.
The hernia surgeons at Shouldice see the process of hernia surgery differently; they see the importance of consistency for improved outcomes, and they act in accordance with that vision, and now (in retrospect) the effect of acting on that vision is apparent. Matthews saw CF differently than anyone
else at the time; even now, his protégé Warwick sees each patient's outcomes on a different scale and sees the potential of CF patients differently, and the actions taken in accordance with that vision now (in retrospect) speak for themselves.
At the top of the Bell Curve, the view is just ... different. Studies show that our cardiac exams
are, on average, inaccurate. Should we stop doing them? Where is the evidence for the accuracy of
these? But some cardiologists are rather accurate with the echocardiograms they carry between their
ears. Should they also stop practicing these esoteric skills? Because there is no “evidence”?
Consider another case – there exists an immediately life-threatening illness for which there are
two surgical treatments. Treatment A extends the life expectancy by 40-50 years, but has late
complications that are thought to be due to surgical techniques. Perioperative mortality risk is about
6%. Treatment B is newer, and theoretically should avoid the late term complications. In the first
series of 77 cases, 18 children died perioperatively (23% mortality risk), and late-term data is as yet
unavailable. Which would be the preferred treatment in our EBM society?
The treatments described above, however, are the Senning/Mustard corrections of d-TGA and
the Jatene arterial switch procedures, respectively; the latter is the current standard of care, and indeed
the procedure has extended life expectancy by an additional 20 years beyond that offered by the
Mustard procedure, and now offers equivalent or lower perioperative risksxli,xlii,xliii 22.
But the Jatene procedure was not standard when first introduced. Indeed, it took 10 years from
the first successful arterial switch until the procedure was considered standard of carexliv,23. In between, for 10 years, some patients benefited by an average of 20 years of life due solely to the placement of their surgeon on the bell curvexlv (by contrast, the two-stage late arterial switch has had poor performance, resulting in a high death rate after 12 years of age ... the physicians performing these procedures were also on the high end of the bell curve, but realized after relatively few attempts (largest case study was 35 patients) that the procedure was not yielding good resultsxlvi).
This is true throughout the historical surgical literature. Prior to the first Blalock-Taussig shunt
surgery on a 15-month-old girl, Blalock performed exactly one animal practice procedure (he relied on his assistant Thomas' experience)xlvii, and in 2009, despite the prevalence of the procedure for more
than half a century, systematic reviews were scantyxlviii. Reading “Evolution of Cardiopulmonary Bypass”, one feels as if the two most common phrases are 'it was disastrous' and 'the patient died on the operating table'. It is an interesting, if somewhat disturbing, read. Between 1951 and 1955, 18
published attempts were made at bypass operations; 17 of those patients died. Dodrill tried autologous
oxygenation and isolated left and right heart bypasses for a total of 4 cases (one right, 2 left, one total,
the latter died), then abandoned the technique. Mustard tried rhesus monkey lungs and rubber balloon
pumps in seven patients, all of whom died. The article states, “Dodrill's [technique] and Mustard's ...
oxygenator were not thought by others to be the path to success, and both methods were abandoned.”
Gibbon developed his bypass machine and operated 4 times, with one success (though the
failures were often due to preoperative misdiagnosis). Lillehei operated 48 times using cross
circulation with a parental oxygenation donor; there were 28 survivors and 2 major donor adverse
events, one of which resulted in permanent harm (due to the anaesthesiologist accidentally pumping air
into the IV). His assistant, DeWall, was assigned to work on the problem of developing an artificial
oxygenator to allow the removal of risk to a donor, and to allow the higher flow rates needed for larger
children and adults.
Lillehei's instructions to DeWall are telling: “First of all, do not go to bubble oxygenator
systems because they have a very poor record of success. Second, avoid libraries and avoid literature
searches, as I want you to keep an open mind and not be prejudiced by the mistakes of others.”
Incidentally, the first successful heart-lung bypass machine, the DeWall oxygenator, was a bubble
systemxlix.
DeBakey's entire career might stand as an example of this kind of thinking: he went to the
department store to buy some nylon, but all they had was this new Dacron stuff and “it looked pretty
good to me, so I bought a yard of it” – the first Dacron graft patient was treated for AAA and lived 13
yearsl.
So perhaps we should learn to interpret the evidence differently. Should we say that we ought
to be better than we are, on average, at cardiac auscultation? Should we say that certain cardiac
surgeries should work, that they must be made to work? Should we say that the top of the bell curve
has shown us what is possible, but it is now our responsibility to move the average rightward? What is
the theory that allows us to interpret the evidence?
22 Interestingly, Mustard was a pediatric orthopedic surgeon who was asked by a senior surgeon to consider starting a
cardiac surgery program; he studied with Blalock at Hopkins for one month, and subsequently pioneered several cardiac
surgeries, some of which were successful.
23 How many years of life were lost? The number of births in those 10 years times the incidence rate of d-TGA, figured on
a logarithmic curve approaching 1 to estimate the relative rates of each procedure over time, times 20 years per patient.
Do you care what the number is? If I said 20 years of life lost versus 40 or 400? What number makes this slowness
acceptable in the name of appropriate caution?
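Footnote 23's estimate can be sketched directly; every input below (the birth rate, the d-TGA incidence, the logistic adoption curve) is an assumed placeholder, since the point is the order of magnitude rather than the exact number:

```python
import math

# Back-of-envelope sketch of footnote 23's estimate. All inputs are assumptions.
births_per_year = 3_300_000   # assumed US-scale annual births
dtga_incidence = 3 / 10_000   # assumed, roughly 3 per 10,000 live births
benefit_years = 20            # per patient, per the text

lost = 0.0
for year in range(10):
    adoption = 1 / (1 + math.exp(-(year - 5)))  # fraction already receiving the switch
    lost += births_per_year * dtga_incidence * (1 - adoption) * benefit_years

print(f"~{lost:,.0f} patient-years of life lost over the 10-year adoption lag")
```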
This same reasoning pattern can be applied equally to any intervention or strategy that has
“insufficient evidence”, or “no evidence of benefit”, or even, like the cardiac surgeries above, ... dare I
say it? “evidence of harm”. Should we stop? Or should we, on average, improve ourselves until we can
create evidence of benefit? The worth of pursuing a treatment lies not in the immediate outcomes, not
in the evidence, for or against, but in the logic of its model. Outcomes are merely a means of refining
that logic.
Herein lies the value of anecdotal evidence or case series or single events and especially expert
opinion – proof of concept, or lessons from failure. Bypass surgery was made possible by a series of
failed attempts, and often, perseverance in spite of discouraging numbers. Acupuncture for pain
control may have “no evidence of benefit”, but there was a surgery that was done with only
acupuncture as anaesthesiali. There are patients who have gotten significant benefits that were not
achieved by their evidence-based (or evidence-bound) practitioners. Maybe we should look into the
potential of excellence rather than rejecting the failure of averages.
This is a fundamental problem with our current evidence-based model – it does not account for
individual excellence beyond the mean, largely because the model is designed solely to detect the
differences in average effects of different chemical compounds in standard doses24. Perhaps the
average acupuncturist is no better than placebo, but what about the best acupuncturists in the world?
What would their success (or failure) say about the presence (or absence) of energy meridians in the
body? Not that they exist, perhaps, but that belief in their existence may lead to patient benefit.
The reason we are so quick to reject the average failure of acupuncture rather than the average
failure of cardiac auscultation is that the former's basic model does not fit within our own, much
the way casual wine consumers' model of “good wine” is different than a connoisseur's. Several tests
have been done for wine tasting, and in fact, a taster's ability to reproducibly blind-select a wine from
amongst others is a part of the testing for master sommelier certification (there are only 177 people
who hold this certification worldwidelii). By contrast, the average consumer cannot reliably distinguish Coca-Cola from Pepsi in a 1-out-of-3 blind tastingliii,25. A failure of averages is not a failure of
concept. A proof of concept, however, can hinge on a single profound example26.
By similar analogy, there was a magician named John Scarne who had developed what he called
a magic trick. He could, on demand, cut directly to an ace. He could repeat this trick endlessly. The
effect is described:
“McManus cut the cards, stood back, and said, “Again!”
Silently there under the glare of lights John worked ... and worked. He worked seven nights in a row ... $200 an
hour for practicing what he liked to do best – and at last one of A.R.'s men broke.
“All right” he blurted, “that's enough; now, how do you do it?”
Scarne had expected it. “The only possible way,” he said. “You notice I always give the deck one riffle myself.
When I do it I count the cards so I can see the indices. When the ace falls I just count the number of cards that drop
into place on top of it. Then when I cut I count down that number of cards and break the deck there, and of course
there's the ace. That must be obvious now, isn't it?”
There was a long silence.
“If that's it,” Rothstein said, “it's uncanny.”
“You can do it easy, if you practice three or four hours a day, in – hmmm – twenty years,” said Scarne affably.
“And you're how old?” murmured McManus.
“Nineteen. But,” John Scarne added hastily, “I've been practicing ten hours a day.”liv
Once again, excellence shows us the potential we all have, but it will not appear in large studies of averages. What Scarne was able to do looked impossible, and was accepted only as a magic “trick” initially, because no one could conceive of that level of skill27.
24 One considers in passing the valid, if clichéd, argument along similar lines, that EBM has great difficulty in commenting on outcomes that are important, like quality of life, because they are difficult to reduce to statistics.
25 Average consumers are able to differentiate Coke and Pepsi head to head, because that relies on simpler discriminations, like sweetness, rather than on exact characterization and memory for particular flavors.
26 In some sense, this is the application of Bayes' Theorem to the scientific method – the contrasting example always has a larger effect on the post-test probability than the confirming example.
The difference in acceptance lies solely in the prevalence of the dogma underlying the model in
question, and in the relative power of the groups that promote each. When we mistakenly accept EBM not as an aspect of truth-finding, as a valuable way to explore information about averages, but as a policy, as the (sole) way of determining what is funded or promoted or taught or performed, we push
EBM beyond the bounds of its capabilities and destroy our opportunities to see our patient populations
from a viewpoint that could push our treatments away from the average.
There is a CDR (cognitive disposition to respond, or bias) known as the aggregate bias – the
idea that group data doesn't apply to an individual patient. By analogy, there is an error that I call the
House fallacy – thinking that you as a physician, are better than the evidence. But adopting EBM as
policy makes that error in reverse – it assumes that all doctors are interchangeable, that there are no
Houses, or DeBakeys or Oslers in the world. When EBM becomes policy, that is, when it becomes our
explicit ideal in the practice of medicine, we cut off the benefits of having Houses in the population of
physicians that could lead the rest of us up the bell curve. EBM as policy accomplishes nothing but
purchasing a dependable mediocrity at the price of uncertain excellence.
Though the AHRQ states it with a positive spin, “One goal of quality improvement efforts
nationally is to reduce differences in health care quality that patients receive in one state versus
another.”lv, one can see the dangers of reducing variability. This is the most significant objection I can make to adopting EBM as one's gold standard of practice propriety – that the sole effect of that adoption is not an improvement in care, but a narrowing of the bell curve around the average (perhaps this is why the benefits of EBM have been so difficult to show), which thereby gives us the comforting illusion of certainty and dependability.
Our desire for certainty has led us to adopt a policy that encourages the perception of certainty
through the absence (or ridicule) of dissent. The driving force behind the acceptance of EBM as policy
is a desire for certainty where none exists. Our rate of misdiagnosis at autopsy has not changed in decades, despite the advent of advanced scanning, ultrasound, and serologic testinglvi; but our certainty, our confidence in our diagnoses, has gone up. Part of that is misunderstanding the post-test probabilities offered by our testing; part is the social comfort of having peers available who would testify to the appropriateness of our actions (though it is notable that Kaiser Permanente has twice been successfully sued for failing to approve treatments on the contention that they were “experimental”lvii); but part also is an innate human need for certainty. It is that need that pushes us to use only “known” or “proven” treatments, though each of those concepts is itself shaky at best.
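How badly post-test probabilities are misunderstood is easy to demonstrate with the classic Casscells-style problem (see endnote xvii); the numbers below are the usual textbook illustration, not new data. A test with a 5% false-positive rate applied to a disease with 1-in-1,000 prevalence leaves a post-test probability of about 2%, though respondents in the original study commonly answered 95%.

```python
# Bayes' rule sketch with the classic illustrative numbers (assumed):
# prevalence 1/1000, false-positive rate 5%, sensitivity taken as 100%.

def post_test_probability(prevalence, sensitivity, false_positive_rate):
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

p = post_test_probability(prevalence=0.001,
                          sensitivity=1.0,
                          false_positive_rate=0.05)
print(f"P(disease | positive test) = {p:.3f}")   # ~0.020, not 0.95
# Our confidence in a positive result can outrun the arithmetic by
# a factor of nearly fifty.
```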
Even if one were at peace with trading excellence for certainty in the short term, it has always
been by examining excellence that betterment of the average has been achieved over time. What
happens to progress when our policy is that no one should be excellent, because we are afraid of
anyone being sub-par?
This leaves us with a problem – politics, like nature, abhors a vacuum. What policy can take
the place of the ubiquitous evidence-based medicine?
As an osteopathic student, I've seen some strange, nearly unbelievable things: I've seen strabismus treated non-surgically in 15 minutes, and I've manually treated viral meningitis, recurrent tonsillitis, pain from a Crohn's flare, and edema from right heart failure. And while placebo effect is a plausible explanation for any one of these, if one chooses to explain away all of these events as placebo, the conversation does begin to resemble that of the skeptic regarding the pig. Perhaps these concepts only have value in the hands of a master; until one achieves similar mastery, it's a bit silly to scorn an excellent cardiologist's persistence in carrying that old-fashioned doohickey around his neck, when everyone “knows” it doesn't work. Likewise, consider IVUS or stereoscopic stethoscopes or the 80-lead EKG28.
27 I, like 99% of the world's population, cannot hold an iron cross, but most professional male gymnasts can. Examples of the failure of averages exist in any field where one considers the existence of excellence. If one admits to the existence of average-defying excellence, however, one dispels the myth of physician interchangeability, and what do patients think of the average physician then?
At the top of the bell curve, one stops being concerned with what happens in the middle of the curve because, frankly, the same rules don't apply. When one can fix strabismus without surgery, or remove longstanding severe scoliosis pain without medication, one wonders what else one can do.
Gregory House doesn't really care about the average doctor's opinion. Dr. Lillehei didn't pay attention to those who believed that the heart was simply not amenable to operative correction29. Dr. Speece30 doesn't care about the multicenter RCT for osteopathic treatments. Dr. Shah31 doesn't care about the studies on average stethoscope diagnostics. They're better than that. At the top of the bell curve, one isn't looking at the middle; one is always looking for the next improvement, the next step, the next idea in accordance with the vision.
Those at the edges of the bell curve have little or no evidence for what they do. They pursue
their treatment plans on the basis of their individually developed ideas and stories, and ultimately the
evidence is generated after the fact to justify or repudiate their viewpoint. Often irrespective of the
efficacy of individual interventions, their results are reflective of the efficacy of their viewpoint as a
whole. When we tell the story of a disease and its context in the physiology of the body, the efficacy of
that story is what ultimately determines our success and our quality as physicians.32
Suppose for a moment that cranio-sacral treatment for low back pain is completely bogus: that there is, in reality, no inherent value to it. But suppose one believes (mistakenly) that there is value, and persists in applying these treatments to patients. What will happen? Overall, the same 95% of patients who would have gotten better anyway will still get better, with one difference: the physician's belief will have caused some transference to his patients, resulting in an effective placebo treatment. Consequently, he will have achieved short-term pain control with fewer medications and fewer side effects than another physician who did not have the same beliefs. Even though his belief was false, it led to an efficacious treatment plan with fewer side effects than his peers achieved. This improvement in care will not be captured in a subsequent sham-controlled study.
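The arithmetic behind that last sentence is worth spelling out; the sketch below uses assumed numbers purely for illustration. The believer's patients do better than usual care by the size of the placebo boost, but a sham-controlled trial delivers the same ritual to both arms and so subtracts that boost out entirely.

```python
# Assumed-numbers sketch: why the believer's real-world advantage
# disappears in a sham-controlled trial.
base_recovery = 0.95    # fraction who improve regardless of treatment
placebo_boost = 0.03    # assumed extra relief conferred by confident belief

believer = base_recovery + placebo_boost       # his patients, fewer drugs
usual_care = base_recovery                     # peers without the belief

treatment_arm = base_recovery + placebo_boost  # real treatment, full ritual
sham_arm = base_recovery + placebo_boost       # sham delivered with equal ritual

print(f"real-world advantage:  {believer - usual_care:+.2f}")     # +0.03
print(f"trial-measured effect: {treatment_arm - sham_arm:+.2f}")  # +0.00
# The trial subtracts out exactly the component that made his
# patients better off than his peers'.
```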
Suppose now that cranial osteopathy does have value, but only in the hands of an expert, much like carotid endarterectomy. The multicenter RCT will not show value, but the master's personal patients will show an effect. This is why CEA data is generated only from specialty centers – as proof of concept. It is then the responsibility of each surgeon to track his or her efficacy and to compare it, not to the average, but to the best33. If we stop training osteopaths in this method, or stop recommending it, or stop paying for it, we lose the potential for benefits without side effects that cranial osteopathy could provide. Those benefits would have been preserved had we responded in the same spirit with which we resolved to keep improving open heart surgery until it worked34.
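A small pooling sketch (provider mix and effect sizes assumed, not drawn from any trial) shows how an expert-only effect washes out in a multicenter average:

```python
# Assumed-numbers sketch: an effect present only in expert hands is
# diluted to near-nothing when pooled across a multicenter trial.
providers = [
    ("master", 0.30, 10),    # (label, true effect size, patients treated)
    ("average", 0.00, 90),
]

pooled = sum(effect * n for _, effect, n in providers)
total = sum(n for _, _, n in providers)
print(f"pooled trial estimate: {pooled / total:.3f}")    # 0.030

label, effect, n = providers[0]
print(f"{label}'s own patients:  {effect:.3f}")          # 0.300
# A real 0.30 effect in the master's hands reports as 0.03 overall --
# small enough to vanish into the confidence interval.
```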
Rather than slavish adherence to the merely observational nature of EBM, perhaps we should adopt something one might term “science-based medicine” or “theory-based medicine”, wherein we engage the natural uncertainty of the scientific method by treating patients on the basis of the stories we tell ourselves about illness and wellness, and we follow the outcomes of treatments based on these theories and model ourselves after successes (positive deviation from the mean)35.
28 As Mark Twain wrote, “A person with a new idea is a crank until the idea succeeds.”
29 A quite common opinion at the time.
30 One of my osteopathic mentors, author of the textbook on Ligamentous Articular Strain techniques.
31 A pediatric cardiologist in Peoria, where I trained, who had a remarkable talent for clinical and auscultative diagnosis.
32 Admittedly, other qualities also play a large role, such as ingenuity/creativity, rapport, and consistency/diligence. But ultimately these are all modes of the quality of application of a physician's disease or treatment 'story'. The underlying factor is the story itself, which drives the direction of those applications. If the car is headed the right way, even if it doesn't have the best MPG, one will still get closer to the goal than if one is driving a fuel-cell car efficiently in the wrong direction.
33 Thanks, Dr. Nabeel Rana, vascular surgeon, for discussing the implications of performing CEA from the perspective of a responsible individual surgeon.
34 In both cases, his story of back pain treatment is more efficacious (even if less true) than the story told by his peers.
35 Gawande discusses modeling “positive deviation from the mean” as a process of improvement in his book Better. See citations in endnotes.
As discussed above, data is used to create a theory or model and then to test the implications of that theory. In order to say we know something, we must be able to state clearly not only what we mean by knowing, but also what it is that we know, think, or suspect; that statement takes the form of a theory or story, hopefully founded on empiric (observed) evidence. In short, we must use evidence to create or recreate our story/model of disease and its treatment, and we must base our treatments on that model and follow the results.
Any idea for treatment that is reasonable, that makes sense within that model, would then be considered a valid intervention. To be sure, some of those ideas won't work out. Cardiac and renal stenting seemed like great lifesaving ideas but turned out not to have as much benefit as anticipated; PPAR-gamma agonists turned out not to work as expected; and Mustard's rhesus monkey lung oxygenation was a failure. This process has a name: it's called being wrong. It is an essential component of scientific progress.
And make no mistake, patients are harmed and lives are lost as a result of our errors. But the
only thing worse than making an error is allowing our necessary errors to make us so timid that we are
unwilling to venture away from the average. One would expect that for all the success stories of
bypass surgeries, there were an equal number of surgeons who had ideas that just didn't work. Those
surgeons killed people, maybe hundreds, persisting in attempting to prove an idea that was simply
false. But those are the necessary losses of engaging in progress. It's a cold view, and one that patients
and their families do not want to hear, any more than they want to hear that grandma died from a fatal
head bleed while on Coumadin that was started on the basis of a 34% chance of pulmonary embolus.
But that is the nature of reality; that is the nature of science; that is the nature of the mechanism of
progress; and when we start basing our policy on that nature, we have a chance at representing our
patients' interests rather than the interests of whichever company, policymaker, or grant writer has the
most cash.
While one cannot imagine returning to a time of wild experimentation without follow-up, by adopting this theory-based model, this policy, this outlook, we can reverse the mistake of EBP and widen the bell curve again. We can increase the variability in our practice patterns, and then we can reap extreme benefit from the original concept of EBM, the concept that seemed so promising that it led us to adopt EBM in the first place: that we can, should, and must follow the results of different outlooks and different visions, follow them closely and conscientiously, and pursue the theories that seem most promising.
Charting our results and noting when ideas do not pan out as expected is a valuable learning tool, if we use that failure to reinvent our model. If we merely implement those observations as “evidence”, we lose an opportunity for far-reaching changes to our treatment models that could vastly improve our patient care. We don't need n values of 20,000. We are trying to prove important things – things that matter. We need to pursue disease-oriented evidence and pharmacologic evidence, implement strategies that make sense, and then follow the results of that implementation retrospectively to inform our model for the future.
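What might that retrospective follow-up look like in practice? The sketch below is a hypothetical, minimal outcome log; the record format, interventions, and benchmark rates are all invented for illustration, in the spirit of footnote 33's advice to compare oneself to the best rather than to the average.

```python
# Hypothetical sketch: charting one's own results and comparing them to
# the best performers (positive deviance), not to the average.
from collections import defaultdict

# Minimal assumed record format: (intervention, patient improved?)
cases = [
    ("cranial", True), ("cranial", True), ("cranial", False),
    ("cea", True), ("cea", True), ("cea", True), ("cea", False),
]

# Assumed benchmarks taken from the best-performing practitioners.
best_rates = {"cranial": 0.90, "cea": 0.95}

tally = defaultdict(lambda: [0, 0])   # intervention -> [successes, total]
for intervention, improved in cases:
    tally[intervention][0] += int(improved)
    tally[intervention][1] += 1

for intervention, (wins, total) in tally.items():
    rate = wins / total
    print(f"{intervention}: {rate:.2f} vs best {best_rates[intervention]:.2f} "
          f"(gap {best_rates[intervention] - rate:+.2f})")
```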
i D. Craig Brater and Walter J. Daly (2000), “Clinical Pharmacology in the Middle Ages: Principles that presage the 21st century”, Clinical Pharmacology & Therapeutics 67 (5), p. 447-450 [449]; Walter J. Daly and D. Craig Brater (2000), “Medieval contributions to the search for truth in clinical medicine”, Perspectives in Biology and Medicine 43 (4), p. 530-540 [536], Johns Hopkins University Press.
ii Goodman, K. Ethics and Evidence-Based Medicine: Fallibility and Responsibility in Clinical Science. ISBN 0521819334. Excerpt retrieved from http://assets.cambridge.org/97805218/19336/excerpt/9780521819336_excerpt.pdf on 8/30/10.
iii Cochrane, AL. 1931-1971: a critical review, with particular reference to the medical profession. In: Medicines for the year 2000. London: Office of Health Economics.
iv White, Kerr. “Archie Cochrane's legacy: an American perspective” in Non-random Reflections on Health Services
Research: on the 25th anniversary of Archie Cochrane's Effectiveness and Efficiency.
v Goodman, K. Ethics and Evidence-Based Medicine: Fallibility and Responsibility in Clinical Science. ISBN 0521819334. Excerpt retrieved from http://assets.cambridge.org/97805218/19336/excerpt/9780521819336_excerpt.pdf on 8/30/10.
vi http://en.wikipedia.org/wiki/Placebo-controlled_study. Last Retrieved 8/30/10.
vii http://en.wikipedia.org/wiki/Evidence-based_medicine#History. Last Retrieved 8/30/10.
viii El Dib RP, Atallah AN, Andriolo RB (August 2007). “Mapping the Cochrane evidence for decision making in health
care”. J Eval Clin Pract 13(4): 689-92. PMID 17683315.
ix Complementary and Alternative Medicine in the United States. 2005. ISBN 0-309-09270-1 pp.135-136
x Keller, Helen. The Story of My Life.
xi Wilson, Duff; Singer, Natasha (September 11, 2009). "Ghostwriting Is Called Rife in Medical Journals". The New York
Times. http://www.nytimes.com/2009/09/11/business/11ghost.html. Retrieved 11/7/2010.
xii Prevalence of Honorary and Ghost Authorship in Cochrane Reviews. Graham Mowatt, MBA; Liz Shirran, MSc; Jeremy
M. Grimshaw, MBChB,PhD; Drummond Rennie, MD; Annette Flanagin, RN,MA; Veronica Yank, BA; Graeme
MacLennan, BSc(Hons); Peter C. Gøtzsche, DrMedSci; Lisa A. Bero, PhD. JAMA. 2002;287:2769-2771.
xiii Guest Authorship and Ghostwriting in Publications Related to Rofecoxib: A Case Study of Industry Documents From
Rofecoxib Litigation. Joseph S. Ross, MD,MHS; Kevin P. Hill, MD, MHS; David S. Egilman, MD, MPH; Harlan M.
Krumholz, MD, SM. JAMA. 2008;299(15):1800-1812.
xiv Sismondo S (September 2007). "Ghost management: how much of the medical literature is shaped behind the scenes by
the pharmaceutical industry?". PLoS Med. 4 (9): e286. doi:10.1371/journal.pmed.0040286. PMID 17896859.
xv Braise, Twila, RN, PHN. “'Evidence Based Medicine': Rationing Care, Hurting Patients”. In The State Factor.
December, 2008. Retrieved from http://www.alec.org/am/pdf/ebmstatefactor.pdf. Last Retrieved 8/30/10.
xvi Sleight, Peter. “Debate: Subgroup Analyses in Clinical Trials: fun to look at – but don't believe them!” Curr Control
Trials Cardiovasc Med. (2000). 1(1): 25-27. PMCID: PMC59592
xvii Casscells, W., Schoenberger, A., and Grayboys, T. (1978): “Interpretation by physicians of clinical laboratory results.” N Engl J Med. 299:999-1001.
xviii Eddy, David M. (1982): “Probabilistic reasoning in clinical medicine: Problems and opportunities.” In D. Kahneman, P. Slovic and A. Tversky, eds, Judgment under Uncertainty: Heuristics and Biases. Cambridge University Press, Cambridge, UK.
xix Gigerenzer, Gerd and Hoffrage, Ulrich (1995): “How to improve Bayesian Reasoning without instruction: Frequency
Formats.” Psychological Review. 102:684-704.
xx Robert D. Sheeler et al., "Accuracy of Rapid Strep Testing in Patients Who Have Had Recent Streptococcal Pharyngitis".
Journal of the American Board of Family Medicine, 2002-09-04.
xxi Rheumatic Heart Disease. Author: Thomas K Chin, MD. Updated 8/4/10. Retrieved from http://emedicine.medscape.com/article/891897-overview on 11/7/2010.
xxii Levitt, S., Dubner, S. Freakonomics. ISBN 0-06-123400-1.
xxiii “To be complete.” N Engl J Med. Correspondence. 1979 Jan 25;300(4):193–194.
xxiv Woolf, S., Kamerow, D. Testing for uncommon conditions: the heroic search for positive test results. Arch Intern Med. 1990;150(12):2451-2458.
xxv Berrington de Gonzalez A, Darby S. Risk of cancer from diagnostic x-rays: estimates for the UK and 14 other countries. Lancet 2004; 363: 345-351. PMID 15070562.
xxvi Brenner DJ, Doll R, Goodhead DT, et al. Cancer risks attributable to low doses of ionizing radiation: assessing what we really know. Proc Natl Acad Sci USA 2003; 100: 13761-13766. PMID 14610281.
xxvii Ron E. Cancer risks from medical radiation. Health Phys. 2003 Jul; 85(1): 47-59. PMID 12852471.
xxviii Braise, Twila, RN, PHN. “'Evidence Based Medicine': Rationing Care, Hurting Patients”. In The State Factor.
December, 2008. Retrieved from http://www.alec.org/am/pdf/ebmstatefactor.pdf. Last Retrieved 8/30/10.
xxix Fitzpatrick, Michael (2008). “Taking a political placebo”. Spiked Online. http://www.spiked-online.com/index.php/site/article/5342/. Retrieved 2009-10-17, via http://en.wikipedia.org/wiki/Evidence-based_medicine#Political_criticism. Last Retrieved 8/30/10.
xxx Singh S and Ernst E (2008). “Trick or Treatment?”. Bantam Press. Retrieved from http://en.wikipedia.org/wiki/Evidence-based_medicine#Political_criticism. Last Retrieved 8/30/10.
xxxi Sackett, David L, et al. “Evidence based medicine: what it is and what it isn't [Editorial],” British Medical Journal (1996), 312:71-72.
xxxii Braise, Twila, RN, PHN. “'Evidence Based Medicine': Rationing Care, Hurting Patients”. In The State Factor. December, 2008. Retrieved from http://www.alec.org/am/pdf/ebmstatefactor.pdf. Last Retrieved 8/30/10.
xxxiii http://en.wikipedia.org/wiki/Causality. Last Retrieved 8/30/10.
xxxiv Pharmacotherapy for anxiety disorders in children and adolescents. Cochrane Database of Systematic Reviews, 6,
2010.
xxxv Doidge, Norman. The Brain That Changes Itself.
xxxvi Gawande, Atul. Complications: A Surgeon's Notes on an Imperfect Science. 2002. Pp 39-41.
xxxvii Cebul RD, Snow RJ, Pine R, et al. Indications, outcomes, and provider volumes for carotid endarterectomy. JAMA
1998;279(16):1282-7. More studies on several operations at
http://www.mass.gov/?pageID=eohhs2terminal&L=7&L0=Home&L1=Consumer&L2=Physical+Health+and+Treatmen
t&L3=Quality+and+Cost&L4=Data+and+Statistics&L5=Physicians&L6=Volume+by+Surgeon+and+Hospital&sid=Eeo
hhs2&b=terminalcontent&f=dhcfp_quality_cost_volume_research_studies&csid=Eeohhs2 . Retrieved on 11/8/10.
xxxviii Gawande, Atul. “The Bell Curve”. December 6 2004. The New Yorker. Retrieved at
http://www.newyorker.com/archive/2004/12/06/041206fa_fact. On 8/30/10.
xxxix Gawande, Atul. “The Bell Curve”. December 6 2004. The New Yorker. Retrieved at
http://www.newyorker.com/archive/2004/12/06/041206fa_fact. On 8/30/10.
xl Gawande, Atul. “The Bell Curve”. December 6 2004. The New Yorker. Retrieved at
http://www.newyorker.com/archive/2004/12/06/041206fa_fact. On 8/30/10.
xli Gawande, Atul. Complications: A Surgeon's Notes on an Imperfect Science. 2002. Pp 27-28.
xlii Konstantinov, I et al. “Atrial Switch Operation: Past, Present, and Future” Ann Thorac Surg.2004;77;2250-8. Retrieved
at http://www.mc.vanderbilt.edu/root/sbworddocs/res_edu_thoracic/Peds10_7_04.pdf PMID 15172322. Last Retrieved
8/27/10.
xliii Stoney, William. “Evolution of Cardiopulmonary Bypass” Circulation. 2009;119:2844-2853. doi 10.1161/circulationaha.108.830174.
xliv http://en.wikipedia.org/wiki/Jatene_procedure. Last Retrieved 8/27/10.
xlv Gawande, Atul. Complications: A Surgeon's Notes on an Imperfect Science. 2002. P.28
xlvi Evaluation and management of the adult patient with transposition of the great arteries following atrial-level (Senning or Mustard) repair. Nature Clinical Practice Cardiovascular Medicine (2008) 5, 454-467. Retrieved at http://www.nature.com/nrcardio/journal/v5/n8/full/ncpcardio1252.html. doi: 10.1038/ncpcardio1252. Last Retrieved 8/30/10.
xlvii Brogan TV, Alfieris GM. “Has the time come to rename the Blalock-Taussig shunt?” Pediatr Crit Care Med. 2003 Oct;4(4):450-3. PMID 14525641.
xlviii Yuan SM, Shinfield A, Raanani E. “The Blalock Taussig shunt” J Card Surg. 2009 Mar-Apr;24(2):101-8. PMID
19040408.
xlix Stoney, William. “Evolution of Cardiopulmonary Bypass” Circulation. 2009;119:2844-2853. doi 10.1161/circulationaha.108.830174.
l Video interview with DeBakey. Retrieved from http://www.ptca.org/news/2008/0712_DEBAKEY.html. Last Retrieved 8/27/10.
li Lore, Roger. “Acupuncture Anaesthesia in Surgery.” Journal of Chinese Medicine. 79. October 2005. 23-27. Retrieved from http://homepage.mac.com/sweiz/files/article/79-23.pdf. Last Retrieved 8/30/10.
lii http://www.courtofmastersommeliers.org/sommeliers.php Last Retrieved 8/30/10.
liii Gladwell, Malcolm. Blink. 2005. First Back Bay trade paperback edition, April 2007. pp. 185-186.
liv Scarne, John. Scarne on Cards. 1949. Signet Publishing, pp. xii-xiii.
lv U.S. Agency for Healthcare Research and Quality. “Key Themes and Highlights from the National Healthcare Quality
Report,” http://www.ahrq.gov/qual/nhqr07/Key.htm Last Retrieved 8/30/10.
lvi Gawande, Atul. Complications: A Surgeon's Notes on an Imperfect Science. 2002. p.197.
lvii Sugarman, Mitchell. "Permanente Physicians Determine Use of New Technology: Kaiser Permanente's Interregional
New Technologies Committee" cited in http://en.wikipedia.org/wiki/Evidence-based_medicine#cite_note-33. Retrieved
at http://xnet.kp.org/permanentejournal/winter01/HSnewtec.html. Last Retrieved 8/30/10.