The Limitations and Problems of Adopting Evidence-Based Medicine as an Ideal
Zain A. Hakeem, DO, PGY-4
11/09/10

Evidence-based medicine has become quite an entity in our education. It is so prevalent that it sometimes seems as if it has always been present, and yet we are aware that the concept is of relatively recent origin. The phrase “evidence-based” now has such cachet that it can be used as a descriptive qualifier for physician actions, thought processes or even general quality of care. We are even wont to use it as a personal description or denigration on occasion, as in “Dr. so-and-so? Well, he's nice enough, but he's not very evidence-based.” The importance of the concept of EBM and the size of the role it plays in both our education and in the practice of physicians throughout the country and the world is growing constantly. But, as with most models, the evidence-based practice model has a number of limitations that need to be addressed, for several reasons: First, to encourage doubt, since a skeptic's doubt is fundamental to clear reasoning and is a necessary condition for questioning, especially in the face of the authority that the evidence-based model has assumed. Second, to question that authority, and to show that evidence-based practice, in the form that is commonly promoted, actually undermines the scientific method, or at least discourages physicians from engaging in that method. Third, to promote a return to the spirit of openness that characterizes good science and has led to so many medical advances. To these ends, after discussing the history and evolution of EBM as policy, this paper will begin by addressing some practical limitations of evidence-based medicine, that is, those limitations or problems that can (at least theoretically) be overcome with time, effort and/or education. These include problems of access and application, both at the individual and the socio-political level. After clarifying these problems by way of contradistinction, the paper will go on to discuss the main philosophical problems with the adoption of EBM as policy, including problems of epistemology, regression to the mean, and progress from excellence. Finally, the paper will briefly sketch the outline of a model that could encompass the good aspects of EBM while avoiding some of the pitfalls. To begin an exploration of evidence-based medicine, it is helpful to begin with an understanding of its origins and evolution. As far back as Avicenna or longeri, physicians have sought evidence of efficacy, if only to compete with each other. This is expected, as in any human endeavor; as one author put it, the need for evidence that our decisions are valid or beneficial is a cognitive itch that we seek to scratchii. Calls for an evidence matrix on which to base decisions came as long ago as the 18th century.
But the founding push for that development came in 1972 from Archie Cochrane, a Scottish epidemiologist who expressed his dismay most aptly: “It is surely a great criticism of our profession that we have not organized a critical summary, by specialty or subspecialty, updated periodically, of all relevant randomized controlled trials.iii” A contemporary who estimated that 15-20% of physicians' decisions were based on evidence received this rejoinder: “you're a damned liar, you know it isn't more than 10%.iv” Due to the efforts of Cochrane and others, the concept of evidence-based practice became increasingly prevalent in medical practice, though the form of that practice remained quite different – the statistical methods were less sophisticated, and although outcomes research has been done since 1834 (Louis' “Numerical Method”, later commended by Osler)v, placebo control has been present since 1863 and controlled trials have been done since at least 1747 (scurvy)1, true randomization was first introduced as late as 1928vi. The process of generating and analyzing medical evidence led to the creation of a hierarchy of evidence, exemplified by the evidence pyramid, which denigrates expert opinion as the “lowest” form of evidence. A telling statement in the Wikipedia article on EBM highlights this clearly: “EBM aims for the ideal that healthcare professionals should make 'conscientious, explicit, and judicious use of current best evidence' in their everyday practice. Ex cathedra statements by the 'medical expert' are considered to be the least valid form of evidence. All 'experts' are now expected to reference their pronouncements to scientific studies.”vii Ultimately, however, an accessible body of evidence began to evolve, which physicians could use to guide their decision-making, and which was further refined by computerized access techniques. The so-called information explosion (one might say the “noise” explosion) is upon us. Yet a 2007 review of the Cochrane database found that while 44% of the reviews found the intervention in question “likely to be beneficial”, 49% concluded that the evidence “did not support either benefit or harm”, and only 7% concluded that the intervention was “likely to be harmful”. Interestingly, 96% recommended further researchviii (a 2004 review of alternative medicine Cochrane reviews found 38.4% positive, 56.6% insufficient evidence, 4.8% no effect, and 0.69% harmfulix)2. The presence of this body of evidence, however, became a double-edged sword with the progressive increase in health care costs. As costs rose, and companies sought ways to control those costs, the question arose whether payment should be offered for therapies that were known to be harmful or less effective. With globalization has come awareness of other medical traditions, which are untested and unknown, and with rapid advances in science and technology has come the “moving target” problem of generating up-to-date evidence, which highlights the essential challenge to an evolving body of evidence: the need to make decisions now, when clear evidence is unavailable.

1 The reference “limey” evolved from British sailors' requirement for eating citrus juices while aboard a vessel, based on this study, which used 6 different concoctions in 6 pairs of sailors with scurvy and noted profound improvement only in the citrus juice group. An n=12 would hardly be considered profound today, but it changed history.
The atom bomb was not used until 20 years after Einstein's theory was published and multiple experiments had given evidence that the theory had better predictive capability than those that came before. Similar processes occurred in other technological disciplines. A theory is proposed, experiments are designed to test it, observations are made, and if the theory holds, then attempts are made to extrapolate technological advances from the theory. Medicine faces a peculiar challenge as a science, because we must make decisions in real time on patients that present to our care now.3 In general, however, the goals of evidence-based medicine seem reasonable and practical; indeed, bordering on mundane – to look closely and conscientiously at the outcomes of our interventions, to have guidance from others who have gone before us on how to make decisions that will benefit our patients, and to avoid choices that will harm them. In some ways, this is the essential survival characteristic of humanity, the main function of language – the dissemination of experience (Helen Keller describes her first understanding of language as the moment she became human)x. That being so, what problems could such a reasonable goal entail? *** The first (and arguably most minor) problem with evidence-based medicine is that the pot doesn't hold itself. If evidence-based medicine is a clearly superior approach to practice, then those persons, groups and institutions that practice evidence-based medicine most accurately and most conscientiously should have the best outcomes – and if there is in fact a large difference, a small n value should be sufficient to give a study enough power to detect that difference. But in fact, this has been difficult to show. One may pass over this issue as minor, but it deserves mention. If evidence-based practice is your touchstone, there should be evidence for it. There is one particular problem with EBP that this article will highlight: that despite all the good inherent to EBM as a concept, adopting EBM as a touchstone, an ideal, or a policy is fraught with the danger of destroying our ability to achieve excellence, and thereby to provide better care for our patients over time. This problem is analogous to the question raised by teaching residents – are we prepared to offer substandard care once, twice, three, ten or a hundred times in order to guarantee that we provide better care overall and better care in the future4? After highlighting this question, the author will attempt to provide an alternative to EBM that captures the benefits of rigorous retrospective observation without preventing progress or excellence. Before starting, however, one is forced to consider several other problems with EBM, to clarify the objections which are not of interest, so that one may clarify the primary problem.

2 If I were a pharma rep for alternative medicine, I'd probably say that this shows that standard medical practice is 1000% more likely to be harmful.
3 One might argue that climate science faces a similar problem.
4 I make two assumptions, both of which are fairly well substantiated – that inexperience leads to lower quality care, and that teaching hospitals have better outcomes overall. Furthermore, if no one ever lets the inexperienced physician practice, what happens when all the experienced physicians finally die?
These other objections constitute what one may consider the “practical” problems of the evidence-based approach, since however troubling they may be, they are nonetheless merely technical difficulties, much the way programming a computer is trivial compared to inventing the concept of digital computing. This presentation will only cover these practical problems briefly, both because they are well discussed elsewhere, and because their solutions are comparatively easy (practical problems have practical solutions). After mentioning these problems, the presentation will turn to the deeper problem of the philosophical underpinnings of evidence-based practice and its footing in the scientific method. The question one must ultimately consider is this: if all practical problems were swept away by the forces of time, money, and human will, would implementation of EBM as policy stand as a goal worthy of such effort on its behalf? *** The first serious practical problem, and perhaps the most obvious, is that of obtaining evidence. Most of our research methods were designed to address chemical treatments, and when applied to non-chemical interventions (osteopathy, psychotherapy, surgery, etc.), may not give an accurate picture of the importance of the intervention. Newer research methods are being developed to address these interventions. Another commonly cited problem with obtaining evidence is the so-called “parachute problem”5. Just because there isn't evidence that one needs a parachute when jumping out of an airplane, it's not reasonable or ethical to conduct a blinded trial to test that perceived need. Many gaps in our knowledge cannot be filled without a rather Nazi-ish lack of ethical regard6. Still more troubling is the interference of special interest groups in creating “data” that promotes their commercial benefit. The science of medicine has been, and continues to be, corrupted by the business of medicine. A recent review indicated that 8% or more of articles in a major peer-reviewed journal were in fact ghostwritten by pharmaceutical company representativesxi,xii,xiii,xiv. There have been studies that were later discredited when their authors admitted to falsifying dataxv. Finally, even when appropriately chosen, appropriately performed and published studies are done, there arise the problems of access and interpretation. Evidence-based practice requires physicians to have a knowledge and skill set regarding the evaluation and use of sources that purport to be evidence. The residency education process is being updated constantly to incorporate teaching these skills, with ongoing EBM-focused lecture series. This practical aspect also includes problems of money and access, such as the budget cuts that are forcing university libraries to decrease their holdings of both print and online journal access, and problems of time, such as the time and education needed to perform a complete decision analysis. In the aforementioned skill set also lies the problem of evaluating subgroup and post-hoc analyses. Many studies that are cited as evidence of particular benefits were actually not positive for their main test, or were not powered sufficiently for even their main test7.

5 Thanks, Dr. Matt Mischler; I had not heard this term before, but it summarizes the problem quite expressively.
6 Thanks, Dr. Atif Shanawaz, for the following citation, contained in an e-mail he titled 'When Men Were Men'. In this 1880 paper, three aneurism (sic) patients were placed on enforced bedrest and “Their diet was reduced till it was found that their health was suffering,” then increased until health was maintained, in order to determine minimal nitrogen excretion. Other similar starvation studies are cited in the paper. "On the amount of Nitrogen excreted by Man at rest," by Samuel West, M.B. Oxon, and W. J. Russell, Ph.D., F.R.S.
7 I'm lookin' at you, Pediatrics.
Citing even an adequately powered study's subgroup analysis is fraught with peril, as elegantly shown by Peter Sleight's illustrative analysis of the ISIS-2 trial results (aspirin vs. placebo in AMI) by subgroups of astrological sign. Two of these subgroups, Gemini and Libra, did not show a benefit of aspirin over placeboxvi. This is the nature of statistics. Even a p<0.05 carries a 1 in 20 chance that the results are not reflective of a true association. This seems a reasonable risk until you consider grouping all studies into sets of 20 and realize that if 1 study in each set were randomly declared invalid, we could have a large problem. This effect seems even more problematic with high NNTs – if a treatment only benefits 1 in 100 patients, and there is a 1 in 20 chance that even that benefit is not real, but a statistical happenstance, the importance of that treatment becomes a bit difficult to argue8. Several studies have shown that physicians lack the basic statistical skills necessary to perform an accurate decision tree analysis of possible optionsxvii,xviii,xix. At the very least, physicians are frequently unable to apply Bayesian reasoning in the proper interpretation of the meaning of the tests performed, and that misunderstanding leads to increased risk to patients. For instance, a patient with a McIsaac score of 0 and a positive rapid strep antigen test has a post-test probability of ~15%; a patient with a McIsaac score of 5 and a negative RSA has a post-test probability of ~17%xx. The odd situation resulting is that the patient with the lower probability of disease is more likely to receive treatment, because the meaning of the test's result is not understood clearly. The risk of chronic complications from non-treatment in both examples is around 2 in 10,000xxi; the risk of chronic or serious complications from antibiotic treatment is not well defined, but is likewise rare. Consequently, although the relative benefit of treatment in these examples is unclear, many physicians favor treatment despite a poor understanding of the risk/benefit ratio9.

8 See the WOSCOPS trial.
9 I make this statement on the basis of both the previously cited articles and an unofficial sampling of physician self-reported anticipated practice patterns during a presentation of this material at OSF St. Francis.
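The arithmetic behind such post-test probabilities is simple but easy to misjudge at the bedside. The sketch below shows the likelihood-ratio form of Bayes' theorem; the pretest probabilities and likelihood ratios are invented for illustration and are not the figures from the cited McIsaac data, though they reproduce the same qualitative point: a positive test in a low-risk patient and a negative test in a high-risk patient can land on nearly the same post-test probability.

```python
# A minimal sketch of post-test probability via likelihood ratios.
# All numbers below are assumed for illustration, not taken from the cited studies.

def post_test_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Hypothetical low-risk patient with a positive test vs. high-risk patient with a negative test.
low_risk_pretest, high_risk_pretest = 0.02, 0.55   # assumed pretest probabilities
lr_positive, lr_negative = 9.0, 0.15               # assumed likelihood ratios for the test

print(post_test_probability(low_risk_pretest, lr_positive))    # ~0.16
print(post_test_probability(high_risk_pretest, lr_negative))   # ~0.15
```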
A more troubling practical problem is our social milieu regarding liability. As Levitt and Dubner show in Freakonomics, incentives and disincentives are large forces in altering the behavior of populationsxxii. The incentives for proper analysis and treatment in the above case are minimal, and the disincentives are large. If a patient develops C. diff colitis, even requiring colectomy, partially as a result of the antibiotics ordered to treat the positive rapid strep test, how liable is the ordering physician? Not very, since C. diff is a known (supposedly random) complication of antibiotic therapy, and it is impossible to prove a causative role in the individual case. On the other hand, if the patient has group A strep, which is missed, or the risk of rheumatic heart disease is deliberately ignored in accordance with the analysis above, how liable is the physician for not ordering the rapid strep test? One suspects a lawyer could make a pretty good case, since once the heart condition is “found”, the retrospective analysis (and blame) becomes much easier10. In short, our social structure has not come to terms with the fact that we should be missing certain cases, certain diagnoses, if we are trying to follow the evidence and maximize the overall efficacy of our diagnostic and therapeutic methods. An evidence-based, statistical approach forces us to realize that we must miss things, which is not in accordance with a justice system that penalizes missed diagnoses, but not over-testing, nor over-diagnosis, nor over-treatment (often, “informed consent” protects us even if there are immediate complications)11. Moreover, the concept of playing the odds accurately is not in accordance with a medical education system that promotes the concepts and virtues of vigilance and diligence with inappropriate phrases like “be complete” or equates “thoroughness” with “ordering all possible related tests”xxiii,xxiv. Medical education inspires an inappropriate fear of “missing something”, and a similarly inappropriate lack of emphasis on a cognitive process that embraces missed diagnoses if the cognitive process itself was correct. Consider the rise of HA-MRSA and C. diff, or the recent estimate that 0.6-3% of all cancer risk to age 75 is attributable to medical diagnostic radiationxxv,xxvi,xxvii. This is a direct result of our inability to follow the statistics we know, and our perpetuation of incorrect diagnostic and treatment patterns in order to avoid liability. Even if we give a kid 8 courses of Omnicef for ear infections that we know are probably viral or self-limited, it's still just an accident when his brother is hospitalized for MRSA infection. In the end, we each make a personal choice about how inappropriate our practice is going to be, based on our emotional perception of liability risk and potential modification of that risk12. But a full discussion of decision making and cognitive diagnostic processes is beyond the scope of this paper. Suffice it to say that one limitation of EBP is our society's invocation (justified or otherwise) of a fear or worry that often trumps even our best evidence-based sensibilities. The same problem in our social system manifests in another way, briefly mentioned above – the rationing of resources will inevitably occur along lines of political power, which is merely reinforcement of the status quo. The application of the status quo (in the form of practice guidelines) to an individual patient, rather than being purely medical, becomes politicized by the question of payment. Indeed, the question posed by managed care and insurance is not a truly medical question, but one of personal freedom versus social contract. The essence of any insurance system is risk and cost distribution, with weighting and estimation of risks, probable outcomes and helpful interventions being key to that essence.

10 One of the hopes of promoting Bayesian reasoning is that by making these probabilities and uncertainties explicit, physicians may gain some better footing in such cases.
11 I have seen several coronary angiograms performed under the “just to be safe” rubric, despite the known rate of complications.
12 Actual analysis gives a somewhat different picture, since likelihood of being sued is most closely related to rapport and voice tone, whereas likelihood of jury conviction is most closely tied to degree of injury or sympathy – neither of which has anything to do with medical decision making.
The concept of evidence-based medicine as policy is actively being twisted into a stick-and-carrot routine for enforcing physician compliance with guidelines13. As health economist Dr. Reinhardt states, “EBM is the sine qua non of managed care, the whole foundation of it.”xxviii,14 Similarly, Singh and Ernst suggestxxix that the main appeal of EBM is “to health economists, policymakers and managers, to whom it appears useful for measuring performance and rationing resources.”xxx Fundamentally we are faced with the problem of deciding how to allocate finite resources in providing care and services for patients. In discussing the subject with peers, this goal is generally phrased as some version of “people are free to be wacky, but not on my dime.”15 More gently, one may ask whether the communal resource of insurance money (whether obtained by premiums or by taxation) should be used to pay for treatments of lesser or questionable efficacy. Although Sackett's oft-cited definition of EBM states “The practice of evidence-based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research”xxxi, the reality of implementing evidence-based medicine as a policy, particularly as a policy of payment and incentives, belies this soothing rhetoric. As Dr. Reinhardt continues, “My fear is that medicine will slide into the same intellectual morass in which economists now wallow, often with politics practiced in the guise of science. In medicine, it might be profit maximizing in the guise of science.”xxxii But even that is a merely pragmatic issue – a large one, one that will require a tremendous cultural shift to correct, but still merely pragmatic. A deeper philosophical problem is epistemology – What constitutes evidence? What constitutes proof? What determines the accuracy of a theory? Specifically in medicine, the question arises: “Is fecundity enough?” Is the fact that lovastatin has more evidence than garlic powder sufficient? Is the truth merely dependent on who can publish the most? Particularly since our concept of evidence or knowledge is statistical, if a company can afford to run a trial 1000 times, it is naturally likely that they could publish a few trials that have a p<0.01, much the way one could “prove” that a coin is twice (or thrice or four times) as likely to fall heads as tails, if allowed to publish only the series of trials that suit that premise. Proof by fecundity rapidly devolves into proof per dollar funded.

13 Pay for performance sounds better than pay for compliance.
14 By the way, if anyone happens to look up this reference, it's a completely ridiculous propaganda article against political EBM. Unfortunately, despite the low quality of the article, the gentleman found an eloquent phrasing of the idea I was trying to express, and once I had read it, I could not in good conscience fail to cite the quotation.
15 Thanks to Dr. Deepak Nair for this quote.
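The coin analogy is easy to make concrete. Below is a short simulation sketch (all numbers invented): a “treatment” with no real effect is “trialed” 1000 times, and only the runs that happen to cross p < 0.01 are kept for “publication.” A handful of runs (around six, on average) will look like a coin biased toward heads; publish only those and the “evidence” appears consistent.

```python
# Sketch of selective publication: repeat a null experiment many times and
# keep only the chance "significant" results. A fair coin stands in for a
# treatment with no real effect; all numbers are illustrative.
import random
from math import comb

random.seed(0)

def one_sided_p(heads: int, flips: int) -> float:
    """P(X >= heads) for X ~ Binomial(flips, 0.5) -- exact binomial tail."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

trials, flips = 1000, 100
publishable = []
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(flips))
    if one_sided_p(heads, flips) < 0.01:
        publishable.append(heads)

print(f"{len(publishable)} of {trials} null 'trials' reached p < 0.01")
print("head counts in the 'published' trials:", publishable)   # runs of ~63+ heads out of 100
```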
Along that line of thought is the inductive error and the nature of statistics – they can only show correlation, not causation. We know this, but our current evidence-based model denies it. Just because two things often occur together does not mean that there exists a causative relation between them. We are told to follow the statistics, or even to evaluate the statistics, but not taught the deeper process of recreating the models we were taught on the basis of certain correlations, or of rejecting the correlation on the basis of its incompatibility with our model. There exist valid, reproducible statistical analyses that consistently correlate the incidence of rape and ice cream production, or global warming and the decreasing number of pirates, or other variables that do in fact covary but do not have a causal relationship. Though he condemned a major Muslim prophet, one finds a certain sympathy for the medical relevance of Pontius Pilate, apparently a fan of both epistemological questions and handwashing. While the purpose of this article is not to reproduce the full extent of discourses in philosophy of science and philosophy of knowledge or even causality, some key points will be pertinent to communicating doubt in our standard medical process of “evidence-based practice”. First, “causality is in the mind”. This concept, best known from David Hume, is simply that humans infer causes from observations16. One sees a coin move toward another, stop nearby, and the other coin continue moving, and this is seen consistently and often, and one extrapolates theories of physics, momentum, kinetic energy and deformability of solids. What was actually seen in any particular case had no causation – your mind supplied the concept and assumption of causation. What was observed is completely distinct from the explanation given for those observations. Observation alone, no matter how sophisticated, is only useful for prediction through the use of an explanation, a model, a theory, which does not come from the world, but rather is something one's mind extends toward the world. To illustrate, when witnessing a magic trick, no matter how excellently or convincingly performed, are you often inclined to rewrite or reexamine the laws of physics? Of course not, because the strength of your faith in the theories of physical causation outweighs your faith in your current observations, that is, your faith that your current observations are complete and accurate representations of actual events. The key point here is that the theory of causation in our minds is often more important to us than even direct, convincing evidence17. The concept of probabilistic causation is more applicable to medicine (though similar problems arise from quantum mechanics), but has the same basic flaw. Probabilistic causation of B by A is implied if the probability of B is increased given A. Determining relationships such as this can be accomplished via “causal calculus”, wherein interventional probabilities (e.g., P(cancer|do(smoking))) may be inferred from conditional probabilities (e.g., P(cancer|smoking)) through analysis of missing arrows in Bayesian networksxxxiii. Without getting into the depths of such analyses, which I don't fully understand, the point is that observations help make associations, but the inference of causality is always in the mind, no matter how sophisticated the method of determination used. The formation of theories from observation is essentially the entire function of the scientific method, and when we attempt to use evidence directly in patient care, we short-circuit that method.

16 That is, causation cannot be observed, only events can be observed, and causation is something we infer therefrom.
17 As I will argue later, this is as it should be.
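The gap between P(B|A) and P(B|do(A)) can be seen in a toy simulation. This is not causal calculus proper, just a sketch under invented assumptions: a hidden confounder Z drives both the “exposure” A and the “outcome” B, A itself does nothing, and yet the observational probabilities differ sharply until A is forced by randomization.

```python
# Toy illustration of the observational/interventional gap: a hidden confounder
# makes P(B | A) differ from P(B | do(A)) even though A has no causal effect on B.
# All structure and numbers are invented for illustration.
import random

random.seed(1)
N = 200_000

def outcome(z: int) -> int:
    # B depends only on the confounder Z, never on A.
    return int(random.random() < (0.30 if z else 0.05))

# Observational world: Z influences both A and B.
obs = []
for _ in range(N):
    z = int(random.random() < 0.5)
    a = int(random.random() < (0.80 if z else 0.20))
    obs.append((a, outcome(z)))

p_b_given_a1 = sum(b for a, b in obs if a) / sum(a for a, _ in obs)
p_b_given_a0 = sum(b for a, b in obs if not a) / sum(1 - a for a, _ in obs)

# Interventional world: do(A) -- assign A by coin flip, as randomization does.
intv = []
for _ in range(N):
    z = int(random.random() < 0.5)
    a = int(random.random() < 0.5)          # A forced, independent of Z
    intv.append((a, outcome(z)))

p_b_do_a1 = sum(b for a, b in intv if a) / sum(a for a, _ in intv)
p_b_do_a0 = sum(b for a, b in intv if not a) / sum(1 - a for a, _ in intv)

print(f"P(B|A=1)={p_b_given_a1:.3f}  P(B|A=0)={p_b_given_a0:.3f}   (association)")
print(f"P(B|do(A=1))={p_b_do_a1:.3f}  P(B|do(A=0))={p_b_do_a0:.3f}  (no causal effect)")
```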
Observe, theorize, test, repeat. From Kepler to Pasteur to Einstein, this process remains the same. Kepler's intensive observations of the motions of heavenly bodies would have been useless if published as sheaves of statistical correlations; it is the laws derived from those observations that have proven useful. Indeed, without those laws, there was nothing to test – no statement to prove or disprove. Medical statistical literature often merely asserts covariance, and allows the reader to infer causation without an explicit theory for support, or assumes that the readers are all familiar with the same theory. This defeats the entire purpose of science, since theorizing is the essential function of the scientific method, specifically, theorizing the particulars of causation. One cannot apply evidence to patient care without a theory18, but by failing to make our theories explicit, we end up with poor quality theories and inconsistent application. Even more troubling, we cannot revise our theories on the basis of new evidence, nor can we reject new evidence in favor of our theory, because our theory was never made clear. Take for example the Cochrane review of medication for treatment of anxiety disorders in children and adolescents. It found 22 short-term (<16 weeks) double-blind, randomised, placebo-controlled trials, which showed that treatment response was greater in the treatment group (58.1%) than in the placebo group (31.5%), but that the “medication was less well tolerated than placebo, as indicated by the significant proportion of children and adolescents who dropped out due to adverse effects during the short term trials.” Reading further, the majority of trials were for OCDxxxiv. So what's the theory? That SSRIs are relaxing? That SSRIs effectively reduce anxiety symptoms in children and adolescents? That SSRIs are effective treatments for children/adolescents who meet DSM-IV criteria for anxiety disorders? That SSRIs have a little better than 50% chance of reducing anxiety symptoms in a child/adolescent with OCD? Or that while SSRIs have no direct chemical effect on anxiety, the side effects of SSRIs are prominent enough to undermine double-blinding and reinforce the placebo effect sufficiently to raise the efficacy of placebo treatment from 31 to 58 percent? Whichever theory one chooses, one cannot effectively prove that one's chosen theory has much greater support than another. To my knowledge, no study has been done comparing SSRIs with a chemical that reproduced the same side effects but did not have the same chemical action in the brain. To do such a study, one would have to have a very clear, accurate theory of the action of SSRIs. While several theories do exist, there is (to my knowledge) no clear consensus on the exact mechanism of action (dopamine, norepinephrine, sure ... so? Explain how that affects thinking ... precisely), largely because there exists no clear consensus on the nature, function and methods of the mind19. This example is pertinent because on first glance the data seems “self-explanatory” – it seems so simple, so straightforward, until one considers an alternate explanation that also fits the data, but has completely different implications. It is not difficult to imagine that two different interpretations would lead to rather different practice patterns.

18 Since evidence is merely a collection of data, unless one has some theory, one could never apply that evidence to a set of patients, any more than one could apply the number one, or the observation “that apple fell”.
19 Again, I will not delve into the realm of philosophy of the mind – there are arguments for and against several theories.
One could cite any number of topics – cholesterol/statin studies or PPAR-gamma studies, or aspirin, or many other topics – and have a similar discussion. The main point, however, remains the same – no matter how excellent the statistical correlation, without a theory or explanation that is clearly expressed, no science has been done and no extrapolation, prediction or application can follow. This process of story formation is the entire purpose of the scientific method – not merely to gather data, but to weave that data into a model that has testable implications, then to test those implications to refine or refute the model. Application of science in medical practice can only be managed on the basis of a theory – data is not enough. Whatever concept of the scientific method we adopt, whether Popper's falsifiability or Bayesian networks' elucidation of probabilistic causal relationships, we must adopt some concept and use that concept as the basis of our practice. Just as with the magic trick above, we may find that the logic of a particular theory is so appealing that we would rather doubt the validity of some evidence than reject that theory. When we argue about the methodological quality of various papers, we are engaging in this process – if you believe lifestyle changes can make a difference in obesity, you will interpret papers that way, and critique the methods of papers promoting genetic explanations of weight gain; the reverse is true if you think that genetics is the controlling factor. This common process, when two opposing sides cite the literature that supports their view and critique the methodology of papers with contrary findings, is the scientific process at work. This debate of theories on the basis of seemingly conflicting evidence represents the successful implementation of the scientific method on incompletely understood and incompletely observed phenomena. *** Of the Cochrane SSRI analysis above, one is led to ask (of the meta-analysis or of each study within), “What did this study really test?” The obvious answer is that it tested the efficacy of SSRIs in treating anxiety in children and adolescents; but just as we asked, “efficacy according to whom?”, we may now ask, “anxiety according to whom?” How well did each researcher tease out the symptoms of anxiety? Did each one equally avoid asking leading questions? How precisely did the researchers interpret the DSM criteria? The researchers were not interested in the ability of SSRIs to treat nausea or abdominal pain, but how well did they differentiate these symptoms from anxiety? Ultimately, the function of applying the DSM in this study was to separate or “sort” the patients into a group that was amenable to study. One can postulate that some researchers were more effective than others at accomplishing that sorting; that is, some researchers more accurately identified those patients who would respond to treatments directed at reducing their anxiety. One could postulate that other researchers were less accurate, but tended in particular to confuse depression symptoms with anxiety symptoms in children.
Others may have tended to mistakenly include children with normal degrees or types of anxiety in the treatment group. Each of those researchers might get different results from a trial of SSRIs. And that difference is not trivial. Ultimately, medicine is a process of sorting patients into treatment (and prognosis) groups; the accuracy of this sorting process directly affects the outcomes seen. Naturally, the excellence or mediocrity of the diagnostic (sorting) process profoundly determines the efficacy of any intervention over a population. Interestingly, this “sorting” can be considered as fundamentally procedural. It's a slow procedure, performed over a population rather than over a single person, but it is a procedure, a series of decisions and actions, nonetheless. Being given a mixed set of colored blocks and told to sort by color, shape, and size is fundamentally no different than being told to sort each patient who comes through the door into diagnostic groups for treatment. Some doctors will be more or less accurate than others, some more or less creative, or insightful, or aggressive, or convincing, and those differences will determine their individual outcomes just as much as the efficacy of any particular treatment modality. Like most procedures – from hernia repairs to colon cancer resections to CABG – efficacy will fall on a Bell curve20. The usual response to this idea of variability is that these differences will average out among multiple centers, or in meta-analysis of multiple trials, and that is true. As a result, this kind of averaging is standard practice in randomized controlled trials, and this is why the meta-analysis is held in such high regard. But this same process deletes vital information; specifically, information about what happens at the top of the Bell curve. We sacrifice that information in order to gain information about the average. And while it is interesting and perhaps more directly pertinent to know whether SSRIs are effective in the hands of “average” general pediatricians, it might be more interesting, or at least interesting in a different way, to know whether they are effective in the hands of pediatric psychiatrists. Similarly, some pediatric psychiatrists are simply better than others; a study done by those psychiatrists may well have different results than one done by “average” psychiatrists. Troublesomely, the data from the top of the bell curve is the data that most effectively informs our model – What if SSRIs don't work for OCD that is accurately diagnosed? What if truly expert psychiatrists can treat anxiety so effectively with counseling that the SSRI effect is negligible? What if they sort so accurately that the SSRI effect is halved, or doubled or tripled? What would that say about the nature of anxiety/OCD treatment and about the nature of SSRIs? Wouldn't that be more pertinent data than information about the “average”? This is the sacrifice demanded by the apotheosis of the meta-analysis. One receives information about the average, but loses information about excellence. What did the meta-analysis test? The efficacy of the SSRIs, or the mediocrity of the “average” diagnostic process? Meta-analyses certainly convey some useful information, but not all the information. Indeed, data that is most pertinent to our understanding of disease processes and optimal treatment modalities is lost in our perpetual search for a higher n-value. The sheer quantity of the n does not compensate for the loss in quality.

20 Support for this idea is provided later in the paper, so I have left the statement uncited here.
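To make the point concrete, here is a synthetic sketch (invented numbers, not data from any study): clinician “skill” is drawn from a bell curve, each clinician's patients succeed with a probability that depends on that skill, and the pooled figure reports only the middle of the distribution while the top decile looks like a different treatment entirely.

```python
# Sketch of how pooling over a bell curve of clinician skill reports the
# average and erases the tail. Purely synthetic, illustrative numbers.
import random
import statistics

random.seed(2)

def success_rate(skill: float, n_patients: int = 200) -> float:
    # Per-patient success probability rises with skill, capped to [0.05, 0.95].
    p = min(0.95, max(0.05, 0.5 + 0.15 * skill))
    return sum(random.random() < p for _ in range(n_patients)) / n_patients

skills = [random.gauss(0, 1) for _ in range(500)]           # 500 clinicians
rates = sorted(success_rate(s) for s in skills)

pooled = statistics.mean(rates)
top_decile = statistics.mean(rates[-50:])                   # best-performing 10%

print(f"pooled 'trial' success rate: {pooled:.2f}")         # ~0.50
print(f"top-decile clinicians:       {top_decile:.2f}")     # ~0.75 or higher
```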
V.S. Ramachandran (inventor of mirror box therapy) is described in The Brain that Changes Itself by Norman Doidge: “He is a sleuth, solving mysteries one case at a time, as though utterly unaware that modern science is now occupied with large statistical studies. He believes that individual cases have everything to contribute to science. As he puts it, 'Imagine I were to present a pig to a skeptical scientist, insisting it could speak English, then waved my hand, and the pig spoke English. Would it really make sense for the skeptic to argue, 'But that is just one pig, Ramachandran. Show me another, and I might believe you!' ' ”xxxv Consider these two concepts – that the practice of medicine is a procedure, very slowly performed over the population of patients each physician treats, and that that procedure's efficacy will vary by physician on a Bell curve. Consider “being a patient of Dr. Zain Hakeem” as a risk factor for mortality and morbidity. Where would you fall on that Bell curve? What factors determine one's placement on the Bell curve? To answer this, start by examining other procedures. Hernia surgeries have an average recurrence rate of 5-10%. At the best centers in the world, it is less than 1% (about one in 500). At those centers, everything about the hernia repair is different. The surgeons do nothing else – six to eight hundred a year, and the entire hospital is set up for hernia patients. “Their rooms have no phones or televisions, and their meals are served in a downstairs dining hall; as a result, the patients have no choice but to get up and walk around, thereby preventing problems associated with inactivity”xxxvi. Carotid endarterectomies are recommended to be performed only at high-volume centers for similar reasonsxxxvii. After surgical colon cancer resection, ten-year survival rates ranged from 20 to 63%, depending on the surgeonxxxviii. A more conventionally “medical” illness exemplifies this process: treatment of CF21. CF, as an area of study, has an unusually long record of following outcomes data from particular doctors. The reason is, as Atul Gawande says, “because, in the nineteen-sixties, a pediatrician from Cleveland named LeRoy Matthews was driving people in the field crazy.” He was claiming an annual mortality rate of less than two percent at a time when the national mortality rates were in excess of 20% per year; his patients had an average life expectancy of 21 years at a time when the rest of the country's CF patients died at 3. The CF Foundation reviewed these claims, found them true, and began a registry of outcomes by center to track the results of nationally adopting Matthews' treatment guidelines. Two years later, average life expectancy nationally had reached 10 years of age. In the early nineteen-seventies, the national average was 18 years, though Matthews' center's was higher. In 2003 the average was 33 years; at the best center, it was more than 47. As Gawande puts it, “There was a bell curve, and the spread had narrowed a little. Yet every time the average moved up Matthews and a few others somehow managed to stay ahead of the pack.”

21 I copy the content of this and the following section to nearly the point of plagiarism from Atul Gawande's article The Bell Curve.
Gawande's article chronicles his visit to an average CF center – Cincinnati Children's – concluding that, “This was, it seemed to me, real medicine: untidy, human, but practiced carefully and conscientiously—as well as anyone could ask for. Then I went to Minneapolis.” At the best CF center in the country, consistently over almost 40 years, he found “Patients with CF at Fairview got the same things that patients everywhere did ... Yet, somehow, everything he did was different.” What was different? Gawande finds it difficult to characterize. Warwick, the director who took over from Matthews, is a remarkable character. He invented “The Vest” in his garage. He invented and uses a stereoscopic stethoscope; he's invented a new cough that he teaches to his patients. Ten percent of his patients receive G-tube feeds solely because “by his standards, they aren't gaining enough weight.” Though there is now evidence that “The Vest” is at least as effective as manual percussion, Warwick has little evidence to justify his other individual interventions: “There’s no published research showing that you need to do this. But not a single child or teenager at the center has died in years. Its oldest patient is now sixty-four. The buzzword for clinicians these days is 'evidence-based practice'—good doctors are supposed to follow research findings rather than their own intuition or ad-hoc experimentation. Yet Warwick is almost contemptuous of established findings. National clinical guidelines for care are, he says, 'a record of the past, and little more—they should have an expiration date.'”xxxix One theme that persists among these examples is volume, and certainly we all recognize the value of experience; another theme is consistency, and medicine has begun to recognize the importance of a consistent approach, as demonstrated by the literature on cognitive biases, diagnostic aids, cognitive forcing strategies and checklists. But there are surgeons with years of experience who do not achieve 1/500 hernia repair failure rates. CF centers all over the country see similar numbers of patients, but most do not achieve >100% predicted average lung function. The true difference is highlighted in this comment from The Bell Curve: “Unlike pediatricians elsewhere, Matthews viewed [emphasis added] CF as a cumulative disease and provided aggressive treatment long before his patients became sick.” Similarly, his protege's success is attributed to his belief that “excellence came from seeing [emphasis added], on a daily basis, the difference between being 99.5-per-cent successful and being 99.95-per-cent successful.”xl The main difference at the top of the bell curve is a difference in vision. The physicians who live and work at the top of the curve see the illness or the treatment process differently, in a way that causes or forces them to approach each patient, and indeed each intervention, differently. The story they tell themselves about the disease is different, and consequently, their story of optimal treatment is different, and their results are different. The hernia surgeons at Shouldice see the process of hernia surgery differently, they see the importance of consistency for improved outcomes, and they act in accordance with that vision; now (in retrospect), the effect of acting on that vision is apparent.
Matthews saw CF differently than anyone else at the time; even now, his protege Warwick sees each patient's outcomes on a different scale; they see the potential of CF patients differently, and the actions they have taken in accordance with that vision now (in retrospect) speak for themselves. At the top of the Bell curve, the view is just ... different. Studies show that our cardiac exams are, on average, inaccurate. Should we stop doing them? Where is the evidence for the accuracy of these? But some cardiologists are rather accurate with the echocardiograms they carry between their ears. Should they also stop practicing these esoteric skills? Because there is no “evidence”? Consider another case – there exists an immediately life-threatening illness for which there are two surgical treatments. Treatment A extends the life expectancy by 40-50 years, but has late complications that are thought to be due to surgical techniques. Perioperative mortality risk is about 6%. Treatment B is newer, and theoretically should avoid the late term complications. In the first series of 77 cases, 18 children died perioperatively (23% mortality risk), and late-term data is as yet unavailable. Which would be the preferred treatment in our EBM society? The treatments described above, however, are the Senning/Mustard corrections of d-TGA and the Jatene arterial switch procedures, respectively; the latter is the current standard of care, and indeed the procedure has extended life expectancy by an additional 20 years beyond that offered by the Mustard procedure, and now offers equivalent or lower perioperative risksxli,xlii,22,xliii. But the Jatene procedure was not standard when first introduced. Indeed, it took 10 years from the first successful arterial switch until the procedure was considered standard of carexliv,23. In between, for 10 years, some patients benefited by an average of 20 years of life due solely to the placement of their surgeon on the bell curvexlv (by contrast, two-stage late arterial switch has had poor performance, resulting in a high death rate after 12 years of age ... the physicians performing these procedures were also on the high end of the bell curve, but realized after relatively few attempts (largest case study was 35 patients) that the procedure was not yielding good resultsxlvi).

22 Interestingly, Mustard was a pediatric orthopedic surgeon who was asked by a senior surgeon to consider starting a cardiac surgery program; he studied with Blalock at Hopkins for one month, and subsequently pioneered several cardiac surgeries, some of which were successful.
23 How many years of life were lost? The number of births in those 10 years times the incidence rate of d-TGA, figured on a logarithmic curve approaching 1 to estimate the relative rates of each procedure over time, times 20 years per patient. Do you care what the number is? If I said 20 years of life lost versus 40 or 400? What number makes this slowness acceptable in the name of appropriate caution?

This is true throughout the historical surgical literature. Prior to the first Blalock-Taussig shunt surgery on a 15 month-old girl, Blalock performed exactly one animal practice procedure (he relied on his assistant, Thomas' experience)xlvii, and in 2009, despite the prevalence of the procedure for more than half a century, systematic reviews were scantyxlviii. Reading “Evolution of Cardiopulmonary Bypass”, one feels as if the two most common phrases are 'it was disastrous' and 'the patient died on the operating table'. It is an interesting, if somewhat disturbing, read. Between 1951 and 1955, 18 published attempts were made at bypass operations; 17 of those patients died. Dodrill tried autologous oxygenation and isolated left and right heart bypasses for a total of 4 cases (one right, 2 left, one total, the latter died), then abandoned the technique. Mustard tried rhesus monkey lungs and rubber balloon pumps in seven patients, all of whom died. The article states, “Dodrill's [technique] and Mustard's ...
oxygenator were not thought by others to be the path to success, and both methods were abandoned.” Gibbon developed his bypass machine and operated 4 times, with one success (though the failures were often due to preoperative misdiagnosis). Lillehei operated 48 times using cross circulation with a parental oxygenation donor; there were 28 survivors and 2 major donor adverse events, one of which resulted in permanent harm (due to the anaesthesiologist accidentally pumping air into the IV). His assistant, DeWall, was assigned to work on the problem of developing an artificial oxygenator to allow the removal of risk to a donor, and to allow the higher flow rates needed for larger children and adults. Lillehei's instructions to DeWall are telling: “First of all, do not go to bubble oxygenator systems because they have a very poor record of success. Second, avoid libraries and avoid literature searches, as I want you to keep an open mind and not be prejudiced by the mistakes of others.” Incidentally, the first successful heart-lung bypass machine, the DeWall oxygenator, was a bubble systemxlix. DeBakey's entire career might stand as an example of this kind of thinking: he went to the department store to buy some nylon, but all they had was this new Dacron stuff and “it looked pretty good to me, so I bought a yard of it” – the first Dacron graft patient was treated for AAA and lived 13 yearsl. So perhaps we should learn to interpret the evidence differently. Should we say that we ought to be better than we are, on average, at cardiac auscultation? Should we say that certain cardiac surgeries should work, that they must be made to work? Should we say that the top of the bell curve has shown us what is possible, but it is now our responsibility to move the average rightward? What is the theory that allows us to interpret the evidence? This same reasoning pattern can be applied equally to any intervention or strategy that has “insufficient evidence”, or “no evidence of benefit”, or even, like the cardiac surgeries above, ... dare I say it? “evidence of harm”. Should we stop? Or should we, on average, improve ourselves until we can create evidence of benefit? The worth of pursuing a treatment lies not in the immediate outcomes, not in the evidence, for or against, but in the logic of its model. Outcomes are merely a means of refining that logic. Herein lies the value of anecdotal evidence or case series or single events and especially expert opinion – proof of concept, or lessons from failure. Bypass surgery was made possible by a series of failed attempts, and often, perseverance in spite of discouraging numbers. Acupuncture for pain control may have “no evidence of benefit”, but there was a surgery that was done with only acupuncture as anaesthesiali.
There are patients that have gotten significant benefits that were not achieved by their evidence-based (or evidence-bound) practitioners. Maybe we should look into the potential of excellence rather than rejecting the failure of averages. This is a fundamental problem with our current evidence-based model – it does not account for individual excellence beyond the mean, largely because the model is designed solely to detect the differences in average effects of different chemical compounds in standard doses24. Perhaps the average acupuncturist is no better than placebo, but what about the best acupuncturists in the world? What would their success (or failure) say about the presence (or absence) of energy meridians in the body? Not that they exist, perhaps, but that belief in their existence may lead to patient benefit. The reason we are so quick to reject the average failure of acupuncture rather than the average failure of cardiac auscultation is because the former's basic model does not fit within our own, much the way casual wine consumers' model of “good wine” is different than a connoisseur's. Several tests have been done for wine tasting, and in fact, a taster's ability to reproducibly blind-select a wine from amongst others is a part of the testing for master sommelier certification (there are only 177 people who hold this certification worldwidelii). By contrast, the average consumer cannot reliably distinguish Coca-Cola from Pepsi if given a 1-out-of-3 blind tastingliii,25. A failure of averages is not a failure of concept. A proof of concept, however, can hinge on a single profound example26. By similar analogy, there was a magician named John Scarne who had developed what he called a magic trick. He could, on demand, cut directly to an ace. He could repeat this trick endlessly. The effect is described: “McManus cut the cards, stood back, and said, “Again!” Silently there under the glare of lights John worked ... and worked. He worked seven nights in a row ... $200 an hour for practicing what he liked to do best – and at last one of A.R.'s men broke. “All right” he blurted, “that's enough; now, how do you do it?” Scarne had expected it. “The only possible way,” he said. “You notice I always give the deck one riffle myself. When I do it I count the cards so I can see the indices. When the ace falls I just count the number of cards that drop into place on top of it. Then when I cut I count down that number of cards and break the deck there, and of course there's the ace. That must be obvious now, isn't it?” There was a long silence. “If that's it,” Rothstein said, “it's uncanny.” “You can do it easy, if you practice three or four hours a day, in – hmmm – twenty years,” said Scarne affably. “And you're how old?” murmured McManus. “Nineteen. But,” John Scarne added hastily, “I've been practicing ten hours a day.”liv

24 One considers in passing the valid, if clichéd, argument along similar lines, that EBM has great difficulty in commenting on outcomes that are important, like quality of life, because they are difficult to reduce to statistics.
25 Average consumers are able to differentiate Coke and Pepsi head to head, because a head-to-head comparison relies on simpler discriminations, like sweetness, rather than on exact characterization and memory for particular flavors.
26 In some sense, this is the application of Bayes' Theorem to the scientific method – the contrasting example always has a larger effect on the post-test probability than the confirming example.
Once again, excellence shows us the potential we all have, but will not appear in large studies of averages. What Scarne was able to do looked impossible, and was accepted only as a magic “trick” initially, because no one could conceive of that level of skill27. The difference in acceptance lies solely in the prevalence of the dogma underlying the model in question, and in the relative power of the groups that promote each. When we mistakenly accept EBM not as an aspect of truth finding, as a valuable way to explore information about averages, but as a policy, as the (sole) way of determining what is funded or promoted or taught or performed, we push EBM beyond the bounds of its capabilities and destroy our opportunities to see our patient populations from a viewpoint that could push our treatments away from the average. There is a CDR (cognitive disposition to respond, or bias) known as the aggregate bias – the idea that group data doesn't apply to an individual patient. By analogy, there is an error that I call the House fallacy – thinking that you, as a physician, are better than the evidence. But adopting EBM as policy makes that error in reverse – it assumes that all doctors are interchangeable, that there are no Houses, or DeBakeys or Oslers in the world. When EBM becomes policy, that is, when it becomes our explicit ideal in the practice of medicine, we cut off the benefits of having Houses in the population of physicians that could lead the rest of us up the bell curve. EBM as policy accomplishes nothing but purchasing a dependable mediocrity at the price of uncertain excellence. Though the AHRQ states it with a positive spin, “One goal of quality improvement efforts nationally is to reduce differences in health care quality that patients receive in one state versus another.”lv, one can see the dangers of reducing variability. This is the most significant objection I can make to adopting EBM as one's gold standard of practice propriety – that the sole effect of that adoption is not an improvement in care, but a narrowing of the bell curve around the average (perhaps this is why the benefits of EBM have been so difficult to show), which thereby gives us the comforting illusion of certainty and dependability. Our desire for certainty has led us to adopt a policy that encourages the perception of certainty through the absence (or ridicule) of dissent. The driving force behind the acceptance of EBM as policy is a desire for certainty where none exists. Our rate of misdiagnosis at autopsy has not changed in decades, despite the advent of advanced scanning, ultrasound, and serologic testinglvi; but our certainty, our confidence in our diagnosis, has gone up. Part of that is misunderstanding the post-test probabilities offered by our testing, part is a social concern of availability of peers who would testify to appropriateness of our actions (though it is notable that Kaiser Permanente has twice been successfully sued for failing to approve treatments on the contention that they were “experimental”lvii), but part also is an innate human need for certainty. It is that need that pushes us to use only “known” or “proven” treatments, though each of those concepts is itself shaky at best. Even if one were at peace with trading excellence for certainty in the short term, it has always been by examining excellence that betterment of the average has been achieved over time. What happens to progress when our policy is that no one should be excellent, because we are afraid of anyone being sub-par?

27 I, like 99% of the world's population, cannot hold an iron cross, but most professional male gymnasts can. Examples of the failure of averages exist in any field where one considers the existence of excellence. If one admits to the existence of average-defying excellence, however, one dispels the myth of physician interchangeability, and what do patients think of the average physicians then?
This leaves us with a problem – politics, like nature, abhors a vacuum. What policy can take the place of the ubiquitous evidence-based medicine? As an osteopathic student, I've seen some strange, nearly unbelievable things: I've seen strabismus treated non-surgically in 15 minutes; I've manually treated viral meningitis, recurrent tonsillitis, pain from a Crohn's flare, and edema from right heart failure. And while placebo effect is a plausible explanation for any one of these, if one chooses to explain away all of these events as placebo, the conversation does begin to resemble that of the skeptic regarding the pig. Perhaps these concepts only have value in the hands of a master; until one achieves similar mastery, it's a bit silly to scorn an excellent cardiologist's persistence in carrying that old-fashioned doohickey around his neck when everyone "knows" it doesn't work. Likewise, consider IVUS, stereoscopic stethoscopes, or the 80-lead EKG.28 At the top of the bell curve, one stops being concerned with what happens in the middle of the curve, because frankly the same rules don't apply. When one can fix strabismus without surgery, or remove longstanding severe scoliosis pain without medication, one wonders what else one can do. Gregory House doesn't really care about the average doctor's opinion. Dr. Lillehei didn't pay attention to those who believed that the heart was simply not amenable to operative correction.29 Dr. Speece30 doesn't care about the multicenter RCT for osteopathic treatments. Dr. Shah31 doesn't care about the studies on average stethoscope diagnostics. They're better than that. At the top of the bell curve, one isn't looking at the middle; one is always looking for the next improvement, the next step, the next idea in accordance with the vision. Those at the edges of the bell curve have little or no evidence for what they do. They pursue their treatment plans on the basis of their individually developed ideas and stories, and ultimately the evidence is generated after the fact to justify or repudiate their viewpoint. Often irrespective of the efficacy of individual interventions, their results reflect the efficacy of their viewpoint as a whole. When we tell the story of a disease and its context in the physiology of the body, the efficacy of that story is what ultimately determines our success and our quality as physicians.32

Suppose for a moment that cranio-sacral treatment for low back pain is completely bogus – that there is, in reality, no inherent value to cranio-sacral treatment. But suppose one believes (mistakenly) that there is value, and persists in applying these treatments to patients. What will happen? Overall, the same 95% of patients who would have gotten better anyway will get better, with one difference: the physician's belief will have caused some transference to his patients, resulting in an effective placebo treatment. Consequently, he will have achieved short-term pain control with fewer medications and fewer side effects than another physician who did not have the same beliefs.

28 As Mark Twain wrote, "A person with a new idea is a crank until the idea succeeds."
29 A quite common opinion at the time.
30 One of my osteopathic mentors, author of the textbook on Ligamentous Articular Strain techniques.
31 A pediatric cardiologist in Peoria, where I trained, who had a remarkable talent for clinical and auscultative diagnosis.
32 Admittedly, other qualities also play a large role, such as ingenuity/creativity, rapport, and consistency/diligence. But ultimately these are all modes of the quality of application of a physician's disease or treatment "story." The underlying factor is the story itself, which drives the direction of those applications. If the car is headed the right way, even if it doesn't have the best MPG, one will still get closer to the goal than if one is driving a fuel-cell car efficiently in the wrong direction.
Even though his belief was false, it led to an efficacious treatment plan, with fewer side effects than his peers achieved. This improvement in care will not be captured in a subsequent sham-controlled study. Suppose now that cranial osteopathy does have value, but only in the hands of an expert, much like carotid endarterectomy. The multicenter RCT will not show value, but the master's personal patients will show an effect. Hence, CEA data are generated only from specialty centers – proof of concept. It is then the responsibility of each surgeon to track his or her efficacy and to compare it, not to the average, but to the best33 (a minimal sketch of what such tracking might look like follows below). If we stop training osteopaths in this method, or stop recommending this method, or stop paying for this method of treatment, we lose the potential for benefits without side effects that cranial osteopathy could provide, had we responded in the same spirit as when we resolve to keep trying to improve open heart surgery until it works.34

Rather than slavish adherence to the merely observational nature of EBM, perhaps we should engage in something one might term "science-based medicine," or "theory-based medicine," wherein we engage the natural uncertainty of the scientific method by treating patients on the basis of the stories we tell ourselves about illness and wellness, following the outcomes of treatments based on these theories, and modeling ourselves after successes (positive deviation from the mean).35 As discussed above, data are used to create a theory or model and then to test the implications of that theory. In order to say we know something, we must be able to say clearly not only what we mean by knowing, but also what it is that we know, think, or suspect, which takes the form of a theory or story, hopefully founded on empiric (observed) evidence. In short, we must use evidence to create or recreate our story/model of disease and its treatment, and we must base our treatments on that model and follow the results. Any idea for treatment that is reasonable, that makes sense within that model, would then be considered a valid intervention. To be sure, some of those ideas won't work out. Cardiac and renal stenting seemed like great lifesaving ideas but turned out not to have as much benefit as anticipated; PPAR-gamma agonists turned out not to work as expected; and Mustard's rhesus monkey lung oxygenation was a failure.

33 Thanks, Dr. Nabeel Rana, vascular surgeon, for discussing the implications of performing CEA from the perspective of a responsible individual surgeon.
34 In both cases, his story of back pain treatment is more efficacious (even if less true) than the story told by his peers.
35 Gawande discusses modeling "positive deviation from the mean" as a process of improvement in his book Better. See citations in endnotes.
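On the point above about each surgeon tracking his or her own efficacy against the best rather than the average, here is a minimal sketch in Python of what that bookkeeping might look like. The case log, the benchmark complication rate, and the choice of an exact binomial comparison are all assumptions for illustration, not a prescribed method.

from math import comb

def binomial_tail_p(events, n, benchmark_rate):
    """Chance of seeing at least this many complications if one's true rate
    equaled the benchmark (an exact binomial upper-tail probability)."""
    return sum(comb(n, k) * benchmark_rate**k * (1 - benchmark_rate)**(n - k)
               for k in range(events, n + 1))

# Hypothetical log of one surgeon's CEA cases: 1 = complication, 0 = none.
cases = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
events, n = sum(cases), len(cases)
benchmark = 0.03  # assumed "best-center" complication rate, illustration only

print(f"Observed: {events}/{n} = {events / n:.1%}; benchmark: {benchmark:.0%}")
print(f"P(at least {events} complications if truly at benchmark): "
      f"{binomial_tail_p(events, n, benchmark):.3f}")

The particular statistic matters less than the habit it represents: the comparison is made against the best achievable results rather than the average, and the log is kept prospectively so the model can be revised when the results do not match it.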
This process has a name: it is called being wrong. It is an essential component of scientific progress. And make no mistake, patients are harmed and lives are lost as a result of our errors. But the only thing worse than making an error is allowing our necessary errors to make us so timid that we are unwilling to venture away from the average. One would expect that for all the success stories of bypass surgeries, there were an equal number of surgeons whose ideas just didn't work. Those surgeons killed people, maybe hundreds, persisting in attempting to prove an idea that was simply false. But those are the necessary losses of engaging in progress. It's a cold view, and one that patients and their families do not want to hear, any more than they want to hear that grandma died from a fatal head bleed while on Coumadin that was started on the basis of a 34% chance of pulmonary embolus. But that is the nature of reality; that is the nature of science; that is the nature of the mechanism of progress; and when we start basing our policy on that nature, we have a chance at representing our patients' interests rather than the interests of whichever company, policymaker, or grant writer has the most cash.

While one cannot imagine returning to a time of wild experimentation without follow-up, by adopting this theory-based model, this policy, this outlook, we can reverse the mistake of EBP and widen the bell curve again; we can increase the variability in our practice patterns; and then we can reap extreme benefit from the original concept of EBM – the concept that seemed so promising, that led us to adopt EBM – that we can, should, must follow the results of different outlooks, different visions, follow them closely and conscientiously, and pursue the theories that seem most promising. Charting our results and noting when ideas do not pan out as expected is a valuable learning tool, if we use that failure to reinvent our model. If we merely implement those observations as "evidence," we lose an opportunity for far-reaching changes to our treatment models that could vastly improve our patient care. We don't need n values of 20,000. We are trying to prove important things – things that matter. We need to pursue disease-oriented evidence and pharmacologic evidence, implement strategies that make sense, and then follow the results of that implementation retrospectively to inform our model for the future.

i D. Craig Brater and Walter J. Daly (2000), "Clinical Pharmacology in the Middle Ages: Principles that presage the 21st century," Clinical Pharmacology & Therapeutics 67 (5), p. 447-450 [449]; Walter J. Daly and D. Craig Brater (2000), "Medieval contributions to the search for truth in clinical medicine," Perspectives in Biology and Medicine 43 (4), p. 530-540 [536], Johns Hopkins University Press.
ii Goodman, K. Ethics and Evidence-Based Medicine: Fallibility and Responsibility in Clinical Science. ISBN 0521819334. Excerpt retrieved from http://assets.cambridge.org/97805218/19336/excerpt/9780521819336_excerpt.pdf on 8/30/10.
iii Cochrane, AL. 1931-1971: a critical review, with particular reference to the medical profession. In: Medicines for the year 2000. London: Office of Health Economics.
iv White, Kerr. "Archie Cochrane's legacy: an American perspective." In: Non-random Reflections on Health Services Research: on the 25th anniversary of Archie Cochrane's Effectiveness and Efficiency.
v Goodman, K. Ethics and Evidence-Based Medicine: Fallibility and Responsibility in Clinical Science. ISBN 0521819334. Excerpt retrieved from http://assets.cambridge.org/97805218/19336/excerpt/9780521819336_excerpt.pdf on 8/30/10.
vi http://en.wikipedia.org/wiki/Placebo-controlled_study. Last retrieved 8/30/10.
vii http://en.wikipedia.org/wiki/Evidence-based_medicine#History. Last retrieved 8/30/10.
viii El Dib RP, Atallah AN, Andriolo RB (August 2007). "Mapping the Cochrane evidence for decision making in health care." J Eval Clin Pract 13(4): 689-92. PMID 17683315.
ix Complementary and Alternative Medicine in the United States. 2005. ISBN 0-309-09270-1. pp. 135-136.
x Keller, Helen. The Story of My Life.
xi Wilson, Duff; Singer, Natasha (September 11, 2009). "Ghostwriting Is Called Rife in Medical Journals." The New York Times. http://www.nytimes.com/2009/09/11/business/11ghost.html. Retrieved 11/7/2010.
xii Prevalence of Honorary and Ghost Authorship in Cochrane Reviews. Graham Mowatt, MBA; Liz Shirran, MSc; Jeremy M. Grimshaw, MBChB, PhD; Drummond Rennie, MD; Annette Flanagin, RN, MA; Veronica Yank, BA; Graeme MacLennan, BSc(Hons); Peter C. Gøtzsche, DrMedSci; Lisa A. Bero, PhD. JAMA. 2002;287:2769-2771.
xiii Guest Authorship and Ghostwriting in Publications Related to Rofecoxib: A Case Study of Industry Documents From Rofecoxib Litigation. Joseph S. Ross, MD, MHS; Kevin P. Hill, MD, MHS; David S. Egilman, MD, MPH; Harlan M. Krumholz, MD, SM. JAMA. 2008;299(15):1800-1812.
xiv Sismondo S (September 2007). "Ghost management: how much of the medical literature is shaped behind the scenes by the pharmaceutical industry?" PLoS Med. 4 (9): e286. doi:10.1371/journal.pmed.0040286. PMID 17896859.
xv Braise, Twila, RN, PHN. "'Evidence Based Medicine': Rationing Care, Hurting Patients." The State Factor. December 2008. Retrieved from http://www.alec.org/am/pdf/ebmstatefactor.pdf. Last retrieved 8/30/10.
xvi Sleight, Peter. "Debate: Subgroup Analyses in Clinical Trials: fun to look at – but don't believe them!" Curr Control Trials Cardiovasc Med. (2000). 1(1): 25-27. PMCID: PMC59592.
xvii Casscells, W., Schoenberger, A., and Grayboys, T. (1978): "Interpretation by physicians of clinical laboratory results." N Engl J Med. 299:999-1001.
xviii Eddy, David M. (1982): "Probabilistic reasoning in clinical medicine: Problems and opportunities." In D. Kahneman, P. Slovic and A. Tversky, eds., Judgment under Uncertainty: Heuristics and Biases. Cambridge University Press, Cambridge, UK.
xix Gigerenzer, Gerd and Hoffrage, Ulrich (1995): "How to improve Bayesian reasoning without instruction: Frequency formats." Psychological Review. 102:684-704.
xx Sheeler, Robert D., et al. "Accuracy of Rapid Strep Testing in Patients Who Have Had Recent Streptococcal Pharyngitis." Journal of the American Board of Family Medicine, 2002-09-04.
xxi Rheumatic Heart Disease. Author: Thomas K Chin, MD. Updated 8/4/10. Retrieved from http://emedicine.medscape.com/article/891897-overview on 11/7/2010.
xxii Levitt, S., Dubner, S. Freakonomics. ISBN 0-06-123400-1.
xxiii "To be complete." N Engl J Med. Correspondence. 1979 Jan 25;300(4):193-194.
xxiv Woolf, S., Kamerow, D. "Testing for uncommon conditions: The heroic search for positive test results." Arch Intern Med. 1990; 150(12):2451-2458.
xxv Berrington de Gonzalez A, Darby S. "Risk of cancer from diagnostic x-rays: estimates for the UK and 14 other countries." Lancet 2004; 363: 345-351. PMID 15070562.
xxvi Brenner DJ, Doll R, Goodhead DT, et al. "Cancer risks attributable to low doses of ionizing radiation: assessing what we really know." Proc Natl Acad Sci USA 2003; 100: 13761-13766. PMID 14610281.
xxvii Ron E. "Cancer risks from medical radiation." Health Phys. 2003 Jul; 85(1): 47-59. PMID 12852471.
xxviii Braise, Twila, RN, PHN. "'Evidence Based Medicine': Rationing Care, Hurting Patients." The State Factor. December 2008. Retrieved from http://www.alec.org/am/pdf/ebmstatefactor.pdf. Last retrieved 8/30/10.
xxix Fitzpatrick, Michael (2008). "Taking a political placebo." Spiked Online. http://www.spikedonline.com/index.php/site/article/5342/. Retrieved 2009-10-17. Retrieved from http://en.wikipedia.org/wiki/Evidence-based_medicine#Political_criticism. Last retrieved 8/30/10.
xxx Singh S and Ernst E (2008). Trick or Treatment? Bantam Press. Retrieved from http://en.wikipedia.org/wiki/Evidence-based_medicine#Political_criticism. Last retrieved 8/30/10.
xxxi Sackett, David L, et al. "Evidence based medicine: what it is and what it isn't [Editorial]." British Medical Journal (1996), 312:71-72.
xxxii Braise, Twila, RN, PHN. "'Evidence Based Medicine': Rationing Care, Hurting Patients." The State Factor. December 2008. Retrieved from http://www.alec.org/am/pdf/ebmstatefactor.pdf. Last retrieved 8/30/10.
xxxiii http://en.wikipedia.org/wiki/Causality. Last retrieved 8/30/10.
xxxiv "Pharmacotherapy for anxiety disorders in children and adolescents." Cochrane Database of Systematic Reviews, 6, 2010.
xxxv Doidge, Norman. The Brain That Changes Itself.
xxxvi Gawande, Atul. Complications: A Surgeon's Notes on an Imperfect Science. 2002. pp. 39-41.
xxxvii Cebul RD, Snow RJ, Pine R, et al. "Indications, outcomes, and provider volumes for carotid endarterectomy." JAMA 1998;279(16):1282-7. More studies on several operations at http://www.mass.gov/?pageID=eohhs2terminal&L=7&L0=Home&L1=Consumer&L2=Physical+Health+and+Treatment&L3=Quality+and+Cost&L4=Data+and+Statistics&L5=Physicians&L6=Volume+by+Surgeon+and+Hospital&sid=Eeohhs2&b=terminalcontent&f=dhcfp_quality_cost_volume_research_studies&csid=Eeohhs2. Retrieved on 11/8/10.
xxxviii Gawande, Atul. "The Bell Curve." The New Yorker. December 6, 2004. Retrieved at http://www.newyorker.com/archive/2004/12/06/041206fa_fact on 8/30/10.
xxxix Gawande, Atul. "The Bell Curve." The New Yorker. December 6, 2004. Retrieved at http://www.newyorker.com/archive/2004/12/06/041206fa_fact on 8/30/10.
xl Gawande, Atul. "The Bell Curve." The New Yorker. December 6, 2004. Retrieved at http://www.newyorker.com/archive/2004/12/06/041206fa_fact on 8/30/10.
xli Gawande, Atul. Complications: A Surgeon's Notes on an Imperfect Science. 2002. pp. 27-28.
xlii Konstantinov, I, et al. "Atrial Switch Operation: Past, Present, and Future." Ann Thorac Surg. 2004;77:2250-8. Retrieved at http://www.mc.vanderbilt.edu/root/sbworddocs/res_edu_thoracic/Peds10_7_04.pdf. PMID 15172322. Last retrieved 8/27/10.
xliii Stoney, William. "Evolution of Cardiopulmonary Bypass." Circulation. 2009;119:2844-2853. doi:10.1161/circulationaha.108.830174.
xliv http://en.wikipedia.org/wiki/Jatene_procedure. Last retrieved 8/27/10.
xlv Gawande, Atul. Complications: A Surgeon's Notes on an Imperfect Science. 2002. p. 28.
xlvi "Evaluation and management of the adult patient with transposition of the great arteries following atrial-level (Senning or Mustard) repair." Nature Clinical Practice Cardiovascular Medicine (2008) 5, 454-467. Retrieved at http://www.nature.com/nrcardio/journal/v5/n8/full/ncpcardio1252.html. doi:10.1038/ncpcardio1252. Last retrieved 8/30/10.
xlvii Brogan TV, Alfieris GM. "Has the time come to rename the Blalock-Taussig shunt?" Pediatr Crit Care Med. 2003 Oct;4(4):450-3. PMID 14525641.
xlviii Yuan SM, Shinfield A, Raanani E. "The Blalock-Taussig shunt." J Card Surg. 2009 Mar-Apr;24(2):101-8. PMID 19040408.
xlix Stoney, William. "Evolution of Cardiopulmonary Bypass." Circulation. 2009;119:2844-2853. doi:10.1161/circulationaha.108.830174.
l Video interview with DeBakey, retrieved from http://www.ptca.org/news/2008/0712_DEBAKEY.html. Last retrieved 8/27/10.
li Lore, Roger. "Acupuncture Anaesthesia in Surgery." Journal of Chinese Medicine. 79. October 2005. 23-27. Retrieved from http://homepage.mac.com/sweiz/files/article/79-23.pdf. Last retrieved 8/30/10.
lii http://www.courtofmastersommeliers.org/sommeliers.php. Last retrieved 8/30/10.
liii Gladwell, Malcolm. Blink. 2005. First Back Bay trade paperback edition, April 2007. pp. 185-186.
liv Scarne, John. Scarne on Cards. 1949. Signet Publishing. pp. xii-xiii.
lv U.S. Agency for Healthcare Research and Quality. "Key Themes and Highlights from the National Healthcare Quality Report." http://www.ahrq.gov/qual/nhqr07/Key.htm. Last retrieved 8/30/10.
lvi Gawande, Atul. Complications: A Surgeon's Notes on an Imperfect Science. 2002. p. 197.
lvii Sugarman, Mitchell. "Permanente Physicians Determine Use of New Technology: Kaiser Permanente's Interregional New Technologies Committee," cited in http://en.wikipedia.org/wiki/Evidence-based_medicine#cite_note-33. Retrieved at http://xnet.kp.org/permanentejournal/winter01/HSnewtec.html. Last retrieved 8/30/10.