The Importance of Controlled Experiments and the Semmelweis Reflex
Ronny Kohavi
5/7/2008, updated 5/19/2008, updated 7/13/2008
I wanted to share some stories I collected in the last few months about the importance of controlled
experiments vs. other experimental designs in establishing causality. These are summarized in the first
part of this document.
The second part provides stories where people reject results because they conflict with their strongly held
beliefs. Evidence (e.g., experimental results) that contradicts such beliefs will be questioned, and attempts
will be made to find flaws in the design or the analysis. That’s one reason to build a trustworthy
system and spend significant time on analysis and reliability.
Gary Loveman, the COO of Harrah’s, said that there were three ways to get fired at Harrah’s: steal,
harass women, or institute a program or policy without first running an experiment (Hard Facts, p. 15).
The culture at Microsoft is far from Loveman’s, and even when experiments are
run here, there will sometimes be resistance to incorporating the results.
1. Importance of Randomized Controlled Experiments
Here are two good examples of the importance of running randomized controlled experiments. The first
example uses the term “placebo-controlled clinical trial,” which is the medical term for the randomized
controlled experiment we use. The second example shows results from a quasi-experimental design,
which, as its name implies, is not a true randomized design but tries to imitate one.
1.1 Hormone-Replacement Therapy
The following story is from the NY Times, May 5, 2002. It’s a story told by Kevin Patterson, an internist.
When I started practicing medicine in the early 90's, one of my enthusiasms was hormone-replacement therapy. At that time, the observation had been made, repeatedly, that postmenopausal
women who happened to take estrogen -- for osteoporosis or hot flashes, for instance -- were less
likely to have heart attacks and strokes than women who didn't. I remember telling women in their
50's how premenopausal women were relatively immune to cardiovascular disease, at least compared
with men, but that once they had been through menopause, this relative protection disappeared
quickly. ''Take the estrogen,'' I suggested over and over. ''Preserve your youthful coronaries.''
This was in Manitoba, and these were pragmatic, sensible prairie women. I insisted to them that the
recommendations and the evidence seemed clear. I remember my patients' brows knitting at the
thought of menstrual cycles extending into their dotage, but ultimately the argument felt compelling.
Certainly it did for me. I remembered being told in medical school that the underuse of estrogen was
one of the great crimes of the medical patriarchy, itself an expression of latent misogyny. No
misogynist I, off I went to work, my prescription pad leaping to hand at the sight of bifocals or pastel
cardigans.
Then in 1998, the results of a formal, placebo-controlled clinical trial called the Heart and
Estrogen/Progestin Replacement Study (HERS) were published. It showed that estrogen did not
prevent heart attacks or strokes and, in fact, it made women more susceptible to blood clots. The net
cardiovascular effect therefore was negative. This study astonished most doctors -- for me, it certainly
felt like a betrayal. Betrayed by the recommendations, we had in turn betrayed many of the cardigan-clad women of our acquaintance.
A few months ago, in the emergency room of one of the hospitals I work in on Vancouver Island, I
saw a woman in her mid-70's who was still taking Premarin, a common estrogen preparation. She had
been having chest pain, and I was admitting her for observation, to make sure she wasn't having a
heart attack.
''So, you take the Premarin because . . . ?'' I asked.
''My sisters all had heart attacks in their 50's,'' she said. ''My doctor said the estrogen lowered my risk.''
''We now think it probably doesn't.''
''Really.''
''Yes.'' Me, nodding, smiling weakly.
''What changed?''
''Well, there were these studies that seemed to show that women who took estrogen had a relatively
low incidence of heart attacks, but it turns out that really, it was the sort of woman who took estrogen
who was less likely to have a heart attack. She was probably also less likely to smoke, more likely to
seek regular medical attention -- she did something important different, anyway. When, just recently,
they took a large group of women and randomly gave each woman either a placebo or estrogen, the
ones taking estrogen didn't do at all better.''
''Well,'' she said. ''Isn't that something?''
My patient was not alone. The data from HERS were so surprising that many health-care providers
seem not to believe them, even today. In 2001, Premarin was the third most-prescribed drug in the
United States.
The key point: it was the sort of woman who took estrogen who was less likely to have a heart attack.
That is a correlation, not a causal effect of the drug! In 2002, more than 6 million women were taking
PremPro [similar to Premarin]. Statistically, that translates into the following (take these figures with a
grain of salt, since they come from an attorney site, hrt-attorneys.com):
480,000 additional breast cancer cases.
420,000 more heart attacks.
480,000 more strokes.
480,000 more blood clot cases.
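The self-selection problem in this story can be made concrete with a small simulation. This is only a sketch under assumptions of my own: the drug is given no effect at all on heart attacks, "health-conscious" women are both more likely to take it and less likely to have a heart attack, and every probability below is invented for illustration rather than taken from HERS or any other study.

```python
import random

def simulate(n=200_000, seed=0):
    """Compare self-selected treatment with coin-flip treatment.

    Hypothetical model: the drug does nothing, but health-conscious
    women take it more often and have fewer heart attacks anyway.
    """
    rng = random.Random(seed)
    # table[took_drug] = [heart attacks, women]
    observational = {True: [0, 0], False: [0, 0]}
    randomized = {True: [0, 0], False: [0, 0]}
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        # Risk depends only on lifestyle, never on the drug (assumption).
        attack = rng.random() < (0.05 if health_conscious else 0.15)
        # Observational world: women choose for themselves.
        chose_drug = rng.random() < (0.7 if health_conscious else 0.2)
        # Randomized world: a coin flip chooses for them.
        assigned_drug = rng.random() < 0.5
        for table, took in ((observational, chose_drug),
                            (randomized, assigned_drug)):
            table[took][0] += int(attack)
            table[took][1] += 1

    def rates(table):
        # (heart-attack rate with drug, heart-attack rate without drug)
        return (table[True][0] / table[True][1],
                table[False][0] / table[False][1])

    return {"observational": rates(observational),
            "randomized": rates(randomized)}

for design, (with_drug, without_drug) in simulate().items():
    print(f"{design}: with drug {with_drug:.3f}, without {without_drug:.3f}")
```

Under these made-up numbers, the observational comparison makes a useless drug look protective, while the randomized comparison shows essentially no difference, because the coin flip spreads the health-conscious women evenly across both groups.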
Experimentation Platform
1.2 Twin Studies
The following story comes from a November 2007 Washington Post article titled Study Debunks Theory
On Teen Sex, Delinquency, and from the twin study article itself.
A deep study by Ohio State University early in the year found that youngsters who lose their virginity
earlier than their peers are more likely to become juvenile delinquents. To reach this conclusion, the
authors took into account (technically, controlled for) many variables that could affect the dependent
variable (juvenile delinquency). This is commonly called a “quasi-experimental design.” If you believe
that no causes other than those controlled for could affect juvenile delinquency, then you should
believe the result (that losing virginity earlier makes teens more likely to become juvenile delinquents).
This assumption is called Causal Sufficiency.
In the above study, the authors controlled for a range of variables, including:
1. Gender
2. Race
3. Receipt of public assistance
4. Parental education
5. Family structure
6. Previous substance use and depression
7. Importance of religion
8. School GPA
9. Relative pubertal status
10. Virginity pledge status
With such an impressive list of causes under control, and data from a massive database called the
National Longitudinal Study of Adolescent Health, who could question the result? The paper was
accepted for publication.
But then someone did question the result. Paige Harden, a PhD student from the University of Virginia,
and her colleagues used the same database and found 534 same-sex twins. Their study now controlled
for genetic and environmental variables, and the result was reversed: earlier age at first sex predicted
lower levels of delinquency in early adulthood. Since a twin study practically trumps all other quasi-experimental designs, the new result was published in the same journal. The first paper had found a non-causal correlation!
While I shouldn’t make recommendations about early sex, this is a good example of why
controlled experiments are important. Causal Sufficiency is a strong assumption. While you may think
you're controlling for all effects of time-of-day, day-of-week, season, geography, etc., there may be
other factors you are not taking into account that could reverse the trend, as was shown in this
study.
That's why randomized experimental designs are the gold standard!
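To see how an unmeasured factor can flip the sign of a result, here is a minimal sketch. Everything in it is a hypothetical of mine, not data from either study: a single hidden binary factor raises both the chance of the exposure and the risk of the outcome, while the exposure itself slightly lowers the risk. Comparing people who share the same hidden factor is a rough stand-in for what the twin design achieves.

```python
import random

def naive_vs_within(n=200_000, seed=1):
    """Return (naive difference, {hidden level: within-level difference})."""
    rng = random.Random(seed)
    # cells[u][x] = [number with outcome, number of people]
    cells = {u: {x: [0, 0] for x in (0, 1)} for u in (0, 1)}
    for _ in range(n):
        u = int(rng.random() < 0.5)                  # hidden factor, never recorded
        x = int(rng.random() < (0.8 if u else 0.2))  # exposure is more common when u = 1
        # Assumed truth: the hidden factor raises risk, the exposure lowers it slightly.
        risk = (0.30 if u else 0.10) - (0.03 if x else 0.0)
        y = int(rng.random() < risk)
        cells[u][x][0] += y
        cells[u][x][1] += 1

    def rate(groups):
        return sum(g[0] for g in groups) / sum(g[1] for g in groups)

    # Exposed minus unexposed, pooling over the hidden factor.
    naive = rate([cells[0][1], cells[1][1]]) - rate([cells[0][0], cells[1][0]])
    # Exposed minus unexposed, separately for each level of the hidden factor.
    within = {u: rate([cells[u][1]]) - rate([cells[u][0]]) for u in (0, 1)}
    return naive, within

naive, within = naive_vs_within()
print(f"ignoring the hidden factor: {naive:+.3f}")
for u, diff in within.items():
    print(f"holding the hidden factor at {u}: {diff:+.3f}")
```

With these invented probabilities the pooled comparison shows the exposure raising risk, yet within each level of the hidden factor it lowers risk. An analyst who never recorded that factor has no way to see the reversal, which is exactly the weakness of assuming Causal Sufficiency.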
2. The Semmelweis Reflex: Rejecting Results that Contradict Strongly Held Beliefs
The following stories illustrate what happens when there is solid evidence that contradicts strongly held
beliefs.
2.1 Ignaz Semmelweis’s Childbed Fever
The story below is from the book Leadership and Self-Deception, the
Encyclopedia Britannica, Childbed Fever: A Scientific Biography of Ignaz
Semmelweis, and Wikipedia. I sent it to the authors of Hard Facts as a
better example of something they discussed in the book. One of the
authors blogged about it at
http://bobsutton.typepad.com/my_weblog/2008/05/thesemmelweis.html and correctly pointed out that controlled experiments
are not always possible (e.g., the Yahoo/Microsoft merger). On the web
and in services, they *are* possible, so let’s use this opportunity to the
fullest.
Semmelweis was a European doctor, an obstetrician, in the mid-1800s.
He worked at Vienna’s General Hospital, an important research hospital.
The mortality rate in the ward where he practiced was one in ten: one in every ten women giving birth
there died! The reputation of Vienna General was so bad that women preferred to give birth on the
street and only then go to the hospital. The book Childbed Fever estimates that 2,000 women died
each year from childbed fever in Vienna alone, and that in nineteenth-century Europe, childbed fever
killed more than a million women.
The collection of symptoms associated with these deaths was known as “childbed fever” or Puerperal
fever. More than half the women who contracted the disease died within days. Patients begged to be
moved to a second section of the maternity ward where the mortality rate was one in fifty – still horrific,
but far better than one-in-ten in Semmelweis’s section.
Semmelweis became obsessed with the problem. He tried to control for all factors, including birthing
positions, ventilation, diet, and even the way laundry was done. The one obvious difference between
the sections was that Semmelweis’s section was attended by doctors, while the other section was
attended by midwives.
After a four-month leave to visit another hospital, he discovered that the death rate had fallen
significantly in his section of the ward in his absence. This, coupled with the death of his friend Jakob
Kolletschka from an infection, led to the breakthrough. Kolletschka contracted an infection after his finger
was accidentally punctured with a knife while performing a postmortem examination, and his autopsy
showed a pathological situation similar to that of the women who were dying from childbed fever.
Semmelweis proposed a connection between cadaveric contamination and childbed fever.
Yes, cadavers. Semmelweis spent far more time doing research on cadavers than other doctors.
Vienna General was a teaching and research hospital and many doctors split their time between
research on cadavers and treatment of live patients. The doctors in his section performed autopsies
each morning on women who had died the previous day, but the midwives were not required or
allowed to perform such autopsies. The doctors had seen no problem with that practice because there was
as yet no understanding of germs.
Semmelweis concluded that ‘particles’ from cadavers and other diseased patients were being
transmitted to healthy patients on the hands of the physicians. He experimented with various cleansing
agents and instituted a policy requiring physicians to wash their hands thoroughly in a chlorine and lime
solution before examining any patient. The death rate fell to one in a hundred!
After the initial success, a new group of students was admitted and the students
neglected the washings. The mortality rate increased and Semmelweis instituted stricter controls: the
names of students were publicly displayed and assigned to each woman in labor, making it obvious who
neglected the washings. Once again, the mortality rate fell (Childbed Fever, p. 53).
What is surprising about this story isn’t the discovery through attempts to control for factors, which led
to the unthinkable conclusion (at the time) that there was something invisible that was transferred by
the doctors. What is really shocking is how long it took the community of doctors to accept the results.
According to Encyclopedia Britannica, the mortality rate in Semmelweis’s division fell from 18.27% to
1.27% in 1848. That was not enough to generate sufficient recognition and in 1849 he was dropped
from his post at the clinic and turned down for a teaching post. Semmelweis spent the next six years at
a hospital in Pest, Hungary, where he reduced the mortality rate in the
obstetrics department to 0.85%, while in Prague and Vienna the rate
was still about 10% to 15%.
According to Childbed Fever: A Scientific Biography of Ignaz
Semmelweis (p. 69), an 1856 publication in a prominent Viennese
medical periodical, Viennese Medical Weekly, by Jozsef Fleischer, a
student of Semmelweis, showed the success of chlorine washings.
However, the editor of the periodical wrote at the end of the report:
“We believe that this chlorine-washing theory has long outlived its
usefulness. The experiences and statistical results of most maternity
institutions protest against the views presented above. It is time we
are no longer to be deceived by this theory.”
Vienna continued to ignore his recommendations. In 1861, he
published a book, but the community rejected his doctrine. In 1865 he suffered a nervous breakdown
and was taken to a mental hospital, where he was beaten by asylum personnel and died. It took
another 14 years for the discovery to be accepted, after Louis Pasteur, in
1879, showed the presence of Streptococcus in the blood of women with
childbed fever. Semmelweis is now recognized as a pioneer of antiseptic
policy.
More is available at Wikipedia’s Contemporary reaction to Ignaz
Semmelweis.
A 2005 article called Simpson, Semmelweis, and Transformational Change
by Grant et al. in Obstetrics and Gynecology claims that despite 150 years of evidence, recent research
shows that hand-hygiene practices by healthcare workers remain unacceptably low. Inadequate hand
washing is one of the prime contributors to the 2 million health-care-associated infections and 90,000
related deaths annually in the United States.
The Semmelweis Reflex is the reflex-like rejection of new knowledge because it contradicts entrenched
norms, beliefs, or paradigms.
2.2 Bloodletting: The First Clinical Trial
The following story is from the NY Times, May 5, 2002; Childbed Fever; A Physician Looks at the Death of
Washington; and Wikipedia.
Since the days of ancient peoples, including the Mesopotamians, the Egyptians, the Greeks, the Mayans,
and the Aztecs, the prevailing conception of illness was that the sick were contaminated by some toxin
or contagion. The belief was that these conditions could be improved by opening a vein and
letting the sickness run out: bloodletting.
Once the toxins were gone, the patient immediately felt different, and
often better. As anyone who has given blood can tell you, losing a pint or
two can make you feel transported, transformed. Intuitively, it was
satisfying to doctors that the procedure left the patient feeling drained –
physically, emotionally and into the sink.
The practice was continued by surgeons and barber-surgeons. Though the
bloodletting was often recommended by physicians, it was carried out by
barbers. This division of labor led to the distinction between physicians and
surgeons. The red-and-white-striped pole of the barbershop, still in use today, is derived from this
practice: the red represents the blood being drawn, the white represents the tourniquet used, and the
pole itself represents the stick squeezed in the patient's hand to dilate the veins.
Figure 1: "Breathing a Vein" in 1804
Bloodletting was used to treat almost every disease.
One British medical text recommended bloodletting
for acne, asthma, cancer, cholera, coma, convulsions,
diabetes, epilepsy, gangrene, gout, herpes, indigestion,
insanity, jaundice, leprosy, ophthalmia, plague,
pneumonia, scurvy, smallpox, stroke, tetanus,
tuberculosis, and for some one hundred other diseases
(Childbed Fever, p. 6). It was judged most effective to
bleed patients while they were sitting upright or
standing erect, and blood was often removed until the
patient fainted.
Figure 2: The lancet, a medical instrument used to open veins
Physicians often reported the simultaneous use of fifty or more leeches on a given patient. Through the
1830s the French imported about forty million leeches a year for medical purposes, and in the next
decade, England imported six million leeches a year from France alone (Childbed Fever, p. 7).
On December 12, 1799, President George Washington, 67 years of age, rode his horse in heavy snowfall
to inspect his plantation at Mount Vernon. It was about 30 degrees Fahrenheit, and he complained
about a sore throat, yet rode again the day after. On December 14, he was in respiratory distress. Mr.
Albin Rawlins, the estate overseer, prepared a medicinal mixture of molasses, vinegar, and butter;
when Washington almost suffocated trying to swallow the concoction, Rawlins resorted to bloodletting and
removed 12-14 ounces of blood. Dr. James Craik was brought in, and extracted another 20 ounces of
blood, followed by yet another 20 ounces of blood. When a vinegar and hot water solution did not help,
he extracted another 40 ounces of blood. In the afternoon, another doctor arrived, Dr. Dick, and he
drew another 32 ounces of blood for a total of about 82 to 124 ounces, or 2.5 to 3.7 liters in ten hours.
The total blood in George Washington’s body was estimated at 7 liters, so about 35% to over 50% was
extracted, which inevitably led to preterminal anemia, hypovolemia, and hypotension. The fact that
General Washington stopped struggling and appeared physically calm shortly before his death may have
been due to profound hypotension and shock (A Physician Looks at the Death of Washington).
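The figures in this account are easy to check. The sketch below assumes US fluid ounces (about 29.57 ml each, my assumption; the text does not say which ounce is meant) and takes the 7-liter blood volume from the text. It converts the quoted range and also sums the individual draws listed above.

```python
ML_PER_US_FL_OZ = 29.5735   # millilitres per US fluid ounce (assumed unit)
BODY_BLOOD_LITERS = 7.0     # estimate quoted in the text

def ounces_to_liters(ounces):
    """Convert US fluid ounces to liters."""
    return ounces * ML_PER_US_FL_OZ / 1000.0

def fraction_of_blood(ounces):
    """Share of the estimated 7 liters that a given volume represents."""
    return ounces_to_liters(ounces) / BODY_BLOOD_LITERS

# Draws as listed in the account: Rawlins (12 to 14), Craik (20, 20, 40), Dick (32).
itemized_low = 12 + 20 + 20 + 40 + 32
itemized_high = 14 + 20 + 20 + 40 + 32

for label, oz in (("low estimate", 82), ("high estimate", 124)):
    print(f"{label}: {oz} oz = {ounces_to_liters(oz):.2f} L, "
          f"{fraction_of_blood(oz):.0%} of blood volume")
print(f"itemized draws sum to {itemized_low} to {itemized_high} oz")
```

Under that assumption the quoted range works out to roughly 2.4 to 3.7 liters, or about 35% to 52% of the estimated blood volume, in line with the text. The itemized draws add up to 124 to 126 ounces, which matches the top of the range; the text does not say where the 82-ounce lower bound comes from.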
It is understood now that bloodletting only hastened the death of the ill.
We know that bloodletting is unhelpful because a Parisian doctor named Pierre Louis did an experiment
in 1836 that is now recognized as one of the first clinical trials, or a randomized controlled experiment.
He treated people with pneumonia either with early, aggressive bloodletting or less aggressive
measures; at the end of the experiment, Dr. Louis counted the bodies. They were stacked higher over by
the bloodletting sink.
Despite the result of the controlled experiment, it took years for the medical community to accept that
bloodletting is useful only in very limited situations (e.g., in cases involving agitation, where it has a
sedative effect). Broussais, a well-known French physician, continued to recommend leeches, fifty at a
time. Since leeches were used repeatedly and in the treatment of various diseases, it was possible for the
leeches themselves to convey disease.
Interestingly, Biopharm Leeches, originally established in 1812, is still alive and providing 50,000 leeches
for modern surgery. Their tagline: The Biting Edge of Science.
2.3 Police Lineups, Technical Knockout of Morality vs. Science
While the previous stories showed how it took time to learn, the following story (also available at
http://jenk.livejournal.com/160588.html) shows that it may take a long time for results to sink in.