This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this
material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2006, The Johns Hopkins University and Jonathan M. Samet. All rights reserved. Use of these materials
permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or
warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently
review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for
obtaining permissions for use from third parties as needed.
Department of Epidemiology
Is Biostatistics Necessary?
A Non-Systematic Review of the Evidence
Jonathan M. Samet, MD, MS
Pre-test
Is Biostatistics Necessary?
• □ YES
• □ NO
• □ It depends
“Some of my best friends are …”
[Figure: PubMed “Hits” on Biostatistics¹ and Epidemiology, 1982² to 2004; one series each for Biostatistics and Epidemiology; log-scale y-axis, 10 to 100,000]
¹ “English language” is the only qualifier.
² 1982: Scott Zeger is appointed to the faculty of the School of Hygiene and Public Health.
Drs. Zaner and Zeger
Dr. Karl Zaner: Sex Expert
Dr. Scott Zeger: Biostatistics Expert
1. Why biostatistics is irrelevant
2. A cause is a cause
3. Ocular data analysis
4. Finding haystacks not needles
5. The seven deadly sins of biostatistics
6. When is biostatistics unavoidable?
7. Tips on the care and feeding of biostatisticians
Why Biostatistics Is Irrelevant
Chapter 1
Advice From the Data Guru
Al Sommer on Data
• “Don’t pose a question, give the
data to your statisticians, and ask
them ‘What’s the p value?’”
Sommer advises. “If I had done
that I would have missed the
entire vitamin A mortality story.”
Source: Lancet, Feb 19, 2005
Sommer on Data
• He still loves to steep himself in the
data. “I say ‘data talk to me, tell me
what you have to say’”. Often, though,
the answers come at odd times,
Sommer says. “You don’t get the
insights you need—either the answer
or how you are going to approach a
question—while you are actively
thinking about it.”
Source: Lancet, Feb 19, 2005
Sommer on Data
“You have to know your data, you have
to smell it, you have to be in it”, he says.
“If you’re not living inside the data you
are going to miss the most interesting
things, because the most interesting
things are not going to be the questions
you originally proposed, the interesting
things are going to be questions you
hadn’t thought about.”
Source: Lancet, Feb 19, 2005
“The real purpose of
the scientific method is
to make sure Nature
hasn’t misled you into
thinking you know
something you don’t
actually know.”
(Robert M. Pirsig, 1974)
Misled By the Model
(Barr et al., 2004)
Adapted by CTLT
A Cause Is A Cause
Chapter 2
A Cause is a Cause
• Causal criteria
– Consistency
– Strength
– Temporality
– Coherence
The 1964 Surgeon General’s Report
• “Statistical methods cannot
establish proof of a causal
relationship in an association.
The causal significance of an
association is a matter of
judgment which goes beyond any
statement of statistical
probability”.
Ocular Data Analysis
Chapter 3
Raymond Pearl, 1938:
Smoking Shortens Lifespan
Raymond Pearl, 1879-1940
Source: Adapted by CTLT from
Pearl, Science 1938
1952 London Fog
Adapted by CTLT
This graph appeared in several documents published shortly after the
episode. It shows the high levels of pollution and the parallel pattern in
daily mortality.
Xerophthalmia
and Child Mortality
(Sommer et al., 1983)
Adapted by CTLT
Therapy for
Wegener’s
Granulomatosis
Adapted by CTLT
(WGET Research Group, 2005)
John Wilder Tukey
16 June 1915 - 26 July 2000
John W. Tukey on His Book,
Exploratory Data Analysis
• This book is based on an important principle:
• It is important to understand what you CAN DO
before you learn to measure how WELL you seem
to have DONE it.
• Learning first what you can do will help you to
work more easily and effectively.
• This book is about exploratory data analysis, about
looking at data to see what it seems to say. It
concentrates on simple arithmetic and easy-to-draw pictures. It regards whatever appearances
we have recognized as partial descriptions, and
tries to look beneath them for new insights. Its
concern is with appearance, not with confirmation.
(Tukey, 1977)
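Tukey's “simple arithmetic” can be illustrated with his five-number summary (minimum, lower hinge, median, upper hinge, maximum), the numbers behind a box plot. The sketch below is a minimal Python version; the hinges follow Tukey's convention of taking medians of the lower and upper halves, which can differ slightly from the quartiles reported by statistical packages. The input values are arbitrary illustrative numbers, not data from any study in this talk.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    m = len(sorted_vals)
    mid = m // 2
    if m % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(data):
    """Tukey's five-number summary: (min, lower hinge, median, upper hinge, max).

    Hinges are the medians of the lower and upper halves of the sorted data;
    when n is odd, the median itself is included in both halves.
    """
    xs = sorted(data)
    if not xs:
        raise ValueError("need at least one value")
    n = len(xs)
    half = (n + 1) // 2
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


print(five_number_summary([3, 7, 1, 9, 4, 6, 2]))  # (1, 2.5, 4, 6.5, 9)
```

Five numbers, no model and no p value: enough to see where the data sit and how far they spread before any formal analysis.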
Discussion of “Role of Statistics in
National Health Policy Decisions”
• The time spent by the medical members of
the Surgeon-General’s committee on
“analyzing data and interpreting it”
encourages me. The analysis and
interpretation of data can neither be a
domain left to statisticians nor one over
which statistician’s rule as tyrants. There
will always be too few statisticians; they
must spread the insight, the techniques,
and the attitudes as widely as possible.
(Tukey, 1976)
Finding Haystacks Not Needles
Chapter 4
Small Sample Gems
• They exist! For example:
– DES and vaginal adenocarcinoma
– Uranium mining and lung cancer
– Vinyl chloride and angiosarcoma
of the liver
Adenocarcinoma of the Vagina: Association of
Maternal Stilbestrol Therapy with Tumor
Appearance in Young Women
• Adenocarcinoma of the vagina in young women has been recorded rarely before the report of several cases treated at the Vincent Memorial Hospital between 1966 and 1969. The unusual occurrence of this tumor in eight patients born in New England hospitals between 1946 and 1951 led us to conduct a retrospective investigation in search of factors that might be associated with tumor appearance…. Most significantly, seven of the eight mothers of patients with carcinoma had been treated with diethylstilbestrol started during the first trimester. None in the control group were so treated (p less than 0.00001). Maternal ingestion of stilbestrol during early pregnancy appears to have enhanced the risk of vaginal adenocarcinoma developing years later in the offspring exposed.
Source: Herbst, Ulfelder H, Poskanzer DC.
Uranium Mining and Navajo Men
“The association
between uranium
mining and lung
cancer was
statistically
significant
(p = 1.1 × 10⁻¹¹).”
Source: Samet et al. NEJM 1984
Finding Haystacks not Needles
• For large effects, who needs a p
value?
• Principles
– Small numbers, large effect
– Worry
– Bias > Chance > Cause
– Publish? or Perish?
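The “small numbers, large effect” principle can be made concrete with the DES counts quoted in the Herbst abstract: 7 of 8 case mothers exposed, none of the control mothers. The slide does not say how many controls there were, so the control counts below are hypothetical. The test is a plain one-sided Fisher exact test that ignores any matching in the design; it is a sketch of the principle, not a reconstruction of the published p value.

```python
from math import comb


def fisher_one_sided(exposed_cases, n_cases, exposed_controls, n_controls):
    """One-sided Fisher exact p value for a 2x2 table.

    Under the null, with all margins fixed, the number of exposed cases is
    hypergeometric; sum the probabilities of tables at least as extreme
    (i.e. at least `exposed_cases` exposed among the cases).
    """
    total_exposed = exposed_cases + exposed_controls
    n_total = n_cases + n_controls
    denom = comb(n_total, total_exposed)
    p = 0.0
    for k in range(exposed_cases, min(n_cases, total_exposed) + 1):
        p += comb(n_cases, k) * comb(n_controls, total_exposed - k) / denom
    return p


# 7 of 8 case mothers exposed, 0 control mothers exposed (from the abstract).
# The number of controls is NOT given on the slide; these values are hypothetical.
for n_controls in (8, 16, 32):
    p = fisher_one_sided(7, 8, 0, n_controls)
    print(f"{n_controls:>2} controls: one-sided p = {p:.2g}")
```

Even with only 8 hypothetical controls the p value is 8/11,440, about 0.0007: when the effect is this large, a handful of cases settles the question of chance, and the remaining worry is bias.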
The Seven Deadly Biostatisticians
The Seven Deadly Sins of Biostatistics
• P-valuing
• Modeling not thinking
• Model as message
• Kitchen sink modeling
• Seduction by sophistication
• Picking the prior
• Intimidating the naive
P-Valuing: A Recent Example
• A Manuscript Reviewed
• Study of race and treatment (N=240)
• Key finding: OR for association of black
vs white for being offered treatment =
0.49 (p=0.09)
• Author interpretation: No association
• Samet interpretation: Key finding
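One way to see why “p = 0.09” is not the same as “no association” is to back-calculate an approximate confidence interval from the reported OR and p value. This sketch assumes the p value came from a Wald test on the log odds ratio scale; the manuscript may have used a different test, so the interval is only approximate.

```python
from math import exp, log
from statistics import NormalDist


def or_ci_from_p(odds_ratio, p_two_sided, level=0.95):
    """Approximate CI for an odds ratio from its estimate and two-sided p value.

    Assumes a Wald test on log(OR): |z| = |log(OR)| / SE.
    """
    z_obs = NormalDist().inv_cdf(1 - p_two_sided / 2)   # |z| implied by the p value
    se = abs(log(odds_ratio)) / z_obs                    # implied SE of log(OR)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    lo = exp(log(odds_ratio) - z_crit * se)
    hi = exp(log(odds_ratio) + z_crit * se)
    return lo, hi


lo, hi = or_ci_from_p(0.49, 0.09)
print(f"OR 0.49, approx. 95% CI {lo:.2f} to {hi:.2f}")
```

With these inputs the interval runs from roughly 0.21 to 1.12. It barely includes 1, while the best single estimate is a halving of the odds of being offered treatment, which is why the finding deserves attention rather than dismissal.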
Relative Risk of breast cancer according to quintile of adolescent
caloric and fat intake in women in the NHS II
ᵃ Multivariate model was adjusted for age, time period (two-year interval), height (<62, 62–<65, 65–<68, 68+ in.), parity and age at first birth (nulliparous, parity ≤2 and age at first birth <25 years, parity ≤2 and age at first birth 25–<30 years, parity ≤2 and age at first birth 30+ years, parity 3+ and age at first birth <25 years, parity 3+ and age at first birth 25+ years), body mass index at age 18 (<18.5, 18.5–22.4, 22.5–29.9, 30.0+ kg/m²), age at menarche (<12, 12, 13, ≥14 years), family history of breast cancer (yes, no), history of BBD (yes, no), menopausal status (premenopausal, postmenopausal, dubious, unsure), alcohol intake (non-drinkers, <5, 5–<10, 10–<20, 20+ g/d), oral-contraceptive use (never, past ≥4 years, past <4 years, current <8 years, current ≥8 years), weight gain since age 18 (weight loss greater than 5 kg, weight gain or loss 5 kg, weight gain 5–10 kg, weight gain 10–20 kg, weight gain >20 kg).
(Frazier et al., 2004)
Kitchen Sink Modeling
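A quick tally shows how fast a kitchen-sink adjustment list consumes parameters. This sketch assumes each categorical covariate in the footnote was entered as indicator variables with one reference category (the paper may have coded some differently), and it leaves out age and time period because their coding is not stated. The exposure quintiles would add four more terms on top.

```python
# Number of categories for each adjustment covariate, read off the
# footnote above. Age and time period are omitted (coding not stated).
COVARIATE_LEVELS = {
    "height": 4,
    "parity and age at first birth": 6,
    "BMI at age 18": 4,
    "age at menarche": 4,
    "family history of breast cancer": 2,
    "history of BBD": 2,
    "menopausal status": 4,
    "alcohol intake": 5,
    "oral-contraceptive use": 5,
    "weight gain since age 18": 5,
}


def indicator_terms(levels):
    """Indicator (dummy) terms for a categorical covariate with one reference level."""
    return levels - 1


total = sum(indicator_terms(k) for k in COVARIATE_LEVELS.values())
print(f"{len(COVARIATE_LEVELS)} categorical covariates -> {total} indicator terms")
```

Under these assumptions the listed covariates alone contribute 31 indicator terms.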
Intimidating by Sophistication
• The model was fitted with the Efron
method for ties and a robust variance
estimator to account for patient-episode
level clustering, using Stata 7.0
software (College Station, TX, USA).
The proportional-hazards assumption
was assessed with log-log survival plots
and, formally, with scaled Schoenfeld
residuals (Stata).
(Cepeda et al., 2005)
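Behind the jargon, the “Efron method for ties” is a specific approximation to the Cox partial likelihood when several events share the same recorded time. The sketch below writes out a textbook form of Efron's approximation for a single covariate in plain Python. It is illustrative only: it makes no claim about how Stata implements the method, and it does not touch the robust (clustered) variance or the Schoenfeld-residual checks.

```python
from math import exp, log


def efron_log_partial_likelihood(beta, times, events, x):
    """Cox log partial likelihood for one covariate, Efron approximation for ties.

    times  : follow-up time for each subject
    events : 1 if the subject had the event at that time, 0 if censored
    x      : covariate value for each subject
    """
    risk = [exp(beta * xi) for xi in x]
    ll = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        # Risk set: everyone still under observation at time t.
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        # Subjects whose events are tied at time t.
        tied = [i for i in at_risk if times[i] == t and events[i]]
        d = len(tied)
        s_risk = sum(risk[i] for i in at_risk)
        s_tied = sum(risk[i] for i in tied)
        ll += sum(beta * x[i] for i in tied)
        # Efron: for each successive tied event, remove a growing
        # fraction (l/d) of the tied subjects' weight from the denominator.
        for l in range(d):
            ll -= log(s_risk - (l / d) * s_tied)
    return ll


print(efron_log_partial_likelihood(0.5, [1, 1, 2, 3], [1, 1, 1, 0], [1, 0, 1, 0]))
```

With no tied event times the Efron and Breslow forms coincide; with ties, Breslow's simpler approximation would instead subtract d × log of the full risk-set sum at each tied time.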
Model As Message:
Analysis to Meet the Policy Need
(USEPA, 1974)
Finding Needles Not Haystacks
Chapter 6
Daily time series of air pollution mortality
and weather in Baltimore 1987-1994
Adapted by CTLT
Air pollution signal: an order of
magnitude smaller than confounders
Estimates of model predictors in the GAM model, Pittsburgh (1987-1994)
Adapted by CTLT
National Morbidity, Mortality, and Air Pollution Study
Adapted by CTLT
Adapted by CTLT from: Jonathan M. Samet, M.D., Francesca Dominici, Ph.D., Frank C.
Curriero, Ph.D., Ivan Coursac, M.S., and Scott L. Zeger, Ph.D. New England Journal of
Medicine
Gibson’s Law
• For every Ph.D. there’s an equal and
opposite Ph.D.
Or for every biostatistician, there’s an
equal and opposite biostatistician.
The Care And Feeding Of
Biostatisticians
Chapter 7
Post-test
Is Biostatistics Necessary?
• □ YES
• □ NO
• □ It depends
[Figure: PubMed “Hits” on Biostatistics¹, 1982² to 2004; linear y-axis, 0 to 4,500]
¹ “English language” is the only qualifier.
² 1982: Scott Zeger is appointed to the faculty of the School of Hygiene and Public Health.