This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site.

Copyright 2006, The Johns Hopkins University and Jonathan M. Samet. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

Department of Epidemiology

Is Biostatistics Necessary? A Non-Systematic Review of the Evidence
Jonathan M. Samet, MD, MS

Pre-test
Is Biostatistics Necessary?
• □ YES
• □ NO
• □ It depends

“Some of my best friends are …”

PubMed “Hits” on Biostatistics(1) and Epidemiology, 1982(2) - 2004
[Figure: annual PubMed hits for “Biostatistics” and “Epidemiology”, 1982-2004, on a logarithmic vertical axis from 1 to 100,000]
(1) “English language” – only qualifier
(2) 1982 – Scott Zeger is appointed to the faculty of the School of Hygiene and Public Health

Drs. Zaner and Zeger
• Dr. Karl Zaner: Sex Expert
• Dr. Scott Zeger: Biostatistics Expert

1. Why biostatistics is irrelevant
2. A cause is a cause
3. Ocular data analysis
4. Finding haystacks not needles
5. The seven deadly sins of biostatistics
6. When is biostatistics unavoidable?
7. Tips on the care and feeding of biostatisticians

Why Biostatistics Is Irrelevant
Chapter 1

Advice From the Data Guru

Al Sommer on Data
• “Don’t pose a question, give the data to your statisticians, and ask them ‘What’s the p value?’”, Sommer advises. “If I had done that I would have missed the entire vitamin A mortality story.”
Source: Lancet, Feb 19, 2005

Sommer on Data
• He still loves to steep himself in the data. “I say ‘data talk to me, tell me what you have to say’”. Often, though, the answers come at odd times, Sommer says. “You don’t get the insights you need—either the answer or how you are going to approach a question—while you are actively thinking about it.”
Source: Lancet, Feb 19, 2005

Sommer on Data
• “You have to know your data, you have to smell it, you have to be in it”, he says. “If you’re not living inside the data you are going to miss the most interesting things, because the most interesting things are not going to be the questions you originally proposed, the interesting things are going to be questions you hadn’t thought about.”
Source: Lancet, Feb 19, 2005

“The real purpose of the scientific method is to make sure Nature hasn’t misled you into thinking you know something you don’t actually know.” (Robert M. Pirsig, 1974)

Misled by the Model
(Barr et al., 2004) Adapted by CTLT

A Cause Is A Cause
Chapter 2

A Cause is a Cause
• Causal criteria
– Consistency
– Strength
– Temporality
– Coherence

The 1964 Surgeon General’s Report
• “Statistical methods cannot establish proof of a causal relationship in an association. The causal significance of an association is a matter of judgment which goes beyond any statement of statistical probability”.

Ocular Data Analysis
Chapter 3

Raymond Pearl, 1938: Smoking Shortens Lifespan
Raymond Pearl, 1879-1940
Source: Adapted by CTLT from Pearl, Science 1938

1952 London Fog
Adapted by CTLT
This graph appeared in several documents published shortly after the episode, showing the high levels of pollution and the similar pattern in daily mortality.

Xerophthalmia and Child Mortality
(Sommer et al., 1983) Adapted by CTLT

Therapy for Wegener’s Granulomatosis
(WGET Research Group, 2005) Adapted by CTLT

John Wilder Tukey
16 June 1915 - 26 July 2000

John W. Tukey on His Book, Exploratory Data Analysis
• This book is based on an important principle:
• It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it.
• Learning first what you can do will help you to work more easily and effectively.
• This book is about exploratory data analysis, about looking at data to see what it seems to say. It concentrates on simple arithmetic and easy-to-draw pictures. It regards whatever appearances we have recognized as partial descriptions, and tries to look beneath them for new insights. Its concern is with appearance, not with confirmation.
(Tukey, 1977)

Discussion of “Role of Statistics in National Health Policy Decisions”
• The time spent by the medical members of the Surgeon-General’s committee on “analyzing data and interpreting it” encourages me. The analysis and interpretation of data can neither be a domain left to statisticians nor one over which statisticians rule as tyrants. There will always be too few statisticians; they must spread the insight, the techniques, and the attitudes as widely as possible.
(Tukey, 1976)

Finding Haystacks Not Needles
Chapter 4

Small Sample Gems
• They exist! For example:
– DES and vaginal adenocarcinoma
– Uranium mining and lung cancer
– Vinyl chloride and angiosarcoma of the liver

Adenocarcinoma of the Vagina: Association of Maternal Stilbestrol Therapy with Tumor Appearance in Young Women
• Adenocarcinoma of the vagina in young women has been recorded rarely before the report of several cases treated at the Vincent Memorial Hospital between 1966 and 1969. The unusual occurrence of this tumor in eight patients born in New England hospitals between 1946 and 1951 led us to conduct a retrospective investigation in search of factors that might be associated with tumor appearance…. Most significantly, seven of the eight mothers of patients with carcinoma had been treated with diethylstilbestrol started during the first trimester. None in the control group were so treated (p less than 0.00001). Maternal ingestion of stilbestrol during early pregnancy appears to have enhanced the risk of vaginal adenocarcinoma developing years later in the offspring exposed.
Source: Herbst, Ulfelder H, Poskanzer DC.

Uranium Mining and Navajo Men
• “The association between uranium mining and lung cancer was statistically significant (p = 1.1 x 10^-11).”
Source: Samet et al. NEJM 1984

Finding Haystacks not Needles
• For large effects, who needs a p value?
• Principles
– Small numbers, large effect
– Worry
– Bias > Chance > Cause
– Publish? or Perish?

The Seven Deadly Biostatisticians

The Seven Deadly Sins of Biostatistics
• P valuing
• Modeling not thinking
• Model as message
• Kitchen sink modeling
• Seduction by sophistication
• Picking the prior
• Intimidating the naive

P-Valuing: A Recent Example
• A manuscript reviewed
• Study of race and treatment (N=240)
• Key finding: OR for association of black vs white for being offered treatment = 0.49 (p=0.09)
• Author interpretation: No association
• Samet interpretation: Key finding

Relative risk of breast cancer according to quintile of adolescent caloric and fat intake in women in the NHS II
(a) Multivariate model was adjusted for age, time period (two year interval), height (<62, 62–<65, 65–<68, 68+ in.), parity and age at first birth (nulliparous, parity ≤2 and age at first birth <25 years, parity ≤2 and age at first birth 25–<30 years, parity ≤2 and age at first birth 30+ years, parity 3+ and age at first birth <25 years, parity 3+ and age at first birth 25+ years), body mass index at age 18 (<18.5, 18.5–22.4, 22.5–29.9, 30.0+ kg/m2), age at menarche (<12, 12, 13, ≥14 years), family
history of breast cancer (yes, no), history of BBD (yes, no), menopausal status (premenopausal, postmenopausal, dubious, unsure), alcohol intake (non-drinkers, <5, 5–<10, 10–<20, 20+ g/d), oral contraceptive use (never, past ≥4 years, past <4 years, current <8 years, current ≥8 years), weight gain since age 18 (weight loss greater than 5 kg, weight gain or loss 5 kg, weight gain 5–10 kg, weight gain 10–20 kg, weight gain >20 kg).
(Frazier et al, 2004)
Kitchen Sink Modeling

Intimidating by Sophistication
• The model was fitted with the Efron method for ties and a robust variance estimator to account for patient-episode level clustering, using Stata 7.0 software (College Station, TX, USA). The proportional-hazards assumption was assessed with log-log survival plots and, formally, with scaled Schoenfeld residuals (Stata).
(Cepeda et al., 2005)

Model As Message: Analysis to Meet the Policy Need
(USEPA, 1974)

Finding Needles Not Haystacks
Chapter 6

Daily time series of air pollution, mortality, and weather in Baltimore, 1987-1994
Adapted by CTLT

Air pollution signal order of magnitude smaller than confounders
Estimates of model predictors in the GAM model, Pittsburgh (1987-1994)
Adapted by CTLT

National Morbidity, Mortality, and Air Pollution Study
Adapted by CTLT from: Jonathan M. Samet, M.D., Francesca Dominici, Ph.D., Frank C. Curriero, Ph.D., Ivan Coursac, M.S., and Scott L. Zeger, Ph.D. New England Journal of Medicine

Gibson’s Law
• For every Ph.D. there’s an equal and opposite Ph.D.
• Or for every biostatistician, there’s an equal and opposite biostatistician.

The Care And Feeding Of Biostatisticians
Chapter 7

Post-test
Is Biostatistics Necessary?
• □ YES
• □ NO
• □ It depends

PubMed “Hits” on Biostatistics(1), 1982(2) - 2004
[Figure: annual PubMed hits for “Biostatistics”, 1982-2004, on a linear vertical axis from 0 to 4,500]
(1) “English language” – only qualifier
(2) 1982 – Scott Zeger is appointed to the faculty of the School of Hygiene and Public Health
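
A worked note on the P-valuing example (OR = 0.49, p = 0.09, N = 240): the slide gives only the odds ratio and the p value, but a rough confidence interval can be backed out from those two numbers. The sketch below assumes, and the slides do not say this, that the p value is two-sided and comes from a Wald test on the log odds ratio. The function name `approx_ci_from_p` is made up for illustration.

```python
from math import exp, log
from statistics import NormalDist


def approx_ci_from_p(odds_ratio, p_value, level=0.95):
    """Back out an approximate Wald confidence interval for an odds ratio.

    Assumption (not stated in the slides): the reported p value is
    two-sided and comes from a Wald test on log(OR).
    """
    std_normal = NormalDist()
    # z statistic implied by the two-sided p value
    z_observed = std_normal.inv_cdf(1 - p_value / 2)
    # standard error of log(OR) implied by that z statistic
    se_log_or = abs(log(odds_ratio)) / z_observed
    # critical value for the requested confidence level
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(odds_ratio) - z_crit * se_log_or)
    upper = exp(log(odds_ratio) + z_crit * se_log_or)
    return lower, upper


# Numbers taken from the P-valuing slide
lower, upper = approx_ci_from_p(0.49, 0.09)
print(f"OR = 0.49, approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

Under these assumptions the interval runs from roughly 0.21 to 1.12. It includes 1, but it is also compatible with the odds of being offered treatment being cut by well over half, which is hard to summarize as “no association”.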