Statistical Rules of Thumb Gerald van Belle Departments of Biostatistics and Environmental Health University of Washington, Seattle, WA. CHS October 28, 2003 1 Outline A. Statistical rule of thumb B. Experimental or observational studies C. Covariation D. Sample size E. Presentation of results F. Recapitulation Slides available on WEB at: www.vanbelle.org CHS October 28, 2003 2 A. Statistical rule of thumb-1 A statistical rule of thumb is defined as a widely applicable guide to statistical practice—with sound theoretical basis. Characteristics include intuitive appeal, elegance, and transparency. CHS October 28, 2003 3 A. Statistical rule of thumb-2 1. Statistical rule as quick response a. Typically committee meetings b. Consultation sessions c. Need to know the basics d. May be preaching to the choir CHS October 28, 2003 4 A. Statistical rule of thumb-3 2. What are the basics? a. Definition of statistical rule of thumb b. Characteristics of rule of thumb c. Substantive areas are idiosyncratic. In my case: environmental studies, epidemiology, statistical consulting CHS October 28, 2003 5 B. Randomized and Observational Studies-1 Rule 1.1 Distinguish between observational and randomized studies. a. Hohum! b. What’s so nice about randomized studies? c. What’s nice about observational studies? d. Gradation in observational studies. e. Some references. CHS October 28, 2003 6 B. Randomized and observational studies-2 a. Hohum!? * Epidemiology vs biostatistics * Selection issues * Missing data issues * Fragility of causal models Arm Waving CHS October 28, 2003 7 B. Randomized and observational studies-3 a. Hohum! b. What’s so nice about randomized studies? * Provides a probability model * Rule 6.1 “Randomization puts systematic sources of variability into the error term.” (DeLury) * Lack of randomization leads to arm waving CHS October 28, 2003 8 B. Randomized and observational studies-4 a. Hohum! b. What’s so nice about randomized studies? c. What’s nice about observational studies? * Easier to carry out * Majority of data; e.g. administrative data bases * Ethical constraints on randomization * Lots of data CHS October 28, 2003 9 B. Randomized and observational studies-5 a. b. c. d. Hohum! What’s so nice about randomized studies? What’s nice about observational studies? Gradation in observational studies. Registry/ Cohort Case Report CHS October 28, 2003 10 B. Randomized and observational studies-6 e. Some references. Benson and Hartz (2000) NEJM Concato, Shah and Horwitz (2000) NEJM Copas and Li (1997) JRSS B Copas and Shi (2000) BMJ Hays (2001) Am. Scientist (See May 2002 ROM on website) CHS October 28, 2003 11 C. Statistical rules of thumb for covariation-1 Rule 3.1: “Before choosing a measure of covariation determine the source of the data (sampling scheme), the nature of the variables, and the symmetry status of the measure.” CHS October 28, 2003 12 C. Statistical rules of thumb for covariation-2 CHS October 28, 2003 13 C. Statistical rules of thumb for covariation-3 Rule 3.2: “Do not summarize regression sampling schemes with correlations.” Assume simple linear regression, Y on X 2 regression 2 regression r 1− r 2 random 2 random r = 1− r CHS October 28, 2003 × 2 x,regression 2 x,random s s 14 C. Statistical rules of thumb for covariation-5 True r2 Ratio s2(x)/s2(true) 0.25 0 0 0.10 0.03 0.20 0.06 0.30 0.10 1 4 0 0 0.10 0.31 0.20 0.50 0.30 0.63 CHS October 28, 2003 15 C. Statistical rules of thumb for covariation-6 Rule 3: “Assess agreement by addressing accuracy, scale differential, and precision. Accuracy can be thought of as the lack of bias.” CHS October 28, 2003 16 C. Statistical rules of thumb for covariation-7 Model: Y1 and Y2 measured on the same “objects” E (Y1 − Y2 ) = ( µ1 − µ 2 ) + (σ 1 − σ 2 ) + 2(1 − ρ )σ 1σ 2 2 Total Deviance 2 2 + = Bias + Scale differential Imprecision Lin (1989) CHS October 28, 2003 17 C. Statistical rules of thumb for covariation-8 Model: Y1 and Y2 measured on the same “objects” E (Y1 − Y2 ) 2 ( µ1 − µ 2 ) 2 (σ 1 − σ 2 ) 2 = + + (1 − ρ ) 2σ 1σ 2 2σ 1σ 2 2σ 1σ 2 Deviance = Bias + Scale + Imprecision differential CHS October 28, 2003 18 D. Sample size responsibilities-1 n=2 ( z1−α / 2 + z1− β ) µ1 − µ 2 σ 2 Statistician 2 Investigator Investigator CHS October 28, 2003 19 D. Sample size responsibilities-2 n= Type I error = 0.05 Type II error = 0.20 (Power = 0.80) 16 µ1 − µ2 σ 2 Topic for discussion: Treatment effect+ Variability=Effect Size Two sample (default) CHS October 28, 2003 20 D. Sample size responsibilities-3 Effect Size µ1 − µ2 = Effect Size σ 4 Effect Size = n CHS October 28, 2003 21 E. Presentation of results-1 Rule 7.1: When text, when tables, when graphs? “Use sentence structure for displaying 2 to 5 numbers, tables for displaying more numerical information, and graphs for complex relationships.” CHS October 28, 2003 22 E. Presentation of results-2 Rule 7.1: When text, when tables, when graphs? a. Illustration-version 1 “The blood type of the population of the United States is approximately 40%, 11%, 4% and 45% A, B, AB, and O, respectively.” CHS October 28, 2003 23 E. Presentation of results-3 Rule 7.1: When text, when tables, when graphs? a. Illustration-version 2 “The blood type of the population of the United States is approximately 40% A, 11% B, 4% AB, and 45% O.” CHS October 28, 2003 24 E. Presentation of results-4 Rule 7.1: When text, when tables, when graphs? a. Illustration-version 3 “The blood type of the population of the United States is approximately, O 45% A 40% B 11% AB 4% CHS October 28, 2003 25 E. Presentation of results-5 Rule 7.2: Table Structure “Arrange rows and columns in meaningful way, Limit the number of significant digits, Make the table as self-contained as possible, Use white space and lines to organize rows and columns, Do not stint on table headings” CHS October 28, 2003 26 Rule 7.2: Table structure Original table on right. 1. Different degrees of precision, due to different sources of data. 2. Ordering by alphabet; Spanish version would look different. CHS October 28, 2003 27 Rule 7.2: Table structure 1. Reduced number of digits, 2. Rows arranged by frequency, 3. White space suggests similar groups CHS October 28, 2003 28 E. Presentation of results-6 Rule 7.2: Table Structure Significant digits for frequencies CHS October 28, 2003 29 CHS October 28, 2003 30 Table 7.5 Reformatted Number of activities Women Age 70-74 Age 75-79 Age 75-79 Age 85+ % 1 7 27 65 % 1 10 28 61 % 2 12 32 54 % 3 19 38 39 0 1-2 3-4 5-7 Mean 4.8 4.5 CHS October 28, 2003 5 23 36 36 3 16 37 44 2 13 30 55 2 10 26 61 0 1-2 3-4 5-7 4.0 4.5 4.8 5.0 Mean Men 4.2 4.0 31 E. Presentation of results-7 Rule 7.3: Graph data “When possible graph the data.” CHS October 28, 2003 32 CHS October 28, 2003 33 The following three tables and figures are from a great paper: Gelman, A., Pasarica, C. and Dohdia, R. (2002). Let’s practice what we preach: Turning tables into graphs. The American Statistician, 56: 121-130. CHS October 28, 2003 34 E. Presentation of results-8 Rule 7.4: Pie Charts “Never use a pie chart. Present a simple list of percentages, or whatever constitutes the divisions of the pie chart.” CHS October 28, 2003 35 CHS October 28, 2003 36 E. Presentation of results-9 Rule 7.5: Bar Charts “Always think of alternatives to a bar graph.” CHS October 28, 2003 37 CHS October 28, 2003 38 CHS October 28, 2003 39 E. Presentation of results-10 Rule 7.6: Stacked bar charts “There are much more effective ways of showing data structure than stacked bar charts.” CHS October 28, 2003 40 CHS October 28, 2003 41 E. Presentation of results-11 Rule 7.7: Three-dimensional bar graphs “Never use three-dimensional bar graphs.” CHS October 28, 2003 42 CHS October 28, 2003 43 E. Presentation of results-12 Rule 7.8: Longitudinal data “In the case of longitudinal data identify both cross-sectional and longitudinal patterns.” CHS October 28, 2003 44 CHS October 28, 2003 45 E. Presentation of results-13 Rule 7.9: High dimensional data “Three key aspects of presenting high dimensional data are: rendering, manipulation, and linking. Rendering determines what is to be plotted, manipulation determines the structure of the relationships, and linking determines what information will be shared between plots or sections of the graph.” CHS October 28, 2003 46 CHS October 28, 2003 47 E. Presentation of results-14 1. Chew words carefully 2. Watch table manners 3. Avoid pies 4. Stay away from bars CHS October 28, 2003 48 F. Summary and recapitulation 1. Concept of statistical rule of thumb 2. Observation studies are fragile 3. Covariation needs to be described correctly 4. Logic to presentation of results CHS October 28, 2003 49 G. Resources 1. Gerald van Belle, (2002). Statistical Rules of Thumb, John Wiley and Sons, New York, NY. 2. WEB sites: evolving rapidly 4. Statistical books and journals: Audience recommendations 4. My choices at http://www.vanbelle.org 5. General purpose books and journals on science 6. Colleagues. Find a consultants’ consultant CHS October 28, 2003 50 H. Acknowledgments 1. DeLury (University of Toronto) 2. Steve Millard, Jim Hughes, Michael Levin 3. Paul Crane 4. Biostatistics, statistics, and epidemiology colleagues 5. Consultees who sharpened my skills CHS October 28, 2003 51