WHATS NEW: Book Revision History

This is a list of all updates made to individual chapters. Use it as a quick reference to verify that you have downloaded the latest revision of any particular chapter.

26 Nov 2012  Chapter 5-14  Added bootstrap validation of linear regression to end of chapter.
22 Oct 2012  Chapter 4-4  Added this chapter on power analysis for basic science applications (small sample sizes), specifically, Fisher’s exact test.
2 Sep 2012  Chapter 2-16  Added a section on how the ICC is affected by having a very narrow range of scores, a problem analogous to what happens to the Kappa statistic when data clump up in a corner of a 2 x 2 table.
2 Sep 2012  Chapter 1-6  Revised the section on submitting graphs to journals, now recommending EPS graphs rather than TIFF graphs, and explaining how to make the fonts of EPS graphs look crisp when importing into Microsoft PowerPoint or Word.
16 Mar 2012  Chapters 3-98, 3-99  Added some homework problems for Chapter 3-5.
20 Feb 2012  Chapter 1-4  Fixed a couple of confusing typos.
20 Feb 2012  Title Page  Added year 2012 to copyright line and modified it slightly.
20 Feb 2012  Chapter 1-2  Updated for Stata version 12. Now shows the import command for reading directly from an Excel file.
20 Feb 2012  Chapter 4-3  Added this chapter on sample size paragraphs for grants.
19 Feb 2012  Chapter 4-2  Added this chapter on computing sample sizes and power for noninferiority studies.
15 Feb 2012  Chapter 5-99  Updated a problem for Chapter 5-9 to match Stata version 12.
15 Feb 2012  Chapter 5-98  Updated a problem for Chapter 5-9 to match Stata version 12.
15 Feb 2012  Chapter 5-9  Added Stata version 12 command for imputation using chained equations.
14 Feb 2012  Chapter 3-5  Corrected a calculation and a wrong example; these are basically minor typographical corrections.
13 Feb 2012  Chapter 5-3  In formula for population standard deviation, replaced x-bar with mu, so the notation is correct.
13 Feb 2012  Chapter 2-6  In formula for population standard deviation, replaced x-bar with mu, so the notation is correct. Also, correctly ordered the categories for the orthopaedic scale in the “a refinement of this idea” section so that it is an ordinal scale.
13 Jan 2012  Chapter 5-2  It now reads better. Also, two nice graphs were added for illustrating the slope formula and for illustrating how control for confounding works. The do-file, Ch 5-2.do, was updated to reflect the changes to the chapter. The data file Ch5-2.dta is no longer used and can be deleted, as it is now created in the do-file editor.

… continued after the chapter list below …

_____________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual. Salt Lake City, UT: University of Utah School of Medicine. Revision History Page. (Accessed September 2, 2012, at http://www.ccts.utah.edu/biostats/?pageId=5385).

Revision History (revision 26 Nov 2012)

Last Revision Date of Each Chapter

Front Matter

20 Feb 2012  Title & copyright page
16 May 2010  Preface & suggestions for use & author contact information
15 Mar 2011  WHATS NEW: Book revision history
16 May 2010  Table of contents

Section 1. Stata: Data Management, Graphics, and Programming

8 Jan 2012  Chapter 1-1  Installing Stata and recovering Stata windows
20 Feb 2012  Chapter 1-2  Getting data into Stata and some other basics
15 Feb 2011  Chapter 1-3  Cleaning data
20 Feb 2012  Chapter 1-4  Merging files
23 Feb 2011  Chapter 1-5  Labeling variables and values
2 Sep 2012  Chapter 1-6  Basic graphics
4 Mar 2011  Chapter 1-7  Looping, collapsing, and reshaping
16 May 2010  Chapter 1-8  Operators, ifs, dates, and times
27 Jun 2011  Chapter 1-9  More graphics: popular scientific graphs
16 May 2010  Chapter 1-10  Programming Stata
16 May 2010  Chapter 1-11  Compilation of frequently used variable generation and modifying commands (a chapter for quick look up)
14 Oct 2011  Chapter 1-12  Stata results into Excel & Word
9 Mar 2011  Chapter 1-13  replaced by Chapter 1-98 to allow for book expansion
9 Mar 2011  Chapter 1-14  replaced by Chapter 1-99
8 Jan 2012  Chapter 1-98  Homework problems
8 Jan 2012  Chapter 1-99  Homework problem solutions

Section 2. Biostatistics

8 Jan 2011  Chapter 2-1  Describing variables, levels of measurement, and choice of descriptive statistics
15 Mar 2011  Chapter 2-2  Logic of significance tests
16 May 2010  Chapter 2-3  Choice of significance test
17 Oct 2011  Chapter 2-4  Comparison of two independent groups
22 Dec 2011  Chapter 2-5  Basics of power analysis
13 Feb 2012  Chapter 2-6  More on levels of measurement
16 May 2010  Chapter 2-7  Comparison of two paired groups
8 May 2011  Chapter 2-8  Multiplicity and the Comparison of 3+ Groups
16 May 2010  Chapter 2-9  Correlation
13 Feb 2011  Chapter 2-10  Linear regression
16 May 2010  Chapter 2-11  Logistic regression and dummy variables
17 Aug 2010  Chapter 2-12  Survival analysis: Kaplan-Meier graphs, Log-rank Test, and Cox regression
6 Oct 2011  Chapter 2-13  Confidence intervals versus p values and trends toward significance
16 May 2011  Chapter 2-14  Pearson correlation coefficient with clustered data
30 Aug 2011  Chapter 2-15  Equivalence and noninferiority tests
2 Sep 2012  Chapter 2-16  Validity and reliability
9 Jan 2011  Chapter 2-17  Bland-Altman analysis
16 May 2010  Chapter 2-18  One sample tests
8 May 2011  Chapter 2-19  replaced by 2-98 to allow for book expansion
8 May 2011  Chapter 2-20  replaced by 2-99
8 Jan 2012  Chapter 2-98  Homework problems
8 Jan 2012  Chapter 2-99  Homework problem solutions

Section 3. Epidemiology

16 May 2010  Chapter 3-1  Introduction to epidemiologic thinking
16 May 2010  Chapter 3-2  Sufficient/component cause theory of disease
1 Aug 2010  Chapter 3-3  Hill’s causal criteria
19 Aug 2011  Chapter 3-4  Logic and errors
14 Feb 2012  Chapter 3-5  Effect measures
16 May 2010  Chapter 3-6  Study designs
14 Jul 2011  Chapter 3-7  Randomization using Excel
16 May 2010  Chapter 3-8  Bias and confounding
16 May 2010  Chapter 3-9  Random error and statistics
16 May 2010  Chapter 3-10  Crude analysis
16 May 2010  Chapter 3-11  Stratified analysis
16 May 2010  Chapter 3-12  Standardization
16 May 2010  Chapter 3-13  Sensitivity (bias) analysis
16 May 2010  Chapter 3-14  Case-cohort study design
16 May 2010  Chapter 3-15  replaced by 3-98
16 Mar 2012  Chapter 3-98  Homework problems
16 Mar 2012  Chapter 3-99  Homework problem solutions

Section 4. Power Analysis

23 Jun 2010  Chapter 4-1  Sample size determination and power analysis for specific applications
19 Feb 2012  Chapter 4-2  Sample size determination and power analysis for equivalence, noninferiority, and nonsuperiority tests
20 Feb 2012  Chapter 4-3  Grant sample size paragraphs
22 Oct 2012  Chapter 4-4  Basic science applications
8 Jan 2012  Chapter 4-98  Homework problems
8 Jan 2012  Chapter 4-99  Homework problem solutions

Section 5. Regression Models

16 May 2010  Chapter 5-1  What regression is and curvilinear correlation
13 Jan 2012  Chapter 5-2  Holding constant
13 Feb 2012  Chapter 5-3  Dichotomous predictor variables
16 May 2010  Chapter 5-4  Adjusted means, Analysis of Variance (ANOVA), and interaction
16 May 2010  Chapter 5-5  Deriving logistic regression
16 May 2010  Chapter 5-6  Exact logistic regression
16 May 2010  Chapter 5-7  Introducing Cox regression and Kaplan-Meier plots
16 May 2010  Chapter 5-8  Interaction
15 Feb 2011  Chapter 5-9  Missing data imputation
16 May 2010  Chapter 5-10  Linear regression robust to assumptions
16 May 2010  Chapter 5-11  Linear regression diagnostics and transformations
16 May 2010  Chapter 5-12  Variable selection and collinearity
16 May 2010  Chapter 5-13  Monte Carlo Simulation and Bootstrapping
26 Nov 2012  Chapter 5-14  Model Validation
16 May 2010  Chapter 5-15  Response feature (summary measure) analysis
16 May 2010  Chapter 5-16  Analysis of covariance (ANCOVA) versus change analysis
16 May 2010  Chapter 5-17  Conditional logistic regression
16 May 2010  Chapter 5-18  Repeated measures analysis of variance
16 May 2010  Chapter 5-19  Generalized estimating equations (GEE)
16 May 2010  Chapter 5-20  Multilevel (mixed effects) models
17 Aug 2010  Chapter 5-21  Regression post tests
16 May 2010  Chapter 5-22  Modeling cost
16 May 2010  Chapter 5-23  Cox regression proportional hazards assumption
16 May 2010  Chapter 5-24  Cluster analysis
16 May 2010  Chapter 5-25  Multilevel (mixed effects) logistic regression
16 May 2010  Chapter 5-26  Trend tests
8 Jan 2011  Chapter 5-27  Propensity Scores
8 May 2011  Chapter 5-28  replaced by 5-98
8 May 2011  Chapter 5-29  replaced by 5-99
15 Feb 2012  Chapter 5-98  Homework problems
15 Feb 2012  Chapter 5-99  Homework problem solutions

Section 6. Diagnostic Tests

16 May 2010  Chapter 6-1  Test characteristics
16 May 2010  Chapter 6-1  Comparing diagnostic tests
16 May 2010  Chapter 6-1  Imperfect reference tests
16 May 2010  Chapter 6-1  Sampling with verification bias

Appendices

12 Jul 2011  Appendix 1  Dataset Descriptions
9 Jan 2010  Appendix 2  Bibliography
16 May 2010  Appendix 3  List of cross references

Continuation of the list of specific changes to the book

8 Jan 2012  Chapter 5-99  Replaces Chapter 5-29 “Homework problem solutions”
8 Jan 2012  Chapter 5-98  Replaces Chapter 5-28 “Homework problems”
8 Jan 2012  Chapter 4-99  Added Chapter 4-99 “Homework problem solutions”
8 Jan 2012  Chapter 4-98  Added Chapter 4-98 “Homework problems”
8 Jan 2012  Chapter 3-99  Added Chapter 3-99 “Homework problem solutions”
8 Jan 2012  Chapter 3-98  replaces Chapter 3-15 “Homework problems”
8 Jan 2012  Chapter 2-99  replaces Chapter 2-20 “Homework problem solutions”
8 Jan 2012  Chapter 2-98  replaces Chapter 2-19 “Homework problems”
8 Jan 2012  Chapter 1-99  replaces Chapter 1-14 “Homework problem solutions”
8 Jan 2012  Chapter 1-98  replaces Chapter 1-13 “Homework problems” to allow for book expansion
8 Jan 2012  Chapter 1-1  Updated to make consistent with Stata version 12.
22 Dec 2011  Chapter 2-5  Added another journal that requires a power analysis be reported when statistical significance is not achieved for a primary outcome.
5 Nov 2011  Preface  Changed “it is not appropriate to cite” to “it is” and directed the reader to the Title page for citation instructions.
5 Nov 2011  Title Page  Added a second page to show detailed correct citation format for this internet-based textbook.
17 Oct 2011  Chapter 2-4  Quoted two additional works further defining the concept of a confidence interval.
14 Oct 2011  Chapter 1-12  revised this chapter to take advantage of the Windows mouse right-click options for moving Stata output into Microsoft Word and Excel. What was in the chapter before was very klutzy.
6 Oct 2011  Chapter 2-13  finished the section on making claims of “marginally significant” when 0.05 < p value < 0.10.
30 Aug 2011  Chapter 2-15  expanded discussion of noninferiority testing, recommending the use of one-sided tests using a two-sided 95% confidence interval (so alpha is 0.025).
19 Aug 2011  Chapter 3-4  added how to apply Stoddard’s aphorism, listed some additional articles discussing false conclusions in the medical literature, and added a summary paragraph tying together some ideas
27 Jul 2011  Chapter 2-4  made the definition of a confidence interval more rigorous, added reporting style for p values (number of decimals)
14 Jul 2011  Chapter 3-7  added examples of researchers who used the random permuted blocks approach, and a suggested citation.
12 Jul 2011  Appendix 1  added internet source of data or data itself, for some more of the datasets, to the Dataset Descriptions chapter
27 Jun 2011  Chapter 1-9  added publication quality ROC curve graph and Kaplan-Meier curve graph
16 May 2011  Chapter 2-14  fixed a bug in the program betweencorr so it now gives the correct sample size when missing data are present
8 May 2011  Chapter 5-28  added more homework problem solutions to the Regression Models section of the book
8 May 2011  Chapter 5-29  added the chapter Homework problem solutions to the Regression Models section of the book
8 May 2011  Chapter 2-20  added more homework problem solutions to the Biostatistics section of the book
8 May 2011  Chapter 2-19  added more homework problems to the Biostatistics section of the book
8 May 2011  Chapter 2-16  added section explaining why anomalous values can occur with the ICC, where low ICCs occur even though the agreement looks tight
8 May 2011  Chapter 2-8  added mcpi and fdri programs as an appendix; added how to convert between adjusted p and adjusted alpha formulas for Bonferroni and Finner procedures
15 Mar 2011  Chapter 2-4  toned down discussion of checking assumptions for t-test, pointing out robustness of t-test to normality and homogeneity assumptions
15 Mar 2011  Chapter 2-2  added a very large section on choosing between standard deviations, standard errors, and 95% CIs, for reporting study outcomes and for error bars on graphs.
15 Mar 2011  Chapter 1-6  added how to change text size in Stata version 10.
9 Mar 2011  Chapter 1-14  added this new chapter of solutions to homework problems for the Stata section
9 Mar 2011  Chapter 1-13  greatly expanded this chapter of homework problems for the Stata section
4 Mar 2011  Chapter 1-7  added more on looping structures
3 Mar 2011  Chapter 5-9  updated to use “mi ice”, which yields more imputed values for multiple imputation than “mi impute regress”; added how to automate imputation of categorical variables using the most frequent category
23 Feb 2011  Chapter 1-5  reformatted so the discussion is much easier to follow
22 Feb 2011  Chapter 1-4  updated to Stata version 11 merge syntax that uses 1:1, 1:m, etc.
15 Feb 2011  Chapter 1-3  expanded it, added the inlist command
13 Feb 2011  Chapter 1-2  added how to import an Excel file when the variable names are not on the first row
13 Feb 2011  Chapter 1-1  added that “run as administrator” should be used when installing with Windows 7 in order to be able to create the license file
29 Jan 2011  Chapter 2-5  added null and alternative hypothesis notation and added a power function graph created with Stata
9 Jan 2011  Chapter 5-28  reformatted the homework problems and gave the chapter a new chapter number
9 Jan 2011  Chapter 2-17  added confidence interval for the limits of agreement and added a protocol suggestion; renamed chapter from “methods comparison analysis” to “Bland-Altman analysis”
9 Jan 2011  Chapter 2-8  added a quote by Zolman explaining conservativeness of ANOVA; added a quote by Scott justifying use of false discovery rate (FDR) in this research article.
13 Feb 2011  Chapter 2-10  added Rothman’s explanation of why restriction is more important than representativeness
9 Jan 2011  Chapter 2-6  added a page discussing the number of categories in an ordinal scale that are required to analyze it as an interval scale, giving a citation. Added a citation for justifying the treatment of a visual analog scale as an interval scale.
8 Jan 2011  Chapter 5-27  began development of new chapter on propensity scores
8 Jan 2011  Chapter 2-1  added box plot for two grouping variables
8 Jan 2011  Title & copyright page  added a suggested citation
8 Sep 2010  Chapter 5-9  added “set seed” preceding “hotdeckvar” command.
6 Sep 2010  Chapter 2-20  new chapter: solutions to homework problems
6 Sep 2010  Chapter 2-19  added more homework problems
6 Sep 2010  Chapter 1-2  added setting up file association in Windows so clicking on a file correctly opens Stata and reads in the data.
6 Sep 2010  Chapter 2-1  added simulation of standard error and when to use it, graph showing varying SDs, description of degrees of freedom
4 Sep 2010  Chapter 2-13  added quote by Altman in favor of not interpreting a p=0.04 and p=0.06 differently.
17 Aug 2010  Chapter 5-21  added some additional examples of post-estimation tests: comparison of regression coefficients within the same model, comparison of regression coefficients from separate models, and comparison of two correlation coefficients.
17 Aug 2010  Chapter 2-12  shortened the chapter, taking out tests of assumptions for Cox regression (these are still available in Chapter 5-23) and taking out the advanced formulas (still available in Chapter 5-7). Added interpretation exercise of Kaplan-Meier probabilities.
6 Aug 2010  Title & copyright page  added the two websites from which this book is available
1 Aug 2010  Chapter 3-3  improved the introduction to the Cheskin article
1 Aug 2010  Chapter 1-1  added more clarification and removed installation question responses specific to the author’s institution
29 Jul 2010  Chapter 1-6  expanded the chapter to include: change size of symbols, lines, and text by multiplying the default size; mention of graphics editor; logarithm y-axis to odds ratio graph; finer details, such as 300 dpi, of preparing graph for publication
19 Jul 2010  Chapter 2-2  added presentation of role of sampling distribution and how to simulate it
12 Jul 2010  Chapter 2-1  expanded the chapter to include: graph demonstrating relationship of mean, median, and mode for symmetrical and skewed distributions; graph demonstrating percent of scores within 1, 2, and 3 standard deviations; explanation of degrees of freedom for standard deviation formula; Stata commands table and tabstat for descriptive statistics
23 Jun 2010  Chapter 4-1  added section, “Interrater Reliability (Precision of Confidence Interval Around Intraclass Correlation Coefficient)”, which provides a Stata program to compute sample size for interrater reliability