620.152 Introduction to Biomedical Statistics

Ray Watson, room 104 [Mon, Wed & Fri 1.15–2.15]
email: rayw@ms.unimelb.edu.au
Pre-requisite: VCE Math Methods (and 620-151)

“Good, Watson! You always keep us flat-footed on the ground.”
Sherlock Holmes, The Adventure of the Creeping Man, 1927.

Course Notes
There are several recommended text-books (see below), but the course notes, the tutorial problems and the computer prac-notes should be sufficient. These are all available from the course web-site:

Notes: Chapters 0 to 11, Summary Notes, Statistical Tables
Problems: Problem Sets 1 to 12 (8* and 12* carry an asterisk), Assignment, Revision Exercises, Last Year’s Exam
Answers: Answers 1 to 12, Asst Answers, RE Answers, LYE Answers
Computer Labs: Computer Labs 1 to 11
Answers: CL Answers 1 to 11

Reference books:
If you find the course notes are not quite right for you (and even if you do), there is a range of similar texts (mostly with Biostatistics and Introduction in the title) which may suit you better, including:
Pagano M & Gauvreau K, “Principles of Biostatistics” (Duxbury)
Rosner B, “Fundamentals of Biostatistics” (Thomson)
Devore J & Peck R, “Statistics, the Exploration and Analysis of Data” (Duxbury)
Utts JM & Heckard RF, “Statistical Ideas and Methods” (Duxbury)

Lectures:
Monday 9.00 Laby Theatre
Wednesday 9.00 Laby Theatre
Friday 9.00 Laby Theatre
The lecture notes will appear on the web-site, mostly the weekend before, or possibly earlier.

Tutorials & Computer Labs start in the second week.
Tutorials: set problems, homeworks and general clarification of the stuff I haven’t explained properly.
Computer Labs: One hour per week, using MINITAB. MINITAB will be essential for some of the homework questions and some of the assignment questions, and will also be examined.

Computer packages:
MINITAB is the standard statistical package available in the Computer Labs; it’s easy to use and will do all the statistical things you’ll need. You will be expected to handle MINITAB and in particular to interpret MINITAB output. [A student version is available for ∼A$200; 5 months rental costs ∼$50.]
EXCEL is readily available and will get a lot of the Statistics done, even if it is a bit DIY; there is a Statistics add-on, which is clunky and minimally useful. It is often useful for data and for simple graphs ... but not pie charts!
WORD is useful for presentation of reports. At least one of the questions on the assignment must be presented as a report.

Assessment:
end of semester exam    80%
weekly homeworks        10%
assignment               5%
prac-tests               5%

Exam: Standard three hour format. Questions like those in the problem sets and computer labs. But ... No calculators: simple and approximate arithmetic required.

Homework: Each week a problem sheet will be handed out. This will contain a number of homework problems to be submitted for assessment and a number of problems for discussion in the tutorial.

Assignment: There will be an assignment, handed out in week 8, with questions that are a bit longer and more involved than those in the weekly homeworks.
Prac tests: There will be five short and simple tests in the computer practical classes (roughly every second week) to ensure that you can use MINITAB to do some basic statistical analysis.

Statistical tables: “Statistical Tables for Students” are available from the web-site. This set of tables will be available for use in the exam.

Summary notes are available from the web-site. These summary notes will be available for use in the exam. Note: if there are any additional formulae that you want included, just ask.

[Diagram: population (model) and sample (observations), linked by Probability (−→) and Statistics (←−); surrounding labels: types of studies, data description, rates, life tables.]

Course contents
0. Introduction
1. Exploratory data analysis
2. Studies and Design
3. Probability and applications
4. Probability distributions
5. Sampling and sampling distributions
6. Estimation: point and interval estimations
7. Hypothesis testing
8. Inference on proportions
9. Comparative inference
10. Correlation and regression
11. Life tables and standardisation

There are eleven “chapters” and there are twelve weeks in the semester. The chapters correspond roughly (but only roughly) to weeks.

0.1 What is statistics?
0.2 Population and sample
0.3 Variability
References: Pagano & Gauvreau, Chapter 1

What is Statistics?
Statistics is the science of collecting, analysing and drawing conclusions from data.
Statistics is the study of variability and uncertainty.
The science of collecting, organizing, and analyzing data.
A branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters.
A branch of mathematics that deals with the analysis and interpretation of numerical data in terms of samples and populations.
The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities.

Biostatistics or biometry is the application of statistics to a wide range of topics in biology. It has particular applications to medicine and to agriculture.

Epidemiology is the science devoted to the statistical study of categories of persons and the patterns of diseases from which they suffer, with the aim of determining the events or circumstances causing these diseases.
Epidemiology is the use of medical science and statistics to track population health and to find causes of disease in groups of people.

Statistics provides the tools scientists use to analyse their data, and principles on how best to design their experiments to collect data.

In evidence-based medicine, treatments and procedures advocated must be supported by hard evidence, which means data from well-designed experiments, ensuring valid and efficient outcomes, and analysed by appropriate statistical methods.

Why might you study statistics?
• because it’s interesting, useful, enjoyable!
• to conduct research (in any field);
• to read and understand research papers in your discipline area;
• to apply basic statistical methods in your course (project, lab work, honours thesis);
and more immediately, to pass this course.
Population and sample
Perhaps the most fundamental concepts in statistics are:
• Population: the totality of units under study, which may be (and often is) hypothetical;
• Sample: the observed units, i.e. the units on which we have information (measurements).

[Diagram: population (model) and sample (observations), linked by Probability (−→) and Statistics (←−); surrounding labels: types of studies, data description, rates, life tables.]

We first describe and explore the data (i.e. the sample); then examine where it came from (i.e. the population) and how [though in practice this would come first]; then we learn how to use the data to infer something about where it came from, i.e. possible generalisations.

By and large, we want to be able to say something about the population on the basis of the information we have in the sample. This requires a scientific investigation, which typically takes the following steps:

  Question(s)
  ↓
* Study design (2)
  ↓
  Data collection
  ↓
* Data display (EDA) (1)
  ↓
* Inference
  ↓
* Answers and Conclusions
  ↓
* Reporting results (1)

The steps marked with an asterisk all involve statistics.

Example (Communication)
Data in Four Areas and Eight Three-Month Periods in 1998-1999

Area   13-15   16-18   19-21   22-24   25-27   28-30   31-33   34-36
A      97.63   92.24   98.90   90.39   95.69   94.44   91.13   97.81
B      48.29   42.31   49.98   39.09   46.38   49.74   41.74   37.39
C      75.23   75.16   77.04   74.23   74.23   76.97   71.66   76.47
D      49.69   57.21   75.19   51.09   52.88   49.41   59.32   52.56

Variability
We need to use statistics when the data show variability: i.e. all the time!
Usually the mean is “obvious” or “guessable”, but the variability is usually not. This is one reason why you need a course in Statistics. For a lot of things you do in this course, modelling or estimating the mean is “obvious”, but assessing the variability (and hence the accuracy or precision) of your model or estimate is not so obvious.
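As a rough illustration of this point, the four-area table above can be summarised area by area. The sketch below uses Python purely for illustration (the course itself uses MINITAB); the variable names `areas` and `summary` are hypothetical, and the sample standard deviation is just one possible measure of variability, chosen here as an assumption.

```python
import statistics

# Values copied from the four-area table (eight three-month periods each).
areas = {
    "A": [97.63, 92.24, 98.90, 90.39, 95.69, 94.44, 91.13, 97.81],
    "B": [48.29, 42.31, 49.98, 39.09, 46.38, 49.74, 41.74, 37.39],
    "C": [75.23, 75.16, 77.04, 74.23, 74.23, 76.97, 71.66, 76.47],
    "D": [49.69, 57.21, 75.19, 51.09, 52.88, 49.41, 59.32, 52.56],
}

# For each area, store (mean, sample standard deviation).
# statistics.stdev uses the n - 1 divisor.
summary = {}
for area, values in areas.items():
    summary[area] = (statistics.mean(values), statistics.stdev(values))

for area, (mean, sd) in summary.items():
    print(f"Area {area}: mean = {mean:.2f}, SD = {sd:.2f}")
```

The means can more or less be guessed by eye from the table; the standard deviations are much harder to guess, which is the point being made.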
32.4  59.1  53.6  89.9  30.0  58.4  50.6  51.7  58.9  87.3
61.0  58.0  71.4  63.0  54.4  63.5  67.8  63.3  36.9  52.2
45.1  57.4  44.2  75.9  56.6  42.9  46.6  66.1  41.1  41.2
60.9  47.9  39.6  73.9  67.9  72.1  55.7  78.5  40.6  61.3
44.2  37.3  49.1  39.4  29.6  61.7  73.2  53.5  99.9  47.1
60.8  50.5  51.3  48.9  45.4  32.1  40.7  75.8  27.9  34.1
67.9  57.0  43.2  61.3  32.7  44.1  52.4  42.1  54.7  42.1
43.1  61.5  28.2  54.6  37.9  56.1  60.3  60.4  63.5  52.1
61.4  42.2  65.6  72.9  56.1  51.6  46.2  48.9  41.1  45.6
57.5  37.2  66.0  55.9  61.7  63.2  60.6  28.9  39.1  41.3

[Dotplot of x; horizontal axis marked at 30, 45, 60, 75 and 90.]

Variable   N     Mean    StDev   Min     Q1      Med     Q3      Max
x          100   53.40   13.96   27.90   42.38   53.55   61.48   99.90

What is the underlying mean? It is not enough to say that the estimate is 53.4.
Is it 53.4 ± 0.6? or 53.4 ± 2.6? or 53.4 ± 12.6?

Statistics provides a handle on variability.

Statistics, as a scientific discipline, is concerned with:
1. dealing with variability in a population;
2. drawing conclusions that will stand the test of reproducibility.

“You are developing a certain unexpected vein of pawky humour, Watson, against which I must learn to guard myself.”
Sherlock Holmes, The Valley of Fear, 1914.
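For readers who want to check the summary figures for the 100 observations above, here is a minimal sketch in Python (an assumption made for illustration; the course uses MINITAB). It recomputes N, the mean, the sample standard deviation and the median, and then the standard error of the mean, s/√n, which is one common way of putting a scale on the "±" question. It is shown only as a preview and is not necessarily the approach these notes go on to take; the variable names are hypothetical.

```python
import math
import statistics

# The 100 observations listed in the notes.
data = [
    32.4, 59.1, 53.6, 89.9, 30.0, 58.4, 50.6, 51.7, 58.9, 87.3,
    61.0, 58.0, 71.4, 63.0, 54.4, 63.5, 67.8, 63.3, 36.9, 52.2,
    45.1, 57.4, 44.2, 75.9, 56.6, 42.9, 46.6, 66.1, 41.1, 41.2,
    60.9, 47.9, 39.6, 73.9, 67.9, 72.1, 55.7, 78.5, 40.6, 61.3,
    44.2, 37.3, 49.1, 39.4, 29.6, 61.7, 73.2, 53.5, 99.9, 47.1,
    60.8, 50.5, 51.3, 48.9, 45.4, 32.1, 40.7, 75.8, 27.9, 34.1,
    67.9, 57.0, 43.2, 61.3, 32.7, 44.1, 52.4, 42.1, 54.7, 42.1,
    43.1, 61.5, 28.2, 54.6, 37.9, 56.1, 60.3, 60.4, 63.5, 52.1,
    61.4, 42.2, 65.6, 72.9, 56.1, 51.6, 46.2, 48.9, 41.1, 45.6,
    57.5, 37.2, 66.0, 55.9, 61.7, 63.2, 60.6, 28.9, 39.1, 41.3,
]

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)        # sample SD, divisor n - 1
median = statistics.median(data)

# Standard error of the mean: how much the sample mean itself
# would be expected to vary from sample to sample.
se = sd / math.sqrt(n)

print(f"N = {n}, mean = {mean:.2f}, SD = {sd:.2f}, median = {median:.2f}")
print(f"min = {min(data):.2f}, max = {max(data):.2f}")
print(f"standard error of the mean = {se:.2f}")
```

Note that the standard error is much smaller than the standard deviation: the spread of individual observations and the precision of the estimated mean are different things.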