Introduction to statistics in medicine – Part 1 Arier Lee Introduction • Who am I • Who do I work with • What do I do Why do we need statistics Population Sample The important role of statistics in medicine • Statisticians pervades every aspect of medical research • Medical practice and research generates lots of data • Research involves asking lots of questions with strong statistical aspects • The evaluation of new treatments, procedures and preventative measures relies on statistical concepts in both design and analysis • Statisticians are consulted at early stage of a medical study Research process Research question Primary and secondary endpoints Analyse data Study design Interpret results Sampling and/or randomisation scheme Disseminate Power and sample size calculation Pre-define analyses methods Bias • A form of systematic error that can affect scientific research • Selection bias – well defined inclusion / exclusion criteria, randomisation • Assessment bias – blinding • Response bias, lost-to-follow-up bias – maximise response • Questionnaire bias – careful wording and good interviewer training Some common data types • Continuous age, weight, height, blood pressure • Percentages % of households owning a dog • Counts Number of pre-term babies • Binary yes/no, male/female, sick/healthy • Ordinal taste of biscuits: strongly dislike, dislike, neutral, like, strongly like • Nominal categorical Ethnicity: European, Maori, Pacific Islander, Chinese etc. Descriptive statistics for continuous data – the average • Mean (sum of values)/(number in group) • Median The middle value, 50th percentile • Mode The value that occurs the most often 3 4 7 8 8 8 9 11 11 13 21 23 24 mean=11.54 median mode=8 Descriptive statistics for continuous data – the spread 0, 1, 2, 5, 8, 8, 9, 10, 12, 14, 18, 20 21, 23, 25, 27, 34, 43 Q1 Q2 Q3 18 numbers • Range Minimum and maximum numbers • Interquartile range Quartiles divide data into quarters • Standard deviation A statistic that tells us how far away from the mean the data is spread (95% of the data lies between 2 SD) √ (xi - x) 2 /(n-1) Estimation – Estimation: determine value of a variable and its likely range (ie. 95% confidence intervals) • Statistical inference is a process of generalising results calculated from a sample to a population • We are interested in some numerical characteristic of a population (called a parameter). e.g. the mean height or the proportion of pregnant women with hypertension • We take a sample from the population and calculate an estimate of this parameter Estimation – a simple example • We want to estimate the mean height of 10 years old boys • Take a random sample of 100 ten years old boys and calculate the sample mean • The mean height of my random sample is 141cm • Based on our random sample, we estimate the mean height of 10 years old boys is 141cm Distribution of Data • It is essential to know the distribution of your data so you can choose the appropriate statistical method to analyse the data • Data can be distributed (spread out) in different ways • Continuous data: There are many cases when the data tends to be around a central value with no bias to the left or right – normal distribution Distribution of data – Normal distribution • Many parametric methods assumes data is normally distributed • • • • • Bell curve Peak at a central value Symmetric about the centre Mean=median=mode The distribution can be described by two parameters – mean and standard deviation Standard deviation • Standard deviation – shows how much variation or ‘dispersion’ exists in the data. • 95% of the data are contained within 2 standard deviations A simulated example – Birth weight Histogram of birth weight Mean=3250g SD=550g Some other common distributions • Some common distributions – Binomial distribution – gestational diabetes (Yes/No) – Uniform distribution - throwing a die, equal (uniform) probability for each of the six sides – And many many more… Sampling variability • Because of random sampling, the estimated value will be just an estimate – not exactly the same as the true value • If repeated samples are taken from a population then each sample and hence sample mean and standard deviation is different. This is known as Sampling Variability Sampling variability • In practice we do not repeat the sampling to measure sampling variability we endeavour to obtain a random sample and use statistical theory to quantify the error • Fundamental principle to justify our estimate is reasonable: If it were possible to repeat a study over and over again, in the long run the estimates of each study would be distributed around the true value • If we have a random sample then the sampling variability depends on the size of the sample and the underlying variability of the variable being measured