Normality or not? Different distributions and their importance
Stats Club 3
Marnie Brennan

References
• Petrie and Sabin, Medical Statistics at a Glance: Chapters 7, 8, 9, 35 (good)
• Petrie and Watson, Statistics for Veterinary and Animal Science: Chapter 3 (good)
• Thrusfield, Veterinary Epidemiology: Chapter 12
• Kirkwood and Sterne, Essential Medical Statistics

What is a distribution?
• Empirical frequency distribution versus theoretical distribution
• Very easy!
  – Empirical frequency distribution is something that you actually measure and calculate
    • E.g. Coat colour in cats: Tabby, Ginger, Tortoiseshell, Seal-point
    • In a population, each one of these has a frequency, e.g. 5 x Tabby, 9 x Ginger, 15 x Tortoiseshell, 8 x Seal-point

Theoretical distributions
• A theoretical distribution is just that: theoretical!
• It is something we measure our data (the empirical frequency distribution) against to see which distribution describes it best
  – This helps to signpost us to what statistical analyses we do next, according to the distribution it ‘approximates’

Theoretical distributions and types of data
• Back to our flow charts in the back of Petrie and Sabin, and Petrie and Watson
• Relates to what type of variable you have
  – Continuous? E.g. Heights of Japanese men
  – Categorical or discrete? E.g. Coat colour in cats

Continuous distributions
• Normal distribution (our focus today)
  – The grandaddy of them all!
  – Also known as the Gaussian distribution (after Gauss, German mathematician)
  – E.g. heights of adult men in the UK
• T-distribution
  – Similar shape to Normal, but is more spread out with longer tails
  – Useful for calculating confidence intervals
• Chi-squared distribution
  – Right-skewed distribution
  – Useful for analysing categorical data
• F-distribution
  – Skewed to the right
  – Useful for comparing variances and more than 2 means (i.e. > 2 groups)
• DO NOT BE SCARED – THIS IS ANOTHER EXERCISE IN TERMINOLOGY!!
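To make the idea of a theoretical distribution a little more concrete, here is a minimal sketch using Python's standard library. It asks what proportion of a Normal distribution lies within 1, 2 and 3 standard deviations of the mean. The helper name `proportion_within` is made up for this illustration; the answers (roughly 68%, 95% and 99.7%) are the same whatever the mean and standard deviation of the Normal distribution happen to be.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1
z = NormalDist()


def proportion_within(k):
    """Proportion of a Normal distribution within k SDs of the mean.

    Hypothetical helper for illustration only.
    """
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {proportion_within(k):.1%}")
```

If the heights you measure in a sample follow roughly these proportions, that is one informal sign that the Normal distribution describes them well.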
Discrete distributions
• Binomial distribution
  – Could be skewed to the right or left (!)
  – Good for analysing proportion data, i.e. it is either one thing or another, such as an animal either has a disease or does not have a disease
• Poisson distribution
  – Right skewed
  – Good for analysing count data, e.g. the number of hospital admissions per day, the number of parasitic eggs per gram of faecal sample
• Many of these distributions approximate Normal when your sample size increases
• A lot of this goes on behind the scenes in your computer when doing statistics; it is here to help explain some of the terminology and basic ideas only (don’t worry too much about it!)

The useful bit.......
• You have collected continuous data from your research, e.g. the diameter of rabbit skulls in millimetres
• You would like to find out if this is normally distributed or not (as you know that this will affect what statistical tests you do)
• How do you check whether this variable is normal or not?

4 steps to Normality!
• Plot your data
  – Create a histogram with frequencies and determine by eye
    • Does it look bell-shaped and symmetrical?
    • Does it look unimodal, i.e. does it only have one peak?
  – Subjective measurement, but you should be doing this anyway!

4 steps (continued)
• How different are the mean and median?
  – Mean = Total of your data added up / total no. of measurements
  – Median = The midpoint of your values, i.e. what is the ‘halfway’ value in your data?
    • If they are very different, the data is probably not normally distributed
    • If they are very similar, your data could be normally distributed
  – Another rule of thumb, so not always correct

4 steps (continued)
• Skewness and kurtosis
  – Skewness (how symmetrical the data is)
    • Normal – this value is 0
    • Right-skewed distribution – positive value
    • Left-skewed distribution – negative value
  – Kurtosis (the ‘peakedness’ of the data): does your data have a pointy bit, or is it flat?
    • Normal – this value is 0 (most packages report ‘excess’ kurtosis, i.e. kurtosis minus 3)
    • Sharply peaked data – positive value
    • Flat peaked data – negative value
  – Can measure these in Minitab or SPSS

4 steps (continued)
• Bespoke tests for normality
  – Shapiro-Wilk test (Ryan-Joiner test)
  – Kolmogorov-Smirnov test
  – Anderson-Darling test
• Watch interpretation of p-values: if it is <0.05, reject the null hypothesis of normality, i.e. treat the data as not normal
• The good news!
  – Computers do this for us so we don’t have to!

Next month
• Spread of your data – how do we measure this?
  – mean, standard deviation, variance
  – median, interquartile range
  – mode
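Looking back at the second and third steps to Normality (mean versus median, then skewness and kurtosis), the sketch below shows how those numbers could be calculated by hand in Python using only the standard library. The skull diameters are invented purely for illustration, and `shape_summary` is a hypothetical helper, not a function from Minitab or SPSS. It uses the simple moment-based formulas; Minitab and SPSS apply small-sample adjustments, so their values will differ slightly from these.

```python
from statistics import mean, median


def shape_summary(values):
    """Return mean, median, skewness and excess kurtosis of a sample.

    Hypothetical helper using simple (unadjusted) moment formulas.
    """
    n = len(values)
    centre = mean(values)
    # Central moments: the average of (x - mean) raised to the power k
    m2 = sum((x - centre) ** 2 for x in values) / n
    m3 = sum((x - centre) ** 3 for x in values) / n
    m4 = sum((x - centre) ** 4 for x in values) / n
    return {
        "mean": centre,
        "median": median(values),
        "skewness": m3 / m2 ** 1.5,          # 0 for symmetrical data
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a Normal distribution
    }


# Made-up skull diameters in mm (NOT real data)
symmetric = [38, 39, 40, 40, 41, 41, 41, 42, 42, 43, 44]
# The same values with two unusually large skulls added
right_skewed = symmetric + [55, 60]

for label, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    summary = shape_summary(data)
    print(label, {name: round(value, 2) for name, value in summary.items()})
```

In the symmetrical sample the mean and median agree and the skewness is 0. Adding the two large values pulls the mean above the median and makes the skewness positive, which is the pattern described for right-skewed data.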
