EPI-546 Block I
Lecture 2 – Descriptive Statistics
Michael Brown MD, MSc
Professor, Epidemiology and Emergency Medicine
Credit to Michael P. Collins, MD, MS
Dr. Michael Brown © Epidemiology Dept., Michigan State Univ.

Objectives - Concepts
- Classification of data
- Distributions of variables
- Measures of central tendency and dispersion
- Criteria for abnormality
- Sampling
- Regression to the mean

Objectives - Skills
- Distinguish and apply the forms of data types.
- Define mean, median, and mode and locate on a skewed distribution chart.
- Apply the concept of the standard deviation to specific circumstances.
- Explain why a strategy for sampling is needed.
- Recognize the phenomenon of regression to the mean when it occurs or is described.

Clinical Measurement – 2 kinds of data
- Categorical
- Interval

Distinction
Interval = “the interval between successive values is equal, throughout the scale”

Clinical Measurement – subtypes of data
- Categorical
  - Nominal
  - Ordinal
- Interval
  - Discrete
  - Continuous

Nominal data: no order
- Alive vs. dead
- Male vs. female
- Rabies vs. no rabies
- Blood group O, A, B, AB
- Resident of Michigan, Ohio, Indiana…

Ordinal scale: natural order, but not interval
- 1st vs. 2nd vs. 3rd degree burns
- Pain scale for migraine headache: none, mild, moderate, severe
- Glasgow Coma Score (3-15)
- Stage of cancer spread – 0 through 4

Clinical Measurement – 2 kinds of data
- Categorical
  - Nominal
  - Ordinal
- Interval
  - Discrete
  - Continuous

Discrete interval variables: on a “number line” (e.g., 1, 2, 3)
- Number of live births
- Number of sexual partners
- Diarrheal stools per day
- Vision – 20/?

Continuous variables:
- Blood pressure
- Weight, or Body Mass Index
- Random blood sugar
- IQ

Interval: Continuous vs. Discrete
- No variable is perfectly continuous – e.g. you never see a BP of 152.47 mmHg
- It’s a matter of degree – lots of possible values within the range clinically possible = continuous

Recording data
- Sometimes the variable is intrinsically one type or another, but frequently it is the observer who decides how a variable will be measured and reported
- Consider cigarette smoking:

Continuous variable
- Underlying (nearly) continuous variable – cigarettes/day: 32, 63, 2,…
- However, this level of detail may not be necessary or desirable.

Discrete interval variable
- Packs per day (probably rounded off to the nearest whole number): 2, 1, 0
- Cruder, but maybe good enough and more reliably reported

Ordinal categorical variable
- Non-smoker vs. light smoker vs. heavy smoker.
- May further collapse the pack/day variable.

Nominal categorical variable
- Non-smoker vs. former smoker vs. current smoker. No obvious order here, just named categories.
- Ever-smoker vs. never-smoker. Dichotomous outcome.

So, the form of the variable is often decided by the investigator, not by nature
In fact, the normal vs. abnormal distinction is generally a matter of taking a much richer measure and making it dichotomous.
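
The cigarette-smoking example above can be sketched in Python: one underlying count is recoded into progressively cruder variable types. This is only an illustration under stated assumptions. The pack size of 20 cigarettes, the cutoff of 20 cigarettes/day for "heavy smoker", and the function names are hypothetical choices, not values given in the lecture.

```python
def packs_per_day(cigarettes_per_day, pack_size=20):
    """Discrete interval: packs per day, rounded to the nearest whole pack.

    pack_size=20 is an assumption, not a value from the lecture.
    """
    return int(cigarettes_per_day / pack_size + 0.5)


def smoking_category(cigarettes_per_day, heavy_cutoff=20):
    """Ordinal categorical: non-smoker < light smoker < heavy smoker.

    heavy_cutoff is a hypothetical threshold chosen for illustration.
    """
    if cigarettes_per_day == 0:
        return "non-smoker"
    if cigarettes_per_day < heavy_cutoff:
        return "light smoker"
    return "heavy smoker"


def is_current_smoker(cigarettes_per_day):
    """Dichotomous: any current smoking vs. none."""
    return cigarettes_per_day > 0


# The three cigarettes/day values shown on the continuous-variable slide.
for cigarettes in (32, 63, 2):
    print(
        cigarettes,
        packs_per_day(cigarettes),
        smoking_category(cigarettes),
        is_current_smoker(cigarettes),
    )
```

Each step discards detail: the count can always be collapsed into a category, but a category cannot be expanded back into the count. That is why the choice of form belongs to the investigator at the time the data are recorded.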

Quick Quiz Slide
- What kind of a variable is religion? – Protestant, Catholic, Muslim, Jewish. . .
- What kind is Body Mass Index (weight divided by height squared)?
- What is alcohol intake if classed as none, < 2 drinks/day, and > 2 drinks/day?

First question when meeting with statistician:
1. Define the type of data (continuous, ordinal, categorical, etc.)

A Few Examples of Statistical Tests

Test                 Comparison              Principal Assumptions
Student's t test     Means of two groups     Continuous variable, normally distributed, equal variance
Wilcoxon rank sum    Medians of two groups   Continuous variable
Chi-square           Proportions             Categorical variable, more than 5 patients in any particular "cell"
Fisher's exact       Proportions             Categorical variable

Objectives - Concepts
- Classification of data
- Distributions of variables
- Measures of central tendency and dispersion
- Criteria for abnormality
- Sampling
- Regression to the mean

Distributions of continuous variables
- A way to display the individual-to-individual variation in some clinical measure.
- Consider the example in Fletcher using PSA levels:

[Figure not reproduced. Source: Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.]

[Figure: frequency plotted against the variable x. Source: www.msu.edu/user/sw/statrev/images/normal01.gif]

[Figure not reproduced. Source: Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.]

The “nicest” distribution
Is the normal, or Gaussian, distribution – the “bell-shaped curve”.

If we want to summarize a frequency distribution, there are two major aspects to include:
- Central tendency
- Dispersion

[Figure not reproduced. Source: Principles of Epidemiology, 2nd edition. CDC.]

[Figure not reproduced. Source: Principles of Epidemiology, 2nd edition. CDC.]

Measures of Central Tendency:
- Mean
- Median
- Mode

Consider this data:
Parity (how many babies have you had?) among 19 women:
0,2,0,0,1,3,1,4,1,8,2,2,0,1,3,5,1,7,2

Mean (Arithmetic)
- Add up all the values and divide by N
- 43 / 19 = 2.26

Median
- The middle value
- Must first sort the data and put in order:
  0,0,0,0,1,1,1,1,1,2,2,2,2,3,3,4,5,7,8
- With 19 values, the median is the 10th value in sorted order: 2

Mode
- The most common value
  0,0,0,0,1,1,1,1,1,2,2,2,2,3,3,4,5,7,8
- Here the mode is 1, which occurs five times

In a normal distribution, all three are equal
Parametric statistical methods assume a distribution with a known shape (e.g., a normal or Gaussian distribution)

[Figure: frequency plotted against the variable x]

Quick Quiz Slide
If the mode is “100” and the mean is “80” – what can you tell me about the median?

[Figure: frequency plotted against the variable x, with the mean marked at 80 and the mode at 100]

Dispersion
Standard Deviation: the most common measure used for normal or near-normal distributions.
Defined by a statistical formula, but remember that:
- The mean +/- 1 SD contains about 2/3 of the observations.
- The mean +/- 2 SDs contains about 95% of the observations.

[Figure not reproduced. Source: Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.]

[Figure not reproduced. Source: M J Campbell, Statistics at Square One, 9th Ed, 1997.]

So, how about this definition of “abnormal” for total serum cholesterol:
- A value higher than the mean + 1 SD?
- How many people would fall beyond that cutoff?

[Figure not reproduced. Source: Rose, G: The Strategy of Preventive Medicine; Oxford Press, 1998.]

So what’s the “best” definition of abnormality?
Fletcher lists three:
- Being unusual: greater than 2 SD from the mean
- Sick: observation regularly associated with disease
- Treatable: consider abnormal only if treatment of the condition represented by the measurement leads to improved outcome

[Figure not reproduced. Source: Miura et al, Archives Int Med 2001; 161:1504.]

If you were to design a study to define an abnormal diastolic blood pressure (DBP) for adult females in the US, how would you do it?
- Measure DBP in every adult female in the US?
- Then define abnormal as above 2 SD from the mean?

Sampling
- It is impossible to measure the BP of everyone, so we must take measurements from a representative sample of subjects.
- Random sample
  - May miss an important subgroup (ethnicity, for example)
  - May need to obtain a larger sample from these important subgroups and select subjects at random within each subgroup
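
The subgroup strategy on the sampling slide can be sketched in Python. This is a minimal illustration under stated assumptions, not a study design: the population of 1000 women, the subgroup labels "A" and "B", the sample sizes, the seeds, and the helper name `stratified_sample` are all hypothetical.

```python
import random


def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Select subjects at random within each subgroup (stratum).

    Hypothetical helper: takes the same number from every stratum, so a
    small subgroup is deliberately over-represented.
    """
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)

    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = min(n_per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


def count_in_group(sample, group):
    """Count how many sampled subjects belong to the given subgroup."""
    return sum(1 for person in sample if person["group"] == group)


# Hypothetical population: 950 women in subgroup "A", 50 in subgroup "B".
population = [
    {"id": i, "group": "A" if i < 950 else "B"} for i in range(1000)
]

# Simple random sample of 40: subgroup "B" may be barely represented.
simple = random.Random(1).sample(population, 40)

# Stratified sample: 20 chosen at random from each subgroup.
stratified = stratified_sample(population, lambda p: p["group"], 20)

print("B in simple random sample:", count_in_group(simple, "B"))
print("B in stratified sample:", count_in_group(stratified, "B"))
```

Because subgroup "B" is oversampled relative to its share of the population, any estimate for the whole population would need to weight the strata back to their true proportions.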

[Figure not reproduced. Source: Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.]

[Figure not reproduced. Source: Hanna C, Greenes D. How Much Tachycardia in Infants Can Be Attributed to Fever? Ann Emerg Med, June 2004.]
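
As a recap of the central tendency and dispersion slides, the parity data and the standard deviation rules of thumb can be checked with Python's standard library. This is a sketch, assuming Python 3.8 or later for `statistics.multimode` and `statistics.NormalDist`; the variable names are illustrative, and the last part assumes an exactly normal distribution, which real cholesterol values only approximate.

```python
from statistics import NormalDist, mean, median, multimode

# Parity among the 19 women from the central tendency slides.
parity = [0, 2, 0, 0, 1, 3, 1, 4, 1, 8, 2, 2, 0, 1, 3, 5, 1, 7, 2]

parity_mean = mean(parity)        # 43 / 19
parity_median = median(parity)    # 10th of the 19 sorted values
parity_modes = multimode(parity)  # list of the most common value(s)

print(f"mean = {parity_mean:.2f}")
print(f"median = {parity_median}")
print(f"mode(s) = {parity_modes}")

# Share of a normal distribution within 1 and 2 SD of the mean,
# and the share above a cutoff of mean + 1 SD.
std_normal = NormalDist()  # mean 0, SD 1
within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)
above_1sd = 1 - std_normal.cdf(1)

print(f"within 1 SD: {within_1sd:.1%}")
print(f"within 2 SD: {within_2sd:.1%}")
print(f"above mean + 1 SD: {above_1sd:.1%}")
```

For the parity data the mode (1) is below the median (2), which is below the mean (2.26), the ordering expected when a few large values pull the mean to the right. Under the normal assumption, a cutoff of mean + 1 SD would label roughly one person in six as "abnormal".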