Descriptive Epidemiology
Önder Ergönül, MD, MPH
Koç University, School of Medicine
Summer Course on Research Methodology in Health Sciences
June 24-28, 2013, Istanbul

Objectives of the session
• Data collection
• Variables
• Types of data
• Measures of central tendency
• Measures of dispersion
• Distribution of data

What is a variable?
• Dependent variable: the outcome
• Independent variables: parameters, factors, predictors
  – Gender, height, weight, blood pressure, drug A, severity of disease, infection, etc.
• Variables: the columns of the data sheet
• Data: the rows of the data sheet

Presentation of Findings
There are 3 main groups of tables in the results section of a scientific manuscript:
1. Demographic characteristics of the subjects
2. Univariate analysis
3. Multivariate analysis

Types of Data
Data
  – Continuous
  – Categorical
     • Nominal (including dichotomous)
     • Ordinal

Types of Data
• Continuous: measurable quantities
  – Age
  – Cholesterol level

Types of Data
• Categorical data
  – Nominal
     • Dichotomous or binary: male or female
     • Blood groups

Types of Data
• Categorical data
  – Ordinal
     • The level of severity
     • Grading among cancer patients

Variables
String      Numeric   Dichotomous   Ordinal/Nominal   Continuous
Nurse       1         0             1                 11
Nurse       1         1             2                 24
Physician   2         0             3                 224
Nurse       1         1             5                 45
Physician   2         1             2                 56
Lab tech    3         0             4                 57
Nurse       1         0             3                 866
Physician   2         1             3                 34
Lab tech    3         0             1                 23
(String and Numeric show the same variable, profession, recorded as text and as numeric codes: Nurse = 1, Physician = 2, Lab tech = 3.)

Measures of Central Tendency
• Mean
• Median
• Mode
• Geometric mean

Measures of Central Tendency
• Arithmetic mean
  – The arithmetic mean is the most frequently used measure of central tendency. It is appropriate for normally distributed data.
  – The mean is calculated by summing all the observations in a set of data and dividing by the total number of measurements: $\bar{X} = \frac{\sum X}{n}$.
• Median
  – The median is defined as the 50th percentile of a set of measurements. It can be used as a summary measure for ordinal observations as well as for continuous data. It is used to represent the average when the data are skewed rather than symmetrical.
  – If the observations are ranked from smallest to largest, the median is the point with half the values above it and half below.

Measures of Central Tendency
2 different sets of data:
• 1, 2, 3, 4, 1000 (mean = 202, median = 3)
• 1, 2, 3, 4, 5 (mean = 3, median = 3)
A single extreme value shifts the mean but leaves the median unchanged.

Measures of Central Tendency
• Mode
  – The most frequent value
  – Example, hemoglobin (Hb) in a group of subjects:
     • Hb 14.5 g/dL in 12 of the subjects
     • Hb 14.0 g/dL in 10
     • Hb 13.5 g/dL in 5
     • Hb 15.5 g/dL in 3
    The mode is 14.5 g/dL.
  – Rarely used in medicine

Measures of Central Tendency
• Geometric mean
  – Used if the data follow a logarithmic (log-normal) distribution
  – Used if there is a large diversity among the values
  – $\log(\mathrm{GM}) = \frac{\sum \log X}{n}$, so $\mathrm{GM} = \operatorname{antilog}\left(\frac{\sum \log X}{n}\right)$

[Figure: Measures of central tendency]
[Figure: Non-normal (asymmetrical) distribution of continuous variables]

Measures of Dispersion
• Range
• Standard deviation
• Variance = SD²
• Percentile
• Interquartile range

Dispersion Measures
Sedimentation values
• Group 1: 11, 11, 12, 12, 12, 12, 13
• Group 2: 4, 5, 6, 8, 19, 20, 21

sedim     | Obs   Mean       Std. Dev.   Min   Max
----------+-----------------------------------------
group 1   |   7   11.85714   .6900656     11    13
group 2   |   7   11.85714   7.733662      4    21

Standard Deviation
$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
where $\bar{X}$ = mean and $n$ = number of subjects.
Which group has the bigger SD?

Tests for Normal Distribution
• Visual methods
  – Histogram
  – Box plot
• Statistical tests
  – Kolmogorov-Smirnov
  – Lilliefors
  – Shapiro-Wilk
• Coefficient of variation (SD/mean)
  – If SD/mean ≤ 30%, the distribution is approximately normal (rule of thumb)

[Figure: Histogram of age (y-axis: density)]
[Figure: Box plot]

Standard Deviation vs Standard Error of the Mean (SEM)
• The “standard error of the sample mean” depends on both the standard deviation and the sample size: $SE = SD/\sqrt{n}$, where $n$ is the sample size.
• The standard error decreases as the sample size increases, because the extent of chance variation is reduced.
• By contrast, the standard deviation will not tend to change as we increase the size of our sample.
Altman DG, Bland JM. Standard deviations and standard errors.
BMJ 2005;331:903.

Frequency Distribution and Cumulative Frequency
Age (years)   n    %       Cumulative %
5-14          15   17.6    17.6
15-24         19   22.4    40.0
25-34         21   24.7    64.7
35-44         30   35.3    100.0
Total         85   100.0

[Figure: Bar chart comparing males (erkek) and females (kadın); values shown: 4,344 and 3,434]
[Figure: GTD over time (“Zamana göre GTD”) by monthly period (aylık dönemler), y-axis 0-100%; series: cip, tec, van, lvx, tzp, fep, caz, imp, mem, cro]
[Figure: Chart of comorbidities: DM, HT, KBY, KKC, neurological]

Data Collection
• Hard copy
  – To be sure nothing is lost (backup of the original records)
• Excel
• Stata
• SPSS
• Access
  – Electronic forms, easy to use
• Web-based systems

Data Collection
• Categories should be mutually exclusive
• Each column represents a variable, either dependent or independent
• All data should be collected in one data sheet

The Study Unit
• Person-time
• Patient-day
• Drug-day
What does one row represent?
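The data-collection rules (each column is one variable, each row is one study unit, all data in one sheet) can be illustrated with the example "Variables" table from earlier in the session. The following is a minimal Python sketch, assuming the data sheet is kept as a CSV file; the column names (`profession`, `profession_code`, `dichotomous`, `ordinal`, `continuous`) and the choice of CSV are illustrative assumptions, not part of the course material.

```python
import csv
import io

# Numeric codes read off the "Variables" table: Nurse=1, Physician=2, Lab tech=3
CODES = {"Nurse": 1, "Physician": 2, "Lab tech": 3}

# One list per variable (column), in the same subject order as the table
profession = ["Nurse", "Nurse", "Physician", "Nurse", "Physician",
              "Lab tech", "Nurse", "Physician", "Lab tech"]
dichotomous = [0, 1, 0, 1, 1, 0, 0, 1, 0]
ordinal = [1, 2, 3, 5, 2, 4, 3, 3, 1]
continuous = [11, 24, 224, 45, 56, 57, 866, 34, 23]

# One dict per row (study unit); each key becomes one column of the sheet
rows = [
    {"profession": p, "profession_code": CODES[p],
     "dichotomous": d, "ordinal": o, "continuous": c}
    for p, d, o, c in zip(profession, dichotomous, ordinal, continuous)
]

# Write everything to a single CSV data sheet (held in memory here)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping both the text label and its numeric code makes the coding auditable while still giving statistics packages the numeric column they expect.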
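The measures of central tendency discussed earlier can be checked with Python's standard `statistics` module. This sketch reuses the slide examples (the two data sets and the hemoglobin group); `geometric_mean` is a hypothetical helper name that applies $\log(\mathrm{GM}) = \sum \log X / n$ directly, and Python 3.8 or later is assumed.

```python
import math
import statistics

# The two data sets from the mean-vs-median slide
skewed = [1, 2, 3, 4, 1000]
symmetric = [1, 2, 3, 4, 5]

for name, data in [("skewed", skewed), ("symmetric", symmetric)]:
    # The mean is pulled toward the extreme value; the median is not
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))

# Mode: the most frequent hemoglobin value (g/dL) in the example group
hb = [14.5] * 12 + [14.0] * 10 + [13.5] * 5 + [15.5] * 3
print("mode =", statistics.mode(hb))

def geometric_mean(values):
    """Geometric mean via log(GM) = sum(log X) / n; all values must be positive."""
    log_gm = sum(math.log(x) for x in values) / len(values)
    return math.exp(log_gm)  # antilog of the mean log

print("geometric mean of skewed set =", round(geometric_mean(skewed), 2))
```

Any logarithm base gives the same geometric mean as long as the matching antilog is used; natural logs are used here only for convenience.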
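The dispersion measures can be computed for the two sedimentation groups in the same way. In this sketch, `sd` and `describe` are hypothetical helper names; `sd` follows the $n - 1$ standard deviation formula given earlier, and the quartiles come from `statistics.quantiles` with its default method. That quartile method is an assumption: other packages use other definitions, so the IQR may differ slightly.

```python
import math
import statistics

# Sedimentation values from the "Dispersion Measures" slide
group1 = [11, 11, 12, 12, 12, 12, 13]
group2 = [4, 5, 6, 8, 19, 20, 21]

def sd(values):
    """Sample standard deviation: sqrt(sum((X - mean)^2) / (n - 1))."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

def describe(values):
    """Summary of dispersion measures for one group."""
    n = len(values)
    mean = sum(values) / n
    s = sd(values)
    # Quartiles via the default ("exclusive") method of statistics.quantiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": n,
        "mean": mean,
        "sd": s,
        "variance": s ** 2,             # variance = SD^2
        "range": (min(values), max(values)),
        "iqr": q3 - q1,                 # interquartile range
        "cv": s / mean,                 # coefficient of variation (SD/mean)
        "sem": s / math.sqrt(n),        # standard error of the mean = SD/sqrt(n)
    }

for name, data in [("group 1", group1), ("group 2", group2)]:
    d = describe(data)
    print(f"{name}: mean={d['mean']:.5f} sd={d['sd']:.7f} range={d['range']} "
          f"iqr={d['iqr']:.1f} cv={d['cv']:.1%} sem={d['sem']:.3f}")
```

For these data the coefficient of variation is about 6% in group 1 and about 65% in group 2, so only group 1 falls within the 30% rule of thumb, even though both groups have the same mean.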
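The frequency distribution and cumulative frequency table can be generated directly from the age-group counts. A minimal sketch, with `frequency_table` as a hypothetical helper name and the counts taken from the table:

```python
# Age-group counts from the "Frequency Distribution and Cumulative Frequency" table
counts = {"5-14": 15, "15-24": 19, "25-34": 21, "35-44": 30}

def frequency_table(counts):
    """Return rows of (group, n, percent, cumulative percent)."""
    total = sum(counts.values())
    rows, running = [], 0
    for group, n in counts.items():  # dicts keep insertion order (Python 3.7+)
        running += n                 # cumulative count up to this group
        rows.append((group, n, 100 * n / total, 100 * running / total))
    return rows

for group, n, pct, cum in frequency_table(counts):
    print(f"{group:>6} {n:>3} {pct:5.1f} {cum:6.1f}")
```

Accumulating the counts first and dividing by the total once per row keeps the cumulative column free of rounding drift, so the last row is exactly 100%.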