Önder Ergönül, MD, MPH
Koç University, School of Medicine
Summer Course on Research Methodology in Health Sciences
June 24-28, 2013, Istanbul
Objectives of the session
• Data collection
• Variables
• Types of data
• Central tendency measures
• Dispersion measures
• Distribution of data
What is a variable?
• Dependent variable: outcome
• Independent variables: parameter, factor, predictor
– Gender, height, weight, blood pressure, drug A, severity
of disease, infection, etc.
• Variables: the columns
• Data: the rows
Presentation of Findings
There are 3 main groups of tables in the
results section of a scientific manuscript:
1. Demographic Characteristics of the Subjects
2. Univariate Analysis
3. Multivariate Analysis
Types of Data
• Continuous
• Categorical
– Nominal (including dichotomous)
– Ordinal
Types of Numerical Data
• Continuous: measurable quantities
– Age
– Cholesterol level
Types of Numerical Data
• Categorical data
– Nominal
• Dichotomous or binary: male or female
• Blood groups
Types of Numerical Data
• Categorical data
– Ordinal
• The level of severity
Variables
String       Numeric
Nurse        1
Nurse        1
Physician    2
Nurse        1
Physician    2
Lab tech     3
Nurse        1
Physician    2
Lab tech     3
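The string and numeric columns above show the same variable coded two ways. A minimal sketch of that coding step follows; the mapping (Nurse = 1, Physician = 2, Lab tech = 3) is read off the two columns, and the names `codes` and `professions` are illustrative assumptions.

```python
# Coding scheme inferred from the string and numeric columns (an assumption).
codes = {"Nurse": 1, "Physician": 2, "Lab tech": 3}

# The string variable as it would be entered, one value per subject (row).
professions = ["Nurse", "Nurse", "Physician", "Nurse", "Physician",
               "Lab tech", "Nurse", "Physician", "Lab tech"]

# Convert each string to its numeric code.
numeric = [codes[p] for p in professions]
print(numeric)
```

Looking up each string in a fixed dictionary also catches misspelled entries, because an unknown spelling raises a `KeyError` instead of silently becoming a new category.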
Variables
Dichotomous   Ordinal/Nominal   Continuous
0             1                 11
1             2                 24
0             3                 224
1             5                 45
1             2                 56
0             4                 57
0             3                 866
1             3                 34
0             1                 23
Measures of Central Tendency
• Mean
• Median
• Mode
• Geometric mean
Measures of Central Tendency
• Arithmetic mean
– The arithmetic mean is the most frequently used measure of central tendency. It is appropriate for normally distributed data.
– The mean is calculated by summing all the observations in a set of data and dividing by the total number of observations.
• Median
– The median is the 50th percentile of a set of measurements. It can be used as a summary measure for ordinal observations as well as for continuous data. It represents the average when the data are skewed rather than symmetrical. If a list of observations is ranked from smallest to largest, the median is the point with half the values above it and half below.
Measures of Central Tendency
2 different sets of data
1, 2, 3, 4, 1000
1, 2, 3, 4, 5
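The two data sets differ only in their last value, which makes the effect of one extreme observation easy to see. A minimal sketch using Python's standard `statistics` module; the names `set_a` and `set_b` are illustrative.

```python
from statistics import mean, median

set_a = [1, 2, 3, 4, 1000]  # contains one extreme value
set_b = [1, 2, 3, 4, 5]     # symmetrical

for name, data in (("Set A", set_a), ("Set B", set_b)):
    # The mean is pulled toward the extreme value; the median is not.
    print(name, "mean =", mean(data), "median =", median(data))
```

Both sets have a median of 3, but the mean of the first set is 202, which is why the median is preferred for skewed data.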
Measures of Central Tendency
Mode
– The most frequent value in a group
• Hb 14.5 g/dL in 12 of the subjects
• Hb 14.0 g/dL in 10
• Hb 13.5 g/dL in 5
• Hb 15.5 g/dL in 3
– Rarely used in medicine
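The hemoglobin example can be checked with a short sketch. The list is rebuilt from the counts given on the slide; the names `hb_counts` and `hb_values` are illustrative.

```python
from statistics import mode

# Hb value (g/dL) -> number of subjects, as listed on the slide.
hb_counts = {14.5: 12, 14.0: 10, 13.5: 5, 15.5: 3}

# Expand the counts into one value per subject.
hb_values = [hb for hb, n in hb_counts.items() for _ in range(n)]

# The mode is the value that occurs most often.
print("Mode:", mode(hb_values), "g/dL")
```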
Measures of Central Tendency
Geometric mean
– If the data have a logarithmic distribution
– If the values vary widely

$$\log(GM) = \frac{\sum \log X}{n}$$
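The formula says: take the logarithm of each value, average the logarithms, then transform back. A minimal sketch under that reading; the three values are hypothetical (for example, titers spanning several orders of magnitude) and are not from the lecture.

```python
import math
from statistics import mean

# Hypothetical values spread over several orders of magnitude.
values = [10, 100, 1000]

# log(GM) = (sum of log X) / n, then back-transform with exp.
log_gm = sum(math.log(x) for x in values) / len(values)
geometric_mean = math.exp(log_gm)

print("Arithmetic mean:", mean(values))
print("Geometric mean: ", round(geometric_mean, 2))
```

Here the arithmetic mean is 370, while the geometric mean is about 100, the middle value on a logarithmic scale.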
Non-normal (Asymmetrical) Distribution of Continuous Variables
Measures of Dispersion
Range
Standard deviation
Variance = SD²
Percentile
Interquartile range
Dispersion Measures
Sedimentation values
• Group 1: 11, 11, 12, 12, 12, 12, 13
• Group 2: 4, 5, 6, 8, 19, 20, 21

       sedim |   Obs       Mean   Std. Dev.   Min   Max
-------------+------------------------------------------
     group 1 |     7   11.85714    .6900656    11    13
     group 2 |     7   11.85714    7.733662     4    21
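The summary output for the two sedimentation groups can be reproduced with the standard library. This is a sketch, not the Stata command used in the lecture; `statistics.stdev` uses the n − 1 denominator.

```python
from statistics import mean, stdev

group_1 = [11, 11, 12, 12, 12, 12, 13]
group_2 = [4, 5, 6, 8, 19, 20, 21]

for name, data in (("group 1", group_1), ("group 2", group_2)):
    # Same mean in both groups, very different spread.
    print(f"{name}: n={len(data)} mean={mean(data):.5f} "
          f"sd={stdev(data):.6f} min={min(data)} max={max(data)}")
```

The two groups share the same mean, so the mean alone cannot distinguish them; the standard deviation and the range do.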
Standard Deviation

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

X̄ = mean
n = number of subjects
Which one has the bigger SD?
Tests for Normal Distribution
• Visual methods
– Histogram
– Box plot
• Statistical tests
– Kolmogorov-Smirnov
– Lilliefors
– Shapiro-Wilk
• Coefficient of variation (SD/mean)
– If SD/mean ≤ 30%, distribution ≈ normal
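The SD/mean rule of thumb can be applied to the two sedimentation groups from the dispersion slide. A minimal sketch; the 30% cut-off is the one stated above, and the helper name `coefficient_of_variation` is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Return SD / mean as a proportion (0.30 means 30%)."""
    return stdev(data) / mean(data)

group_1 = [11, 11, 12, 12, 12, 12, 13]
group_2 = [4, 5, 6, 8, 19, 20, 21]

for name, data in (("group 1", group_1), ("group 2", group_2)):
    cv = coefficient_of_variation(data)
    verdict = "approximately normal" if cv <= 0.30 else "probably not normal"
    print(f"{name}: CV = {cv:.1%} -> {verdict} by the 30% rule of thumb")
```

This is only a quick screen; the visual methods and formal tests listed above remain the more reliable check.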
[Figure: Histogram of age (y-axis: density)]
[Figure: Box plot]
Standard Deviation vs Standard Error of the Mean (SEM)
• The “standard error of the sample mean”
depends on both the standard deviation and the
sample size:
SE = SD/ √(sample size)
• The standard error decreases as the sample size
increases, as the extent of chance variation is
reduced.
• By contrast the standard deviation will not tend
to change as we increase the size of our sample.
Altman DG. Standard deviations and standard errors. BMJ 2005; 331:903
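The relation SE = SD / √(sample size) can be illustrated with a short sketch. It reuses the Group 2 sedimentation values from the dispersion slide; the helper name `standard_error` is an assumption.

```python
import math
from statistics import stdev

def standard_error(data):
    """Standard error of the mean: SD divided by the square root of n."""
    return stdev(data) / math.sqrt(len(data))

group_2 = [4, 5, 6, 8, 19, 20, 21]

# Repeating the same values four times stands in for a larger sample
# with a similar spread: the SD stays similar, the SE shrinks.
larger_sample = group_2 * 4

for name, data in (("n = 7", group_2), ("n = 28", larger_sample)):
    print(f"{name}: SD = {stdev(data):.3f}, SE = {standard_error(data):.3f}")
```

Repeating observations is only a device to hold the spread roughly constant; in a real study the extra subjects would be new measurements.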
Frequency Distribution and Cumulative Frequency

Age      n     %      Cumulative frequency (%)
5-14     15    17.6   17.6
15-24    19    22.3   39.9
25-34    21    24.8   64.7
35-44    30    35.3   100
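The percentage and cumulative columns can be derived from the counts alone. A minimal sketch; percentages recomputed this way may differ from the table by 0.1 because of rounding.

```python
from itertools import accumulate

age_groups = ["5-14", "15-24", "25-34", "35-44"]
counts = [15, 19, 21, 30]

total = sum(counts)
# Percentage of subjects in each age group.
percents = [100 * c / total for c in counts]
# Running total of counts, expressed as a percentage of all subjects.
cumulative = [100 * c / total for c in accumulate(counts)]

for group, n, pct, cum in zip(age_groups, counts, percents, cumulative):
    print(f"{group:>6} {n:>4} {pct:6.1f} {cum:6.1f}")
```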
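[Figure: bar chart (y-axis 0 to 5000; bars labeled 4344 and 3434; category label: male)]
[Figure: stacked bar chart, "GTD over time" (y-axis 0% to 100%; x-axis: 6-month periods 1 to 8; series: cip, tec, van, lvx, tzp, fep, caz, imp, mem, cro)]
[Figure: chart with categories DM, HT, KBY, KKC, Neurological]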
Data Collection
• Hard copy
– To be on the safe side
• Excel
• Stata
• SPSS
• Access
– Electronic forms, easy to use
• Web-based systems
Data Collection
• Categories should be mutually exclusive
• Each column represents a variable, either dependent or independent
• All data should be collected in one data sheet
The Study Unit
• Person-time
• Patient-day
• Drug-day
What is one row?
