EPI-546 Block I
Lecture 2 – Descriptive Statistics
Michael Brown MD, MSc
Professor of Epidemiology and Emergency Medicine
Credit to Michael P. Collins, MD, MS
Dr. Michael Brown
© Epidemiology Dept., Michigan State Univ.
Objectives - Concepts
Classification of data
Distributions of variables
Measures of central tendency and dispersion
Criteria for abnormality
Sampling
Regression to the mean
Objectives - Skills
Distinguish and apply the different types of data.
Define mean, median, and mode, and locate each
on a skewed distribution chart.
Apply the concept of the standard deviation
to specific circumstances.
Explain why a strategy for sampling is
needed.
Recognize the phenomenon of regression to
the mean when it occurs or is described.
Clinical Measurement –
2 kinds of data
Categorical
Interval
Distinction: Interval = “the interval between
successive values is equal, throughout
the scale”
Clinical Measurement –
subtypes of data
Categorical
Nominal
Ordinal
Interval
Discrete
Continuous
Nominal data: no order
Alive vs. dead
Male vs. female
Rabies vs. no rabies
Blood group O, A, B, AB
Resident of Michigan, Ohio, Indiana…
Ordinal scale: natural order,
but not interval
1st vs. 2nd vs. 3rd degree burns
Pain scale for migraine headache:
None, mild, moderate, severe
Glasgow Coma Score (3-15)
Stage of cancer spread – 0 through 4
Clinical Measurement –
2 kinds of data
Categorical
Nominal
Ordinal
Interval
Discrete
Continuous
Discrete Interval variables:
on a “number line”
Number of live births
Number of sexual partners
Diarrheal stools per day
Vision – 20/?
1 2 3
Continuous variables:
Blood pressure
Weight, or Body Mass Index
Random blood sugar
IQ
Interval: Continuous vs. Discrete
No variable is perfectly continuous – e.g. you
never see a BP of 152.47 mmHg
It’s a matter of degree – lots of possible values
within the range clinically possible = continuous
Recording data
Sometimes the variable is intrinsically one type
or another – but, frequently it is the observer
who decides how a variable will be measured
and reported
Consider cigarette smoking:
Continuous variable
Underlying (nearly) continuous variable –
cigarettes/day
32, 63, 2,…
However, this level of detail may not be
necessary or desirable.
Discrete interval variable
Packs per day (probably rounded off to the
nearest whole number)
2, 1, 0
Cruder - but maybe good enough and more
reliably reported
Ordinal categorical variable
Non-smoker vs. light smoker vs. heavy smoker.
May further collapse the pack/day variable.
Nominal categorical variable
Non-smoker vs. former smoker vs. current
smoker.
No obvious order here, just named categories
Ever-smoker vs. never-smoker.
Dichotomous outcome
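The progression above, from cigarettes per day down to a dichotomous variable, can be sketched in Python. This is only an illustration: the cut-off of one pack (20 cigarettes) between light and heavy smokers, the pack size, and the function names are assumptions, not definitions from the lecture. The dichotomous version shown is current smoker vs. not, because ever- vs. never-smoker would need smoking history that a cigarettes/day count does not carry.

```python
# Hypothetical recoding of one smoking measurement into coarser variable types.
CIGARETTES_PER_PACK = 20  # assumption: a standard pack holds 20 cigarettes


def packs_per_day(cigarettes_per_day):
    """Discrete interval: round half-up to the nearest whole number of packs."""
    return int(cigarettes_per_day / CIGARETTES_PER_PACK + 0.5)


def smoking_category(cigarettes_per_day):
    """Ordinal: non-smoker < light smoker < heavy smoker (illustrative cut-off)."""
    if cigarettes_per_day == 0:
        return "non-smoker"
    if cigarettes_per_day < CIGARETTES_PER_PACK:
        return "light smoker"
    return "heavy smoker"


def is_current_smoker(cigarettes_per_day):
    """Dichotomous: any current smoking vs. none."""
    return cigarettes_per_day > 0


for cigs in (32, 63, 2):  # the cigarettes/day values from the slide
    print(cigs, packs_per_day(cigs), smoking_category(cigs), is_current_smoker(cigs))
```

Each step discards detail: the same three patients end up with fewer and fewer distinct values as the variable is collapsed.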
So, the form of the variable is often decided by
the investigator, not by nature
In fact, the normal vs. abnormal
distinction is generally a matter of
taking a much richer measure and
making it dichotomous.
Quick Quiz Slide
What kind of variable is religion? – Protestant,
Catholic, Muslim, Jewish…
What kind is Body Mass Index (weight divided
by height²)?
What is alcohol intake if classed as none,
< 2 drinks/day, and > 2 drinks/day?
First question when meeting with a statistician:
1. Define the type of data (continuous, ordinal,
categorical, etc.)
A Few Examples of Statistical Tests
Test                 Comparison              Principal Assumptions
Student's t test     Means of two groups     Continuous variable, normally distributed, equal variance
Wilcoxon rank sum    Medians of two groups   Continuous variable
Chi-square           Proportions             Categorical variable, more than 5 patients in any particular "cell"
Fisher's exact       Proportions             Categorical variable
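As a rough illustration of the cell-size assumption in the table, the sketch below computes the expected count in each cell of a contingency table and suggests chi-square only when every expected count exceeds 5. Applying the rule to expected rather than observed counts is the common convention and is an assumption here, as are the function names and the example counts.

```python
def expected_counts(table):
    """Expected cell counts for a contingency table under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def suggest_test(table, minimum=5):
    """Suggest chi-square if every expected count exceeds `minimum`, else Fisher's exact."""
    smallest = min(min(row) for row in expected_counts(table))
    return "chi-square" if smallest > minimum else "Fisher's exact"


# Hypothetical counts: rows = two groups, columns = outcome yes / no.
print(suggest_test([[30, 20], [25, 25]]))  # all expected counts are large
print(suggest_test([[3, 7], [1, 9]]))      # some expected counts are small
```

The helper only picks between the two tests for proportions; it does not run either test.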
Objectives - Concepts
Classification of data
Distributions of variables
Measures of central tendency and dispersion
Criteria for abnormality
Sampling
Regression to the mean
Distributions of continuous variables
A way to display the individual-to-individual
variation in some clinical measure.
Consider the example in Fletcher using PSA
levels:
Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.
[Figure: frequency distribution curve. y-axis: Frequency; x-axis: x Variable. Source: www.msu.edu/user/sw/statrev/images/normal01.gif]
Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.
The “nicest” distribution
Is the normal, or Gaussian, distribution
– the “bell-shaped curve”.
If we want to summarize a frequency
distribution, there are two major aspects to
include:
Central tendency
Dispersion
Principles of Epidemiology, 2nd edition. CDC.
Principles of Epidemiology, 2nd edition. CDC.
Measures of Central Tendency:
Mean
Median
Mode
Consider this data: Parity (how many
babies have you had?) among 19
women:
0,2,0,0,1,3,1,4,1,8,2,2,0,1,3,5,1,7,2
Mean (Arithmetic)
Add up all the values and divide by N
43 / 19 = 2.26
Median
The middle value
Must first sort the data and put in order:
0,0,0,0,1,1,1,1,1,2,2,2,2,3,3,4,5,7,8
Median = 2 (the 10th of the 19 sorted values)
Mode
The most common value
0,0,0,0,1,1,1,1,1,2,2,2,2,3,3,4,5,7,8
Mode = 1 (occurs 5 times)
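The three measures for the parity data can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

# Parity among the 19 women, in the order given on the slide.
parity = [0, 2, 0, 0, 1, 3, 1, 4, 1, 8, 2, 2, 0, 1, 3, 5, 1, 7, 2]

mean = statistics.mean(parity)      # sum of the values divided by N
median = statistics.median(parity)  # middle value after sorting
mode = statistics.mode(parity)      # most common value

print(f"N = {len(parity)}, sum = {sum(parity)}")
print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

The mean (2.26) lies above the median (2) and the mode (1) because the few women with high parity (5, 7, 8) pull it upward.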
In a normal distribution, all three are
equal
Parametric statistical methods assume
a distribution with known shape
(e.g. normal or Gaussian distribution)
[Figure: frequency distribution curve. y-axis: Frequency; x-axis: x Variable]
Quick Quiz Slide
If the mode is “100” and the mean is “80” –
what can you tell me about the median?
[Figure: skewed frequency distribution. y-axis: Frequency; x-axis: x Variable. Mean marked at 80, mode marked at 100]
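A small made-up data set shows the usual ordering when a distribution has a long tail of low values. The numbers below are hypothetical, chosen only so that the mean is 80 and the mode is 100 as in the quiz. The ordering shown is typical of single-peaked skewed distributions, not guaranteed for every data set.

```python
import statistics

# Hypothetical left-skewed data chosen so the mean is 80 and the mode is 100.
values = [30, 60, 80, 90, 100, 100, 100]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# The tail of low values pulls the mean furthest from the mode;
# the median falls between the two.
print(mean, median, mode)
```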
Dispersion
Standard Deviation - most common measure
used for normal or near normal distributions.
Defined by a statistical formula, but remember
that:
The mean +/- 1 SD contains about 2/3 of the
observations.
The mean +/- 2 SDs contains about 95% of the
observations.
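The two rules of thumb can be checked against the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(f"within 1 SD: {fraction_within(1):.3f}")
print(f"within 2 SD: {fraction_within(2):.3f}")
```

The results, roughly 68% and 95%, hold only when the data are close to normally distributed.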
Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.
M J Campbell, Statistics at Square One, 9th Ed, 1997.
So, how about this definition of “abnormal” for total
serum cholesterol: A value higher than the mean + 1
S.D.?
How many people would fall beyond that cutoff?
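If cholesterol were normally distributed, the share of people above the mean + 1 SD could be computed directly. In the sketch below the mean of 200 mg/dL and SD of 40 mg/dL are made-up values for illustration, and normality is itself an assumption; the fraction above the cutoff does not depend on which mean and SD are chosen.

```python
from statistics import NormalDist

# Hypothetical cholesterol distribution; mean and SD are illustrative only.
cholesterol = NormalDist(mu=200, sigma=40)  # mg/dL, assumed values

cutoff = cholesterol.mean + 1 * cholesterol.stdev
fraction_above = 1 - cholesterol.cdf(cutoff)

print(f"cutoff = {cutoff} mg/dL")
print(f"fraction labelled abnormal = {fraction_above:.1%}")
```

About 16% of people, roughly one in six, would be labelled abnormal under this definition.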
Rose, G: The Strategy of Preventive Medicine; Oxford Press, 1998.
So what’s the “best” definition of
abnormality?
Fletcher lists three:
Being unusual: greater than 2 SD from the mean
Sick: observation regularly associated with disease
Treatable: consider abnormal only if treatment of the condition
represented by the measurement leads to improved
outcome
Miura et al, Archives Int Med 2001; 161:1504.
If you were to design a study to define an
abnormal DBP for adult females in the US,
how would you do it?
Measure DBP in every adult female in the US?
Then define abnormal as above 2 SD from mean?
Sampling
Impossible to measure the BP of everyone, so
must take measurements of a representative
sample of subjects.
Random sample
May miss important subgroup (ethnicity for example)
May need to obtain a larger sample from these
important subgroups and select subjects at random
within subgroup
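The difference between a simple random sample and random selection within subgroups can be sketched as follows. The population, the subgroup labels "A" and "B", the sample sizes, and the function names are all hypothetical.

```python
import random

rng = random.Random(546)  # fixed seed so the sketch is repeatable

# Hypothetical population: 9,500 people in subgroup "A" and 500 in subgroup "B".
population = [("A", i) for i in range(9500)] + [("B", i) for i in range(500)]


def simple_random_sample(people, n):
    """Every person has the same chance of selection."""
    return rng.sample(people, n)


def stratified_sample(people, n_per_group):
    """Select n_per_group at random within each subgroup (oversamples small groups)."""
    groups = {}
    for person in people:
        groups.setdefault(person[0], []).append(person)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, n_per_group))
    return sample


def count_by_group(sample):
    """Tally how many sampled people fall in each subgroup."""
    counts = {}
    for group, _ in sample:
        counts[group] = counts.get(group, 0) + 1
    return counts


print(count_by_group(simple_random_sample(population, 100)))
print(count_by_group(stratified_sample(population, 50)))
```

With subgroup "B" making up only 5% of the population, a simple random sample of 100 is expected to include about 5 of its members, too few to describe that subgroup well. Sampling within subgroups guarantees 50 from each.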
Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.
Hanna C, Greenes D. How Much Tachycardia in Infants
Can Be Attributed to Fever? Ann Emerg Med June 2004