Uses of Biostatistics (1)

Uses of Biostatistics
in Epidemiology (1)
Amornrath Podhipak, Ph.D.
Department of Epidemiology
Faculty of Public Health
Mahidol University
2006
Medical doctors and public health personnel
Why Statistics ??
Why Computers ??
Why Software ??
A tool for calculation
Why do we need “statistics” in medicine and public health?
(particularly, epidemiology??)
* Medicine is becoming increasingly quantitative in describing a condition.
Most malaria patients are infected with P. falciparum.
82.5% were infected with P. falciparum.
Those patients look pale.
Haemoglobin level was 9.89 g%, on average.
Epidemiology is concerned with describing disease patterns in a
group of people. Descriptive statistics give a clearer picture
of what we want to describe.
* The answer to a research question needs to be more definite.
Is the new treatment better? How much better? In what aspect?
Is there any evidence? Could it be a real difference?
Inferential statistics give an answer in a world of
uncertainty.
Before using statistics, we need some kind of measurement in order
to get more detailed information.
Measurement of characteristics (Variables vs Constants)
4 scales of measurement
Qualitative variables
- Nominal scale (group classification only)
- Ordinal scale (classification with ordering / ranking)
Quantitative variables
- Interval scale (magnitude + constant distance between points)
- Ratio scale (magnitude + constant distance between points + true zero)
Intelligent? Handsome? Married? HIV?
BP? 140/90
Income? 100,000
Weight? 80 kg
Height? 160 cm
Nominal scale
e.g. Female, Male (coded as numbers such as 1 and 2)
Values have no meaning.
Ordinal scale
e.g. ranks 1, 2, 3
Equal distance between points does not reflect equal interval value.
Interval scale, e.g. degrees Celsius (0, 10, 20, 30)
The freezing point of water is set as zero degrees Celsius.
This is not the true ZERO temperature (no heat).
Equal distance between points means equal interval value.
Ratio scale, e.g. weight (0, 10, 20, 30)
True ZERO (nothing here)
Equal distance between points means equal interval value.
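The practical difference between the two scales shows up when values are divided. The following is a minimal Python sketch, using the illustrative values 10 and 20 from the scales above. The variable names and the conversion to kelvin are assumptions added for illustration and are not part of the slides.

```python
# Interval scale: differences are meaningful, ratios are not.
celsius_low, celsius_high = 10.0, 20.0
celsius_difference = celsius_high - celsius_low   # a 10-degree gap is meaningful
naive_ratio = celsius_high / celsius_low          # 2.0, yet 20 degrees is not "twice as hot"

# Re-expressing the same temperatures in kelvin, which has a true zero,
# gives the physically meaningful ratio.
KELVIN_OFFSET = 273.15
kelvin_ratio = (celsius_high + KELVIN_OFFSET) / (celsius_low + KELVIN_OFFSET)

# Ratio scale: weight has a true zero, so 20 kg really is twice 10 kg.
weight_low_kg, weight_high_kg = 10.0, 20.0
weight_ratio = weight_high_kg / weight_low_kg

print(f"Celsius difference: {celsius_difference:.1f} degrees")
print(f"Naive Celsius ratio: {naive_ratio:.2f}")
print(f"Ratio on the kelvin scale: {kelvin_ratio:.3f}")
print(f"Weight ratio: {weight_ratio:.2f}")
```

The naive Celsius ratio depends only on where zero was placed, which is why ratios are reserved for ratio-scale variables.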
Questionnaire
(TB and Passive smoking)
Sex [ ] Male [ ] Female
Education [ ] 1-6 yr [ ] 7-9 yr [ ] 9+ yr
Family income ……………………. Baht/m
Passive Smoking ……...
Record form
Result from tuberculin test ……………………. mm
X-ray [ ] +ve [ ] -ve
Weight …………. kg, Height ………………….. cm
Variable (characteristic being measured)  Result of measurement                                         Type
Marital status                            single/married/divorced                                       nominal
gender                                    male/female                                                   nominal
smoking                                   yes/no                                                        nominal
smoking                                   nonsmoker/ light smoker/ moderate smoker/ heavy smoker        ordinal
smoking                                   number of cig/day                                             ratio
feeling of pain                           yes/no                                                        nominal
feeling of pain                           none/light/moderate/high                                      ordinal
feeling of pain                           0 ---------> 10                                               ordinal
attitude toward selective abortion        strongly agree/ agree/ not sure/ disagree/ strongly disagree  ordinal
blood pressure                            mmHg                                                          ratio
temperature                               degree Celsius                                                interval
weight                                    gram                                                          ratio
tumor stage                               I, II, III, IV                                                ordinal
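The scale of a variable limits which summaries make sense for it. The sketch below encodes that idea in Python for a few variables from the table above. The mapping from scale to summaries follows common textbook convention and is an assumption added here, not something stated on the slides. The names `Scale`, `SUMMARIES_ADDED` and `allowed_summaries` are hypothetical.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four scales of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Summaries that become meaningful at each scale (conventional guidance,
# assumed for illustration). Each scale also keeps those of the scales below it.
SUMMARIES_ADDED = {
    Scale.NOMINAL: ["frequency", "proportion", "mode"],
    Scale.ORDINAL: ["median", "percentiles"],
    Scale.INTERVAL: ["mean", "standard deviation"],
    Scale.RATIO: ["ratio of two values"],
}


def allowed_summaries(scale):
    """Return every summary that is meaningful at the given scale."""
    result = []
    for level in Scale:
        if level <= scale:
            result.extend(SUMMARIES_ADDED[level])
    return result


# A few variables taken from the table above.
variables = {
    "marital status": Scale.NOMINAL,
    "tumor stage": Scale.ORDINAL,
    "temperature (degree Celsius)": Scale.INTERVAL,
    "weight (gram)": Scale.RATIO,
}

for name, scale in variables.items():
    print(f"{name} ({scale.name.lower()}): {', '.join(allowed_summaries(scale))}")
```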
Quantitative (numeric, metric) variables are classified as
- continuous: can take all values in an interval,
  e.g. weight, temperature, etc.
- discrete: can take only certain values (often integer values),
  e.g. parity, number of sex partners, etc.
Continuous data can be categorised into groups, for which one needs to define the
“upper boundary” and “lower boundary” of a value (or a class).
Values recorded as 120, 121, 122, 123, 124, 125, 126, 127
boundaries: 120.5, 121.5, 122.5, 123.5, 124.5 …
Values recorded as 120.1, 120.2, 120.3, 120.4, 120.5, 120.6, 120.7, 120.8
boundaries: 120.15, 120.25, 120.35, 120.45, 120.55 …
Values recorded as 120.11, 120.12, 120.13, 120.14, 120.15, 120.16, 120.17, 120.18
boundaries: 120.115, 120.125, 120.135, 120.145, 120.155 …
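The boundaries above sit half a recording unit below and above each recorded value. A minimal Python sketch of that rule follows. The function name `class_boundaries` is hypothetical, and values are passed as strings so that the number of recorded decimal places is preserved.

```python
from decimal import Decimal


def class_boundaries(recorded_value: str):
    """Return (lower, upper) boundaries of a value recorded to a given precision.

    The recording unit is inferred from the number of decimal places written,
    e.g. "120" -> unit 1, "120.1" -> unit 0.1, "120.11" -> unit 0.01.
    """
    value = Decimal(recorded_value)
    # One unit in the last recorded decimal place.
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half_unit = unit / 2
    return value - half_unit, value + half_unit


for recorded in ["120", "120.1", "120.11"]:
    lower, upper = class_boundaries(recorded)
    print(f"recorded {recorded}: boundaries {lower} to {upper}")
```

`Decimal` is used instead of `float` so that boundaries such as 120.15 are represented exactly.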
Descriptive statistics - a way to summarize a dataset (a group of measurements)
Example:
Height of 100 children, 10-12 years of age.
140  123  140  151  155  147  134  151  132  134
140  142  138  138  134  158  141  151  141  138
140  161  132  141  130  155  140  140  140  141
136  155  142  125  146  135  153  140  142  130
141  129  155  123  135  141  142  141  165  141
123  130  125  134  139  136  127  147  153  132
125  139  136  135  134  136  147  139  146  140
134  129  129  135  142  147  142  134  134  138
125  134  136  135  139  139  146  140  151  127
129  130  153  130  149  132  127  149  151  129
What are the values that best describe the height of these 100 persons?
1) Rearrange the data (ascending order, reading down each column):
123  129  132  134  136  139  140  142  147  153
123  129  132  134  136  139  140  142  147  153
124  129  132  134  136  140  141  142  147  153
125  129  132  135  138  140  141  142  149  155
125  129  134  135  138  140  141  142  149  155
125  130  134  135  138  140  141  142  151  155
125  130  134  135  138  140  141  146  151  155
127  130  134  135  139  140  141  146  151  158
127  130  134  136  139  140  141  146  151  161
127  130  134  136  139  140  141  147  151  165
Minimum = 123
Maximum = 165
Range = 42 (Max-Min)
Median = 139 (value in the middle)
Mode = 140 (most repeated value)
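The summary values of the rearranged data can be checked with Python's standard `statistics` module. This is a minimal sketch: the list holds the 100 rearranged heights from the example, and the variable names are chosen for illustration.

```python
import statistics

# The 100 rearranged heights (cm), in ascending order.
heights_cm = [
    123, 123, 124, 125, 125, 125, 125, 127, 127, 127,
    129, 129, 129, 129, 129, 130, 130, 130, 130, 130,
    132, 132, 132, 132, 134, 134, 134, 134, 134, 134,
    134, 134, 134, 135, 135, 135, 135, 135, 136, 136,
    136, 136, 136, 138, 138, 138, 138, 139, 139, 139,
    139, 139, 140, 140, 140, 140, 140, 140, 140, 140,
    140, 140, 141, 141, 141, 141, 141, 141, 141, 141,
    142, 142, 142, 142, 142, 142, 146, 146, 146, 147,
    147, 147, 147, 149, 149, 151, 151, 151, 151, 151,
    153, 153, 153, 155, 155, 155, 155, 158, 161, 165,
]

minimum = min(heights_cm)
maximum = max(heights_cm)
value_range = maximum - minimum            # Max - Min
median = statistics.median(heights_cm)     # value in the middle
mode = statistics.mode(heights_cm)         # most repeated value

print(f"n = {len(heights_cm)}")
print(f"Minimum = {minimum}, Maximum = {maximum}, Range = {value_range}")
print(f"Median = {median}, Mode = {mode}")
```

With an even number of observations, `statistics.median` averages the two middle values (the 50th and 51st here).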
2) Present in a table (Frequency distribution)
Class boundaries   Height (cm)   Mid point (X)   f = frequency
119.5-124.5        120-124       122             3
124.5-129.5        125-129       127             12
129.5-134.5        130-134       132             18
134.5-139.5        135-139       137             19
139.5-144.5        140-144       142             24
144.5-149.5        145-149       147             9
149.5-154.5        150-154       152             8
154.5-159.5        155-159       157             5
159.5-164.5        160-164       162             1
164.5-169.5        165-169       167             1
(Class boundaries depend on the boundaries of these values.)
3) Present in a graph (Histogram)
[Histogram: Frequency (vertical axis) against Height (cm) (horizontal axis, 120 to 170 in steps of 5)]
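The frequency distribution and histogram can be rebuilt from the rearranged data. The sketch below is one way to do it in plain Python, assuming classes 5 cm wide starting at 120 cm as in the table. The class frequencies are counted from the data rather than copied from the table. The names `height_counts` and `frequency_table` are hypothetical.

```python
from collections import Counter

# height in cm -> number of children, tallied from the rearranged data
height_counts = {
    123: 2, 124: 1, 125: 4, 127: 3, 129: 5, 130: 5, 132: 4, 134: 9,
    135: 5, 136: 5, 138: 4, 139: 5, 140: 10, 141: 8, 142: 6, 146: 3,
    147: 4, 149: 2, 151: 5, 153: 3, 155: 4, 158: 1, 161: 1, 165: 1,
}
heights_cm = sorted(Counter(height_counts).elements())

CLASS_WIDTH = 5          # cm, as in the table
FIRST_LOWER_LIMIT = 120  # cm, lower limit of the first class


def frequency_table(values, first_lower=FIRST_LOWER_LIMIT, width=CLASS_WIDTH):
    """Group integer values into equal-width classes and count each class."""
    rows = []
    largest = max(values)
    lower = first_lower
    while lower <= largest:
        upper = lower + width - 1
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "frequency": sum(lower <= v <= upper for v in values),
        })
        lower += width
    return rows


# Print the table with a text histogram: one '#' per child.
for row in frequency_table(heights_cm):
    lower, upper = row["limits"]
    low_b, up_b = row["boundaries"]
    bar = "#" * row["frequency"]
    print(f"{low_b:5.1f}-{up_b:5.1f}  {lower}-{upper}  "
          f"{row['midpoint']:5.0f}  {row['frequency']:3d}  {bar}")
```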
Methods of data presentation
1. Table
2. Graph
- line graph
- bar chart
- pie chart
- scatter plot
- area graph
- error bar
- histogram
Another set of values for describing a dataset is the
MEAN and STANDARD DEVIATION.
The mean indicates the location.
The standard deviation indicates (roughly) the scatter of the data.
Example: Dataset 1: Age of 6 children
4  4  4  4  4  4
Mean = 4.0 years, sd = 0 years (no variation)
Example: Dataset 2: Age of 6 children
2  2  4  4  6  6
Mean = 4.0 years, sd = 1.79 years (with variation)
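The two small datasets can be reproduced with Python's `statistics` module. This is a minimal sketch: the quoted sd of 1.79 matches the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes. The helper name `summarize` is hypothetical.

```python
import statistics

dataset_1 = [4, 4, 4, 4, 4, 4]   # age in years, no variation
dataset_2 = [2, 2, 4, 4, 6, 6]   # age in years, same mean but with variation


def summarize(ages):
    """Return (mean, sample standard deviation) of a list of ages."""
    return statistics.mean(ages), statistics.stdev(ages)


for label, ages in [("Dataset 1", dataset_1), ("Dataset 2", dataset_2)]:
    mean, sd = summarize(ages)
    print(f"{label}: mean = {mean:.1f} years, sd = {sd:.2f} years")
```

Both datasets share the same mean, so only the standard deviation distinguishes them. Using the population formula (`statistics.pstdev`, divisor n) would give a smaller value for Dataset 2.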
or, another example:
The average body height of these children was 138.9 cm, with a standard
deviation of 8.9 cm.
The average body height of these children was 138.9 cm, with a standard
deviation of 0.2 cm.
If we categorize the data into a qualitative variable (tall/short), a
proportion can then be calculated.
Descriptive statistics (proportion and/or percentage)
Most of the children were less than 150 cm tall.
85% of them had a height of less than 150 cm.
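A proportion is a count divided by the group size. The sketch below applies this to the rearranged heights from the earlier example, using 150 cm as the cut-off between "short" and "tall". The cut-off constant and the function name `proportion_below` are assumptions made for illustration.

```python
from collections import Counter

# height in cm -> number of children, tallied from the rearranged data
height_counts = {
    123: 2, 124: 1, 125: 4, 127: 3, 129: 5, 130: 5, 132: 4, 134: 9,
    135: 5, 136: 5, 138: 4, 139: 5, 140: 10, 141: 8, 142: 6, 146: 3,
    147: 4, 149: 2, 151: 5, 153: 3, 155: 4, 158: 1, 161: 1, 165: 1,
}
heights_cm = sorted(Counter(height_counts).elements())

CUTOFF_CM = 150  # illustrative cut-off: "short" means below 150 cm


def proportion_below(values, cutoff):
    """Return the fraction of values strictly below the cut-off."""
    return sum(v < cutoff for v in values) / len(values)


short_proportion = proportion_below(heights_cm, CUTOFF_CM)
print(f"{short_proportion:.0%} of the children were shorter than {CUTOFF_CM} cm")
```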
A final note on defining a variable and a measurement:
Important things to consider before making any measurement:
1. Do we measure the right thing?
Fatty food and CVD
2. What is the tool that can actually measure what we want to measure?
Morphology (measure)
indicators:
- % standard weight
- body mass index (wt/ht²)
- triceps skinfold thickness
- Wt for age
- Wt for height
- etc.
Food intake (ask)
Protein calorie intake (ask & calculate)
3. How valid is the instrument?
Does the questionnaire actually get the fatty food intake information?
(scope of questions, recall of subjects, certainty of reported amount of food,
variability of ingredients, etc.) Does the information obtained actually reflect fatty
food intake?
4. How precise is the instrument?
Does the information precisely estimate the amount of fatty food intake
for each individual?
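As an illustration of turning raw measurements into one of the indicators listed under point 2, the body mass index (wt/ht²) can be computed as below. This is a minimal sketch that assumes weight in kilograms and height in centimetres, converted to metres as is conventional for BMI. The function name and the example values (80 kg, 160 cm) are illustrative only.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """Return weight / height squared, with height converted to metres."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Illustrative values only: a person weighing 80 kg who is 160 cm tall.
bmi = body_mass_index(80, 160)
print(f"BMI = {bmi:.2f} kg/m^2")
```

Whether this index is a valid and precise measure of what the study wants to capture is exactly the question raised in points 3 and 4.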
In summary:
Statistics (and epidemiology) deal with a group of persons, not one
individual patient (the bigger the group, the better the result).
We look for the characteristics which are most common in the group.
Descriptive statistics are used for explaining our sample (or findings),
e.g.
Most of the patients were anemic.
80% of them had a haemoglobin level of less than 10 g%.
The average haemoglobin level was 9.5 g% with a standard deviation
of 1.5 g%.
Inferential statistics
(Infer to the general population of interest)
