STATISTICS INTRODUCTION AND DEFINITIONS Dr Faris Al Lami

MB,ChB PhD
Learning Objectives
By the end of this lecture the student
should be able to:
1. Define Biostatistics
2. Identify advantages, applications and
purposes of Biostatistics
3. Define types and scales of variables
4. Construct grouped data
STATISTICS
A field of study concerned with methods
and procedures of:
- Collection, Organization, Classification &
Summarization of data.
(Descriptive Statistics)
- Analysis and drawing of inferences
about a body of data when only a part of
the data is observed. (Analytic Statistics)
BIOSTATISTICS
When the data being analyzed are
derived from the biological and medical
sciences, the term “Biostatistics” is
used.
ADVANTAGES
1. Carrying out research
Statistical analysis should be considered
in the planning phase of the study.
2. Evaluating published articles
Statistical errors are common in clinical
research and may invalidate the
conclusions.
3. Ethical considerations
It is unethical to use erroneous statistics,
especially in scientific publications. Wrong
statistics can lead to the use of a harmful or
ineffective treatment, or to the avoidance of a
useful one.
4. Professional and personal satisfaction
PURPOSES
1. Data reduction
Condensing data to manageable
proportions, thus facilitating interpretation.
2. Evaluating the role of chance
To see whether the effect of a certain event is
real or arises from chance fluctuation due to
the particular sample of subjects.
3. Sampling and generalization
What proportion of discharged patients required
readmission? What are their characteristics?
Answering these questions requires generalizing
the sample's results to the population.
APPLICATIONS
• Are the differences between groups
significant?
• Are these two measures related or
associated?
• Can one predict the value of one variable
(outcome) from knowledge of the values
of other variables?
VARIABLE
A characteristic that takes on different
values in different persons, places, or
things.
e.g.
- Heights of adult males.
- Weights of preschool children.
- Ages of patients seen in a dental clinic.
Variables
Quantitative Variables
A variable that can be measured in the usual sense
of measurement, such as age, weight, height, ...
Qualitative Variables
A variable that cannot be measured in the usual sense
but can be described or categorized, such as
socio-economic group.
QUALITATIVE VARIABLE
e.g.
- Socio-economic groups.
- An ill person with a medical diagnosis.
- An object that is said to possess or not possess
some characteristic of interest.
In this case we count the number of
individuals falling into each category, such as
socio-economic status or diagnostic
category.
Quantitative Variables
DISCRETE VARIABLE
It is characterized by gaps or interruptions
in the values that it can assume.
e.g.
- The number of daily admissions
- The number of decayed, missing or filled teeth per child
CONTINUOUS VARIABLE
It does not possess gaps or interruptions. It can assume
any value within a specified interval of values
assumed by the variable.
e.g.
- Weight
- Height
- Mid-arm circumference
VARIABLES SCALE
1. NOMINAL SCALE
It uses names, numbers or other symbols. Each
measurement is assigned to one of a limited number of
unordered categories and falls into only one
category.
e.g. males & females
2. ORDINAL SCALE
• Each measurement is assigned to one of a
limited number of categories that are ranked in
a graded order (1st, 2nd, 3rd, ...).
• Differences among categories are not
necessarily equal and are often not measurable.
3. INTERVAL SCALE
Each measurement is assigned to one of an
unlimited number of equally spaced categories,
with NO true zero point.
4. RATIO SCALE
Measurement begins at a true zero
point and the scale has equal intervals.
POPULATION
• POPULATION OF ENTITIES
The largest collection of entities that have
common characteristics and in which we have
an interest at a particular time.
• POPULATION OF VARIABLES
The largest collection of values of a
random variable in which we have an
interest at a particular time.
SAMPLE
• It is a part or subset of the population.
• Sample of entities: a subset of a population of entities.
• Sample of variables: a subset of a population of variables.
GROUPED DATA
To group a set of observations, we select a
set of contiguous, non-overlapping
intervals such that each value in the set
of observations can be placed in one, and
only one, of the intervals, and no
observation is missed.
Each such interval is called a
CLASS INTERVAL.
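The "one, and only one" requirement can be checked mechanically. The following is a minimal sketch, assuming whole-number observations and intervals written as inclusive (lower, upper) pairs; the function name and the three example intervals are hypothetical and chosen only for illustration.

```python
def find_class_interval(value, intervals):
    """Return the single (lower, upper) interval that contains value.

    Both bounds are treated as inclusive, which suits whole-number data.
    """
    matches = [(lo, hi) for lo, hi in intervals if lo <= value <= hi]
    if len(matches) != 1:
        # Zero matches: the observation would be missed.
        # Two or more matches: the intervals overlap.
        raise ValueError(f"{value} falls into {len(matches)} intervals, expected exactly 1")
    return matches[0]

# Hypothetical contiguous, non-overlapping intervals (weights in kg)
intervals = [(40, 49), (50, 59), (60, 69)]
print(find_class_interval(55, intervals))  # (50, 59)
```

A value such as 75 raises an error here because none of the three intervals covers it, which is exactly the "missed observation" case the definition rules out.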
NUMBER OF CLASS INTERVALS
The number of class intervals:
• Should not be too few, because important
information is lost, and
• Should not be too many, because the needed
summarization is lost.
When an a priori classification of the
observations exists (e.g. annual tabulations), we can
follow that classification; when there
is no such classification, we can follow
Sturges' rule.
Sturges' Rule:
$K = 1 + 3.322 \log_{10} n$
• K = number of class intervals
• n = number of observations in the set
• The result should not be regarded as final; it may be
modified.
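As a rough illustration, the rule can be evaluated directly. This sketch assumes the logarithm is base 10 (the constant 3.322 is 1 / log10(2)) and uses n = 45, the sample size of the weight exercise later in this lecture; the function name is hypothetical.

```python
import math

def sturges_k(n):
    """Suggested number of class intervals: K = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

k = sturges_k(45)
print(round(k, 2))  # 6.49
print(round(k))     # 6
```

The rounded value is only a starting point; as noted above, it can be adjusted to give convenient intervals.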
WIDTH OF CLASS INTERVAL
The width of the class intervals should be the
same, if possible.
$W = \dfrac{R}{K}$
W = width of the class interval
R = range (largest value - smallest value)
K = number of class intervals
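A short sketch of the same calculation, using the largest (99) and smallest (41) values and K = 6 from the weight exercise later in this lecture. Rounding the width up to the next whole number is an assumption made here for convenience, and the function name is hypothetical.

```python
import math

def class_width(largest, smallest, k):
    """Width of each class interval: W = R / K, with R = largest - smallest."""
    return (largest - smallest) / k

w = class_width(99, 41, 6)
print(round(w, 1))   # 9.7
print(math.ceil(w))  # 10 (assumption: round up so the intervals cover the whole range)
```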
FREQUENCY DISTRIBUTION
It determines the number of observations falling
into each class interval.

Fasting blood glucose level   Frequency
< 60                          10
60-62                         23
63-65                         33
66-68                         22
69-71                         34
72+                           33
Total                         155
RELATIVE FREQUENCY DISTRIBUTION
• It determines the proportion of observations in a
particular class interval relative to the
total number of observations in the set.

Fasting blood glucose level   Frequency   Relative frequency (%)
< 60                          10          6.45
60-62                         23          14.84
63-65                         33          21.29
66-68                         22          14.19
69-71                         34          21.94
72+                           33          21.29
Total                         155         100
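The percentages in the fasting blood glucose table can be reproduced with a few lines. This is a minimal sketch; the dictionary layout and variable names are illustrative choices, not part of the lecture.

```python
# Frequencies taken from the fasting blood glucose table
frequencies = {"< 60": 10, "60-62": 23, "63-65": 33,
               "66-68": 22, "69-71": 34, "72+": 33}

total = sum(frequencies.values())

# Relative frequency (%) = 100 * class frequency / total observations
relative = {label: round(100 * f / total, 2) for label, f in frequencies.items()}

for label, pct in relative.items():
    print(f"{label:6s} {frequencies[label]:4d} {pct:6.2f}")
```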
CUMULATIVE FREQUENCY DISTRIBUTION
• This is calculated by adding the number of
observations in each class interval to the
cumulative number of observations in the
class interval above, starting from the
second class interval onward.

Fasting blood glucose level   Frequency   Cumulative frequency
< 60                          10          10
60-62                         23          33
63-65                         33          66
66-68                         22          88
69-71                         34          122
72+                           33          155
Total                         155
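The running total described above is what `itertools.accumulate` in the Python standard library computes. A minimal sketch using the fasting blood glucose frequencies:

```python
from itertools import accumulate

# Class frequencies in table order: < 60, 60-62, 63-65, 66-68, 69-71, 72+
frequencies = [10, 23, 33, 22, 34, 33]

# Each entry is the current frequency plus the running total of the classes above it
cumulative = list(accumulate(frequencies))
print(cumulative)  # [10, 33, 66, 88, 122, 155]
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.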
CUMULATIVE RELATIVE FREQUENCY DISTRIBUTION
This is calculated by adding the relative
frequency in each class interval to the
cumulative relative frequency in the
class interval above, starting also from the
second class interval onward.

Fasting blood    Frequency   Cumulative   Relative        Cumulative relative
glucose level                frequency    frequency (%)   frequency (%)
< 60             10          10           6.45            6.45
60-62            23          33           14.84           21.29
63-65            33          66           21.29           42.58
66-68            22          88           14.19           56.77
69-71            34          122          21.94           78.71
72+              33          155          21.29           100.00
Total            155                      100
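The slide builds this column by adding relative frequencies. The sketch below uses a slightly different but equivalent route, named plainly: it divides each cumulative frequency by the total and rounds once at the end, which avoids carrying rounding error from one row to the next.

```python
from itertools import accumulate

frequencies = [10, 23, 33, 22, 34, 33]
total = sum(frequencies)

# Cumulative relative frequency (%) = 100 * cumulative frequency / total
cum_rel = [round(100 * c / total, 2) for c in accumulate(frequencies)]
print(cum_rel)  # [6.45, 21.29, 42.58, 56.77, 78.71, 100.0]
```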
CUMULATIVE DISTRIBUTION
• Cumulative frequency and cumulative
relative frequency distributions make it easy
to obtain the frequency or relative frequency within
two or more contiguous class intervals.
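For example, the number of subjects in the contiguous intervals 63-65 through 69-71 is the cumulative frequency at 69-71 minus the cumulative frequency just before 63-65. A minimal sketch using the fasting blood glucose table; the helper name is hypothetical.

```python
labels = ["< 60", "60-62", "63-65", "66-68", "69-71", "72+"]
cumulative = [10, 33, 66, 88, 122, 155]

def count_between(first_label, last_label):
    """Observations in the contiguous intervals first_label..last_label (inclusive)."""
    i, j = labels.index(first_label), labels.index(last_label)
    below = cumulative[i - 1] if i > 0 else 0  # running total before the first interval
    return cumulative[j] - below

print(count_between("63-65", "69-71"))  # 89, i.e. 33 + 22 + 34
print(count_between("< 60", "60-62"))   # 33
```

The same subtraction works on the cumulative relative frequency column to give a percentage instead of a count.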
EXERCISE
• The following are the weights (kg) of 45 adult
male individuals attending a primary health
care center (observations 1 to 45, read row by row):

76  55  62  73  88  90  41  52  72
86  73  65  69  58  99  63  63  65
70  49  77  72  68  55  54  48  83
85  79  78  77  59  64  68  90  80
66  56  71  47  73  85  66  85  71
EXERCISE
• Construct a table showing:
- Frequency
- Relative frequency
- Cumulative frequency
- Cumulative relative frequency
Number of class intervals:
$K = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 45 = 1 + 3.322 \times 1.653 = 6.49 \approx 6$
Width of class interval:
$W = \dfrac{R}{K} = \dfrac{99 - 41}{6} \approx 9.7 \approx 10$
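Class interval   Frequency   Relative        Cumulative   Cumulative relative
(kg)                         frequency (%)   frequency    frequency (%)
40-49            4           8.9             4            8.9
50-59            7           15.6            11           24.4
60-69            11          24.4            22           48.9
70-79            13          28.9            35           77.8
80-89            7           15.6            42           93.3
90-99            3           6.7             45           100.0
Total            45          100.0

Percentages are rounded to one decimal place, so the entries in the
relative frequency column add up to 100.1 rather than exactly 100.
The cumulative relative frequencies are obtained from the cumulative
frequencies divided by 45.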
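The whole table can be rebuilt from the 45 raw weights. This is a sketch under stated assumptions: class intervals of width 10 starting at 40 with inclusive bounds, and cumulative percentages computed from the cumulative counts rather than by adding rounded percentages, so they may differ by 0.1 from values obtained by summing the rounded column.

```python
from itertools import accumulate

# The 45 weights (kg) from the exercise, observations 1 to 45
weights = [76, 55, 62, 73, 88, 90, 41, 52, 72,
           86, 73, 65, 69, 58, 99, 63, 63, 65,
           70, 49, 77, 72, 68, 55, 54, 48, 83,
           85, 79, 78, 77, 59, 64, 68, 90, 80,
           66, 56, 71, 47, 73, 85, 66, 85, 71]

# Six intervals of width 10: 40-49, 50-59, ..., 90-99
intervals = [(lo, lo + 9) for lo in range(40, 100, 10)]

n = len(weights)
freq = [sum(lo <= w <= hi for w in weights) for lo, hi in intervals]
cum = list(accumulate(freq))

for (lo, hi), f, c in zip(intervals, freq, cum):
    # interval, frequency, relative %, cumulative frequency, cumulative relative %
    print(f"{lo}-{hi}  {f:3d}  {100 * f / n:5.1f}  {c:3d}  {100 * c / n:5.1f}")
```

Every weight lands in exactly one interval, so the frequencies sum to 45 and the last cumulative frequency equals n.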
EXERCISE
• The following are the numbers of babies
born during a year in 60 public hospitals
(hospital number followed by number of births):

No.  Births   No.  Births   No.  Births   No.  Births   No.  Births   No.  Births
 1   30       11   27       21   56       31   45       41   32       51   47
 2   37       12   52       22   54       32   32       42   35       52   24
 3   32       13   40       23   53       33   29       43   42       53   53
 4   39       14   59       24   49       34   30       44   21       54   28
 5   52       15   43       25   54       35   22       45   24       55   57
 6   55       16   45       26   48       36   49       46   57       56   56
 7   55       17   34       27   42       37   59       47   46       57   57
 8   26       18   28       28   54       38   42       48   54       58   59
 9   56       19   58       29   53       39   53       49   34       59   50
10   57       20   46       30   31       40   31       50   24       60   29
EXERCISE
• Construct a table showing:
- Frequency
- Relative frequency
- Cumulative frequency
- Cumulative relative frequency
Exercise
• For the following data, construct a table
showing the age and gender distribution.

Number of cases by gender
Age group   Male   Female   Total
0-9         0      2        2
10-19       5      1        6
20-29       7      4        11
30-39       6      4        10
40-49       2      2        4
50+         0      1        1
Total       20     14       34

• Complete the table by adding the relative
frequency distribution.
Thanks