Uploaded by Pharmacist Mosab

3 Organizing-and-Displaying-Data -Frequency-Tables

BIOSTATISTICS
NURS 3324
3 Organizing and Displaying Data
Any survey or experiment yields a list of observations. These need to be organized and
summarized in a logical fashion so that we may perceive the outcome clearly. Tables and
graphs are commonly used to organize, summarize, and describe data.
A. Frequency Tables/ Frequency Distributions
Considerable information can be obtained from large masses of statistical data by
grouping the raw data into classes and determining the number of observations that fall in
each of the classes. Such an arrangement is called a frequency distribution or frequency
table. A frequency table is often the most convenient way of summarizing or displaying
data. The types of frequency distributions considered here are categorical (qualitative)
frequency distributions and grouped frequency distributions.
Categorical or Simple Frequency Distributions
Categorical frequency distributions represent data that can be placed in specific
categories, such as gender, hair color, or blood group.
Example: The blood types of 25 blood donors are given below. Summarize the data
using a frequency distribution.
AB   O    B    A    A
B    B    O    O    B
A    O    B    AB   AB
O    A    B    AB   O
B    O    B    O    A
Solution: We will represent the blood types as classes and the number of occurrences for
each blood type as frequencies. The frequency table (distribution) in the following table
summarizes the data.
Frequency Table for the above Example
Class (Blood Type)    Frequency
A                     5
B                     8
O                     8
AB                    4
Total                 25
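The counting in this example can be checked with a short Python sketch. The variable names are illustrative only; the data are the 25 blood types listed in the example, and the standard-library Counter is used here simply as one convenient way to tally categories.

```python
from collections import Counter

# The 25 donor blood types from the example, in the order listed.
blood_types = [
    "AB", "O", "B", "A", "A",
    "B", "B", "O", "O", "B",
    "A", "O", "B", "AB", "AB",
    "O", "A", "B", "AB", "O",
    "B", "O", "B", "O", "A",
]

# Each distinct blood type is a class; its count is the class frequency.
frequency = Counter(blood_types)

for blood_type in ["A", "B", "O", "AB"]:
    print(f"{blood_type:<6}{frequency[blood_type]:>3}")
print(f"{'Total':<6}{sum(frequency.values()):>3}")
```

The total of the frequency column should equal the number of observations (25), which is a quick check that no value was missed.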
Grouped Frequency Distributions
A grouped frequency distribution is obtained by constructing class intervals for the data,
and then listing the corresponding number of values (frequency count) in each interval.
Tables 3.2 and 3.3 are examples of frequency tables, constructed from the systolic blood
pressure readings (by smoking status) of Table 3.1.
Table 3.1 Smoking status and the systolic blood pressure for a sample of 100 individuals
(Status = smoking status*; SBP = systolic blood pressure in mmHg)

 ID  Status*  SBP   |   ID  Status*  SBP   |   ID  Status*  SBP
  1     1     102   |   36     0     142   |   71     1     116
  2     0     138   |   37     0     122   |   72     0     154
  3     1     190   |   38     0     146   |   73     1     126
  4     1     122   |   39     1     126   |   74     1     140
  5     0     128   |   40     1     176   |   75     0     122
  6     0     112   |   41     1     104   |   76     0     154
  7     0     128   |   42     0     112   |   77     1     140
  8     1     116   |   43     1     140   |   78     0     120
  9     0     134   |   44     1     102   |   79     0     140
 10     0     104   |   45     1     142   |   80     1     114
 11     1     116   |   46     1     146   |   81     0     122
 12     0     152   |   47     0      92   |   82     0      94
 13     0     134   |   48     1     112   |   83     0     122
 14     0     132   |   49     0     152   |   84     0     172
 15     0     130   |   50     1     116   |   85     0     100
 16     0     118   |   51     0     118   |   86     1     150
 17     1     136   |   52     0     128   |   87     0     154
 18     0     108   |   53     1     116   |   88     0     170
 19     0     108   |   54     0     134   |   89     1     140
 20     0     128   |   55     1     108   |   90     0     144
 21     1     118   |   56     0     134   |   91     0     156
 22     1     134   |   57     0     124   |   92     0     132
 23     1     178   |   58     0     124   |   93     0     140
 24     0     134   |   59     0     114   |   94     1     150
 25     1     162   |   60     0     154   |   95     0     130
 26     0     162   |   61     1     114   |   96     0     118
 27     1     120   |   62     0     114   |   97     0     162
 28     0      98   |   63     1      98   |   98     0     128
 29     0     144   |   64     0     128   |   99     0     130
 30     0     118   |   65     1     130   |  100     1     208
 31     0     118   |   66     1     122   |
 32     1     138   |   67     1     112   |
 33     0     134   |   68     0     106   |
 34     0     108   |   69     0     128   |
 35     0      96   |   70     0     128   |
*: 1 and 0 represent a smoking and a nonsmoking person respectively.
How to construct a frequency table?
1. Arrange the data into an array (a listing of all observations from smallest to largest) in
order to determine the interval spanned by the data. For smokers, for example, the systolic
blood pressures span 98-208 mmHg.
Systolic Blood Pressure of Smokers from Table 3.1 (n = 37, sorted)
 98  102  102  104  108  112  112  114  114  116
116  116  116  116  118  120  122  122  126  126
130  134  136  138  140  140  140  140  142  146
150  150  162  176  178  190  208

Systolic Blood Pressure of Non-Smokers from Table 3.1 (n = 63, sorted)
 92   94   96   98  100  104  106  108  108  108
112  112  114  114  118  118  118  118  118  120
122  122  122  122  124  124  128  128  128  128
128  128  128  128  130  130  130  132  132  134
134  134  134  134  134  138  140  140  142  144
144  146  152  152  154  154  154  154  156  162
162  170  172
2. Determine the range (R), the difference between the largest and smallest values in
the set of observations, i.e.
R = largest data value – smallest data value
= 208 – 98 = 110 mmHg.
3. Divide the range into a number of equal and non-overlapping segments called class
intervals.
Important Note
• The number of intervals should in general range from 5 to 15.
• With too many class intervals, the data are not summarized enough for a clear
visualization of how they are distributed.
• With too few, the data are over-summarized and some of the details of the
distribution may be lost.
Sturges's formula and the desired number of class intervals
Those who wish more specific guidance in deciding how many class
intervals are needed may use Sturges's formula:
k = 1 + 3.322(log10 n),
where k stands for the number of class intervals and n is the number of values in the data
set under consideration (the sample size).
Example:
Determine the k value for the 37 smokers we want to group.
k = 1 + 3.322(log10 37)
k = 1 + 3.322(1.568) = 6.21 ≈ 6
Note that the value of k has been rounded to the nearest whole number.
The answer obtained by Sturges's rule should not be considered final, but only a
guide; it may be increased or decreased for convenience and clear
presentation. Suppose we decide to use 6 intervals.
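The Sturges calculation above can be reproduced with a minimal Python sketch. The function name sturges_k is an illustrative assumption, not part of the notes; the constant 3.322 and n = 37 come from the example.

```python
import math

def sturges_k(n):
    """Suggested number of class intervals: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

k_exact = sturges_k(37)   # the 37 smokers from Table 3.1
k = round(k_exact)        # rounded to the nearest whole number

print(f"k = {k_exact:.2f}, rounded to {k}")
```

As the text notes, the rounded value is only a starting point and may be adjusted for a clearer presentation.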
4. Determine the size (length or width) of the class interval (w) by dividing the range
(R) by the number of class intervals required (k). If you want the class width to be
a whole number, always round the result up to the next whole number so that the
classes cover the data.
w = R/k = 110/6 ≈ 18.3; increase to 19 or, for convenience, to 20.
5. Construct a table with three columns, and then write the class intervals in the first
column.
• Start the first class interval with the smallest value or a convenient value below it.
This value is called the lower class limit.
Example: The smallest values for systolic blood pressure of smokers and nonsmokers are
98 and 92 respectively. For convenience and for comparison purposes, we will begin at 90.
• Add the class width to this number to get the lower class limit of the next class
interval.
• The first class interval contains all values from its lower class limit up to, but not
including, the lower class limit of the next interval,
i.e., 90, 91, 92, 93, 94, ..., 109.
The value 109 here is called the upper class limit.
• Repeat the above steps for the second, third, ... until the last class interval.
Notes
• Intervals are usually equal in size (= 20 in our example), which aids
comparison between the frequencies of any intervals.
• The upper limit of the last interval is either the largest value or a larger value.
6. In the next column, insert a tally for each individual observation in the raw
data table. Note that the tally column is included simply as an aid for determining the
frequencies. It is not a necessary part of a frequency table.
7. Sum the tally in each row and record it in the third column, entitled Frequency (f).
8. Sum the frequency column (n). This serves as a useful check that all data have been
included in the table.
Note
Frequency tables should be numbered, include an appropriate descriptive title, specify
the units of measurement, and cite the source of data.
Table 3.2 Frequency Table for Systolic Blood Pressure of Smokers from Table 3.1
Class interval                Tally                 f
(Systolic Blood Pressure*)                          (frequency)
90-109                        |||||                 5
110-129                       ||||| ||||| |||||     15
130-149                       ||||| |||||           10
150-169                       |||                   3
170-189                       ||                    2
190-209                       ||                    2
Total                                               37
*In millimeters of mercury.
Table 3.3 Frequency Table for Systolic Blood Pressure of Nonsmokers from Table 3.1
Class Interval                Tally                           f
(Systolic Blood Pressure*)                                    (Frequency)
90-109                        ||||| |||||                     10
110-129                       ||||| ||||| ||||| ||||| ||||    24
130-149                       ||||| ||||| ||||| |||           18
150-169                       ||||| ||||                      9
170-189                       ||                              2
190-209                                                       0
Total                                                         63
*In millimeters of mercury.
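The construction steps (range, class width, class intervals, counting) can be sketched in Python as follows. This is an illustration under stated assumptions, not the notes' own method: the function name grouped_frequency_table and the variable names are hypothetical, the data are the sorted smoker and nonsmoker readings from Table 3.1, and the choices k = 6, width 20, and starting value 90 are taken from the text.

```python
import math

# Systolic blood pressures (mmHg) from Table 3.1, split by smoking status.
smokers = [
    98, 102, 102, 104, 108, 112, 112, 114, 114, 116,
    116, 116, 116, 116, 118, 120, 122, 122, 126, 126,
    130, 134, 136, 138, 140, 140, 140, 140, 142, 146,
    150, 150, 162, 176, 178, 190, 208,
]
nonsmokers = [
    92, 94, 96, 98, 100, 104, 106, 108, 108, 108,
    112, 112, 114, 114, 118, 118, 118, 118, 118, 120,
    122, 122, 122, 122, 124, 124, 128, 128, 128, 128,
    128, 128, 128, 128, 130, 130, 130, 132, 132, 134,
    134, 134, 134, 134, 134, 138, 140, 140, 142, 144,
    144, 146, 152, 152, 154, 154, 154, 154, 156, 162,
    162, 170, 172,
]

def grouped_frequency_table(values, start, width, k):
    """Return (lower limit, upper limit, frequency) for k equal-width classes."""
    counts = [0] * k
    for value in values:
        index = (value - start) // width   # which class the value falls in
        if not 0 <= index < k:
            raise ValueError(f"{value} falls outside the class intervals")
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width - 1, counts[i])
            for i in range(k)]

data_range = max(smokers) - min(smokers)    # step 2: 208 - 98 = 110
k = 6                                       # step 3: number of intervals chosen
minimum_width = math.ceil(data_range / k)   # step 4: 18.3 rounded up to 19
width = 20                                  # widened to 20 for convenience
start = 90                                  # step 5: convenient lower class limit

smokers_table = grouped_frequency_table(smokers, start, width, k)
nonsmokers_table = grouped_frequency_table(nonsmokers, start, width, k)

for lower, upper, f in smokers_table:
    print(f"{lower}-{upper}: {f}")
print("Total:", sum(f for _, _, f in smokers_table))
```

Summing the frequency column and comparing it with the sample size (37 smokers, 63 nonsmokers) is the check described in step 8.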
Frequency Tables with class boundaries (true class intervals)
Class boundaries may be used in place of class limits. Class boundaries are points that
demarcate the true upper limit of one class and the true lower limit of the next. Class
boundaries can be easily obtained by applying the formula:
Class boundary = (upper limit of one class + lower limit of the next class) / 2
Example
Determine the class boundaries for the class intervals listed in the table of smokers.
Class interval                Class boundaries    f
(Systolic Blood Pressure*)                        (frequency)
90-109                        89.5-109.5          5
110-129                       109.5-129.5         15
130-149                       129.5-149.5         10
150-169                       149.5-169.5         3
170-189                       169.5-189.5         2
190-209                       189.5-209.5         2
Total                                             n = 37
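A minimal Python sketch of the boundary calculation follows. The function name class_boundaries is illustrative, and extending the first lower and last upper boundary by the same half-gap is an assumption that matches the values shown in the table.

```python
# Class limits for the smokers' frequency table.
class_limits = [(90, 109), (110, 129), (130, 149),
                (150, 169), (170, 189), (190, 209)]

def class_boundaries(limits):
    """Convert class limits to class boundaries (true class intervals)."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = limits[1][0] - limits[0][1]      # 110 - 109 = 1
    half_gap = gap / 2
    # Each boundary is the midpoint between adjacent limits, e.g. (109 + 110) / 2.
    return [(lower - half_gap, upper + half_gap) for lower, upper in limits]

boundaries = class_boundaries(class_limits)
for (lower, upper), (low_b, up_b) in zip(class_limits, boundaries):
    print(f"{lower}-{upper}: {low_b}-{up_b}")
```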
Relative frequency
• The relative frequency for any class is obtained by dividing the frequency for that
class by the total of all frequencies (the number of observations, or sample size), i.e., f/n.
Example: the relative frequency of the first class, 90-109 mmHg, of smokers is
5/37 = 0.14
Percentage relative frequency (p)
• If each relative frequency is multiplied by 100%, we have a percentage relative
frequency (p),
i.e. p = (f/n) × 100.
Example: the percentage relative frequency of the first class, 90-109 mmHg, of
smokers is (5/37) × 100 = 14%.
Class Interval                Frequency    Relative     Relative
(Systolic Blood Pressure*)                 Frequency    Frequency (%)
90-109                        5            0.14         14
110-129                       15           0.41         41
130-149                       10           0.27         27
150-169                       3            0.08         8
170-189                       2            0.05         5
190-209                       2            0.05         5
Total                         37           1            100
*In millimeters of mercury.
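The relative and percentage relative frequencies for the smokers can be recomputed with a short Python sketch. The variable names are illustrative; the frequencies are those of Table 3.2, and rounding to two decimals and to whole percentages is assumed to match the table.

```python
# Class frequencies for smokers (Table 3.2).
frequencies = {
    "90-109": 5,
    "110-129": 15,
    "130-149": 10,
    "150-169": 3,
    "170-189": 2,
    "190-209": 2,
}

n = sum(frequencies.values())  # sample size, 37

relative = {interval: f / n for interval, f in frequencies.items()}
percentage = {interval: 100 * f / n for interval, f in frequencies.items()}

for interval, f in frequencies.items():
    print(f"{interval:<8} f={f:<3} "
          f"relative={relative[interval]:.2f} "
          f"percent={percentage[interval]:.0f}")
```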
Significance
Relative frequencies are helpful in comparing two sets of data that have different numbers
of observations, like our 63 nonsmokers and 37 smokers. For example, the blood
pressure range of 90-109 mmHg contains 10 (16%) of the nonsmokers and 5 (14%) of the
smokers.
Example
Class Interval                Relative Frequency (%)
(Systolic Blood Pressure*)    Nonsmokers    Smokers
90-109                        16            14
110-129                       38            41
130-149                       29            27
150-169                       14            8
170-189                       3             5
190-209                       0             5
Cumulative percentage relative frequency
• It is also known as the cumulative percentage.
• It shows the percentage of elements lying within and below each class interval.
• It is computed by cumulating the percentage relative frequencies of the successive
class intervals. For nonsmokers (Table 3.4), the cumulative percentage for the first
four intervals is 16 + 38 + 29 + 14 = 97%.
Significance: Cumulative percentages allow a rapid comparison of entire frequency
distributions without comparing individual class intervals. For example, 97% of the
nonsmokers in the sample have a systolic blood pressure ≤ 169 mmHg (that is, below 169.5),
whereas 90% of the smokers have a blood pressure below the same level. Equivalently,
3% of the nonsmokers and 10% of the smokers have a systolic blood pressure
> 169 mmHg (that is, above 169.5).
Table 3.4 Comparison of Systolic Blood Pressure between Smokers and Nonsmokers
from Table 3.1
Class Interval                Relative Frequency (%)     Cumulative percentage
(Systolic Blood Pressure*)                               relative Frequency (%)
                              Nonsmokers    Smokers      Nonsmokers    Smokers
90-109                        16            14           16            14
110-129                       38            41           54            55
130-149                       29            27           83            82
150-169                       14            8            97            90
170-189                       3             5            100           95
190-209                       0             5            100           100
*In millimeters of mercury.
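The cumulative columns of Table 3.4 can be reproduced with a minimal Python sketch. Following the description above, it cumulates the already rounded percentage relative frequencies; cumulating the raw frequencies first and rounding afterwards could differ by a point in some rows. Variable names are illustrative.

```python
from itertools import accumulate

# Percentage relative frequencies per class interval (Table 3.4).
intervals = ["90-109", "110-129", "130-149", "150-169", "170-189", "190-209"]
nonsmokers_pct = [16, 38, 29, 14, 3, 0]
smokers_pct = [14, 41, 27, 8, 5, 5]

# Running totals give the cumulative percentage for each interval.
nonsmokers_cum = list(accumulate(nonsmokers_pct))
smokers_cum = list(accumulate(smokers_pct))

for interval, ns, s in zip(intervals, nonsmokers_cum, smokers_cum):
    print(f"{interval:<8} nonsmokers={ns:>3}%  smokers={s:>3}%")
```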
Example 2: The following table gives the hemoglobin levels (in g/dl) of a sample of
50 apparently healthy men aged 20-24.
17.0  16.1  15.2  17.4  16.4  13.5  16.8  15.8  17.4  15.9
16.4  16.1  15.9  17.1  17.5  17.8  15.8  18.3  16.4  14.4
13.9  15.9  16.3  16.2  17.3  14.2  14.6  15.1  16.7  16.2
17.3  17.0  16.2  15.0  14.9  17.7  15.5  15.3  16.5  15.3
16.3  15.9  14.0  15.7  16.1  16.1  15.7  15.8  13.7  16.3
Prepare a grouped frequency distribution for this data, for the class intervals:
13.0 – 13.9, 14.0 – 14.9, 15.0 – 15.9, 16.0 – 16.9, 17.0 – 17.9, 18.0 – 18.9
Solution:
It is very common to define class intervals in this way.
Cumulative frequency
Determine the boundaries (true class intervals) and midpoint of each class interval
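The solution tables for this example are not reproduced in the text, so the following Python sketch computes the requested quantities directly from the 50 hemoglobin values. It is an illustration under stated assumptions: variable names are hypothetical, classes are identified by the whole-number part of each value (valid because the intervals are 13.0-13.9, 14.0-14.9, and so on), and boundaries are taken 0.05 below and above the class limits, following the class boundary formula given earlier.

```python
import math
from itertools import accumulate

# Hemoglobin levels (g/dl) of the 50 men in Example 2.
hemoglobin = [
    17.0, 16.1, 15.2, 17.4, 16.4, 13.5, 16.8, 15.8, 17.4, 15.9,
    16.4, 16.1, 15.9, 17.1, 17.5, 17.8, 15.8, 18.3, 16.4, 14.4,
    13.9, 15.9, 16.3, 16.2, 17.3, 14.2, 14.6, 15.1, 16.7, 16.2,
    17.3, 17.0, 16.2, 15.0, 14.9, 17.7, 15.5, 15.3, 16.5, 15.3,
    16.3, 15.9, 14.0, 15.7, 16.1, 16.1, 15.7, 15.8, 13.7, 16.3,
]

first_lower = 13   # lower limit of the first class, 13.0-13.9
k = 6              # six classes, up to 18.0-18.9

# Count the values in each class; floor(x) gives the class's whole-number part.
frequencies = [0] * k
for value in hemoglobin:
    frequencies[math.floor(value) - first_lower] += 1

cumulative = list(accumulate(frequencies))

rows = []
for i in range(k):
    lower = first_lower + i
    upper = round(lower + 0.9, 1)
    boundaries = (round(lower - 0.05, 2), round(upper + 0.05, 2))
    midpoint = round((lower + upper) / 2, 2)
    rows.append((lower, upper, boundaries, midpoint, frequencies[i], cumulative[i]))

for lower, upper, (low_b, up_b), midpoint, f, cum in rows:
    print(f"{lower:.1f}-{upper:.1f}  {low_b:.2f}-{up_b:.2f}  "
          f"mid={midpoint:.2f}  f={f:<3} cum={cum}")
```

The last cumulative frequency should equal the sample size of 50.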
Example 3: Find the class boundaries (true class intervals), midpoint, relative
frequencies, and cumulative frequency for the following table of distributions for
the age
Solution: