Grouped Data and Graphical Methods

Biostatistics
Unit 3
Graphs
Grouped data
• Data can be grouped into a set of nonoverlapping, contiguous intervals called
class intervals (Excel calls them
bins).
• Class intervals are used to sort the
data.
• Between 6 and 15 class intervals are
usually used depending on the range of
the data.
• The frequency tells how many of the
data values fall into each class
interval.
• Frequency can be displayed
graphically using the histogram and
the frequency polygon.
Bacterial cell lengths
Below are the measured lengths of 30
individual bacterial cells. Because they have
not yet been sorted, they can be considered
raw data.
 1) 1.5     6) 3.2    11) 1.0    16) 2.0    21) 2.2    26) 1.5
 2) 2.0     7) 2.3    12) 1.0    17) 4.0    22) 2.0    27) 2.0
 3) 2.0     8) 1.5    13) 2.5    18) 3.0    23) 2.0    28) 1.0
 4) 3.0     9) 2.0    14) 3.4    19) 2.0    24) 2.0    29) 1.0
 5) 2.0    10) 2.0    15) 2.1    20) 2.0    25) 2.0    30) 1.0
Basic statistics
The values of the basic statistics for the data
are presented below. They were obtained
using the TI-83 calculator. Similar results are
available using Microsoft Excel.
n = 30            min x = 1.0
mean = 2.04       Q1 = 1.5
median = 2.0      Q3 = 2.2
*mode = 2.0       max x = 4.0
*range = 3.0
standard deviation = 0.7186 (population form, shown as σx on the TI-83; the sample standard deviation s is 0.7309)

*found by inspecting the data
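As a cross-check on the values above, here is a minimal sketch using Python's standard `statistics` module rather than the TI-83 or Excel. The variable names are illustrative choices, not part of the original slides. Both forms of the standard deviation are computed so they can be compared with the calculator output.

```python
import statistics

# The 30 bacterial cell lengths from the raw-data slide, in observation order.
lengths = [
    1.5, 2.0, 2.0, 3.0, 2.0, 3.2, 2.3, 1.5, 2.0, 2.0,
    1.0, 1.0, 2.5, 3.4, 2.1, 2.0, 4.0, 3.0, 2.0, 2.0,
    2.2, 2.0, 2.0, 2.0, 2.0, 1.5, 2.0, 1.0, 1.0, 1.0,
]

n = len(lengths)
mean = statistics.mean(lengths)
median = statistics.median(lengths)
mode = statistics.mode(lengths)              # most frequent value
data_range = max(lengths) - min(lengths)
sample_sd = statistics.stdev(lengths)        # divides by n - 1
population_sd = statistics.pstdev(lengths)   # divides by n

print(f"n = {n}, mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"range = {data_range:.1f}")
print(f"sample sd = {sample_sd:.4f}, population sd = {population_sd:.4f}")
```

The quartiles are left out here on purpose: software packages use several different quartile conventions, so their results may not match the position rule used later in this unit.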
Frequency Table for Bacterial
Cell Lengths
Class interval (µm)    Frequency
0.50 - 1.49             5
1.50 - 2.49            19
2.50 - 3.49             5
3.50 - 4.49             1
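The counting behind the frequency table can be sketched in a few lines of Python. The function name `frequency_table` is a hypothetical helper for illustration. The sketch treats each class interval as closed at both limits, which works here because the lengths are recorded to one decimal place and no value can fall between 1.49 and 1.50.

```python
# The 30 bacterial cell lengths from the raw-data slide.
lengths = [
    1.5, 2.0, 2.0, 3.0, 2.0, 3.2, 2.3, 1.5, 2.0, 2.0,
    1.0, 1.0, 2.5, 3.4, 2.1, 2.0, 4.0, 3.0, 2.0, 2.0,
    2.2, 2.0, 2.0, 2.0, 2.0, 1.5, 2.0, 1.0, 1.0, 1.0,
]

# Class intervals from the frequency table, as (lower limit, upper limit).
class_intervals = [(0.50, 1.49), (1.50, 2.49), (2.50, 3.49), (3.50, 4.49)]


def frequency_table(values, intervals):
    """Count how many values fall inside each interval [lower, upper]."""
    return {
        (lower, upper): sum(1 for v in values if lower <= v <= upper)
        for lower, upper in intervals
    }


table = frequency_table(lengths, class_intervals)
for (lower, upper), count in table.items():
    print(f"{lower:.2f} - {upper:.2f}  {count:>3}")
```

The frequencies should add up to n. If they do not, a value has fallen outside every class interval.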
Histogram
• The histogram is a vertical bar graph.
Excel calls this a column graph.
• The bars on a histogram must touch
each other to show that all possible
values of data are accounted for.
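For a quick look at the shape of the distribution without Excel, the frequencies can be printed as a text histogram. This is only a rough sketch: the bars run sideways rather than vertically, and `text_histogram` is a hypothetical helper name. The rows sit directly under one another with no gaps, in the same spirit as bars that touch.

```python
# Frequencies from the frequency table for bacterial cell lengths.
frequencies = {
    "0.50 - 1.49": 5,
    "1.50 - 2.49": 19,
    "2.50 - 3.49": 5,
    "3.50 - 4.49": 1,
}


def text_histogram(freqs, symbol="#"):
    """Return one line per class interval; bar length equals the frequency."""
    return [
        f"{label} | {symbol * count} ({count})"
        for label, count in freqs.items()
    ]


lines = text_histogram(frequencies)
for line in lines:
    print(line)
```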
Frequency Polygon
• The frequency polygon is a line
graph. It is made by connecting the
midpoints of the tops of the bars.
• The ends of the line must be anchored
on the x-axis. This requires an
additional class interval with a
frequency of 0 (zero) at each end of the
table of class intervals.
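The points of the frequency polygon can be worked out before plotting. The sketch below assumes equal-width class intervals and uses a hypothetical helper, `polygon_points`. It adds one empty class interval at each end so the line starts and finishes on the x-axis.

```python
# Class intervals (lower limit, upper limit) and their frequencies,
# taken from the frequency table for bacterial cell lengths.
intervals = [(0.50, 1.49), (1.50, 2.49), (2.50, 3.49), (3.50, 4.49)]
frequencies = [5, 19, 5, 1]


def polygon_points(intervals, frequencies):
    """Return (midpoint, frequency) points, anchored at zero on both ends."""
    # Assumption: all class intervals have the same width.
    width = intervals[1][0] - intervals[0][0]
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    # One extra class interval with frequency 0 on each side of the table.
    xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return [(round(x, 3), y) for x, y in zip(xs, ys)]


points = polygon_points(intervals, frequencies)
for x, y in points:
    print(f"midpoint {x:6.3f}  frequency {y}")
```

The left anchor falls just below zero on the length axis. It represents an empty class interval, not a measured length.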
Percentiles and quartiles
• Percentiles locate data values along the
horizontal axis: the pth percentile is the value at or
below which p percent of the data fall.
• The median corresponds to the 50th percentile.
• We are generally interested in quartiles, which
divide the sorted data into four equal parts at the
25th, 50th, and 75th percentiles.
• The first quartile (Q1) is the 25th percentile.
One-quarter of the data lie at or below it. The second
quartile (Q2) is the median, which marks the point
with half of the data at or below it.
• The third quartile (Q3) is the 75th percentile.
Three-quarters of the data lie at or below it.
• Sometimes it is useful to know the
interquartile range, which is Q3 - Q1.
• Before calculating quartiles, it is essential that the
data be sorted in ascending order. The term
“ascending order” means that the smallest number
is first in the list and the largest number is last in
the list.
Calculation of quartiles
For the sorted data set of 30 bacterial cell lengths
(n = 30), the quartile positions are:
Q1 position = (n+1)/4 = 7.75 --> 8th observation (1.5)
Q2 position = 2(n+1)/4 = 15.5 --> average of the 15th and
16th observations (2.0)
Q3 position = 3(n+1)/4 = 23.25 --> 23rd observation (2.2)
• Be careful when interpreting quartile
calculations.
• An answer of Q1 = 7.75 rounded to 8 does
not mean that the first quartile value is 8.
• It means that the first quartile is the data item
in the 8th position. In this data set the value
occupying the 8th position is 1.5.
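The difference between a quartile position and a quartile value can be made concrete with a short sketch. The function `quartile_by_position` is a hypothetical helper. Its rounding rule is an assumption chosen to reproduce the three cases worked above: a position ending in .5 averages the two neighbouring observations, and any other position is rounded to the nearest whole observation.

```python
# The 30 bacterial cell lengths from the raw-data slide.
lengths = [
    1.5, 2.0, 2.0, 3.0, 2.0, 3.2, 2.3, 1.5, 2.0, 2.0,
    1.0, 1.0, 2.5, 3.4, 2.1, 2.0, 4.0, 3.0, 2.0, 2.0,
    2.2, 2.0, 2.0, 2.0, 2.0, 1.5, 2.0, 1.0, 1.0, 1.0,
]


def quartile_by_position(sorted_values, k):
    """Return (position, value) for quartile k using the k(n+1)/4 rule."""
    n = len(sorted_values)
    position = k * (n + 1) / 4           # 1-based position, may be fractional
    whole = int(position)
    if position - whole == 0.5:
        # Halfway between two observations: average them.
        value = (sorted_values[whole - 1] + sorted_values[whole]) / 2
    else:
        # Otherwise take the observation at the nearest whole position.
        value = sorted_values[round(position) - 1]
    return position, value


ordered = sorted(lengths)                # ascending order is essential
q1_pos, q1 = quartile_by_position(ordered, 1)
q2_pos, q2 = quartile_by_position(ordered, 2)
q3_pos, q3 = quartile_by_position(ordered, 3)
iqr = q3 - q1                            # interquartile range

print(f"Q1: position {q1_pos}, value {q1}")
print(f"Q2: position {q2_pos}, value {q2}")
print(f"Q3: position {q3_pos}, value {q3}")
print(f"IQR = {iqr:.1f}")
```

The position (7.75, rounded to 8) only says where to look in the sorted list. The quartile itself is the length stored there.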
Box Plot
The box plot is used to convey information
about the data. It makes use of the quartiles
that were calculated above.
1. Draw a number line representing cell
length on the horizontal axis.
2. Above the horizontal axis draw a
rectangle with the left-hand end of the
rectangle directly above Q1 and the right-hand
end of the rectangle directly above Q3.
3. Draw a vertical line across the box directly
over Q2.
4. Draw a horizontal extension line out of the
left-hand end of the box to a point above the
smallest measurement of the data. At this point
draw a vertical line. This is the left whisker.
5. Draw another horizontal line out of the right-hand end of the box to a point above the largest
measurement of the data. Draw a vertical line at this point. This is the right whisker.
Box plot of cell measurements
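The five drawing steps can be followed in a rough text sketch. The function `ascii_box_plot` and its `lo`, `hi`, and `width` parameters are assumptions made for illustration. They set up a number line from 0 to 5 with ten character columns per unit of cell length. The five input values are the minimum, Q1, median, Q3, and maximum found earlier.

```python
def ascii_box_plot(minimum, q1, median, q3, maximum,
                   lo=0.0, hi=5.0, width=51):
    """Draw a one-line box plot on a number line running from lo to hi."""

    def column(x):
        # Map a data value to a character column on the number line.
        return round((x - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    # Steps 4 and 5: horizontal whisker lines out to the extremes.
    for c in range(column(minimum), column(q1)):
        row[c] = "-"
    for c in range(column(q3) + 1, column(maximum) + 1):
        row[c] = "-"
    # Step 2: the box spans Q1 to Q3.
    for c in range(column(q1), column(q3) + 1):
        row[c] = "="
    # Vertical marks at the whisker ends and the box ends.
    for x in (minimum, q1, q3, maximum):
        row[column(x)] = "|"
    # Step 3: mark the median (Q2) inside the box.
    row[column(median)] = "M"
    return "".join(row)


plot = ascii_box_plot(1.0, 1.5, 2.0, 2.2, 4.0)
print(plot)
```

The long right whisker and the median sitting close to Q3 both reflect how many of the cells measure exactly 2.0.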
fin