Biostatistics Unit 3 Graphs

Grouped data
• Data can be grouped into a set of nonoverlapping, contiguous intervals called class intervals (Excel calls them bins).
• Class intervals are used to sort the data.
• Between 6 and 15 class intervals are usually used, depending on the range of the data.
• The frequency tells how many of the data values fall into each class interval.
• Frequency can be displayed graphically using the histogram and the frequency polygon.

Bacterial cell lengths
Below are the measured lengths of 30 individual bacterial cells. Because they have not yet been sorted, they can be considered raw data.

 1) 1.5    6) 3.2   11) 1.0   16) 2.0   21) 2.2   26) 1.5
 2) 2.0    7) 2.3   12) 1.0   17) 4.0   22) 2.0   27) 2.0
 3) 2.0    8) 1.5   13) 2.5   18) 3.0   23) 2.0   28) 1.0
 4) 3.0    9) 2.0   14) 3.4   19) 2.0   24) 2.0   29) 1.0
 5) 2.0   10) 2.0   15) 2.1   20) 2.0   25) 2.0   30) 1.0

Basic statistics
The values of the basic statistics for the data are presented below. They were obtained using the TI-83 calculator. Similar results are available using Microsoft Excel.

mean = 2.04      min x = 1.0
σ = 0.7186       Q1 = 1.5
n = 30           median = 2.0
*mode = 2.0      Q3 = 2.2
*range = 3.0     max x = 4.0
---------------------------------------
* found by inspecting the data
Note: 0.7186 is the population standard deviation (σx on the TI-83). The sample standard deviation (Sx) for these data is 0.7309.

Frequency Table for Bacterial Cell Lengths

Class interval (µm)    Frequency
0.50 - 1.49             5
1.50 - 2.49            19
2.50 - 3.49             5
3.50 - 4.49             1

Histogram
• The histogram is a vertical bar graph. Excel calls this a column graph.
• The bars on a histogram must touch each other to show that all possible values of the data are accounted for.

Frequency Polygon
• The frequency polygon is a line graph. It is made by connecting the top center points of the histogram bars.
• The ends of the line must be anchored on the x-axis. This requires an additional class interval with a frequency of 0 (zero) at each end of the table of class intervals.
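The grouping steps above can be sketched in Python. This is a minimal illustration, not the TI-83 or Excel procedure: the function names `frequency_table` and `polygon_points` are hypothetical, the class midpoints are taken from the stated class limits, and observation 1 is read as 1.5 and observation 6 as 3.2. The last lines also compute both forms of the standard deviation; the slide's value of .7186 matches the population form.

```python
import statistics

# Raw cell lengths, observations 1-30 in the order listed on the slide.
lengths = [
    1.5, 2.0, 2.0, 3.0, 2.0, 3.2, 2.3, 1.5, 2.0, 2.0,
    1.0, 1.0, 2.5, 3.4, 2.1, 2.0, 4.0, 3.0, 2.0, 2.0,
    2.2, 2.0, 2.0, 2.0, 2.0, 1.5, 2.0, 1.0, 1.0, 1.0,
]

# Class intervals from the frequency table, as (lower limit, upper limit).
class_intervals = [(0.50, 1.49), (1.50, 2.49), (2.50, 3.49), (3.50, 4.49)]


def frequency_table(data, intervals):
    """Count the values that fall inside each class interval (limits inclusive)."""
    return [sum(1 for x in data if low <= x <= high) for low, high in intervals]


def polygon_points(intervals, freqs):
    """Return (midpoint, frequency) pairs for a frequency polygon.

    One extra class with frequency 0 is added at each end so that the
    line is anchored on the x-axis.
    """
    width = intervals[1][0] - intervals[0][0]  # distance between lower limits
    mids = [round((low + high) / 2, 3) for low, high in intervals]
    first = round(mids[0] - width, 3)
    last = round(mids[-1] + width, 3)
    return [(first, 0)] + list(zip(mids, freqs)) + [(last, 0)]


freqs = frequency_table(lengths, class_intervals)

# Text version of the histogram: one row of '#' marks per class interval.
for (low, high), f in zip(class_intervals, freqs):
    print(f"{low:.2f} - {high:.2f}  {f:2d}  {'#' * f}")

points = polygon_points(class_intervals, freqs)
print("polygon points:", points)

print("mean:", round(statistics.mean(lengths), 2))
print("population standard deviation:", round(statistics.pstdev(lengths), 4))
print("sample standard deviation:", round(statistics.stdev(lengths), 4))
```

The counts printed for the four class intervals should agree with the frequency table, and the first and last polygon points carry a frequency of 0.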
Percentiles and quartiles
• Percentiles are used to locate data on the horizontal axis.
• The median corresponds to the 50th percentile.
• We are generally interested in quartiles, which are the 25th, 50th, and 75th percentiles.
• The first quartile (Q1) is the 25th percentile. One-quarter of the data fall at or below it.
• The second quartile (Q2) is the median, which marks the point with half of the data at or below it.
• The third quartile (Q3) is the 75th percentile. Three-quarters of the data fall at or below it.
• Sometimes it is useful to know the interquartile range, which is Q3 - Q1.
• Before calculating quartiles, it is essential that the data be sorted in ascending order. The term “ascending order” means that the smallest number is first in the list and the largest number is last in the list.

Calculation of quartiles
For the sorted data set of 30 bacterial cell lengths (n = 30), the quartile positions are:

Q1 position = (n+1)/4 = 7.75 -> 8th observation (1.5)
Q2 position = 2(n+1)/4 = 15.5 -> average of the 15th and 16th observations (2.0)
Q3 position = 3(n+1)/4 = 23.25 -> 23rd observation (2.2)

• Round each position to the nearest whole number. A position ending in .5 falls halfway between two observations, so average those two values.
• Be careful when interpreting quartile calculations.
• A result of 7.75 for Q1, rounded to 8, does not mean that the first quartile value is 8.
• It means that the first quartile is the data item in the 8th position. In this data set the value occupying the 8th position is 1.5.

Box Plot
The box plot summarizes the spread of the data using the smallest measurement, the quartiles calculated above, and the largest measurement.
1. Draw a number line representing cell length on the horizontal axis.
2. Above the horizontal axis draw a rectangle with the left-hand end of the rectangle directly above Q1 and the right-hand end of the rectangle directly above Q3.
3. Draw a vertical line across the box directly over Q2.
4. Draw a horizontal extension line out of the left-hand end of the box to a point above the smallest measurement of the data. At this point draw a vertical line. This is the left whisker.
5. Draw another horizontal line out of the right-hand end of the box to a point above the largest measurement of the data. Draw a vertical line at this point. This is the right whisker.

[Figure: Box plot of cell measurements]
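The quartile positions and the five values needed to draw the box plot can be checked with a short Python sketch. It assumes the (n+1)/4 position rule and the rounding shown on the slides (nearest whole position, with a position ending in .5 averaged between its two neighbors). The helper names `value_at_position` and `quartiles` are hypothetical, and the data order assumes observation 1 is 1.5 and observation 6 is 3.2.

```python
import math

# Raw cell lengths, observations 1-30 in the order listed on the slide.
lengths = [
    1.5, 2.0, 2.0, 3.0, 2.0, 3.2, 2.3, 1.5, 2.0, 2.0,
    1.0, 1.0, 2.5, 3.4, 2.1, 2.0, 4.0, 3.0, 2.0, 2.0,
    2.2, 2.0, 2.0, 2.0, 2.0, 1.5, 2.0, 1.0, 1.0, 1.0,
]


def value_at_position(sorted_data, position):
    """Return the value at a 1-based position in an ascending list.

    A position ending in .5 is the average of its two neighbors; any
    other position is rounded to the nearest whole observation.
    """
    lower = math.floor(position)
    if position - lower == 0.5:
        return (sorted_data[lower - 1] + sorted_data[lower]) / 2
    return sorted_data[round(position) - 1]


def quartiles(data):
    """Return (Q1, Q2, Q3) using positions k(n+1)/4 for k = 1, 2, 3."""
    ordered = sorted(data)  # quartiles require ascending order
    n = len(ordered)
    positions = [k * (n + 1) / 4 for k in (1, 2, 3)]
    return tuple(value_at_position(ordered, p) for p in positions)


q1, q2, q3 = quartiles(lengths)

# The five values used to draw the box plot described in the steps above.
summary = {
    "left whisker (min)": min(lengths),
    "box left edge (Q1)": q1,
    "line in box (Q2)": q2,
    "box right edge (Q3)": q3,
    "right whisker (max)": max(lengths),
}
for label, value in summary.items():
    print(f"{label}: {value}")

print("interquartile range:", round(q3 - q1, 2))
```

Plotting libraries often place whiskers differently (matplotlib, for example, defaults to 1.5 times the interquartile range), so their output may not match the min-to-max whiskers described in these slides.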