CHAPTER 2: ORGANIZING DATA RAW DATA Definition Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. 2 Table 2.1 21 18 25 22 25 19 20 19 28 23 24 19 31 21 18 25 22 19 20 37 Ages of 50 students 29 19 23 22 27 34 19 18 22 23 26 25 23 21 21 27 22 19 20 25 37 25 23 19 21 33 23 26 21 24 3 Table 2.2 J F J SO SE F F F SE SE SO J SE J F SE F SO SO SE Status of 50 Students J F SO SO J J F F J SO SE SE J J F J SO F SO J J SE SE F SO J J SE SO SO 4 ORGANIZING AND GRAPHING QUALITATIVE DATA Frequency Distributions Relative Frequency and Percentage Distributions Graphical Presentation of Qualitative Data Bar Graphs Pie Charts 5 TABLE 2.3 Variable Category Type of Employment Students Intend to Engage In Type of Employment Private companies/businesses Federal government State/local government Own business Number of Frequency column Students 44 Frequency 16 23 17 Sum = 100 6 Frequency Distributions Definition A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories. 7 Example 2-1 A sample of 30 employees from large companies was selected, and these employees were asked how stressful their jobs were. The responses of these employees are recorded next where very represents very stressful, somewhat means somewhat stressful, and none stands for not stressful at all. 8 Example 2-1 Some what None Very Somewhat Very Very None Somewhat Somewhat Very Somewhat Somewhat Very Somewhat None None Somewhat Somewhat Very Somewhat Somewhat Very None Somewhat Very very Somewhat Very somewhat None Construct a frequency distribution table for these data. 9 Solution 2-1 Table 2.4 Frequency Distribution of Stress on Job Stress on Job Very Somewhat None Tally |||| |||| |||| |||| |||| |||| | Frequency (f) 10 14 6 Sum = 30 10 Relative Frequency and Percentage Distributions Calculating Relative Frequency of a Category Frequency of that category Re lative frequency of a category Sum of all frequencie s 11 Relative Frequency and Percentage Distributions cont. Calculating Percentage Percentage = (Relative frequency) · 100 12 Example 2-2 Determine the relative frequency and percentage for the data in Table 2.4. 13 Solution 2-2 Table 2.5 Relative Frequency and Percentage Distributions of Stress on Job Stress on Job Very Somewhat None Relative Frequency 10/30 = .333 14/30 = .467 6/30 = .200 Sum = 1.00 Percentage .333(100) = 33.3 .467(100) = 46.7 .200(100) = 20.0 Sum = 100 14 Graphical Presentation of Qualitative Data Definition A graph made of bars whose heights represent the frequencies of respective categories is called a bar graph. 15 Figure 2.1 Bar graph for the frequency distribution of Table 2.4 16 14 Frequency 12 10 8 6 4 2 0 Very Somewhat None Strees on Job 16 Graphical Presentation of Qualitative Data cont. Definition A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories is called a pie chart. 17 Table 2.6 Calculating Angle Sizes for the Pie Chart Stress on Job Relative Frequency Very Somewhat None .333 .467 .200 Sum = 1.00 Angle Size 360(.333) = 119.88 360(.467) = 168.12 360(.200) = 72.00 Sum = 360 18 Figure 2.2 Pie chart for the percentage distribution of Table 2.5. None, 20% Very, 33.30% Somewhat, 46.70% 19 ORGANIZING AND GRAPHING QUANTITATIVE DATA Frequency Distributions Constructing Frequency Distribution Tables Relative and Percentage Distributions Graphing Grouped Data Histograms Polygons 20 Frequency Distributions Table 2.7 Variable Third class Weekly Earnings of 100 Employees of a Company Weekly Earnings (dollars) Number of Employees f 401 to 600 601 to 800 801 to 1000 1001 to 1200 1201 to 1400 1401 to 1600 9 22 39 15 9 6 Lower limit of the sixth class Frequency column Frequency of the third class Upper limit of the sixth class 21 Frequency Distributions cont. Definition A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class. Data presented in the form of a frequency distribution are called grouped data. 22 Frequency Distributions cont. Definition The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. 23 Frequency Distributions cont. Finding Class Width Class width = Upper boundary – Lower boundary 24 Frequency Distributions cont. Calculating Class Midpoint or Mark Lower limit Upper limit Class midpoint or mark 2 25 Constructing Frequency Distribution Tables Calculation of Class Width Largest va lue - Smallest v alue Approximat e class width Number of classes 26 Table 2.8 Class Limits Class Boundaries, Class Widths, and Class Midpoints for Table 2.7 Class Boundaries 401 to 600 400.5 to less than 600.5 601 to 800 600.5 to less than 800.5 801 to 1000 800.5 to less than 1000.5 1001 to 1200 1000.5 to less than 1200.5 1201 to 1400 1200.5 to less than 1400.5 1401 to 1600 1400.5 to less than 1600.5 Class Width Class Midpoint 200 200 200 200 200 200 500.5 700.5 900.5 1100.5 1300.5 1500.5 27 Example 2-3 Table 2.9 gives the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2002 season. Construct a frequency distribution table. 28 Table 2.9 Team Anaheim Arizona Atlanta Baltimore Boston Chicago Cubs Chicago White Sox Cincinnati Cleveland Colorado Detroit Florida Houston Kansas City Los Angeles Home Runs Hit by Major League Baseball Teams During the 2002 Season Home Runs 152 165 164 165 177 200 217 169 192 152 124 146 167 140 155 Team Milwaukee Minnesota Montreal New York Mets New York Yankees Oakland Philadelphia Pittsburgh St. Louis San Diego San Francisco Seattle Tampa Bay Texas Toronto Home Runs 139 167 162 160 223 205 165 142 175 136 198 152 133 230 187 29 Solution 2-3 230 124 Approximat e width of each class 21.2 5 Now we round this approximate width to a convenient number – say, 22. 30 Solution 2-3 The lower limit of the first class can be taken as 124 or any number less than 124. Suppose we take 124 as the lower limit of the first class. Then our classes will be 124 – 145, 146 – 167, 168 – 189, 190 – 211, and 212 - 233 31 Table 2.10 Frequency Distribution for the Data of Table 2.9 Total Home Runs Tally f 124 – 145 146 – 167 168 – 189 190 – 211 212 - 233 |||| | |||| |||| ||| |||| |||| ||| 6 13 4 4 3 ∑f = 30 32 Relative Frequency and Percentage Distributions Relative Frequency and Percentage Distributions Frequency of that class Relative frequency of a class Sum of all frequencie s f f Percentage (Relative frequency) 100 33 Example 2-4 Calculate the relative frequencies and percentages for Table 2.10 34 Solution 2-4 Table 2.11 Relative Frequency and Percentage Distributions for Table 2.10 Total Home Runs Class Boundaries Relative Frequency Percentage 124 – 145 146 – 167 168 – 189 190 – 211 212 - 233 123.5 to less than 145.5 145.5 to less than 167.5 167.5 to less than 189.5 189.5 to less than 211.5 211.5 to less than 233.5 .200 .433 .133 .133 .100 20.0 43.3 13.3 13.3 10.0 Sum = .999 Sum = 99.9% 35 Graphing Grouped Data Definition A histogram is a graph in which classes are marked on the horizontal axis and the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies, or percentages are represented by the heights of the bars. In a histogram, the bars are drawn adjacent to each other. 36 Figure 2.3 Frequency histogram for Table 2.10. 15 Frequency 12 9 6 3 0 124 - 146 145 167 168 - 190 - 212 - 189 211 233 Total home runs 37 Figure 2.4 Relative frequency histogram for Table 2.10. Relative Frequency .50 .40 .30 .20 .10 0 124 145 146 167 168 - 190 - 212 - 189 211 233 Total home runs 38 Graphing Grouped Data cont. Definition A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon. 39 Figure 2.5 Frequency polygon for Table 2.10. 15 Frequency 12 9 6 3 0 124 145 146 167 168 - 190 - 212 - 189 211 233 40 Frequency Distribution curve. Frequency Figure 2.6 x 41 Example 2-5 The following data give the average travel time from home to work (in minutes) for 50 states. The data are based on a sample survey of 700,000 households conducted by the Census Bureau (USA TODAY, August 6, 2001). 42 Example 2-5 22.4 19.7 21.6 15.4 21.1 18.2 27.0 21.9 22.1 25.4 23.7 21.7 23.2 19.6 24.9 19.8 17.6 16.0 21.4 25.5 26.7 17.7 16.1 23.8 20.1 23.4 22.5 22.3 21.9 17.1 23.5 23.7 24.4 21.9 22.5 21.2 28.7 15.6 24.3 29.2 19.9 22.7 26.7 26.1 31.2 23.6 24.2 22.7 22.6 20.8 Construct a frequency distribution table. Calculate the relative frequencies and percentages for all classes. 43 Solution 2-5 31.2 15.4 Approximat e width of each class 2.63 6 44 Solution 2-5 Table 2.12 Frequency, Relative Frequency, and Percentage Distributions of Average Travel Time to Work Class Boundaries 15 18 21 24 27 30 to to to to to to less than less than less than less than less than less than 18 21 24 27 30 33 f Relative Frequency Percentage 7 7 23 9 3 1 .14 .14 .46 .18 .06 .02 14 14 46 18 6 2 Σf = 50 Sum = 1.00 Sum = 100% 45 Example 2-6 The administration in a large city wanted to know the distribution of vehicles owned by households in that city. A sample of 40 randomly selected households from this city produced the following data on the number of vehicles owned: 5 1 1 2 0 1 1 2 1 1 1 3 3 0 2 5 1 2 3 4 2 1 2 2 1 2 2 1 1 1 4 2 1 1 2 1 1 4 1 3 Construct a frequency distribution table for these data, and draw a bar graph. 46 Solution 2-6 Table 2.13 Frequency Distribution of Vehicles Owned Vehicles Owned 0 1 2 3 4 5 Number of Households (f) 2 18 11 4 3 2 Σf = 40 47 Figure 2.7 Bar graph for Table 2.13. 20 18 16 Frequency 14 12 10 8 6 4 2 0 No Car 1 Car 2 Cars 3 Cars 4 Cars 5 Cars Vehicles ow ned 48 SHAPES OF HISTOGRAMS 1. 2. 3. Symmetric Skewed Uniform or rectangular 49 Figure 2.8 Symmetric histograms. 50 Figure 2.9 (a) (a) A histogram skewed to the right. (b) A histogram skewed to the left. (b) 51 Figure 2.10 A histogram with uniform distribution. 52 Figure 2.11 (a) and (b) Symmetric frequency curves. (c) Frequency curve skewed to the right. (d) Frequency curve skewed to the left. 53 CUMULATIVE FREQUENCY DISTRIBUTIONS Definition A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class. 54 Example 2-7 Using the frequency distribution of Table 2.10, reproduced in the next slide, prepare a cumulative frequency distribution for the home runs hit by Major League Baseball teams during the 2002 season. 55 Example 2-7 Total Home Runs f 124 – 145 146 – 167 168 – 189 190 – 211 212 - 233 6 13 4 4 3 56 Solution 2-7 Table 2.14 Class Limits 124 124 124 124 124 – – – – – 145 167 189 211 233 Cumulative Frequency Distribution of Home Runs by Baseball Teams Class Boundaries 123.5 to less than 145.5 123.5 to less than 167.5 123.5 to less than 189.5 123.5 to less than 211.5 123.5 to less than 233.5 Cumulative Frequency 6 6 6 6 6 + + + + 13 13 13 13 = + + + 19 4 = 23 4 + 4 = 27 4 + 4 + 3 = 30 57 CUMULATIVE FREQUENCY DISTRIBUTIONS cont. Calculating Cumulative Relative Frequency and Cumulative Percentage Cumulative frequency of a class Cumulative relative frequency Total observatio ns in the data set Cumulative percentage (Cumulativ e relative frequency) 100 58 Table 2.15 Cumulative Relative Frequency and Cumulative Percentage Distributions for Home Runs Hit by baseball Teams Class Limits Cumulative Relative Frequency Cumulative Percentage 124 – 145 124 – 167 124 – 189 124 – 211 124 - 233 6/30 = .200 19/30 = .633 23/30 = .767 27/30 = .900 30/30 = 1.00 20.0 63.3 76.7 90.0 100.0 59 CUMULATIVE FREQUENCY DISTRIBUTIONS cont. Definition An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of respective classes. 60 Cumulative frequency Figure 2.12 Ogive for the cumulative frequency distribution in Table 2.14 30 25 20 15 10 5 123.5 145.5 167.5 189.5 211.5 233.5 Total home runs 61 STEM-AND-LEAF DISPLAYS Definition In a stem-and-leaf display of quantitative data, each value is divided into two portions – a stem and a leaf. The leaves for each stem are shown separately in a display. 62 Example 2-8 The following are the scores of 30 college students on a statistics test: 75 69 83 52 72 84 80 81 77 96 61 64 65 76 71 79 86 87 71 79 72 87 68 92 93 50 57 95 92 98 Construct a stem-and-leaf display. 63 Solution 2-8 To construct a stem-and-leaf display for these scores, we split each score into two parts. The first part contains the first digit, which is called the stem. The second part contains the second digit, which is called the leaf. 64 Solution 2-8 We observe from the data that the stems for all scores are 5, 6, 7, 8, and 9 because all the scores lie in the range 50 to 98 65 Figure 2.13 Stem-and-leaf display. Stems Leaf for 52 5 6 7 8 9 2 Leaf for 75 5 66 Solution 2-8 After we have listed the stems, we read the leaves for all scores and record them next to the corresponding stems on the right side of the vertical line. 67 Figure 2.14 5 6 7 8 9 2 5 5 0 6 Stem-and-leaf display of test scores. 0 9 9 7 3 7 1 1 1 5 8 2 6 2 4 6 9 7 1 2 3 4 7 2 8 68 Figure 2.15 5 6 7 8 9 0 1 1 0 2 Ranked stem-and-leaf display of test scores. 2 4 1 1 2 7 5 2 3 3 8 2 4 5 9 5 6 7 9 9 6 7 7 6 8 69 Example 2-9 The following data are monthly rents paid by a sample of 30 households selected from a small city. 880 1210 1151 1081 721 1075 985 1231 932 630 1175 952 1023 775 850 825 1100 1140 1235 1000 750 750 965 960 915 1191 1035 1140 1370 1280 Construct a stem-and-leaf display for these data. 70 Solution 2-9 Figure 2.16 Stem-and-leaf display of rents. 6 7 8 9 10 11 12 13 30 75 80 32 23 91 10 70 50 25 52 81 51 31 21 50 15 35 40 35 50 60 85 65 75 00 75 40 00 80 71 Example 2-10 The following stem-and-leaf display is prepared for the number of hours that 25 students spent working on computers during the last month. 72 Example 2-10 0 1 2 3 4 5 6 7 8 6 1 2 2 1 3 2 7 6 4 5 6 4 9 7 8 6 9 9 8 4 5 7 5 6 Prepare a new stem-and-leaf display by grouping the stems. 73 Solution 2-10 Figure 2.17 0–2 3–5 6–8 Grouped stem-and-leaf display. 6 * 1 7 9 * 2 6 2 4 7 8 * 1 5 6 9 9 * 3 6 8 2 4 4 5 7 * * 5 6 74