CHAPTER 2 Descriptive Statistics: Tabular and Graphical Methods 19 20 CHAPTER 2 KEY CONCEPTSs CONCEPT EXAMPLES EXERCISES Summarizing Qualitative Data Frequency Distribution 1 1 Relative Frequency Distribution 2 1 Percent Frequency Distribution 2 1 Bar Graph 3 2 Pie Chart 4 2 Frequency Distribution 5 3 Relative Frequency Distribution 5 3 Percent Frequency Distribution 5 3 Histogram 6 4 Cumulative Distributions 7 5 Ogive 8 5 9 6 Crosstabulation 10 7 Scatter Diagram 11 8 Summarizing Quantitative Data Exploratory Data Analysis Stem-and-Leaf Display Relationship Between Two Variables Excel Used REVIEWs Summarizing Qualitative Data Frequency Distribution A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes. The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data. Excel Function for Qualitative Frequency Distributions TABULAR AND GRAPHICAL METHODS 21 Excel’s COUNTIF function can be used to construct a frequency distribution for qualitative data. The function has two arguments: First – the range of cell addresses containing the observations (qualitative data) to be counted by class. Second – the label, or address of the cell containing the label, of the class to be counted. Note that each time the function is employed, it counts the frequency of only one class – the class entered as the second argument in the function. If, for example, you have four classes of data, you would employ the function four times. Relative Frequency Distribution The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class. A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class. Percent Frequency Distribution The percent frequency of a class is the relative frequency multiplied by 100. A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class. Bar Graph A bar graph is a graphical device for depicting qualitative data that have been summarized in a frequency, relative frequency, or percent frequency distribution. On the horizontal axis we specify the labels that are used for each of the classes. A frequency, relative frequency, or percent frequency scale can be used for the vertical axis. Using a bar of fixed width drawn above each class label, we extend the height appropriately. The bars are separated to emphasize the fact that each class is a separate category. Pie Chart The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data. First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class. Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle. Excel’s Tool for Bar Graphs and Pie Charts Excel’s Chart Wizard can be used to construct bar graphs and pie charts. After frequencies, relative frequencies, or percent frequencies have been computed, these values can be graphically displayed using Chart Wizard. Summarizing Quantitative Data 22 CHAPTER 2 Frequency Distribution With quantitative data we have to be careful in defining the nonoverlapping classes to be used in the frequency distribution. The three steps necessary to define the classes are: Determine the number of nonoverlapping classes. Determine the width of each class. Determine the class limits. The guidelines for selecting number of classes are: Use between 5 and 20 classes. Data sets with a larger number of elements usually require a larger number of classes. Smaller data sets usually require fewer classes. The guidelines for selecting the width of classes are: Use classes of equal width. Approximate Class Width = Largest Data Value - Smallest Data Value Number of Classes Excel Function for Quantitative Frequency Distributions Excel’s FREQUENCY function can be used to construct a frequency distribution for quantitative data. Unlike simple Excel functions (COUNTIF for example), the FREQUENCY function can provide multiple values such as class frequencies. A formula containing a function that can return multiple values is called an array formula and must be entered in a special way. The function has two arguments: First – the range of cell addresses containing the observations (quantitative data) to be counted by class. Second – a list of the upper class limits (this tells Excel which frequency to put in each cell within the range of the array formula) After selecting the cells in which you want the frequencies to appear and typing in the array formula containing the FREQUENCY function, press CTRL+SHIFT+ENTER. The formula will be entered into each of the cells selected. Regardless of the number of classes, you enter the array formula only once and no copying/pasting is necessary. Histogram A common graphical presentation of quantitative data is a histogram. The variable of interest is placed on the horizontal axis and the frequency, relative frequency, or percent frequency is placed on the vertical axis. A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency. Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes. Excel’s Tool for Histograms Excel’s Chart Wizard can be used to construct histograms. TABULAR AND GRAPHICAL METHODS 23 After frequencies, relative frequencies, or percent frequencies have been computed, these values can be displayed graphically using Chart Wizard. Cumulative Distributions The cumulative frequency distribution shows the number of items with values less than or equal to the upper limit of each class. The cumulative relative frequency distribution shows the proportion of items with values less than or equal to the upper limit of each class. The cumulative percent frequency distribution shows the percentage of items with values less than or equal to the upper limit of each class. Ogive An ogive is a graph of a cumulative distribution. The data values are shown on the horizontal axis. Shown on the vertical axis are one of the following: cumulative frequencies, cumulative relative frequencies, or cumulative percent frequencies. The frequency (one of the above) of each class is plotted as a point. Straight lines connect the plotted points. Excel’s Tool for Ogives Excel’s Chart Wizard can be used to construct ogives. After cumulative frequencies have been computed, these values can be displayed graphically as a form of scatter diagram using Chart Wizard. Exploratory Data Analysis The techniques of exploratory data analysis consist of simple arithmetic and easy-todraw pictures that can be used to summarize data quickly. One such technique, for quantitative data, is the stem-and-leaf display. Stem-and-Leaf Display A stem-and-leaf display shows both the rank order and shape of the distribution of the data. It is similar to a histogram on its side, but it has the advantage of showing the actual data values. The first digit(s) of each data item are arranged to the left of a vertical line. To the right of the vertical line we record the last digit for each item in rank order. Each line in the display is referred to as a stem. Each digit on a stem is a leaf. Relationship Between Two Variables Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables. Crosstabulation and a scatter diagram are two methods for summarizing the data for two variables simultaneously. Crosstabulation A crosstabulation is a tabular summary of data for two variables. The two variables might both be qualitative, both be quantitative, or be one of each. 24 CHAPTER 2 The classes for one variable are represented by the rows; the columns represent the classes for the other variable. Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the variables. Excel’s Tool for Crosstabulation Excel’s PivotTable Report is a general tool for summarizing the data for two or more variables simultaneously. This tool can be used to construct a crosstabulation (an Excel PivotTable). Scatter Diagram A scatter diagram is a graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis. A positive relationship, negative relationship, or no relationship might be apparent to a manager with the aid of a scatter diagram. Excel’s Tool for Scatter Diagrams Excel’s Chart Wizard can be used to construct scatter diagrams. Once a scatter diagram has been developed, other Excel tools (covered in later chapters) can be used, for example to fit a trend line to the plotted data. EXAMPLESs EXAMPLE 1 FREQUENCY DISTRIBUTION – QUALITATIVE DATA Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 quests are shown below. Below Average Above Average Below Average Above Average Average Above Average Average Average Above Average Above Average Poor Above Average Above Average Below Average Poor Average Provide a frequency distribution, using Excel, showing the number of occurrences of each rating level in the sample. SOLUTION 1 USING EXCEL’S COUNTIF FUNCTION FOR FREQUENCY DISTRIBUTIONS TABULAR AND GRAPHICAL METHODS 25 Excel can be used to count the frequencies and construct a frequency distribution for the quality ratings data. Enter Data: The label “Rating Given” and the data for the 20-guest sample are entered into cells A1:A21. (Note: Misspelled data will not be counted as it should. Misspelling includes typing any spaces before or after the phrases!) Enter Functions and Formulas: Excel’s COUNTIF function can be used to count the number of times each rating level appears in cells A2:A21. We first enter a label and the rating levels in cells C1:C6, the label “Total” in cell C7, and the label “Frequency” in cell D1. Then, we enter the following function into cell D2: =COUNTIF($A$2:$A$21,C2) To count the number of times the other rating levels appear in our data, we copy the above formula into cells D3:D6. Finally, we total the frequency counts using the SUM function in cell D7. (Note: It is a good idea to total your frequency count as a partial check of the accuracy of your data and function entries. We know from our sample size that the total frequency should be 20.) Formula Worksheet 1 2 3 4 5 6 7 8 20 21 A Rating Given Above Average Below Average Above Average Average Average Above Average Above Average Excellent Poor B C Quality Rating Poor Below Average Average Above Average Excellent Total D Frequency =COUNTIF($A$2:$A$21,C2) =COUNTIF($A$2:$A$21,C3) =COUNTIF($A$2:$A$21,C4) =COUNTIF($A$2:$A$21,C5) =COUNTIF($A$2:$A$21,C6) =SUM(D2:D6) Note: Rows 9-19 are hidden. Value Worksheet 1 2 3 4 5 6 7 8 20 21 A Rating Given Above Average Below Average Above Average Average Average Above Average Above Average Excellent Poor B C Quality Rating Poor Below Average Average Above Average Excellent Total D Frequency 2 3 5 9 1 20 26 CHAPTER 2 Note: Rows 9-19 are hidden. EXAMPLE 2 RELATIVE AND PERCENT FREQUENCY DISTRIBUTIONS – QUALITATIVE DATA Refer to the quality ratings data in Example 1. Construct, using Excel, a relative frequency distribution and percent frequency distribution for the data. SOLUTION 2 USING EXCEL For a data set with n observations, the relative frequency of each class is computed as follows: Relative Frequency of a Class = (Frequency of the Class)/n Extending the worksheet shown in the solution to Example 1, we can construct the relative frequency and percent frequency distributions for the quality ratings data. Enter Data: The data set is already entered. We simply enter the label “Relative Frequency” in cell E1and “Percent Frequency” in cell F1. Enter Functions and Formulas: Using the relative frequency formula above, we compute the relative frequency for the Poor rating by entering the formula =D2/$D$7 in cell E2. We then copy cell E2 to cells E3:E6. To compute the percent frequency for the Poor rating we enter the formula =E2*100 into cell F2. We then copy cell F2 to cells F3:F6. Finally, we copy cell D7 to cells E7:F7 to compute the total of the relative frequencies (1.00) and the total of the percent frequencies (100). Formula Worksheet C D 1 Quality Rating Frequency 2 Poor =COUNTIF($A$2:$A$21,C2) 3 Below Average =COUNTIF($A$2:$A$21,C3) 4 Average =COUNTIF($A$2:$A$21,C4) 5 Above Average =COUNTIF($A$2:$A$21,C5) 6 Excellent =COUNTIF($A$2:$A$21,C6) 7 Total =SUM(D2:D6) 8 Note: Columns A and B are not shown. E Relative Frequency =D2/$D$7 =D3/$D$7 =D4/$D$7 =D5/$D$7 =D6/$D$7 =SUM(E2:E6) F Percent Frequency =E2*100 =E3*100 =E4*100 =E5*100 =E6*100 =SUM(F2:F6) TABULAR AND GRAPHICAL METHODS 27 Value Worksheet C 1 2 3 4 5 6 7 8 Quality Rating Poor Below Average Average Above Average Excellent Total D Frequency 2 3 5 9 1 20 E Relative Frequency 0.10 0.15 0.25 0.45 0.05 1.00 F Percent Frequency 10 15 25 45 5 100 Note: Columns A and B are not shown. EXAMPLE 3 BAR GRAPH Refer to the quality ratings data in Example 1. Display the frequencies (computed in Example 1) graphically with a bar graph using Excel. SOLUTION 3 USING EXCEL’S CHART WIZARD FOR BAR GRAPHS Extending the worksheet shown in the solution to Example 1, we can construct the bar graph using Excel’s Chart Wizard. A third task (in addition to Enter Data and Enter Formulas/Functions) is now necessary: Apply Tools. Enter Data: The data set was entered in Example 1. Enter Functions and Formulas: The functions and formulas for the relative frequencies we want to graph were entered in Example 1. Apply Tools: The following steps describe how to use Excel’s Chart Wizard to construct a bar graph using the frequency distribution appearing in cells C1:D6. Step 1 Select cells C1:D6 Step 2 Select the Chart Wizard button Step 3 When the Chart Wizard-Step 1 of 4-Chart Type dialog box appears: Choose Column in the Chart type list Choose Clustered Column from the Chart sub-type display Select Next > Step 4 When the Chart Wizard-Step 2 of 4-Chart Source Data dialog box appears Select Next > Step 5 When the Chart Wizard-Step 3 of 4-Chart Options dialog box appears: 28 CHAPTER 2 Select the Titles tab and then Type Bar Graph of Quality Ratings in the Chart title box Enter Quality Rating in the Value (X) axis box Enter Frequency in the Value (Y) axis box Select the Legend tab and then Remove the check in the Show Legend box Select Next > Step 6 When the Chart Wizard-Step 4 of 4-Chart Location dialog box appears: Specify the location for the new chart Select Finish to display the bar graph With practice and patience, you can alter the graph initially produced by Excel to look like the one below or to suit your personal preferences. A right-click on almost any item in the chart will bring up a menu of alteration options. Bar Graph C D E F Bar Graph of Quality Ratings 10 Frequency 9 10 11 12 13 14 15 16 17 18 19 20 21 22 8 6 4 2 0 Poor Below Average Average Above Average Excellent Quality Rating EXAMPLE 4 PIE CHART Refer to the quality ratings data in Example 1. Display the percent frequencies (computed in Example 2) graphically with a pie chart using Excel. SOLUTION 4 USING EXCEL’S CHART WIZARD FOR PIE CHARTS Excel’s Chart Wizard provides a general tool for constructing graphical displays such as pie charts. Extending the worksheet shown in the solution to Example 2, we can construct the pie chart. A third task (in addition to Enter Data and Enter Formulas/Functions) is necessary: Apply Tools. TABULAR AND GRAPHICAL METHODS 29 Enter Data: The data set was entered in Example 1. Enter Functions and Formulas: The functions and formulas for the percent frequencies we want to graph were entered in Example 2. Apply Tools: The following steps describe how to use Excel’s Chart Wizard to construct a pie chart using the percent frequency distribution appearing in cells C1:C6 and F1:F6. Step 1 Select cells C1:C6 and F1:F6 (To select nonadjacent cells, select cells C1:C6 and then press the Control key while selecting cells F1:F6.) Step 2 Select the Chart Wizard button Step 3 When the Chart Wizard-Step 1 of 4-Chart Type dialog box appears: Choose Pie in the Chart type list Choose Pie from the Chart sub-type display Select Next > Step 4 When the Chart Wizard-Step 2 of 4-Chart Source Data dialog box appears Select Next > Step 5 When the Chart Wizard-Step 3 of 4-Chart Options dialog box appears: Select the Titles tab and then Type Customers’ Quality Ratings at Marada in the Chart title box Select the Legend tab and then Remove the check in the Show Legend box Select the Data Labels tab and then Select Show Label and percent Select Show leader lines Select Next > Step 6 When the Chart Wizard-Step 4 of 4-Chart Location dialog box appears: Specify the location for the new chart Select Finish to display the pie chart With practice and patience, you can alter the chart initially produced by Excel to look like the one below or to suit your personal preferences. A right-click on almost any item in the chart will bring up a menu of alteration options. Pie Chart C 9 10 11 12 13 14 15 16 17 18 19 20 21 22 D E F Customers' Quality Ratings at Marada Inn Excellent 5% Above Average 45% Poor 10% Below Average 15% Average 25% G 30 CHAPTER 2 EXAMPLE 5 FREQUENCY DISTRIBUTION – QUANTITATIVE DATA The manager of Hudson Auto Repair would like to get a better picture of the distribution of costs for new parts used in the engine tune-up jobs done in the garage. A sample of 50 customer invoices for tune-ups has been taken and the costs of parts, rounded to the nearest dollar, are listed below. 91 71 104 85 62 78 69 74 97 82 93 72 62 88 98 57 89 68 68 101 75 66 97 83 79 52 75 105 68 105 99 79 77 71 79 80 75 65 69 69 73 72 80 67 62 62 76 109 74 73 Develop, using Excel, a frequency distribution for these cost data. Use your own judgment to determine the number of classes and class width that provide a distribution that will be meaningful and helpful to the manager. SOLUTION 5 USING EXCEL’S FREQUENCY FUNCTION Before we employ Excel we must define the nonoverlapping classes to be used in the frequency distribution. In this example the data is in dollars. Most people will think in increments of $5, $10, $20, and so on because these increments coincide with the denominations of U.S. currency. We should consider using one of these increments as the class width. If we round up the largest data value (109) to 110 and we round down the smallest data value (52) to 50, we have a range of 110 – 50 = 60 for the frequency distribution to span. If we choose 10 as the class width, the result will be 60/10 = 6 classes, which looks reasonable. The data is in integer dollar (no cents) amounts. If we set the lower limit of the first class at 50, the upper limit of the first class will be 59 (not 60). There are 10 dollar amounts between 50 and 59 inclusively. The second class will have limits of 60 and 69, and so on. Now we are ready to use Excel. Using Excel’s COUNTIF function to construct a frequency distribution for quantitative data is cumbersome. Excel’s FREQUENCY function is more appropriate here. Enter Data: The label “Parts Cost” and the cost data from the 50 customer invoices are entered into cells A1:A51. Enter Functions and Formulas: Descriptive labels are entered into cells C1 and D1, and the class limits 50-59, 60-69, and so on, are entered in a text format into cells C2:C7. TABULAR AND GRAPHICAL METHODS A B C Parts Parts Cost Cost 1 2 91 50-59 3 71 60-69 4 104 70-79 5 85 80-89 6 62 90-99 7 78 100-109 8 69 50 74 51 73 31 D Frequency Note: Rows 9-49 are hidden. The FREQUENCY function is not a “simple” Excel function. FREQUENCY is capable of providing multiple values and for this reason it is called an array formula. An array formula must be entered in a special way. Entering the Necessary Array Formula Step 1 Select cells D2:D7 (where the frequencies will appear) Step 2 Type the following formula: =FREQUENCY(A2:A51,{59,69,79,89,99,109}) Step 3 Hold down the CTRL and SHIFT keys while pressing the ENTER key (Array formula will be entered in cells D2:D7) A B C Parts Parts Cost 1 Cost 2 91 50-59 3 71 60-69 4 104 70-79 5 85 80-89 6 62 90-99 7 78 100-109 8 69 50 74 51 73 D Frequency 2 13 17 7 6 5 Note: Rows 9-49 are hidden. EXAMPLE 6 32 CHAPTER 2 HISTOGRAM Refer to the auto parts cost data in Example 5. Display the frequency distribution (constructed in Example 5) graphically with a histogram using Excel. SOLUTION 6 USING EXCEL’S CHART WIZARD FOR HISTOGRAMS Enter Data: The data set was entered in Example 5. Enter Functions and Formulas: The functions and formulas for the frequencies we want to graph were entered in Example 5. Apply Tools: The following steps describe how to use Excel’s Chart Wizard to construct a Histogram. Step 1 Select cells C1:D7 Step 2 Select the Chart Wizard button Step 3 When the Chart Wizard-Step 1 of 4-Chart Type dialog box appears: Choose Column in the Chart type list Choose Clustered Column from the Chart sub-type display Select Next > Step 4 When the Chart Wizard-Step 2 of 4-Chart Source Data dialog box appears Select Next > Step 5 When the Chart Wizard-Step 3 of 4-Chart Options dialog box appears: Select the Titles tab and then Type Histogram for Parts Cost Data in the Chart title box Enter Parts Cost ($) in the Value (X) axis box Enter Frequency in the Value (Y) axis box Select the Legend tab and then Remove the check in the Show Legend box Select Next > Step 6 When the Chart Wizard-Step 4 of 4-Chart Location dialog box appears: Specify the location for the new chart Select Finish to display the histogram Initial Histogram TABULAR AND GRAPHICAL METHODS C E Histogram for Parts Cost Data 20 Frequency 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 D 15 10 5 0 50-59 60-69 70-79 80-89 90-99 Parts Cost ($) Eliminating Gaps Between Rectangles Step 1 Right click on any rectangle in the column chart Step 2 Select the Format Data Series option Step 3 When the Format Data Series Option dialog box appears: Select the Options tab and then Enter 0 in the Gap width box Select OK Finished Histogram 100-109 33 34 CHAPTER 2 C D E Histogram for Parts Cost Data 20 Frequency 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 15 10 5 0 50-59 60-69 70-79 80-89 90-99 100-109 Parts Cost ($) EXAMPLE 7 CUMULATIVE DISTRIBUTIONS Refer to the auto parts cost data in Example 5. Develop a cumulative frequency distribution and a cumulative percent frequency distribution for this data using Excel. SOLUTION 7 USING EXCEL FOR CUMULATIVE DISTRIBUTIONS Extending the worksheet shown in the solution to Example 5, we can construct the cumulative frequency and cumulative percent frequency distributions for the cost data. Enter Data: The data set was entered in Example 5. Enter Functions and Formulas: The functions and formulas for the frequencies we need were entered in Example 5. The first cumulative frequency (2) is simply the frequency for the first class, so we enter the formula =D2 in cell E2. The second cumulative frequency (15) is equal to the frequency of the second class (13) plus the prior cumulative frequency (2), so we enter the formula =D3+E2 in cell E3. Now, we copy cell E3 to cells E4:E7 to complete the cumulative distribution. To compute the cumulative percent frequency distribution, we must convert each cumulative frequency value to a cumulative relative frequency value and then multiply it by 100. We enter the formula =E2/$D$8*100 into cell F2 to compute the first cumulative percent frequency. We then copy cell F2 to cells F3:F7 to complete our distribution. Formula Worksheet 35 TABULAR AND GRAPHICAL METHODS 1 2 3 4 5 6 7 8 A B C D Parts Parts Cost Cost Frequency 91 $50-59 2 71 $60-69 13 104 $70-79 16 85 $80-89 7 62 $90-99 7 78 $100-109 5 69 Sum =SUM(D2:D7) E Cumulative Frequency =D2 =D3+E2 =D4+E3 =D5+E4 =D6+E5 =D7+E6 F Cumulative % Frequency =E2/$D$8*100 =E3/$D$8*100 =E4/$D$8*100 =E5/$D$8*100 =E6/$D$8*100 =E7/$D$8*100 G F Cumulative Frequency 2 15 31 38 45 50 G Cumulative % Frequency 4 30 62 76 90 100 G Note: Rows 9-51 are not shown. Value Worksheet 1 2 3 4 5 6 7 8 A B C Parts Parts Cost Cost 91 $50-59 71 $60-69 104 $70-79 85 $80-89 62 $90-99 78 $100-109 69 Sum D Frequency 2 13 16 7 7 5 50 Note: Rows 9-51 are not shown. EXAMPLE 8 OGIVE Refer to the auto parts cost data in Example 5. Construct an ogive, using Excel, showing the cumulative percent frequency distribution for the data. SOLUTION 8 USING EXCEL’S CHART WIZARD FOR OGIVES We will extend the worksheet shown in the solution to Example 7 to develop the ogive. We need to modify the worksheet in two ways before we can apply the Chart Wizard tool. We will “insert” (figuratively) a new row between existing rows 1 and 2, and we will insert (literally) a new column between existing columns C and D. The reason for a “new” row is that we need a starting point for the ogive showing that no 36 CHAPTER 2 data values fall below the 50-59 class. In other words, we need to create a 40-49 class with a cumulative percent frequency equal to 0. The reason for a new column is this: Because the class limits are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on. These gaps are eliminated by plotting points, on our ogive, at the midpoints of the gaps. Thus, 59.5 is used as the upper limit for the 50-59 class, 69.5 is used for the 60-69 class, and so on. We will enter these new upper limits in the new column. Enter Data: The data set was entered in Example 5. Enter Functions and Formulas: The formulas for the cumulative percent frequencies we need were entered in Example 7. Now, in order to enter the new 40-49 class in the worksheet we must free up a row. To do this, we drag cells C2:G8 down one row to cells C3:G9. This frees up cells C2:G2 for the new 40-49 class information. Next, we will insert a new column between existing columns C and D. To do this, select any cell in column D and then go to the main menu, select Insert, and select Columns in the drop-down menu. The worksheet will now look like this: 1 2 3 4 5 6 7 8 9 A B C Parts Parts Cost Cost 91 71 50-59 104 60-69 85 70-79 62 80-89 78 90-99 69 100-109 74 Sum D E Frequency 2 13 16 7 7 5 50 F Cumulative Frequency G Cumulative % Frequency 2 15 31 38 45 50 4 30 62 76 90 100 In the new column D, enter the label “Parts Cost Upper Limit” in cell D1 and enter the new upper limits 49.5, 59.5, and so on, in cells D2:D8. In cell C2 enter, in text format, 4049. In cells E2:G2 enter zeros. Finally, drag the Sum label in cell C9 to cell D9. The worksheet will now look like this: A B C D E F G Parts Parts Parts Cost Cumulative Cumulative 1 Cost Cost Upper Limit Frequency Frequency % Frequency 2 91 40-49 49.5 0 0 0 3 71 50-59 59.5 2 2 4 4 104 60-69 69.5 13 15 30 5 85 70-79 79.5 16 31 62 6 62 80-89 89.5 7 38 76 7 78 90-99 99.5 7 45 90 8 69 100-109 109.5 5 50 100 9 74 Sum 50 Apply Tools: The following steps describe how to use Excel’s Chart Wizard to construct an 37 TABULAR AND GRAPHICAL METHODS ogive. Step 1 Select cells D2:D8 and G2:G8 (To select nonadjacent cells, select cells D2:D8 and then press the Control key while selecting cells G2:G8.) Step 2 Select the Chart Wizard button Step 3 When the Chart Wizard-Step 1 of 4-Chart Type dialog box appears: Choose XY (Scatter) in the Chart type list Choose Scatter with data points connected by lines from the Chart sub-type display Select Next > Step 4 When the Chart Wizard-Step 2 of 4-Chart Source Data dialog box appears Select Next > Step 5 When the Chart Wizard-Step 3 of 4-Chart Options dialog box appears: Select the Titles tab and then Type Ogive for the Parts-Cost Data in the Chart title box Enter Parts Cost ($) in the Value (X) axis box Enter Cumulative Percent Frequency in the Value (Y) axis box Select the Legend tab and then Remove the check in the Show Legend box Select Next > Step 6 When the Chart Wizard-Step 4 of 4-Chart Location dialog box appears: Specify the location for the new chart Select Finish to display the ogive Ogive C E F G H Ogive for the Parts-Cost Data 100 90 80 Cumulative Percent Frequency 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 D EXAMPLE 9 70 60 50 40 30 20 10 0 0 10 20 30 40 50 60 70 Parts Cost ($) 80 90 100 110 120 38 CHAPTER 2 STEM-AND-LEAF DISPLAY Refer to the auto parts cost data in Example 5. a) Develop a stem-and-leaf display showing both the rank order and shape of the data set simultaneously. b) Also, develop a stretched stem-and-leaf display using two stems for each leading digit(s). c) Which of the two displays is better in terms of revealing the natural grouping and variation in the data? SOLUTION 9 a) To develop a stem-and-leaf display, we first arrange the leading digits (all but the last digit) of each data value to the left of a vertical line. To the right of the vertical line, we record the last digit for each data value as we pass through the observations in the order they are recorded. The last digit for each data value is placed on the line corresponding to its first digit. At this point, the display will look like this: 5 6 7 8 9 10 7 2 8 0 1 4 2 9 5 9 3 5 6 3 0 9 9 2 1 5 7 1 8 2 8 7 5 5 8 8 9 7 2 9 2 5 9 5 2 6 4 7 1 4 9 9 3 3 2 8 Next, we sort (in ascending order) the digits on each line. The result is the finished stem-and-leaf display. 5 6 7 8 9 10 2 2 1 0 1 1 7 2 1 0 3 4 2 2 2 7 5 2 2 3 7 5 5 3 5 8 9 6 7 8 8 8 9 9 9 3 4 4 5 5 5 6 7 8 9 9 9 8 9 9 b) If we believe that our stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s). All data values ending in 0, 1, 2, 3, and 4 are placed on one line, and all values ending in 5, 6, 7, 8, and 9 are placed on a second line. The resulting stretched stem-and-leaf display will look like this: TABULAR AND GRAPHICAL METHODS 5 5 6 6 7 7 8 8 9 9 10 10 2 7 2 5 1 5 0 5 1 7 1 5 2 6 1 5 0 8 3 7 4 5 2 7 2 5 2 9 39 2 8 8 8 9 9 9 2 3 3 4 4 6 7 8 9 9 9 3 8 9 9 c) The stretched stem-and-leaf display in (b) does a better job of revealing the dispersion of the data. EXAMPLE 10 CROSSTABULATION Ithaca Log Homes manufactures four styles of log houses that are sold in kits. The number of log houses the company has sold, by price and style, for the past year is shown below. <99 Colonial >100 A-Frame >100 Colonial <99 Ranch >100 Split-Level <99 Colonial >100 Split-Level <99 Colonial <99 A-Frame >100 Split-Level >100 Ranch >100 Split-Level <99 Colonial >100 Colonial >100 Ranch <99 A-Frame <99 A-Frame <99 Split-Level <99 Split-Level <99 Split-Level >100 Split-Level <99 A-Frame <99 Split-Level >100 Colonial >100 Ranch <99 Colonial >100 Ranch >100 Split-Level <99 Ranch >100 Split-Level <99 A-Frame >100 Split-Level <99 Colonial <99 Colonial >100 Colonial >100 Colonial >100 Ranch <99 Split-Level <99 Split-Level <99 Colonial Prepare a crosstabulation for the variables price and style. SOLUTION 10 40 CHAPTER 2 USING EXCEL’S PIVOTTABLE REPORT FOR CROSSTABULATION On a new worksheet, enter the data in columns A, B, and C. A B 1 Home Price ($) >100K 2 1 <99K 3 2 >100K 4 3 <99K 5 4 <99K 6 5 <99K 7 6 >100K 8 7 >100K 9 8 C Style Colonial Ranch Ranch A-Frame Colonial Split-Level A-Frame Colonial D E Note: Rows 10-41 are not shown. Changing the Default Order for the PivotTable Report Step 1 Select the Tools pull-down menu Step 2 Choose Options Step 3 When the Options dialog box appears: Select the Custom lists tab In the List entries: box, type <99K and press Enter, and type >100K Select Add Select OK We are now ready to use the PivotTable Report to construct a crosstabulation. Using the PivotTable Report Step 1 Select the Data pull-down menu Step 2 Choose the PivotTable and PivotChart Report Step 3 When the PivotTable and PivotChart Wizard-Step 1 of 3 dialog box appears: Choose Microsoft Excel list or database Choose PivotTable Select Next > Step 4 When the PivotTable and PivotChart Wizard Step 2 of 3 dialog box appears: Enter A1:C41 in the Range box Select Next > Step 5 When the PivotTable and PivotChart Wizard Step 3 of 3 dialog box appears: Select New Worksheet Click on the Layout button TABULAR AND GRAPHICAL METHODS 41 When the PivotTable and PivotChart Wizard – Layout diagram appears: Drag the Price ($) field button to the ROW section of the diagram Drag the Style field button to the COLUMN section of the diagram Drag the Home field button to the DATA section of the diagram Double click the Sum of Home field button in the data section When the PivotTable Field dialog box appears: Choose Count under Summarized by: Select OK Select OK When the PivotTable and PivotChart Wizard-Step 3 of 3 dialog box reappears: Select Finish > 1 2 3 4 5 6 E F G H Count of Home Style Price ($) Colonial Ranch Split-Level <99K 8 2 8 >100K 5 5 6 Grand Total 13 7 14 I J A-Frame Grand Total 5 23 1 17 6 40 EXAMPLE 11 SCATTER DIAGRAM The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored. The following data was collected for five recent games. x = Number of y = Number of Interceptions Points Scored 1 14 3 24 2 18 1 17 3 27 Develop a scatter diagram to show the relationship between the two variables, number of interceptions and number of points scored. SOLUTION 11 USING EXCEL’S CHART WIZARD FOR SCATTER DIAGRAMS 42 CHAPTER 2 Enter Data: The appropriate labels and the data for the five football games are entered into cells A1:B6. 1 2 3 4 5 6 7 A Number of Interceptions 1 3 2 1 3 B Number of Points Scored 14 24 18 17 27 C Enter Functions and Formulas: No functions or formulas are needed. Apply Tools: The following steps describe how to use Excel’s Chart Wizard to produce a scatter diagram from the data in the worksheet. Step 1 Select cells A1:B6 Step 2 Select the Chart Wizard Step 3 When the Chart Wizard-Step 1 of 4-Chart Type dialog box appears: Choose XY (Scatter) in the Chart type list Choose Scatter from the Chart sub-type display Select Next > Step 4 When the Chart Wizard-Step 2 of 4-Chart Source Data dialog box appears Select Next > Step 5 When the Chart Wizard-Step 3 of 4-Chart Options dialog box appears: Select the Titles tab and then Delete Number of Points Scored in the Chart title box Enter Number of Interceptions in the Value (X) axis box Enter Number of Points Scored in the Value (Y) axis box Select the Legend tab and then Remove the check in the Show Legend box Select Next > Step 6 When the Chart Wizard-Step 4 of 4-Chart Location dialog box appears: Specify the location for the new chart Select Finish to display the scatter diagram TABULAR AND GRAPHICAL METHODS 43 Finished Scatter Diagram Scatter Diagram for the Panthers 30 25 Number of Points Scored 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 D C B A 20 15 10 5 0 0 3 2 1 4 Number of Interceptions EXERCISESs EXERCISE 1 FREQUENCY DISTRIBUTIONS – QUALITATIVE DATA It is time for Roger Hall, manager of new car sales at the Maxwell Ford dealership, to submit his order for new Mustang coupes. These cars will be parked in the lot, available for immediate sale to buyers who are not special-ordering a car. One of the decisions he must make is how many Mustangs of each color option he should order. The new color options are very similar to the past year’s options. Roger believes that the colors chosen by customers who special-order their cars best reflect most customers’ true color preferences. For that reason, he has taken a random sample of 40 special orders for Mustang coupes placed in the past year. The color preferences found in the sample are listed below. Blue Black Red Green Blue Black Red White Black Red Green White Blue Red Black White Blue White Black White Black Blue Red Blue Black Red Green Red Black Red Red Red Black White Black White Black Black Green Blue Prepare, using Excel, a frequency distribution, relative frequency distribution, and percent frequency distribution for the data set. 44 CHAPTER 2 EXERCISE 2 BAR GRAPH AND PIE CHART Refer to the data set in Exercise 1. Construct, using Excel, a bar graph showing the frequency distribution of the car colors. Also construct, using Excel, a pie chart showing the percent frequency distribution of the car colors. EXERCISE 3 FREQUENCY DISTRIBUTIONS – QUANTITATIVE DATA Missy Walters owns a mail-order business specializing in clothing, linens, and furniture for children. She is considering offering her customers a discount on shipping charges for furniture based on the dollar-amount of the furniture order. Before Missy decides the discount policy, she needs a better understanding of the dollar-amount distribution of the furniture orders she receives. Missy had an assistant randomly select 50 recent orders that included furniture and record the value, to the nearest dollar, of the furniture portion of each order. The data collected is listed below. 136 211 194 277 231 281 162 242 348 154 226 212 368 173 166 123 241 258 409 214 178 182 323 264 311 445 290 196 237 141 231 434 183 490 159 389 167 209 222 362 196 246 198 472 189 175 338 212 248 260 Prepare, using Excel, a frequency distribution, relative frequency distribution, and percent frequency distribution for the data set. Use your own judgment to determine the number of classes and class width that provide a distribution that will be meaningful and helpful to Missy in deciding the shipping discount policy. EXERCISE 4 HISTOGRAM Refer to the data set in Exercise 3. Construct, using Excel, a histogram showing the percent frequency distribution of the furniture-order values in the sample. EXERCISE 5 TABULAR AND GRAPHICAL METHODS 45 CUMULATIVE DISTRIBUTIONS AND OGIVE Refer to the data set in Exercise 3. Develop a cumulative frequency distribution and a cumulative percent frequency distribution for this data using Excel. Then construct, using Excel, an ogive showing the cumulative percent frequency distribution. EXERCISE 6 STEM-AND-LEAF DISPLAY Refer to the furniture-order values data in Example 3. Develop a stem-and-leaf display for the data set. (Employ the following tips to avoid having 38 stems and other shortcomings.) 1) Set the leaf unit equal to 10. In other words, convert the data to 10’s of dollars. For example, the data value 226 would be treated as 22. The last digit, 6, is ignored. 2) Try using a stretched stem-and-leaf display. That is, have two stems labeled 1, two stems labeled 2, and so on. Values 10-14 are entered on the first stem labeled 1, values 15-19 are entered on the second stem labeled 1, and so on. EXERCISE 7 CROSSTABULATION Tony Zamora, a real estate investor, has just moved to Clarksville and wants to learn about the local real estate market. He wants to understand, for example, the relationship between geographical segment of the city and selling price of a house, the relationship between selling price and number of bedrooms, and so on. Tony has randomly selected 25 house-for-sale listings from the Sunday newspaper and collected the data listed below. a) Use Excel to construct a crosstabulation for the variables city segment and number of bedrooms. b) Compute the row percentages for your crosstabulation in part (a). c) Comment on any apparent relationship between the variables. 46 CHAPTER 2 Segment of City Northwest South Northeast Northwest West South West West Northwest Northeast Northwest South Northwest West South South Northeast Northwest West Northeast West Northwest South Northeast West Selling Price ($000) 290 95 170 375 350 125 310 275 340 215 295 190 385 430 185 175 190 330 405 170 365 280 135 205 260 House Size Number of (00 sq. ft.) Bedrooms 21 4 11 2 19 3 38 5 24 4 10 2 31 4 25 3 27 5 22 4 20 4 24 4 36 5 32 5 14 3 18 4 19 4 29 4 33 5 23 4 34 5 25 4 17 3 21 4 26 4 Number of Bathrooms 2 1 2 4 3 2 4 2 3 3 3 3 4 4 2 2 2 4 4 2 4 2 1 3 3 Garage Size (cars) 2 0 2 3 2 0 2 2 3 2 2 2 3 2 1 2 2 3 3 2 3 2 1 2 2 EXERCISE 8 SCATTER DIAGRAM Refer to the real estate data in Exercise 7. Develop a scatter diagram to show the relationship between the two variables size of house and number of bathrooms. Place the variable number of bathrooms on the horizontal axis. TABULAR AND GRAPHICAL METHODS 47 SELF-TESTs TRUE/FALSE 1. The lines that connect the points plotted in an ogive cannot have a negative slope. 2. In a stem-and-leaf display, a single digit is used to define each leaf, while more than one digit can be used to define each stem. 3. At least one of the variables in a crosstabulation must be a quantitative variable. 4. There should be no gaps between adjacent bars in a histogram. 5. For a bar graph to be worthwhile, the data being displayed must involve at least two variables. FILL-IN-THE-BLANK 1. The techniques of _____________________ consist of simple arithmetic and easy-to-draw graphs that can be used to summarize data quickly. 2. A ______________________ is a tabular summary of data showing the number of items in each of several nonoverlapping classes. 3. The last entry in a ____________________________ distribution is always 1.00. 4. In general practice, cumulative frequency distributions are appropriate for summarizing _____________ data and not _______________ data. 5. Adjacent bars are touching in a bar graph and are not touching in a _____________. MULTIPLE CHOICE 1. Which one of the following graphical methods is most appropriate for qualitative data? a) ogive b) scatter diagram c) histogram d) pie chart 2. A graphical method that will assist in the understanding of the relationship between two variables is a a) crosstabulation b) scatter diagram c) stem-and-leaf display d) bar graph 3. A graphical method that can be used to show both the rank order and shape of a data set 48 CHAPTER 2 simultaneously is a a) relative frequency distribution b) pie chart c) stem-and-leaf display d) pivot table 4. The proper way to construct a stem-and-leaf display for the data set {62,67,68,73,73,79,91,94,95,97} is to a) exclude a stem labeled ‘8’ b) include a stem labeled ‘8’ and enter no leaves on the stem c) include a stem labeled ‘(8)’ and enter no leaves on the stem d) include a stem labeled ‘8’ and enter one leaf value of ‘0’ on the stem 5. The graphical method presented in the chapter for displaying cumulative frequencies is a) an ogive b) a stem-and-leaf display c) a histogram d) a bar graph ANSWERSs EXERCISES 1) 1 2 3 4 5 6 7 C D Color of Car Black Blue Green Red White Total Frequency 12 7 4 10 7 40 2) Bar Graph - Frequencies E Relative Frequency 0.300 0.175 0.100 0.250 0.175 1.000 F Percent Frequency 30.0 17.5 10.0 25.0 17.5 100.0 TABULAR AND GRAPHICAL METHODS Car Color Frequencies 14 Frequency 12 10 8 6 4 2 0 Black Blue Green Red White Car Color Pie Chart – Percent Frequencies Car Colors' Percent Frequencies Black 30.0% White 17.5% Red 25.0% Blue 17.5% Green 10.0% 3) 1 2 3 4 5 6 7 8 9 4) D Parts Cost 100-149 150-199 200-249 250-299 300-349 350-399 400-449 450-499 E Frequency 3 15 14 6 4 3 3 2 F Relative Frequency 0.06 0.30 0.28 0.12 0.08 0.06 0.06 0.04 G Cum. Rel. Frequency 0.06 0.36 0.64 0.76 0.84 0.90 0.96 1.00 49 50 CHAPTER 2 Percent Frequencies of Furniture-Order Values 35 Percent Frequency 30 25 20 15 10 5 0 100-149 150-199 200-249 250-299 300-349 350-399 Value of Furniture Order ($) 5) D Parts Cost 100-149 150-199 200-249 250-299 300-349 350-399 400-449 450-499 1 2 3 4 5 6 7 8 9 E Frequency 3 15 14 6 4 3 3 2 F Cumulative Frequency 3 18 32 38 42 45 48 50 6) 1 1 2 2 3 3 4 4 2 5 0 5 1 6 0 7 3 5 1 6 2 6 3 9 4 6 1 6 3 8 4 6 6 7 7 7 8 8 8 9 9 9 9 1 1 2 2 3 3 3 4 4 4 4 7 8 9 4 G Cumulative % Frequency 6 36 64 76 84 90 96 100 TABULAR AND GRAPHICAL METHODS 51 7) a) Crosstabulation: 1 2 3 4 5 6 7 D Count of Home Segment of City Northeast Northwest South West Grand Total E F G Number of Bedrooms 2 3 4 0 1 4 0 0 4 2 2 2 0 1 3 2 4 13 H I 5 0 3 0 3 6 Grand Total 5 7 6 7 25 E F G Number of Bedrooms 2 3 4 0.0 20.0 80.0 0.0 0.0 57.1 33.3 33.3 33.3 0.0 14.3 42.9 H I 5 0.0 42.9 0.0 42.9 Grand Total 100.0 100.0 100.0 100.1 b) Row Percentages: 1 2 3 4 5 6 D Percent of Home Segment of City Northeast Northwest South West c) We see that fewest bedrooms are associated with the South, and the most bedrooms are associated with the West and particularly the Northwest. 8) Scatter Diagram for the Real Estate Data 40 35 House Size (00 sq.ft.) 30 25 20 15 10 5 0 0 1 2 3 Number of Bathrooms TRUE/FALSE 4 5 52 1. 2. 3. 4. 5. CHAPTER 2 True True False True False FILL-IN-THE-BLANK 1. 2. 3. 4. 5. exploratory data analysis frequency distribution cumulative relative frequency quantitative, qualitative histogram MULTIPLE CHOICE 1. 2. 3. 4. 5. d b c b a