• Exploratory data analysis
• Crosstabulations and scatter diagrams

Exploratory data analysis consists of simple arithmetic and easy-to-draw graphs that can be used to summarize data quickly.

The Stem-and-Leaf Display
• A stem-and-leaf display shows both the rank order and the shape of the distribution of the data.
• It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
• The first digits of each data item are arranged to the left of a vertical line.
• To the right of the vertical line we record the last digit for each item in rank order.

Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.

Sample parts cost ($) for 50 tune-ups
   91   71  104   85   62   78   69   74   97   82
   93   72   62   88   98   57   89   68   68  101
   75   66   97   83   79   52   75  105   68  105
   99   79   77   71   79   80   75   65   69   69
   97   72   80   67   62   62   76  109   74   73

A Stem-and-Leaf Display for the Auto Parts Cost Data
 Stem | Leaf
    5 | 2 7
    6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
    7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
    8 | 0 0 2 3 5 8 9
    9 | 1 3 7 7 7 8 9
   10 | 1 4 5 5 9

Stretched Stem-and-Leaf Display
• If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s).
• Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.

Stretched Stem-and-Leaf Display for the Hudson Auto Parts Cost Data
 Stem | Leaf
    5 | 2
    5 | 7
    6 | 2 2 2 2
    6 | 5 6 7 8 8 8 9 9 9
    7 | 1 1 2 2 3 4 4
    7 | 5 5 5 6 7 8 9 9 9
    8 | 0 0 2 3
    8 | 5 8 9
    9 | 1 3
    9 | 7 7 7 8 9
   10 | 1 4
   10 | 5 5 9

Leaf Units
A single digit is used to define each leaf. In the preceding example the leaf unit was 1, but it does not have to be 1. The leaf unit can also be, for example, 0.1, 10, or 100.

Example: Leaf Unit = 0.1
Suppose we have the following data:
   8.6   11.7   9.4   10.2   11.0   8.8
The leaf unit is 0.1.
Thus:
    8 | 6 8
    9 | 4
   10 | 2
   11 | 0 7

Example: Leaf Unit = 10
If we have data with values such as
   1806   1717   1974   1791   1682   1910   1838
a stem-and-leaf display of these data will be

Leaf Unit = 10
   16 | 8
   17 | 1 9
   18 | 0 3
   19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.

Crosstabulations and Scatter Diagrams
So far we have considered only ONE variable (parts cost, audit time). But often we are interested in tabular and graphical methods that uncover the relationship between TWO variables.

Crosstabulations
A crosstabulation is a tabular method for summarizing the data for two variables simultaneously.
Crosstabulations can be used when
• one variable is qualitative and the other is quantitative,
• both variables are qualitative, or
• both variables are quantitative.

Example: Finger Lakes Homes
Crosstabulation
The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Home style is the qualitative variable; price range is the quantitative variable.

                          Home Style
 Price Range    Colonial    Log    Split    A-Frame    Total
 < $99,000         18         6      19       12         55
 > $99,000         12        14      16        3         45
 Total             30        20      35       15        100

Crosstabulation: Row or Column Percentages
• Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

Crosstabulation: Row Percentages

                          Home Style
 Price Range    Colonial    Log      Split    A-Frame    Total
 < $99,000       32.73     10.91     34.55     21.82      100
 > $99,000       26.67     31.11     35.56      6.67      100

Note: row totals are actually 100.01 due to rounding.
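As a rough illustration, the row percentages in the table above can be recomputed from the Finger Lakes counts with a few lines of Python. This is only a sketch: the names `counts`, `row_percentages`, and `column_percentages` are made up for this example and do not come from the slides. Dividing by column totals instead of row totals gives the column percentages.

```python
# Counts from the Finger Lakes Homes crosstabulation (price range x home style).
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}


def row_percentages(table):
    """Express each count as a percentage of its row (price range) total."""
    return {
        label: [round(100 * n / sum(row), 2) for n in row]
        for label, row in table.items()
    }


def column_percentages(table):
    """Express each count as a percentage of its column (home style) total."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {
        label: [round(100 * n / total, 2) for n, total in zip(row, col_totals)]
        for label, row in table.items()
    }


row_pct = row_percentages(counts)
col_pct = column_percentages(counts)
for label in counts:
    print(label, dict(zip(styles, row_pct[label])))
```

Because each entry is rounded to two decimals, the printed row percentages sum to 100.01 rather than exactly 100, which is the rounding effect mentioned in the note.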
Row percentage example: (Colonial and > $99K)/(All > $99K) x 100 = (12/45) x 100 = 26.67

Crosstabulation: Column Percentages

                          Home Style
 Price Range    Colonial    Log      Split    A-Frame
 < $99,000       60.00     30.00     54.29     80.00
 > $99,000       40.00     70.00     45.71     20.00
 Total          100       100       100       100

Column percentage example: (Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100 = 40.00

Using Excel’s PivotTable Report to Construct a Crosstabulation
(Chapter 2 file Restaurant.xlsx)
Step 1: Click the Insert tab on the ribbon
Step 2: In the Tables group, click the icon above PivotTable
Step 3: When the Create PivotTable dialog box appears:
   – Choose Select a table or range
   – Enter A1:C301 in the Table/Range box
   – Select New Worksheet
   – Click OK

Using the PivotTable Field List
• Step 1: In the PivotTable Field List, go to Choose Fields to add to report:
   – Drag the Quality Rating field to the Row Labels area.
   – Drag the Meal Price ($) field to the Column Labels area.
   – Drag the Restaurant field to the Values area.
• Step 2: Click Sum of Restaurant in the Values area
   – Select Value Field Settings.
• Step 3: When the Value Field Settings dialog box appears:
   – Under Summarize value field by, choose Count
   – Click OK

Finalizing the PivotTable Report
• Step 1: Right-click in cell B4 (or any other cell containing meal prices)
   – Select Group
• Step 2: When the Grouping dialog box appears:
   – Enter 10 in the Starting at box
   – Enter 49 in the Ending at box
   – Enter 10 in the By box
• Step 3: Right-click on Excellent in cell A5
   – Choose Move
   – Select Move “Excellent” to End
• Step 4: Close the PivotTable Field List box

Crosstabulation for the LA Restaurant Example

                            Meal Price ($)
 Quality Rating    10-19    20-29    30-39    40-49    Grand Total
 Good                42       40        2        0          84
 Very Good           34       64       46        6         150
 Excellent            2       14       28       22          66
 Grand Total         78      118       76       28         300

Crosstabulation: Simpson’s Paradox
Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.
Simpson’s paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.

                          Judge
 Verdict       Kendall        Luckett        Total
 Upheld        129 (86%)      110 (88%)       239
 Reversed       21 (14%)       15 (12%)        36
 Total (%)     150 (100%)     125 (100%)      275

You might think Luckett is the better judge. However, a larger share of Kendall’s cases were in municipal court, where the likelihood of being overturned on appeal is higher.

Scatter Diagram and Trendline
A scatter diagram is a graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis. The general pattern of the plotted points suggests the overall relationship between the variables. A trendline is a line that approximates the relationship.

[Figures: three scatter diagrams of y against x, titled "A Positive Relationship", "A Negative Relationship", and "No Apparent Relationship"]

Example: Panthers Football Team
• Scatter Diagram
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

 x = Number of Interceptions    y = Number of Points Scored
             1                              14
             3                              24
             2                              18
             1                              17
             3                              30

[Figure: Scatter Diagram. Horizontal axis: Number of Interceptions (0 to 4); vertical axis: Number of Points Scored (0 to 35)]

Example: Panthers Football Team
Insights Gained from the Preceding Scatter Diagram
• The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
• Higher points scored are associated with a higher number of interceptions.
• The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.
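To make the idea of a trendline concrete, the sketch below fits a straight line to the Panthers data by ordinary least squares, the usual method behind a linear trendline. The function name `fit_trendline` and the variable names are our own choices for illustration; they are not taken from the slides.

```python
# Panthers data from the example above.
interceptions = [1, 3, 2, 1, 3]       # x
points_scored = [14, 24, 18, 17, 30]  # y


def fit_trendline(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_trendline(interceptions, points_scored)
print(f"points = {intercept:.2f} + {slope:.2f} * interceptions")
```

The fitted slope is positive, which agrees with the positive relationship seen in the scatter diagram.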
Using Excel’s Chart Wizard to Construct a Scatter Diagram and Trendline

Formula Worksheet (showing data entered)
        A                            B
 1   Number of Interceptions      Number of Points Scored
 2   1                            14
 3   3                            24
 4   2                            18
 5   1                            17
 6   3                            30

Step 1: Select cells A1:B6
Step 2: Click the Chart Wizard button on the standard toolbar
Step 3: When the Chart Wizard - Step 1 of 4 - Chart Type dialog box appears:
   – Choose XY (Scatter) in the Chart Type list
   – Choose Scatter from the Chart subtype display
   – Click Next >
Step 4: When the Chart Wizard - Step 2 of 4 - Chart Source Data dialog box appears:
   – Click Next >
Step 5: When the Chart Wizard - Step 3 of 4 - Chart Options dialog box appears:
   – Select the Titles tab and then
      Type Scatter Diagram for the Panthers in the Chart title: box
      Type Number of Interceptions in the Value (X) axis: box
      Type Number of Points Scored in the Value (Y) axis: box
   – Select the Legend tab and then
      Remove the check in the Show Legend box
   – Click Next >
Step 6: When the Chart Wizard - Step 4 of 4 - Chart Location dialog box appears:
   – Specify a location for the new chart
   – Click Finish

[Figure: Excel worksheet showing the chart "Scatter Diagram for the Panthers". Horizontal axis: Number of Interceptions (0 to 4); vertical axis: Number of Points Scored (0 to 35)]

Adding a Trendline
Step 1: Position the mouse pointer over any data point in the scatter diagram and right-click
Step 2: Choose the Add Trendline option
Step 3: When the Add Trendline dialog box appears:
   – Select the Type tab and then
      Choose Linear from the Trend/Regression type display
   – Click OK

[Figure: Excel worksheet showing the chart "Scatter Diagram for the Panthers" after the Add Trendline steps. Axes as in the previous figure]

Scatter Diagram for the Stereo and Sound Equipment Store Example
[Figure: "Scatter Diagram for Stereo and Sound Equipment Store". Horizontal axis: Commercials (0 to 6); vertical axis: Sales Volume (0 to 70)]

Scatter Diagram for the Stereo and Sound Equipment Store Example, with a Trendline
[Figure: "Scatter Diagram for Stereo and Sound Equipment Store" with a trendline. Horizontal axis: Commercials (0 to 6); vertical axis: Sales Volume (0 to 70)]
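Returning to the stem-and-leaf display introduced at the start of this section, its construction rules (stems from the leading digits, leaves from the last digit in rank order, and optional stretching into 0-4 and 5-9 halves) can be sketched in Python. The function `stem_and_leaf` and its `stretched` parameter are illustrative names only. The sketch assumes non-negative integer data with a leaf unit of 1, and it skips stems that have no leaves.

```python
def stem_and_leaf(values, stretched=False):
    """Return display lines for non-negative integers, leaf unit 1."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # leading digit(s) and last digit
        # In a stretched display each stem appears twice:
        # first for leaves 0-4, then for leaves 5-9.
        key = (stem, leaf >= 5) if stretched else (stem, False)
        rows.setdefault(key, []).append(str(leaf))
    return [f"{stem:>3} | {' '.join(leaves)}"
            for (stem, _), leaves in sorted(rows.items())]


# Hudson Auto parts costs (50 tune-ups) from the example.
parts_cost = [
    91, 71, 104, 85, 62, 78, 69, 74, 97, 82,
    93, 72, 62, 88, 98, 57, 89, 68, 68, 101,
    75, 66, 97, 83, 79, 52, 75, 105, 68, 105,
    99, 79, 77, 71, 79, 80, 75, 65, 69, 69,
    97, 72, 80, 67, 62, 62, 76, 109, 74, 73,
]

regular = stem_and_leaf(parts_cost)
stretched = stem_and_leaf(parts_cost, stretched=True)
print("\n".join(regular))
print()
print("\n".join(stretched))
```

Other leaf units would require rescaling the data before splitting off the last digit, which this sketch does not attempt.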