Slides by John Loucks, St. Edward’s University

Slide 1
Chapter 2, Part A
Descriptive Statistics: Tabular and Graphical Presentations
• Summarizing Categorical Data
• Summarizing Quantitative Data

Slide 2
Summarizing Categorical Data
• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Bar Chart
• Pie Chart
• Crosstabulation

Slide 3
Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Slide 4
Frequency Distribution
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:
    Below Average, Above Average, Above Average, Average,
    Above Average, Average, Above Average, Average,
    Above Average, Below Average, Poor, Excellent,
    Above Average, Average, Above Average, Above Average,
    Below Average, Poor, Above Average, Average

Slide 5
Frequency Distribution
Example: Marada Inn
    Rating           Frequency
    Poor                 2
    Below Average        3
    Average              5
    Above Average        9
    Excellent            1
    Total               20

Slide 6
Using Excel’s COUNTIF Function to Construct a Frequency Distribution
Excel Formula Worksheet
         A                 B    C                 D
    1    Quality Rating         Quality Rating    Frequency
    2    Above Average          Poor              =COUNTIF($A$2:$A$21,C2)
    3    Below Average          Below Average     =COUNTIF($A$2:$A$21,C3)
    4    Above Average          Average           =COUNTIF($A$2:$A$21,C4)
    5    Average                Above Average     =COUNTIF($A$2:$A$21,C5)
    6    Average                Excellent         =COUNTIF($A$2:$A$21,C6)
    7    Above Average          Total             =SUM(D2:D6)
    8    Above Average
Note: Rows 9-21 are not shown.
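As a cross-check on the COUNTIF worksheet, the same tally can be sketched in Python. This is an illustrative sketch only: the slides use Excel, and the variable names (`ratings`, `classes`, `frequency`) are assumptions introduced here, not part of the original deck.

```python
from collections import Counter

# The 20 Marada Inn ratings, in the order listed on the example slide.
ratings = [
    "Below Average", "Above Average", "Above Average", "Average",
    "Above Average", "Average", "Above Average", "Average",
    "Above Average", "Below Average", "Poor", "Excellent",
    "Above Average", "Average", "Above Average", "Above Average",
    "Below Average", "Poor", "Above Average", "Average",
]

# Classes in the order used by the frequency table.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Counter plays the role of COUNTIF: one count per distinct rating.
counts = Counter(ratings)
frequency = {c: counts[c] for c in classes}
total = sum(frequency.values())  # plays the role of =SUM(D2:D6)

for rating, freq in frequency.items():
    print(f"{rating:<15}{freq:>3}")
print(f"{'Total':<15}{total:>3}")
```

Listing the classes explicitly keeps the table in rating order and would report a zero for any class that no guest chose.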
Slide 7
Using Excel’s COUNTIF Function to Construct a Frequency Distribution
Excel Value Worksheet
         A                 B    C                 D
    1    Quality Rating         Quality Rating    Frequency
    2    Above Average          Poor                  2
    3    Below Average          Below Average         3
    4    Above Average          Average               5
    5    Average                Above Average         9
    6    Average                Excellent             1
    7    Above Average          Total                20
    8    Above Average
Note: Rows 9-21 are not shown.

Slide 8
Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Slide 9
Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Slide 10
Relative Frequency and Percent Frequency Distributions
Example: Marada Inn
    Rating           Relative Frequency    Percent Frequency
    Poor                   .10                    10
    Below Average          .15                    15
    Average                .25                    25
    Above Average          .45                    45
    Excellent              .05                     5
    Total                 1.00                   100
For example, the relative frequency for Excellent is 1/20 = .05, and the percent frequency for Poor is .10(100) = 10.

Slide 11
Using Excel to Construct Relative Frequency and Percent Frequency Distributions
Excel Formula Worksheet
         C                 D                          E                     F
    1    Quality Rating    Frequency                  Relative Frequency    Percent Frequency
    2    Poor              =COUNTIF($A$2:$A$21,C2)    =D2/$D$7              =E2*100
    3    Below Average     =COUNTIF($A$2:$A$21,C3)    =D3/$D$7              =E3*100
    4    Average           =COUNTIF($A$2:$A$21,C4)    =D4/$D$7              =E4*100
    5    Above Average     =COUNTIF($A$2:$A$21,C5)    =D5/$D$7              =E5*100
    6    Excellent         =COUNTIF($A$2:$A$21,C6)    =D6/$D$7              =E6*100
    7    Total             =SUM(D2:D6)                =SUM(E2:E6)           =SUM(F2:F6)
    8
Note: Columns A-B and rows 9-21 are not shown.
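The relative and percent frequency formulas (frequency divided by the total, then multiplied by 100) can also be sketched in Python. The names below are illustrative assumptions, and the frequencies are copied from the Marada Inn table rather than recomputed.

```python
# Frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(frequency.values())

# Relative frequency = class frequency / total (the =D2/$D$7 column).
relative = {rating: f / total for rating, f in frequency.items()}

# Percent frequency = relative frequency * 100 (the =E2*100 column),
# written as 100 * f / total to avoid accumulating rounding noise.
percent = {rating: 100 * f / total for rating, f in frequency.items()}

for rating in frequency:
    print(f"{rating:<15}{relative[rating]:>6.2f}{percent[rating]:>6.0f}")
print(f"{'Total':<15}{sum(relative.values()):>6.2f}{sum(percent.values()):>6.0f}")
```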
Slide 12
Using Excel to Construct Relative Frequency and Percent Frequency Distributions
Excel Value Worksheet
         C                 D            E                     F
    1    Quality Rating    Frequency    Relative Frequency    Percent Frequency
    2    Poor                  2             0.10                   10
    3    Below Average         3             0.15                   15
    4    Average               5             0.25                   25
    5    Above Average         9             0.45                   45
    6    Excellent             1             0.05                    5
    7    Total                20             1.00                  100
    8
Note: Columns A-B and rows 9-21 are not shown.

Slide 13
Bar Chart (In Excel this is called a Column Chart)
A bar chart is a graphical device for depicting qualitative data.
On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).
Using a bar of fixed width drawn above each class label, we extend the height appropriately.
The bars are separated to emphasize the fact that each class is a separate category.

Slide 14
Bar Chart (In Excel this is called a Column Chart)
[Bar chart: Marada Inn Quality Ratings. Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency.]

Slide 15
Using Excel’s Chart Tools to Construct a Bar Chart
Step 1. Select cells C1:D6
Step 2. Click the Insert tab on the Ribbon
Step 3. In the Charts group, click Column
Step 4. When the list of column chart subtypes appears:
    Go to the 2-D Column section
    Click Clustered Column (the leftmost chart)
Step 5. In the Chart Layouts group, click the More button (the downward pointing arrow with a line over it) to display all the options
… continued

Slide 16
Using Excel’s Chart Tools to Construct a Bar Chart
Step 6. Choose Layout 9
Step 7. Click the Chart Title and replace it with Marada Inn Quality Ratings
Step 8. Click the Horizontal Axis (Category) Title and replace it with Quality Rating
Step 9. Click the Vertical Axis (Value) Title and replace it with Frequency
Step 10.
Right click the Series 1 Legend Entry and choose Delete from the list of options that appear
… continued

Slide 17
Using Excel’s Chart Tools to Construct a Bar Chart
Step 11. Right click the vertical axis and choose Format Axis from the options that appear
Step 12. When the Format Axis dialog box appears:
    Go to the Axis Options section
    Select Fixed for Major Unit and enter 2.0 in the corresponding box
    Click Close

Slide 18
Using Excel’s Chart Tools to Construct a Bar Chart
[Worksheet chart: Marada Inn Quality Ratings. Horizontal axis: Quality Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency, 0 to 10 in steps of 2.]

Slide 19
Pie Chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.

Slide 20
Pie Chart
[Pie chart: Marada Inn Quality Ratings. Excellent 5%, Above Average 45%, Poor 10%, Below Average 15%, Average 25%.]

Slide 21
Example: Marada Inn
Insights Gained from the Preceding Pie Chart
• One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.
• For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.

Slide 22
Using Excel’s Chart Tools to Construct a Pie Chart
Excel’s chart tools can be used to develop a pie chart for the Marada quality rating data in much the same way we developed the bar chart.
The major difference is that in step 3 we would choose Pie in the Charts group.
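The sector arithmetic described for the pie chart (relative frequency times 360 degrees) can be sketched as follows. This is a hedged illustration in Python, not part of the Excel procedure; the dictionary name `sector_degrees` is an assumption made for this example.

```python
# Frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(frequency.values())

# Each class gets a sector of (relative frequency) * 360 degrees.
sector_degrees = {rating: 360 * f / total for rating, f in frequency.items()}

for rating, degrees in sector_degrees.items():
    print(f"{rating:<15}{degrees:>6.0f} degrees")
```

The Average class, with relative frequency .25, gets the 90-degree sector used as the example in the pie chart definition.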
Slide 23
Using Excel’s Chart Tools to Construct a Pie Chart
[Worksheet chart: Marada Inn Quality Ratings pie chart. Excellent 5%, Above Average 45%, Poor 10%, Below Average 15%, Average 25%.]

Slide 24
Excel’s PivotTable Report and PivotChart Report
You have now seen how Excel’s COUNTIF function can be used to develop a frequency distribution and Excel’s Chart Tools can be used to create bar and pie charts.
But there is a more powerful set of Excel tools that can be used for categorical data:
• PivotTable report
• PivotChart report

Slide 25
Summarizing Quantitative Data
• Frequency Distribution
• Relative Frequency and Percent Frequency Distributions
• Dot Plot
• Histogram
• Cumulative Distributions
• Ogive
• Stem-Leaf Display
• Crosstabulation
• Scatter Diagram

Slide 26
Frequency Distribution
Example: Hudson Auto Repair
The manager of Hudson Auto would like to gain a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Slide 27
Frequency Distribution
Example: Hudson Auto Repair
Sample of Parts Cost ($) for 50 Tune-ups
    91   71  104   85   62   78   69   74   97   82
    93   72   62   88   98   57   89   68   68  101
    75   66   97   83   79   52   75  105   68  105
    99   79   77   71   79   80   75   65   69   69
    97   72   80   67   62   62   76  109   74   73

Slide 28
Frequency Distribution
Guidelines for Selecting Number of Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes.

Slide 29
Frequency Distribution
Guidelines for Selecting Width of Classes
• Use classes of equal width.
• Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes

Slide 30
Frequency Distribution
Example: Hudson Auto Repair
If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
    Parts Cost ($)    Frequency
    50-59                 2
    60-69                13
    70-79                16
    80-89                 7
    90-99                 7
    100-109               5
    Total                50

Slide 31
Using Excel’s PivotTable Report to Construct a Frequency Distribution
Step 1 Click the Insert tab on the Ribbon
Step 2 In the Tables group, click the icon above the word PivotTable
Step 3 When the Create PivotTable dialog box appears:
    Choose Select a table or range
    Enter A1:A51 in the Table/Range box
    Choose Existing Worksheet as the location for the PivotTable
    Enter C1 in the Location box
    Click OK
… continued

Slide 32
Using Excel’s PivotTable Report to Construct a Frequency Distribution
Step 4 In the PivotTable Field List, go to Choose Fields to add to report:
    Drag the Parts Cost field to the Row Labels area
    Drag the Parts Cost field to the Values area
Step 5 Click on Sum of Parts Cost in the Values area
Step 6 Click Value Field Settings from the list of options that appear
Step 7 When the Value Field Settings dialog box appears:
    Under Summarize value field by, choose Count
    Click OK

Slide 33
Using Excel’s PivotTable Report to Construct a Frequency Distribution
To construct the frequency distribution, we must group the rows containing parts costs.
Step 1 Right click any cell in the PivotTable report containing a parts cost.
Step 2 Choose Group from the list of options that appears
Step 3 When the Grouping dialog box appears:
    Enter 50 in the Starting at box
    Enter 109 in the Ending at box
    Enter 10 in the By box
    Click OK

Slide 34
Using Excel’s PivotTable Report to Construct a Frequency Distribution
Excel Value Worksheet
         A             B    C              D
    1    Parts Cost         Parts Cost     Count of Parts Cost
    2    91                 50-59               2
    3    71                 60-69              13
    4    104                70-79              16
    5    85                 80-89               7
    6    62                 90-99               7
    7    78                 100-109             5
    8    69                 Grand Total        50
Note: Rows 9-51 are not shown.
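The class-width guideline and the six-class grouping can be checked outside Excel with a short Python sketch. The starting value of 50 and the width of 10 follow the Grouping dialog settings on the slides; the variable names are assumptions chosen for this illustration.

```python
# Parts costs ($) for the 50 Hudson Auto Repair tune-ups.
costs = [
    91, 71, 104, 85, 62, 78, 69, 74, 97, 82,
    93, 72, 62, 88, 98, 57, 89, 68, 68, 101,
    75, 66, 97, 83, 79, 52, 75, 105, 68, 105,
    99, 79, 77, 71, 79, 80, 75, 65, 69, 69,
    97, 72, 80, 67, 62, 62, 76, 109, 74, 73,
]

num_classes = 6

# Approximate class width = (largest - smallest) / number of classes.
approx_width = (max(costs) - min(costs)) / num_classes  # (109 - 52) / 6

# The slides round the approximate width up to a convenient value of 10
# and start the first class at 50.
width = 10
start = 50

# Count the values falling in each class, e.g. 50-59, 60-69, ...
frequency = {}
for k in range(num_classes):
    lower = start + k * width
    upper = lower + width - 1
    frequency[f"{lower}-{upper}"] = sum(lower <= c <= upper for c in costs)

print(f"Approximate class width: {approx_width}")
for label, count in frequency.items():
    print(f"{label:<10}{count:>3}")
print(f"{'Total':<10}{sum(frequency.values()):>3}")
```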
Slide 35
Relative Frequency and Percent Frequency Distributions
Example: Hudson Auto Repair
    Parts Cost ($)    Relative Frequency    Percent Frequency
    50-59                   .04                     4
    60-69                   .26                    26
    70-79                   .32                    32
    80-89                   .14                    14
    90-99                   .14                    14
    100-109                 .10                    10
    Total                  1.00                   100
For example, the relative frequency for the 50-59 class is 2/50 = .04, and its percent frequency is .04(100) = 4.

Slide 36
Relative Frequency and Percent Frequency Distributions
Example: Hudson Auto Repair
Insights Gained from the % Frequency Distribution:
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third) of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.

Slide 37
Dot Plot
One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
Then each data value is represented by a dot placed above the axis.

Slide 38
Dot Plot
Example: Hudson Auto Repair
[Dot plot: Tune-up Parts Cost. Horizontal axis: Cost ($), 50 to 110.]

Slide 39
Histogram
Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Slide 40
Histogram
Example: Hudson Auto Repair
[Histogram: Tune-up Parts Cost. Horizontal axis: Parts Cost ($), one rectangle per class from 50-59 upward. Vertical axis: Frequency, 2 to 18.]

Slide 41
Using Excel’s Chart Tools to Construct a Histogram
Step 1. Select cells C2:D7
Step 2. Click the Insert tab on the Ribbon
Step 3. In the Charts group, click Column
Step 4. When the list of column chart subtypes appears:
    Go to the 2-D Column section
    Click Clustered Column (the leftmost chart)
Step 5. In the Chart Layouts group, click the More button (the downward pointing arrow with a line over it) to display all the options
… continued

Slide 42
Using Excel’s Chart Tools to Construct a Histogram
Step 6.
Choose Layout 8
Step 7. Select the Chart Title and replace it with Tune-up Parts Cost
Step 8. Select the Horizontal (Category) Axis Title and replace it with Parts Cost ($)
Step 9. Select the Vertical (Value) Axis Title and replace it with Frequency

Slide 43
Using Excel’s Chart Tools to Construct a Histogram
[Worksheet chart: Tune-up Parts Cost. Horizontal axis: Parts Cost ($), classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109. Vertical axis: Frequency, 0 to 20 in steps of 5.]

Slide 44
Histogram
Symmetric
• Left tail is the mirror image of the right tail
• Examples: heights and weights of people
[Histogram sketch: vertical axis Relative Frequency, 0 to .35.]

Slide 45
Histogram
Moderately Skewed Left
• A longer tail to the left
• Example: exam scores
[Histogram sketch: vertical axis Relative Frequency, 0 to .35.]

Slide 46
Histogram
Moderately Right Skewed
• A longer tail to the right
• Example: housing values
[Histogram sketch: vertical axis Relative Frequency, 0 to .35.]

Slide 47
Histogram
Highly Skewed Right
• A very long tail to the right
• Example: executive salaries
[Histogram sketch: vertical axis Relative Frequency, 0 to .35.]

Slide 48
Cumulative Distributions
Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class.
Cumulative percent frequency distribution: shows the percentage of items with values less than or equal to the upper limit of each class.

Slide 49
Cumulative Distributions
Hudson Auto Repair
    Cost ($)    Cumulative Frequency    Cumulative Relative Frequency    Cumulative Percent Frequency
    ≤ 59                 2                         .04                               4
    ≤ 69                15                         .30                              30
    ≤ 79                31                         .62                              62
    ≤ 89                38                         .76                              76
    ≤ 99                45                         .90                              90
    ≤ 109               50                        1.00                             100
For example, in the ≤ 69 row the cumulative frequency is 2 + 13 = 15, the cumulative relative frequency is 15/50 = .30, and the cumulative percent frequency is .30(100) = 30.

Slide 50
Ogive
An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal axis.
Shown on the vertical axis are the:
• cumulative frequencies, or
• cumulative relative frequencies, or
• cumulative percent frequencies
The frequency (one of the above) of each class is plotted as a point.
The plotted points are connected by straight lines.

Slide 51
Ogive
Hudson Auto Repair
• Because the class limits for the parts-cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on.
• These gaps are eliminated by plotting points halfway between the class limits.
• Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on.

Slide 52
Ogive with Cumulative Percent Frequencies
Example: Hudson Auto Repair
[Ogive: Tune-up Parts Cost. Horizontal axis: Parts Cost ($), 50 to 110. Vertical axis: Cumulative Percent Frequency, 20 to 100. The point (89.5, 76) is labeled.]

Slide 53
Using Excel’s PivotChart Report
You have now seen how Excel’s PivotTable report can be used to construct a frequency distribution for quantitative data and how Excel’s Chart tools can be used to construct the corresponding histogram.
However, Excel’s PivotChart report can be used to develop a frequency distribution and a graphical display at the same time.

Slide 54
Using Excel’s PivotChart Report
Step 1. Click the Insert tab on the Ribbon
Step 2. In the Tables group, click the word PivotTable
Step 3. Choose PivotChart from the options that appear
Step 4. When the Create PivotTable with PivotChart dialog box appears:
    Choose Select a table or range
    Enter A1:A51 in the Table/Range box
    Choose Existing Worksheet as the location for the PivotTable and PivotChart
    Enter C1 in the Location box
    Click OK
… continued

Slide 55
Using Excel’s PivotChart Report
Step 5. In the PivotTable Field List, go to Choose Fields to add to report
    Drag the Parts Cost field to the Axis Fields (Categories) area
    Drag the Parts Cost field to the Values area
Step 6. Click Sum of Parts Cost in the Values area
Step 7. Click Value Field Settings from the list of options that appear
Step 8.
When the Value Field Settings dialog appears:
    Under Summarize value field by, choose Count
    Click OK
… continued

Slide 56
Using Excel’s PivotChart Report
Step 9. Right click cell C2 in the PivotTable report or any other cell containing a parts cost
Step 10. Choose Group from the list of options
Step 11. When the Grouping dialog box appears:
    Enter ___ in the Starting at box
    Enter ___ in the Ending at box
    Click OK
Step 12. Click inside the resulting PivotChart
Step 13. Click the Design tab on the Ribbon
… continued

Slide 57
Using Excel’s PivotChart Report
Step 14. In the Chart Layouts group, click the More button (the downward pointing arrow with a line over it) to display all the options
Step 15. Choose Layout 8
Step 16. Select the Chart Title and replace it with Tune-up Parts Costs
Step 17. Select the Horizontal Axis (Category) Title and replace it with Parts Cost ($)
Step 18. Select the Vertical (Value) Axis Title and replace it with Frequency

Slide 58
End of Chapter 2, Part A

Slide 59
Chapter 2, Part B
Descriptive Statistics: Tabular and Graphical Presentations
• Exploratory Data Analysis: Stem-and-Leaf Display
• Crosstabulations and Scatter Diagrams

Slide 60
Exploratory Data Analysis
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.
One such technique is the stem-and-leaf display.

Slide 61
Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
The first digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line we record the last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
Slide 62
Example: Hudson Auto Repair
The manager of Hudson Auto would like to gain a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Slide 63
Stem-and-Leaf Display
Example: Hudson Auto Repair
Sample of Parts Cost ($) for 50 Tune-ups
    91   71  104   85   62   78   69   74   97   82
    93   72   62   88   98   57   89   68   68  101
    75   66   97   83   79   52   75  105   68  105
    99   79   77   71   79   80   75   65   69   69
    97   72   80   67   62   62   76  109   74   73

Slide 64
Stem-and-Leaf Display
Example: Hudson Auto Repair
     5 | 2 7
     6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
     7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
     8 | 0 0 2 3 5 8 9
     9 | 1 3 7 7 7 8 9
    10 | 1 4 5 5 9
Each row (for example, the row beginning with 5) is a stem; each digit to the right of the line is a leaf.

Slide 65
Stretched Stem-and-Leaf Display
If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s).
Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.

Slide 66
Stretched Stem-and-Leaf Display
Example: Hudson Auto Repair
     5 | 2
     5 | 7
     6 | 2 2 2 2
     6 | 5 6 7 8 8 8 9 9 9
     7 | 1 1 2 2 3 4 4
     7 | 5 5 5 6 7 8 9 9 9
     8 | 0 0 2 3
     8 | 5 8 9
     9 | 1 3
     9 | 7 7 7 8 9
    10 | 1 4
    10 | 5 5 9

Slide 67
Stem-and-Leaf Display
Leaf Units
• A single digit is used to define each leaf.
• In the preceding example, the leaf unit was 1.
• Leaf units may be 100, 10, 1, 0.1, and so on.
• Where the leaf unit is not shown, it is assumed to equal 1.

Slide 68
Example: Leaf Unit = 0.1
If we have data with values such as
    8.6  11.7  9.4  9.1  10.2  11.0  8.8
a stem-and-leaf display of these data will be
    Leaf Unit = 0.1
     8 | 6 8
     9 | 1 4
    10 | 2
    11 | 0 7

Slide 69
Example: Leaf Unit = 10
If we have data with values such as
    1806  1717  1974  1791  1682  1910  1838
a stem-and-leaf display of these data will be
    Leaf Unit = 10
    16 | 8
    17 | 1 9
    18 | 0 3
    19 | 1 7
The 82 in 1682 is rounded down to 80 and is represented as an 8.
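The stem-and-leaf construction, including the leaf-unit rule, can be sketched in Python. This is an illustrative sketch rather than anything from the slides: the function name `stem_and_leaf` and its `leaf_unit` parameter are assumptions, and digits below the leaf unit are dropped (rounded down), as in the 1682 example.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: [leaves]} with leaves in rank order.

    Each value is expressed in leaf units; the last digit is the leaf and
    the remaining leading digits form the stem. Digits smaller than the
    leaf unit are dropped.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        # round() guards against floating-point noise (a quotient landing
        # just below a whole number) before int() truncates.
        in_leaf_units = int(round(value / leaf_unit, 6))
        stem, leaf = divmod(in_leaf_units, 10)
        rows[stem].append(leaf)
    return dict(rows)


def show(display):
    """Print one stem per line, e.g. '16 | 8'."""
    for stem in sorted(display):
        print(f"{stem:>3} | {' '.join(str(leaf) for leaf in display[stem])}")


tenths = stem_and_leaf([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], leaf_unit=0.1)
tens = stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10)

show(tenths)
show(tens)
```

Sorting the values before splitting them keeps the leaves on each stem in rank order, which is what lets the display double as a sideways histogram.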
Slide 70
Crosstabulations and Scatter Diagrams
Thus far we have focused on methods that are used to summarize the data for one variable at a time.
Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables.
Crosstabulation and a scatter diagram are two methods for summarizing the data for two variables simultaneously.

Slide 71
Crosstabulation
A crosstabulation is a tabular summary of data for two variables and helps to reveal the relationship between the two variables.
Crosstabulation can be used when:
• One variable is categorical and the other is quantitative,
• Both variables are categorical, or
• Both variables are quantitative.
The left and top margin labels define the classes for the two variables.

Slide 72
Crosstabulation
Example: Finger Lakes Homes
The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Price Range is the quantitative variable; Home Style is the categorical variable.
                                  Home Style
    Price Range    Colonial    Log    Split    A-Frame    Total
    ≤ $99,000         18         6      19        12        55
    > $99,000         12        14      16         3        45
    Total             30        20      35        15       100

Slide 73
Crosstabulation
Example: Finger Lakes Homes
Insights Gained from Preceding Crosstabulation
• The greatest number of homes (19) in the sample are a split-level style and priced at less than or equal to $99,000.
• Only three homes in the sample are an A-Frame style and priced at more than $99,000.
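The margins of the Finger Lakes crosstabulation, and the row and column percentages that the deck turns to a few slides later, can be sketched in Python from the cell counts. The nested-dictionary layout and the names used here are assumptions for illustration; the row labels follow the `<=99K` and `>99K` labels of the Excel worksheet.

```python
# Cell counts from the Finger Lakes Homes crosstabulation.
crosstab = {
    "<=99K": {"Colonial": 18, "Log": 6, "Split": 19, "A-Frame": 12},
    ">99K": {"Colonial": 12, "Log": 14, "Split": 16, "A-Frame": 3},
}
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Margins: row totals give the frequency distribution for price range,
# column totals give the frequency distribution for home style.
row_totals = {price: sum(cells.values()) for price, cells in crosstab.items()}
col_totals = {s: sum(crosstab[price][s] for price in crosstab) for s in styles}
grand_total = sum(row_totals.values())

# Row percentage: cell count as a percentage of its row total.
row_pct = {
    price: {s: 100 * crosstab[price][s] / row_totals[price] for s in styles}
    for price in crosstab
}

# Column percentage: cell count as a percentage of its column total.
col_pct = {
    price: {s: 100 * crosstab[price][s] / col_totals[s] for s in styles}
    for price in crosstab
}

print("Row totals:", row_totals)
print("Column totals:", col_totals, "Grand total:", grand_total)
for price in crosstab:
    print(price, "row %:", [round(row_pct[price][s], 2) for s in styles])
    print(price, "col %:", [round(col_pct[price][s], 2) for s in styles])
```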
Slide 74
Crosstabulation
Example: Finger Lakes Homes
                                  Home Style
    Price Range    Colonial    Log    Split    A-Frame    Total
    ≤ $99,000         18         6      19        12        55
    > $99,000         12        14      16         3        45
    Total             30        20      35        15       100
The Total column is the frequency distribution for the price range variable.
The Total row is the frequency distribution for the home style variable.

Slide 75
Using Excel’s PivotTable Report to Create a Crosstabulation
Excel Worksheet (showing partial data)
         A       B            C              D    E
    1    Home    Price ($)    Style
    2    1       >99K         Colonial
    3    2       <=99K        Log
    4    3       >99K         Log
    5    4       <=99K        A-Frame
    6    5       <=99K        Colonial
    7    6       <=99K        Split-Level
    8    7       >99K         A-Frame
    9    8       >99K         Colonial
Note: Rows 10-101 are not shown.

Slide 76
Using Excel’s PivotTable Report to Create a Crosstabulation
Displaying the Initial PivotTable Field List and PivotTable Report
Step 1 Click the Insert tab on the Ribbon
Step 2 In the Tables group, click the icon above the word PivotTable
Step 3 When the Create PivotTable dialog box appears:
    Choose Select a Table or Range
    Enter A1:C101 in the Table/Range box
    Choose New Worksheet as the location for the PivotTable Report
    Click OK

Slide 77
Using Excel’s PivotTable Report to Create a Crosstabulation
Setting Up the PivotTable Field List
Step 1 In the PivotTable Field List, go to Choose Fields to add to report
    Drag the Price ($) field to Row Labels area
    Drag the Style field to Column Labels area
    Drag the Home field to the Values area
Step 2 Click on Sum of Home in the Values area
Step 3 Click Value Field Settings from the list of options
Step 4 When the Value Field Settings dialog box appears:
    Under Summarize value field by, choose Count
    Click OK

Slide 78
Using Excel’s PivotTable Report to Create a Crosstabulation
Value Worksheet
    Count of Home    Style
    Price ($)        Colonial    Log    Split-Level    A-Frame    Grand Total
    <=99K               18         6        19            12           55
    >99K                12        14        16             3           45
    Grand Total         30        20        35            15          100

Slide 79
Crosstabulation: Row or Column Percentages
Converting the entries in the table into row percentages or column percentages can provide
additional insight about the relationship between the two variables.

Slide 80
Crosstabulation: Row Percentages
Example: Finger Lakes Homes
                                  Home Style
    Price Range    Colonial     Log      Split    A-Frame    Total
    ≤ $99,000        32.73     10.91     34.55     21.82      100
    > $99,000        26.67     31.11     35.56      6.67      100
Note: row totals are actually 100.01 due to rounding.
For example: (Colonial and > $99K)/(All > $99K) x 100 = (12/45) x 100

Slide 81
Crosstabulation: Column Percentages
Example: Finger Lakes Homes
                                  Home Style
    Price Range    Colonial     Log      Split    A-Frame
    ≤ $99,000        60.00     30.00     54.29     80.00
    > $99,000        40.00     70.00     45.71     20.00
    Total           100       100       100       100
For example: (Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100

Slide 82
Crosstabulation: Simpson’s Paradox
Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.
Simpson’s Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.
Before drawing conclusions about relationships between two variables (for aggregated data), you must investigate whether any hidden variables could affect the results.

Slide 83
Scatter Diagram and Trendline
A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
The general pattern of the plotted points suggests the overall relationship between the variables.
A trendline is an approximation of the relationship.

Slide 84
Scatter Diagram
A Positive Relationship
[Scatter diagram sketch: axes x and y.]

Slide 85
Scatter Diagram
A Negative Relationship
[Scatter diagram sketch: axes x and y.]

Slide 86
Scatter Diagram
No Apparent Relationship
[Scatter diagram sketch: axes x and y.]

Slide 87
Scatter Diagram
Example: Panthers Football Team
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.
    x = Number of Interceptions    y = Number of Points Scored
                1                              14
                3                              24
                2                              18
                1                              17
                3                              30

Slide 88
Scatter Diagram
[Scatter diagram: horizontal axis x, Number of Interceptions, 0 to 4. Vertical axis y, Number of Points Scored, 0 to 35.]

Slide 89
Example: Panthers Football Team
Insights Gained from the Preceding Scatter Diagram
• The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
• Higher points scored are associated with a higher number of interceptions.
• The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.

Slide 90
Using Excel’s Chart Wizard to Construct a Scatter Diagram and Trendline
Excel Worksheet (showing data)
         A                          B                          C
    1    Number of Interceptions    Number of Points Scored
    2    1                          14
    3    3                          24
    4    2                          18
    5    1                          17
    6    3                          30
    7

Slide 91
Using Excel’s Chart Tools to Construct a Scatter Diagram and Trendline
Step 1 Select cells A2:B6
Step 2 Click the Insert tab on the Excel Ribbon
Step 3 In the Charts group, click Scatter
Step 4 When the list of scatter diagram subtypes appears:
    Click Scatter with only Markers
Step 5 In the Chart Layout group, click Layout 1
Step 6 Select the Chart Title and replace it with Scatter Diagram for the Panthers
Step 7 Select the Horizontal Axis (Value) Title and replace it with Number of Interceptions
. . .
continued

Slide 92
Using Excel’s Chart Tools to Construct a Scatter Diagram and Trendline
Step 8 Select the Vertical (Value) Axis Title and replace it with Number of Points Scored
Step 9 Right click Series 1 Legend Entry and click Delete
To Add a Trendline
Step 10 Position the pointer over any data point in the scatter diagram and right-click to display options
Step 11 Choose Add Trendline
Step 12 When the Format Trendline dialog box appears:
    Select Trendline Options
    Choose Linear from Trend/Regression Type list
    Click Close

Slide 93
Using Excel’s Chart Tools to Construct a Scatter Diagram and Trendline
[Worksheet chart: Scatter Diagram for the Panthers. Horizontal axis: Number of Interceptions, 0 to 4. Vertical axis: Number of Points Scored, 0 to 35.]

Slide 94
Tabular and Graphical Methods
Data
    Categorical Data
        Tabular Methods
        • Frequency Distribution
        • Rel. Freq. Dist.
        • Percent Freq. Distribution
        • Crosstabulation
        Graphical Methods
        • Bar Graph
        • Pie Chart
    Quantitative Data
        Tabular Methods
        • Frequency Distribution
        • Rel. Freq. Dist.
        • % Freq. Dist.
        • Cum. Freq. Dist.
        • Cum. Rel. Freq. Distribution
        • Cum. % Freq. Distribution
        • Crosstabulation
        Graphical Methods
        • Dot Plot
        • Histogram
        • Ogive
        • Stem-and-Leaf Display
        • Scatter Diagram

Slide 95
End of Chapter 2, Part B

Slide 96
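Supplementary sketch for the Panthers scatter diagram: a linear trendline is a least-squares line, so its slope and intercept can be reproduced by hand. The Python below is an illustration under that assumption, with variable names chosen here; the slides themselves only show the Excel steps and do not state the fitted equation.

```python
# Panthers data: interceptions (x) and points scored (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 30]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx

# The fitted line passes through the point of means.
intercept = mean_y - slope * mean_x

print(f"Trendline: y = {intercept:.2f} + {slope:.2f}x")
```

A positive slope agrees with the insight that more interceptions are associated with more points scored.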